3 Aug
Database Shards and CakePHP
As I’ve mentioned in this blog before, I spent 4 years working for an adult dating web site. One of the biggest problems we ran into was a bottleneck involving database replication lag. In a normal replication setup you have your application do writes to a master database and then those changes get replicated to the slaves. That’s okay…until you start dealing with huge amounts of updates.
So, the quick solution was to make sure that we had fast enough hardware on the slaves to handle the huge volume of updates. I remember partition lag in the order of 30 minutes on some of the machines (that’s right, THIRTY MINUTES). But the bigger problem came down to trying to find a way to minimize replication problems. One of the things discussed (and I have no idea if it ever got implemented after I left) was partitioning the data into groups that made sense. Some of the suggestions were to partition based on geographical location of the users, or simply do grouping based on the record ID.
The other day I came across a great blog posting talking about database shards and instantly realized that they were talking about exactly what we were facing. The goal is to spread the data around, denormalize things so that you have all your necessary data in one spot and to try and minimize replication issues. So you’d be reading and writing to a shard depending on whatever criteria you are using.
Now, I know that you could put code into the beforeSave() method on a model to figure out what shard you would be writing to, but how to figure out what shard you would be *reading* from is something I am still mulling over. Suggestions from #cakephp-dev (where I hang out during the work day) seem to point towards using a behavior (go to the Bakery and search for “behavior” to see lots of examples) to make this work. I’ll fool around with some code to see if I can come up with something that works.

Posted by Tarique Sani on 03.08.07 at 2:09 pm
The problem which has me stumped in sharding with cakePHP is how and where to store the associated data of a shraded table. Most often an application has a main model eg: Hotels (or in your case the users) and we can shard that.
I was considering storing the associated on the same shard server…. hasOne, hasMany and to an extent belongsTo can work fine but how does one handle hasAndBelongsToMany?
Posted by pcdinh on 03.08.07 at 2:09 pm
Just be curious on how to implementing sharding in PHP. A Java-based one has come up here: http://www.hivedb.org/
Posted by Chris Hartjes on 03.08.07 at 2:09 pm
@tarique: Unfortunately, to make shards work properly you have to denormalize your data, which means doing stuff with associations is very difficult.
@pcdinh: There is no trick to implementing shards in PHP: I can think of a few ways to do it right off the top of my head, but I was trying to think of where to put the code in the context of CakePHP and it looks like a behavior for reads and in beforeSave() for a model is the way I would go (without having done further research myself)
Posted by wluigi on 03.08.07 at 2:09 pm
shouldn’t it be stored in the model, like useTable, useDBConfig ?
Posted by Chris Hartjes on 03.08.07 at 2:09 pm
@wluigi: some of it could be there, but tell me how you would handle this situation: 4 different servers that you can read or write to depending on what server a particular record is located on?