9 Apr
The Rise of the App Cloud
These are heady times for any web developer. Two of the biggest companies on the web are now offering access to what I call their “app cloud” so you can run your web app in a hugely scalable environment. Amazon was first out of the gate with their great combo of S3, EC2 and SimpleDB. Of course, you didn’t expect Google to stand idly by and they have delivered as well with Google App Engine. So, what do they have in common?
You’re limited in what you are allowed to do. It’s really that simple: to get scalability out of the box, you are going to have to follow a bunch of rules. Sandboxes are good, but the problem occurs when you fight against the sandbox rather than embrace its constraints. Sort of the same thing I see on the CakePHP mailing list where people are determined to do what they want, and who gives a flying fuck what the conventions are. But in this case, you cannot avoid it.
Having looked into the idea of running an app using Amazon’s EC2 + SimpleDB structure for Rallyhat, it looks like it’s pretty simple to get up and running with those tools. I know I’m grossly simplifying the process, but you create your app, create an image, upload it into the cloud and away you go! With Google App Engine, all you can do is create Python apps (Django is supported)…but they’ve severely limited what you can do as they require you to use GAE-specific APIs to talk to things. That means a lot of the cool Python libraries that are out there WILL NOT BE ABLE TO HELP YOU. Maybe that’s just for now, but that’s a pretty serious limitation.
Secondly, both SimpleDB and BigTable have the same limitation in that you cannot do joins, so that means you have zero hope in hell of using any sort of ORM wrapper on your data. It’s all about the denormalized data. Go look it up if you don’t believe me. Now, this can be a problem if you have a very complex database structure that requires a lot of relationships between your tables…like most of the apps that use modern web application frameworks. Maybe my idea of using a very simple REST-based framework in PHP isn’t such a crazy one for this.
Rallyhat is using relationships between tables because that makes it easier for me. Games belong to Teams, Locations belong to Games, Teams belong to Sports. Also, I’ve been using CakePHP long enough that all my apps are designed around the idea of data having relationships. If I had to denormalize that data by essentially putting it all in one big table, I would lose the use of ORM / data mapping. Django lets me do the same sort of relationships as Cake, so why the hell wouldn’t I use it? It’s about speed of development, after all. Quicker to a usable state is really the key here, as you’re bound to spend all sorts of time in maintenance-and-bug-fix mode no matter what.
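To make the "one big table" idea concrete, here is a rough Python sketch of what flattening those relationships might look like. The field names and sample values are made up for illustration, and plain dicts stand in for whatever the data store really is; this is not how Rallyhat is actually built.

```python
# Hypothetical normalized data shaped like the relationships described above.
# Every name and value here is invented for illustration.
sports = {1: {"name": "Hockey"}}
teams = {10: {"name": "Team A", "sport_id": 1}}
games = {100: {"team_id": 10, "date": "2008-04-09"}}
locations = {1000: {"game_id": 100, "name": "Arena 1"}}


def flatten_game(game_id):
    """Build one denormalized record that can be read without a join."""
    game = games[game_id]
    team = teams[game["team_id"]]
    sport = sports[team["sport_id"]]
    # Locations belong to Games, so collect every location pointing at this game.
    location_names = sorted(
        loc["name"] for loc in locations.values() if loc["game_id"] == game_id
    )
    return {
        "game_id": game_id,
        "date": game["date"],
        "team_name": team["name"],
        "sport_name": sport["name"],
        "location_names": location_names,
    }


flat_game = flatten_game(100)
```

The flat record is cheap to read, but every copy of a team or sport name now has to be kept in sync by hand, which is exactly the work an ORM normally hides.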
So this leads me to what I consider an inevitable conclusion: using an app cloud is only for certain types of applications. What a huge leap in logic, no? All sarcasm aside, this is the same sort of logic that dictates that most web application frameworks are only good for certain types of sites. Once you step up to a certain level of complexity, you have two choices: work really, really hard to simplify things as much as possible OR understand that you are in the land of Custom Solutions and hope that you are smart enough to come up with one that works. What’s that old adage: simple systems can display complex behaviour? Now that there is a solution for certain types of scalability issues, the trick is to figure out how to build an app to leverage that.
The biggest problem I see with putting your app up in the cloud is the issue of data sharing. Scalability usually requires as little sharing between parts of your application as possible. If you have a central data store, that will forever be your application choke point. Whether it’s clustering or sharding, it is a non-trivial task to build scalable data stores. I haven’t looked at SimpleDB and BigTable enough to form an opinion on whether or not these are viable contenders for this type of solution. Hopefully they are.
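For anyone who hasn’t run into sharding before, here is a minimal sketch of the basic idea in Python: hash each key to pick one of several independent stores, so no single store sees all the traffic. The shard count, the key format and the in-memory dicts are all assumptions for illustration; real sharding also has to deal with rebalancing, replication and cross-shard queries, which is where the non-trivial part comes in.

```python
import hashlib

SHARD_COUNT = 4  # assumed number of independent data stores

# Each dict stands in for a separate database server.
shards = [dict() for _ in range(SHARD_COUNT)]


def shard_for(key, shard_count=SHARD_COUNT):
    """Map a key to a shard index using a hash that is stable across runs."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def put(key, value):
    """Store a value on whichever shard owns the key."""
    shards[shard_for(key)][key] = value


def get(key):
    """Read a value back from the shard that owns the key."""
    return shards[shard_for(key)].get(key)


put("team:10", {"name": "Team A"})
put("game:100", {"team_id": 10})
```

Note that a lookup by key only ever touches one shard, but anything that needs data from two keys may have to talk to two different stores.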
It’s ironic to me that while we have a definite rise in the use of frameworks for web application programming, some of the most powerful tools available to build web applications that can handle the complexity of scaling well require you to throw all that stuff away and get back to really simple things. Perhaps this is a good thing, as all these magic methods in frameworks give the developer simplicity, but you become dependent on those magic methods.
Heady days indeed. Time to look at whether or not Rallyhat could run in those environments…just because I want to see if it’s possible. Learning Python is one thing. Learning Python + Django + SimpleDB, or Google App Engine (if I could ever get an invitation) is another. Seems to me I could whip something together in PHP that could run on EC2 with SimpleDB, but is that really much of a challenge?
Article Tags >> Amazon AWS || Google App Engine
Posted by derek martin on 09.04.08 at 9:36 am
You *could* use an ORM for working with objects IF they supported triggers. Then you could have triggers update the big flat tables, if you wanted. Think of those as cache tables.
I suppose you could also roll your own object-based ORM that updates the relevant columns of the big flat tables.
Or maybe you could just pass several objects into the DAO for the flat table, and have the FlatTable populate itself from them.
Exciting times, anyway.
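A rough Python sketch of the cache-table idea, with the "trigger" done by hand in the save functions since there are no real triggers to lean on. The table names, fields and in-memory dicts are assumptions for illustration only.

```python
# Normalized stores plus one flat "cache" table, all in memory for illustration.
teams = {}
games = {}
flat_games = {}  # game_id -> denormalized row


def refresh_flat_game(game_id):
    """Rebuild the flat row for one game from the normalized records."""
    game = games[game_id]
    team = teams[game["team_id"]]
    flat_games[game_id] = {"date": game["date"], "team_name": team["name"]}


def save_team(team_id, name):
    """Save a team, then do by hand what a trigger would do."""
    teams[team_id] = {"name": name}
    for game_id, game in games.items():
        if game["team_id"] == team_id:
            refresh_flat_game(game_id)


def save_game(game_id, team_id, date):
    """Save a game (its team must already exist) and refresh its flat row."""
    games[game_id] = {"team_id": team_id, "date": date}
    refresh_flat_game(game_id)


save_team(10, "Team A")
save_game(100, 10, "2008-04-09")
save_team(10, "Team B")  # the rename propagates to the flat table
```

After the rename, `flat_games[100]["team_name"]` reads "Team B" without any join at read time; the cost moves to the write side instead.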
Posted by Bob Tabor on 09.04.08 at 9:36 am
Nice post. I would say in regards to no table joins, I don’t think that is a complete deal breaker. You can still have two tables “related” insofar as one has something that represents a primary key and the other table has something to represent the foreign key. Then, you just have to do two SQL statements, which they claim is lightning fast. So, the relationships are not enforced and there is no join … but you can “relate” the two tables in code. I haven’t tried this, sadly, because I got in after the 10,000 user limit. If anyone wants to sell their account, please contact me.
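Here is a small Python sketch of that two-statement approach. It uses an in-memory SQLite database purely as a stand-in (neither SimpleDB nor App Engine is involved), and the table and column names are made up; the point is only that the "join" happens in application code.

```python
import sqlite3

# In-memory SQLite stands in for the real data store; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE games (id INTEGER, team_id INTEGER, date TEXT)")
conn.execute("INSERT INTO teams VALUES (10, 'Team A')")
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?)",
    [(100, 10, "2008-04-09"), (101, 10, "2008-04-16")],
)


def games_for_team(team_name):
    """Relate two tables in code: one query per table, no JOIN."""
    # Statement 1: find the row playing the role of the primary key side.
    team = conn.execute(
        "SELECT id, name FROM teams WHERE name = ?", (team_name,)
    ).fetchone()
    if team is None:
        return []
    team_id, name = team
    # Statement 2: fetch the rows carrying the matching "foreign key".
    rows = conn.execute(
        "SELECT id, date FROM games WHERE team_id = ? ORDER BY id", (team_id,)
    ).fetchall()
    return [{"game_id": gid, "date": date, "team_name": name} for gid, date in rows]


team_games = games_for_team("Team A")
```

Nothing enforces the relationship here, so a game pointing at a team that no longer exists would simply never show up in the results.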
Posted by Chris Hartjes on 09.04.08 at 9:36 am
@Bob
Yeah, I see what you’re getting at. It’s cool that you could do that sort of thing manually (via the two SQL statements) but I wonder how easily you could do that in some sort of ORM/associative-data-mapping scheme.