I had to miss the second day of ICSOC, but was back for the morning of the third, and another great keynote, on Web-Scale Computing, from Peter Vosshall – a VP and Distinguished Engineer at Amazon. Amazon needs a highly reliable and scalable infrastructure internally to run its retail business, but has also been selling web services infrastructure to third parties. Peter spoke about EC2 (compute), SQS (messaging), S3 (storage of blobs with metadata), SimpleDB (storage of lightly-structured data with indexed queries), and EBS (storage for EC2 when you need a traditional filesystem or database).
As an example of how companies are using and benefitting from these services, he talked about a company called Animoto. On their website you can upload a song and some photos, and they automatically build a video montage, matching transitions to beats. They started with around 5000 customers in total, but after they built a Facebook app and got some viral awareness, they shot up to between 5000 and 10000 users per hour. They had deployed on EC2 and ramped up to 3500 – 5000 instances. It looked like a neat story.
The business benefits of using the web services are the ability to grow infrastructure quickly and incrementally, and turning what would have been fixed capital expenses into variable operating expenses. (Coincidentally I had also mentioned this latter benefit of web services in a podcast interview I gave on Monday.)
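To make the fixed-versus-variable point concrete, here is a back-of-the-envelope sketch in Python. Everything in it is my own assumption rather than anything from the talk: the demand profile, the price per instance-hour, and the simplification that owned capacity costs the same per hour as rented capacity. It only illustrates the shape of the argument, which is that provisioning for the peak means paying for the peak all the time.

```python
def fixed_cost_cents(peak_instances, hours, cents_per_instance_hour):
    """Cost of keeping peak capacity available for the whole period."""
    return peak_instances * hours * cents_per_instance_hour


def variable_cost_cents(instances_per_hour, cents_per_instance_hour):
    """Cost of paying only for the instances actually running each hour."""
    return sum(instances_per_hour) * cents_per_instance_hour


# Hypothetical week: 50 instances for six days, then a 24-hour spike to 3500.
demand = [50] * 144 + [3500] * 24

# Assumed, illustrative price: 10 cents per instance-hour (not a real quote).
RATE_CENTS = 10

fixed_cents = fixed_cost_cents(max(demand), len(demand), RATE_CENTS)
variable_cents = variable_cost_cents(demand, RATE_CENTS)

print(f"provisioned for peak: ${fixed_cents / 100:,.2f}")
print(f"pay per use:          ${variable_cents / 100:,.2f}")
```

With these made-up numbers the pay-per-use bill is a fraction of the provisioned-for-peak bill, because the spike only lasts a day. A flatter demand curve would narrow the gap.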
As well as supplying web services, Amazon’s using them internally too. Peter briefly reviewed how Amazon started as what looked like a 2-tier+web client-server web application, but then refactored that incrementally (and painfully, over 2002-2003) into a collection of services. They’ve seen reliability benefits – he said they can lose an entire data centre with no impact on the customer experience. They’ve also had product management benefits – each service maintains its own data and operating responsibility, which lets them each evolve at their own pace. Amazon’s key non-functional properties (NFPs) are security, incremental scalability, availability (systems don’t just fail by stopping, and failures aren’t independent), performance (not just mean performance but also performance in outlying cases), and cost-effectiveness.
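The point about outlying cases is easy to see with a small sketch. The latency figures below are invented for illustration, and the nearest-rank `percentile` helper is my own, not anything Amazon described. It shows how a mean can look healthy while a high percentile exposes the slow requests.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds: 990 fast, 10 very slow.
latencies_ms = [20] * 990 + [2000] * 10

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p999_ms = percentile(latencies_ms, 99.9)

print(f"mean={mean_ms:.1f} ms, median={p50_ms} ms, p99.9={p999_ms} ms")
```

Here the mean and median both look fine, but the 99.9th percentile lands on the slow requests, which is the experience a small but real set of customers would actually get.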
Intriguingly, despite the claimed product management benefits, he said that 70% of development time was spent on “undifferentiated heavy lifting” delivering updated services – dealing with non-functional, administrative service management issues. So only 30% of their effort is spent improving the customer experience. I think their delivered experience could certainly use some extra work, especially for their non-US customers!