I couldn’t sleep last night. I’m worried we’ll lose this client.
So, just to be clear: I wasn’t part of the crew responsible for scaling this site. I had already set up a scalable architecture for the site that would automatically scale horizontally on Amazon. That idea got shot down for legal reasons which, to my surprise, haven’t actually been in play for a while. Can we say, “office politics”?
I totally recommend Amazon’s Auto Scaling to anybody who’s new to this.
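For the curious, here’s roughly what getting started looks like with the AWS CLI (a minimal sketch; the group name, AMI, and instance type are all made up):

# define what an instance looks like (placeholder AMI and size)
aws autoscaling create-launch-configuration \
    --launch-configuration-name web-lc \
    --image-id ami-12345678 \
    --instance-type m1.large

# keep between 2 and 20 of them alive across two zones
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name web-asg \
    --launch-configuration-name web-lc \
    --min-size 2 --max-size 20 \
    --availability-zones us-east-1a us-east-1b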
Instead of auto-scaling, the site was architected by a local San Francisco firm that I won’t name here.
Let’s just hope enough people read this so they won’t even need to know the company’s name; they’ll just recognize the smell of an un-scalable architecture.
Scalability requirement: 100,000 concurrent users
This is how they set it up:
- two web servers
- one database
- four video transcoders that hit the master database
- one more app server that hits the master database
- no slave db 😀
If they had even googled ‘building scalable websites’, they would have come across a book that would have helped them avoid all of this: Cal Henderson’s Building Scalable Websites. It should be mandatory reading for anybody working on a large website, and it just scratches the surface.
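To give a sense of how cheap the fix is: the missing slave db is a few lines of config and a couple of statements (a minimal sketch; the server IDs, hostname, and credentials here are made up).

On the master, in my.cnf:

[mysqld]
server-id = 1          # must be unique per server
log-bin   = mysql-bin  # enable the binary log so slaves can replay it

On the slave, in my.cnf:

[mysqld]
server-id = 2

Then create a replication user on the master, and point the slave at it using the coordinates from SHOW MASTER STATUS:

-- on the master
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.0.%' IDENTIFIED BY 'secret';

-- on the slave
CHANGE MASTER TO
  MASTER_HOST='master.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000001',  -- from SHOW MASTER STATUS
  MASTER_LOG_POS=4;
START SLAVE;

Reads can then go to the slave, leaving the master for writes.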
So, how did we get to 600 concurrent users?
We tweaked MySQL by putting this in /etc/my.cnf:
max_connections=10000       # way up from the stock default of 100
query_cache_size=50000000   # ~50MB of query cache
thread_cache_size=16        # reuse threads instead of spawning one per connection
thread_concurrency=16       # only works on Solaris and is ignored on other OSes
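You can confirm the settings took, and watch whether the thread and query caches are earning their keep, from the mysql client:

SHOW VARIABLES LIKE 'max_connections';
SHOW VARIABLES LIKE 'query_cache_size';
SHOW STATUS LIKE 'Threads_created';  -- should stay low once the thread cache is warm
SHOW STATUS LIKE 'Qcache_hits';      -- rising hits mean the query cache is working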
We ran siege and got to about 300 concurrent users without breaking a sweat, but now Apache was dying.
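For reference, the siege runs looked something like this (the URL is a stand-in):

# 300 concurrent users, no think time, for five minutes
siege -b -c 300 -t 5M http://www.example.com/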
So we tweaked Apache’s prefork settings. We started out with this:
StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000
And ended up with this:
MinSpareServers 50
MaxSpareServers 200
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000
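Note that bumping MinSpareServers and MaxSpareServers only keeps more idle children around to soak up bursts; the hard ceiling is still MaxClients, and ServerLimit has to be at least as large as MaxClients for the latter to take effect. The usual back-of-the-envelope for sizing MaxClients (the numbers below are made up for illustration):

MaxClients ≈ RAM available to Apache / average httpd process size
           ≈ 2048 MB / 8 MB
           = 256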
We also doubled the RAM and CPU on the boxes.