We were discussing some of the hardware behind google.com today, so I looked up these statistics:
- Over four billion Web pages, each an average of 10KB, all fully indexed.
- Up to 2,000 PCs in a cluster.
- Over 30 clusters.
- One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue.
- Sustained transfer rates of 2Gbps in a cluster.
- An expectation that two machines will fail every day in each of the larger clusters.
- No complete system failure since February 2000.
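To get a feel for what those numbers imply, here is a rough back-of-the-envelope sketch in Python. It rests on several assumptions of mine that the statistics do not spell out: that the 10^-15 error rate is per bit read, that a petabyte is 10^15 bytes and a KB is 1,000 bytes, that the 2Gbps link would be the only path for reading a whole cluster, and that the "two failures a day" figure applies to a 2,000-machine cluster all year round. The function names are made up for illustration.

```python
# Back-of-the-envelope arithmetic on the quoted statistics.
# All constants are taken from the list above; their units are assumptions.

PAGES = 4_000_000_000          # "over four billion Web pages"
AVG_PAGE_BYTES = 10 * 1000     # "an average of 10KB" (assuming 1 KB = 1,000 bytes)
CLUSTER_BYTES = 10**15         # "one petabyte" (assuming 10^15 bytes)
BIT_ERROR_RATE = 1e-15         # assumed to mean errors per bit read
TRANSFER_BITS_PER_SEC = 2 * 10**9   # "2Gbps"
MACHINES = 2000                # "up to 2,000 PCs in a cluster"
FAILURES_PER_DAY = 2           # "two machines will fail every day"


def corpus_terabytes(pages=PAGES, avg_bytes=AVG_PAGE_BYTES):
    """Raw size of the crawled pages, in terabytes (10^12 bytes)."""
    return pages * avg_bytes / 10**12


def expected_bit_errors(total_bytes=CLUSTER_BYTES, rate=BIT_ERROR_RATE):
    """Expected number of bit errors when reading every byte once."""
    return total_bytes * 8 * rate


def full_read_days(total_bytes=CLUSTER_BYTES, bps=TRANSFER_BITS_PER_SEC):
    """Days needed to move the whole cluster's data at the quoted rate."""
    seconds = total_bytes * 8 / bps
    return seconds / 86_400


def annual_failure_fraction(machines=MACHINES, per_day=FAILURES_PER_DAY):
    """Fraction of the cluster's machines expected to fail in a year."""
    return per_day * 365 / machines


if __name__ == "__main__":
    print(f"Raw page data: {corpus_terabytes():.0f} TB")
    print(f"Expected bit errors per full read: {expected_bit_errors():.1f}")
    print(f"Days to stream one cluster at 2Gbps: {full_read_days():.1f}")
    print(f"Machines failing per year: {annual_failure_fraction():.1%}")
```

Under those assumptions the raw pages come to about 40 TB, a single pass over a petabyte would be expected to hit roughly eight bit errors, and more than a third of a large cluster's machines would fail over a year. That is why an error rate that sounds negligible on one desktop disk becomes something you have to design around at this scale.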
I also remembered reading this PDF article on the distributed Google File System.
It would be interesting to compare similar figures for amazon.com, microsoft.com, etc. I also found some info here on bbc.co.uk from Sun.
As much as I think Microsoft executes really well, I do like this side of Google, where Ben Rathbone painted one of their data centers.
Welcome to this blog. We are two London-based .NET developers who spend most of our time writing software for financial institutions. During our long tenure in development we have created business language compilers, built secure internal solutions for instant messaging, and generally tried to solve problems in as fast and pragmatic a fashion as possible (whilst, of course, making the front end look nice).
Jon and I decided to create a blog because we wanted to be able to talk about our ideas in a place where the community could read and comment on them, and anyway, everyone else has got a blog!
We want the blog to communicate our ideas and hopes for future software development and to present our thoughts on computing in general, as well as provide a place to hang the URLs for sites we've seen and liked.
Quite grand aims, but if we improve the blog one step at a time then this will be a great success.