Thoughts on the Amazon outage

Disaster Recovery needs to be a primary objective when planning and implementing any IT project, outsourced or not. The ‘Cloud’ isn’t magic and it isn’t fail-proof. It requires hardware, software, networking, security, support and execution, just like anything else.

All the fancy marketing speak, recommendations and free trials can’t replace the need to do obsessive due diligence before trusting any provider, no matter how big and awesome they may seem or what their marketing department promises.

Why do Data Centers have UPS and Diesel Generators on-site? They know electricity can and does fail.

Why do we buy servers with dual power supplies? We know they can and do fail.

Why do we implement RAID? We know hard drives can and do fail.

Prepare for the worst, period.

Putting all of your eggs in one cloud, so to speak, seems short-sighted in my opinion, no matter how much redundancy the provider says it has. If you rely on an MSP, HSP, CSP, IaaS, SaaS, PaaS, et al. to attract, increase or fulfill a large percentage of your revenue, or all of it, as many companies do nowadays, then you need to assume that every vendor will eventually have an issue like this, one that affects your overall uptime, brand and churn rate. A blip here and there is tolerable.

Amazon’s downtime is stratospherically high, and their prices are spectacularly inflated. Their ping times are terrible and they offer little that anyone else doesn’t offer. Anyone holding them up as a good solution without an explanation has no idea what they’re talking about.

The preferred hosting setup is the same as always: dedicated boxes at geographically disparate, redundant locations, managed by different companies. That way when host 1 shits the bed, hosts 2 and 3 keep churning.
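For what it's worth, the "host 1 dies, hosts 2 and 3 keep churning" idea can be sketched in a few lines of Python. This is only an illustration, not a production failover system: the function names `tcp_check` and `pick_host` are made up for this example, and the health check (a plain TCP connect) is an assumption. Real setups usually do this at the DNS or load-balancer level.

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_check(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def pick_host(hosts: Iterable[str],
              is_healthy: Callable[[str], bool] = tcp_check) -> Optional[str]:
    """Return the first host that passes the health check, or None if all fail.

    `hosts` is ordered by preference: primary first, then the fallbacks
    run by other providers in other locations.
    """
    for host in hosts:
        if is_healthy(host):
            return host
    return None
```

Called with something like `pick_host(["host1.example.com", "host2.example.net", "host3.example.org"])` (hypothetical hostnames), it returns the first box that still answers. The health check is passed in as a parameter so it can be swapped for whatever "alive" means for your application.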

Nobody who has even a rudimentary best-practice hosting setup has been affected by the Amazon outage in any way other than a speed hit as their resources shift to a secondary center.

Stop following the new-media goons around. They don’t know what they’re doing. There’s a reason they’re down twice a month and making excuses.

Personally, I do not use a server for “mission critical” applications that I cannot physically kick. Failing that, a knowledgeable SysAdmin that I can kick.