Cloud Pricing

It's been a long time since I last posted here. I'll admit it: I f'd up the server just before I flew out to SC12, so I lost most of my posts and generally screwed the pooch. Oh well... that's what I get for trying to switch three systems from inittab to systemd for no reason in the four hours before my flight. Anyway, I'm back. Or at least I will be whenever you see this. But my travails with managing my own system raise an interesting question: why manage my own hardware?

In my Operating Systems class, my professor spent part of the last lecture for which he had material scheduled talking about cost analysis for a business: how to figure out what machinery is required to provision the datacenter of some new (presumably web-based) business. He led us to the conclusion that the best way to do such scaling is to define some parameters for the system you are working on, and use those to build a model of the system in its entirety.

Having derived such a software model for his fictional web service, he went on to point out that for most applications, traffic is non-constant. Consequently, a decision has to be made about how to provision the business. Do you provision servers for the worst-case load and overspend on hardware (and, as he put it, go out of business)? Do you underspend and go out of business because your clients and customers hate latency? Or do you try to strike a balance where you only anger some fraction of your customers by provisioning for, say, 85% of peak load? Any way you turn, you lose under this model.
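To make the 85%-of-peak trade-off concrete, here is a minimal Python sketch that counts how often a fixed capacity would be exceeded. The function name and the load samples are invented for illustration; they are not from the class or from any real service.

```python
def overloaded_fraction(loads, capacity_fraction):
    """Fraction of samples whose load exceeds capacity_fraction * peak load."""
    if not loads:
        raise ValueError("loads must not be empty")
    capacity = capacity_fraction * max(loads)
    over = sum(1 for load in loads if load > capacity)
    return over / len(loads)


# Hypothetical requests-per-second samples, made up for this example only.
sample_loads = [10, 12, 15, 20, 40, 100, 90, 30, 18, 11]

# Provisioning for 85% of peak leaves some samples above capacity.
print(overloaded_fraction(sample_loads, 0.85))
```

The fraction it reports is the share of time some customers get a degraded experience, which is the number you would weigh against the hardware you saved by not buying for the peak.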

Then, a ray from heaven as it were: enter the infamous "Cloud" and dynamic provisioning. Rather than buy real metal and pay the price of keeping it online, the modern world offers me the option of renting virtual machine instances (or even part of a VM) from "hosting as a service" companies such as Heroku and Amazon. These companies offer dynamic scaling of resources, so rather than have my site die in flames should I ever make it to the front page of Slashdot or Hacker News, my hosting service could automagically create more instances of my server and load-balance across them, keeping my service online and sane in the face of the onslaught.

Now, this argument for cloud computing makes good sense. In a business such as professional sports, which may see huge swings in traffic depending on the time of year, or if you are an e-sales company and Black Friday hits, it makes a lot of sense to be able to engage rented, temporary resources to help cope with the peak demand rather than planning for the peak. However, I am faced with the real issue of getting my resume and blog back online, both so that I can resume writing and so that I can continue to build "my brand" as it were.

Unfortunately, for now I am a broke college student, so I need to do my own cost-benefit analysis of how to get my blog online and keep costs to a reasonable minimum, both now that I'm on dorm power and dorm internet, and ultimately when I move into my own place and have to pay my own power and internet bills directly. Cloud hosting would seem to be the answer... right?

Okay. So this blog is built in Clojure and MongoDB, for better or worse. I have sworn off any really major rewrites (features don't count as rewrites) for the foreseeable future, so I know my stack and associated needs. Heroku offers me a free web "dyno" (VM instance) on which I can run my code, but requires that I do my databasing on some other VM. Since my heart is set on Mongo, I can get a small MongoHQ instance for $15/mo, for an annual price tag of ~$180, as long as I don't blow out a 2GB database size, which (for a blog with no comments) seems possible if I cut down on my traffic logging. This is nice because, let's be honest, this blog is pretty small. Judging by my LinkedIn page (which, web crawlers excepted, is probably a good indicator of this blog's traffic) I get about, oh, ten actual visits from other people a week. So here, I'm really just paying for the storage I'm using.

Amazon EC2 will sell me a small Linux instance on which I can add whatever software I bloody well please for $0.065 per hour... giving me an annual worst case of ~$569.79 for full load 24/7/365. This is an unrealistic load. Based on numbers from my server before I broke it, I suspect that my server would see about a 12% load, so the real predicted price tag is ~$67, for which I get a lot more flexibility in terms of the software I use, and I get 160GB all to myself so I can run Mongo and not have to worry about paying through the nose for a measly 2GB. Something else that's nice is that I get a fully fledged server, so if I want to run an email server or other service alongside my webserver, doing so is really easy.
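The EC2 arithmetic is easy to sketch in Python. The function name and the 24 x 365.25 definition of a year are illustrative choices, and treating the 12% load figure as the fraction of hours billed is the assumption made in this post, not a statement of how Amazon actually bills.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours, averaging in leap years


def annual_instance_cost(hourly_rate, utilization=1.0):
    """Yearly cost in dollars of an instance billed by the hour.

    utilization is the fraction of the year being paid for. Mapping the
    post's 12% load onto it is an assumption, not Amazon's billing rule.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return hourly_rate * HOURS_PER_YEAR * utilization


print(f"full load: ${annual_instance_cost(0.065):.2f}")
print(f"12% load:  ${annual_instance_cost(0.065, 0.12):.2f}")
```

At $0.065 per hour this gives roughly $570 for the full year and roughly $68 at 12%, in line with the estimates above.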

However, I do have to abide by the Amazon TOS and AUP, which means that whatever I put up has to be well-behaved. No botnets, no illicit FTP sites, and the list goes on and on. Now, while I have no everyday need for such facilities, I do occasionally have need for them. Back at the IEEE COMSOC wargame I attended, I was impressed that the entire victim LAN was built from hardware donated for the duration by COMSOC members. In the next such wargame, my cloud server would be out, both because it would be a TOS violation to involve it and because nobody would want to touch a box I didn't entirely own, even as part of a wargame.

So what would it cost me to run real metal, since there seems to be a pretty compelling argument for it based on my personal usage? I'll examine two cases: the infamous Raspberry Pi server, and a more realistic server which could be of great value to me. The Raspberry Pi is an ARM-based single-board PC, which costs about $35-$50 depending on where you buy it. It has no onboard storage, so an SD card is required ($10-$35), and it sucks about 2W of power at idle; let's guess a peak of 5W. Austin area power companies charge about $0.134/kWh, so driving an RPi 24/7/365.25 is going to set me back about $0.10 for the whole year. A startup cost of about $100 gives me a first-year cost of $100.10, so after just over two years the RPi will be a better investment than the EC2 server, and it will be cheaper than Heroku for all time.
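The power estimate is just watts times hours times the rate. Here is a minimal Python sketch of that formula, using the wattages and the $0.134/kWh rate quoted above; the function name and the 24 x 365.25 year are illustrative choices.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours


def annual_power_cost(watts, dollars_per_kwh):
    """Yearly electricity cost in dollars for a device drawing `watts` constantly."""
    kwh_per_year = watts / 1000 * HOURS_PER_YEAR
    return kwh_per_year * dollars_per_kwh


# Idle and guessed-peak draw for the RPi, at the Austin rate quoted above.
for watts in (2, 5):
    print(f"{watts} W: ${annual_power_cost(watts, 0.134):.2f} per year")
```

Worked through this way, a constant 5W draw comes to a few dollars a year rather than a dime, which is the kind of correction the edit note at the end of this post alludes to. Even so, the RPi still undercuts both hosted options within about two years.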

Now, I'll be fair. The RPi is next to useless, and there is a reason that it costs so little to buy and operate. What if I were to invest in some "real" iron? It turns out that Rackable Systems Inc. bought SGI back in '09 and took on the SGI name, and people (companies, really) are surplussing their old Rackable gear at significant discounts (60% or so on eBay), so a decent server with 16GB of RAM and an Intel Xeon DC (Sept. 2007 release) goes for about $200 with no drives. If I had the option I would probably use such a machine as a secure backup host, so add a pair of 1TB drives at about $120 each, bringing my "big iron" to a price tag of $440 up front. I estimate ~12W for disks, 80W for CPU and ~20W for chassis, for a total draw of ~112W, which gives me a price tag of $440 startup and $6.46 for annual power. After three years, having my own metal is cheaper than Heroku (and far more useful, because I don't have to deal with multiple VMs or services), and after eight years it is cheaper than EC2.
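The break-even points follow from dividing the startup cost by the yearly saving over the hosted option. A minimal Python sketch, using the figures quoted in this post ($440 startup, $6.46 annual power, $180 and ~$67 hosted); the function name is hypothetical and the annual running cost is a parameter, so a revised power estimate can be swapped in.

```python
import math


def break_even_years(startup_cost, own_annual_cost, hosted_annual_cost):
    """Years until owning is cheaper than renting, or None if it never is."""
    yearly_saving = hosted_annual_cost - own_annual_cost
    if yearly_saving <= 0:
        return None  # hosting costs no more per year, so owning never catches up
    return startup_cost / yearly_saving


# Annual hosted prices as estimated in this post.
for name, hosted in (("Heroku + MongoHQ", 180.0), ("EC2 at 12% load", 67.0)):
    years = break_even_years(440.0, 6.46, hosted)
    print(f"{name}: {years:.1f} years, ahead by year {math.ceil(years)}")
```

Rounded up to whole years, this reproduces the three-year and eight-year marks above. A higher annual power figure shrinks the yearly saving and pushes both marks out, or removes them entirely.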

So where does that leave us at the end of the day? I suppose it depends on what you need.

Judging by their pricing scheme, Heroku is clearly targeted at fast-growing startups (or startups who think they'll grow fast) who want really crazy easy scaling. Getting started is easy, but the long-term price is steep. (In fairness, this is examining only low-end plans; bigger plans presumably offer better economy.) Amazon EC2 is a solid option, and its pricing is only slightly steeper than that of real metal once you go out several years. Again, this is examining only the consumer-grade products that I would consider for this blog or other projects. The RPi... is a joke. You couldn't really do any serious heavy lifting with it, but I threw it in as a "devil's advocate" to see just how low I could go with a home server, both in startup cost and ongoing cost.

Despite my professor's claims, I think that the math speaks for itself here. While it may be obsolete by anyone's standards, the Rackable server is highly cost effective for a low-load home and personal setup such as the one I wish to build, once you game it out several years. Note that I say several years. According to the IRS, a computer is, for tax purposes, considered fully depreciated as a hardware investment after five years, so the fact that I have to play out this math game eight years for the year-end cost of running my own server to beat that of purchasing an EC2 instance is more than interesting. It's important. See, I'm a hardware nerd. The oldest machine which I own (and I'm new so this isn't yet representative, but I'll be fair here) is now three years old. It's a laptop I got back in high school which at the time I swore up and down would carry me through college. Now, as a college sophomore, I find that I'm writing this on a shiny new Samsung Series 9 Ultrabook-class machine, since the stresses of being carried around daily made my old laptop almost unusable. I expect that after three years I will also want to rebuild my gaming computer... anyway, the point is that at some point I will probably update my hardware, whether out of necessity or vanity, and when I do, I will re-incur the "startup" cost of the home server, which will entirely nullify the end-of-year price advantage of the self-managed hardware over cloud hardware.

It takes three years for the home server to prove a worthwhile investment against the Heroku machine, and that's the point at which I expect to replace the home machine. The home machine didn't even beat the EC2 VM until eight years out... I think you can see where I'm going with this. By the time I've replaced my hardware, the hosting companies will have replaced theirs too, and because they amortize that startup cost over multiple users and I don't, ultimately the math will out. Cloud computing is in fact cheaper than running your own metal.
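The replacement argument can be sketched by charging the startup cost again every time the hardware is swapped out. This is a minimal Python sketch under stated assumptions: a three-year replacement cycle (the point at which I expect to replace the home machine), a ten-year horizon, hosted prices that stay flat, and the cost figures quoted earlier in this post. The function names are hypothetical.

```python
import math


def cumulative_owned_cost(years, startup_cost, annual_cost, replace_every):
    """Total spend on owned hardware over `years`, rebuying every `replace_every` years."""
    if years <= 0:
        return 0.0
    # One purchase at year 0, then one more at each multiple of replace_every
    # that falls before the horizon ends.
    purchases = math.ceil(years / replace_every)
    return purchases * startup_cost + years * annual_cost


def cumulative_hosted_cost(years, annual_cost):
    """Total spend on a hosted option, assuming a flat annual price."""
    return years * annual_cost


horizon = 10  # years
owned = cumulative_owned_cost(horizon, 440.0, 6.46, replace_every=3)
print(f"home server:      ${owned:.2f}")
print(f"Heroku + MongoHQ: ${cumulative_hosted_cost(horizon, 180.0):.2f}")
print(f"EC2 at 12% load:  ${cumulative_hosted_cost(horizon, 67.0):.2f}")
```

With four hardware purchases in ten years, the home server ends up costing more than either hosted option under these assumptions.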

Despite that fact, I will probably choose to run my own metal, just because I want the added utility of more disk space and the ability to do things which a hosting company may not like. However, this mathematical analysis is compelling, and I owe Prof. Mootaz thanks for reminding me that modeling things is actually very important.

Edit: See linked spreadsheet for the exact model, but it looks like once I use exact calculations (well, as exact as this back-of-the-napkin math gets) the home server option played out ten years loses to all the cloud computing solutions when hardware replacement is considered, even when one treats the pricier "XL" EC2 instance with 1.68TB of "attached" storage as the equivalent of the home server's 2TB of disk. I suppose the next analysis I should do, since disk capacity seems to be my primary issue, is to compare the price of maintaining a large personal archive vs storing such data on Amazon's Glacier, but it's late and that's material enough for its own entire article :D