On-demand 1280 CPU cluster using EC2

We are using Amazon EC2 to quickly deploy clusters and run searches. We’ve been stress testing this with CloVR over the past few months to make sure our platform scales out as expected. We’ve been very pleased with the results. In one test, we launched a cluster with 160 c1.xlarge instances to run a BLASTX search. This gave us 1280 CPUs for processing.

Here is a screenshot from Ganglia during the scale-out:

We stopped the pipeline early to save credits after gathering some stats. Scale-down looked good too, leaving just a master instance up at the end.

We used rsync over HPN-SSH to transfer the NCBI nr database out to each instance. We’ve set this up to run peer-to-peer, so that any instance can send a copy of the database to other instances once its own copy is ready. Using this, we saw aggregate network throughput top 1GB/sec on our cluster.
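To illustrate why peer-to-peer distribution helps at this scale, here is a minimal Python sketch of an idealized model. It is not the scheduling logic used in our setup: it assumes every instance that already holds the database sends to exactly one new instance per round and that all transfers take the same time. The function name `p2p_rounds` and its parameters are hypothetical.

```python
def p2p_rounds(total_instances: int, seeds: int = 1) -> list[int]:
    """Return how many instances hold the database after each round.

    Idealized assumption: in every round, each instance that already has
    a complete copy sends it to exactly one instance that does not.
    """
    if total_instances < 1 or seeds < 1:
        raise ValueError("total_instances and seeds must be at least 1")

    have = min(seeds, total_instances)
    history = [have]
    while have < total_instances:
        # Every holder serves one new peer, so the count at most doubles.
        have = min(total_instances, have * 2)
        history.append(have)
    return history


if __name__ == "__main__":
    history = p2p_rounds(160)
    print("holders after each round:", history)
    print("rounds needed:", len(history) - 1)
```

Under these assumptions, 160 instances are covered in 8 rounds from a single seed, whereas a single source serving every instance one at a time would need 159 sequential transfers.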

The graph shows the throughput stepping up as additional instances came online. A single c1.xlarge instance has been giving us <30MB/sec.
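As a back-of-the-envelope check on those two numbers, the following Python sketch estimates how many instances must have been sending at the same time to reach the aggregate figure. It assumes 1GB = 1000MB and treats 30MB/sec as an upper bound per sender; the function name is hypothetical.

```python
import math


def min_concurrent_senders(aggregate_mb_s: float, per_instance_mb_s: float) -> int:
    """Smallest number of senders needed to reach the aggregate rate,
    assuming no sender exceeds per_instance_mb_s."""
    if aggregate_mb_s <= 0 or per_instance_mb_s <= 0:
        raise ValueError("rates must be positive")
    return math.ceil(aggregate_mb_s / per_instance_mb_s)


if __name__ == "__main__":
    # 1GB/sec aggregate (taken as 1000MB/sec) at no more than 30MB/sec each.
    print(min_concurrent_senders(1000, 30))
```

With these inputs the estimate is at least 34 simultaneous senders, which is consistent with throughput climbing in steps as more instances finished their copy and began serving others.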

One other interesting observation from our tests is that a single request can move the spot market price, at least for m1.xlarge in the us-east region. We’ve been using Amazon spot market instances for testing since they are usually ~1/3 the price of on-demand instances (~$0.22-$0.25 versus the on-demand price of $0.68). During one of our tests in July, we were monitoring the market closely and submitted a single request for 150 m1.xlarge instances. We saw the price spike to $0.68 immediately after our request was submitted. Two days later we repeated the same experiment and got the same outcome.

The price dropped back to $0.23 after our run.
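The arithmetic behind the "~1/3 the price" figure, and what the spike means for a 150-instance request, can be sketched in a few lines of Python. The prices are the ones quoted above, interpreted as per instance-hour (EC2's billing unit); the function names are hypothetical.

```python
ON_DEMAND = 0.68              # on-demand price quoted above, USD per instance-hour
SPOT_RANGE = (0.22, 0.25)     # typical spot prices quoted above
SPOT_AFTER_RUN = 0.23         # spot price after the run
INSTANCES = 150               # size of the single spot request


def spot_fraction(spot_price: float, on_demand_price: float) -> float:
    """Spot price as a fraction of the on-demand price."""
    return spot_price / on_demand_price


def fleet_cost_per_hour(instances: int, price_per_instance_hour: float) -> float:
    """Hourly cost of running the whole request at a given price."""
    return instances * price_per_instance_hour


if __name__ == "__main__":
    low, high = (spot_fraction(p, ON_DEMAND) for p in SPOT_RANGE)
    print(f"spot is {low:.0%} to {high:.0%} of on-demand")
    print(f"150 instances at $0.23: ${fleet_cost_per_hour(INSTANCES, SPOT_AFTER_RUN):.2f}/hour")
    print(f"150 instances at $0.68: ${fleet_cost_per_hour(INSTANCES, ON_DEMAND):.2f}/hour")
```

The typical spot prices work out to roughly 32% to 37% of the on-demand price, so a spike to $0.68 removes the saving entirely for as long as it lasts.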
