CloVR Architecture

CloVR is a virtual machine that runs on an individual desktop or laptop, or as a network-attached virtual appliance. Analysis pipelines can run locally or on remote compute clouds such as Amazon EC2. As a virtual appliance, a single CloVR instance can be launched on a network to serve a group or laboratory. Users and developers interact with CloVR through the web interface or the web services API.

Interaction with the Cloud is designed to be automated and seamless, so users do not have to deal with the many intricacies of managing machine instances and data on the Cloud. The CloVR VM acts as a broker for communication with the Cloud, handling all data transfer, scaling, and clustering of multiple Cloud instances.
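
The broker role described above can be illustrated with a small toy model. This is only a sketch and not CloVR code: the class name Broker, the parameters slots_per_instance and max_instances, and the step names in the log are all hypothetical, chosen to show the order of operations a broker might automate (scale, transfer in, run, transfer out, shut down).

```python
import math
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Toy stand-in for the broker role of the CloVR VM (hypothetical)."""

    slots_per_instance: int = 4   # assumed job slots per cloud instance
    max_instances: int = 10       # assumed upper bound on cluster size
    log: list = field(default_factory=list)

    def instances_needed(self, n_tasks: int) -> int:
        """Return how many instances to request for n_tasks, capped at the maximum."""
        if n_tasks <= 0:
            return 0
        wanted = math.ceil(n_tasks / self.slots_per_instance)
        return min(self.max_instances, wanted)

    def run(self, inputs: list, n_tasks: int) -> int:
        """Record the steps the user would otherwise perform by hand."""
        n = self.instances_needed(n_tasks)
        self.log.append(f"start {n} instances")
        self.log.append(f"upload {len(inputs)} input files")
        self.log.append(f"run {n_tasks} tasks")
        self.log.append("download results")
        self.log.append("terminate instances")
        return n


if __name__ == "__main__":
    broker = Broker()
    broker.run(["a.fasta", "b.fasta"], n_tasks=9)
    for step in broker.log:
        print(step)
```

In this model the user only supplies inputs and a task count; instance counts and data movement are decided by the broker, which is the division of labor the paragraph above describes.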

Architecture highlights:

  • VM provided in VMware, VirtualBox, Xen, EC2 AMI, and Eucalyptus VMI formats. Pipelines can be run locally or on remote compute clouds
  • Scalable data storage using local data staging or HDFS. CloVR does not use a shared filesystem or NFS
  • Fast data transfer protocols including HPN-SSH and GridFTP for transfer to/from Cloud
  • Support for on-demand, resizable Sun Grid Engine clusters. Data staging and local storage are used instead of NFS to avoid scalability and performance problems associated with shared filesystems.
  • Support for on-demand, resizable Hadoop clusters and the HDFS file system
  • Ganglia grid monitoring
  • Web service API for starting and resizing clusters and querying status
  • Ergatis workflow system for pipeline automation
  • Hundreds of software packages from the Ubuntu universe and Biolinux package repositories

For more information, see:

Angiuoli SV, Matalka M, Gussman G, Galens K, Vangala M, Riley DR, Arze C, White JR, White O and Fricke WF. CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing. BMC Bioinformatics. 2011 Aug;12:356.