CloVR-Metagenomics v1.0 Walkthrough (CloVR dashboard)

Getting Started

The CloVR-Metagenomics pipeline provides a robust comparative metagenomics workflow, complete with cluster auto-scaling and parallelization.

Use of the Cloud is optional for CloVR-Metagenomics, as for all other CloVR pipelines, but it is recommended for this pipeline because local executions can be very time-consuming. The BLAST search steps in particular are computationally intensive and benefit from parallelization across multiple processors on the Cloud.

If you want to use the Cloud to run CloVR-Metagenomics, you must obtain credentials from your Cloud provider and configure CloVR to use them. If you want to use the Amazon Elastic Compute Cloud (EC2), be sure to have configured your Amazon EC2 credentials. Usage on Amazon EC2 is charged per hour, and care must be taken to terminate instances after a protocol has completed. See the vp-terminate-cluster command below.

Inputs

Multiple FASTA files (one file per sample) and a CloVR-formatted mapping file
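
Before tagging the inputs, it can help to confirm that each per-sample file really is a nucleotide FASTA file. The following is a minimal sketch, not part of CloVR: the function names and the "*.fasta" file pattern are assumptions made for illustration, and the alphabet check is only a heuristic based on the IUPAC nucleotide codes.

```python
from pathlib import Path

# IUPAC nucleotide codes plus gap characters. This is a heuristic: a line
# containing any other character is treated as non-nucleotide data.
NUCLEOTIDE_CHARS = set("ACGTUNRYKMSWBDHVacgtunrykmswbdhv-.")


def count_fasta_records(path):
    """Return the number of records in a nucleotide FASTA file.

    Raises ValueError if sequence data appears before the first '>' header,
    if a sequence line contains characters outside NUCLEOTIDE_CHARS, or if
    the file contains no records at all.
    """
    records = 0
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                records += 1
            elif records == 0:
                raise ValueError(f"{path}: sequence data before first header")
            elif not set(line) <= NUCLEOTIDE_CHARS:
                raise ValueError(
                    f"{path}: non-nucleotide characters on line {line_number}"
                )
    if records == 0:
        raise ValueError(f"{path}: no FASTA records found")
    return records


def summarize_samples(fasta_dir, pattern="*.fasta"):
    """Map each FASTA file name in fasta_dir to its record count.

    The "*.fasta" pattern is an assumption; adjust it to your file names.
    """
    return {
        path.name: count_fasta_records(path)
        for path in sorted(Path(fasta_dir).glob(pattern))
    }
```

Calling summarize_samples on the folder that holds your samples returns one entry per file, which makes it easy to see whether the number of files matches the number of samples you expect to describe in the mapping file.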

Download a Dataset

FILE: CloVR-Metagenomics mini example dataset

Pipeline Execution

1. Starting the CloVR web interface

This step is identical on all platforms; please choose the instructions for the platform you are running CloVR on:

VMware
VirtualBox
Amazon EC2
DIAG

2. Adding datasets to the CloVR VM

Before starting a pipeline, you must add your datasets to the CloVR VM as “Tags”. The easiest way to add data to the VM is to copy the files into the “user_data” folder, which is shared between the VM and your local computer. If you have problems accessing files in the shared folders, check Troubleshooting CloVR on VirtualBox.
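
Copying the input files into the shared folder can also be scripted. The sketch below is only an illustration under stated assumptions: the function name stage_inputs, the file patterns, and the folder locations you pass in are all hypothetical, since the location of “user_data” depends on where the CloVR VM was unpacked on your computer.

```python
import shutil
from pathlib import Path


def stage_inputs(source_dir, user_data_dir, patterns=("*.fasta", "*.txt")):
    """Copy files matching the patterns from source_dir into user_data_dir.

    Both directories and the default patterns are assumptions; pass the real
    location of your CloVR "user_data" folder. Returns the sorted names of
    the copied files.
    """
    source = Path(source_dir)
    target = Path(user_data_dir)
    if not target.is_dir():
        raise FileNotFoundError(f"shared folder not found: {target}")
    copied = []
    for pattern in patterns:
        for path in sorted(source.glob(pattern)):
            if path.is_file():
                # copy2 also preserves the file modification time
                shutil.copy2(path, target / path.name)
                copied.append(path.name)
    return sorted(copied)
```

The function refuses to create the target folder itself, so a mistyped path fails immediately instead of silently copying files to a place the VM cannot see.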

To add files, click “Add” on the web interface.

If your files are in the user_data directory, click on “Select file from image”, which will open a sub-window where you can select one or multiple FASTA files for upload into the VM. Alternatively, you can use “Browse” in the “Upload File” window to find and select files from anywhere on your local computer, but multiple files have to be uploaded in separate steps.


Select “Nucleotide FASTA” from the “File Type” drop-down menu and name your dataset, e.g. “metagenomics_fasta”. Add an optional description of your dataset. Click “Tag” to upload the data to CloVR. A “Completed Successfully” window should appear to indicate that your dataset was added to the CloVR VM, and the new dataset should be listed under “Data Sets” on the web interface.

Next, repeat the same process for the corresponding metadata mapping file. This time select “Metagenomics mapping file” from the “File Type” drop-down menu and name your dataset, e.g. “metagenomics_mapping”. Click “Tag” again to upload the data to CloVR.


Each tagged data set will appear as a “Tag” on the CloVR web interface. Multiple files will be listed under the same “Tag” name.

3. Configuring and starting the pipeline

To initialize a new pipeline run, select the “Tag” corresponding to your FASTA files in the “Data Sets” window and click on the “CloVR Metagenomics” icon.

This will open the pipeline configuration window. Make sure the correct “Tag” is shown as the “Select Sequencing Dataset” and select the “Tag” corresponding to the correct metadata mapping file as the “CloVR Mapping File”. Choose a protocol with or without ORF calling. By default we do not call ORFs.

If cloud credentials have been added to the CloVR VM, they can be selected in the “Account” drop-down menu. Alternatively, “local” can be selected to run the CloVR-Metagenomics analysis on the local computer. Under “Pipeline Description”, provide a name by which to recognize your pipeline on the web interface “Home” page, e.g. “Test_Metagenomics1”.

Check your input by clicking “Validate”. If the validation is successful, start the pipeline by clicking “Submit”. After successful pipeline submission, the web interface will change to the “Home” page where the new pipeline will be listed as “Status: running.”

4. Monitoring your pipeline

The pipeline status, i.e. the number of steps completed, is shown for each running pipeline. Further information can be accessed by clicking on the pipeline name, which opens the “Pipeline Information” window.

 

Clicking on the [Pipeline #] headers in the “Pipeline Information” window will open the Ergatis “Workflow creation and monitoring interface” in a separate browser window, which provides useful information for troubleshooting of failed pipeline runs.

Downloading Pipeline Output

Once the CloVR-Metagenomics run has completed, multiple results files are created in the “output” directory. If CloVR-Metagenomics is run on the cloud, all results files will be downloaded into the same folder. The path to this folder should look like this:

/clovr-standard-2011-08-25-05-13-27/shared/output/

All results files are created as compressed archives (.tar.gz), which can be extracted using the Finder in Mac OS X, the tar utility in Unix, or programs such as WinZip or WinRAR in Windows.
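
As an alternative to extracting each archive by hand, all .tar.gz files in the output folder can be unpacked in one pass. This is a minimal sketch using Python's standard tarfile module; the function name extract_results and the one-subfolder-per-archive layout are choices made for this example, not something CloVR prescribes.

```python
import tarfile
from pathlib import Path


def extract_results(output_dir, dest_dir):
    """Extract every .tar.gz archive in output_dir into dest_dir.

    Each archive is unpacked into its own subfolder named after the archive
    (without the .tar.gz suffix). Returns a dict mapping each archive file
    name to the sorted list of member names it contains.
    """
    extracted = {}
    for archive in sorted(Path(output_dir).glob("*.tar.gz")):
        target = Path(dest_dir) / archive.name[: -len(".tar.gz")]
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            names = sorted(tar.getnames())
            if hasattr(tarfile, "data_filter"):
                # Newer Python versions: refuse members that would be
                # written outside the target folder.
                tar.extractall(target, filter="data")
            else:
                tar.extractall(target)
        extracted[archive.name] = names
    return extracted
```

Pass the “output” folder shown above as output_dir and any folder on your computer as dest_dir; the returned dictionary gives a quick overview of what each archive contained.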

EXAMPLE OUTPUT TARBALL: CloVR-Metagenomics output

The CloVR-Metagenomics pipeline outputs several different files for the user:

Output                  Description
read_mapping            A text file displaying the one-to-one mapping of sequence names created in the pipeline.
uclust_clusters         Raw text output from uclust runs.
artificial_replicates   A list of read names that were found to be artificial replicates from the 454 platform.
blast_functional        Raw output of BLAST hits of representative sequences to a functional DB.
tables_functional       Summary tables of functional categories for each sample.
piecharts_functional    Visualized pie charts for functional groups.
skiff_functional        Output of skiff clusterings for different functional levels.
metastats_functional    Output of Metastats analysis comparing subject groups or samples at different functional levels.
histograms_functional   Visualized stacked histograms of functional annotations.
blast_taxonomy          Raw output of BLAST hits of representative sequences to a taxonomic DB.
tables_taxonomy         Summary tables of taxonomy groups for each sample.
piecharts_taxonomy      Visualized pie charts for taxonomic groups.
skiff_taxonomy          Output of skiff clusterings for different taxonomic levels.
metastats_taxonomy      Output of Metastats analysis comparing subject groups or samples at different taxonomic levels.
histograms_taxonomy     Visualized stacked histograms of taxonomic annotations.
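
After a run, the list of outputs above can serve as a checklist. The sketch below reports which of the listed outputs have no matching file in a folder. It assumes that result file names begin with the output names listed above, which this walkthrough does not confirm, so treat the matching rule and the function name missing_outputs as hypothetical and adapt them to the file names you actually see.

```python
from pathlib import Path

# Output names taken from the list above.
EXPECTED_OUTPUTS = [
    "read_mapping",
    "uclust_clusters",
    "artificial_replicates",
    "blast_functional",
    "tables_functional",
    "piecharts_functional",
    "skiff_functional",
    "metastats_functional",
    "histograms_functional",
    "blast_taxonomy",
    "tables_taxonomy",
    "piecharts_taxonomy",
    "skiff_taxonomy",
    "metastats_taxonomy",
    "histograms_taxonomy",
]


def missing_outputs(output_dir):
    """Return the expected outputs with no matching entry in output_dir.

    Assumption: an output counts as present if some file or folder name in
    output_dir starts with the output name.
    """
    names = [path.name for path in Path(output_dir).iterdir()]
    return [
        output
        for output in EXPECTED_OUTPUTS
        if not any(name.startswith(output) for name in names)
    ]
```

An empty list means every listed output was found; any names returned point to steps worth inspecting in the Ergatis interface.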