Getting Started
The CloVR-Metagenomics pipeline provides a robust comparative metagenomics workflow, complete with cluster auto-scaling and parallelization.
Although using the Cloud is optional for CloVR-Metagenomics, as it is for all other CloVR pipelines, it is recommended for this pipeline because local executions can be very time-consuming. The BLAST search steps in particular are computationally intensive and benefit from parallelization across multiple processors on the Cloud.
If you want to use the Cloud to run CloVR-Metagenomics, you must obtain credentials from your Cloud provider and configure CloVR to use them. If you want to use the Amazon Elastic Compute Cloud (EC2), make sure your Amazon EC2 credentials are configured. Usage on Amazon EC2 is charged per hour, so take care to terminate instances after a protocol has completed. See the vp-terminate-cluster command below.
Inputs
Multiple fasta files (1 file per sample) & a CloVR-formatted mapping file
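Before tagging your data, it can help to confirm that each per-sample file is really nucleotide FASTA and to see how many reads it contains. The following is a minimal sketch of such a pre-upload check, not part of CloVR itself. It assumes the accepted alphabet is the IUPAC nucleotide codes plus "-", and that your sample files end in ".fasta" (adjust the glob pattern for ".fa" or ".fna").

```python
import sys
from pathlib import Path

# Assumed nucleotide alphabet for this check: IUPAC nucleotide codes plus '-'.
NUCLEOTIDE_CHARS = set("ACGTUNRYKMSWBDHV-")


def summarize_fasta_lines(lines):
    """Count FASTA records and collect IDs of records containing non-nucleotide characters."""
    count = 0
    suspicious = []
    current_id = None
    current_ok = True
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close out the previous record before starting a new one.
            if current_id is not None and not current_ok:
                suspicious.append(current_id)
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            current_ok = True
            count += 1
        elif not set(line.upper()) <= NUCLEOTIDE_CHARS:
            current_ok = False
    # Close out the last record in the file.
    if current_id is not None and not current_ok:
        suspicious.append(current_id)
    return count, suspicious


def summarize_fasta(path):
    """Run the check on one FASTA file on disk."""
    with open(path) as handle:
        return summarize_fasta_lines(handle)


if __name__ == "__main__":
    # Usage: python check_fasta.py <directory with one FASTA file per sample>
    for path in sorted(Path(sys.argv[1]).glob("*.fasta")):
        n_reads, bad = summarize_fasta(path)
        print(f"{path.name}: {n_reads} reads, {len(bad)} with non-nucleotide characters")
```

A file that reports non-nucleotide records is probably a protein FASTA or a corrupted file and should be checked before it is uploaded as "Nucleotide FASTA".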
Download a Dataset
FILE: CloVR-Metagenomics mini example dataset
Pipeline Execution
1. Starting the CloVR web interface
This step is the same for all CloVR pipelines. Follow the instructions for the platform you are running CloVR on:
VMware
VirtualBox
Amazon EC2
DIAG
2. Adding datasets to the CloVR VM
Before starting a pipeline, you must add your datasets to the CloVR VM as “Tags”. The easiest way to add data to the VM is to copy the files into the “user_data” folder, which is shared between the VM and your local computer. If you have problems accessing files in the shared folder, see Troubleshooting CloVR on VirtualBox.
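If you prepare many sample files, copying them into the shared folder can be scripted. The following is a minimal sketch using only the Python standard library; the location of the “user_data” folder on your computer is an assumption you must supply, since it depends on where the CloVR VM was unpacked.

```python
import shutil
import sys
from pathlib import Path


def stage_inputs(sources, user_data_dir):
    """Copy input files into the shared user_data folder and return the copied paths."""
    paths = [Path(s) for s in sources]
    # Check every input first so a typo does not leave a half-copied dataset behind.
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"input files not found: {missing}")
    dest_dir = Path(user_data_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in paths:
        dest = dest_dir / src.name
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
        copied.append(dest)
    return copied


if __name__ == "__main__":
    # Usage: python stage_inputs.py <user_data folder> sample1.fasta sample2.fasta mapping.txt
    for path in stage_inputs(sys.argv[2:], sys.argv[1]):
        print("copied", path)
```

Checking all inputs before copying anything keeps the shared folder free of partial datasets if a file name is mistyped.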
To add files, click “Add” on the web interface.
If your files are in the user_data directory, click “Select file from image”, which opens a sub-window where you can select one or more FASTA files for upload into the VM. Alternatively, you can use “Browse” in the “Upload File” window to find and select files from anywhere on your local computer, but multiple files then have to be uploaded in separate steps.
Select “Nucleotide FASTA” from the “File Type” drop-down menu and name your dataset, e.g. “metagenomics_fasta”. Optionally, add a description of your dataset. Click “Tag” to upload the data to CloVR. A “Completed Successfully” window should appear to indicate that your dataset was added to the CloVR VM, and the new dataset should be listed under “Data Sets” on the web interface.
Next, repeat the same process for the corresponding metadata mapping file. This time, select “Metagenomics mapping file” from the “File Type” drop-down menu and name your dataset, e.g. “metagenomics_mapping”. Click “Tag” again to upload the data to CloVR.
The tagged datasets will appear as “Tags” on the CloVR web interface. Multiple files will be listed under the same “Tag” name.
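A common cause of failed runs is a mapping file that does not match the uploaded FASTA files. The following is a minimal sketch of a consistency check. It assumes a tab-delimited mapping file in which lines starting with "#" are headers or comments and the first column holds the FASTA file name of each sample; this layout is an assumption, so adjust the parsing if your CloVR mapping file uses sample IDs without file extensions.

```python
import csv
import sys
from pathlib import Path


def mapped_samples(mapping_lines):
    """Return first-column entries of a tab-delimited mapping file, skipping '#' lines."""
    samples = []
    for row in csv.reader(mapping_lines, delimiter="\t"):
        if not row or not row[0].strip() or row[0].startswith("#"):
            continue
        samples.append(row[0].strip())
    return samples


def compare_mapping(mapping_lines, fasta_names):
    """Return (FASTA files missing from the mapping file, mapping entries with no FASTA file)."""
    mapped = set(mapped_samples(mapping_lines))
    fasta = set(fasta_names)
    return sorted(fasta - mapped), sorted(mapped - fasta)


if __name__ == "__main__":
    # Usage: python check_mapping.py mapping.txt <directory with the FASTA files>
    with open(sys.argv[1], newline="") as handle:
        names = [p.name for p in Path(sys.argv[2]).glob("*.fasta")]
        missing, extra = compare_mapping(handle, names)
    print("FASTA files not in mapping file:", missing or "none")
    print("Mapping entries without a FASTA file:", extra or "none")
```

Both lists should be empty before you tag the mapping file and start the pipeline.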
3. Configuring and starting the pipeline
To initialize a new pipeline run, select the “Tag” corresponding to your FASTA files in the “Data Sets” window and click on the “CloVR Metagenomics” icon.
This opens the pipeline configuration window. Make sure the correct “Tag” is shown under “Select Sequencing Dataset”, and select the “Tag” corresponding to your metadata mapping file as the “CloVR Mapping File”. Choose a protocol with or without ORF calling; by default, ORFs are not called.
If cloud credentials have been added to the CloVR VM, they can be selected in the “Account” drop-down menu. Alternatively, select “local” to run the CloVR-Metagenomics analysis on the local computer. Enter a name under “Pipeline Description”, e.g. “Test_Metagenomics1”, so that you can recognize your pipeline on the web interface “Home” page.
Check your input by clicking “Validate”. If the validation is successful, start the pipeline by clicking “Submit”. After successful pipeline submission, the web interface will change to the “Home” page where the new pipeline will be listed as “Status: running.”
4. Monitoring your pipeline
The pipeline status, i.e. the number of completed steps, is shown for each running pipeline. Further information can be accessed by clicking on the pipeline name, which opens the “Pipeline Information” window.
Clicking on the [Pipeline #] headers in the “Pipeline Information” window will open the Ergatis “Workflow creation and monitoring interface” in a separate browser window, which provides useful information for troubleshooting of failed pipeline runs.
Downloading Pipeline Output
Once the CloVR-Metagenomics run has completed, multiple result files are created in the “output” directory. If CloVR-Metagenomics is run on the cloud, all result files are downloaded into this same folder. The path to this folder should look like this:
/clovr-standard-2011-08-25-05-13-27/shared/output/
All result files are created as compressed archives (.tar.gz), which can be extracted using the Finder in Mac OS X, the tar utility on Unix, or programs such as WinZip or WinRAR on Windows.
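With many archives in the output folder, extracting them one by one is tedious. The following is a minimal sketch that extracts every .tar.gz archive in the output folder into its own subfolder, using Python's standard tarfile module. The "extracted" destination folder name is a hypothetical choice for this example; pass your own destination as the second argument if you prefer.

```python
import sys
import tarfile
from pathlib import Path


def extract_outputs(output_dir, dest_dir=None):
    """Extract each .tar.gz archive in output_dir into its own subfolder of dest_dir."""
    output_dir = Path(output_dir)
    dest_root = Path(dest_dir) if dest_dir else output_dir / "extracted"
    extracted = []
    for archive in sorted(output_dir.glob("*.tar.gz")):
        target = dest_root / archive.name[: -len(".tar.gz")]
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Python versions can reject unsafe members (absolute paths, "..").
                tar.extractall(target, filter="data")
            else:
                tar.extractall(target)
        extracted.append(target)
    return extracted


if __name__ == "__main__":
    # Usage: python extract_outputs.py <path to the shared/output folder>
    destination = sys.argv[2] if len(sys.argv) > 2 else None
    for folder in extract_outputs(sys.argv[1], destination):
        print("extracted", folder)
```

Giving each archive its own subfolder keeps files from different outputs from overwriting each other if they share names.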
EXAMPLE OUTPUT TARBALL: CloVR-Metagenomics output
The CloVR-Metagenomics pipeline outputs several different files for the user:
Output | Description |
---|---|
read_mapping | A text file displaying the one-to-one mapping of sequence names created in the pipeline. |
uclust_clusters | Raw text output from uclust runs. |
artificial_replicates | A list of read names that were found to be artificial replicates from the 454 platform. |
blast_functional | Raw output of blast hits of representative sequences to a functional DB. |
tables_functional | Summary tables of functional categories for each sample. |
piecharts_functional | Pie charts visualizing functional groups. |
skiff_functional | Output of skiff clusterings for different functional levels. |
metastats_functional | Output of Metastats analysis comparing subject groups or samples at different functional levels. |
histograms_functional | Visualized stacked histograms of functional annotations. |
blast_taxonomy | Raw output of blast hits of representative sequences to a taxonomic DB. |
tables_taxonomy | Summary tables of taxonomy groups for each sample. |
piecharts_taxonomy | Pie charts visualizing taxonomic groups. |
skiff_taxonomy | Output of skiff clusterings for different taxonomic levels. |
metastats_taxonomy | Output of Metastats analysis comparing subject groups or samples at different taxonomic levels. |
histograms_taxonomy | Visualized stacked histograms of taxonomic annotations. |
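To confirm that a finished run produced every output listed in the table above, you can compare the file names in the output folder against the expected output names. The following is a minimal sketch that assumes each output's archive file name contains the output name from the table (e.g. "tables_taxonomy" appears in its .tar.gz name); the exact archive naming, and the set of outputs produced by protocols with ORF calling, may differ.

```python
import sys
from pathlib import Path

# Output names as listed in the table above.
EXPECTED_OUTPUTS = [
    "read_mapping", "uclust_clusters", "artificial_replicates",
    "blast_functional", "tables_functional", "piecharts_functional",
    "skiff_functional", "metastats_functional", "histograms_functional",
    "blast_taxonomy", "tables_taxonomy", "piecharts_taxonomy",
    "skiff_taxonomy", "metastats_taxonomy", "histograms_taxonomy",
]


def missing_outputs(file_names, expected=EXPECTED_OUTPUTS):
    """Return expected output names that do not appear in any of the given file names."""
    return [name for name in expected if not any(name in f for f in file_names)]


if __name__ == "__main__":
    # Usage: python check_outputs.py <path to the shared/output folder>
    names = [p.name for p in Path(sys.argv[1]).iterdir()]
    print("Missing outputs:", missing_outputs(names) or "none")
```

A missing output usually points to a failed step, which can be inspected through the Ergatis interface described under "Monitoring your pipeline".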