Introduction
Runtimes of CloVR-16S version 1.1 depend largely on whether the optional chimera check of all sequences with UCHIME is performed. Without the chimera check, up to 500K sequences can be processed within several hours on a local computer with a single processor, 2 GB of RAM and 15 GB of free disk space. The same analysis including the chimera check should still complete in less than 24 hours.
In contrast to older versions of CloVR-16S, version 1.1 has lower RAM requirements and parallelizes several steps of the protocol, e.g. the chimera check with UCHIME, calculation of rarefaction curves with Mothur and identification of differentially abundant OTUs with Metastats.
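Because the runtime guidance above is stated in numbers of sequences, it can help to count the sequences in the input FASTA files before starting a run. The following is a minimal sketch and not part of CloVR: the function names are hypothetical, and the 500,000 default is simply the figure quoted above for a local single-processor run.

```python
def count_fasta_records(fasta_path):
    """Count sequences in a FASTA file by counting header lines (lines starting with '>')."""
    count = 0
    with open(fasta_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


def total_within_guideline(fasta_paths, limit=500_000):
    """Return (total number of sequences, whether the total is within the limit).

    The default limit is the 500K figure from the text above; it is a
    guideline for local runs, not a hard limit enforced by CloVR.
    """
    total = sum(count_fasta_records(path) for path in fasta_paths)
    return total, total <= limit
```

Calling `total_within_guideline(["reads.fasta"])` with your own file name returns the total sequence count and a flag indicating whether the input stays within the quoted guideline.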
Input
1. A single FASTA file with multiplex barcodes & a QIIME-formatted metadata mapping file
OR
2. Multiple FASTA files (1 file per sample) & a CloVR-formatted metadata mapping file
To process multiple samples with CloVR-16S, either from a single or from multiple FASTA files, additional metadata associated with each sample needs to be provided in the form of a mapping file. This tab-delimited text file specifies, for example, information about barcodes used for multiplex sequencing or groups of related samples.
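Since the mapping file is a tab-delimited text file, a quick consistency check before upload can catch rows with missing or extra fields. The sketch below is an illustration only and makes several assumptions that are not taken from the CloVR documentation: the first non-blank line is treated as the header, a leading `#` on the header is stripped, and later lines starting with `#` are treated as comments. It does not check for any particular column names.

```python
import csv


def read_mapping_file(path):
    """Read a tab-delimited mapping file into (header, list of row dicts).

    Assumptions (not taken from the CloVR documentation): the first non-blank
    line is the header, and later lines starting with '#' are comments.
    """
    header = None
    records = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_number, row in enumerate(reader, start=1):
            if not any(cell.strip() for cell in row):
                continue  # skip blank lines
            if header is None:
                header = [cell.strip() for cell in row]
                header[0] = header[0].lstrip("#")
                continue
            if row[0].startswith("#"):
                continue  # assumed comment line
            if len(row) != len(header):
                raise ValueError(
                    f"line {line_number}: expected {len(header)} fields, found {len(row)}"
                )
            records.append(dict(zip(header, (cell.strip() for cell in row))))
    if header is None:
        raise ValueError("mapping file is empty")
    return header, records
```

A `ValueError` naming the offending line indicates a row whose number of tab-separated fields differs from the header.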
Download Test Datasets and Output
- Single FASTA dataset + mapping file: CloVR-16S mini example single FASTA
- Multiple FASTA dataset + mapping file: CloVR-16S mini example multiple FASTAs
- Output of single FASTA dataset run: CloVR-16S single FASTA example output
Test datasets are .tar archives of FASTA and mapping files that need to be extracted before they can be used with CloVR-16S.
Pipeline Execution
1. Starting the CloVR web interface
This step is identical on all platforms; please choose the instructions for the platform on which you are running CloVR:
VMware
VirtualBox
Amazon EC2
DIAG
2. Adding datasets to the CloVR VM
Before starting a pipeline, you must add your datasets to the CloVR VM as “Tags”. The easiest way to add data to the VM is to copy the files into the “user_data” folder, which is shared between the VM and your local computer. See Troubleshooting CloVR on VirtualBox if you have problems accessing files in the shared folders.
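Copying files into the shared folder can also be scripted on the local computer. The sketch below is a hypothetical helper, not a CloVR tool; the location of the “user_data” folder on your computer depends on where the VM image was unpacked, so the folder path must be supplied by you.

```python
import shutil
from pathlib import Path


def stage_files(sources, user_data_dir):
    """Copy input files into the shared "user_data" folder and return the new paths.

    user_data_dir is the folder as seen from the local computer; its location
    is installation-specific and is not assumed here.
    """
    target = Path(user_data_dir)
    if not target.is_dir():
        raise FileNotFoundError(f"shared folder not found: {target}")
    copied = []
    for source in map(Path, sources):
        destination = target / source.name
        shutil.copy2(source, destination)  # copies content and timestamps
        copied.append(destination)
    return copied
```

After the copy, the files should be selectable in the web interface as described in the following steps.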
To add files, click “Add” on the web interface.
If your files are in the “user_data” directory, click on “Select file from image”, which will open a sub-window where you can select one or multiple FASTA files for upload into the VM. Alternatively, you can use “Browse” in the “Upload File” window to find and select files from anywhere on your local computer, but multiple files have to be uploaded in separate steps.
Select “Nucleotide FASTA” from the “File Type” drop-down menu and name your dataset, e.g. as “CloVR16S_Test_FASTA”. This name will appear as a “Tag” on the CloVR web interface. Multiple files will be listed under the same “Tag” name. Add an optional description of your dataset. Click “Tag” to upload the data to CloVR.
A “Completed Successfully” window should appear to indicate that your dataset was added to the CloVR VM, and the new dataset should be listed under “Data Sets” on the web interface.
In the CloVR VM, each dataset is listed with the “Tag” name that was specified as “Name” when the dataset was added. Clicking on the newly added “Tag” opens a new window with information about the associated dataset.
If multiple FASTA files are being processed with CloVR-16S, they all need to be assigned to the same “Tag”. This can be done either by uploading multiple files simultaneously from the “user_data” directory, by adding new FASTA files to an existing “Tag” or by uploading multiple FASTA files as uncompressed (.tar) or compressed (.tar.gz) archives. To add a new FASTA file to an existing “Tag” click on “Add” in the “Files” window, which will open a new “Upload File” window.
Compressed or uncompressed archives are uploaded in the same way as single FASTA files.
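If you prefer to upload multiple FASTA files as one archive, the archive can be created with Python's standard `tarfile` module. This is a minimal sketch with a hypothetical function name. Storing the files at the top level of the archive, without directory paths, is an assumption made here for simplicity and not a documented CloVR requirement.

```python
import tarfile
from pathlib import Path


def bundle_fastas(fasta_paths, archive_path):
    """Pack FASTA files into a .tar or .tar.gz archive, chosen by file extension."""
    archive_path = Path(archive_path)
    mode = "w:gz" if archive_path.name.endswith(".gz") else "w"
    with tarfile.open(archive_path, mode) as archive:
        for fasta in map(Path, fasta_paths):
            # arcname drops the directory part so files sit at the archive root
            archive.add(fasta, arcname=fasta.name)
    return archive_path
```

For example, `bundle_fastas(["sampleA.fasta", "sampleB.fasta"], "samples.tar.gz")` writes a compressed archive that can then be uploaded in a single step.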
To upload a metadata mapping file, click on “Add” in the “Data Sets” window, then browse to and select the mapping file in the “Upload File” window, as described above. Choose “Metagenomics mapping file” as the file type and provide a name that will appear as a “Tag” on the CloVR “Home” web interface, e.g. “CloVR16S_Test_mapping”.
3. Configuring and starting the pipeline
To initialize a new pipeline run, select the “Tag” corresponding to your FASTA files in the “Data Sets” window and click on the “CloVR 16S” icon.
This will open the pipeline configuration window. Make sure the correct “Tag” is shown as the “Select Sequencing Dataset” and select the “Tag” corresponding to the correct metadata mapping file as the “CloVR Mapping File”. Choose a protocol with or without chimera check.
If cloud credentials have been added to the CloVR VM, they can be selected in the “Account” drop-down menu. Alternatively, “local” can be selected to perform CloVR-16S supported analysis on the local computer. Provide a name to recognize your pipeline in the web interface “Home” page as “Pipeline Description”, e.g. “CloVR16S_Test_Run1”.
Check your input by clicking “Validate”. If the validation is successful, start the pipeline by clicking “Submit”. After successful pipeline submission, the web interface will change to the “Home” page, where the new pipeline will be listed as “Status: running”.
4. Monitoring your pipeline
The pipeline status, i.e. the number of steps completed, is shown for each running pipeline. Further information can be accessed by clicking on the pipeline name, which opens the “Pipeline Information” window.
Clicking on the [Pipeline #] headers in the “Pipeline Information” window will open the Ergatis “Workflow creation and monitoring interface” in a separate browser window, which provides useful information for troubleshooting of failed pipeline runs.
Downloading Pipeline Output
Once the CloVR-16S run has completed, multiple results files are created in the “output” directory. If CloVR-16S is run on the cloud, all results files will be downloaded into the same folder. The path to this folder should look like this:
/clovr-standard-2011-08-25-05-13-27/shared/output/
All results files are created as compressed archives (.tar.gz), which can be extracted using the Finder in Mac OS X, the Tar utility in Unix or programs, such as WinZip or WinRAR, in Windows.
In addition, all results files are also available for download from the web interface, using the “Output” tab from the “Pipeline Information” window, which is accessible by clicking on the completed pipeline name.
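As an alternative to the desktop tools mentioned above, all result archives in the output folder can be unpacked in one pass with Python's standard `tarfile` module. The sketch below uses a hypothetical function name and unpacks each `.tar.gz` file into its own subfolder named after the archive. The `filter="data"` option is only passed on Python versions that provide it.

```python
import tarfile
from pathlib import Path


def extract_results(output_dir, destination):
    """Extract every .tar.gz in output_dir into destination/<archive name>/.

    Returns a dict mapping each archive file name to the member names it contains.
    """
    output_dir, destination = Path(output_dir), Path(destination)
    # Newer Python versions offer a safer extraction filter; use it when present.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    extracted = {}
    for archive_path in sorted(output_dir.glob("*.tar.gz")):
        target = destination / archive_path.name[: -len(".tar.gz")]
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive_path, "r:gz") as archive:
            extracted[archive_path.name] = archive.getnames()
            archive.extractall(target, **extra)
    return extracted
```

The first argument would be the “output” folder of your own CloVR installation, and the second any folder where the extracted results should be placed.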
Output
Depending on the characteristics of the data, some results may not be generated due to inherent computational difficulties or poor expected results. The outputs are:
Output | Description |
---|---|
filtered_sequences | Sequences passing the Qiime-based poor-quality filter (filename: seqs.fna) |
chimeras | Sequence names from seqs.fna identified as putative chimeras (filename: allchimeraids.txt) |
uclust_otus | Table showing OTU sample compositions (RDP classifier/Qiime) |
summary_tables | Taxonomic summary tables at various phylogenetic levels. |
rarefactions | Alpha-diversity: rarefaction numerical curves separated by sample (Mothur) |
rarefaction_plots | Visualized rarefaction plots separated by metadata type (Leech/CloVR) |
mothur_summary | Richness and diversity estimators (Mothur) |
PCoA_plots | Beta-diversity weighted & unweighted UniFrac 3D PCoA plots (Qiime) |
skiff | Taxonomic composition-based sample heatmap clustering (Skiff/CloVR) |
histograms | Taxonomic composition-based stacked histograms (CloVR) |
metastats | Differentially abundant taxonomic groups (Metastats) |
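When the chimera check was run, the filtered_sequences and chimeras outputs can be combined to see what fraction of the filtered sequences was flagged. The following sketch is an illustration under stated assumptions: it assumes that allchimeraids.txt contains one sequence name per line and that each name matches the first whitespace-delimited token of a FASTA header in seqs.fna. Neither detail is confirmed by the table above, and the function names are hypothetical.

```python
def read_fasta_ids(fasta_path):
    """Return the sequence IDs (first token of each '>' header line) in a FASTA file."""
    ids = []
    with open(fasta_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.append(fields[0])
    return ids


def chimera_summary(fasta_path, chimera_ids_path):
    """Summarize how many sequences in seqs.fna are listed as putative chimeras.

    Assumes one sequence name per line in the chimera ID file.
    """
    sequence_ids = read_fasta_ids(fasta_path)
    with open(chimera_ids_path, "r", encoding="utf-8", errors="replace") as handle:
        chimera_ids = {line.split()[0] for line in handle if line.strip()}
    flagged = sum(1 for seq_id in sequence_ids if seq_id in chimera_ids)
    total = len(sequence_ids)
    return {
        "total": total,
        "chimeric": flagged,
        "fraction": flagged / total if total else 0.0,
    }
```

Calling `chimera_summary("seqs.fna", "allchimeraids.txt")` on the extracted files returns the counts and the flagged fraction as a dictionary.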