This walkthrough provides a simple example of how to set up and run the CloVR-16S pipeline using the web-browser-accessible CloVR dashboard, and how to analyze the resulting outputs. We will use 16S rRNA amplicon sequences representing microbial communities sampled from 12 hard palate and 12 attached keratinized gingiva oral environments.
Getting started with CloVR
Specifying input data
- First, download an HMP dataset and uncompress it. The example data used in this walkthrough (10.9 MB) consist of 24 fasta files (one per sample) and a single metadata mapping file describing each sample. The mapping file is tab-delimited with a series of columns; we designed it to allow comparison of groups of interest. In this case, we compare the hard palate samples to the attached keratinized gingiva samples to see whether any taxa are differentially abundant between the two environments.
- Next, move the entire folder HMPtestset1 into the user_data folder located within the clovr-standard-* image directory (see Figure 4 for an example). This will enable us to easily access the data through the CloVR dashboard.
- To add (or, as we say, tag) files as input to a pipeline, first select the “Data Sets” tab in the upper left corner of the dashboard, then select “Add” at the bottom of the corresponding left panel. This opens a new window for adding data.
- In this new window, first click the “Select file from image” button to access the user_data folder where our data lives. You can select all fasta files at once by clicking the checkbox next to the fastas folder within the HMPtestset1 directory. Set the file type to Nucleotide fasta, then give your data a name and description. You can choose any name you like, but it may be easier to use the same name as in Figure 5. Finally, click the “tag” button to add the data.
- Repeat the procedure for the single mapping file (HMP_Oral_Comp1.map.txt), as shown in Figure 6. Both datasets should now appear in the left panel of the CloVR dashboard, organized by data type.
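Before tagging, it can be useful to sanity-check the mapping file locally. The sketch below is a minimal, hypothetical example: it assumes the file has a header row whose first column holds the sample ID, and that another column (called `body_site` here, a hypothetical name) holds the group label. The actual column names in HMP_Oral_Comp1.map.txt may differ.

```python
import csv
import io
from collections import Counter


def load_mapping(handle, group_column):
    """Read a tab-delimited mapping file and return {sample_id: group}.

    Assumption: the first column is the sample ID and `group_column`
    is a (hypothetical) column holding the group label.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Some mapping files prefix the header with '#'; strip it if present.
    header[0] = header[0].lstrip("#")
    group_idx = header.index(group_column)
    mapping = {}
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        mapping[row[0]] = row[group_idx]
    return mapping


# Tiny in-memory stand-in for the real mapping file (hypothetical values).
example = io.StringIO(
    "#SampleID\tbody_site\n"
    "S1\thard_palate\n"
    "S2\tattached_keratinized_gingiva\n"
    "S3\thard_palate\n"
)
mapping = load_mapping(example, "body_site")
print(Counter(mapping.values()))
```

Run against the real mapping file, a count like this should show 12 samples in each of the two groups.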
Now that the data we want to analyze have been tagged, we can set up and run the CloVR-16S pipeline.
Pipeline setup and execution
- Select the CloVR-16S button in the upper right region of the CloVR dashboard. This brings up a form in the panel below for choosing tagged datasets and pipeline parameters. The standard operating procedure and a description of the pipeline are available at http://clovr.org/methods/clovr-16s/. You can follow along with the next steps in Figure 7.
- Choose the HMPtestset1_fastafiles dataset (or whatever you named these fasta files) from the first menu in the form. Then select the corresponding CloVR mapping file by clicking the “Change” button next to the form. This example does not use quality scores, although supplying them is an option.
- The CloVR-16S pipeline can be run with or without chimera checking, which is computationally intensive. If you want this example to run quickly, select the option without chimera checking; otherwise, select the option that employs chimera checking.
- In the box next to the Account label, choose whether to run the pipeline locally (on your own machine) or, if you have set up your credentials (see the prior section), on a cloud (e.g., DIAG). In Figure 7, our DIAG credentials are named jdiag. Also give your pipeline a description that makes sense to you.
- To have CloVR check your input files for consistency, click the Validate button at the bottom of the panel. If validation succeeds, click the Submit button to execute the pipeline.
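The Validate button runs CloVR's own checks, which are not detailed here. As a rough local analogue, the sketch below checks that every sample in the mapping file has a fasta file (and vice versa) and that each file parses as nucleotide fasta. The function names, the one-file-per-sample dictionary, and the accepted character set (nucleotides plus IUPAC ambiguity codes and gap symbols) are all assumptions for illustration, not CloVR's validation logic.

```python
import io

# Assumption: nucleotides, IUPAC ambiguity codes, and gap symbols are allowed.
VALID_BASES = set("ACGTUNRYKMSWBDHV-.acgtunrykmswbdhv")


def read_fasta(handle):
    """Yield (header, sequence) pairs from a fasta text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_sample(sample_id, handle):
    """Return a list of problems found in one sample's fasta file."""
    problems = []
    records = list(read_fasta(handle))
    if not records:
        problems.append(f"{sample_id}: no sequences")
    for header, seq in records:
        bad = set(seq) - VALID_BASES
        if bad:
            problems.append(f"{sample_id}/{header}: unexpected characters {sorted(bad)}")
    return problems


def check_dataset(mapping_ids, fasta_files):
    """Cross-check mapping-file sample IDs against {sample_id: text stream}."""
    problems = []
    for sid in sorted(set(mapping_ids) - set(fasta_files)):
        problems.append(f"{sid}: listed in mapping file but no fasta file")
    for sid in sorted(set(fasta_files) - set(mapping_ids)):
        problems.append(f"{sid}: fasta file has no mapping-file entry")
    for sid in sorted(set(fasta_files) & set(mapping_ids)):
        problems.extend(check_sample(sid, fasta_files[sid]))
    return problems


# Hypothetical inputs: S2 contains an invalid base and S3 has no fasta file.
files = {
    "S1": io.StringIO(">r1\nACGT\nACGT\n>r2\nGGCC\n"),
    "S2": io.StringIO(">r1\nACXT\n"),
}
problems = check_dataset(["S1", "S2", "S3"], files)
print("\n".join(problems) or "no problems found")
```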
Monitoring the pipeline
Your pipeline should now appear in the Pipelines window of the CloVR dashboard along with its status (Figure 8). The pipeline may occasionally sit idle for a minute or two before it starts running. Click on the pipeline to see its description, input parameters, and hyperlinks to more advanced workflow interfaces such as Ergatis (Figure 9). Once the pipeline completes, the results can be downloaded from this window by clicking the Outputs tab (Figure 10).
Examining the outputs
Let’s take a look at some of the outputs to see what information we can gather.
To get an initial sense of the data, it can help to look at the alpha diversity of each sample using rarefaction curves. CloVR-16S computes and plots these curves using information provided in the metadata mapping file. Two of the rarefaction plots produced by the pipeline are shown in Figure 11. Some samples appear more than twice as diverse as others, and the number of high-quality sequences per sample varies widely.
A stacked histogram of the relative abundances of taxonomic groups is shown in Figure 12. This figure immediately shows that a few phyla, including Proteobacteria, Actinobacteria, Bacteroidetes, and Firmicutes, dominate all samples. Actinobacteria also tends to be less abundant in the attached keratinized gingiva samples, with the exception of one sample, which may be an outlier.
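Conceptually, a rarefaction curve repeatedly draws random subsamples of a sample's sequences at increasing depths and records how many distinct OTUs are observed at each depth. Comparing samples at a common depth removes much of the effect of unequal sequencing effort, which matters here because sequence counts vary widely between samples. The sketch below illustrates the idea on a list of per-sequence OTU labels; it is not CloVR's implementation, and the depths, iteration count, and `rarefaction_curve` name are arbitrary choices for illustration.

```python
import random


def rarefaction_curve(otu_labels, depths, iterations=10, seed=0):
    """Return [(depth, mean distinct OTUs)] from subsampling without replacement.

    `otu_labels` holds one OTU label per sequence in a single sample.
    Depths larger than the sample size are skipped.
    """
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        if depth > len(otu_labels):
            break  # cannot subsample deeper than the sample itself
        total = 0
        for _ in range(iterations):
            total += len(set(rng.sample(otu_labels, depth)))
        curve.append((depth, total / iterations))
    return curve


# Hypothetical sample: 6 sequences assigned to 3 OTUs.
sample = ["otu1", "otu1", "otu1", "otu2", "otu2", "otu3"]
curve = rarefaction_curve(sample, depths=[1, 2, 4, 6])
for depth, observed in curve:
    print(depth, observed)
```

A curve that is still rising steeply at its last depth suggests that further sequencing would likely reveal more OTUs in that sample.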
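Each bar in a stacked histogram of this kind shows one sample's taxon counts converted to proportions that sum to 1, so samples with very different sequence counts can be compared side by side. A minimal sketch of that normalization follows; the counts are invented for illustration, and CloVR's own normalization may differ in detail.

```python
def relative_abundance(counts):
    """Convert {taxon: count} for one sample into {taxon: proportion}."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical phylum-level counts for two samples (not real data).
samples = {
    "hard_palate_1": {"Proteobacteria": 50, "Actinobacteria": 30, "Firmicutes": 20},
    "gingiva_1": {"Proteobacteria": 60, "Actinobacteria": 5, "Firmicutes": 35},
}
for name, counts in samples.items():
    props = relative_abundance(counts)
    print(name, {taxon: round(p, 2) for taxon, p in props.items()})
```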
Finally, Figure 13 shows a skiff plot of clustered samples and taxa. In this plot, the two sample types separate fairly well, though not perfectly. Skiff plots are produced for all taxonomic levels, including phylum, class, order, family, and genus.
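One way to put a number on "the two sample types separate fairly well" is to compare the average dissimilarity between samples of the same type with that between samples of different types. The distance measure and clustering method behind CloVR's skiff plots are not described in this walkthrough; the sketch below uses Bray-Curtis dissimilarity, a common choice for community data, purely as an illustrative substitute. The profiles, group labels, and function names are hypothetical.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total if total else 0.0


def _mean(values):
    return sum(values) / len(values) if values else float("nan")


def mean_distances(profiles, groups):
    """Return (mean within-group, mean between-group) dissimilarity."""
    within, between = [], []
    for s1, s2 in combinations(sorted(profiles), 2):
        d = bray_curtis(profiles[s1], profiles[s2])
        (within if groups[s1] == groups[s2] else between).append(d)
    return _mean(within), _mean(between)


# Hypothetical relative-abundance profiles for four samples.
profiles = {
    "HP1": {"Actinobacteria": 0.4, "Proteobacteria": 0.6},
    "HP2": {"Actinobacteria": 0.5, "Proteobacteria": 0.5},
    "AKG1": {"Actinobacteria": 0.1, "Proteobacteria": 0.9},
    "AKG2": {"Actinobacteria": 0.2, "Proteobacteria": 0.8},
}
groups = {"HP1": "hard_palate", "HP2": "hard_palate",
          "AKG1": "gingiva", "AKG2": "gingiva"}
within, between = mean_distances(profiles, groups)
print(f"within-group: {within:.2f}, between-group: {between:.2f}")
```

When the within-group mean is clearly smaller than the between-group mean, clustering tends to place samples of the same type together, although individual samples can still land with the other group, as the imperfect separation in the skiff plot shows.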
Additional outputs are described in the CloVR-16S SOP, available at http://clovr.org.