HMP-DACC CloVR Bowtie Aligner Walkthrough


This walkthrough provides a simple example of how to set up and run the Bowtie Aligner using the browser-accessible CloVR dashboard, and how to analyze the resulting outputs. We will align metagenomic WGS reads from the Anterior Nares body site (sample SRS019215) to the Staphylococcus aureus reference genome.

In most cases, this pipeline can be run locally, but very large datasets may require additional computational support. This walkthrough demonstrates how to run the pipeline both locally and with cloud resources.

The Bowtie Manual provides a detailed description of the aligner tool.


Getting started with CloVR

Installing and setting up CloVR is a one-time process. If you have done this before, you may skip to the next step – Setting up input dataset.

Install CloVR

CloVR is run using a local desktop client. Visit the Getting started with CloVR page to download and install the client. Once the CloVR virtual machine is set up and launched, you should see a screen similar to Figure 1.

Figure 1. CloVR desktop client

Start the CloVR web interface

First check the CloVR desktop window for the IP address of your virtual machine (VM). Then enter this IP address in a web browser as shown in Figure 2.

Figure 2. Accessing the CloVR web interface



Add cloud credentials to the pipeline

If you do not need additional computational support, you may skip to the next step – Setting up input dataset.

For additional computational support, visit the Adding Credentials page for steps on how to add DIAG credentials. DIAG is an academic cloud that is free for researchers. Alternatively, you could run the pipeline on Amazon EC2 or another cloud computing provider. Once your DIAG credentials are set up, you should see them listed in the credentials tab as shown in Figure 3.

Figure 3. List of credentials

Setting up input dataset

Prepare input datasets

This pipeline requires two sets of files: reads and an index of the reference genome.


Reads

The reads file(s) should be in FASTQ format: one file for single-end reads, or two files for paired-end reads.
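A quick sanity check before tagging can catch truncated or mismatched reads files. The following is a minimal sketch and not part of CloVR: it assumes uncompressed FASTQ with exactly four lines per record, and the function names count_fastq_records and check_paired are hypothetical.

```python
from itertools import islice


def count_fastq_records(path):
    """Count records in an uncompressed, four-line-per-record FASTQ file.

    Raises ValueError on a truncated or malformed record.
    """
    count = 0
    with open(path) as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if not any(record):
                break  # end of file (or only trailing blank lines)
            if len(record) < 4:
                raise ValueError(f"{path}: truncated record after {count} reads")
            header, seq, plus, qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"{path}: malformed record {count + 1}")
            if len(seq) != len(qual):
                raise ValueError(
                    f"{path}: sequence/quality length mismatch in record {count + 1}"
                )
            count += 1
    return count


def check_paired(path_1, path_2):
    """Return the shared read count, or raise if the mate files differ in size."""
    n1 = count_fastq_records(path_1)
    n2 = count_fastq_records(path_2)
    if n1 != n2:
        raise ValueError(f"mate files differ: {n1} vs {n2} reads")
    return n1
```

For single-end data, call count_fastq_records on the one file; for paired-end data, check_paired confirms that both mate files hold the same number of reads.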

Reference index

Depending on the version of the pipeline you choose (noindices or indices), the reference dataset is either:

  • a sequence file of the reference genome in FASTA format (noindices version), or
  • a pre-built index dataset (indices version; several examples can be downloaded from the Bowtie website)


Next, move the input files into the user_data folder located within the clovr-standard-* image directory. This will enable us to easily access the data through the CloVR dashboard.

For the “no indices” version, move the reads fastq file(s) and the reference fasta file into the user_data folder as shown in Figure 4.

Figure 4.  Setting up input dataset for the pipeline (“no indices” version)


For the “indices” version, create a folder named bowtie_indices in user_data and move the pre-built index files there as shown in Figure 5. Also move the reads fastq file(s) to user_data.

Figure 5. Setting up input dataset for the pipeline (“indices” version). Create a bowtie_indices folder. Inside it, create a new folder for each reference. The name of the folder should match the Bowtie index prefix. For example, in this case the Bowtie index prefix is “s_aureus”.
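The folder layout for both versions can also be prepared with a short script. The sketch below is only an illustration: stage_inputs is a hypothetical helper, it copies files rather than moving them so the originals stay in place, and it assumes every index file name begins with the index prefix followed by a dot.

```python
import shutil
from pathlib import Path


def stage_inputs(user_data, reads, reference_fasta=None,
                 index_files=None, index_prefix=None):
    """Copy pipeline inputs into the user_data layout described above.

    user_data       -- path to the existing user_data folder
    reads           -- list of one (single-end) or two (paired-end) FASTQ paths
    reference_fasta -- FASTA path, for the "no indices" version
    index_files     -- pre-built index files, for the "indices" version
    index_prefix    -- Bowtie index prefix, e.g. "s_aureus"
    """
    user_data = Path(user_data)
    if not 1 <= len(reads) <= 2:
        raise ValueError("expected one or two reads files")
    if (reference_fasta is None) == (index_files is None):
        raise ValueError("give either a reference FASTA or pre-built index files")
    if index_files is not None:
        if not index_prefix:
            raise ValueError("index_prefix is required with index_files")
        for path in index_files:
            # Assumption: index files are named "<prefix>.<something>".
            if not Path(path).name.startswith(index_prefix + "."):
                raise ValueError(f"{path} does not match prefix {index_prefix!r}")

    # Reads always go directly into user_data.
    staged = [Path(shutil.copy2(path, user_data)) for path in reads]
    if reference_fasta is not None:
        staged.append(Path(shutil.copy2(reference_fasta, user_data)))
    else:
        # One sub-folder per reference, named after the index prefix.
        index_dir = user_data / "bowtie_indices" / index_prefix
        index_dir.mkdir(parents=True, exist_ok=True)
        for path in index_files:
            staged.append(Path(shutil.copy2(path, index_dir)))
    return staged
```

The function returns the staged paths, which can be compared against Figure 4 or Figure 5 before opening the dashboard.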


Add input datasets to the pipeline

Before starting a pipeline, you must add your datasets to the CloVR VM as “Tags”.  To add tags, click “Add” on the web interface.

Figure 6. Adding new datasets.



Then click on “Select file from image”, which will open a sub-window where you can select one or multiple FASTQ files for upload into the VM. Alternatively, you can use “Browse” in the “Upload File” window to find and select files from anywhere on your local computer, but multiple files have to be uploaded in separate steps.

Select “Nucleotide FASTQ” from the “File Type” drop-down menu and name your dataset, e.g. “input_reads”. For single-end reads, select just one file to tag; for paired-end reads, select both files. Add an optional description of your dataset. Click “Tag” to upload the data to CloVR. A “Completed Successfully” window should appear to indicate that your dataset was added to the CloVR VM, and the new dataset should be listed under “Data Sets” on the web interface.

Figure 7. Adding a FASTQ dataset


Next, if you do not have a pre-built index, repeat the same process for the reference FASTA file. This time select “Nucleotide FASTA” from the “File Type” drop-down menu and name your dataset, e.g. as “ref_fasta”. Click “Tag” again to upload the data to CloVR.

Figure 8. Adding a Reference FASTA tag (no indices version only).

If you are using a pre-built index, you do not need to tag the index files.



Each tagged dataset will appear as a “Tag” on the CloVR web interface. Multiple files will be listed under the same “Tag” name.

Figure 9. Tagged Datasets



Pipeline setup and execution

To initialize a new pipeline run, click on the “Other Protocols” drop-down as shown in Figure 10. Then select “clovr_align_bowtie_indices” if you are using a pre-built Bowtie index, or “clovr_align_bowtie_noindices” if you uploaded a reference sequence fasta file.


Figure 10. Setup pipeline

This will open the pipeline configuration window. For the input datasets select the tag corresponding to the input file(s). If you are running the noindices version, also select the appropriate tag for the Reference FASTA sequence field.

Select “local” or “DIAG” credentials from the “Account” drop-down menu. Bowtie runs pretty quickly, so running it locally should be fine if the dataset is not too large.

In the “Pipeline Description” field, enter a name that will identify your pipeline on the web interface home page, e.g. “HMP_Bowtie_test”.

Figure 11. Configuring the clovr_align_bowtie_noindices pipeline

Figure 12. Configuring the clovr_align_bowtie_indices pipeline


Check your input by clicking “validate”. If the validation is successful, start the pipeline by clicking “submit”.

After a successful pipeline submission, the web interface will change to the “Home” page where the new pipeline will be listed as “Status: running.”


Monitoring the pipeline

Your pipeline should now appear in the Pipelines window in the CloVR dashboard along with its status. Occasionally, the pipeline may idle for a minute or two before running. You can click on the pipeline to get a description, input parameters, and hyperlinks to more advanced workflow interfaces like Ergatis. Clicking on the [Pipeline #] headers in the “Pipeline Information” window will open the Ergatis “Workflow creation and monitoring interface” in a separate browser window, which provides useful information for troubleshooting of failed pipeline runs.


Figure 13. Pipeline status


Accessing the outputs

Once the pipeline completes, the results can be downloaded from the CloVR dashboard by clicking on the Outputs tab (Figure 14). All result files are created as compressed archives (.tar.gz), which can be extracted using the Finder in Mac OS X, the tar utility in Unix, or programs such as WinZip or WinRAR in Windows.

Figure 14. Accessing output files
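The downloaded archives can also be unpacked from a script, which works the same way on all three platforms. This is a minimal sketch using Python's standard tarfile module; extract_results is a hypothetical helper, and the archive name in any call is whatever file you downloaded from the Outputs tab.

```python
import tarfile
from pathlib import Path


def extract_results(archive, dest):
    """Extract a .tar.gz results archive into dest and return the member names."""
    dest = Path(dest).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse entries that would be written outside the destination folder.
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    return [member.name for member in members]
```

The returned list shows which files the archive contained, which is a convenient way to confirm that the alignments, unmapped reads, and statistics were all downloaded.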

The CloVR Bowtie alignment pipeline outputs the following files:

Output        Description
alignments    Alignments in SAM format
unmapped      File(s) containing reads that could not be aligned
stats         Alignment statistics
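To cross-check the alignment statistics, the SAM output can be summarized directly. The sketch below relies only on the SAM convention that header lines start with “@” and that bit 0x4 of the FLAG field (the second column) marks an unmapped record. The function name summarize_sam is hypothetical, and whether the pipeline's SAM file contains unaligned reads at all is an assumption you should verify against your own output.

```python
def summarize_sam(path):
    """Count mapped and unmapped alignment records in a SAM file."""
    mapped = 0
    unmapped = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith("@") or not line.strip():
                continue  # skip header and blank lines
            fields = line.rstrip("\n").split("\t")
            flag = int(fields[1])  # FLAG is the second SAM column
            if flag & 0x4:  # 0x4 = segment unmapped
                unmapped += 1
            else:
                mapped += 1
    total = mapped + unmapped
    return {
        "mapped": mapped,
        "unmapped": unmapped,
        "total": total,
        "mapped_fraction": mapped / total if total else 0.0,
    }
```

Note that this counts records rather than reads: a read reported at several locations would be counted once per record.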