This walkthrough provides a simple example of how to set up and run the HMP Unified Metabolic Analysis Network (HUMAnN) pipeline using the browser-accessible CloVR dashboard. The HUMAnN pipeline is used to efficiently and accurately determine the presence/absence and abundance of microbial pathways in a community from metagenomic data. For this walkthrough, we use genes from the Anterior Nares body site (sample SRS019215). The walkthrough also demonstrates how to run the pipeline using the cloud for computational support. The HUMAnN SOP provides a detailed description of this pipeline.
Note: This pipeline is not yet available in the latest CloVR release (clovr-1.0-RC5, Nov. 2012). For now, you can access it by requesting an account at www.diagcomputing.org. After you log in, select “Start CloVR” from the “My Account” drop-down. This will launch a new CloVR VM. Continue with the walkthrough from the “Add input datasets to the pipeline” step.
Getting started with CloVR
Installing and setting up CloVR is a one-time process. If you have done this before, you may skip to the next step, Setting up input dataset.
CloVR is run using a local desktop client. Visit the Getting started with CloVR page to download and install the client. Once the CloVR virtual machine is set up and launched, you should see a screen similar to Figure 1.
Figure 1. CloVR desktop client
Start the CloVR web interface
First check the CloVR desktop window for the IP address of your virtual machine (VM). Then enter this IP address in a web browser as shown in Figure 2.
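As a small illustration of this step, the sketch below checks that the text copied from the desktop window is a well-formed IP address and builds the address to paste into the browser. The function name `clovr_url`, the example address, and the assumption that the interface is served over plain HTTP on the default port are hypothetical and not taken from the CloVR documentation.

```python
import ipaddress


def clovr_url(ip_text, scheme="http"):
    """Build a browser URL from the IP address shown in the CloVR window.

    Assumption: the web interface answers on the default port of `scheme`.
    Raises ValueError if `ip_text` is not a valid IPv4 or IPv6 address.
    """
    address = ipaddress.ip_address(ip_text.strip())
    # IPv6 literals must be wrapped in brackets inside a URL.
    host = f"[{address}]" if address.version == 6 else str(address)
    return f"{scheme}://{host}/"


if __name__ == "__main__":
    # 192.168.1.20 is a made-up example address, not a real CloVR VM.
    print(clovr_url(" 192.168.1.20 "))  # prints http://192.168.1.20/
```

A mistyped address raises `ValueError` here, which is easier to diagnose than a browser timeout.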
Figure 2. Accessing the CloVR web interface
Add cloud credentials to the pipeline
If you do not need additional computational support, you may skip to the next step, Setting up input dataset.
For additional computational support, visit the Adding Credentials page for steps on how to add DIAG credentials. DIAG is an academic cloud that is free for researchers. Alternatively, you could run the pipeline on Amazon EC2 or with other cloud computing providers. Once your DIAG credentials are set up, you should see them listed in the credentials tab as shown in Figure 3.
Setting up input dataset
Prepare input datasets
This pipeline requires an input file in FASTA format. Move the input file into the user_data folder located within the clovr-standard-* image directory. This will enable us to easily access the data through the CloVR dashboard.
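Before moving a file into user_data, it can help to confirm that it really is nucleotide FASTA. The following is a minimal sketch, not part of CloVR: the helper names `looks_like_nucleotide_fasta` and `stage_input` are hypothetical, the check only inspects the first records, and the path to your own clovr-standard-* directory must be supplied by you.

```python
import shutil
from pathlib import Path

# IUPAC nucleotide codes plus the gap character.
NUCLEOTIDE_CODES = set("ACGTUNRYKMSWBDHV-")


def looks_like_nucleotide_fasta(path, max_records=100):
    """Return True if the first `max_records` records look like nucleotide FASTA."""
    records = 0
    seen_sequence = False
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                records += 1
                if records > max_records:
                    break
                continue
            if records == 0:
                return False  # sequence data before any ">" header
            if not set(line.upper()) <= NUCLEOTIDE_CODES:
                return False  # character outside the nucleotide alphabet
            seen_sequence = True
    return records > 0 and seen_sequence


def stage_input(fasta_path, user_data_dir):
    """Copy a checked FASTA file into an existing user_data folder."""
    source = Path(fasta_path)
    if not looks_like_nucleotide_fasta(source):
        raise ValueError(f"{source} does not look like nucleotide FASTA")
    target_dir = Path(user_data_dir)
    if not target_dir.is_dir():
        raise FileNotFoundError(f"{target_dir} is not a directory")
    return Path(shutil.copy2(source, target_dir))
```

Calling `stage_input("input_reads.fasta", "<your clovr-standard-* directory>/user_data")` would copy the file only if the check passes; both arguments are placeholders for your own paths.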
Before starting a pipeline, you must add your datasets to the CloVR VM as “Tags”. To add tags, click “Add” on the web interface.
Then click on “Select file from image”, which will open a sub-window where you can select a FASTA file for upload into the VM. Alternatively, you can use “Browse” in the “Upload File” window to find and select files from anywhere on your local computer.
Select “Nucleotide FASTA” from the “File Type” drop-down menu and name your dataset, e.g. “input_reads”. Add an optional description of your dataset. Click “Tag” to upload the data to CloVR. A “Completed Successfully” window should appear to indicate that your dataset was added to the CloVR VM, and the new dataset should be listed under “Data Sets” on the web interface.
Figure 6. Setting up input dataset
The tagged datasets will appear as a “Tag” on the CloVR web interface. Multiple files will be listed under the same “Tag” name.
Figure 7. Tagged Datasets
Pipeline setup and execution
To initialize a new pipeline run, click on the “Other Protocols” drop-down as shown in the figure below. Then select “clovr_humann”.
This will open the pipeline configuration window. For the input datasets, select the tag corresponding to the input file.
Select “local” or “DIAG” credentials from the “Account” drop-down menu. In the “Pipeline Description” field, provide a name that lets you recognize your pipeline on the web interface home page, e.g. “SRS019215_test”.
Check your input by clicking “validate”. If the validation is successful, start the pipeline by clicking “submit”.
After a successful pipeline submission, the web interface will change to the “Home” page, where the new pipeline will be listed as “Status: running”.
Monitoring the pipeline
Your pipeline should now appear in the Pipelines window in the CloVR dashboard along with its status. Occasionally, the pipeline may idle for a minute or two before running. You can click on the pipeline to get a description, input parameters, and hyperlinks to more advanced workflow interfaces like Ergatis. Clicking on the [Pipeline #] headers in the “Pipeline Information” window will open the Ergatis “Workflow creation and monitoring interface” in a separate browser window, which provides useful information for troubleshooting failed pipeline runs.
Accessing the outputs
Once the pipeline completes, the results can be downloaded from the CloVR dashboard by clicking on the Outputs tab (Figure 10). All result files are created as compressed archives (.tar.gz), which can be extracted using the Finder in Mac OS X, the tar utility in Unix, or programs such as WinZip or WinRAR in Windows.
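If you prefer to unpack a downloaded archive from a script rather than a desktop tool, Python's standard `tarfile` module can do it on any platform. This is a minimal sketch: the function name `extract_results` and the file names in the usage note are hypothetical, and the actual archive names depend on your pipeline run.

```python
import tarfile
from pathlib import Path


def extract_results(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the extracted file names."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        members = archive.getmembers()
        # Refuse members that would be written outside the destination folder.
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest)
        return sorted(m.name for m in members if m.isfile())
```

For example, `extract_results("results.tar.gz", "humann_results")` would unpack a downloaded archive into a `humann_results` folder and return the names of the files it contained; both names are placeholders.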
See the HUMAnN SOP for a full description of the output files produced.