CloVR-Comparative Walkthrough

Test Input Data Set

FILE: CloVR-Comparative test input data: 5 Neisseria genomes

Pipeline Execution

1) Starting the CloVR web interface

This step is identical on all platforms. Please choose the instructions for the platform on which you are running CloVR:

VMware
VirtualBox
Amazon EC2
DIAG

2) Add data

Add datasets to CloVR to make them available to the CloVR-Comparative pipeline. There are two ways to do this: a) tag the datasets, either by directly uploading one or more GenBank files (as compressed *.tgz archives) or by providing a link to the data; or b) include genomes by accession number or by selecting them from the interactive genome browser.
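If your genomes are spread over several separate GenBank files, they can be bundled into a single *.tgz archive before uploading. The following is a minimal Python sketch using the standard `tarfile` module. The function name `bundle_genbank`, the file extensions it looks for, and the assumption that CloVR accepts a flat archive (files at the top level, no sub-folders) are illustrative and not taken from the CloVR documentation.

```python
import tarfile
from pathlib import Path


def bundle_genbank(src_dir, archive_path, patterns=("*.gbk", "*.gb", "*.gbf")):
    """Pack the GenBank files found in src_dir into a gzip-compressed tar (.tgz).

    The extensions in `patterns` are an assumption; adjust them to your files.
    Matching is case-sensitive on most Linux file systems.
    Returns the names of the files that were added.
    """
    src = Path(src_dir)
    # A set removes duplicates if a file matches more than one pattern
    files = sorted({p for pat in patterns for p in src.glob(pat)})
    if not files:
        raise FileNotFoundError(f"no GenBank files matching {patterns} in {src}")
    with tarfile.open(archive_path, "w:gz") as tar:
        for f in files:
            # arcname keeps only the file name, so the archive has no directory prefix
            tar.add(f, arcname=f.name)
    return [f.name for f in files]
```

For example, `bundle_genbank("neisseria_gbk", "CompTest_5neisseria.tgz")` would pack the GenBank files in a hypothetical `neisseria_gbk` folder into `CompTest_5neisseria.tgz`, ready to be selected with "Browse".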

a) Adding datasets to the CloVR VM as “Tags”

Click on the “Add” button on the CloVR Dashboard.

This opens a separate “Upload File” window. Click “Browse” to search your computer for GenBank files to upload. Alternatively, copy & paste a link to your GenBank files (*.tgz).

  1. Specify the file type (GenBank).
  2. Name your file(s) (e.g. “CompTest_5neisseria”).
  3. (Optional) Add a description of your dataset.
  4. Click “Tag”.

The tagged data is now available for analysis with the CloVR-Comparative pipeline. The data will be uploaded to CloVR once the CloVR-Comparative pipeline is started.

NOTE about providing your own GenBank input: one of the components in the CloVR-Comparative pipeline checks and validates GenBank input files to ensure they meet NCBI standards. The script applies some basic corrections to the files where applicable, but returns an error state if the underlying Biopython library cannot process an input file. A list of these checks and fixes can be found HERE.
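A quick structural check before uploading can catch obviously broken files early. The sketch below is NOT the CloVR validation script and does not test NCBI conformance; it is a hypothetical, standard-library-only pre-check (the names `precheck_genbank` and `_check_record` are illustrative) that flags records that do not start with a LOCUS line, have no ORIGIN (sequence) section, or lack the closing `//` line. Some valid GenBank records may carry no sequence section, so treat the ORIGIN message as a warning rather than an error.

```python
def _check_record(lines, number, problems):
    """Append problems found in one record (a list of its non-blank lines)."""
    if not lines or not lines[0].startswith("LOCUS"):
        problems.append(f"record {number}: does not start with a LOCUS line")
    if not any(line.startswith("ORIGIN") for line in lines):
        problems.append(f"record {number}: no ORIGIN (sequence) section")


def precheck_genbank(text):
    """Return a list of structural problems in GenBank-format text ([] if none)."""
    problems = []
    record = []   # non-blank lines of the record currently being read
    count = 0     # number of records closed by a '//' line
    for line in text.splitlines():
        if line.strip() == "//":
            count += 1
            _check_record(record, count, problems)
            record = []
        elif line.strip():
            record.append(line)
    if record:
        # Text after the last '//' that was never terminated
        problems.append(f"record {count + 1}: missing '//' terminator")
    elif count == 0:
        problems.append("no records found")
    return problems
```

Call it on the text of each file, e.g. `precheck_genbank(open(path).read())`. For a stricter test you could also try parsing the file with Biopython's `SeqIO.parse(path, "genbank")`, since Biopython is the library the CloVR validation step relies on.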

b) Include genomes by accession numbers or from interactive browser

GenBank files can also be included in the CloVR-Comparative analysis by specifying GenBank accession numbers or by selecting genomes from the interactive browser of RefSeq genomes.

Go to the CloVR-Comparative configuration screen by clicking on the CloVR-Comparative icon on the CloVR Dashboard.

Select previously tagged data for inclusion in the CloVR-Comparative analysis under “Input Genbank Tags”.

And/or select genomes from the interactive, searchable genome browser.

And/or specify GenBank accession numbers of genomes to be included in the analysis.

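If you collect accession numbers from several sources, a small script can tidy the list before you enter it. This is a hypothetical helper (`clean_accessions` is not part of CloVR): it assumes accessions are separated by commas or whitespace, uppercases them, and drops duplicates. Its pattern is only a loose shape check (letters, an optional RefSeq-style underscore, optional letters, digits, and an optional `.version`), not a lookup against NCBI, and how the CloVR form expects multiple accessions to be entered is not specified here.

```python
import re

# Loose shape check only: e.g. "AE002098", "NC_003112.2", "NZ_CP007668.1"
ACCESSION_RE = re.compile(r"[A-Z]+_?[A-Z]*\d+(?:\.\d+)?")


def clean_accessions(raw):
    """Split raw text on commas/whitespace, uppercase, and de-duplicate (keeping order).

    Returns (accepted, rejected): tokens that do / do not match ACCESSION_RE.
    """
    accepted, rejected, seen = [], [], set()
    for token in re.split(r"[,\s]+", raw.strip()):
        if not token:
            continue  # empty input produces a single empty token
        acc = token.upper()
        if acc in seen:
            continue
        seen.add(acc)
        if ACCESSION_RE.fullmatch(acc):
            accepted.append(acc)
        else:
            rejected.append(acc)
    return accepted, rejected
```

Anything in the rejected list is worth a second look before it is typed into the form.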

3) Configuring and starting a pipeline

  1. If cloud credentials have been added to the CloVR VM, they can be selected in the “Account” drop-down menu. For CloVR-Comparative runs on DIAG, choose “local”.
  2. Provide an “Output prefix”, which will be used as the prefix for all output files.
  3. Specify a unique value for “Name of Sybil site”, “Username” (lowercase), and “Password” for the Sybil website. You will use this username and password to log in to the Sybil website and visualize the data.
  4. Provide a “Pipeline Description” to recognize your pipeline on the Dashboard “Home” page, e.g. “CloVR-Comp_test”.
  5. Check your input by clicking “Validate”.
  6. If the validation is successful, start the pipeline by clicking “Run”.

After successful pipeline submission, the web interface will change to the “Home” page, where the new CloVR-Comparative pipeline will be listed as “Status: running”.

4) Monitoring your pipeline

The pipeline status, i.e. the number of steps completed, is shown for the running pipeline. Further information can be accessed by clicking on the pipeline name, which opens the “Pipeline Information” window. Clicking on the [Pipeline #] headers in the “Pipeline Information” window opens the Ergatis “Workflow creation and monitoring interface” in a separate browser window, which provides useful information for troubleshooting failed pipeline runs. Each protocol consists of an outer wrapper pipeline, which always runs locally, and an inner pipeline (shown in parentheses in the “Pipeline Information” window), which runs locally or on the cloud depending on the pipeline configuration.

 

5) Downloading Pipeline Output

Once the CloVR-Comparative run has completed, the result files are listed in the “Outputs” tab of the “Pipeline Information” window.

All result files are created as compressed archives (.tar.gz), which can be extracted using the Finder on Mac OS X, the tar utility on Unix, or programs such as WinZip or WinRAR on Windows.

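With many output archives, extracting them one at a time gets tedious. Below is a minimal Python sketch using the standard `tarfile` module, assuming all downloaded archives sit in one folder; the folder names and the `extract_outputs` function are hypothetical. Each `<name>.tar.gz` is unpacked into its own `<name>` folder after a basic check that no archive member has an absolute path or a `..` component.

```python
import tarfile
from pathlib import Path


def extract_outputs(download_dir, dest_dir):
    """Extract every *.tar.gz in download_dir into its own folder under dest_dir.

    Returns {archive name without .tar.gz: sorted list of extracted file names}.
    """
    dest_root = Path(dest_dir)
    extracted = {}
    for archive in sorted(Path(download_dir).glob("*.tar.gz")):
        name = archive.name[: -len(".tar.gz")]
        target = dest_root / name
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            members = tar.getmembers()
            for m in members:
                # Basic safety check: refuse entries that would land outside `target`
                p = Path(m.name)
                if p.is_absolute() or ".." in p.parts:
                    raise ValueError(f"unsafe path {m.name!r} in {archive.name}")
            tar.extractall(target, members=members)
        extracted[name] = sorted(m.name for m in members if m.isfile())
    return extracted
```

For example, `extract_outputs("clovr_downloads", "clovr_results")` would unpack a downloaded `wga_tree.tar.gz` into `clovr_results/wga_tree/`.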

Output

The output files are:

Output                  Description
circleator_pdf          Circular representation of each input genome in PDF format
circleator_png          Circular representation of each input genome in PNG format
cluster_fasta           Protein sequences of all genes, organized by clusters of syntenic orthologous genes
mugsyalign_maf_tag      Reference-free whole-genome multiple alignment of all input genomes in MAF format
mugsy_mapped_cogformat  Clusters of syntenic orthologous genes based on the Mugsy alignment
mugsy_mapped_features   Details of the clusters of syntenic orthologous genes based on the Mugsy alignment, with gene annotations
snps_file               SNP calls and their positions in all input genomes
summary_files           Summary of annotated genes and SNPs for each input genome
summary_report          Summary of the comparative genomic analysis results
sybil_archive           Archive file for the Sybil instance
validation_changelog    Log file listing all changes made to the input GenBank files
wga_tree                Phylogenetic tree of all input genomes in Newick format
wga_tree_pdf            Phylogenetic tree of all input genomes in PDF format
wga_tree_svg            Phylogenetic tree of all input genomes in SVG format
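Because wga_tree is a Newick file, a few lines of code are enough to list the genomes it contains, for example to confirm that every input genome made it into the tree. A minimal sketch; `newick_leaves` is a hypothetical helper, and it assumes unquoted leaf labels (quoted labels and bracketed comments are not handled).

```python
import re


def newick_leaves(newick):
    """Return leaf (genome) names from a Newick string, in order of appearance.

    A leaf label directly follows '(' or ','; labels that follow ')' belong to
    internal nodes (e.g. support values) and are skipped. Branch lengths after
    ':' are ignored because ':' is excluded from the label characters.
    """
    return re.findall(r"[(,]\s*([^()\[\],:;\s]+)", newick)
```

Pass it the text of the extracted wga_tree file, e.g. `newick_leaves(open(path).read())`, and compare the result with the genomes you submitted.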

To log in to the interactive Sybil website for comparative genomic analyses, click the “sybil_website” drop-down button on the left side of the CloVR Dashboard, then click the link to open the Sybil home page.
