Test Input Data Set
FILE: CloVR-Comparative test input data: 5 Neisseria genomes
Pipeline Execution
1) Starting the CloVR web interface
This step is identical on all platforms. Choose the instructions for the platform on which you are running CloVR:
VMware
VirtualBox
Amazon EC2
DIAG
2) Add data
Add datasets to CloVR to make them available for use in the CloVR-Comparative pipeline. This can be done in two ways: a) by tagging the datasets, either by directly uploading one or more GenBank files (as compressed archives or *.tgz files) or by providing a link to the data, or b) by including genomes by accession number or from the interactive browser.
a) Adding datasets to the CloVR VM as “Tags”
Click on the “Add” button on the CloVR Dashboard.
This opens a separate “Upload File” window. Click “Browse” to search your computer for GenBank files to upload. Alternatively, copy & paste a link to your GenBank files (*.tgz).
- Specify the file type (GenBank).
- Name your file(s) (e.g. “CompTest_5neisseria”).
- [Optionally] Add a description of your dataset.
- Click “Tag”.
The tagged data is now available for analysis with the CloVR-Comparative pipeline. Data upload to CloVR will begin once the CloVR-Comparative pipeline is started.
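For uploads of several genomes, the GenBank files can be bundled into a single *.tgz archive beforehand. The following is a minimal sketch using only the Python standard library; the function name `package_genbank_files` and the assumption that GenBank files end in `.gbk` or `.gb` are illustrative and not part of CloVR.

```python
import tarfile
from pathlib import Path


def package_genbank_files(source_dir, archive_path):
    """Bundle the GenBank files in source_dir into one gzip-compressed tar archive.

    Returns the list of file names that were added.
    """
    source = Path(source_dir)
    # Assumption: GenBank files carry a .gbk or .gb extension.
    files = sorted(p for p in source.iterdir() if p.suffix in {".gbk", ".gb"})
    if not files:
        raise ValueError(f"no GenBank files found in {source}")
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in files:
            # arcname keeps the archive flat (no parent directories inside it).
            archive.add(path, arcname=path.name)
    return [p.name for p in files]
```

The resulting archive can then be selected with “Browse” in the “Upload File” window.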
NOTE about providing your own GenBank input: One of the components in the CloVR-Comparative pipeline checks and validates GenBank input files to ensure that they meet NCBI standards. The script applies some basic corrections to the files where applicable, but returns an error state if the underlying Biopython library cannot process an input file. A list of these checks and fixes can be found HERE.
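Obvious structural problems can be caught before uploading. The sketch below is not the pipeline's validation script and does not reproduce its checks or fixes; it only tests a few basic properties of the GenBank flat-file layout (each record starts with a `LOCUS` line and ends with a `//` line). The function name `precheck_genbank` is hypothetical.

```python
def precheck_genbank(text):
    """Return a list of structural problems found in GenBank flat-file text.

    An empty list means none of these basic checks failed; it does not
    guarantee that the pipeline's own validation will accept the file.
    """
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    if not lines:
        return ["file is empty"]

    problems = []
    if not lines[0].startswith("LOCUS"):
        problems.append("first record does not start with a LOCUS line")
    if lines[-1] != "//":
        problems.append("last record is not terminated by '//'")

    # Every record should contribute exactly one LOCUS line and one '//' line.
    locus_count = sum(1 for line in lines if line.startswith("LOCUS"))
    terminator_count = sum(1 for line in lines if line == "//")
    if locus_count != terminator_count:
        problems.append(
            f"{locus_count} LOCUS line(s) but {terminator_count} '//' terminator(s)"
        )
    return problems
```

A file for which this returns an empty list can still be rejected by the pipeline, so the changelog and error output of the run remain the authoritative source.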
b) Include genomes by accession numbers or from interactive browser
GenBank files can also be included in the CloVR-Comparative analysis by specifying GenBank accession numbers or by selecting genomes from the interactive browser of RefSeq genomes.
Go to the CloVR-Comparative configuration screen by clicking on the CloVR-Comparative icon on the CloVR Dashboard.
Select previously added data for inclusion into the CloVR-Comparative analysis by selecting “Input Genbank Tags”.
And/or select genomes from the interactive, searchable genome browser.
And/or specify GenBank accession numbers of genomes to be included in the analysis.
3) Configuring and starting a pipeline
- If cloud credentials have been added to the CloVR VM, they can be selected in the “Account” drop-down menu. For CloVR-Comparative runs on DIAG choose “local”.
- Provide an “Output prefix”, which will be used as the prefix for all output files.
- Specify a unique value for “Name of Sybil site”, “Username” (lowercase) and “Password” for the Sybil website. The username and password will be used to log in to the Sybil website to visualize the data.
- Provide a “Pipeline Description” to recognize your pipeline on the Dashboard “Home” page, e.g. “CloVR-Comp_test”.
- Check your input by clicking “Validate”.
- If the validation is successful, start the pipeline by clicking “Run”.
After successful pipeline submission, the web interface will change to the “Home” page, where the new CloVR-Comparative pipeline will be listed as “Status: running”.
4) Monitoring your pipeline
The pipeline status, i.e. the number of steps completed, is shown for the running pipeline. Further information can be accessed by clicking on the pipeline name, which opens the “Pipeline Information” window. Clicking on the [Pipeline #] headers in the “Pipeline Information” window will open the Ergatis “Workflow creation and monitoring interface” in a separate browser window, which provides useful information for troubleshooting failed pipeline runs. Each protocol consists of an outer wrapper pipeline, which always runs locally, and an inner pipeline (shown in parentheses in the “Pipeline Information” window), which runs locally or on the cloud depending on the pipeline configuration.
5) Downloading Pipeline Output
Once the CloVR-Comparative run has completed, multiple result files are created in the “Outputs” tab of the “Pipeline Information” window.
All result files are created as compressed archives (.tar.gz), which can be extracted using the Finder in Mac OS X, the tar utility in Unix, or programs such as WinZip or WinRAR in Windows.
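The archives can also be unpacked from a script, which is convenient when several outputs are downloaded at once. This is a minimal sketch using Python's standard `tarfile` module; the function name `extract_archive` is illustrative.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse members that would escape dest_dir.
            archive.extractall(dest, filter="data")
        else:
            # Older Python versions without extraction filters.
            archive.extractall(dest)
    # List the files now present under dest_dir, relative to it.
    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file())
```

Calling `extract_archive` on a downloaded archive with an empty destination directory returns the relative paths of the files it contained.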
Output
The output files are:
Output | Description |
---|---|
circleator_pdf | Circular representation of each input genome in a pdf file format |
circleator_png | Circular representation of each input genome in a png file format |
cluster_fasta | Protein sequences of all genes organized by clusters of syntenic orthologous genes |
mugsyalign_maf_tag | Reference-free whole genome multiple alignment of all input genomes in maf file format |
mugsy_mapped_cogformat | Clusters of syntenic orthologous genes based on mugsy alignment |
mugsy_mapped_features | Details of clusters of syntenic orthologous genes based on mugsy alignment with gene annotations |
snps_file | A file with SNP calls and their positions in all the input genomes |
summary_files | A summary of annotated genes and SNPs for each input genome |
summary_report | A summary of comparative genomic analysis results |
sybil_archive | Archived file for Sybil instance |
validation_changelog | Log file with all the changes made to the input GenBank files |
wga_tree | A phylogenetic tree of all input genomes in newick file format |
wga_tree_pdf | A phylogenetic tree of all input genomes in pdf file format |
wga_tree_svg | A phylogenetic tree of all input genomes in svg file format |
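As a quick sanity check on the wga_tree output, the genome labels in a Newick tree can be listed to confirm that every input genome is present. The sketch below uses a regular expression rather than a full Newick parser and assumes unquoted leaf names; the function name `newick_leaf_names` and the labels in the example are hypothetical.

```python
import re

# A leaf name directly follows "(" or "," and is followed by ":", "," or ")".
# Branch lengths follow ":" and internal node labels follow ")", so neither matches.
_LEAF = re.compile(r"(?<=[(,])\s*([^\s(),:;][^(),:;]*?)\s*(?=[:,)])")


def newick_leaf_names(newick):
    """Return the leaf (tip) labels of a Newick tree string, in order of appearance.

    Assumption: labels are unquoted and contain none of the characters ( ) , : ;
    """
    return [match.group(1) for match in _LEAF.finditer(newick)]


# Hypothetical three-genome tree with branch lengths.
example = "((genome_A:0.1,genome_B:0.2):0.05,genome_C:0.3);"
print(newick_leaf_names(example))
```

For trees with quoted labels or comments, a dedicated phylogenetics library is a safer choice than this pattern.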
To log in and access the interactive Sybil website for comparative genomic analyses, click on the drop-down button “sybil_website” on the left side of the CloVR Dashboard. Click on the link to open the Sybil home page.