Qemistree

QEMISTREE - a tool to represnt metabolomics data as trees to explore chemical diversity

Introduction: Qemistree is a computation tool to build a tree of mass-spectrometry (LC-MS/MS) features to perform which enables the use of phylogeny-based tools to study chemical composition of samples. The article is available in Nat. Chem. Biol. (2020). This documentation aims to provide a user-guide on how to run a Qemistree workflow on GNPS.

Please use this interim link to run Qemitree on your data after FBMN. This will be updated in the next GNPS release.

STEP 1: Collecting the right input files

Users can perform QEMISTREE analysis in GNPS after processing LC-MS/MS data through Feature-Based Molecular Networking or FBMN using MZmine2. The input files for Qemistree are:

1. Required: A SIRIUS MGF file generated from the MZmine workflow. See the FBMN with MZmine2.

2. Required: A quant table called qiime2_table.qza with all the identified features (an output of FBMN, and found within the folder qiime2_output)

3. Optional: A metadata file called qiime2_metadata.tsv (an output of FBMN, and also found in folder qiime2_output)

4. Optional: A library identification file (TSV) (an output of FBMN, and found in the folder clusterinfo_summary)

Follow the steps below to generate these files:

a. Follow the documentation for Feature-Based Molecular Networking using MZmine2 to generate an aligned feature list for your LC-MS/MS data and export the necessary MGF and CSV files for FBMN GNPS workflow.

b. While still in MZmine2, select your aligned feature list, then click on the tab for feature list methods and select Export/Import, followed by Export for SIRIUS.

c. Choose the Mass list that you used to generate your feature list, and specify a path and filename for your SIRIUS file. Click OK.

d. Go to GNPS server and run a FBMN job using the GNPS quant CSV, GNPS MGF (both generated in step a, above) and ReDU metadata file.

e. The Qemistree task utilizes the output files from FBMN. These can be downloaded by clicking on Download qiime2 Emperor qzv (qiime2_table.qza and qiime2_metadata.tsv from folder qiime2_output and clusterinfo_summary for the tsv file within). You can start a Qemistree task by clicking on Visualize with Qemistree link on your FBMN status page

Note Visualize with Qemistree option is available for FBMN release version >= 20. If your FBMN job was run using an older version, please re-run the job by clicking on Clone to Latest Version to follow the rest of the tutorial.

STEP 2: Running a Qemistree job

1. Go to the status page of your FBMN job:

alt text

Click on Visualize with Qemistree under the Advanced Views - Experimental Views to analyze your data using Qemistree. This method auto-populates the input fields Quantification Table, Metadata Table and Library Identifications from your corresponding FBMN job. Additionally, you need to upload your SIRIUS MGF file exported from Mzmine2 as describes in Step 1.

alt text

2. Make sure the correct files are included as described in the beginning:

Sirius MGF Spectrum file
qiime2_table for Quantification Table
qiime2_metadata for Metadata Table
clusterinfo_summary file for Library Identifications

3. Under the header: Advanced options select the following: 1. Instrument Type: orbitrap or qtof 2. Ionization Mode: positive, negative or auto

If you are signed in to the server the email address will auto-populate. If not, add your email address and click on submit. The runtime depends on the number of features in your dataset -- a typical dataset (few thousand features) can take a few hours. You will get an email once the job finishes and then you are ready to explore your molecular trees!

STEP 3: Analysing the results from a Qemistree job from the status page

Once the job is finished successfully, you will see the status page as below.

alt text

1. View Summary gives a list of the molecules, their structures and the chemical taxonomy assigned by ClassyFire taxonomic levels. It tabulates the annotations for all MS/MS spectra for which we could predict molecular fingerprint using Sirius and CSI:FingerID Dührkop et. al.

alt text

2. View Qemistree iTOL Tree to visualize the chemical tree based on similarity of molecular fingerprints predicted by Sirius and CSI:FingerID. Follow the click here button for Qemistree visualization in iTOL. Qemistree classifies the features/molecules based on ClassyFire chemical taxonomy into chemical kingdom, superclass, class, subclass, and direct_parent which can be visualized as tip labels in the tree.

Note. The tree visualized above is the default visualization which includes all the molecules that had structural annotations from either spectral library matches or CSI:FingerID structural predictions. This tree is decorated with clade color and tip labels based on ClassyFire class assignment. This visualization can be interactively modified using View Qemistree Dashboard link (described in Step 4 below).

Shown below is an example default tree.

alt text

We recommend you to download the tree and tree decoration files by clicking on the boxes: Qemistree, Labels, Colors, and Abundance. These provide the tree file of the features that have smiles (qemistree.tree), the label for each tip of the tree (labels.txt), the color of each clade (colors.txt), and the relative abundance of the feature in the metadata category chosen for the job (barplots.txt). This is because the auto-generated tree is only available for 30 days & therefore it is recommended that you download the associated files and upload it on iTOL using your login credential for permanent storage.

3. Under Advanced Views, click on View qiime2 Emperor Plots to visualize the Principal Coordinate Analysis result using weighted UniFrac distances based on the chemical relationships based on predicted molecular fingerprints.

4. Visualizing chemical trees using the View Qemistree dashboard. You can further explore and modify the Qemistree visualization interactively using the Qemistree dashboard. This dashboard is available as a link on your Qemistree job status page for direct access.

The example Qemistree task on the dashboard provides is based on a subset of a global foodomics dataset.

alt text

For your own data visualization, enter the following information on the dashboard:

Qemistree task ID: This is the unique ID of Qemistree job on GNPS. It can be found in your Jobs page in the Description column. Alternatively, click on View Qemistree Dashboard on the job status page, and this field will be auto-populated.
Column to filter qemistree features: Enter the feature metadata column to prune the tree. This can be a chemical taxonomy level or structural annotation type (all structures, MS2 structures, CSI:FingerID structures).
Column to label tree tips: Enter chemical taxonomic level that should be used to label each feature on the tree (tree tips).
You can include additional options for labelling features by MS2 library match or parent m/z values, when a feature cannot be assigned a chemical taxonomy by ClassyFire.
Choose metadata column by which to visualize the abundance of the features, and whether to normalize the abundance.
Click on Submit. You can quickly generate many Qemistree-iTOL visualizations by changing the values entered in step 2-5 to explore your data.
Click on Datasets tab in iTOL to visualize the relative abundance of each feature in the sample metadata category you chose in Step 5.

You can interactively modify the aethetics of this visualization (such as colors, fonts, sizes etc) interactively by using the control panel in iTOL.

5. You can download the .qzv file generated in the dashboard and drag and drop to https://view.qiime2.org/. Clicking on “View this tree using iTol” will build the tree with all the information contained in the Qemistree, Labels, Colors, and Abundance files. In this way, it will not be possible to trace the branches back to the fingerprints (and, consequently, to the Feature IDs) since the information you can get putting the cursor on the nodes is its branch length and the classification information.

If you want to trace the branches back to the feature IDs, you can download the Qemistree.tree file, and upload it in iTol directly in your account. In this way, you will get a tree without any other information added, just the fingerprint information for each node.

alt text

On the job status window, you can click on “view summary” and download this entire table to check which feature ID corresponds to the fingerprint you are interested in (seen on the tree).

alt text

You can also just drag and drop to iTol the Labels, Colors, and Abundance files to have all this information in the tree again.

STEP 4. Advanced analysis using QIIME2 For advanced users, we encourage you to download all the files generated from this analysis by clicking on Download Qiime2 data on the status page of the job and explore your data using tools available in QIIME2. Qemistree workflow formats all resulting files such that they are compatible with statistical and visualization tools in QIIME2 and we encourage users to leverage QIIME2 tools for data exploration.

Among the folders downloaded, the output folder contains qemistree.qza and qemistree-pruned-smiles.qza which are two tree files. The qemistree.qza file contains all the features for which molecular fingerprints could be predicted using Sirius+CSI:FingerID; qemistree-pruned-smiles.qza which contains the features that were annotated with molecular structures using spectral matching or CSI:FingerID. Moreover, output_folder also contains: 1. merged-feature-table.qza which has the abundances of features per sample and 2. classified-feature-data.qza which has feature metadata (parent mass, retention time, spectral library match, molecular structures and Classyfire chemical taxonomy for each feature).

You can use these files to perform additional analyses in the QIIME2 which can be installed on your computer using the documentation here. Some suggested QIIME2 analytical tools are linked below: 1. Generate heatmap of features per sample 2. Alpha-diversity using Faith's PD for within sample chemical diversity 3. Beta-diversity using UniFrac for between sample chemical comparison 4. Songbird analysis for identifying differentially abundant molecules between sample groupings 5. mmvec analyses for microbe-metabolite interactions

Additionally, we encourage you to explore the plethora of tools available here to guide further analyses of your metabolomics data.

For detailed information on all the steps performed during Qemistree processing, please check out our source code and documentation here

CITATIONS:

Tripathi, A., Vázquez-Baeza, Y., Gauglitz, J.M. et al. Chemically informed analyses of metabolomics mass spectrometry data with Qemistree. Nat Chem Biol (2020)

Dührkop, K., Shen, H., Meusel, M., Rousu, J. & Böcker, S. Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proc. Natl. Acad. Sci. U. S. A. 112, 12580–5 (2015)

Djoumbou Feunang, Y. et al. ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J. Cheminform. 8, 61 (2016)

Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Research vol. 47 W256–W259 (2019)

Page Contributors¶

Ming Wang