Feature-Based Molecular Networking (FBMN)¶
[NEW !!!] The FBMN allows to generate molecular networks from MSE and Ion Mobility Spectrometry (IMS) data with MS-DIAL, or Progenesis QI, MetaboScape.
Feature-Based Molecular Networking (FBMN) is a computational method that bridges popular mass spectrometry data processing tools for LC-MS/MS and molecular networking analysis on GNPS. The supported tools are: MZmine, OpenMS, MS-DIAL, MetaboScape, XCMS, Progenesis QI, and the mzTab-M format.
The main documentation for Feature-Based Molecular Networking is provided below. See our article and the preprint.
The Feature-Based Molecular Networking (FBMN) workflow is available on GNPS via:
The Superquick Start page for FBMN. This page enables rapid submission of jobs with limited options.
The standard GNPS interface page for FBMN (you need to be logged in GNPS first).
For programmatic access to FBMN, contact Mingxun Wang at firstname.lastname@example.org.
This work builds on the efforts of our many colleagues, please make sure to cite the papers for their processing tools and the following GNPS papers
Nothias, L.-F., Petras, D., Schmid, R. et al. Feature-based molecular networking in the GNPS analysis environment. Nat. Methods 17, 905–908 (2020).
Wang, M. et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat. Biotechnol. 34, 828–837 (2016).
The citations from the mass spectrometry processing tools you used [MZmine2, OpenMS, MS-DIAL, MetaboScape,XCMS, and mzTab-M format.
Mass Spectrometry Data Processing for the FBMN¶
In brief, popular mass spectrometry processing programs have been adapted to export two files (feature quantification table and MS/MS spectral summary) files that can be used with the Feature Based Molecular Networking (FBMN) workflow on GNPS. Alternatively, the FBMN supports the mzTab-M format that can be inputted along witht the mzML file(s). The tools supported and their main features are presented in the table below along with a step-by-step documentation to use in FBMN on GNPS:
|Processing tool||Doc.||Data supported||Interface||Platform||Code||Target user|
|MZmine||See doc.||Non-targeted LC-MS/MS||Graphical UI||Any||Open source||Mass spectrometrists|
|MS-DIAL||See doc.||Non-targeted LC-MS/MS, MSE, Ion Mobility||Graphical UI||Windows||Open source||Mass spectrometrists|
|OpenMS||See doc.||Non-targeted LC-MS/MS||Commandline||Any||Open source||Bioinformaticians and developers|
|XCMS||See doc.||Non-targeted LC-MS/MS||Commandline||Any||Open source||Bioinformaticians and developers|
|MetaboScape||See doc.||Non-targeted LC-MS/MS, Ion Mobility||Graphical UI||Windows||Proprietary code||Mass spectrometrists|
|Progenesis QI||See doc.||Non-targeted LC-MS/MS, MSE, Ion Mobility||Graphical UI||Windows||Proprietary code||Mass spectrometrists|
|mzTab-M||See doc.||Non-targeted LC-MS/MS||Standardized format||Multi-systems||Open source||All public|
IMPORTANT: The software used for the LC-MS/MS data processing has to be configured and utilized as recommended by its documentation.
The FBMN Workflow in GNPS¶
There is a dedicated Feature-Based Molecular Networking workflow on GNPS that can be accessed here (you need to be logged in GNPS first).
Requirement for the FBMN workflow¶
After processing your LC-MS/MS data with the prefered software, it is possible to export the results following two methods:
Three type of input files are needed (test files for each software are accessible here):
Option A (RECOMMENDED) - Export the processing results using a feature table and an .MGF file:
- A feature table with the intensities of LC-MS ion features (TXT or CSV format).
- A MS/MS spectral summary file with a list of MS/MS spectra associated with the LC-MS ion features (.MGF File). (.MGF file format).
- [Optional] Metadata table - format described here
- [Optional] Original mzML Files - These are the original files used for feature finding - described here
- [Optional] "Supplementary Pairs" of additional edges - described here
Option B - Export the processing results using an mzTab-M and mzML files:
Export a single mzTab-M file from the processed data. See and cite this publication.
Use the mzML file(s) associated with mzTab-M file.
[Optional] Metadata table - format described here
SuperQuick Feature Based Molecular Networking Workflow¶
A simplified interace for Super Quick web interace for FBMN is available here.
Running the SuperQuick FBMN¶
- Indicate your email and your GNPS Credentials.
- Select the 'Feature Generation tool'.
- Select the parameters preset.
- Option A. Drag and drop your "feature quantification table" and "MS/MS spectral file" (.MGF). See the respective documentation for FBMN each tool.
- Optional. Drag and drop a metadata table.
- Optional. Drag and drop a "Supplementary Pairs" csv file (see format) with additional edges
- Click on "Analyze Uploaded Files with GNPS Molecular Networking".
While this SuperQuick FBMN interface is convenient for quick analysis, we recommend using the standard FBMN workflow. [IMPORTANT] The file uploaded along with the jobs submitted with Superquick start interface are deleted on monthly basis. Use the "standard" interface of the FBMN for persistant jobs and more options.
Overview of the standard interface¶
Select the file source¶
Molecular Networks Options¶
|Precursor Ion Mass Tolerance (PIMT)||Parameter used for MS-Cluster and spectral library search expressed in Daltons. This value influences the aforementioned clustering of nearly-identical MS/MS spectra via MS-Cluster. Note that the value of this parameters should be consistent with the capabilities of the mass spectrometer and the specific instrument method used to generated the MS/MS data. Recommended Values value is ± 0.02 Da for high-resolution instruments (q-TOF, q-Orbitrap) and ± 2.0 Da for low-resolution instruments (ion traps, QqQ).||0.02|
|Fragment Ion Mass Tolerance (FIMT)||Parameters used for MS-Cluster, molecular networking, and MS/MS spectral library searches. For every group of MS/MS spectra being considered for clustering (consensus spectrum creation), this value specifies how much fragment ions can be shifted from their expected m/z values. Recommended Values value is ± 0.02 Da for high-resolution instruments (q-TOF, q-Orbitrap) and ± 0.5 Da for low-resolution instruments (ion traps, QqQ).||0.02|
Advanced Molecular Network Options¶
|Min Pairs Cos||Minimum cosine score for a pair of consensus MS/MS spectra in order for an edge to be formed in the molecular network||0.7||Lower value will increase the size of the clusters by inducing the clustering of less related MS/MS spectra, higher value will do the opposite.|
|Minimum Matched Fragment Ion (Min Matched Peaks)||Parameters used for molecular networking. The minimum number of common fragment ions that are shared by two separate consensus MS/MS spectra in order to be connected by an edge in the molecular network||6||A low value will permit linkages between spectra of molecules with few similar fragment ions, but it will result in many more less-related spectra being connected to the network. An higher value will do the opposite. Default value is 6, but note that this parameters should be adjusted depending on the experimental conditions for mass spectra acquisition (such as mode of ionisation, fragmentation conditions, the mobile phase, etc.), and the collision-induced fragmentation settings of the molecules of interest within the samples. High molecular weight (MW) compounds and compounds with more hetero-atoms generally tend to produce more fragment ions. However, this rule cannot is not absolute. For example, some lipids with high MW generate only a few fragment ions.|
|Maximum shift between precursors||The maximum m/z difference between two precursors permitting their spectra to be considered as direct neighbors in a molecular network||500||The maximum mass difference between two connected nodes in a molecular network.|
|Network TopK||Maximum number of neighbor nodes for one single node||10||The edge between two nodes are kept only if both nodes are within each other's ‘TopK’ most similar nodes. For example, if this value is set at 20, then a single node may be connected to up to 20 other nodes. Keeping this value low makes very large networks (many nodes) much easier to visualize.|
|Maximum Connected Component Size||Maximum number of nodes allowed in a single connected network||100||Nodes within a single connected molecular network will be separated by increasing cosine threshold for that specific connected molecular network. Default value is 100. Use 0 to allow an unlimited number of nodes in a single network. Note that with large datasets, or when a great number of related molecules are in the dataset, this value should be higher (or turn to 0) to retain as much information as possible. Downstream, these larger networks can be visualized using Cytoscape layout algorithms that can increase the intra-network clustering, allowing to visualize spectral groups in the network despite the number of nodes in the network.|
Advanced Spectral Library Search Options¶
|Library Search Min Matched Peaks||Minimum number of common fragment ions that MS/MS spectra should contain in order to be considered for spectral library annotation. Default value is 6, but note that this parameters should be tuned depending on the molecules of interest, and the experimental conditions (such as the ionisation mode and the fragmentation conditions). For example, the collision-induced fragmentation of some lipids produce only a few fragment ions. A lower value will allow clustering of MS/MS spectra containing less fragment ions, however it will also induce clustering of MS/MS spectra from different molecular types to be connected in one network. A higher value will do the opposite||6|
|Score Threshold||Minimum cosine score that MS/MS spectra should get in spectral matching with MS/MS spectral libraries in order to be considered an annotation.||0.7|
|Search Analogs||Whether to search data for analogs to library spectra||Don't Search|
|Maximum Analog Search Mass Difference||Maximum mass shift between the library and putative analogs found||100 (Da)|
|Top results to report per query||Number of matches to report for each feature||1|
Advanced Filtering Options (for Spectra)¶
|Minimum Peak Intensity||All fragment ions in the MS/MS spectra below this raw intensity value will be deleted. By default, the filtering is disabled.||0|
|Filter Precursor Window||All peaks in a +/- 17 Da around precursor ion mass are deleted. Enabled by default. This removes the residual precursor ion, which is frequently observed in MS/MS spectra acquired on qTOFs.||Filter|
|Filter library||Apply peak filters to library||Filter|
|Filter peaks in 50Da Window||Filter out peaks that are not in the top 6 most intense peaks in a +/- 50Da window||Filter|
Advanced Quantification Options¶
There are additional normalization options specifically for the FBMN workflow:
|Normalization Per File||Total Ion Current (TIC) normalization can be applied to the ion intensities (LC-MS1 peak area) per sample (NOT RECOMMENDED AS DEFAULT)||No Norm|
|Aggregation Method for Peak Abundances Per Group||The ion feature intensity (LC-MS1 peak area) can be aggregated by GROUPS from the metadatable with either a Sum or Average (RECOMMENDED, because it is more robust to the difference in the number of samples per GROUPS).||Average|
Advanced Univariate Statistics Options¶
These options will enabling the calculation of statistics as well as draw box plots for exploring.
|Metadata Column to Compare||This is the column name in your metadata (e.g. ATTRIBUTE_condition)||Empty|
|Metadata Field to Compare||This is the first field in the metadata data column to compare (e.g. CONTROL)||Empty|
|Metadata Field to Compare||This is the second field in the metadata data column to compare (e.g. CASE)||Empty|
|Metadata Column to Facet||This is another column where you can facet based on the terms (e.g. ATTRIBUTE_media)||Empty|
"Supplementary Pairs" is an option to add extra edges to the resulting FBMN. It was initially implemented for the Ion Identity Networking (IIN) workflow. The IIN supports currently MZmine, XCMS-CAMERA, and MS-DIAL. However, this approach is designed to stimulate the development and testing of new workflows as the input is an edge file in a generic CSV format. An edge is described by the following table:
|ID1||Node ID 1 matching the row IDs|
|ID2||Node ID 2 matching the row IDs|
|EdgeType||Any string describing the type of edge|
|Score||A numerical value for the score (cannot be empty)|
|Annotation||A string annotation|
Note that if Supplementary Pairs from other software are used, it is mandatory that the as the LC-MS feature identifier (ID) matches the "SCANS=" number in the MGF file.
Example of the Supplementary Pairs used in the IIN workflow.
mzML Files Used for Feature Finding¶
The original mzML/mzXML files that were used for feature finding can also be included. This provides a way for the workflow to link back to the original files and help people explore the data interactively.
Inspecting the Results of FBMN on GNPS¶
After the completion of the FBMN job (this will take from 10 to 10 hours depending on your number of samples and instrument), you will receive an email notification with a link to the results page (see example below).
Spectral Library Match and Network Topology Analysis¶
For more information about the inspection of the molecular networking results, please refer to the main documentation page.
Web-browser Molecular Network Visualization¶
Here is an example of web-browser view of molecular networks. Click on the link to view the interactive molecular networks.
Inspecting the Results of FBMN in Cytoscape¶
Cytoscape is an open source software platform used to visualize, analyze and annotate molecular networks from GNPS. See the documentation here.
Demo GNPS job of FBMN¶
Here is an example FBMN job with files resulting from MZmine2 processing of a subset of the [American Gut Project] (http://humanfoodproject.com/americangut/).
Running the DEREPLICATOR¶
The Insilico Peptidic Natural Products Dereplicator is a bioinformatic tool that allows the annotation of known peptidic natural products in MS/MS data using in silico fragmentation tree. This workflow is also included into the Feature Based Molecular Network workflow, then you have the option to use it by clicking into Advanced External tools. After your job has completed you can explore your results and even clone the Dereplicator job and modify the parameters.
If you use that tool, please cite the DEREPLICATOR papers. See citations in the main DEREPLICATOR documentation.
Running Network Annotation Propagation¶
It is possible to use the results of FBMN to run Network Annotation Propagation (NAP). NAP uses spectral networks to propagate information from spectral library matching, in order to improve in silico fragmentation candidate structure ranking. See the following documentation for NAP
If you use that tool, please cite the NAP paper. See citations in the main NAP documentation.
Running MS2LDA Substructure Discovery¶
The results of FBMN can be directly analyzed with MS2LDA. For this, on the result page click on "Advanced View" > "Analyze with MS2LDA". See the MS2LDA documentation.
MS2LDA is a tool that decomposes molecular fragmentation data derived from large metabolomics experiments into annotated Mass2Motifs or discovers Mass2Motifs from experimental data. Mass2Motifs are fragmentation patterns of often co-occurring mass fragment peaks and/or neutral losses that often represent molecular substructures. Check out the MS2LDA website here where you can find more information, browse through data sets, and sign up for an account to run the Mass2Motif discovery on your own data. At GNPS, we are working with the MS2LDA team to integrate both workflows which allows users to map Mass2Motif occurrences in their Molecular Families.
If you use that tool, please cite MS2LDA papers. See citations in the main MS2LDA documentation.
Running MolNetEnhancer with FBMN¶
MolNetEnhancer is a workflow that enables to combine the outputs from molecular networking, MS2LDA, in silico structure annotation tools (such as Network Annotation Propagation or DEREPLICATOR) and the automated chemical classification through ClassyFire to provide a more comprehensive chemical overview of metabolomics data whilst at the same time illuminating structural details for each fragmentation spectrum. For more information refer to the main MolNetEnhancer publication.
See the main documentation of MolNetEnh for informations about using it for FBMN.
Viewing the PCoA plot with EMPeror in Qiime2¶
EMPeror, is an open source and web browser enabled tool that allows researchers to perform rapid exploratory investigations of 3D visualizations of data. To view the PCoA plot (using Bray-Curtis dissimilarity metrics) with the EMPeror Qiime2 plugin, click on "View qiime2 Emperor Plots".
Citation for EMPeror: Yoshiki Vázquez-Baeza, Meg Pirrung, Antonio Gonzalez, and Rob Knight. Gigascience, 2(1):16, 2013. doi:10.1186/2047-217X-2-16.
Citation for Qiime2: Bolyen, E. et al. QIIME 2: Reproducible, interactive, scalable, and extensible microbiome data science. (PeerJ Preprints, 2018). doi:10.7287/peerj.preprints.27295v2
Video Tutorial - Analyze FBMN in GNPS¶
See our tutorial on using MZmine2 for FBMN analysis of a cohort from the American Gut Project, and our tutorial on running a FBMN analysis on GNPS.
The FBMN source code can be found on the GNPS_Workflows GitHub repository.
Requirements for the input files for each processing tool are described at https://github.com/CCMS-UCSD/GNPS_Workflows/tree/master/feature-based-molecular-networking and representative input files are provided at https://github.com/CCMS-UCSD/GNPS_Workflows/tree/master/feature-based-molecular-networking/test/reference_input_file_for_formatter.
The "formatter" scripts that convert input files of the supported softwares are accessible in the script/ folder.
Input files requirements¶
The FBMN workflow accepts as input files either:
- A MS/MS spectral summary and a feature quantification table.
- or an mzTab-M file and the associated mzML files.
MS/MS spectral summary and feature quantification table¶
Applicable for MZmine, OpenMS, MS-DIAL, XCMS, MetaboScape, and Progenesis QI.
MS/MS spectral summary¶
The MS/MS spectral summary contains a list of representative spectra in the Mascot Generic Format (MGF file). An MGF file is a plain text file (ASCII) containing peak list information and spectra parameters (more information at http://www.matrixscience.com/help/data_file_help.html. Note that for Progenesis QI, instead of an MGF file, an MSP file (NIST spectral library format) is used instead.
Feature quantification table¶
The feature quantification table (TXT or CSV file) is specific for each supported processing tool and processed internally by GNPS. Yet, it needs to contain a Feature IDentifier (integer) for each LC-MS1 feature that must be unique and match the "SCANS=" header of the corresponding spectrum in the MS/MS spectral summary (MGF file). Note that the number of LC-MS1 features in the feature quantification table can be larger than the number of LC-MS1 features with a spectrum in the MS/MS spectral file, and the Feature IDentifier does not have to be sequencial. As a result, the feature quantification table can contain LC-MS1 feature that does not have an associated MS/MS scan in the MS/MS spectral file. The PCoA generated with qiime2 EMPeror in FBMN workflow uses the entire content of feature quantification table provided.
For all the processing tools supported, the mapping between the "Feature ID" of the feature quantification table should be consistent with the "SCAN" header of the representative spectrum in the MS/MS spectral summary. See this page for example files. Note that internally, the feature quantification table file inputted by the user are converted to a standard internal format prior to FBMN analysis in GNPS. The python scripts used for the conversion feature quantification Table from various software are available here. If you want to add support for another LC-MS processing tool, please contact us.
Support for mz-Tab-M format¶
The mzTab-M format is a standardized output designed for the report of metabolomics MS-data processing results. The FBMN workflow now supports the mzTab-M format (2.0, release 1.0.5). The mzTab-M file has to be inputed along with the related mzML files.
Basically, the mzTab-M file is used to retrieve for each LC-MS/MS feature:
- The abundance of the LC-MS/MS feature in each sample.
- The filename (mzML) and the index of the most intense associated MS/MS spectrum.
Note that currently, the mzTab-M has been tested only with XCMS. More informations in the FBMN with mzTab-M documentation.
Programmatic Access to FBMN¶
For programmatic access to the FBMN workflow, contact Mingxun Wang at email@example.com.
Join the GNPS Community !¶
- For feature request, or to report bugs, please open an "Issue" on the CCMS-UCSD/GNPS_Workflows GitHub repository.
- To contribute to the GNPS documentation, please use GitHub by forking the CCMS-UCSD/GNPSDocumentation repository, and make a "Pull Request" with the changes.