Skip to content

MolDiscovery

molDiscovery - a probabilistic model for in silico database search of MS/MS

Introduction

molDiscovery is a probabilistic model for in silico database search of small molecule MS/MS. It improves DEREPLICATOR+ by

  1. using an efficient fragmentation algorithm
  2. using log likelihood ratio as metabolite spectrum matching score
  3. an efficient implementation for indexing

As a result, molDiscovery improves both efficiency and accuracy of the database search.

molDiscovery model and software are developed in collaboration of Saint Petersburg State University (Russia) and Carnegie Mellon University (PA, USA).

Manuscripts and Citations

Liu Cao et al. "MolDiscovery: Learning Mass Spectrometry Fragmentation of Small Molecules." bioRxiv.

Development

The GitHub repository provides latest source code and release.

Accessing molDiscovery workflow on GNPS

The molDiscovery are currently available on GNPS as a workflow.

img

STEP 1: Go to GNPS and login

Go to http://gnps.ucsd.edu, and create/login to your account.

img

STEP 2: Access molDiscovery

Derectly go to the workflow page.

img

STEP 3: Select input spectra

Select/Import a mass spectrometry file: The input format accepted are mzML/mzXML/MGF. Note that have the option to import an existing GNPS dataset (by clicking on "Share Files") or upload your own data (by clicking on "Upload Files"). See the corresponding GNPS documentation for FTP upload.

After getting done with selecting files, click on Finish Selection.

img

img

img

STEP 4: Set up parameters

Set up the parameters for the molDiscovery job: Select a title for your job, and adjust parameters depending on your data and the mass spectrometer used.

img

BASIC OPTIONS:

-- Precursor Ion Mass Tolerance: This value specifies how much fragment ions can be shifted from their expected m/z values. Default value is ± 0.01 Da.

-- Fragment ion Mass Tolerance: This value specifies how much fragment ions can be shifted from their expected m/z values. Default value is ± 0.01 Da.

ADVANCED OPTIONS:

-- Predifined database: the default small molecule structure database is AllDB, which contains 720K compounds.

-- Custom DB file: if a custom database is provided (via File selection or URL), the predefined DB choice is ignored.

-- Max charge: Maximum charge allowed. The default is 2.

-- Fragmentation Model: the default model is 2-1-3, which indicates there will be at most two bridges, one 2-cut, and three cuts in total.

-- Min significant score: the minimum score to report a significant metabolite spectrum match. The default value is 10.

STEP 5: Submit job

Add an email for notification. Then click on submit.

You will get an email with the link to the results when your job is finished. You can also check the status of your job in "Jobs" section.

img

STEP 6: View results

See the molDiscovery job status: The status of the job can be vizualized in your account job list, and through the link in provided in email you will received when the job is done.

The job can be Cloned, for further parameters modifications. Summary Statistics and Workflow Parameters can be consulted on the respective links.

Click on View Unique Metabolites (recommended), to have get the list of annotated molecules.

Click on the View All MSM (Metabolites-Spectrum Match) to have detailed view of the Metabolites-spectrum matches.

Explore the molDiscovery annotations: Annotations are sorted by molDiscovery score (default), mass, compound name, etc. These fields can also be searched with the filter function.

img

Feedback and Bug Reports

We will be thankful if you help us make molDiscovery better by sending your comments, bug reports, and suggestions to npdtools.support@cab.spbu.ru or posting them on our GitHub repository tracker.

Contribute to the Documentation

Page Contributions

liucaomics


Last update: April 27, 2021 06:23:56