GNPS MS/MS Spectral Libraries List
Pardon our dust: this is a work in progress!
Machine Learning Ready Datasets
If you plan to use the GNPS libraries for machine learning, the data need cleanup first. Because the libraries combine many community contributions with spectra imported and aggregated from many other sources, the data contain some inconsistencies. We have taken the time to clean the libraries up for machine learning; you can download the result in the "Preprocessed Data" section here.
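As a rough illustration of what per-spectrum cleanup can involve, here is a minimal Python sketch. It is not the GNPS preprocessing pipeline behind the "Preprocessed Data" download; the `Spectrum` class, the `clean_spectrum` function and every threshold (at least 5 peaks, 1% of the base peak, fragments up to 1.0 m/z above the precursor) are hypothetical choices made only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


@dataclass
class Spectrum:
    """Hypothetical in-memory representation of one library spectrum."""
    precursor_mz: Optional[float]
    peaks: List[Peak]
    metadata: Dict[str, str] = field(default_factory=dict)


def clean_spectrum(
    spec: Spectrum,
    min_peaks: int = 5,               # assumed threshold, tune for your task
    min_rel_intensity: float = 0.01,  # assumed: drop peaks below 1% of the base peak
    precursor_window: float = 1.0,    # assumed: keep fragments up to precursor m/z + 1.0
) -> Optional[Spectrum]:
    """Return a cleaned copy of `spec`, or None if it should be discarded."""
    # Entries without a usable precursor m/z cannot be compared reliably.
    if spec.precursor_mz is None or spec.precursor_mz <= 0:
        return None
    # Drop non-positive values and fragments heavier than the precursor window.
    peaks = [
        (mz, inten)
        for mz, inten in spec.peaks
        if mz > 0 and inten > 0 and mz <= spec.precursor_mz + precursor_window
    ]
    if not peaks:
        return None
    # Scale to the base peak so spectra from different instruments are comparable,
    # then drop low-intensity peaks and sort by m/z.
    base = max(inten for _, inten in peaks)
    peaks = sorted(
        (mz, inten / base) for mz, inten in peaks if inten / base >= min_rel_intensity
    )
    # Spectra with too few remaining peaks carry little information for training.
    if len(peaks) < min_peaks:
        return None
    return Spectrum(spec.precursor_mz, peaks, dict(spec.metadata))
```

Filtering after base-peak normalization keeps the relative-intensity threshold meaningful across instruments with very different absolute intensity scales. For production work, the open-source matchms Python package offers comparable, well-tested spectrum filters.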
Library List
The table below lists all spectral libraries at GNPS.
To download the spectra, go here.
View | Title | Description |
---|---|---|
View | All Public Spectra at GNPS | This contains all spectra publicly available for search at GNPS. You can also interactively explore fragmentation with our new visualizations. |
View | GNPS Community Library | The GNPS library contains compounds from user contributions. |
View | FDA Library Pt 1 | Part 1 of the Selleckchem approved drug library, run by Sirenas MD. |
View | FDA Library Pt 2 | This set of reference compounds, generated by the Dorrestein Lab, contains 535 FDA natural product compounds and complements Part 1. |
View | PhytoChemical Library | 140 compounds from the Prestwick Phytochemical Library generated by the Dorrestein Lab. |
View | NIH Clinical Collection 1 | 327 compounds from the NIH Clinical Collection 1 generated by the Dorrestein Lab. Further information about the collection can be found here. |
View | NIH Clinical Collection 2 | 164 compounds from the NIH Clinical Collection 2 generated by the Dorrestein Lab. Further information about the collection can be found here. |
View | NIH Natural Products Library Round 1 | 1256 compounds from the NIH Natural Products Library generated by the Dorrestein Lab. Further information about the collection can be found here. |
Positive Negative | NIH Natural Products Library Round 2 | The NIH Natural Products Library Round 2 was generated by the Dorrestein Lab (UC San Diego) from 4,545 compounds of the NIH "ACONN" Natural Products Library provided by Ajit Jadhav (NIH/NCATS). The 2018 release contains fragmentation spectra collected on a qTOF MS in positive (3,386) and negative (1,863) ionisation mode. The 2020 release adds MS/MS entries for positive ionisation mode on qTOF (806) and Q-Exactive (1,315) mass spectrometers, obtained using the Ion Identity Networking workflow. The spectral library was created by Louis-Felix Nothias, Mingxun Wang and Robin Schmid (UC San Diego). |
View | Pharmacologically Active Compounds in the NIH Small Molecule Repository | 1460 compounds from the Pharmacologically Active Compounds in the NIH Small Molecule Repository generated by the Dorrestein Lab. |
View | Faulkner Legacy Library provided by Sirenas MD | 127 compounds from the Faulkner natural product legacy library. |
View | EMBL Metabolomics Core Facility (EMBL MCF) | Standards run by EMBL Metabolomics Core Facility (EMBL MCF) |
Positive Negative | Pesticides | Pesticide MS/MS spectral library (204 compounds) generated by the Dorrestein Lab and the CNRS for the 3D-Plant2Cells project, funded under the European Commission Horizon2020 program (MSCA project 704786). Standards were run on Q-Exactive and Maxis (qTOF) tandem mass spectrometers in both positive and negative ionisation mode. |
Positive Negative | Medicines for Malaria Venture Pathogen Box | 400 compounds analyzed by Sirenas LLC on an Agilent 6538 Accurate Mass QTOF LC/MS in positive and negative ionization mode. Funding was generously provided by the Bill and Melinda Gates Foundation. |
Positive Negative | LDB Lichen Database | MS/MS spectra of 241 small molecules isolated from lichens, from the chemical library of S. Huneck (Berlin Garden and Botanical Museum), provided by Harrie Sipman and Robert Lücking. The spectra were produced on three different LC-MS instruments: an Agilent 6530 qToF (Pierre Le Pogam, University of Paris-Sud, University of Paris-Saclay; Joel Boustie, University of Rennes 1), a Waters Xevo G2-XS qToF (David Rondeau, Thomas Delhaye, University of Rennes 1) and a Thermo Q-Exactive Orbitrap (Jean-Luc Wolfender, Pierre-Marie Allard, University of Geneva). The LDB currently contains more than a thousand spectra across these instruments, including spectra of diverse adducts in addition to [M+H]+ and [M-H]-. The most common lichen scaffolds are represented: depsidones, dibenzofurans, pulvinic acids, paraconic acids, chromones, quinones, xanthones and monoaromatic compounds. These spectra are not single-energy spectra; each is the consensus of spectra produced at different collision energies. For Agilent, these were 10, 25 and 40 eV in ESI- (plus 2.5, 5.0 and 7.5 eV for the more fragile depsides), and 5, 20 and 35 eV in ESI+ or APCI+ for terpenes. For Waters, the collision energies were 10, 20, 30 and 40 eV in ESI- and ESI+. The Thermo spectra were acquired using normalized collision energy (NCE) based on the measured m/z value at three energies: 15, 30 and 45 eV. |
View | GNPS Matches to NIST14 | 5,763 high-confidence matches to NIST14 MS/MS library spectra. |
View | GNPS Collections Miscellaneous | Miscellaneous reference compounds run specifically for GNPS-Collections |
View | GNPS Sigma's Mass Spectrometry Metabolite Library (MSMLS) | This library contains 863 MS/MS spectra of molecules in Sigma's Mass Spectrometry Metabolite Library. MSMLS is a collection of high-quality small biochemical molecules that span a broad range of primary metabolism. These standards consist of lipid-like molecules and water-soluble sugar compounds. The chemical standards were run on a Q-Exactive in positive and negative mode. |
View | PSU Sigma's Mass Spectrometry Metabolite Library (MSMLS) | This library contains 576 MS/MS spectra of small molecules from IROA's Mass Spectrometry Metabolite Library of Standards (MSMLS). These standard MS/MS spectra were generated on a SCIEX 5600 TripleTOF in positive and negative mode at the Metabolomics Facility of The Pennsylvania State University. |
View | GNPS Collections Bile Acid Library 2019 | Reference MS2 spectra for 21 commercial bile acids and 4 synthesized amino acid conjugated bile acids run in positive and negative mode, including spectra of observed adducts. |
View | Dereplicator Identified MS/MS Spectra | MS/MS spectra in GNPS public data identified automatically by the Dereplicator tool. The tool searches various compound databases, including MarinLit, and the library keeps the best-matching MS/MS spectra with significant p-values. |
Positive Negative | Pacific Northwest National Lab (PNNL) Lipids | MS/MS spectra of 1,790 lipids provided by Thomas Metz's group at the Pacific Northwest National Lab (PNNL), collected in positive and negative ionisation mode at multiple collision energies. Most compounds were annotated in real samples using the LIQUID tool. |
View | MIADB Spectral Library | MS/MS spectra of 422 monoterpene indole alkaloids provided by Mehdi Beniddir and coll. (Universite Paris-Saclay). Compounds were collected from the chemical libraries of Universite Paris-Saclay, Universite Paris-Cite, Institut de Chimie des Substances Naturelles, Universite d'Angers, Museum National d'Histoire Naturelle, Universite de Reims Champagne-Ardenne, Universite de Tours (France), Universite de Liege (Belgium), University of Malaya (Malaysia), Institute of Chemical Research of Catalonia (Spain), Leiden University (The Netherlands) and University of Lagos (Nigeria). The first 172 MS/MS spectra (uploaded in 2019) are averages of MS/MS spectra at three collision energies (30, 50 and 70 eV) acquired on an Agilent qTof 6530. The additional 250 MS/MS spectra were obtained at a collision energy of 50 eV on an Agilent qTof 6546. Both datasets were acquired in positive ion mode. |
View | HCE Cell Lysate Lipids Spectral Library | The spectra correspond to a lipid extract of human corneal epithelium (HCE) cells. These lipids belong to the phospholipid and sphingolipid classes. |
Positive Negative | Nutri-Metabolomics | This library was acquired on an Orbitrap LTQ-XL mass spectrometer. Full scans were acquired at a resolution of 30,000 and MS/MS spectra at a resolution of 7,500, with CID at 35 eV. Operating conditions for the Orbitrap and the LC columns are described in DOI: 10.1007/s11306-015-0935-z and DOI: 10.1002/mnfr.201901137. This library consists of analytical standards purchased from Sigma Aldrich for Fondazione Edmund Mach (Italy) for nutri-metabolomics research. The compounds include small phenolic acids, organic acids, fatty acids, carnitines, indoles, some aroma compounds and endogenous human metabolites. They were purchased for two purposes: (i) confirming some metabolites at identification level I and (ii) looking for matching substructures between biomarkers found in studies and the analytical standards. |
View | GNPS Sciex Library | The Sciex dataset contains authentic standards from several compound libraries as well as single reference standards. Specifically, the Agilent LC/MS Pesticide Comprehensive mix, Sigma Aldrich Bile Acid/Carnitine/Sterol Metabolite Library of Standards, Sigma Aldrich Fatty Acid Metabolite Library of Standards and Sigma Aldrich Acid Metabolite Library of Standards were used. Standards were dissolved in suitable solvents and combined into 43 mixtures designed to avoid overlap of isomeric and isobaric substances. The mixtures were analyzed using a Sciex Exion AD liquid chromatography system coupled to a Sciex X500R QTOF-MS. Separation was achieved on a Phenomenex Kinetex F5 column (150 mm × 2.1 mm ID, 2.6 µm particle size) with a gradient from eluent A (100 % H₂O + 0.1 % formic acid) to eluent B (100 % acetonitrile + 0.1 % formic acid), using the following A/B ratios: 100/0 at 0 min, 100/0 at 2.1 min, 5/95 at 14 min, 5/95 at 16 min, 100/0 at 16.1 min and 100/0 at 20 min. Column temperature was set to 30 °C and the flow rate to 200 µL/min. Data were acquired by data-dependent acquisition of MS/MS spectra using a collision energy ramp from 20 to 50 eV. The MS was automatically recalibrated every five injections in MS1 and MS/MS mode. |
View | Lipids of human conjunctival cells (IOBA-NHC) | This library contains 200 MS/MS spectra of lipids from human conjunctival cells (IOBA-NHC cell line). The lipids belong to the sphingolipid, glycerophospholipid and glycerolipid subclasses. This library was generated by Orlane Lectez, Romain Magny and Nicolas Auzeil from Université de Paris on a Waters Synapt G2 mass spectrometer operated in positive and negative ion mode in the Olivier Laprevote Lab. |
View | Berkeley Lab MS/MS spectral library | Berkeley Lab MS/MS spectral library of endogenous metabolites v0.1 (24,563 spectra) compiled and analyzed by the Northen Lab and DOE JGI with funding from the United States Department of Energy Office of Biological and Environmental Research under Contract No. DE-AC02-05CH11231 to Lawrence Berkeley National Laboratory. |
View | IQAMDB IsoQuinoline and Annonaceous Metabolites Database | MS/MS spectra of 220 isoquinoline alkaloids and some further annonaceous metabolites provided by Beniddir, Le Pogam and coll. (Universite Paris-Saclay, France). The spectra were generated using an Agilent 6546 qTof (Universite Paris-Saclay). Compounds were collected from the in-house chemical libraries of Universite Paris-Saclay (France), accumulated during decades of phytochemical investigations on annonaceous plants by Prof. M. Leboeuf and co-workers, and also from the University of Angers (France) and the Amazonas State University (Manaus, Amazonas, Brazil). The library covers the most common isoquinoline scaffolds: benzylisoquinolines, bisbenzylisoquinolines featuring diverse assembly modes, protoberberines, berberines, proaporphines, aporphines, morphinanes, promorphinanes, benzophenanthridines, protopines, cularines, spirobenzylisoquinolines, etc. |
View | Sam Sik Kang Legacy Library | MS/MS spectra of 233 phytochemicals (184 positive and 153 negative) from the legacy chemical library of Prof. Sam Sik Kang (Seoul National University). The spectral library was created by Sang Hee Shim (Seoul National University) and Kyo Bin Kang (Sookmyung Women's University). |
View | GNPS D2 Amino Lipid Library | This library contains lipoamino acids, a class of biosurfactants found in an endophytic marine species of the genus Pantoea. This spectral library was acquired and created by Giovanni Andrea Vitale. |
View | GNPS Drugs of Abuse Library | Drugs of Abuse Library |
View | GNPS IIMN Propogated | Spectral library entries extracted from experimental datasets by the IIMN workflow. Ion Identity Molecular Networking connects ions of the same molecule. Note: This library is by default not selected for search in GNPS and must be explicitly selected. |
View | GNPS Suspect List | Suspect list of propagated compounds (see the preprint). |
View | MassBank Spectral Library (3rd Party) | ESI positive-mode MS/MS spectra from MassBank. |
View | MassBank EU | MS/MS spectra from MassBank EU that are not included in MassBank JP. |
View | Massbank NA Spectral Library (3rd Party) | MS/MS spectra from MassBank NA that are not present in MassBank JP. |
View | ReSpect Library (3rd Party) | ESI positive-mode MS/MS spectra from ReSpect for Phytochemicals. |
View | HMDB Spectral Library (3rd Party) | MS/MS spectra from HMDB. |
View | CASMI Spectral Library (3rd Party) | MS/MS spectra from the CASMI challenges 2014 and 2016. CASMI (Critical Assessment of Small Molecule Identification) is an open contest on the identification of small molecules. |
View | Sumner Spectral Library (3rd Party) | The Bruker MetaboBASE Plant Library contains ~1300 spectra of ~300 plant metabolites, including flavonoids, phenolics, sapogenins and organic acids. This library was generated by Prof. Lloyd W. Sumner and Drs. Dennis Fine and Daniel Wherritt from the Plant Biology Division of the Noble Foundation, Ardmore, OK, USA. |
Positive Negative | Birmingham UHPLC MS Library | - |
View | ECG acyl amides C4-C24 Library | - |
View | ECG acyl esters C4-C24 Library | - |
View | Bile Acid Modifications Library | This library consists of 21,549 highest-intensity MS/MS spectra from the clustered spectra of public data on GNPS/MassIVE, recovered using MassQL queries designed for non-, mono-, di-, tri-, tetra- and penta-hydroxylated bile acids. Based on delta masses from unmodified bile acids, the library accounts for 5,576 modified bile acids. Molecular formulae for the delta masses are provided as predicted by two in silico tools, BUDDY and SIRIUS, along with putative explanations when available. The library also contains MS/MS spectra from synthetic standards of 38 amino acids and 28 polyamines conjugated to bile acids. End users who want to use this spectral library for structure annotation are encouraged to inspect each MS/MS spectrum carefully, ideally alongside the MS/MS spectrum of the conjugate proposed to be appended onto the bile acid (if available), to match reporter ions between the two spectra. Careful manual interpretation of the presence, absence and ratios of ions by bile acid MS/MS fragmentation experts, together with retention time inspection, may help distinguish regioisomers and stereoisomers. This spectral resource can provide unprecedented insights into the new biology of bile acids. Note: This library is not selected by default for search in GNPS and must be explicitly selected in the GNPS workflow from CCMS_SpectralLibraries/GNPS_Propogated_Libraries/GNPS-BILE-ACID-MODIFICATIONS |
View | LEAFBOT | The Library Enabling Annotation of Botanical Natural Products (LEAF Bot) contains MS/MS spectra of plant metabolites comprising a diverse range of chemical classes. Data were collected on high-resolution instruments in both positive and negative modes. Funding was provided in part by the National Institutes of Health through the Center for High Content Functional Annotation of Natural Products (HiFAN). We welcome additions to this library of MS/MS spectra for additional plant metabolites. Those wishing to contribute should contact Nadja Cech (nadja_cech@uncg.edu; University of North Carolina at Greensboro) or Joshua Kellogg (jjk6146@psu.edu; Pennsylvania State University). |
View | XANTHONES-DB | - |
View | CMMC-Library | This library contains MS/MS spectra of standards that are synthesized as part of the Collaborative Microbial Metabolite Center (CMMC). |
View | NEO-MSMS | NEO-MSMS, described in Elloumi et al. (https://doi.org/10.26434/chemrxiv-2023-561x5-v2), comprises three distinct libraries. The first library (NEO-MSMS) includes standards of oxylipins and non-enzymatic oxygenated metabolites of polyunsaturated fatty acids (NEO-PUFAs). The second library (NEO-MSMS_putative_lib) incorporates putative oxylipin and NEO-PUFA MS/MS spectra dereplicated from oxidation experiments. Lastly, the collection features a version of the Watrous et al. oxylipin MS/MS standards library (https://doi.org/10.1016/j.chembiol.2018.11.015) (NEO-MSMS_Watrous_lib). |
View | PHENOLICSDB | Our database includes flavonols, flavonoids, anthocyanins, dihydrochalcones, phenolic acids, organic acids and sugars. Fragmentation patterns for 73 authentic standards were collected on an LC-QTOF using electrospray ionization in positive and negative polarity at multiple collision energies (20, 40, 60 and 80 eV CE). The resulting 320 MS/MS spectra are presented in PhenolicsDB, which is publicly available in GNPS, can be downloaded in .mgf and .nist format, and can also be used at https://cooperstonelab.github.io/PhenolicsDB/. This library was created by Cristian Quiroz-Moreno (quirozmoreno.1@osu.edu) and Jessica Cooperstone (cooperstone.1@osu.edu). |
Positive Negative | MSNLIB | This spectral library contains four compound libraries: MCEBIO, a bioactive compound library (MedChemExpress); MCESAF, a 5k scaffold library (MedChemExpress); NIHNP, the NIH NPAC ACONN collection of natural products (NIH/NCATS); and OVATAPEP, an alpha-helix peptidomimetic library (OTAVAchemicals). Together they cover 16,391 unique compounds (14,008 in positive mode, 10,091 in negative mode). |
View | GNPS N-ACYL LIPIDS MASSQL Library | This library consists of 3,063 highest-intensity MS/MS spectra from the clustered spectra of public data on GNPS/MassIVE, recovered using MassQL queries designed for N-acyl lipids. It contains MS/MS spectra of 851 unique N-acyl lipids (46 different headgroups, fatty acid chain lengths from 2 to 30 carbons, and up to 4 unsaturations). Note: This library is not selected by default for search in GNPS and must be explicitly selected in the GNPS workflow from CCMS_SpectralLibraries/GNPS_Propagated_Libraries/GNPS-N-ACYL-LIPIDS-MASSQL |
View | ELIXDB | This library contains the MS/MS spectra of 534 lichen-derived compounds from the library of Jack Elix, housed at the Australian National Herbarium (Canberra, Australia). These include 399 unique metabolites that are not in the LDB Lichen Database. |
View | GNPS-DRUG-ANALOG | Drug analogs propagated from GNPS Drug Library. Drug analogs were compiled by mining the GNPS/MassIVE public repository based on MS/MS spectra alignment using repository-scale molecular networking and fastMASST with analog search, followed by rigorous post-filtering. Contact haz072@health.ucsd.edu for metadata. |
View | MCE-DRUG | Drugs in clinical phases 1-4 from the MedChemExpress Bioactive Compound Library. Samples were analyzed by flow injection to acquire MSn data on an Orbitrap ID-X instrument in positive ionization mode. Pseudo-MS2 spectra were generated by merging the full MSn tree into one spectrum per compound ion. |
View | CMMC Food Biomarkers Library | - |
View | Emerging Chemical Risks in Food Safety (ECRFS) Spectral library | This dataset contains MS2 spectral data for 102 compounds identified as emerging chemical risks in the food chain by the European Food Safety Authority (EFSA) or classified as persistent, mobile and toxic (PMT) compounds. The spectral data were collected at Wageningen Food Safety Research, part of Wageningen University and Research (The Netherlands), using an Orbitrap IQX instrument in positive ionization mode with a collision energy of 30. The pure standards were acquired as part of the project “Screening for emerging chemical risks in the food chain” (SCREENER). |
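
Several libraries in the table, for example PhenolicsDB, are distributed as .mgf (Mascot Generic Format) files. The sketch below reads such a file in plain Python, assuming the common layout of `BEGIN IONS`/`END IONS` blocks that hold `KEY=value` header lines followed by `m/z intensity` peak lines. `read_mgf` is a hypothetical helper, not a GNPS API, and the header keys present (for example `PEPMASS` or `NAME`) vary between libraries. The caffeine record in the demo uses illustrative values, not data taken from a GNPS library.

```python
import io
from typing import Dict, Iterator, List, Optional, TextIO, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def read_mgf(handle: TextIO) -> Iterator[Tuple[Dict[str, str], List[Peak]]]:
    """Yield (header fields, peak list) for each BEGIN IONS ... END IONS block."""
    params: Optional[Dict[str, str]] = None  # None means we are outside an ion block
    peaks: List[Peak] = []
    for raw in handle:
        line = raw.strip()
        # Skip blank lines and MGF comment lines.
        if not line or line[0] in "#;!/":
            continue
        upper = line.upper()
        if upper == "BEGIN IONS":
            params, peaks = {}, []
        elif upper == "END IONS":
            if params is not None:
                yield params, peaks
            params = None
        elif params is None:
            # Global parameters outside any block are ignored in this sketch.
            continue
        elif "=" in line and not line[0].isdigit():
            # Header line such as PEPMASS=195.0877; keys are upper-cased.
            key, _, value = line.partition("=")
            params[key.strip().upper()] = value.strip()
        else:
            # Peak line "m/z intensity [charge]"; extra columns are ignored.
            fields = line.split()
            intensity = float(fields[1]) if len(fields) > 1 else 0.0
            peaks.append((float(fields[0]), intensity))


if __name__ == "__main__":
    example = """BEGIN IONS
PEPMASS=195.0877
CHARGE=1+
NAME=caffeine
138.0662 100.0
110.0713 20.5
END IONS
"""
    for header, peak_list in read_mgf(io.StringIO(example)):
        print(header["NAME"], header["PEPMASS"], len(peak_list))
```

Yielding one block at a time keeps memory use low for large exports such as the all-library download. For production work, established Python parsers such as pyteomics or matchms handle more MGF edge cases.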
Last update: October 23, 2024 17:52:50