Dataset Creation/Sharing
MassIVE Datasets¶
MassIVE is an online repository for publicly available datasets. MassIVE gives researchers a place to access datasets that others have made available, often alongside a publication. The datasets remain alive long after publication. At GNPS, users can:
- Browse datasets - Explore all public GNPS datasets
- Download datasets - Download full dataset for offline processing/reanalysis
- Re-Analyze datasets - Online reanalysis with GNPS tools. Compare public data to your own
- Comment on datasets - Make public comments on public datasets to start a community discussion
- Subscribe to datasets - Subscribe to updates to datasets, e.g. updated data from the submitter or new identifications/analogs found by Continuous Identification
To enhance the analysis, datasets submitted to MassIVE through GNPS are periodically searched against the ever-growing annotated spectral libraries, and new putative identifications within those datasets are reported to subscribers. Beyond new identifications within a dataset, subscribers are also made aware of other datasets that exhibit chemical similarities to the subscribed dataset. This connects users, via their research interests, to similar datasets.
Submitting GNPS-MassIVE Datasets¶
Here is a full video tutorial showing how to convert and upload raw LC-MS/MS data, submit a MassIVE dataset, and finally make the dataset public.
Below is a step-by-step description of how to upload data into MassIVE specifically for GNPS.
For more detailed information about general MassIVE dataset submission see here.
At the GNPS splash screen, users can click the button to create a MassIVE dataset.
Log in with your GNPS login and click Submit Data.
GNPS Submission specifics¶
The title is what users primarily see and use to filter datasets. For GNPS datasets, a specific title format is required:
GNPS - <Title of Paper or Short Description>
The GNPS prefix is required for GNPS datasets. If it is not provided, the dataset is not shown to GNPS users.
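The title rule above can be checked before submission with a small helper. This is a minimal sketch, not part of GNPS or MassIVE: the function names are hypothetical, and the exact matching rule GNPS applies (case sensitivity, spacing around the hyphen) is an assumption based on the format shown above.

```python
# Prefix as shown in the required title format; exact matching rules are an assumption.
GNPS_PREFIX = "GNPS - "


def has_gnps_prefix(title: str) -> bool:
    """Return True if the title starts with the 'GNPS - ' prefix."""
    return title.strip().startswith(GNPS_PREFIX)


def format_gnps_title(title: str) -> str:
    """Return the title with the 'GNPS - ' prefix, adding it only if absent."""
    cleaned = title.strip()
    if not cleaned:
        raise ValueError("dataset title must not be empty")
    if cleaned.startswith(GNPS_PREFIX):
        return cleaned
    return GNPS_PREFIX + cleaned


if __name__ == "__main__":
    print(format_gnps_title("Molecular networking as a dereplication strategy"))
```

Running the helper on a draft title before pasting it into the submission form is a cheap way to avoid a dataset that is silently hidden from GNPS users.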
Example Recommended Description¶
Paper title: Molecular networking as a dereplication strategy.
Author List: Yang JY, Sanchez LM, Rath CM, Liu X, Boudreau PD, Bruns N, Glukhov E, Wodtke A, de Felicio R, Fenner A, Wong WR, Linington RG, Zhang L, Debonsi HM, Gerwick WH, Dorrestein PC.
Citation: J Nat Prod. 2013 Sep 27;76(9):1686-99. doi: 10.1021/np400413s
PubMedID: 24025162
Brief description of the data submitted: RAW Files used to generate Figure 4. Bacterial network with a cosine similarity score cutoff of 0.65. This network was generated from direct infusion of extracts or direct
Recommendations For Data Collection¶
MassIVE dataset creation offers several data collections. These were originally created to serve proteomics data, so we offer the following recommendations for metabolomics/natural products datasets.
Collection | Recommended Data | Extension Types |
---|---|---|
Raw Spectrum Files | Instrument vendor data | .d, .raw, .wiff, etc. |
Peak List Files | Open format raw data | .mzML, .mzXML |
Metadata | GNPS metadata describing your study | .tsv |
Supplementary Files | Any other files relevant to your data, such as Cytoscape files, feature finding batch files, feature finding quant files, or feature finding MS2 MGF files | .tsv, .mgf, .xml, etc. |
Info
All other collections are optional, and we generally recommend not providing them unless you specifically want to and understand their semantics. Their semantics are described here.
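When sorting many files before upload, the recommendations in the table can be expressed as a simple lookup. The sketch below is illustrative only: `suggest_collection` and its `is_metadata` flag are hypothetical names, the extension sets cover only the examples listed in the table (which ends in "etc."), and treating every other file as supplementary is an assumption.

```python
from pathlib import PurePosixPath

# Extensions copied from the table above; the table says "etc.", so these are not exhaustive.
RAW_EXTENSIONS = {".d", ".raw", ".wiff"}
PEAK_LIST_EXTENSIONS = {".mzml", ".mzxml"}


def suggest_collection(filename: str, is_metadata: bool = False) -> str:
    """Suggest a MassIVE collection name for a file, following the table above.

    `.tsv` appears under both Metadata and Supplementary Files, so the caller
    must say explicitly when a .tsv file is the GNPS metadata table.
    """
    suffix = PurePosixPath(filename).suffix.lower()
    if is_metadata and suffix == ".tsv":
        return "Metadata"
    if suffix in RAW_EXTENSIONS:
        return "Raw Spectrum Files"
    if suffix in PEAK_LIST_EXTENSIONS:
        return "Peak List Files"
    # Assumption: anything else relevant to the study goes to Supplementary Files.
    return "Supplementary Files"


if __name__ == "__main__":
    for name in ["sample01.raw", "sample01.mzML", "quant.tsv"]:
        print(name, "->", suggest_collection(name))
```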
Making Dataset Public¶
After submitting your dataset to MassIVE, you must explicitly make the dataset public by clicking the Make Public button.
Continuous Identification¶
A unique feature of GNPS is the continuous, automated reanalysis of public datasets. GNPS analyzes these datasets with molecular networking and library search to make new identifications as the public spectral libraries grow through community contributions.
Users may subscribe to datasets to receive email notifications of new identifications made to datasets of interest.
Browsing Datasets¶
To browse all public datasets, click the "Datasets" link at the top menu bar. This brings users to a list of all public GNPS-MassIVE datasets.
Downloading Dataset Contents¶
To download the contents of a dataset, you can reach the FTP URL of each dataset by clicking the FTP link.
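For scripted downloads, the FTP URL copied from that link can be opened with Python's standard `ftplib`. The following is a sketch under stated assumptions: the function names are hypothetical, the URL in the example is a placeholder rather than a real dataset, and anonymous login is assumed to be allowed for public datasets.

```python
from ftplib import FTP
from urllib.parse import urlparse


def split_ftp_url(ftp_url: str) -> tuple[str, str]:
    """Split an FTP URL into (host, path), rejecting non-FTP URLs."""
    parsed = urlparse(ftp_url)
    if parsed.scheme != "ftp" or not parsed.hostname:
        raise ValueError(f"not an FTP URL: {ftp_url!r}")
    return parsed.hostname, parsed.path or "/"


def list_dataset_files(ftp_url: str) -> list[str]:
    """List the entries at the top level of a dataset's FTP directory."""
    host, path = split_ftp_url(ftp_url)
    with FTP(host) as ftp:
        ftp.login()  # anonymous login; assumed to work for public datasets
        ftp.cwd(path)
        return ftp.nlst()


if __name__ == "__main__":
    # Placeholder URL: replace with the one copied from the dataset's FTP link.
    print(split_ftp_url("ftp://example.org/MSV000000000/"))
```

Only `split_ftp_url` runs without a network connection; `list_dataset_files` contacts the server and should be called with a real dataset URL.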
Reanalyze Datasets¶
Dataset data can be re-analyzed with the tools at GNPS. To import the dataset into your own workspace and select files, please refer to this documentation.
Social Networking with Datasets¶
Dataset Comments¶
Users may comment on a dataset by clicking the "Comment on Dataset" link.
A table is shown for browsing all comments on a dataset.
Finding Related Datasets¶
Users can find MassIVE datasets related to the current one. Currently, relatedness of datasets is determined by the number of identified compounds shared between the two.
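The shared-compound measure of relatedness can be illustrated with set intersection. This is a sketch of the idea only, not the GNPS implementation: the function name, the dataset labels, and the compound names are all hypothetical, and how GNPS breaks ties or filters results is not specified here.

```python
def rank_related_datasets(
    query_compounds: set[str],
    other_datasets: dict[str, set[str]],
) -> list[tuple[str, int]]:
    """Rank datasets by the number of identified compounds shared with the query.

    Datasets sharing no compounds are omitted. Ties are broken by dataset
    name, which is an arbitrary choice made for this sketch.
    """
    shared_counts = [
        (name, len(query_compounds & compounds))
        for name, compounds in other_datasets.items()
    ]
    related = [(name, count) for name, count in shared_counts if count > 0]
    return sorted(related, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical identifications, for illustration only.
    query = {"compound_1", "compound_2", "compound_3"}
    others = {
        "dataset_A": {"compound_1", "compound_2"},
        "dataset_B": {"compound_3"},
        "dataset_C": {"compound_9"},
    }
    print(rank_related_datasets(query, others))
```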