More documentation
RalfG committed Dec 8, 2023
1 parent 4818da9 commit ba5f0a0
Showing 13 changed files with 246 additions and 57 deletions.
98 changes: 62 additions & 36 deletions README.md
@@ -9,24 +9,12 @@ libraries, other formats that aim to describe fragment ions, and software tools
- Official mzPAF homepage: [psidev.info/mzPAF](https://psidev.info/mzPAF)
- mzPAF documentation: [mzpaf.readthedocs.io](https://mzpaf.readthedocs.io)


## Specification status

Updated: 2023-09-01

The specification has been resubmitted to the PSI Document Process and is undergoing final community review. Ratification to formally become a PSI standard is anticipated near the end of 2023.

Your comments and suggestions are still very much welcome. Please submit an issue at the repo to
provide your feedback and send an e-mail to the HUPO-PSI editor Sylvie Ricard-Blum
([email protected]).


## In short

- mzPAF is a single string of characters, case sensitive, without length limit
- Multiple possible explanations are separated with a comma
- Deltas of observed – theoretical *m/z* values are prefixed with a slash (`/`)
- Confidences can be provided for different annotations prefixed with an asterisk (`*`)
- Multiple possible explanations are comma-separated
- Deltas of observed – theoretical _m/z_ values are prefixed with a slash (`/`)
- Confidences of annotations are prefixed with an asterisk (`*`)
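
As a rough illustration of the separators listed above, the following Python sketch splits an mzPAF string into its comma-separated explanations and extracts the delta and confidence of each one. `split_explanations` is a hypothetical helper, not part of any mzPAF package, and it assumes that `,`, `/`, and `*` never occur inside braces or brackets. That assumption does not hold for every valid annotation (SMILES strings, for instance, may contain `/`), so a real parser should be preferred.

```python
def split_explanations(mzpaf_string):
    """Naively split an mzPAF string into ion, delta, and confidence parts.

    Hypothetical helper for illustration only: it ignores braces and
    brackets, so it is not a complete mzPAF parser.
    """
    explanations = []
    for part in mzpaf_string.split(","):
        # The confidence, if present, follows the asterisk.
        body, _, confidence = part.partition("*")
        # The observed - theoretical delta, if present, follows the slash.
        ion, _, delta = body.partition("/")
        explanations.append(
            {
                "ion": ion,
                "delta": delta or None,
                "confidence": float(confidence) if confidence else None,
            }
        )
    return explanations


for explanation in split_explanations("b2-H2O/3.2ppm*0.75,b4-H2O^2/3.2ppm*0.25"):
    print(explanation)
```

The delta is kept as text (for example `3.2ppm`) so that the unit is not lost.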

The basic format of each annotation is:

@@ -55,32 +43,70 @@ b2-H2O/3.2ppm*0.75,b4-H2O^2/3.2ppm*0.25
mzPAF supports:

- Annotations of multiple analytes: `1@y12/0.13,2@b9-NH3/0.23`
- Mass deltas in ppm instead of *m/z* unit: `y1/-1.4ppm`
- Mass deltas in ppm instead of _m/z_ unit: `y1/-1.4ppm`
- Confidence levels per annotation: `y1/-1.4ppm*0.75`
- Advanced ion notation: `[ion type](neutral loss)(isotope)(adduct type)(charge)`, e.g.: `y4-H2O+2i[M+H+Na]^2`:
  - Ion types:
    - Peptide ion series (a, b, c, x, y, z): `y4`
    - Unknown ions: `?`
    - Immonium ions: `IY`
    - Internal fragment ions: `m3:6`
    - Intact precursor ions: `p^2`
    - A set of reference ions: `r[TMT127N]`
    - Named compounds: `_{Urocanic Acid}`
    - Chemical formulas: `f{C16H22O}`
    - Smiles: `s{CN=C=O}[M+H]`
    - Embedded ProForma annotations: `0@b2{LC[Carbamidomethyl]}`
  - Neutral gains and losses: `y2+CO-H2O`
  - Isotopes: `y2+2i`
  - Adduct types: `y2[M+H]`
  - Charge states: `^2`
  - Ion types:
    - Peptide ion series (a, b, c, x, y, z): `y4`
    - Unknown ions: `?`
    - Immonium ions: `IY`
    - Internal fragment ions: `m3:6`
    - Intact precursor ions: `p^2`
    - A set of reference ions: `r[TMT127N]`
    - Named compounds: `_{Urocanic Acid}`
    - Chemical formulas: `f{C16H22O}`
    - Smiles: `s{CN=C=O}[M+H]`
    - Embedded ProForma annotations: `0@b2{LC[Carbamidomethyl]}`
  - Neutral gains and losses: `y2+CO-H2O`
  - Isotopes: `y2+2i`
  - Adduct types: `y2[M+H]`
  - Charge states: `^2`
- Multiple peaks per annotation: `&y7/-0.001` and `y7/0.000*0.95`
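
The advanced ion notation above can be sketched as a regular expression. The pattern below is a simplified assumption written for this example: it covers only peptide series ions (`a`, `b`, `c`, `x`, `y`, `z`) and is not the regular expression shipped with the specification. The names `SERIES_ION` and `parse_series_ion` are hypothetical.

```python
import re

# Simplified, hypothetical pattern: peptide series ions only.
SERIES_ION = re.compile(
    r"""
    (?P<series>[abcxyz])(?P<position>\d+)             # ion type, e.g. y4
    (?P<neutral_losses>(?:[+-][A-Z][A-Za-z0-9]*)*)    # gains and losses, e.g. +CO-H2O
    (?P<isotope>[+-]\d*i)?                            # isotope, e.g. +2i
    (?:\[(?P<adduct>[^\]]+)\])?                       # adduct type, e.g. [M+H+Na]
    (?:\^(?P<charge>\d+))?                            # charge state, e.g. ^2
    """,
    re.VERBOSE,
)


def parse_series_ion(ion):
    """Return the named parts of a peptide series ion, or None if it does not match."""
    match = SERIES_ION.fullmatch(ion)
    return match.groupdict() if match else None


print(parse_series_ion("y4-H2O+2i[M+H+Na]^2"))
```

Ion types outside the peptide series, such as `IY` or `m3:6`, return `None` with this pattern and need the full grammar.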

Read the [full specification](https://mzpaf.readthedocs.io/specification) for more details and
examples.
Read the
[full DRAFT specification](https://github.com/HUPO-PSI/mzPAF/blob/main/specification/mzPAF_specification_v1.0-draft14.docx?raw=true)
for more details and examples.

## Getting started

### mzPAF in Python

The [mzPAF Python package](https://mzpaf.readthedocs.io/en/latest/implementations/python/) can
parse mzPAF strings into their components, convert to the JSON representation, or serialize back
to an mzPAF string.

```python
>>> import mzpaf
>>> annotations = mzpaf.parse_annotation("b2-H2O/3.2ppm*0.75,b4-H2O^2/3.2ppm*0.25")
>>> print(annotations[0].to_json())
{'neutral_losses': ['-H2O'], 'isotope': 0, 'adducts': [], 'charge': 1, 'analyte_reference': None, 'mass_error': {'value': 3.2, 'unit': 'ppm'}, 'confidence': 0.75, 'molecule_description': {'series_label': 'peptide', 'series': 'b', 'position': 2, 'sequence': None}}
>>> print(annotations[0].serialize())
b2-H2O/3.2ppm*0.75
```

Learn more at the
[package documentation](https://mzpaf.readthedocs.io/en/latest/implementations/python/).
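
To make the relation between the JSON representation and the string form more concrete, here is a small standard-library sketch that rebuilds the string from the dictionary printed in the example above. `series_annotation_to_string` is a hypothetical helper, not the package's own serializer. It assumes a peptide series ion, ignores isotopes, adducts, and analyte references, and assumes that a charge of 1 and any non-ppm unit are written without a suffix.

```python
def series_annotation_to_string(data):
    """Rebuild an mzPAF string from a JSON-style dict (peptide series ions only)."""
    description = data["molecule_description"]
    text = f"{description['series']}{description['position']}"
    text += "".join(data["neutral_losses"])
    # Assumption: a charge of 1 is implicit and therefore not written.
    if data["charge"] != 1:
        text += f"^{data['charge']}"
    mass_error = data.get("mass_error")
    if mass_error:
        unit = "ppm" if mass_error["unit"] == "ppm" else ""
        text += f"/{mass_error['value']}{unit}"
    if data.get("confidence") is not None:
        text += f"*{data['confidence']}"
    return text


# Dictionary copied from the to_json() output shown above.
example = {
    "neutral_losses": ["-H2O"],
    "isotope": 0,
    "adducts": [],
    "charge": 1,
    "analyte_reference": None,
    "mass_error": {"value": 3.2, "unit": "ppm"},
    "confidence": 0.75,
    "molecule_description": {
        "series_label": "peptide",
        "series": "b",
        "position": 2,
        "sequence": None,
    },
}
print(series_annotation_to_string(example))
```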

### mzPAF regular expressions

[todo]

### mzPAF Lark grammar

[todo]

## Specification status

Updated: 2023-09-01

The specification has been resubmitted to the PSI Document Process and is undergoing final
community review. Ratification to formally become a PSI standard is anticipated near the end of 2023.

Your comments and suggestions are still very much welcome. Please submit an issue at the repo to
provide your feedback and send an e-mail to the HUPO-PSI editor [Sylvie Ricard-Blum](mailto:[email protected]).

### Links

### Available Materials
- The current DRAFT specification: https://github.com/HUPO-PSI/mzPAF/blob/main/specification/mzPAF_specification_v1.0-draft14.docx?raw=true
- The GitHub repo associated with mzPAF: https://github.com/HUPO-PSI/mzPAF
- The GitHub repo associated with the related mzSpecLib standard: https://github.com/HUPO-PSI/mzSpecLib

- HUPO-PSI homepage: https://www.psidev.info/
1 change: 1 addition & 0 deletions docs/_static/img/lark-railroad-diagram.svg
30 changes: 30 additions & 0 deletions docs/conf.py
@@ -2,7 +2,12 @@

# Scripts
import json
import shutil
from pathlib import Path

import jsonschema2md
import pandas as pd


def get_jsonschema_docs(input_json, output_markdown):
    """Generate markdown documentation from a JSON schema."""
@@ -13,10 +18,34 @@ def get_jsonschema_docs(input_json, output_markdown):
    with open(output_markdown, "w", encoding="utf-8") as f_out:
        f_out.writelines(output_md)


def get_reference_molecules_md(input_json, output_markdown):
    """Generate a markdown table of reference molecules."""
    df = pd.read_json(input_json).T
    buf = df.to_markdown().replace(' nan ', ' ')
    with open(output_markdown, 'wt') as fh:
        fh.write(buf)


get_jsonschema_docs(
    "../specification/annotation-schema.json",
    "../specification/annotation-schema.md"
)
get_jsonschema_docs(
    "../specification/reference_data/reference_molecule_schema.json",
    "../specification/reference_data/reference_molecule_schema.md"
)

get_reference_molecules_md(
    "../specification/reference_data/reference_molecules.json",
    "../specification/reference_data/reference_molecules.md"
)

if not Path("_static/img/lark-railroad-diagram.svg").exists():
    shutil.copy(
        "../specification/grammars/schema_images/Annotation.svg",
        "_static/img/lark-railroad-diagram.svg"
    )


# Project information
@@ -65,6 +94,7 @@ def get_jsonschema_docs(input_json, output_markdown):
    "python": ("https://docs.python.org/3", None),
    "psims": ("https://mobiusklein.github.io/psims/docs/build/html/", None),
    "pyteomics": ("https://pyteomics.readthedocs.io/en/stable/", None),
    "mzspeclib": ("https://mzspeclib.readthedocs.io/en/latest/", None),
}


17 changes: 17 additions & 0 deletions docs/implementations/lark/index.rst
@@ -0,0 +1,17 @@
############
Lark grammar
############


About
=====

[todo]


Railroad diagram
================

.. figure:: ../../_static/img/lark-railroad-diagram.svg
   :alt: Lark grammar

12 changes: 12 additions & 0 deletions docs/implementations/python/api.rst
@@ -2,6 +2,18 @@
Python API
**********

.. manually documented as parse_annotation is undocumented (returned by the AnnotationStringParser
   class)
.. function:: mzpaf.parse_annotation(annotation_string: str)

   Parses an mzPAF string into a list of ion annotations.

   :param annotation_string: mzPAF string with peak annotations.
   :type annotation_string: str
   :returns: A list of annotations.
   :rtype: list[mzpaf.IonAnnotationBase]

.. automodule:: mzpaf
   :members:
   :imported-members:
25 changes: 25 additions & 0 deletions docs/implementations/regex/index.rst
@@ -0,0 +1,25 @@
###################
Regular expressions
###################

mzPAF has been defined in several regular expression dialects.

.. tip::

   Regex101.com is a great tool to test regular expressions. Try out the mzPAF regex there:
   `regex101.com/r/gDPlJu/1 <https://regex101.com/r/gDPlJu/1>`_.

Python
======

.. literalinclude:: ../../../specification/grammars/regex_sre.py
   :language: python
   :linenos:


Javascript ECMA
===============

.. literalinclude:: ../../../specification/grammars/regex_ecma.js
   :language: javascript
   :linenos:
6 changes: 3 additions & 3 deletions docs/index.rst
@@ -8,6 +8,6 @@
   :glob:

   Home <self>
   specification/index
   implementations/index
   contributing
   Specification <specification/index>
   Implementations <implementations/index>
   Contributing <contributing>
1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -5,3 +5,4 @@ sphinx_click
myst-parser
sphinx-autobuild
jsonschema2md
pandas
20 changes: 16 additions & 4 deletions docs/specification/index.rst
@@ -1,5 +1,17 @@
####################
Format specification
####################
######################
Specification document
######################

.. toctree::
   :hidden:
   :glob:

   Specification document <self>
   ./*

..
TODO: Add when released
The latest draft of the specification can be found on
`GitHub <https://github.com/HUPO-PSI/mzPAF/blob/main/specification/mzPAF_specification_v1.0-draft14.docx?raw=true>`_.

[TODO: Add when released]
35 changes: 35 additions & 0 deletions docs/specification/reference-molecules.rst
@@ -0,0 +1,35 @@
###################
Reference molecules
###################

About
=====

.. include:: ../../specification/reference_data/README.md
   :parser: myst_parser.sphinx_
   :start-line: 2
   :end-line: -1
..
skip including title and last line with reference to this page
See :ref:`Reference molecule ions` in the specification document for more information.


Reference molecule table
========================

The following analytes can be annotated as reference molecules with the ``r`` prefix and the
listed name between square brackets (e.g. ``r[TMT127N]``).

.. include:: ../../specification/reference_data/reference_molecules.md
   :parser: myst_parser.sphinx_


JSON schema
===========

The ``reference_molecules.json`` file is defined by the following schema:

.. include:: ../../specification/reference_data/reference_molecule_schema.md
   :parser: myst_parser.sphinx_
   :start-line: 3
17 changes: 10 additions & 7 deletions specification/reference_data/README.md
@@ -1,13 +1,16 @@
# mzPAF specification reference data files

The mzPAF specification uses these files as auxiliary reference data so that enumerated values can be extended without altering the specification document.
The mzPAF specification uses `specification/reference_data/reference_molecules.json` as auxiliary
reference data. In this way, the set of reference molecules can be extended without updating the
specification document itself.

- reference_molecules.json - Easily software parsable list of "reference molecules" often seen in peptide fragmentation spectra, but
not normal peptide fragments, including isobaric labeling reagent related molecules, monosaccharides, nucleotides, etc. These
molecules may be inidividual charged ions (typically protonated), or may be used as neutral losses as appropriate.
The following files are available:

- reference_molecules.md - Human-readable markdown tabular version of reference_molecules.json
- `reference_molecules.json`: Software parsable list of "reference molecules" often seen in
peptide fragmentation spectra, but not normal peptide fragments. This includes isobaric labeling
reagent related molecules, monosaccharides, nucleotides, etc. These molecules may be individual
charged ions (typically protonated), or may be used as neutral losses as appropriate.

- reference_molecule_schema.json - JSON schema for reference_molecules.json
- `reference_molecule_schema.json`: JSON schema defining the structure of the JSON file

- reference_mol_to_md.py - Python script to transform reference_molecules.json into a markdown table
A human-readable table with all reference molecules is available on https://mzpaf.readthedocs.io.
7 changes: 0 additions & 7 deletions specification/reference_data/reference_mol_to_md.py

This file was deleted.

34 changes: 34 additions & 0 deletions specification/reference_data/reference_molecule_schema.md
@@ -0,0 +1,34 @@
# HUPO-PSI mzSpecLib reference molecule and ion list

*Describe reference molecules or ions found in spectral libraries*

## Pattern Properties

- **`.{1,}`**: Refer to *[#/definitions/molecule](#definitions/molecule)*.
## Definitions

- <a id="definitions/molecule"></a>**`molecule`** *(object)*: A single molecule that may be present as a reporter ion or signature ion, or be a component of a neutral loss.
  - **`name`** *(string)*: The formal name for this molecule by which it should be referenced.
  - **`cv_term`** *(array)*
    - **Items** *(string)*
  - **`neutral_mass`** *(number)*: The neutral mass of the molecule not including any charge or charge carrier.
  - **`molecule_type`** *(string)*: A categorical label for this molecule.

    Examples:
    ```json
    "monosaccharide"
    ```

    ```json
    "reporter"
    ```

    ```json
    "reporter+balance"
    ```

  - **`ion_mz`** *(number)*: The m/z of the molecule if it is expected to be reasonably different from the uncharged version.
  - **`chemical_formula`** *(string)*: The elemental formula of the neutral molecule.
  - **`ion_chemical_formula`** *(string)*: The chemical formula of the charged molecule.
  - **`references`** *(array)*: An array of sources and references describing this entity.
    - **Items** *(string)*
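
The property types listed above can be checked with a few lines of standard-library Python. This is only a sketch: `FIELD_TYPES` and `check_molecule` are hypothetical names, the check covers types only, and it makes no assumption about which fields are required. The authoritative definition remains `reference_molecule_schema.json`, and the example entry uses made-up placeholder values rather than data from `reference_molecules.json`.

```python
NUMBER = (int, float)

# Field names and types as listed in the schema description above.
FIELD_TYPES = {
    "name": str,
    "cv_term": list,
    "neutral_mass": NUMBER,
    "molecule_type": str,
    "ion_mz": NUMBER,
    "chemical_formula": str,
    "ion_chemical_formula": str,
    "references": list,
}


def check_molecule(entry):
    """Return a list of type problems found in one reference molecule entry."""
    problems = []
    for field, expected in FIELD_TYPES.items():
        if field not in entry:
            continue  # No assumption is made about required fields.
        value = entry[field]
        # bool is a subclass of int in Python, but it is not a JSON number.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{field}: unexpected type {type(value).__name__}")
        elif expected is list and not all(isinstance(item, str) for item in value):
            problems.append(f"{field}: all items should be strings")
    return problems


# Made-up placeholder entry, for illustration only.
placeholder = {
    "name": "ExampleMolecule",
    "neutral_mass": 100.0,
    "molecule_type": "reporter",
    "references": [],
}
print(check_molecule(placeholder))
```

For real validation, a JSON Schema validator run against the schema file is the more reliable choice.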
