Data source: RaMP #69

newgene · 2022-06-01T05:40:31Z

name: RaMP (Relational database of Metabolomic Pathways)
url: https://rampdb.nih.gov/
download:
https://rampdb.nih.gov/about
https://figshare.com/articles/dataset/RaMP_Database_MySQL_Dump_v2_0_7_20220428/19674540
license: GPL v2

andrewsu · 2022-06-24T04:33:27Z

From https://rampdb.nih.gov/about, I'm wondering if we already get many/most of these resources from the primary source?

andrewsu · 2023-02-22T16:54:32Z

@colleenXu pointed out that RaMP has an API. Given Translator interest in this data source, let's look at generating a SmartAPI annotation...

colleenXu · 2023-02-22T18:48:50Z

With earlier post in that issue biothings/biothings_explorer#372 (comment)

colleenXu · 2023-03-17T08:04:31Z

Info

RaMP's github repo has two SmartAPI yamls; it's not clear which one we'd want to edit.

this one appears to be used by https://rampdb.nih.gov/api (since both have one server url, RaMP as info.contact.name)
this is a slightly older version based on the commit history

Also there's an older (alpha-version??) version of RaMP that was registered: http://smart-api.info/registry?q=3bfd9cecbcf799f800539ce24df1d754. Perhaps that registration needs removing / adjusting?

Other endpoints

There are also endpoints that look interesting, but we can't annotate them because they don't use IDs as inputs or as outputs

analytes-from-pathways: takes pathway names as input, not IDs. Output: Gene or SmallMolecule (chemical/metabolite). Should retrieve same info/opposite-direction compared to pathways-from-analytes.
ontologies-from-metabolites / metabolites-from-ontologies: interesting, but the ontologies are human-readable labels with no IDs
- health condition
- found in what subcellular location
- found in what tissue/substructure, organ/component
- perhaps more a node attribute (wouldn't annotate for associations)
  - found in what biofluid/excreta
  - found in what source in the world
  - used in what industrial application

colleenXu · 2023-03-17T08:23:41Z

Issues writing x-bte annotation

There are endpoints that meet our criteria (relationships between entities, entities have IDs), but we encounter issues parsing their responses.

I think these issues can be addressed with post-query processing, perhaps with the api-response-transform module (custom handler) or JQ (which hasn't been incorporated into BTE yet... )

`pathways-from-analytes`

ISSUE 1: the output ID field data.pathwayId value can be a WIKIPATHWAYS, REACT, or KEGG.PATHWAY ID (I'm not sure if it can be others). BTE then has trouble correctly processing this output, similar to CTD processing 3: handling output IDs when multiple ID prefixes are possible biothings_explorer#585
- note: these IDs don't have prefixes, but the value of the data.pathwaySource seems to correspond to the ID-namespace for each record (values seem to be wiki, reactome, or kegg). Perhaps JQ list-filter could help in this particular case
ISSUE 2: would have the batch-querying processing issue similar to CTD processing 2: batch-queries biothings_explorer#584. The matching input is provided in the data.inputID field (using RaMP's format for prefix-spelling / capitalization)

Example of API response with 3 different output ID-namespaces

    {
      "pathwayName": "7q11.23 copy number variation syndrome",
      "pathwaySource": "wiki",
      "pathwayId": "WP4932",
      "inputId": "hmdb:HMDB0000148",
      "commonName": "Glutamate; L-Glutamic acid"
    },
    {
      "pathwayName": "Activation of AMPA receptors",
      "pathwaySource": "reactome",
      "pathwayId": "R-HSA-399710",
      "inputId": "hmdb:HMDB0000148",
      "commonName": "Glutamate; L-Glutamic acid"
    },

    {
      "pathwayName": "Glycine, serine and threonine metabolism",
      "pathwaySource": "kegg",
      "pathwayId": "map00260",
      "inputId": "hmdb:HMDB0000148",
      "commonName": "Glutamate; L-Glutamic acid"
    },
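One way the pathwaySource value could drive the prefix choice for ISSUE 1 is sketched below in Python. This is only an illustration of the post-processing idea, not BTE code: the mapping table and the function name `pathway_curie` are assumptions, and the three source values are just the ones seen in the example response above.

```python
# Assumed mapping from RaMP's pathwaySource values to the ID prefixes
# mentioned in ISSUE 1. Other source values may exist; they are not handled.
SOURCE_TO_PREFIX = {
    "wiki": "WIKIPATHWAYS",
    "reactome": "REACT",
    "kegg": "KEGG.PATHWAY",
}


def pathway_curie(record):
    """Return a prefixed pathway ID for one record, or None if the source is unknown."""
    prefix = SOURCE_TO_PREFIX.get(record.get("pathwaySource"))
    if prefix is None:
        return None
    return f"{prefix}:{record['pathwayId']}"


# Records trimmed to the two fields that matter here.
records = [
    {"pathwaySource": "wiki", "pathwayId": "WP4932"},
    {"pathwaySource": "reactome", "pathwayId": "R-HSA-399710"},
    {"pathwaySource": "kegg", "pathwayId": "map00260"},
]
curies = [pathway_curie(r) for r in records]
```

Returning None for an unrecognized source keeps unknown namespaces from silently getting a wrong prefix.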

Example of API response starting with 2 different input IDs

    {
      "pathwayName": "Glycine, serine and threonine metabolism",
      "pathwaySource": "kegg",
      "pathwayId": "map00260",
      "inputId": "hmdb:HMDB0000064",
      "commonName": "Creatine"
    },
    {
      "pathwayName": "Glycine, serine and threonine metabolism",
      "pathwaySource": "kegg",
      "pathwayId": "map00260",
      "inputId": "hmdb:HMDB0000148",
      "commonName": "Glutamate; L-Glutamic acid"
    },
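For ISSUE 2, the post-processing would need to bind each record back to the input it came from. A minimal Python sketch of that grouping step follows; the function name `group_by_input` is hypothetical, and it assumes the field named inputId in the example records is the one that carries the matching input.

```python
from collections import defaultdict


def group_by_input(records, input_field="inputId"):
    """Group response records by the input ID each record reports."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec[input_field]].append(rec)
    return dict(grouped)


# The two records from the example above, trimmed to the relevant fields.
records = [
    {"pathwayId": "map00260", "inputId": "hmdb:HMDB0000064"},
    {"pathwayId": "map00260", "inputId": "hmdb:HMDB0000148"},
]
grouped = group_by_input(records)
```

With this grouping, the same pathway stays attached to both inputs as two separate records instead of being attributed to only one of them.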

`common-reaction-analytes`

The endpoint seems to provide gene -> chem (gene2met) and chem -> gene (met2gene) that are involved in the same reaction. We'd want to confirm this (note that the reaction/pathway they're both in isn't provided).

ISSUE 3: the output ID field data.rxn_partner_ids value is a ;-delimited string of the entity's IDs in multiple ID-namespaces, all using RaMP's ID-prefix spellings
- would like to separate these IDs by namespace. May involve custom processing with JQ or code
also would have ISSUE 2 (batch-querying). The matching input is provided in the data.input_analyte field (using RaMP's format for prefix-spelling / capitalization)

Example output from two chemical input IDs

    {
      "query_relation": "met2gene",
      "input_analyte": "hmdb:HMDB0000148",
      "input_common_names": "Glutamate; L-Glutamic acid",
      "rxn_partner_common_name": "PPAT",
      "rxn_partner_ids": "ensembl:ENSG00000128059; entrez:5471; gene_symbol:PPAT; hmdb:HMDBP00331; uniprot:A8K4H7; uniprot:D6RCC8; uniprot:D6RE15; uniprot:Q06203"
    },


    {
      "query_relation": "met2gene",
      "input_analyte": "hmdb:HMDB0000064",
      "input_common_names": "Creatine",
      "rxn_partner_common_name": "CKMT2",
      "rxn_partner_ids": "ensembl:ENSG00000131730; entrez:1160; gene_symbol:CKMT2; hmdb:HMDBP00719; uniprot:A0A024RAK5; uniprot:D6R998; uniprot:D6RHV3; uniprot:P17540"
    },
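The custom processing for ISSUE 3 could look roughly like the Python sketch below, which splits the ;-delimited string and buckets the IDs by RaMP's own prefix. The function name `split_ids_by_namespace` is hypothetical, and translating RaMP's prefix spellings (entrez, ensembl, ...) into the prefixes BTE expects is left out.

```python
from collections import defaultdict


def split_ids_by_namespace(id_string):
    """Split a ';'-delimited string of prefix:id items into {prefix: [ids]}."""
    by_namespace = defaultdict(list)
    for item in id_string.split(";"):
        item = item.strip()
        prefix, sep, local_id = item.partition(":")
        if not sep or not local_id:
            # Skip empty items and anything that is not in prefix:id form.
            continue
        by_namespace[prefix].append(local_id)
    return dict(by_namespace)


# rxn_partner_ids value from the PPAT record above.
partner_ids = (
    "ensembl:ENSG00000128059; entrez:5471; gene_symbol:PPAT; hmdb:HMDBP00331; "
    "uniprot:A8K4H7; uniprot:D6RCC8; uniprot:D6RE15; uniprot:Q06203"
)
ids_by_namespace = split_ids_by_namespace(partner_ids)
```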

colleenXu · 2023-03-18T06:04:33Z

I have the work I've done so far in this fork: https://github.com/colleenXu/RaMP-Client/blob/x-bte-annotation/libs/features/ramp/ramp-api/src/assets/data/ramp_openapi_with_extensions.yml

I annotated the pathways-from-analytes endpoint for HMDB SmallMolecule -> REACT Pathway and NCBIGene Gene -> REACT Pathway. I tested it with prod/test code and main branch code for BTE and it "works", BUT...

it has ISSUE 1 described in the previous post (so the responses have nodes with incorrectly-formatted IDs because the IDs are actually KEGG.PATHWAY or WIKIPATHWAYS and not REACT). UPDATE: See examples in the collapsed sections of the next post
UPDATE 3-21: I've figured out how to direct BTE to write batch-queries to this API. However, I encounter ISSUE 2 described in the previous post. I explain it in more detail in the collapsed section below

walking through the batch-query processing issue

In a local copy of the yaml, set supportBatch: true for the chemical2pathway_1 operation.
Then set up your local BTE instance to use this yaml.
Query just this API with this TRAPI query that has two chemical IDs:
- HMDB:HMDB0000064 creatine
- HMDB:HMDB0000148 glutamate

BTE query

{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["HMDB:HMDB0000064", "HMDB:HMDB0000148"],
                    "categories": ["biolink:SmallMolecule"]
                },
                "n1": {
                    "categories": ["biolink:Pathway"]
                }
            }
        }
    }
}

BTE correctly sets up the sub-query; these are the console logs I see

  bte:call-apis:query using template builder +0ms
  bte:call-apis:query {
  bte:call-apis:query   url: 'https://rampdb.nih.gov/api/pathways-from-analytes',
  bte:call-apis:query   params: {},
  bte:call-apis:query   data: { analytes: [ 'hmdb:HMDB0000064', 'hmdb:HMDB0000148' ] },
  bte:call-apis:query   method: 'post',
  bte:call-apis:query   timeout: 50000,
  bte:call-apis:query   headers: { 'User-Agent': 'BTE/dev Node/v16.18.0 darwin' }
  bte:call-apis:query } +7ms
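The same request body can be reproduced outside BTE to query RaMP directly. The Python sketch below only builds the body; the helper names are hypothetical, and the lowercase-prefix conversion is an assumption based on the single HMDB case visible in the log above (other namespaces may need an explicit mapping rather than lowercasing).

```python
def to_ramp_id(curie):
    """Convert a CURIE like 'HMDB:HMDB0000064' to RaMP's 'hmdb:HMDB0000064' form.

    Assumption: RaMP's prefix is the lowercased CURIE prefix. This is only
    confirmed for HMDB by the log above.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a prefix:id CURIE: {curie!r}")
    return f"{prefix.lower()}:{local_id}"


def build_batch_body(curies):
    """Build the POST body for pathways-from-analytes from a list of CURIEs."""
    return {"analytes": [to_ramp_id(c) for c in curies]}


body = build_batch_body(["HMDB:HMDB0000064", "HMDB:HMDB0000148"])
```

The resulting dictionary can be sent as JSON in a POST to https://rampdb.nih.gov/api/pathways-from-analytes with any HTTP client.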

That sub-query will return pathways linked to both IDs. When I query RaMP directly for each ID, I can see that there are 12 pathways linked to creatine (hmdb:HMDB0000064) and 231 pathways linked to glutamate (hmdb:HMDB0000148).

Example of objects in the response, one linked to creatine and the other to glutamate

{
    "data": [

    {
      "pathwayName": "Glycine, serine and threonine metabolism",
      "pathwaySource": "kegg",
      "pathwayId": "map00260",
      "inputId": "hmdb:HMDB0000064",
      "commonName": "Creatine"
    },
    {
      "pathwayName": "Glycine, serine and threonine metabolism",
      "pathwaySource": "kegg",
      "pathwayId": "map00260",
      "inputId": "hmdb:HMDB0000148",
      "commonName": "Glutamate; L-Glutamic acid"
    },

But BTE's response has 234 edges (not 243 = 12 to creatine + 231 to glutamate) and all edges say their input ID is creatine (PUBCHEM.COMPOUND:586 / HMDB:HMDB0000064)...which isn't right.

I think there are fewer edges than expected because some pathways were linked to both chemicals, but after the records for glutamate were incorrectly assigned to creatine, those records were merged (notice how the map00260 from the raw response above shows up in the console logs below as having two records bound to that result).

portion of the console logs:

  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:586_&_n1-REACT:WP1495 has 2 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:586_&_n1-REACT:map00260 has 2 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:586_&_n1-REACT:R-HSA-388396 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:586_&_n1-REACT:R-HSA-500792 has 1 +0ms
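The suspected merge can be reproduced on toy data. The Python sketch below is not BTE's logic: it simply counts unique (input, pathway) pairs once using each record's own inputId and once with every record assigned to the first input. The PLACEHOLDER_* pathway IDs are made up for illustration. If the explanation above holds, the 9-edge gap (243 expected, 234 seen) would be the number of pathways shared by both chemicals.

```python
def count_edges(records, respect_input_field):
    """Count unique (input, pathway) pairs.

    With respect_input_field=False every record is attributed to the first
    record's input, mimicking the misassignment suspected above.
    """
    if not records:
        return 0
    first_input = records[0]["inputId"]
    pairs = set()
    for rec in records:
        subject = rec["inputId"] if respect_input_field else first_input
        pairs.add((subject, rec["pathwayId"]))
    return len(pairs)


# Toy response: one pathway (map00260) is shared by both inputs.
records = [
    {"inputId": "hmdb:HMDB0000064", "pathwayId": "map00260"},
    {"inputId": "hmdb:HMDB0000064", "pathwayId": "PLACEHOLDER_A"},
    {"inputId": "hmdb:HMDB0000148", "pathwayId": "map00260"},
    {"inputId": "hmdb:HMDB0000148", "pathwayId": "PLACEHOLDER_B"},
    {"inputId": "hmdb:HMDB0000148", "pathwayId": "PLACEHOLDER_C"},
]
expected_edges = count_edges(records, respect_input_field=True)
collapsed_edges = count_edges(records, respect_input_field=False)
```

Here the shared pathway yields two edges when inputs are respected and one merged edge when they are not, so the collapsed count is lower by exactly the number of shared pathways.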

colleenXu · 2023-03-21T19:53:59Z

My fork's yaml has been registered https://smart-api.info/registry?q=ac9c2ad11c5c442a1a1271223468ced1, so RaMP is accessible through BTE using an api-specific endpoint.

For now, sending POST-queries to the dev/ci instances of BTE is preferred (for the node label support). To query specifically RaMP through dev-BTE, POST to this url: https://api.bte.ncats.io/v1/smartapi/ac9c2ad11c5c442a1a1271223468ced1/query

Example query for Chemical -> Pathway

In the request-body:

{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["HMDB:HMDB0000148", "HMDB:HMDB0000064"],
                    "categories": ["biolink:SmallMolecule"]
                },
                "n1": {
                    "categories": ["biolink:Pathway"]
                }
            }
        }
    }
}

The response will have 242 results. Some nodes will have some incorrect curies (will have the REACT prefix but the ID is actually KEGG.PATHWAY or WIKIPATHWAYS)

Correct prefix (this is a REACT ID)

Incorrect prefix (this is actually a WIKIPATHWAYS ID but has the wrong prefix)

Example query for Gene -> Pathway

In the request-body:

{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["NCBIGene:5241", "NCBIGene:4193"],
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "categories": ["biolink:Pathway"]
                }
            }
        }
    }
}

The response will have 114 results. Some nodes will have incorrect curies (they will have the REACT prefix but the ID is actually KEGG.PATHWAY or WIKIPATHWAYS), like REACT:WP4262 (actually WIKIPATHWAYS).
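Both example request bodies share one shape, so they can be generated from a small helper. The Python sketch below is a convenience for building the JSON; the function name `build_one_hop_query` is hypothetical, and it does not send anything.

```python
import json


def build_one_hop_query(ids, subject_category,
                        object_category="biolink:Pathway",
                        predicate="biolink:related_to"):
    """Build a one-hop TRAPI query graph shaped like the examples above."""
    return {
        "message": {
            "query_graph": {
                "edges": {
                    "e01": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": [predicate],
                    }
                },
                "nodes": {
                    "n0": {"ids": list(ids), "categories": [subject_category]},
                    "n1": {"categories": [object_category]},
                },
            }
        }
    }


chemical_query = build_one_hop_query(
    ["HMDB:HMDB0000148", "HMDB:HMDB0000064"], "biolink:SmallMolecule"
)
gene_query = build_one_hop_query(["NCBIGene:5241", "NCBIGene:4193"], "biolink:Gene")
request_body = json.dumps(gene_query)  # string to send as the POST body
```

The serialized body would be POSTed with a Content-Type of application/json to the api-specific url given earlier in this post.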

Notes:

the incorrect curies are related to ISSUE 1 above
edges are missing a primary_knowledge_source. This will be fixed when RaMP is added to the API_LIST with the primarySource tag (like below)

        {
            id: 'ac9c2ad11c5c442a1a1271223468ced1',
            name: 'RaMP API v1.0.1',
            primarySource: true
        },

colleenXu · 2023-03-22T04:39:47Z

Note that I've updated this post because I figured out how to get BTE to generate batch-queries and I was able to test how BTE processed the responses

(the yaml was updated colleenXu/RaMP-Client@456c022)

andrewsu · 2023-06-21T15:57:43Z

More info below from the RaMP developers:

The updated analytes-from-pathways endpoint is now in our RaMP production API. Below is information on using the analytes-from-pathways endpoint.

Here’s the url for the endpoint.

https://rampdb.nih.gov/api/analytes-from-pathways

It’s a post. Here’s a sample post body:
{
  "pathway": [
    "WP1601", "WP4846"   
  ],
  "analyte_type": "both",
  "names_or_ids": "ids",
  "match": "exact",
  "max_pathway_size": 500
}
The pathway argument can be an array of pathway ids, for Wikipathways, or Reactome pathways.

We don’t license KEGG. We do have some KEGG ‘maps’ (map ids), but it’s not comprehensive.

The analyte_type can be set to ‘metabolite’, ‘gene’ or both. The geneOrCompound field in the return json will be either ‘gene’ or ‘compound’ (compound is the value on metabolites).

The ‘names_or_ids’ parameter specifies if you are searching by IDs or by pathway names.

The ‘match’ parameter is set to ‘exact’ internally if the search is working on an id list. Otherwise, the ‘match’ parameter can be set to ‘exact’ for an exact pathway name match or ‘fuzzy’.

Here fuzzy really just indicates that you can have a partial match on the names. That’s so that people might look for pathways related to TCA Cycle and just want to search on TCA.

For instance, if you wanted to get all pathways related to covid, and you wanted all genes and metabolites you could use this query body:
{
  "pathway": [
    "covid"   
  ],
  "analyte_type": "both",
  "names_or_ids": "names",
  "match": "fuzzy",
  "max_pathway_size": 500
}
The list of returned entities would be structured like this example:
        {
            "analyteName": "ACE2",
            "sourceAnalyteIDs": "ensembl:ENSG00000130234; entrez:59272; gene_symbol:ACE2; hmdb:HMDBP08177; hmdb:HMDBP13364; hmdb:HMDBP13365; uniprot:A0A7I2V2E9; uniprot:A0A7I2V3N4; uniprot:A0A7I2V3X6; uniprot:A0A7I2V4H0; uniprot:A0A7I2V5W5; uniprot:Q56NL1; uniprot:Q5EGZ1; uniprot:Q9BYF1",
            "geneOrCompound": "gene",
            "pathwayName": "COVID-19 adverse outcome pathway",
            "pathwayId": "WP4891",
            "pathwayCategory": "",
            "pathwayType": "wiki"
        }
The input parameter of max_pathway_size limits the pathway size (number of genes + metabolites) to be returned.

For instance, some pathways have an all-encompassing pathway called ‘Metabolism’ which really isn’t informative and contains a few thousand analytes.

The default if no limit is set is that pathways with up to 1000 analytes will be returned.
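The parameter rules described above can be collected into a small body builder. The Python sketch below is an illustration only: the function name and the client-side validation are assumptions, and forcing match to 'exact' for ID searches just mirrors what the developers say the server does internally.

```python
def build_analytes_from_pathways_body(pathways, names_or_ids="ids",
                                      analyte_type="both", match="exact",
                                      max_pathway_size=None):
    """Build a POST body for analytes-from-pathways from the described parameters."""
    if names_or_ids not in ("names", "ids"):
        raise ValueError("names_or_ids must be 'names' or 'ids'")
    if analyte_type not in ("metabolite", "gene", "both"):
        raise ValueError("analyte_type must be 'metabolite', 'gene' or 'both'")
    if match not in ("exact", "fuzzy"):
        raise ValueError("match must be 'exact' or 'fuzzy'")
    if names_or_ids == "ids":
        # ID searches are described as always using exact matching.
        match = "exact"
    body = {
        "pathway": list(pathways),
        "analyte_type": analyte_type,
        "names_or_ids": names_or_ids,
        "match": match,
    }
    if max_pathway_size is not None:
        # Omitting the key leaves the server default (described as 1000) in place.
        body["max_pathway_size"] = max_pathway_size
    return body


id_body = build_analytes_from_pathways_body(["WP1601", "WP4846"], max_pathway_size=500)
name_body = build_analytes_from_pathways_body(
    ["covid"], names_or_ids="names", match="fuzzy", max_pathway_size=500
)
```

These two calls reproduce the two sample post bodies shown above.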

API swagger documentation is here:

https://rampdb.nih.gov/api

*Note that this endpoint’s documentation has to be updated on this api swagger page to add the names_or_ids field, match, and max_pathway_size parameter descriptions.

The query bodies shown above will work on the swagger page, but we don’t have these new parameters described there.

colleenXu · 2023-08-18T19:55:46Z

Closing in favor of biothings/biothings_explorer#705, because (1) we seem to have decided to NOT make a pending BioThings API from this data and (2) it's easier to track this effort using the BioThings Explorer repo's tags.

However, the info in this issue is the basis of that issue.

newgene added the "data source" label (Data source pending to create a new API) Jun 1, 2022

andrewsu assigned colleenXu Feb 22, 2023

colleenXu mentioned this issue Mar 22, 2023

support JQ for API response transformation biothings/biothings_explorer#489

Closed

colleenXu mentioned this issue Aug 18, 2023

RaMP: process of x-bte annotation + adding to BTE biothings/biothings_explorer#705

Open

colleenXu closed this as not planned Aug 18, 2023
