Disease (MONDO:0005737) <subclass_of> Disease #173

sstemann · 2022-01-31T22:36:50Z

Query: subclass.json
PK: 24506c4-b3e4-464e-89e9-eb4e456b9ef9
Control: Query looking for what diseases Ebola is a subclass of. Controls should be things like viral infection, filoviridae, or viral hemorrhagic fever

finnagin · 2022-02-01T03:19:19Z

@MarkDWilliams I made a workflowitized version of this query which overlays ngd values and fisher exact test scores. It then scores the results based on those values and ranks them.

Workflow Query: subclassWorkflow.json
ARAX Results

MarkDWilliams · 2022-02-02T19:44:56Z

Expected result

Subject	Predicate	Object
MONDO:0005737	biolink:subclass_of	MONDO:0018087
MONDO:0005737	biolink:subclass_of	MONDO:0005108
MONDO:0005737	biolink:subclass_of	MONDO:0005762

edeutsch · 2022-02-02T23:35:04Z

@cbizon we are doing a little testing to reranking messages from other ARAs and are hitting some snags. We tried reranking the Aragorn results above, but ARAX had trouble with it. The main reason seems be bindings that don't go anywhere. In the Aragorn message: https://arax.ncats.io/api/arax/v1.2/response/99adc6b5-0803-4c52-972e-fe587744a7aa

we see:

    "query_graph": {
      "edges": {
        "e00": {
          "constraints": [],
          "object": "n01",
          "predicates": [
            "biolink:subclass_of"
          ],
          "subject": "n00"
        }
      },
      "nodes": {
        "n00": {
          "categories": [
            "biolink:Disease"
          ],
          "constraints": [],
          "ids": [
            "MONDO:0005737"
          ],
          "is_set": false
        },
        "n01": {
          "categories": [
            "biolink:Disease"
          ],
          "constraints": [],
          "ids": null,
          "is_set": false
        }
      }
    },
    "results": [
...
      {
        "edge_bindings": {
          "e00": [
            {
              "attributes": [
                {
                  "attribute_source": null,
                  "attribute_type_id": "biolink:has_numeric_value",
                  "attributes": null,
                  "description": null,
                  "original_attribute_name": "weight",
                  "value": 0.2679491924311228,
                  "value_type_id": "EDAM:data_1669",
                  "value_url": null
                }
              ],
              "id": "16888608"
            }
          ],
          "s14": [
            {
              "attributes": [
                {
                  "attribute_source": null,
                  "attribute_type_id": "biolink:has_numeric_value",
                  "attributes": null,
                  "description": null,
                  "original_attribute_name": "weight",
                  "value": 0.06359757610223138,
                  "value_type_id": "EDAM:data_1669",
                  "value_url": null
                }
              ],
              "id": "1dfbc4c2-3884-4455-9b55-6e7433e7059b"
            }
          ]
        },

So it seems that a result has an edge binding from a KG edge to s14, which does not exist in the QG? So binds to nothing. Did we ever decide whether that is legal TRAPI? ARAX doesn't do that and got very unhappy when it saw it here. We're working to make ARAX more robust in the face of such things. This will be a wider issue where we try to operate on each others' TRAPI and have different assumptions.

We're fixing ARAX to be tolerant of this, but this might cause problems for others as well.

Perhaps this is an issue that we should face head on: is the above semantically valid TRAPI? or should a semantic validator of TRAPI (which we don't have today but should have) flag that as an error?

edeutsch · 2022-02-02T23:54:58Z

Current TRAPI documentation would seem to imply that all edge_binding keys must resolve in the QueryGraph:

        edge_bindings:
          type: object
          description: >-
            The dictionary of Input Query Graph to Result Knowledge Graph edge
            bindings where the dictionary keys are the key identifiers of the
            Query Graph edges and the associated values of those keys are
            instances of EdgeBinding schema type (see below). This value is an
            array of EdgeBindings since a given query edge may resolve to
            multiple knowledge graph edges in the result.

edeutsch · 2022-02-03T01:01:55Z

Here's a different question related to this test. Maybe we should all be on the same page on this.

Are all entities always a subclass of themselves?

I thought I heard it stated once that this was on a stone tablet? And Aragorn's results state it. But it is not in Mark's list.

cbizon · 2022-02-03T13:42:24Z

Yeah, pretty sure that according to the data modeling group, A is a subclass of A.

In terms of Mark's list, I don't think it's meant to be exhaustive, but a list of what a particular educated user might expect to be highly ranked answers.

edeutsch · 2022-02-03T16:47:02Z

But isn't the list what we intend to test against with your testing framework? If so, the scoring on the test should be quite different on whether the self subclass is there. With the current list, Aragorn should be penalized substantially because its number one answer (Ebola itself) is not in the list. If we were to add Ebola itself to the list, then ARAX would be penalized because the number one answer (Ebola itself) is nowhere in our results.

cbizon · 2022-02-04T13:51:28Z

Yes the goal is to capture some good answers to build benchmarks. But (IMO) each benchmark represents one view of what makes a good answer, and a different benchmark might emphasize different qualities.

This set of benchmarks is driven by what a user might expect for simple one hops. I would argue (based on this particular result and the discussion on the call) that no user particularly wants to see the self-subclass result, even if it's useful at the modeling level in maintaining logical consistency.

In other words, I think it's a useful datapoint for ARAGORN to know that it's returning an uninteresting answer (defined in one way) to a user.

edeutsch · 2022-02-04T17:33:07Z

Sounds good to me. I would agree that is the best way forward.

So it seems like the proposed formal policy is: From a modeling perspective every concept is also a subclass of itself, but when users of Translator request items with a subclass relationship, they should not be shown the self class as a result.

Should we document that?

sstemann added the standup label Jan 31, 2022

cbizon mentioned this issue Feb 4, 2022

How to score subclass edges? ranking-agent/aragorn-ranker#64

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disease (MONDO:0005737) <subclass_of> Disease #173

Disease (MONDO:0005737) <subclass_of> Disease #173

sstemann commented Jan 31, 2022

finnagin commented Feb 1, 2022

MarkDWilliams commented Feb 2, 2022

edeutsch commented Feb 2, 2022

edeutsch commented Feb 2, 2022

edeutsch commented Feb 3, 2022

cbizon commented Feb 3, 2022

edeutsch commented Feb 3, 2022

cbizon commented Feb 4, 2022

edeutsch commented Feb 4, 2022

Disease (MONDO:0005737) <subclass_of> Disease #173

Disease (MONDO:0005737) <subclass_of> Disease #173

Comments

sstemann commented Jan 31, 2022

finnagin commented Feb 1, 2022

MarkDWilliams commented Feb 2, 2022

edeutsch commented Feb 2, 2022

edeutsch commented Feb 2, 2022

edeutsch commented Feb 3, 2022

cbizon commented Feb 3, 2022

edeutsch commented Feb 3, 2022

cbizon commented Feb 4, 2022

edeutsch commented Feb 4, 2022