Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disease (MONDO:0005737) <subclass_of> Disease #173

Open
sstemann opened this issue Jan 31, 2022 · 9 comments
Open

Disease (MONDO:0005737) <subclass_of> Disease #173

sstemann opened this issue Jan 31, 2022 · 9 comments
Labels

Comments

@sstemann
Copy link
Contributor

Query: subclass.json
PK: 24506c4-b3e4-464e-89e9-eb4e456b9ef9
Control: Query looking for what diseases Ebola is a subclass of. Controls should be things like viral infection, filoviridae, or viral hemorrhagic fever

image

@finnagin
Copy link
Contributor

finnagin commented Feb 1, 2022

@MarkDWilliams I made a workflowitized version of this query which overlays ngd values and fisher exact test scores. It then scores the results based on those values and ranks them.

Workflow Query: subclassWorkflow.json
ARAX Results

@MarkDWilliams
Copy link
Collaborator

Expected result

Subject Predicate Object
MONDO:0005737 biolink:subclass_of MONDO:0018087
MONDO:0005737 biolink:subclass_of MONDO:0005108
MONDO:0005737 biolink:subclass_of MONDO:0005762

@edeutsch
Copy link

edeutsch commented Feb 2, 2022

@cbizon we are doing a little testing to reranking messages from other ARAs and are hitting some snags. We tried reranking the Aragorn results above, but ARAX had trouble with it. The main reason seems be bindings that don't go anywhere. In the Aragorn message: https://arax.ncats.io/api/arax/v1.2/response/99adc6b5-0803-4c52-972e-fe587744a7aa

we see:

    "query_graph": {
      "edges": {
        "e00": {
          "constraints": [],
          "object": "n01",
          "predicates": [
            "biolink:subclass_of"
          ],
          "subject": "n00"
        }
      },
      "nodes": {
        "n00": {
          "categories": [
            "biolink:Disease"
          ],
          "constraints": [],
          "ids": [
            "MONDO:0005737"
          ],
          "is_set": false
        },
        "n01": {
          "categories": [
            "biolink:Disease"
          ],
          "constraints": [],
          "ids": null,
          "is_set": false
        }
      }
    },
    "results": [
...
      {
        "edge_bindings": {
          "e00": [
            {
              "attributes": [
                {
                  "attribute_source": null,
                  "attribute_type_id": "biolink:has_numeric_value",
                  "attributes": null,
                  "description": null,
                  "original_attribute_name": "weight",
                  "value": 0.2679491924311228,
                  "value_type_id": "EDAM:data_1669",
                  "value_url": null
                }
              ],
              "id": "16888608"
            }
          ],
          "s14": [
            {
              "attributes": [
                {
                  "attribute_source": null,
                  "attribute_type_id": "biolink:has_numeric_value",
                  "attributes": null,
                  "description": null,
                  "original_attribute_name": "weight",
                  "value": 0.06359757610223138,
                  "value_type_id": "EDAM:data_1669",
                  "value_url": null
                }
              ],
              "id": "1dfbc4c2-3884-4455-9b55-6e7433e7059b"
            }
          ]
        },

So it seems that a result has an edge binding from a KG edge to s14, which does not exist in the QG? So binds to nothing. Did we ever decide whether that is legal TRAPI? ARAX doesn't do that and got very unhappy when it saw it here. We're working to make ARAX more robust in the face of such things. This will be a wider issue where we try to operate on each others' TRAPI and have different assumptions.

We're fixing ARAX to be tolerant of this, but this might cause problems for others as well.

Perhaps this is an issue that we should face head on: is the above semantically valid TRAPI? or should a semantic validator of TRAPI (which we don't have today but should have) flag that as an error?

@edeutsch
Copy link

edeutsch commented Feb 2, 2022

Current TRAPI documentation would seem to imply that all edge_binding keys must resolve in the QueryGraph:

        edge_bindings:
          type: object
          description: >-
            The dictionary of Input Query Graph to Result Knowledge Graph edge
            bindings where the dictionary keys are the key identifiers of the
            Query Graph edges and the associated values of those keys are
            instances of EdgeBinding schema type (see below). This value is an
            array of EdgeBindings since a given query edge may resolve to
            multiple knowledge graph edges in the result.

@edeutsch
Copy link

edeutsch commented Feb 3, 2022

Here's a different question related to this test. Maybe we should all be on the same page on this.

Are all entities always a subclass of themselves?

I thought I heard it stated once that this was on a stone tablet? And Aragorn's results state it. But it is not in Mark's list.

@cbizon
Copy link
Collaborator

cbizon commented Feb 3, 2022

Yeah, pretty sure that according to the data modeling group, A is a subclass of A.

In terms of Mark's list, I don't think it's meant to be exhaustive, but a list of what a particular educated user might expect to be highly ranked answers.

@edeutsch
Copy link

edeutsch commented Feb 3, 2022

But isn't the list what we intend to test against with your testing framework? If so, the scoring on the test should be quite different on whether the self subclass is there. With the current list, Aragorn should be penalized substantially because its number one answer (Ebola itself) is not in the list. If we were to add Ebola itself to the list, then ARAX would be penalized because the number one answer (Ebola itself) is nowhere in our results.

@cbizon
Copy link
Collaborator

cbizon commented Feb 4, 2022

Yes the goal is to capture some good answers to build benchmarks. But (IMO) each benchmark represents one view of what makes a good answer, and a different benchmark might emphasize different qualities.

This set of benchmarks is driven by what a user might expect for simple one hops. I would argue (based on this particular result and the discussion on the call) that no user particularly wants to see the self-subclass result, even if it's useful at the modeling level in maintaining logical consistency.

In other words, I think it's a useful datapoint for ARAGORN to know that it's returning an uninteresting answer (defined in one way) to a user.

@edeutsch
Copy link

edeutsch commented Feb 4, 2022

Sounds good to me. I would agree that is the best way forward.

So it seems like the proposed formal policy is: From a modeling perspective every concept is also a subclass of itself, but when users of Translator request items with a subclass relationship, they should not be shown the self class as a result.

Should we document that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants