Identifier and IdentifierSystem #137

Closed
Fak3 opened this issue Jul 29, 2024 · 31 comments · Fixed by #148

Comments

@Fak3
Contributor

Fak3 commented Jul 29, 2024

Products and Organizations can have multiple identifiers expressed using various identifier schemes.

Current model issues

As currently suggested in DigitalproductPassport.md, identifier is described with idScheme, idValue, idSchemeName:

{
      "id": "https://abr.business.gov.au/ABN/View?abn=90664869327",
      "name": "ACME Pty Ltd",
      "idValue": "90664869327",
      "idScheme": "abr.business.gov.au",
      "idSchemeName": "Australian Business Number"
}

The problem is that those properties are assigned not to the specific identifier, but to the entity, which can have multiple identifiers with different identification schemes.

Let's imagine this JSON-LD data is stored and processed by an OWL inferencer, and its graph database already records that this organization has two identifiers:

<did:web:acme.au> <owl:sameAs> <https://abr.business.gov.au/ABN/View?abn=90664869327> .

Processing this new data according to owl:sameAs semantics, the OWL inferencer will add new triples to the graph:

<did:web:acme.au> <idSchemeName> "Australian Business Number" .

This does not make sense, as it asserts that the Decentralized Identifier conforms to the ABN identifier scheme.
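To make the inference step concrete, here is a minimal Python sketch of naive owl:sameAs propagation over plain (subject, predicate, object) tuples. It is not a real OWL reasoner and uses no RDF library; it only copies each statement across a sameAs link in both directions, which is enough to reproduce the problem described above. Predicate names are shortened, and the graph contents mirror the example.

```python
from typing import Set, Tuple

# A triple is (subject, predicate, object); predicates are shortened for readability.
Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"


def propagate_same_as(graph: Set[Triple]) -> Set[Triple]:
    """Copy every statement across each owl:sameAs link, in both directions.

    Deliberately naive: one hop only, and only subjects are rewritten.
    A real OWL reasoner also handles objects and transitive chains.
    """
    links = [(s, o) for s, p, o in graph if p == SAME_AS]
    inferred = set(graph)
    for a, b in links:
        for s, p, o in graph:
            if p == SAME_AS:
                continue
            if s == a:
                inferred.add((b, p, o))
            elif s == b:
                inferred.add((a, p, o))
    return inferred


DID = "did:web:acme.au"
ABN = "https://abr.business.gov.au/ABN/View?abn=90664869327"

graph = {
    (DID, SAME_AS, ABN),  # equivalence already held in the graph database
    (ABN, "name", "ACME Pty Ltd"),
    (ABN, "idValue", "90664869327"),
    (ABN, "idScheme", "abr.business.gov.au"),
    (ABN, "idSchemeName", "Australian Business Number"),
}

# Print only the newly inferred statements.
for triple in sorted(propagate_same_as(graph) - graph):
    print(triple)
```

Among the printed statements is `('did:web:acme.au', 'idSchemeName', 'Australian Business Number')`, which is exactly the triple objected to above.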

Proposed model

To resolve the issue, identifiers should be described separately from the entity itself:

{
  "id": "did:web:company.au",
  "identifier": [{
    "notation": "115378.76",
    "isPartOf": {
      "id": "augov:ABN",
      "type": "IdentifierSystem",
      "name": "Australian Business Number"
    }
  }, {
    "notation": "679429",
    "isPartOf": {
      "id": "gs1:GLN",
      "type": "IdentifierSystem",
      "name": "GS1 Global Location Number"
    }
  }]
}

In the example above I omitted "type": "Identifier" but included explicit "type": "IdentifierSystem". Whether we should require those types explicitly declared in documents is debatable.

I suggest reusing existing vocabularies:

  • Class adms:Identifier based on the UN/CEFACT Identifier class.
    Properties:
    • skos:notation -- string with the literal value of the identifier
    • dcterms:isPartOf -- reference to the ebg:IdentifierSystem

An important point to note is that properties of adms:Identifier are properties of the Identifier, not the resource that it identifies or the agency that issued it.

  • Class ebg:IdentifierSystem from BusinessGraph vocabulary
    Definition from ontology.ttl: "A system managed by a publisher (e.g., a register or agency) that is used to issue identifiers to entities (companies, persons, etc)."
    Properties:

    • schema:name -- label for humans
    • dcterms:creator -- Agent that issues identifiers and then keeps them in a database (register), and who issued a specific identifier
  • Property adms:identifier that links a resource to the adms:Identifier.

There are other properties in the BusinessGraph ontology which can be reused, for example jurisdiction, issuance and expiration date, etc. We should probably add them to our JSON-LD context file as well.
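As a rough sketch of what such context entries could look like, the Python snippet below holds the proposed terms as a JSON-LD-style context dict and includes a tiny compact-IRI expander. The adms, skos and dcterms namespace IRIs are the published ones; the `ebg` namespace IRI is a placeholder assumption, the term names simply mirror the example above, and none of this is the actual UNTP context file.

```python
# Hypothetical JSON-LD context entries for the proposed identifier model.
CONTEXT = {
    "@context": {
        "adms": "http://www.w3.org/ns/adms#",
        "skos": "http://www.w3.org/2004/02/skos/core#",
        "dcterms": "http://purl.org/dc/terms/",
        "schema": "https://schema.org/",
        "ebg": "https://example.org/ebg#",  # placeholder, NOT the real ebg namespace
        "Identifier": "adms:Identifier",
        "IdentifierSystem": "ebg:IdentifierSystem",
        "identifier": {"@id": "adms:identifier", "@container": "@set"},
        "notation": "skos:notation",
        "isPartOf": {"@id": "dcterms:isPartOf", "@type": "@id"},
        "name": "schema:name",
        "creator": {"@id": "dcterms:creator", "@type": "@id"},
    }
}


def expand_term(term: str, context: dict = CONTEXT) -> str:
    """Resolve a short term to a full IRI using the prefixes above.

    Only covers the simple cases used in this context (no @vocab, no @base,
    no scoped contexts); a real JSON-LD processor does much more.
    """
    ctx = context["@context"]
    value = ctx.get(term, term)
    if isinstance(value, dict):
        value = value["@id"]
    prefix, sep, suffix = value.partition(":")
    if sep and isinstance(ctx.get(prefix), str):
        return ctx[prefix] + suffix
    return value  # already an absolute IRI or an unknown term


print(expand_term("isPartOf"))  # http://purl.org/dc/terms/isPartOf
```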

Related to #135

Similar issue was discussed on traceability vocab: w3c-ccg/traceability-vocab#944

@Fak3
Contributor Author

Fak3 commented Jul 29, 2024

@VladimirAlexiev, as a coauthor of the BusinessGraph ontology, can you please comment on whether the proposed reuse of it is adequate?

@VladimirAlexiev

Hi @Fak3 , thanks for referencing EuBusinessGraph!

@Fak3
Contributor Author

Fak3 commented Aug 3, 2024

@VladimirAlexiev at w3c-ccg/traceability-vocab#944 you proposed to use schema:PropertyValue instead of more specific adms:Identifier. Is there any reason to prefer the former?

@onthebreeze
Contributor

Not sure how to resolve this one. @Fak3 says:

> As currently suggested in DigitalproductPassport.md, identifier is described with idScheme, idValue, idSchemeName:
>
> {
>       "id": "https://abr.business.gov.au/ABN/View?abn=90664869327",
>       "name": "ACME Pty Ltd",
>       "idValue": "90664869327",
>       "idScheme": "abr.business.gov.au",
>       "idSchemeName": "Australian Business Number"
> }
>
> The problem is that those properties are assigned not to the specific identifier, but to the entity, which can have multiple identifiers with different identification schemes.

I don't think I agree with the assertion that these properties are not specific to the identifier.

the id is the full URI of the identifier - globally unique.
the name is the human-readable name as registered with that specific registry.
the idValue is the identifier as it is known within the registry - unique only within the registry.
the idScheme is the URI of the registry itself.
the idSchemeName is the human-readable name of the id scheme.

Let's imagine the same business entity has another identifier in another business register:

{
      "id": "https://gln.gs1.org/1234567",
      "name": "ACME Industries",
      "idValue": "1234567",
      "idScheme": "gln.gs1.org",
      "idSchemeName": "Global Location Number"
}

Totally different values for every property. There is nothing that requires an entity to use its nationally registered legal entity name when creating a GLN with GS1 - it might choose something more like its trading name.

Could you clarify the problem here?

@Fak3
Contributor Author

Fak3 commented Aug 15, 2024

> Let's imagine the same business entity has another identifier in another business register

  1. Initially the application has two distinct graph nodes for the same entity:
{
      "id": "https://gln.gs1.org/1234567",
      "name": "ACME Industries",
      "idValue": "1234567",
      "idScheme": "gln.gs1.org",
      "idSchemeName": "Global Location Number"
}

and

{
      "id": "https://abr.business.gov.au/ABN/View?abn=90664869327",
      "name": "ACME Pty Ltd",
      "idValue": "90664869327",
      "idScheme": "abr.business.gov.au",
      "idSchemeName": "Australian Business Number"
}
  2. Some third-party registry then sends the application a statement of equality between the two identifiers (as they represent the same entity):
{
  "id": "https://abr.business.gov.au/ABN/View?abn=90664869327",
  "owl:sameAs": "https://gln.gs1.org/1234567"
}
  3. The application processes owl:sameAs according to its semantics, which requires copying all the properties from node 1 to node 2 and vice versa. It now has two nodes with all their properties merged:
{
      "id": "https://gln.gs1.org/1234567",
      "owl:sameAs": "https://abr.business.gov.au/ABN/View?abn=90664869327", 
      "name": ["ACME Industries", "ACME Pty Ltd"],
      "idValue": ["1234567", "90664869327"],
      "idScheme": ["gln.gs1.org", "abr.business.gov.au"],
      "idSchemeName": ["Global Location Number", "Australian Business Number"]
}

and

{
      "id": "https://abr.business.gov.au/ABN/View?abn=90664869327",
      "owl:sameAs": "https://gln.gs1.org/1234567", 
      "name": ["ACME Industries", "ACME Pty Ltd"],
      "idValue": ["1234567", "90664869327"],
      "idScheme": ["gln.gs1.org", "abr.business.gov.au"],
      "idSchemeName": ["Global Location Number", "Australian Business Number"]
}

Thus, the original intent of assigning idScheme to a single node with a particular identifier is violated by applying owl:sameAs according to its semantics, i.e., the resulting graph will contain the nonsense triple:
<https://gln.gs1.org/1234567> <idScheme> "abr.business.gov.au" .

The issue here is that, with the current data model, the intended separation between the distinctly identified nodes can't be ensured. In RDF, properties describe the entity itself, whereas we currently assume that properties describe the entity's identifier, violating that rule.

So my proposal is to separate identifier metadata into its own node, which explicitly describes an identifier of the original entity. In this proposal the identifier becomes a distinct entity (a graph node with type adms:Identifier), linked to the original entity (a graph node with type Party) via the adms:identifier property.
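A minimal Python sketch of why the separation helps, using the same kind of naive one-hop sameAs copying a simple reasoner would apply to plain tuples. The blank-node labels `_:gln` and `_:abn` and the `gs1:GLN` / `augov:ABN` system ids are hypothetical. After owl:sameAs is applied, each entity gains an extra `identifier` link, which is harmless, while every Identifier node still has exactly one `notation` and one `isPartOf`.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"


def propagate_same_as(graph: Set[Triple]) -> Set[Triple]:
    """Naively copy statements across owl:sameAs links in both directions."""
    links = [(s, o) for s, p, o in graph if p == SAME_AS]
    result = set(graph)
    for a, b in links:
        for s, p, o in graph:
            if p != SAME_AS and s in (a, b):
                result.add((b if s == a else a, p, o))
    return result


def values(graph: Set[Triple]) -> Dict[Tuple[str, str], Set[str]]:
    """Group objects by (subject, predicate) to spot multi-valued properties."""
    grouped: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for s, p, o in graph:
        grouped[(s, p)].add(o)
    return grouped


GLN = "https://gln.gs1.org/1234567"
ABN = "https://abr.business.gov.au/ABN/View?abn=90664869327"

graph = {
    (GLN, SAME_AS, ABN),
    (GLN, "identifier", "_:gln"),          # entity -> Identifier node
    ("_:gln", "notation", "1234567"),
    ("_:gln", "isPartOf", "gs1:GLN"),      # Identifier node -> IdentifierSystem
    (ABN, "identifier", "_:abn"),
    ("_:abn", "notation", "90664869327"),
    ("_:abn", "isPartOf", "augov:ABN"),
}

merged = values(propagate_same_as(graph))
print(sorted(merged[(GLN, "identifier")]))  # both Identifier nodes, which is fine
print(merged[("_:gln", "isPartOf")])        # still exactly one scheme
```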

@VladimirAlexiev

In addition, it makes sense to split out a separate IdentifierSystem node.
Otherwise different Identifier nodes can have inconsistent info about the scheme that they use.

Currently you have only 2 props:

      "idScheme": "abr.business.gov.au",
      "idSchemeName": "Australian Business Number"

but in the future you may have more, eg:

  • who issues it
  • where is its home page
  • what is the URL pattern for company pages in that register
  • is it unique, unambiguous, opaque, etc

You can read about it in the euBusinessGraph Semantic Data Model https://docs.google.com/document/d/1dhMOTlIOC6dOK_jksJRX0CB-GIRoiYY6fWtCnZArUhU/edit#heading=h.hofh07qhoz6m

@onthebreeze
Contributor

OK, I'll separate out the id scheme into its own class with its own "id", so that the graph will have a separate node for identity schemes.

@Fak3
Copy link
Contributor Author

Fak3 commented Aug 24, 2024

This issue still exists in the current published spec.
Example DPP from https://uncefact.github.io/spec-untp/docs/specification/DigitalProductPassport:

"issuer": {
   "type": "CredentialIssuer",
   "id": "did:web:identifiers.acme.com:12345",
   "name": "ACME industries",
   "otherIdentifiers": [{
      "type": "Entity",
      "id": "https://abr.business.gov.au/ABN/View?abn=90664869327",
      "name": "ACME Pty Ltd",
      "idValue": "90664869327",
      "idScheme": "abr.business.gov.au",
      "idSchemeName": "Australian Business Number"
    }]
},

Should be:

"issuer": {
   "type": "CredentialIssuer",
   "id": "did:web:identifiers.acme.com:12345",
   "name": "ACME industries",
   "identifier": [{
      "type": "Identifier",
      "notation": "https://abr.business.gov.au/ABN/View?abn=90664869327",
      "name": "ACME Pty Ltd",
      "isPartOf": {
           "id": "abr.business.gov.au/ABN",
           "type": "IdentifierSystem",
           "name": "Australian Business Number"
       }
    }]
},

@Fak3 Fak3 reopened this Aug 24, 2024
@onthebreeze
Contributor

I forgot to update the sample snippet on the page. But the model, schema, and sample at the top of the page are different; please check https://uncefact.github.io/spec-untp/assets/files/untp-digital-product-passport-v0.3.6-86c6ee585e0905f8871b40838616f9ff.json

@Fak3
Contributor Author

Fak3 commented Aug 24, 2024

> I forgot to update the sample snippet on the page. But the model, schema, and sample at the top of the page are different; please check https://uncefact.github.io/spec-untp/assets/files/untp-digital-product-passport-v0.3.6-86c6ee585e0905f8871b40838616f9ff.json

This sample has the same issue:

"issuer": {
    "type": [
      "CredentialIssuer"
    ],
    "id": "did:web:identifiers.example-company.com:12345",
    "name": "Example Company Pty Ltd",
    "otherIdentifiers": [
      {
        "type": [
          "Entity"
        ],
        "id": "https://business.gov.au/ABN/View?abn=1234567890",
        "name": "Sample Company Pty Ltd",
        "registeredId": "1234567890",
        "idScheme": {
          "type": [
            "IdentifierScheme"
          ],
          "id": "https://business.gov.au/ABN/",
          "name": "Australian Business Number"
        }
      }
    ]
  },

Should be:

"issuer": {
  "type": "CredentialIssuer",
  "id": "did:web:identifiers.example-company.com:12345",
  "name": "Example Company Pty Ltd",
  "identifier": [
    {
      "type": "Identifier",
      "notation": "https://business.gov.au/ABN/View?abn=1234567890",
      "name": "Sample Company Pty Ltd",        
      "isPartOf": {
        "type": "IdentifierSystem",
        "id": "https://business.gov.au/ABN-HTTP",
        "name": "Australian Business Number URL"
      }
    },
    {
      "type": "Identifier",
      "notation": "1234567890",
      "name": "Sample Company Pty Ltd",        
      "isPartOf": {
        "type": "IdentifierSystem",
        "id": "https://business.gov.au/ABN",
        "name": "Australian Business Number"
      }
    }
  ]
},

Also note that "https://business.gov.au/ABN-HTTP" and plain "https://business.gov.au/ABN" are different IdentifierSystems.
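A hypothetical Python sketch of why these count as two systems: a notation in the URL-based system has to be parsed to recover the bare number, and going the other way requires knowing the register's URL pattern. The pattern is taken from the sample above and has not been checked against the live register; the function names are illustrative only and are not part of any UNTP schema.

```python
from urllib.parse import parse_qs, urlsplit

# URL pattern copied from the sample above (an assumption, not verified).
ABN_URL_PREFIX = "https://business.gov.au/ABN/View?abn="


def abn_from_url_notation(notation: str) -> str:
    """Extract the bare ABN from a notation in the URL-based system."""
    query = parse_qs(urlsplit(notation).query)
    abns = query.get("abn")
    if not abns:
        raise ValueError(f"no abn parameter in {notation!r}")
    return abns[0]


def url_notation_from_abn(abn: str) -> str:
    """Build the URL-based notation for a bare ABN."""
    if not abn.isdigit():
        raise ValueError(f"ABN must contain digits only: {abn!r}")
    return ABN_URL_PREFIX + abn


print(abn_from_url_notation("https://business.gov.au/ABN/View?abn=1234567890"))
```

Because the two notations follow different parsing rules, each needs its own IdentifierSystem node, even though one can be derived from the other.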

@onthebreeze
Contributor

I think there might be a linked data architecture or strategy question behind this issue. I think it boils down to whether entities should be merged when some kind of equivalence is declared, or only when identifiers are exactly identical. If an entity with ID = abn.gov.au/123454567 declares "otherIdentifiers" like gs1.org/gln/9876543, does this mean they are the same and all data about abn.gov.au/123454567 and gs1.org/gln/9876543 should be merged?

  • @Fak3 suggests that the declaration of other identifiers is a declaration of equivalence (like "owl:sameAs") and that common practice will be to merge the two entities into one. And, when that is done, there will be logical inconsistencies.
  • my experience, on the other hand, is that this is a very common scenario (two different identifiers for what "might" be the same thing) and that merging is always bad practice. Government agencies face this exact scenario all the time and have learned the hard way not to merge, because it is then almost impossible to split.

In the example given, an ABN is an Australian national business tax registration number. A GLN is a GS1 identifier for a logistics location. In some cases where a business has only one operating location these two IDs could resolve to very similar things. But even so, a legal tax registration is not the same as a logistics location. Also, as soon as the business opens a second location and creates another GLN there will be far worse inconsistencies associated with any merge.

I suggest adding some words in the graphs section of UNTP to emphasise that metadata of two entities should only be merged when the declared identifiers are identical (e.g. two instances of gs1.org/gln/9876543), but never when two different identifiers are declared to be related and possibly equivalent.
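A minimal Python sketch of the policy suggested above, assuming records arrive as plain dicts with an `id` key. Property values are pooled only for records whose ids match exactly; a declared link to another identifier (here under a hypothetical `alsoKnownAs` key) is kept as ordinary data rather than treated as a reason to merge. The names and address values are made up for illustration.

```python
from typing import Dict, Iterable, Set

Node = Dict[str, Set[str]]


def merge_by_exact_id(records: Iterable[dict]) -> Dict[str, Node]:
    """Pool property values only for records whose "id" strings are identical.

    A link to another identifier (e.g. under "alsoKnownAs") is stored like
    any other property; it is never used to merge the two nodes.
    """
    graph: Dict[str, Node] = {}
    for record in records:
        node = graph.setdefault(record["id"], {})
        for key, value in record.items():
            if key == "id":
                continue
            values = value if isinstance(value, list) else [value]
            node.setdefault(key, set()).update(values)
    return graph


records = [
    {"id": "gs1.org/gln/9876543", "name": "ACME Warehouse"},
    {"id": "gs1.org/gln/9876543", "address": "1 Example Street"},
    {"id": "abn.gov.au/123454567", "name": "ACME Pty Ltd",
     "alsoKnownAs": ["gs1.org/gln/9876543"]},
]

graph = merge_by_exact_id(records)
print(sorted(graph))  # two nodes: the ABN entity and the GLN location
```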

@Fak3
Contributor Author

Fak3 commented Sep 2, 2024

Section "9.2 Equivalence" of the GS1 Digital Link standard https://www.gs1.org/docs/Digital-Link/GS1_Digital_link_Standard_i1.1.pdf mentions the use of owl:sameAs:
[Screenshot: section 9.2 Equivalence]

Section "7.2 Decompression" mentions it as well:
[Screenshot: section 7.2 Decompression]

@philarcher @mgh128 Can you please tell us whether you see a realistic scenario for a single product referenced by several different GS1 links? For example, issuer1 uses a compressed link and issuer2 uses an uncompressed GTIN+batch link. Could both links reference the same product in two different Product Passports or Conformity Credentials?

If we forbid processing owl:sameAs on the verifier side, we must document that clearly in the spec. This restriction will force verifiers to do one or a combination of the following:

  1. construct more complex SPARQL queries
  2. strip off incoming owl:sameAs statements
  3. apply custom graph merging rules (strip off or replace idScheme property)
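A minimal Python sketch of options 2 and 3, assuming the incoming data has already been reduced to plain (subject, predicate, object) tuples. Which predicates count as identifier-scoped is an assumption the verifier would have to hard-code, which is part of the burden described above. The graph mirrors the GLN/ABN example earlier in this thread.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"

# Predicates assumed to describe one particular identifier rather than the entity.
# Choosing this set is itself a verifier-side assumption.
IDENTIFIER_SCOPED = {"idValue", "idScheme", "idSchemeName"}


def strip_same_as(graph: Iterable[Triple]) -> Set[Triple]:
    """Option 2: drop incoming owl:sameAs statements before any inference."""
    return {t for t in graph if t[1] != SAME_AS}


def merge_same_as(graph: Set[Triple], skip: Set[str] = IDENTIFIER_SCOPED) -> Set[Triple]:
    """Option 3: copy statements across owl:sameAs, except identifier-scoped ones."""
    links = [(s, o) for s, p, o in graph if p == SAME_AS]
    result = set(graph)
    for a, b in links:
        for s, p, o in graph:
            if p == SAME_AS or p in skip or s not in (a, b):
                continue
            result.add((b if s == a else a, p, o))
    return result


GLN = "https://gln.gs1.org/1234567"
ABN = "https://abr.business.gov.au/ABN/View?abn=90664869327"
graph = {
    (ABN, SAME_AS, GLN),
    (GLN, "name", "ACME Industries"),
    (GLN, "idScheme", "gln.gs1.org"),
    (ABN, "name", "ACME Pty Ltd"),
    (ABN, "idScheme", "abr.business.gov.au"),
}

merged = merge_same_as(graph)
print(sorted(o for s, p, o in merged if s == GLN and p == "idScheme"))  # ['gln.gs1.org']
```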

@jgmikael

jgmikael commented Sep 2, 2024

Hi, in the EU context the Nordic business register authority cooperation advocates the use of adms:Identifier but in the way it's been modeled in the EU Core Vocabularies:
https://semiceu.github.io/Core-Business-Vocabulary/releases/2.2.0/#Identifier

Here all the attributes are directly properties of the Identifier class...
"Properties > For this entity the following properties are defined: date of issue , identifies , notation , schema agency , scheme name , scheme URI ."

Then again there's a reference to "the UN/CEFACT class with the same name" - but in the UN/CEFACT CCL these attributes are actually included in the uDT (IdentifierType) - and there they are indeed mostly properties of the "Identification Scheme".

An example of a Finnish implementation of EU Core Business Vocabularies > "Identifier"
https://tietomallit.suomi.fi/en/model/isa2core/class/Identifier?ver=0.1.0 - this is the OWL Vocabulary, from which SHACL Shapes are derived by reuse.... RDF version of the whole vocabulary: https://tietomallit.suomi.fi/api/getModelAsFile?modelId=isa2core&fileType=RDF&raw=true&version=0.1.0&language=en

@onthebreeze
Contributor

The GS1 doc basically says "use owl:sameAs" with care - only when you are really sure the two different identifiers refer to the same thing.

UNTP is not going to specify owl:sameAs in any scenario - that's a choice of the processor using vocabularies we don't control

UNTP can only recommend "If two things have different identifiers then don't merge them". I really don't understand why this is even a contentious issue.

@Fak3
Contributor Author

Fak3 commented Sep 2, 2024

> The GS1 doc basically says "use owl:sameAs" with care - only when you are really sure the two different identifiers refer to the same thing.
>
> UNTP is not going to specify owl:sameAs in any scenario - that's a choice of the processor using vocabularies we don't control
>
> UNTP can only recommend "If two things have different identifiers then don't merge them". I really don't understand why this is even a contentious issue.

We don't really leave a choice to the processors. If they receive and accept equivalence statements, they face the inconsistent graph, as described in the comments above. So to prevent inconsistent processing we must either fix data model as proposed here, or document specific guidance how to deal with inconsistency.

@onthebreeze
Contributor

Ok fair enough. But where are we (ie UNTP) making any equivalence statement using owl:sameAs ? Is there an assumption that "otherIdentifiers" or "alsoKnownAs" should be interpreted as "owl:sameAs"?

@Fak3
Copy link
Contributor Author

Fak3 commented Sep 2, 2024

> Ok fair enough. But where are we (ie UNTP) making any equivalence statement using owl:sameAs ? Is there an assumption that "otherIdentifiers" or "alsoKnownAs" should be interpreted as "owl:sameAs"?

We do reference the use of GS1 Digital Link in our IdentityResolver.md.

The GS1 Digital Link spec mandates the owl:sameAs relationship between short (compressed) and full links.

@Fak3
Contributor Author

Fak3 commented Sep 2, 2024

> Hi, in the EU context the Nordic business register authority cooperation advocates the use of adms:Identifier but in the way it's been modeled in the EU Core Vocabularies:
> https://semiceu.github.io/Core-Business-Vocabulary/releases/2.2.0/#Identifier
>
> Then again there's a reference to "the UN/CEFACT class with the same name" - but in the UN/CEFACT CCL these attributes are actually included in the uDT (IdentifierType) - and there they are indeed mostly properties of the "Identification Scheme".

Thank you! I believe it is important to align with EU Core business vocabulary and UN/CEFACT CCL. We should reuse same data model and have dedicated Identifier class for identifier metadata.

@onthebreeze
Copy link
Contributor

The uncefact ccl defines an identifier data type that mixes both the ID of the entity and the ID of the identifier scheme in one class. Which is what I thought was exactly what you are objecting to?

@Fak3
Contributor Author

Fak3 commented Sep 2, 2024

> The uncefact ccl defines an identifier data type that mixes both the ID of the entity and the ID of the identifier scheme in one class. Which is what I thought was exactly what you are objecting to?

No. As we discussed on Slack, separating IdentifierSystem metadata is not that important, as it does not lead to an inconsistent graph.

@onthebreeze
Contributor

But surely we are talking about two different questions here. Short and full links are just different technical representations of exactly the same registry entry. There's not even a merge question here because they both point to the exact same entry.

But whether or not to merge two entries across two different registers is not the same question.

@Fak3
Contributor Author

Fak3 commented Sep 2, 2024

> But surely we are talking about two different questions here. Short and full links are just different technical representations of exactly the same registry entry. There's not even a merge question here because they both point to the exact same entry.
>
> But whether or not to merge two entries across two different registers is not the same question.

"http://example.org/gtin/054123450013/lot/ABC%26%2B123?3103=000189&3923=2172"
And "(3103)000189(01)05412345000013(3923)2172(10)ABC&+123" reference the same product but follow different identifier schemes with different parsing rules, so it is incorrect to say that the first has the compressed identifier schema and the second has a full one

@onthebreeze
Contributor

Those are not different schemes. They are both the same GTIN. Just different technical representations of the same thing. "Scheme" does not refer to a technical syntax. A different scheme means a different register (like ABN vs GLN).

@Fak3
Contributor Author

Fak3 commented Sep 2, 2024

> Those are not different schemes. They are both the same GTIN. Just different technical representations of the same thing. "Scheme" does not refer to a technical syntax. A different scheme means a different register (like ABN vs GLN).

As an example of the same scheme across different registries, there is a European Union open data endpoint: https://data.europa.eu/data/sparql?locale=en

Querying it with SELECT distinct ?d ?x WHERE { ?d owl:sameAs ?x . ?d a vcard:Organization . } LIMIT 1000
reveals that

<http://data.brreg.no/enhetsregisteret/enhet/950037687> <owl:sameAs> <https://register.geonorge.no/organisasjoner/norsk-institutt-for-naturforskning> 

i.e. there are two registries: 1. data.brreg.no 2. register.geonorge.no
And they state that those two organization identifiers are strictly equivalent.

Do we ever encounter such cases in untp?

@mgh128

mgh128 commented Sep 2, 2024

> But surely we are talking about two different questions here. Short and full links are just different technical representations of exactly the same registry entry. There's not even a merge question here because they both point to the exact same entry.
>
> But whether or not to merge two entries across two different registers is not the same question.
>
> "http://example.org/gtin/054123450013/lot/ABC%26%2B123?3103=000189&3923=2172"
> And "(3103)000189(01)05412345000013(3923)2172(10)ABC&+123" reference the same product but follow different identifier schemes with different parsing rules, so it is incorrect to say that the first has the compressed identifier schema and the second has a full one

http://example.org/01/054123450013/10/ABC%26%2B123?3103=000189&3923=2172

is an example of a fully uncompressed GS1 Digital Link URI.

(3103)000189(01)05412345000013(3923)2172(10)ABC&+123

is an example of a corresponding GS1 element string using parentheses around the GS1 Application Identifiers. It is not a GS1 Digital Link URI nor any kind of compressed format.

It would be OK to express an owl:sameAs relationship between an uncompressed GS1 Digital Link URI and the exactly equivalent compressed or partially compressed GS1 Digital Link URIs but only if they identify the same thing, i.e. if the fully/partially compressed GS1 Digital Link URI encodes the same combination of GS1 Application Identifiers and their values.
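To make the "same combination of GS1 Application Identifiers and their values" test concrete, here is a rough Python sketch that reduces an uncompressed GS1 Digital Link URI and a parenthesised GS1 element string to AI → value maps and compares them. It is only a sketch under stated assumptions: the URI path is assumed to consist solely of /AI/value pairs (no extra leading path segments), compressed URIs are not handled, there is no check-digit or AI-table validation, and AI 01 is simply left-padded to 14 digits as a simplistic GTIN normalisation. The example URI is hypothetical and uses the 14-digit GTIN from the element string.

```python
import re
from typing import Dict
from urllib.parse import parse_qsl, unquote, urlsplit

# Matches "(AI)value" pairs; a value runs until the next "(".
ELEMENT_RE = re.compile(r"\((\d{2,4})\)([^(]*)")


def _normalise(ais: Dict[str, str]) -> Dict[str, str]:
    """Left-pad the GTIN (AI 01) to 14 digits so shorter GTIN forms compare equal."""
    if "01" in ais:
        ais["01"] = ais["01"].zfill(14)
    return ais


def parse_element_string(text: str) -> Dict[str, str]:
    """Parse '(AI)value(AI)value...' into an AI -> value map."""
    return _normalise({ai: value for ai, value in ELEMENT_RE.findall(text)})


def parse_uncompressed_dl(uri: str) -> Dict[str, str]:
    """Parse an uncompressed GS1 Digital Link URI into an AI -> value map."""
    parts = urlsplit(uri)
    segments = [unquote(s) for s in parts.path.split("/") if s]
    if len(segments) % 2:
        raise ValueError(f"odd number of path segments in {uri!r}")
    ais = dict(zip(segments[0::2], segments[1::2]))
    ais.update(parse_qsl(parts.query))  # query-string AIs, e.g. 3103=000189
    return _normalise(ais)


def may_declare_same_as(uri: str, element_string: str) -> bool:
    """True only if both forms encode the same AIs with the same values."""
    return parse_uncompressed_dl(uri) == parse_element_string(element_string)


URI = "https://example.org/01/05412345000013/10/ABC%26%2B123?3103=000189&3923=2172"
ES = "(3103)000189(01)05412345000013(3923)2172(10)ABC&+123"
print(may_declare_same_as(URI, ES))
```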

In practice, GS1 does not currently recommend the use of fully compressed or partially compressed GS1 Digital Link URIs within 2D barcodes for products. In most situations, a cautious use of upper-case alphanumeric characters and very few symbol characters enables efficient QR encoders to use the "alphanumeric" mode rather than "binary/byte" mode and this typically achieves an equivalent reduction in size of QR Code without the complexity of handling compression or decompression.

I would expect that when GS1 Digital Link URIs are used within Linked Data or within Verifiable Credentials, it would be the fully uncompressed format, without any compression.

I hope this helps.

@VladimirAlexiev

I think the discussion went astray.
The gist is that we need different classes for entity and Identifier .
As for IdentifierSystem, that is optional, depending on whether we need to:

  • express characteristics of the system that don't belong to an individual Identifier (I.e. beyond dateIssued, dateExpiry). Eg the GLEI RAL ID is such.
  • express when one agency issues different kinds of identifiers. That's quite common and you can see examples in GLEI RAL

@onthebreeze
Contributor

I still do not understand. Every node in a graph needs an identifier. So if there is an Entity class, it must have an id. Separately, that id may be issued under a governed scheme, which itself has an id.

So I understand if the requirement is "entities (with id) should be a separate class from identifierScheme (with its own id)". But I don't understand what it means to say "we need separate classes for entity and identifier". If the id of an entity is in a separate class, then what is the id of the entity class??

@Fak3
Contributor Author

Fak3 commented Sep 4, 2024

@onthebreeze a graph node must have exactly one primary id. It can also have additional identifiers associated with it via properties.

An instance of a Product class, as an abstract concept, can be independently described by multiple parties (issuers), and each party can independently choose a different primary id for the graph node that represents that same Product instance in its own separate graph.

The verifier receives those separate graphs plus some additional data suggesting that those parties did indeed choose different primary ids for the same Product instance. Knowing that those ids refer to the same entity, the verifier can correctly apply business rules, treating properties of those separate graph nodes as if they belong to the same entity.

So one issuer chose one identifier of an entity and promoted it to be the graph node's primary id. Then, as currently suggested, the issuer associates idScheme and all other product properties with this primary id. Note that all of these properties describe a particular physical product instance, except for idScheme, which describes the arbitrarily chosen primary id of a node in the graph.

Now the verifier attempting to apply business logic must be careful, because it faces data with physical properties of a product, which do not depend on the issuer's choice, mixed with multiple idScheme properties, each of which is valid only for one particular issuer's choice of the node's primary id.
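One way a verifier could implement the grouping step described above is to compute equivalence classes of primary ids from the equality statements it accepts (sameAs-style equality is symmetric and transitive), then apply entity-level business rules per class while keeping identifier metadata attached to the individual ids. A minimal union-find sketch in Python; the ids and the `equalities` input format are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


class IdClasses:
    """Union-find over node ids that are declared to denote the same entity."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, node: str) -> str:
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while self.parent[node] != root:
            self.parent[node], node = root, self.parent[node]
        return root

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


def group_ids(ids: Iterable[str], equalities: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Return the sets of primary ids that refer to the same entity."""
    classes = IdClasses()
    for node in ids:
        classes.find(node)  # register ids that have no equality statement
    for a, b in equalities:
        classes.union(a, b)
    groups: Dict[str, Set[str]] = {}
    for node in list(classes.parent):
        groups.setdefault(classes.find(node), set()).add(node)
    return list(groups.values())


groups = group_ids(
    ["a:product-1", "b:item-77", "c:gtin-123", "d:other"],
    [("a:product-1", "b:item-77"), ("b:item-77", "c:gtin-123")],
)
print(sorted(sorted(g) for g in groups))
```

Entity properties (weight, material, and so on) can then be evaluated across each group, whereas anything like idScheme stays scoped to the single id it was stated for.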

@Fak3
Contributor Author

Fak3 commented Sep 16, 2024

I have just stumbled upon a recommendation to use the adms:identifier property to link DCAT datasets in section 5.1.2 of the report on the Open government data ecosystem in Europe.

[Screenshot: section 5.1.2 of the report]

@mgh128

mgh128 commented Sep 16, 2024 via email

@onthebreeze
Contributor

I think the outcome of all this is

  • Identifier and IdentifierScheme should be different node objects - this is now the case.
  • Implementers should not assume that multiple identifiers listed for a given product/facility/organisation are semantically identical. AlsoKnownAs is not the same as owl:sameAs.

Closing this unless there are objections.


5 participants