DCAT property for subsets ? #1527

Open
dzkwsk opened this issue Aug 12, 2022 · 8 comments
Labels
dcat feedback Issues stemming from external feedback to the WG future-work issue deferred to the next standardization round

Comments

@dzkwsk
dzkwsk commented Aug 12, 2022

The content of a statistical classification evolves over time with explanatory notes for items that may change slightly and have successive versions. While it is important to point directly to the current version of all items in the classification, it is also relevant to obtain the history of all items. It would therefore be useful to be able to distinguish two dcat:distributions which correspond to the current notes of a classification on the one hand and to the whole history on the other.

dcat:hasCurrentVersion may not be exactly what we need. This property could separate different versions that correspond to the same master object. In our case, it is more of a content restriction to the latest versions of the notes.

A subproperty of dcat:distribution whose scope is a subset corresponding to the current contents of a dcat:Dataset would be relevant. Or would dcat:hasCurrentVersion still make sense anyway?
Deliverable(s): XKOS Best Practices

http://linked-statistics.github.io/xkos/xkos-best-practices.html#issue-container-number-12

@smrgeoinfo
Contributor

If a classification system (A) is coherent and covering (non-overlapping, no gaps) for its scope, then if an individual class (C1) in the system is updated such that it changes the classification of other entities, then the update is breaking, and the classification system with the new concept (C1) MUST be identified as a new classification system (B). The issue is that if an entity is a member of class x under system A, it is not necessarily a member of class x under system B.

Versioning of classification systems is very tricky!
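
As a rough illustration of the rule above (not part of DCAT or XKOS), the following minimal sketch treats a classification system as a mapping from entities to classes and flags an update as breaking when any entity ends up in a different class. The entity names, class codes, and function names are all hypothetical.

```python
def reclassified_entities(assignments_a, assignments_b):
    """Return the entities whose class differs between system A and the update."""
    return {
        entity
        for entity, cls in assignments_a.items()
        if assignments_b.get(entity) != cls
    }


def is_breaking(assignments_a, assignments_b):
    """An update is breaking if it reclassifies (or drops) at least one entity."""
    return bool(reclassified_entities(assignments_a, assignments_b))


# Hypothetical classification system A: entity -> class code.
system_a = {"bakery": "C1", "brewery": "C2", "winery": "C2"}

# Only explanatory notes changed: every entity keeps its class.
notes_only_update = dict(system_a)

# C1 was redefined so that breweries now fall under it.
redefined_c1 = {"bakery": "C1", "brewery": "C1", "winery": "C2"}

print(is_breaking(system_a, notes_only_update))  # False
print(is_breaking(system_a, redefined_c1))       # True
```

Under this reading, only the second update would need to be identified as a new classification system (B).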

@dr-shorthair
Contributor

Correct @smrgeoinfo

The statistical agencies are generally all across this. However, DCAT is probably incomplete, since I don't think we had anyone with expertise in official statistics in the conversations.

@nicholascar
Contributor

I urge the DCAT editors to defer fine-grained versioning & change details to Dataset modelling and not to try to cater for them at the DCAT metadata level.

Consider: a large dataset like the Australian Address Database has addresses added and removed every few months, so should it have a long list of Distributions? No! The Dataset is the overall thing and Address addition/removal/change is annotated at the Feature (sub-Dataset) level since it is complex and knowledge about what an 'Address' is - a sub-Dataset element - is needed to correctly use such information.

My motivation for stepping in here is that I would hate to see DCAT get too expressive: more skill in the vocabulary will harm adoption for simple catalogues, given the perception of it being "heavyweight", and broad adoption is more important to me than deep skill.

Anyway, there are already many Semantic Web ways to model versioning issues (e.g. PAV) that are DCAT-compatible. So use DCAT for the catalogue and drop down into fine-grained versioning in PAV, SDMX/QB etc. as needed.

@tfrancart

> If a classification system (A) is coherent and covering (non-overlapping, no gaps) for its scope, if an individual class (C1) in the system is updated such that it changes the classification of other entities, then the update is breaking, and the classification system with the new concept (C1) MUST be identified as a new classification system (B)

I agree, but this is NOT what the original use-case describes. The use-case is that a class in a given classification system A is described with explanatory notes, and these explanatory notes change over time, but this does not lead to a reclassification of entities, so we are not creating a new classification system B. And the history of the notes is kept.

So basically, the question is what would be the recommended practice between:

  1. Keeping a single Dataset with multiple distributions: some with the full note history, others with only the latest version of each note. In that case, how do we identify/link/tag the note-history-complete distributions versus the only-current-note-version distributions?
  2. Declaring 2 different Datasets, ex:classificationA-WithFullNoteHistory and ex:classificationA-WithOnlyMostCurrentNotes. In that case, how do we identify/link/tag those 2 datasets?

For more details and regarding what XKOS suggests in terms of versioning of notes in statistical classification, see http://linked-statistics.github.io/xkos/xkos-best-practices.html#bp-notes-versioning-timestamping
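
To make the two options above concrete, here is a minimal sketch that writes each one as a list of RDF-style triples, using plain Python tuples rather than an RDF library. The `ex:` distribution identifiers and the `rdfs:comment` texts are assumptions made for illustration; only `dcat:Dataset`, `dcat:Distribution`, `dcat:distribution`, and `rdfs:comment` are existing vocabulary terms, and the comment is just a placeholder for whatever discriminating property is eventually recommended.

```python
RDF_TYPE = "rdf:type"

# Option 1: one dataset, two distributions told apart only by a free-text comment.
option_1 = [
    ("ex:classificationA", RDF_TYPE, "dcat:Dataset"),
    ("ex:classificationA", "dcat:distribution", "ex:classificationA-fullHistory"),
    ("ex:classificationA", "dcat:distribution", "ex:classificationA-currentNotes"),
    ("ex:classificationA-fullHistory", RDF_TYPE, "dcat:Distribution"),
    ("ex:classificationA-fullHistory", "rdfs:comment",
     '"All notes, including superseded versions"'),
    ("ex:classificationA-currentNotes", RDF_TYPE, "dcat:Distribution"),
    ("ex:classificationA-currentNotes", "rdfs:comment",
     '"Only the most recent note of each item"'),
]

# Option 2: two datasets. No triple relates them here, because which
# property should link them is exactly the open question.
option_2 = [
    ("ex:classificationA-WithFullNoteHistory", RDF_TYPE, "dcat:Dataset"),
    ("ex:classificationA-WithOnlyMostCurrentNotes", RDF_TYPE, "dcat:Dataset"),
]


def instances_of(triples, cls):
    """Subjects declared as instances of the given class, sorted by name."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == cls)


def to_statements(triples):
    """Render triples as Turtle-like statements (prefix declarations omitted)."""
    return [f"{s} {p} {o} ." for s, p, o in triples]


print(instances_of(option_1, "dcat:Distribution"))
print("\n".join(to_statements(option_2)))
```

In option 1 a reuser finds both variants from a single dataset entry; in option 2 the relationship between the two datasets has to be stated separately.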

@smrgeoinfo
Contributor

Just my opinion, but to me, if the changes do not cause reclassification of entities (or introduction of new subcategories), then it would make sense to me to have one distribution with all the notes (assuming they are time stamped in some way).

@tfrancart

> Just my opinion, but to me, if the changes do not cause reclassification of entities (or introduction of new subcategories), then it would make sense to me to have one distribution with all the notes (assuming they are time stamped in some way).

Yes, notes are timestamped (see the link sent previously for details).
Yes, we want to have one distribution with all the notes.
But we are also considering providing another distribution with only the most recent note of each concept, and not the full note history; we think it would be easier for data consumers if they don't have to query the note history simply to retrieve the current note.

And so the question is: would this be 2 distributions of the same dataset, or 2 datasets (which may not be practical for reusers)? And how do we identify/link/tag those distributions or datasets?
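
For readers wondering how the "most recent note only" variant relates to the full history, the sketch below derives one from the other, assuming each note carries a concept identifier and a timestamp. The record layout, concept identifiers, and note texts are hypothetical and do not reflect the actual XKOS serialization.

```python
from datetime import date

# Hypothetical note history: (concept, date the note was issued, note text).
note_history = [
    ("ex:C1", date(2019, 1, 1), "Includes retail bakeries."),
    ("ex:C1", date(2022, 6, 1), "Includes retail and wholesale bakeries."),
    ("ex:C2", date(2020, 3, 15), "Includes breweries."),
]


def current_notes(history):
    """Keep only the most recently issued note for each concept."""
    latest = {}
    for concept, issued, text in history:
        if concept not in latest or issued > latest[concept][0]:
            latest[concept] = (issued, text)
    return {concept: text for concept, (_, text) in latest.items()}


for concept, text in current_notes(note_history).items():
    print(concept, "->", text)
```

Seen this way, the current-notes variant is a subset that can be computed from the full-history variant, which is one argument for treating both as distributions of the same dataset.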

agbeltran added the dcat label Sep 12, 2022
andrea-perego added the feedback (Issues stemming from external feedback to the WG) label Oct 24, 2022

@riccardoAlbertoni
Contributor

Before studying this further, I noticed that
http://linked-statistics.github.io/xkos/xkos-best-practices.html#issue-container-number-12 seems to have disappeared from your draft document. Should we assume you have already resolved your doubts?

Can we close this issue?

@tfrancart
tfrancart commented Nov 15, 2022

For the moment we have simply deferred the answer, and we have actually referred to this very issue; here is how the XKOS best practices document now reads:

> These versions containing the history of all notes are considered as different distributions of the datasets. They should be described by specific properties. Here we use rdfs:comment for describing and discriminating the distributions, given a lack of other modeling alternatives in DCAT (as of september 2022; this was raised in this DCAT issue). Other information is given by the dcterms:temporal metadata that has a larger span for the distributions containing all the note changes.

Exact pointer: http://linked-statistics.github.io/xkos/xkos-best-practices.html#bp-publishing-classification

So no, I don't think the issue should be closed.
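
The dcterms:temporal workaround quoted above can be read as a heuristic: the distribution whose temporal coverage spans the longest period is the one carrying the full note history. A minimal sketch of that heuristic follows, assuming each distribution's period has already been extracted as a start and end date; the distribution identifiers and dates are invented for illustration.

```python
from datetime import date

# Hypothetical temporal coverage per distribution: (start, end).
distributions = {
    "ex:classificationA-currentNotes": (date(2022, 1, 1), date(2022, 12, 31)),
    "ex:classificationA-fullHistory": (date(2008, 1, 1), date(2022, 12, 31)),
}


def span_days(period):
    """Length of a (start, end) period in days."""
    start, end = period
    return (end - start).days


def widest_temporal_coverage(dists):
    """Name of the distribution whose temporal coverage is the longest."""
    return max(dists, key=lambda name: span_days(dists[name]))


print(widest_temporal_coverage(distributions))
# ex:classificationA-fullHistory
```

The heuristic only works by convention between publisher and consumer, which is why a dedicated property would be preferable.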

andrea-perego changed the title from "dcat property for subsets ?" to "DCAT property for subsets ?" Nov 23, 2022
andrea-perego added this to the DCAT3 CR milestone Nov 23, 2022
davebrowning added the future-work (issue deferred to the next standardization round) label Feb 13, 2023
9 participants