DCAT property for subsets ? #1527

dzkwsk · 2022-08-12T13:57:00Z

The content of a statistical classification evolves over time: the explanatory notes for its items may change slightly and have successive versions. While it is important to point directly to the current version of all items in the classification, it is also relevant to obtain the history of all items. It would therefore be useful to be able to distinguish two dcat:Distributions, one corresponding to the current notes of a classification and the other to their whole history.

dcat:hasCurrentVersion may not be exactly what we need. That property separates different versions that correspond to the same master object, whereas in our case it is more a content restriction to the latest versions of the notes.

A subproperty of dcat:distribution whose scope is a subset corresponding to the current contents of a dcat:Dataset would be relevant. Or would dcat:hasCurrentVersion still make sense anyway?
Deliverable(s): XKOS Best Practices

http://linked-statistics.github.io/xkos/xkos-best-practices.html#issue-container-number-12

smrgeoinfo · 2022-08-12T19:00:47Z

If a classification system (A) is coherent and covering (non-overlapping, no gaps) for its scope, and an individual class (C1) in the system is updated such that it changes the classification of other entities, then the update is breaking, and the classification system with the new concept (C1) MUST be identified as a new classification system (B). The issue is that if an entity is a member of class x under system A, it is not necessarily a member of class x under system B.

Versioning of classification systems is very tricky!
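To make the rule concrete, here is a minimal Python sketch (not taken from any standard; the function name and the entity-to-class mapping are hypothetical) that treats an update as breaking when any entity classified under system A ends up in a different class, or in no class, under the revised system.

```python
from typing import Mapping


def is_breaking_update(
    old_assignment: Mapping[str, str],
    new_assignment: Mapping[str, str],
) -> bool:
    """Return True if any entity classified under the old system ends up in a
    different class (or in no class) under the updated system."""
    # Inputs are hypothetical: entity identifier -> class code.
    return any(
        new_assignment.get(entity) != old_class
        for entity, old_class in old_assignment.items()
    )


if __name__ == "__main__":
    system_a = {"firm-1": "C10", "firm-2": "C11"}
    # Explanatory notes reworded, nobody moves: not breaking.
    notes_reworded = dict(system_a)
    # C10 redefined so that firm-2 now falls under it: breaking, needs system B.
    c10_redefined = {"firm-1": "C10", "firm-2": "C10"}
    print(is_breaking_update(system_a, notes_reworded))  # False
    print(is_breaking_update(system_a, c10_redefined))   # True
```

Under this reading, a change that only rewords explanatory notes leaves every assignment intact and so does not require a new system B.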

dr-shorthair · 2022-08-14T06:07:03Z

Correct @smrgeoinfo

The statistical agencies are generally all across this. However DCAT is probably incomplete since I don't think we had anyone with expertise in official statistics in the conversations.

nicholascar · 2022-08-14T23:35:52Z

I urge the DCAT editors to defer fine-grained versioning and change details to Dataset modelling and not to try to cater for them at the DCAT metadata level.

Consider: a large dataset like the Australian Address Database has addresses added and removed every few months, so should it have a long list of Distributions? No! The Dataset is the overall thing, and Address addition/removal/change is annotated at the Feature (sub-Dataset) level, since it is complex and knowledge about what an 'Address' is (a sub-Dataset element) is needed to correctly use such information.

My motivation for stepping in here is that I would hate to see DCAT get too expressive: more expressiveness in the vocabulary will harm adoption for simple catalogues, given the perception of it being "heavyweight", and broad adoption is more important to me than deep expressiveness.

Anyway, there are already many Semantic Web ways to model versioning issues (e.g. PAV) that are DCAT-compatible. So use DCAT for the catalogue and drop down into fine-grained versioning in PAV, SDMX/QB etc. as needed.

tfrancart · 2022-08-16T09:08:22Z

> If a classification system (A) is coherent and covering (non-overlapping, no gaps) for its scope, if an individual class (C1) in the system is updated such that it changes the classification of other entities, then the update is breaking, and the classification system with the new concept (C1) MUST be identified as a new classification system (B)

I agree, but this is NOT what the original use case describes. The use case is that a class in a given classification system A is described with explanatory notes, and these explanatory notes change over time, but this does not lead to a reclassification of entities, so we are not creating a new classification system B. And the history of the notes is kept.

So basically, the question is which of these would be the recommended practice:

1. Keeping a single Dataset with multiple distributions, some with the full note history and some without it (just the latest version of each note). In that case, how should the note-history-complete distributions be identified/linked/tagged versus the only-current-note-version distributions?
2. Declaring 2 different Datasets, ex:classificationA-WithFullNoteHistory and ex:classificationA-WithOnlyMostCurrentNotes. In that case, how should those 2 datasets be identified/linked/tagged?

For more details on what XKOS suggests regarding the versioning of notes in statistical classifications, see http://linked-statistics.github.io/xkos/xkos-best-practices.html#bp-notes-versioning-timestamping

smrgeoinfo · 2022-08-16T17:46:20Z

Just my opinion, but to me, if the changes do not cause reclassification of entities (or introduction of new subcategories), then it would make sense to me to have one distribution with all the notes (assuming they are time stamped in some way).

tfrancart · 2022-08-17T11:10:50Z

> Just my opinion, but to me, if the changes do not cause reclassification of entities (or introduction of new subcategories), then it would make sense to me to have one distribution with all the notes (assuming they are time stamped in some way).

Yes, notes are timestamped (see the link I sent previously for details).
Yes, we want to have one distribution with all the notes.
But we are also considering providing another distribution with only the most recent note of each concept, and not the full note history; we think it can be easier for data consumers if they don't have to query the note history simply to retrieve the current note.
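Deriving the "current notes only" distribution from the timestamped history amounts to keeping, for each concept, the note with the latest timestamp. A minimal Python sketch, assuming a hypothetical `Note` record with a `valid_from` date (the field names are illustrative, not the XKOS model):

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Note:
    concept: str      # hypothetical concept identifier, e.g. a class code
    text: str
    valid_from: date  # assumed timestamp of this note version


def current_notes(history: list[Note]) -> dict[str, Note]:
    """Keep only the most recent note version for each concept."""
    latest: dict[str, Note] = {}
    for note in history:
        kept = latest.get(note.concept)
        # Replace the kept version only when this one is strictly newer.
        if kept is None or note.valid_from > kept.valid_from:
            latest[note.concept] = note
    return latest


if __name__ == "__main__":
    history = [
        Note("A", "note A, v1", date(2020, 1, 1)),
        Note("A", "note A, v2", date(2022, 6, 1)),
        Note("B", "note B, v1", date(2021, 3, 1)),
    ]
    current = current_notes(history)
    print(current["A"].text)  # note A, v2
    print(len(current))       # 2
```

Publishing the result as a separate distribution spares consumers this step, which is the convenience argued for above.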

So the question is: would this be 2 distributions of the same dataset, or 2 datasets (though this may not be practical for reusers)? And how should those distributions or datasets be identified, linked, or tagged?

riccardoAlbertoni · 2022-11-15T17:19:59Z

Before studying this further, I noticed that http://linked-statistics.github.io/xkos/xkos-best-practices.html#issue-container-number-12 seems to have disappeared from your draft document. Should we assume you have already resolved your doubts?

Can we close this issue?

tfrancart · 2022-11-15T21:29:43Z

For the moment we have simply deferred the answer, and we have actually referred to this very issue; here is how the XKOS best practices document now reads:

> These versions containing the history of all notes are considered as different distributions of the datasets. They should be described by specific properties. Here we use rdfs:comment for describing and discriminating the distributions, given a lack of other modeling alternatives in DCAT (as of September 2022; this was raised in this DCAT issue). Other information is given by the dcterms:temporal metadata that has a larger span for the distributions containing all the note changes.

Exact pointer: http://linked-statistics.github.io/xkos/xkos-best-practices.html#bp-publishing-classification

So no, I don't think the issue should be closed.
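The interim approach quoted from the XKOS document (one dataset, two distributions told apart by rdfs:comment, with a wider dcterms:temporal span on the full-history one) could be serialized roughly as below. This is a hedged Python sketch that only builds Turtle text; the ex: IRIs, comment strings and dates are placeholders, and using dcat:startDate/dcat:endDate on a dcterms:PeriodOfTime is one way to express the period, not necessarily what the XKOS document does.

```python
from datetime import date

# Placeholder namespace and IRIs; ex: is not a real publication namespace.
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/> .\n"
)


def distribution(name: str, comment: str, start: date, end: date) -> str:
    """Turtle for one dcat:Distribution, discriminated by rdfs:comment and
    by the span of its dcterms:temporal period."""
    return (
        f"ex:{name} a dcat:Distribution ;\n"
        f'    rdfs:comment "{comment}"@en ;\n'
        "    dcterms:temporal [ a dcterms:PeriodOfTime ;\n"
        f'        dcat:startDate "{start.isoformat()}"^^xsd:date ;\n'
        f'        dcat:endDate "{end.isoformat()}"^^xsd:date ] .\n'
    )


def catalogue_entry() -> str:
    """One dataset with a current-notes and a full-history distribution."""
    return "\n".join([
        PREFIXES,
        "ex:classificationA a dcat:Dataset ;\n"
        "    dcat:distribution ex:currentNotes, ex:fullNoteHistory .\n",
        # Placeholder dates: the full-history distribution has the wider span.
        distribution("currentNotes", "Most recent note of each concept only",
                     date(2022, 1, 1), date(2022, 9, 30)),
        distribution("fullNoteHistory", "Complete history of all notes",
                     date(2008, 1, 1), date(2022, 9, 30)),
    ])


if __name__ == "__main__":
    print(catalogue_entry())
```

The sketch also shows the limitation raised in this issue: nothing machine-readable says which distribution is the current-only subset; a consumer has to read the free-text rdfs:comment or compare temporal spans.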

agbeltran added the dcat label Sep 12, 2022

andrea-perego added the feedback Issues stemming from external feedback to the WG label Oct 24, 2022

andrea-perego changed the title from "dcat property for subsets ?" to "DCAT property for subsets ?" Nov 23, 2022

andrea-perego added this to the DCAT3 CR milestone Nov 23, 2022

andrea-perego mentioned this issue Nov 23, 2022 in "Update ack section for CR" #1549 (Closed)

davebrowning added the future-work issue deferred to the next standardization round label Feb 13, 2023

davebrowning modified the milestones: DCAT3 CR, DCAT Future Priority Work Feb 13, 2023
