SKOS Category extracted produces some weird triples #711

Open
kurzum opened this issue Sep 3, 2021 · 2 comments
Labels: priority (issues to be discussed by the dev-team); status: accepted; status: fix-provided (PR related to issue was submitted); status: minidump-test-provided; status: verification-discussion-needed (hard to decide if the issue was solved correctly); type: data

Comments

kurzum commented Sep 3, 2021

Issue validity

Some explanation: A DBpedia snapshot is produced every three months (see Release Frequency & Schedule) and loaded into http://dbpedia.org/sparql. During these three months, Wikipedia changes and the DBpedia Information Extraction Framework also receives patches. At http://dief.tools.dbpedia.org/server/extraction/en/ we host a daily updated extraction web service that extracts one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g. Berlin or Joe_Biden, here: http://dief.tools.dbpedia.org/server/extraction/en/
If the issue persists, please post the link from your browser here:

http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Category%3APininfarina&revid=&format=trix&extractors=custom
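
For reference, the link above can be reproduced for other page titles by URL-encoding the query parameters. A minimal sketch in Python, assuming the service keeps the parameter names visible in that link (`title`, `revid`, `format`, `extractors`); the helper name `extraction_url` is made up for illustration and is not part of the extraction framework:

```python
from urllib.parse import urlencode

# Endpoint taken from the link posted in this issue.
BASE = "http://dief.tools.dbpedia.org/server/extraction/en/extract"


def extraction_url(title: str, fmt: str = "trix", extractors: str = "custom") -> str:
    """Build an extraction link for one Wikipedia page title (hypothetical helper)."""
    # revid is left empty, exactly as in the posted link.
    params = {"title": title, "revid": "", "format": fmt, "extractors": extractors}
    # urlencode percent-encodes reserved characters such as ":" in the title.
    return f"{BASE}?{urlencode(params)}"


print(extraction_url("Category:Pininfarina"))
```

Calling it with `Category:Pininfarina` should give back the same link as posted above.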

Error Description

Please state the nature of your technical emergency:

See title.

Pinpointing the source of the error

Where did you find the data issue? Non-exhaustive options are:

Should be one of these:

I also assume that the error is caused by these lines on Wikipedia (https://en.wikipedia.org/wiki/Category:Pininfarina):

{{commonscat|Pininfarina}}
{{Cat main|Pininfarina}}
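
To make the assumption concrete: the `{{Cat main|...}}` template names the main article of the category, which would explain a link from the category to http://dbpedia.org/resource/Pininfarina. The following is only a rough sketch of pulling that target out of the wikitext with a regular expression; it is not how the extraction framework parses pages, and `main_articles` is a hypothetical helper:

```python
import re

# Matches {{Cat main|Target}}; the template name is compared case-insensitively.
CAT_MAIN = re.compile(r"\{\{\s*cat main\s*\|\s*([^|}]+?)\s*\}\}", re.IGNORECASE)


def main_articles(wikitext: str) -> list[str]:
    """Return the article titles named by Cat main templates (hypothetical helper)."""
    # Wikipedia titles use underscores instead of spaces in resource names.
    return [title.replace(" ", "_") for title in CAT_MAIN.findall(wikitext)]


sample = "{{commonscat|Pininfarina}}\n{{Cat main|Pininfarina}}"
print(main_articles(sample))
```

On the two lines quoted above, only the `Cat main` template matches; `commonscat` is ignored.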

Details

please post the details

Wrong triples RDF snippet

Subject | Predicate | Object
-- | -- | --
http://dbpedia.org/resource/Category:Pininfarina | http://purl.org/dc/terms/subject | http://dbpedia.org/resource/Pininfarina
http://dbpedia.org/resource/Pininfarina | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://www.w3.org/2004/02/skos/core#Concept

Expected / corrected RDF outcome snippet

  1. Remove the triple whose subject is http://dbpedia.org/resource/Pininfarina. It is easier if all extractors only produce triples with the page as subject.
http://dbpedia.org/resource/Pininfarina | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://www.w3.org/2004/02/skos/core#Concept 
  2. Use a custom property for linking the Category: page to its main article, because dct:subject is definitely the wrong one: the direction is wrong and the semantics are underspecified. I created dbo:mainArticleForCategory (http://mappings.dbpedia.org/index.php/OntologyProperty:MainArticleForCategory) for this.
<http://dbpedia.org/resource/Category:Pininfarina> dbo:mainArticleForCategory <http://dbpedia.org/resource/Pininfarina>
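
The two proposed corrections can be sketched as a small post-processing step over the triples extracted for a category page. This is an illustration only, not the extractor's actual code: `correct_triples` is a hypothetical helper, and the full IRI for dbo:mainArticleForCategory assumes the usual http://dbpedia.org/ontology/ expansion of the dbo: prefix.

```python
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
# Assumption: dbo: expands to http://dbpedia.org/ontology/
DBO_MAIN_ARTICLE = "http://dbpedia.org/ontology/mainArticleForCategory"


def correct_triples(page, triples):
    """Apply both proposed corrections to a category page's triples (hypothetical helper)."""
    fixed = []
    for s, p, o in triples:
        if s != page:
            # First proposal: keep only triples with the page as subject.
            continue
        if p == DCT_SUBJECT:
            # Second proposal: link category to main article with the custom property.
            p = DBO_MAIN_ARTICLE
        fixed.append((s, p, o))
    return fixed


category = "http://dbpedia.org/resource/Category:Pininfarina"
article = "http://dbpedia.org/resource/Pininfarina"
wrong = [
    (category, DCT_SUBJECT, article),
    (article,
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://www.w3.org/2004/02/skos/core#Concept"),
]
for triple in correct_triples(category, wrong):
    print(" | ".join(triple))
```

Applied to the two wrong triples above, this leaves the single expected triple with the category as subject.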

Example DBpedia resource URL(s)


Other

jlareck self-assigned this Sep 3, 2021
jlareck added a commit that referenced this issue Sep 3, 2021
jlareck commented Sep 3, 2021

This data error was in the TopicalConceptsExtractor, so I removed the extraction of triples like:

http://dbpedia.org/resource/Pininfarina | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://www.w3.org/2004/02/skos/core#Concept 

and changed dct:subject to dbo:mainArticleForCategory in one part of this extractor.

jlareck added a commit to jlareck/extraction-framework that referenced this issue Sep 17, 2021
kurzum added the priority (issues to be discussed by the dev-team) label Sep 21, 2021
Vehnem added the status: fix-provided (PR related to issue was submitted) and tested labels Sep 23, 2021
kurzum added this to the snapshot-2021-09 milestone Sep 23, 2021
kurzum commented Nov 18, 2021

TODO:


3 participants