Drop the requirement to support ill-typed literals with recognized datatype IRIs #60
Comments
The current text is a bit strange.
I don't think that the "MUST" can be meaningful if the literals are outside RDF-semantics. In RDF concepts, the text
Any system can issue a warning for anything regardless of this text, so it can be dropped or made advisory text that encourages doing so. For RDF Concepts, can we just say: which allows variation when there's justification. ("Support" is stronger than "accept". "Accept" is about RDF terms (correct syntax). I would read "support" as being about acting, e.g. on the values, cf. D-entailment.) |
+1 Actually, I consider this bit of RDF Concepts to contradict RDF Semantics §7.2, which says:
and in fact some implementations already do :) This makes a strong case for replacing this MUST with a MAY in RDF-syntax, IMO. |
MAY is weak IMO. It would be nice to encourage the behavior of passing through syntactically correct data with "SHOULD accept ill-typed literals". |
This can be expressed as an advisory in the specification as a Note or within the Considerations section providing additional context for implementations to evaluate advantages and pitfalls. |
This was discussed during the rdf-star meeting on 31 October 2024.
Transcript: Drop the requirement to support ill-typed literals with recognized datatype IRIs
pfps: I agree with what Andy says in the issue
AndyS: it depends what "support" really means here
AZ: I also want to ask what is meant by "support". If you have a system that does not recognize a datatype IRI
<AndyS> RDF concepts -- "The list of datatypes supported by an implementation is determined by its recognized datatype IRIs." seems to be the nearest to defining "support".
AZ: If you say this kind of graphs may not be supported, what about other kinds of inconsistencies. Should any such graph not be supported?
pfps: one option would be to tweak the wording
<pfps> One option is that implementations MUST accept input documents with ill-typed literals and SHOULD include the resultant triple in the RDF graph.
gkellogg: it makes no sense to talk about an ill-typed literal for non-recognised datatypes
<pfps> That is - parsing MUST NOT stop at an ill-typed literal but the system MAY choose to not include the triple in the resultant graph.
gkellogg: I think the idea is to be able to only retain well-typed literals
<pfps> I would add that if an implementation drops the triple then it MUST produce a warning.
gkellogg: it would be reasonable for RDF systems to not deal with ill-typed literals
TallTed: the current text is "MUST accept", not "MUST support"
ktk: how are different implementations dealing with this?
AndyS: in SPARQL, there are cases when you need to assign a value, so it does not work with ill-typed literals but that a SPARQL process
<pfps> agreed that it is difficult to require a warning
james: We are very accepting (in our implem) and it has been very useful
<AndyS> "SHOULD accept" -- MUST for warnings is a bit strange. We don't have a "warning" mechanism in the specs.
james: but it's personal opinion
Souri: when we find an ill-typed literal, we separate it
AndyS: choosing the datatypes you choose to handle is something you do when you use the data
TallTed: I'm concerned to hear that some implementations are not conformant
<niklasl> +1 for evolution (with the caveat that I prefer opt-in "drop unrecognized" modes to avoid sending inexplicable data onward).
Souri: if we have an xsd:integer with "abc" lexical form, we don't accept it, but if you have ex:mytype, we don't do anything
<Zakim> pfps, you wanted to say that implementations that reject unrecognized datatypes are broken but ones that do not fully accept known ill-typed literals are not so bad
james: we do 2 kind of things, one on the values to do efficient operation, and one that just take any literal transparently
ktk: what do we do with this issue? We don't really have a conclusion
<TallTed> "MUST accept" is current text
Souri: in Oracle, we don't want to have, e.g., 31st February, so we reject it
TallTed: not accepting data is bad but you can handle the ill-typed literals after they are loaded
tl: I like the idea that there are several phases, 1st you parse and put in store, then other processes
AndyS: I find the use cases of rejecting or not rejecting both reasonable
<AndyS> My pref is change "MUST accept" to "SHOULD accept". All the described handling cases seem reasonable for their different cases.
Souri: we do not reject entire graph, just the triples with ill-typed literals
<Dominik_T> +1 for SHOULD
Souri: the earlier the problems can be pointed out the better
<ktk> Strawpoll: "Implementations MUST accept ill-typed literals" gets changed to "Implementations SHOULD accept ill-typed literals"
<Dominik_T> +1
<gtw> +1
<ktk> +1
<pfps> +1
<AndyS> +1
<Souri> +1
<gkellogg> +1
<AZ> -0.3141592
<enrico> 0
<TallTed> -0.5
<james> 0
<niklasl> +0.5 (I might prefer some "SHOULD by default, MUST if asked to accept"...)
TallTed: if we make this change, we have to be really clear how errors are dealt with
AndyS: I don't think we should go into how errors and warnings are handled
<TallTed> An ill typed literal is not a syntax error.
<TallTed> An ill typed literal conforms to syntax.
<Souri> +1 to AndyS
AndyS: there's an historical example (??) where specs mentioned what to do with errors and it took a large space, and was eventually dropped
niklasl: I had experienced cases of systems that reject things that I would have like be accepted because things evolved
ktk: there could be a note that explain what pitfalls etc occur and how to deal with them |
A message, https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Nov/0008.html, was sent to the WG mailing list with a proposal to resolve this issue. The contents of the message are: PROPOSAL: Change the requirements for handling ill-typed literals so that The relevant wording is in RDF 1.2 Concepts and Syntax: If the literal's datatype IRI is in the set of recognized datatype IRIs, let d A possible change if the proposal is accepted is: ... |
The note in the possible wording above should be expanded to read: NOTE: Implementations MUST NOT exclude triples that do not contain literals with datatypes that are in their recognized datatypes from the RDF graphs they produce. |
This seems to be contradictory. Is "accept" meaning "can parse and continue" here? and then "produce ..." needs changing. Or is the MUST supposed to be SHOULD? "Implementations SHOULD accept ill-typed literals" |
@afs I don't think so. Implementations MUST accept ill-typed literals, in that they MUST NOT halt when they encounter an ill-typed literal, but they MAY decide to not include triples with ill-typed literals in the RDF graphs they produce. |
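The reading described in the comment above (never halt on an ill-typed literal, but optionally leave its triple out of the produced graph) could look roughly like the following Python sketch. Everything here is an illustrative assumption rather than spec text or any implementation's API: the `load` and `is_ill_typed` names, the tuple encoding of literals, and the choice to recognize only xsd:integer.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

def is_ill_typed(lexical_form, datatype_iri):
    """Hypothetical check; only xsd:integer is recognized in this sketch."""
    if datatype_iri != XSD_INTEGER:
        return False  # unrecognized datatype: never treated as ill-typed
    return re.fullmatch(r"[+-]?[0-9]+", lexical_form) is None

def load(triples, keep_ill_typed=True):
    """Never raises on an ill-typed literal; returns (graph, dropped)."""
    graph, dropped = set(), []
    for s, p, o in triples:
        # Literals are modelled as (lexical_form, datatype_iri) tuples, IRIs as plain strings.
        if isinstance(o, tuple) and is_ill_typed(*o) and not keep_ill_typed:
            dropped.append((s, p, o))  # a real system might surface a warning here
            continue
        graph.add((s, p, o))
    return graph, dropped

data = [
    ("ex:a", "ex:age", ("42", XSD_INTEGER)),
    ("ex:b", "ex:age", ("abc", XSD_INTEGER)),
    ("ex:c", "ex:code", ("abc", "http://example.org/mytype")),
]
graph, dropped = load(data, keep_ill_typed=False)
```

With `keep_ill_typed=True` (the default in this sketch) all three triples end up in the graph; with `keep_ill_typed=False` only the `"abc"^^xsd:integer` triple is left out, and the caller can inspect `dropped` to report it.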
I agree with @afs that this can seem contradictory. I read the "produce RDF graphs for inputs" in
as requiring a conforming application to be able to return the graph with all its ill-typed literals intact, or otherwise it can't claim to be returning the original graph. Any other graph it produces, e.g. one in which ill-typed literals are dropped, can not claim to be the original graph. I read the second sentence
as honoring the fact that implementations are free to change graphs according to their needs. That is evident, but then the proposal adds the requirement that implementations SHOULD warn users if they drop ill-formed literals, and that seems sensible. If my interpretation is correct I support the design of the proposal, but maybe the wording could be made clearer? |
I suggest that the proposed change not be made for at least these reasons:
|
I don't see much loss in having literals like "a"^^xsd:int not being preserved. |
And |
It's good that you don't have such data to deal with. That's no guarantee that such data will never be encountered. What if the (apparently What if the apparent ill-typing is something like (I think) I'm fine with an implementation declaring that such data will be handled in that way, especially if some kind of alert is raised when it happens. I don't think I'm OK with our spec dictating any of the above handlings. Significantly, this would break from RDF 1.1 and 1.0, and I daresay, some datasets and stores would become non-compliant. I think there is little difference between rejecting literal data that doesn't match my internal definition of its declared data type, and rejecting literal data that has a declared data type that I don't recognize. I believe both should be accepted and stored. Errors may arise when some comparison function is applied to the literal based on its declared type and which fails because the literal does not actually fit that declared type. That is OK! This is the point at which the user may decide to change the type of that literal, or change the literal to suit the type, or some combination of the two. |
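The "accept and store, fail only when the value is needed" handling argued for in the comment above can be illustrated with a small Python sketch. The `Literal` class, `integer_value`, and `less_than` are hypothetical names invented for this illustration, and only xsd:integer is handled.

```python
import re
from dataclasses import dataclass

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: str

    def integer_value(self):
        """Map to a value only on demand; an ill-typed literal fails here, not at load time."""
        if self.datatype != XSD_INTEGER or not re.fullmatch(r"[+-]?[0-9]+", self.lexical_form):
            raise ValueError(f"no integer value for {self.lexical_form!r}^^<{self.datatype}>")
        return int(self.lexical_form)

# Both literals are stored as-is, including the ill-typed one.
stored = [Literal("7", XSD_INTEGER), Literal("abc", XSD_INTEGER)]

def less_than(a, b):
    """A comparison that needs values, so it may raise ValueError for ill-typed input."""
    return a.integer_value() < b.integer_value()
```

In this sketch, storing `Literal("abc", XSD_INTEGER)` succeeds, and the error only surfaces when `less_than` is called on it, which is the point at which a user could decide to correct the lexical form or the datatype.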
Section in current version of RDF 1.2 Concepts and Abstract Syntax: https://www.w3.org/TR/rdf12-concepts/#section-Graph-Literal It says:
The proposal is to include all possible variations being considered as comment here, so we can conduct a straw poll on them. |
Minimal change: Change to "Implementations SHOULD accept ..." where SHOULD is defined by RFC 2119:
This is then not a purely optional feature; that would use the word "MAY".
Observation
RDF 1.1 requires that implementations support ill-typed literals, including ill-typed literals with recognized datatype IRIs.
Ill-typed literals with recognized datatype IRIs do not have any known use cases. They are semantically inconsistent, do not denote anything, have no value, and any triple that contains them is false in every interpretation.
Notice that there is nothing wrong with requiring implementations to support ill-typed literals with unrecognized datatype IRIs. For example, it is good that RDF implementations are required to support literals like [1] that have a datatype IRI that is not broadly recognized.
However, it is unclear why implementations are allowed to support, let alone are required to support, ill-typed literals with recognized datatype IRIs.
Example
Suppose a triple store recognizes the RDF datatype IRIs + the XSD datatype IRIs + the GeoSPARQL datatype IRIs. Such a triple store can upon data ingest immediately detect that [2] and [3] are ill-typed literals with recognized datatype IRIs.
The RDF 1.1 standard forbids triple stores from throwing an error upon encountering data that contains [2] or [3], even though this may be the preferred data quality approach for many users.
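As a rough illustration of the ingest-time detection described in this example, the Python sketch below flags literals whose datatype IRI is recognized but whose lexical form falls outside that datatype's lexical space. The `find_ill_typed` function, the `LEXICAL_CHECKS` table, and the sample literals are assumptions made for this sketch (they are not the literals [2] and [3] from the issue), and XSD whitespace handling is ignored.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Lexical-space checks for two XSD datatypes (whitespace handling is ignored here).
LEXICAL_CHECKS = {
    XSD + "boolean": re.compile(r"true|false|1|0"),
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),
}

def find_ill_typed(literals, recognized):
    """Return literals whose datatype IRI is recognized but whose lexical form does not fit."""
    found = []
    for lexical_form, datatype_iri in literals:
        pattern = LEXICAL_CHECKS.get(datatype_iri) if datatype_iri in recognized else None
        if pattern is not None and pattern.fullmatch(lexical_form) is None:
            found.append((lexical_form, datatype_iri))
    return found

literals = [
    ("true", XSD + "boolean"),
    ("maybe", XSD + "boolean"),
    ("abc", "http://example.org/mytype"),
]
strict_store = find_ill_typed(literals, recognized={XSD + "boolean", XSD + "integer"})
lenient_store = find_ill_typed(literals, recognized=set())
```

Here `strict_store` contains only `("maybe", XSD + "boolean")`, while `lenient_store` is empty because that hypothetical store recognizes no datatype IRIs. The literal with the unrecognized `http://example.org/mytype` datatype is never flagged in either case.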
Suggestion
In RDF 1.2, let's weaken the RDF 1.1 phrase "Implementations MUST accept ill-typed literals" to:
Implementations MUST support the RDF datatype IRIs, and MAY support any other datatype IRIs that they believe important enough for their users. The notion "recognized datatype IRI" is used as defined in RDF 1.1 Semantics.
Ramifications
The proposed change makes it possible for RDF 1.2 data to be accepted in one implementation, but not in another implementation. For example, it is possible to upload data that contains literals [2] and [3] into an implementation that does not recognize the xsd:boolean datatype IRI. But it is not possible to upload the same data into an implementation that does recognize the xsd:boolean datatype IRI.
This differentiation is a good thing, because it allows stricter implementations to be created, rather than requiring all implementations to support the exact same ill-typed nonsense data.
Notice that RDF 1.1 Semantics already allows implementations to differ from one another in their support for more/fewer recognized datatype IRIs. Implementations that differ in their recognized datatype IRIs already differ in their behavior in RDF 1.1.