Make Dataset#equals strict #23
This actually affects all other operations too. If you only had an isomorphic
a.equals(b); // true
a.difference(b).size === 0; // false
a.contains(b); // false
b.contains(a); // false
let u = a.union(b);
u.size === a.size; // false
u.size === b.size; // false
let i = a.intersection(b);
i.size === a.size; // false
i.size === b.size; // false
If anything, I would think that if you wanted isomorphism to hold under each operation, you would need to first canonicalize each dataset anyway. |
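The effect described in the comment above can be reproduced with a minimal sketch. It assumes quads are plain objects, blank nodes are strings starting with `_:`, and datasets are duplicate-free arrays; the `key`, `difference`, `union` and `intersection` helpers are hypothetical stand-ins, not the spec's API.

```js
// Hypothetical helpers: terms are plain strings, compared by label only.
const key = (q) => `${q.subject} ${q.predicate} ${q.object}`

const difference = (a, b) => {
  const inB = new Set(b.map(key))
  return a.filter((q) => !inB.has(key(q)))
}
const union = (a, b) => [...a, ...difference(b, a)]
const intersection = (a, b) => {
  const inB = new Set(b.map(key))
  return a.filter((q) => inB.has(key(q)))
}

const knows = 'http://xmlns.com/foaf/0.1/knows'
// Same shape, different blank node labels: isomorphic, but no quad is shared.
const a = [{ subject: '_:x', predicate: knows, object: '_:y' }]
const b = [{ subject: '_:p', predicate: knows, object: '_:q' }]

console.log(difference(a, b).length)   // 1
console.log(union(a, b).length)        // 2
console.log(intersection(a, b).length) // 0
```

Because every set operation here compares terms by label, two datasets that would pass an isomorphism check still share no quads unless both are canonicalized first.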
Has this been fixed here #18 (comment) or is it still open? |
I think there's good reason to rename Dataset#equals, but I don't think "strict" equality is a useful concept. It's not meaningful to ask if two bnodes are the "same bnode" between graphs except to say they're conveying the same information, and mathematically this means testing if the graphs are isomorphic. |
Bnodes are scoped to a dataset rather than a graph, so it's possible for two graphs in a dataset to be tested for equivalence beyond isomorphism. Practically, it's handy to see if a copy of a graph has changed from the original (user changed their profile, librarian updated a PREMIS record, clinical record is an exact duplicate). Theoretically, the means you may be drawing the Regardless of whether rdfjs includes a |
I created a strawman to describe the diff between equals and isomorphicTo. (BTW, those anchors have "dom-" in them. I wonder if that's intended to convey that it applies to the DOM spec, or if it's just that the anchors are created in (our) DOM.) |
First, IMO, I don't think two graphs can share a bnode even if RDF Semantics says they can, as a matter of mathematics, because that would defeat the point of having bnodes be existential quantifiers: you can't add conditions to an existential quantifier after the fact. But that's a quibble for that specification. Maybe I'm misunderstanding something, or maybe it's not a problem for some reason I haven't considered. More relevant to this is: in what case do you need to compare bnodes for "strict" equality because isomorphism checking doesn't suffice? |
Many use cases for this violate your position on the scope of bnodes as they count on some external reference to a bnode, either from another part of the dataset or from some program state. In order to duck this discussion, I'll use a subgraph use case; the results of a
_:alice foaf:knows _:bob ;
  foaf:mbox <mailto:alice@example.com> .
_:bob foaf:mbox <mailto:bob@example.com> .
_:claire foaf:mbox <mailto:claire@example.com> .
let knows = f.namedNode(foaf:knows)
let alice = d.getQuads(null, knows, f.namedNode(<mailto:alice@example.com>))
let known1 = d.getQuads(alice, knows)
// UI process allows user to update `_:alice`'s `foaf:knows` to be `_:claire`
let known2 = d.getQuads(alice, knows)
console.log(known1.equals(known2)) // false
console.log(known1.isomorphicTo(known2)) // true
Having a
|
@ericprud Can you elaborate the example? As I take your example, nothing changed because the graph conveys the same information before and after: |
The part of the graph we're examining started out saying:
The use case says that the user changed the graph to say:
The way the program detected that those particular triples changed (many other things may have changed in the graph as well, but we specifically saved state capturing who alice knew), was by asking again for who alice knows and comparing the results with |
I think @ericprud's definition is clear (and I really like the other WebIDL changes). Here is the example in code:
const assert = require('assert')
const rdf = require('some dataset-spec compliant lib')
const first = (dataset) => dataset.toArray()[0]
const knows = rdf.namedNode('http://xmlns.com/foaf/0.1/knows')
const mbox = rdf.namedNode('http://xmlns.com/foaf/0.1/mbox')
const alice = rdf.blankNode('alice')
const bob = rdf.blankNode('bob')
const claire = rdf.blankNode('claire')
const mailto = (address) => rdf.namedNode(`mailto:${address}`)
const dataset = rdf.dataset([
rdf.quad(alice, knows, bob),
rdf.quad(alice, mbox, mailto('alice@example.com')),
rdf.quad(bob, mbox, mailto('bob@example.com')),
rdf.quad(claire, mbox, mailto('claire@example.com'))
])
const alice2 = dataset.match(null, mbox, mailto('alice@example.com')).toArray()
assert(alice2[0].subject.equals(alice))
const known1 = dataset.match(alice, knows)
assert(first(known1).object.equals(bob))
dataset.delete(rdf.quad(alice, knows, bob))
dataset.add(rdf.quad(alice, knows, claire))
const known2 = dataset.match(alice, knows)
assert(first(known2).object.equals(claire))
assert(!known1.equals(known2))
assert(known1.isomorphicTo(known2))
Here's how I had implemented
A suggestion to move forward with this:
If most of you agree with this: @ericprud do you want to take care of creating this PR without |
👍
Implementation of this is indeed not that trivial (I have one here). From a user perspective, this would definitely be useful to have. However, this would probably not be an operation that is needed for most cases, so I currently lean towards not including it in the spec, to lower the burden on implementors. Furthermore, there are different ways to implement |
I still have to revisit this, but having a user select between one bnode or another bnode seems like the wrong way to implement a program. Blank nodes are existential quantifiers, not identifiers. It fundamentally doesn't make sense to me for a user to change a node in a statement from one bnode to another one. If you want this functionality, use an identifier (URI/IRI). Even using the example for the sake of argument, you can still use isomorphism here. What I want to know is, what does a strict check do that isomorphism cannot? Performance shouldn't be that much of an issue. Let n be the number of statements; if the two graphs contain the same bnodes, or are found in the same internal sort order, the performance will be O(n), same as strict equality. Only if the bnodes are different addresses in memory, and they're in a different internal sort order, will the performance approach O(n!).
On the contrary, I think it's important to standardize the things that are most difficult to implement. The whole point of having a library is to remove burden from developers. It wouldn't be much of a library if all the difficult stuff always got shoveled down the road. |
Also, here's my simple implementation for reference, which additionally returns the mapping of bnodes if such a mapping is found: https://github.com/awwright/node-rdf/blob/master/lib/GraphEquals.js I haven't looked much into the differences between the algorithms; mine is a simple subgraph matching algorithm instead of e.g. Jeremy Carroll's algorithm; but as I understand it, the pathological case is always O(n!) and the best we can do is optimize for certain subsets of cases, like preserved sort order and such. |
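To make the complexity argument concrete, here is a rough sketch of an isomorphism test with a linear fast path and a brute-force fallback. It is not the linked implementation or Carroll's algorithm. It assumes quads are plain objects with string terms, blank nodes start with `_:`, graph names are ignored, and both inputs are duplicate-free arrays; `isBlank`, `key` and `isomorphic` are hypothetical names.

```js
// Hypothetical term model: plain strings, blank nodes prefixed with "_:".
const isBlank = (term) => term.startsWith('_:')
const key = (q) => `${q.subject} ${q.predicate} ${q.object}`
const bnodesOf = (quads) =>
  [...new Set(quads.flatMap((q) => [q.subject, q.object]).filter(isBlank))]

function isomorphic (a, b) {
  if (a.length !== b.length) return false
  const bKeys = new Set(b.map(key))
  // Fast path: same labels (or no bnodes at all) needs one linear pass.
  if (a.every((q) => bKeys.has(key(q)))) return true

  const fromA = bnodesOf(a)
  const fromB = bnodesOf(b)
  if (fromA.length !== fromB.length) return false

  const rename = (term, mapping) => (isBlank(term) ? mapping.get(term) : term)
  // Slow path: try every bijection between the two bnode sets (factorial).
  const tryMapping = (index, mapping, used) => {
    if (index === fromA.length) {
      return a.every((q) => bKeys.has(key({
        subject: rename(q.subject, mapping),
        predicate: q.predicate,
        object: rename(q.object, mapping)
      })))
    }
    for (const candidate of fromB) {
      if (used.has(candidate)) continue
      mapping.set(fromA[index], candidate)
      used.add(candidate)
      if (tryMapping(index + 1, mapping, used)) return true
      used.delete(candidate)
    }
    return false
  }
  return tryMapping(0, new Map(), new Set())
}

const knows = 'http://xmlns.com/foaf/0.1/knows'
const mbox = 'http://xmlns.com/foaf/0.1/mbox'
const g1 = [
  { subject: '_:x', predicate: knows, object: '_:y' },
  { subject: '_:x', predicate: mbox, object: 'mailto:alice@example.com' }
]
const g2 = [
  { subject: '_:p', predicate: knows, object: '_:q' },
  { subject: '_:p', predicate: mbox, object: 'mailto:alice@example.com' }
]
console.log(isomorphic(g1, g1)) // true (fast path)
console.log(isomorphic(g1, g2)) // true (needs the bnode mapping)
```

In this sketch the factorial cost is in the number of distinct blank nodes rather than the number of statements, and it only arises when the labels differ; a real implementation would prune candidates instead of enumerating every bijection.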
So where is consensus at this point? My reading of the above is that most folks want to merge in my strict |
Well I wouldn't say unconvinced, but I'm looking for persuading information. If there's an application of this, where it needs to know some Datasets have the same statements pointing to the same bnodes, and mere isomorphism is not allowed, then I can consider that. With such an example in mind, there'd be some follow-up questions. Is the name appropriate? The name "strict" sounds like the Finally, in general, there's just a bunch of different ways to test for equality:
The only definition of "equals" that ensures two graphs encode the same data as each other is isomorphism; developers who need something different might be better suited by specifically writing what they're looking for. Isomorphism is the only hard test on that list, anyways. And again, performance isn't too much of an issue. If two graphs don't use any bnodes, or store statements in the same internal sort order, or point to the same bnode instance, or use the same bnode label, those cases can all be optimized down to O(n). |
Originally brought up in #18 (comment).
I suggest to make `DatasetCore#equals` not test for isomorphism, but instead test strict equality. `DatasetCore` or `Dataset` could then have an additional `isomorphic` method.
As mentioned by @bergos, `equals` could be implemented as simple as `result.difference(initial).size === 0`.
The main reason I have for changing this is to be consistent with `Term#equals`. Furthermore, being isomorphic is not the same thing as being equal, so we should not confuse these things IMO.
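As a minimal sketch of what a strict `equals` could look like, assuming terms are plain `{ termType, value }` objects and datasets are duplicate-free arrays (the `sameTerm`, `sameQuad`, `difference` and `strictEquals` helpers are hypothetical, and literal language/datatype checks are left out):

```js
// Hypothetical term comparison, loosely modelled on Term#equals.
const sameTerm = (x, y) => x.termType === y.termType && x.value === y.value
const sameQuad = (p, q) =>
  sameTerm(p.subject, q.subject) &&
  sameTerm(p.predicate, q.predicate) &&
  sameTerm(p.object, q.object)

// Quads of a that have no strictly equal counterpart in b.
const difference = (a, b) => a.filter((p) => !b.some((q) => sameQuad(p, q)))

// An empty difference alone only shows that a is a subset of b,
// so the size comparison is needed to make the test symmetric.
const strictEquals = (a, b) =>
  a.length === b.length && difference(a, b).length === 0

const named = (value) => ({ termType: 'NamedNode', value })
const blank = (value) => ({ termType: 'BlankNode', value })
const knows = named('http://xmlns.com/foaf/0.1/knows')

const initial = [{ subject: blank('alice'), predicate: knows, object: blank('bob') }]
const copy = [{ subject: blank('alice'), predicate: knows, object: blank('bob') }]
const relabelled = [{ subject: blank('a'), predicate: knows, object: blank('b') }]

console.log(strictEquals(initial, copy))       // true
console.log(strictEquals(initial, relabelled)) // false, although isomorphic
```

The size check is the one addition over the one-liner quoted above: without it, a dataset that is a strict subset of the other would also report equality.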