Make Dataset#equals strict #23

rubensworks · 2019-01-03T07:40:25Z

Originally brought up in #18 (comment).

I suggest to make DatasetCore#equals not test for isomorphism, but instead test strict equality.
DatasetCore or Dataset could then have an additional isomorphic method.

As mentioned by @bergos, equals could be implemented as simple as result.difference(initial).size === 0.

The main reason I have for changing this is to be consistent with Term#equals. Furthermore, being isomorphic is not the same thing as being equal, so we should not confuse these things IMO.

The text was updated successfully, but these errors were encountered:

blake-regalia · 2019-01-03T22:06:50Z

This actually affects all other operations too. If you only had an isomorphic #equals and all other operations are strict, you could end up with a situation like this:

a.equals(b);  // true

a.difference(b).size === 0;  // false
a.contains(b);  // false
b.contains(a);  // false
let u = a.union(b);
u.size === a.size;  // false
u.size === b.size;  // false
let i = a.intersection(b);
i.size === a.size;  // false
i.size === b.size;  // false

If anything, I would think that if you wanted isomorphism to hold under each operation, you would need to first canonicalize each dataset anyway.

vhf · 2019-03-07T10:23:15Z

Has this been fixed here #18 (comment) or is it still open?

awwright · 2019-04-12T06:03:25Z

I think there's good reason to rename Dataset#equals, but I don't think "strict" equality is a useful concept. It's not meaningful to ask if two bnodes are the "same bnode" between graphs except to say they're conveying the same information, and mathematically this means testing if the graphs are isomorphic.

ericprud · 2019-04-18T08:47:10Z

Bnodes are scoped to a dataset rather than a graph, so it's possible for two graphs in a dataset to be tested for equivalence beyond isomorphism. Practically, it's handy to see if a copy of a graph has changed from the original (user changed their profile, librarian updated a PREMIS record, clinical record is an exact duplicate). Theoretically, the means you may be drawing the dataset boundary around your business process.

Regardless of whether rdfjs includes a .equals() to support such use cases, I agree that it should not be confused with either isomorphism or entailment. Users can supply their own .equals(), so long as the Dataset spec promises not to rewrite bnode labels.

ericprud · 2019-04-18T13:50:03Z

I created a strawman to describe the diff between equals and isomorphicTo.

(BTW, those anchors have "dom-" in them. I wonder if that's intended to convey that it applies to the DOM spec, or if it's just that the anchors are created in (our) DOM.

awwright · 2019-04-18T20:09:54Z

First, IMO, I don't think two graphs can share a bnode even if RDF Semantics says they can, as a matter of mathematics, because that would defeat the point of having bnodes being an existential quantifier: you can't conditions to an existential quantifier after the fact. But that's a quibble for that specification. Maybe I'm misunderstanding something, or maybe it's not a problem for some reason I haven't considered.

More relevant to this is, what case do you need to compare bnodes for "strict" equality because isomorphism checking doesn't suffice?

ericprud · 2019-04-19T07:18:39Z

Many use cases for this violate your position on the scope of bnodes as they count on some external reference to a bnode, either from another part of the dataset or from some program state. In order to duck this discussion, I'll use a subgraph use case; the results of a .getQuads() operation:

_:alice foaf:knows _:bob ;
  foaf:mbox <mailto:[email protected]> .
_:bob foaf:mbox <mailto:[email protected]> .
_:claire foaf:mbox <mailto:[email protected]> .

let knows = f.namedNode(foaf:knows)
let alice = d.getQuads(null, knows, f.namedNode(<mailto:alice@example.com>))
let known1 = d.getQuads(alice, knows)
// UI process allows user to update `_:alice`'s `foaf:knows` to be `_:claire`
let known2 = d.getQuads(alice, knows)
console.log(known1.equals(known2)) // false
console.log(known1.isomorphicTo(known2)) // true

Having a .equals() isn't strictly necessary as the user can always implement it; but it is:

handy for testing and change detection
much cheaper than .isomorphicTo().

awwright · 2019-04-19T08:42:10Z

@ericprud Can you elaborate the example? As I take your example, nothing changed because the graph conveys the same information before and after:
Before, "there exists a person who knows Alice"
And after, "there exists a person who knows Alice"

ericprud · 2019-04-19T16:14:19Z

The part of the graph we're examining started out saying:

There exists someone with the mbox mailto:[email protected] .
There exists someone with the mbox mailto:[email protected] .
There exists someone with the mbox mailto:[email protected] .
The person with mbox mailto:[email protected] knows the person with mbox mailto:[email protected].

The use case says that the user changed the graph to say:

There exists someone with the mbox mailto:[email protected] .
There exists someone with the mbox mailto:[email protected] .
There exists someone with the mbox mailto:[email protected] .
The person with mbox mailto:[email protected] knows the person with mbox mailto:[email protected].

The way the program detected that those particular triples changed (many other things may have changed in the graph as well, but we specifically saved state capturing who alice knew), was by asking again for who alice knows and comparing the results with .equals().

vhf · 2019-05-06T16:17:43Z

I think @ericprud's definition is clear (and I really like the other WebIDL changes).

Here is the example in code:

const assert = require('assert')
const rdf = require('some dataset-spec compliant lib')

const first = (dataset) => dataset.toArray()[0]

const knows = rdf.namedNode('http://xmlns.com/foaf/0.1/knows')
const mbox = rdf.namedNode('http://xmlns.com/foaf/0.1/mbox')
const alice = rdf.blankNode('alice')
const bob = rdf.blankNode('bob')
const claire = rdf.blankNode('claire')
const mailto = (address) => rdf.namedNode(`mailto:${address}`)

const dataset = rdf.dataset([
  rdf.quad(alice, knows, bob),
  rdf.quad(alice, mbox, mailto('[email protected]')),
  rdf.quad(bob, mbox, mailto('[email protected]')),
  rdf.quad(claire, mbox, mailto('[email protected]'))
])

const alice2 = dataset.match(null, mbox, mailto('[email protected]')).toArray()
assert(alice2[0].subject.equals(alice))

const known1 = dataset.match(alice, knows)
assert(first(known1).object.equals(bob))

dataset.delete(rdf.quad(alice, knows, bob))
dataset.add(rdf.quad(alice, knows, claire))

const known2 = dataset.match(alice, knows)
assert(first(known2).object.equals(claire))

assert(!known1.equals(known2))
assert(known1.isomorphicTo(known2))

Here's how I had implemented .equals: https://github.com/rdf-ext/rdf-dataset-indexed/blob/ff76073bb73c96eea77b3c7c64917733a399b15a/Dataset.js#L36-L46 , it works in the above example case although it's a bit different than the suggested implementation.

A suggestion to move forward with this:

Merge @ericprud's work without the .isomorphicTo part
Create a new issue here to discuss .isomorphicTo (do we want it to be part of the spec? is it a feature that would not be used that often and requires a ~big implementation so we leave it out? etc)

If most of you agree with this: @ericprud do you want to take care of creating this PR without .isomorphicTo, otherwise can I cherry-pick from your repo and open it?

rubensworks · 2019-05-07T06:39:49Z

Merge @ericprud's work without the .isomorphicTo part

👍

Create a new issue here to discuss .isomorphicTo

Implementation of this is indeed not that trivial (I have one here). From a user-perspective, this would definitely be useful to have. However, this would probably not be an operation that is needed for most cases, so I currently lean towards not including it in the spec, to lower the burden of implementors. Furthermore, there are different ways to implement .isomorphicTo, so it would make sense to give developers control over this, so that an implementation can be chosen that is optimal for the use-case.

awwright · 2019-05-07T09:35:23Z

I still have to revisit this, but having a user select between one bnode or another bnode seems like the wrong way to implement a program. Blank nodes are existential quantifiers, not identifiers. It fundamentally doesn't make sense to me for a user to change a node in a statement from one bnode to another one. If you want this functionality, use an identifier (URI/IRI).

Even using the example for the sake of argument, you can still use isomorphism here. What I want to know is, what does a strict check do that isomorphism cannot?

Performance shouldn't be that much of an issue. Let n be the number of statements; if the two graphs contain the same bnodes, or are found in the same internal sort order, the performance will be O(n), same as strict equality. Only if the bnodes are different addresses in memory, and they're in a different internal sort order, will the performance approach O(n!).

to lower the burden of implementors

On the contrary, I think it's important to standardize the things that are most difficult to implement. The whole point of having a library is to remove burden from developers. It wouldn't be much of a library if all the difficult stuff always got shoveled down the road.

awwright · 2019-05-07T09:44:41Z

Also, here's my simple implementation for reference, which additionally returns the mapping of bnodes if such a mapping is found: https://github.com/awwright/node-rdf/blob/master/lib/GraphEquals.js

I haven't done much work between the differences in the algorithms; mine is a simple subgraph matching algorithm instead of e.g. Jeremy Carroll's algorithm; but as I understand, the pathological case is always O(n!) and the best we can do is going to be for certain subsets of cases, like preserved sort order, and such.

ericprud · 2019-05-30T07:12:50Z

So where is consensus at this point? My reading of the above is that most folks want to merge in my strict equals() definition but that @awwright remains unconvinced that it's a practical or proper way to manipulate RDF data.

awwright · 2019-05-30T20:36:08Z

Well I wouldn't say unconvinced, but I'm looking for persuading information. If there's an application of this, where it needs to know some Datasets have the same statements pointing to the same bnodes, and mere isomorphism is not allowed, then I can consider that.

With such an example in mind, there'd be some follow-up questions. Is the name appropriate? The name "strict" sounds like the === operator which is usually preferable in ECMAScript; but for us, according to RDF Semantics, it's unpreferable.

Finally, in general, there's just a bunch of different ways to test for equality:

Are two datasets the same instance in memory?
Do two different instances of Dataset point to the same instances of statements? (this could be especially important if Quad is mutable!)
Are two Datasets different if they have different bnodes with the same label?
Do the Datasets share the same set of named graphs, and are each pair isomorphic?

The only definition of "equals" that ensures two graphs encode the same data as each other is isomorphism; developers who need something different might be better suited by specifically writing what they're looking for. Isomorphism is the only hard test on that list, anyways.

And again, performance isn't too much of an issue. If two graphs don't use any bnodes, or store statements in the same internal sort order, or point to the same bnode instance, or use the same bnode label, those cases can all be optimized down to O(n).

rubensworks mentioned this issue Jan 3, 2019

Methods of DatasetCore #18

Closed

vhf changed the title ~~Make DatasetCore#equals strict~~ Make Dataset#equals strict Apr 18, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make Dataset#equals strict #23

Make Dataset#equals strict #23

rubensworks commented Jan 3, 2019

blake-regalia commented Jan 3, 2019

vhf commented Mar 7, 2019

awwright commented Apr 12, 2019

ericprud commented Apr 18, 2019

ericprud commented Apr 18, 2019

awwright commented Apr 18, 2019

ericprud commented Apr 19, 2019

awwright commented Apr 19, 2019

ericprud commented Apr 19, 2019

vhf commented May 6, 2019

rubensworks commented May 7, 2019

awwright commented May 7, 2019

awwright commented May 7, 2019

ericprud commented May 30, 2019

awwright commented May 30, 2019

Make Dataset#equals strict #23

Make Dataset#equals strict #23

Comments

rubensworks commented Jan 3, 2019

blake-regalia commented Jan 3, 2019

vhf commented Mar 7, 2019

awwright commented Apr 12, 2019

ericprud commented Apr 18, 2019

ericprud commented Apr 18, 2019

awwright commented Apr 18, 2019

ericprud commented Apr 19, 2019

awwright commented Apr 19, 2019

ericprud commented Apr 19, 2019

vhf commented May 6, 2019

rubensworks commented May 7, 2019

awwright commented May 7, 2019

awwright commented May 7, 2019

ericprud commented May 30, 2019

awwright commented May 30, 2019