2010-07-13

Coreference using substitution rules

Note: This is mostly copied and adapted from a message I posted last week in yet another conversation about the identity issue on the W3C Library Linked Data Incubator Group's internal mailing list.
Basically, most proposals to tackle the identity issue have so far boiled down to using direct assertions. To express that http://ex1.org/foo and http://ex2.org/bar denote more or less exactly the same thing, one uses dedicated predicates to make declarations such as:

http://ex1.org/foo     p      http://ex2.org/bar

The predicate p may stand here for owl:sameAs, rdfs:seeAlso, skos:exactMatch, umbel:isLike, or any future foaf:whatever, all of which convey some kind of co-reference. In fact, among those only owl:sameAs has hard-defined semantics, even if they are often not respected in practice; the others can be interpreted at will by applications, through any follow-your-nose heuristics. Moreover, defining formal semantics for any of them will not prevent hacking. You can define as many sameness or similarity properties as you like; they are bound to be used and abused the same way owl:sameAs has been. And if you consider that owl:sameAs semantics are as straightforward as can be, go figure how more subtle definitions will be hacked.

But there are other ways to explore this issue, including the radical "blank hub" way introduced here years ago. The path I would like to explore now uses operational rules rather than declarative assertions, and in particular substitution rules.

The basic principle is as follows: two denotations (e.g., URIs) are (somehow) co-referent if they can be substituted for each other in (some, many, most, all) assertions.

An owl:sameAs declaration amounts to absolute substitutability. When substitutability is partial, substitution rules could assert the conditions under which substitution is valid.
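
To make the notion of partial substitutability a bit more concrete, here is a minimal Python sketch. It assumes triples are plain tuples of strings and only looks at substitution in the subject position; the function name substitutable_predicates and the sample data are purely illustrative. It lists the predicates for which two URIs carry exactly the same values, that is, the assertions in which one could stand in for the other.

```python
def substitutable_predicates(triples, a, b):
    """Return the predicates under which subjects a and b have identical values."""
    def values_by_predicate(subject):
        # Map each predicate used with this subject to its set of objects.
        values = {}
        for s, p, o in triples:
            if s == subject:
                values.setdefault(p, set()).add(o)
        return values

    va, vb = values_by_predicate(a), values_by_predicate(b)
    # Keep only predicates used by both subjects with the same object set.
    return {p for p in va.keys() & vb.keys() if va[p] == vb[p]}


# Illustrative data only: same label, different dates.
triples = {
    ("ex1:foo", "rdfs:label", "Foo"),
    ("ex2:bar", "rdfs:label", "Foo"),
    ("ex1:foo", "dcterms:date", "2009"),
    ("ex2:bar", "dcterms:date", "2010"),
}
shared = substitutable_predicates(triples, "ex1:foo", "ex2:bar")
```

With this sample data, only rdfs:label comes out as substitutable; under an owl:sameAs reading, every predicate would have to.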

For example, one could say that ex:author is substitutable for dc:creator if the subject of the predicate is a Book. Put formally, using e.g. RIF Basic Logic Dialect:
Forall ?x ?p (ex:author(?x ?p) :- And(?x#ex:Book dc:creator(?x ?p)))
This rule is different from, and in fact independent of, a declaration such as
ex:author rdfs:subPropertyOf dc:creator
because it does not say anything about the use of those properties outside the Book class.
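
As a rough illustration of how such a rule behaves operationally, here is a small Python sketch, assuming triples are plain string tuples. The function name apply_book_rule and the resources ex:book1, ex:photo1, ex:alice and ex:bob are made up for the example.

```python
def apply_book_rule(triples):
    """Derive ex:author(x, p) from dc:creator(x, p) whenever x is an ex:Book."""
    # Subjects explicitly typed as ex:Book (no class inference in this sketch).
    books = {s for s, p, o in triples if p == "rdf:type" and o == "ex:Book"}
    derived = {
        (s, "ex:author", o)
        for s, p, o in triples
        if p == "dc:creator" and s in books
    }
    return set(triples) | derived


# Hypothetical data: one book and one non-book, both with a dc:creator.
triples = {
    ("ex:book1", "rdf:type", "ex:Book"),
    ("ex:book1", "dc:creator", "ex:alice"),
    ("ex:photo1", "rdf:type", "ex:Photo"),
    ("ex:photo1", "dc:creator", "ex:bob"),
}
result = apply_book_rule(triples)
```

The book gains an ex:author statement, while the photo is left untouched, which is exactly the scoping that the rule's Book condition expresses.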

Let's take an example discussed at length a few months ago on the DBpedia forum.
ex1:MichelleObama rdf:type foaf:Person
ex2:MichelleObama rdf:type skos:Concept
In which contexts are those URIs substitutable? Certainly not for assertions using predicates specific to the class foaf:Person (foaf:mbox), predicates specific to the class skos:Concept (skos:related), or predicates which would bear different values for the two resources (dcterms:date). But they are substitutable, for example, for labeling predicates, such as:
?x rdfs:label 'Michelle LaVaughn Robinson'
which hold for both URIs.

This could be captured by the following rule (using RIF syntax again):

Forall ?name (rdfs:label(ex2:MichelleObama ?name) :- rdfs:label(ex1:MichelleObama ?name))
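
The same rule can be sketched in a few lines of Python, again assuming triples as plain string tuples. The helper name copy_predicate and the foaf:mbox value are invented for the illustration.

```python
def copy_predicate(triples, source, target, predicate):
    """Assert on target every value that source has for the given predicate."""
    derived = {
        (target, p, o)
        for s, p, o in triples
        if s == source and p == predicate
    }
    return set(triples) | derived


triples = {
    ("ex1:MichelleObama", "rdf:type", "foaf:Person"),
    ("ex2:MichelleObama", "rdf:type", "skos:Concept"),
    ("ex1:MichelleObama", "rdfs:label", "Michelle LaVaughn Robinson"),
    # Hypothetical mailbox, present only to show that it is not propagated.
    ("ex1:MichelleObama", "foaf:mbox", "mailto:someone@example.org"),
}
result = copy_predicate(
    triples, "ex1:MichelleObama", "ex2:MichelleObama", "rdfs:label"
)
```

Only the label is carried over to ex2:MichelleObama; the mailbox and the type are not. Note that the rule as written runs in one direction only, from ex1 to ex2; mutual substitutability for labels would need the symmetric rule as well.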

Using such rules has several advantages over declarative assertions:

- They do not need extra vocabulary to be defined and (mis)understood.
- They have an unambiguous formal interpretation.
- They are flexible ad libitum to cover the whole spectrum of similarity-sameness flavours.

They can be expressed in various, more or less expressive rule languages, such as SPARQL CONSTRUCT.
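
As a hedged illustration of the SPARQL CONSTRUCT option, the sketch below renders a class-scoped substitution rule, such as the ex:author / dc:creator rule for Books, as a query string. The helper name rule_as_construct and the ex: namespace IRI are assumptions made for the example; the query is only built as text here, not run against any store.

```python
def rule_as_construct(derived_pred, source_pred, subject_class, prefixes):
    """Render a class-scoped substitution rule as a SPARQL CONSTRUCT query."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    # Head of the rule: the statement to derive.
    lines.append(f"CONSTRUCT {{ ?x {derived_pred} ?v }}")
    # Body of the rule: class membership plus the source statement.
    lines.append(f"WHERE {{ ?x a {subject_class} ; {source_pred} ?v }}")
    return "\n".join(lines)


query = rule_as_construct(
    "ex:author",
    "dc:creator",
    "ex:Book",
    {
        "ex": "http://example.org/",  # hypothetical namespace
        "dc": "http://purl.org/dc/elements/1.1/",
    },
)
print(query)
```

The CONSTRUCT template plays the role of the rule head and the WHERE pattern that of the rule body, so the condition on the Book class stays explicit in the query.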