2013-12-16

Linked Open Vocabularies, please meet Google+

The Google+ Linked Open Vocabularies community was created more than one year ago. The G+ community feature was new and trendy at the time, and the LOV community quickly gathered over one hundred members; then the hype moved on to something else, and the community went more or less dormant. Which is too bad, because Google+ communities could be very effective tools, if really used by their members, and LOV creators, publishers and users definitely need a dedicated community tool. We recently took another step towards better interfacing this Google+ community and the LOV data base. Whenever available, we now use G+ URIs in the data base to identify the vocabulary creators and contributors. As of today, we have managed to identify a little more than 60% of LOV creators and contributors this way.
Among those, only a small minority (about 20%) are members of the above-mentioned community, which means about 80% of this community's members are either lurkers or users of vocabularies. It also means that a good many of the people identified by a G+ profile in LOV still rarely or never use it. One could think that we should then look at other community tools. But there are at least two good reasons to stick to this choice.
Google+ aims at being a reliable identity provider. This was clearly expressed by Google at the very beginning of the service. The recent launch of "custom URIs" such as http://google.com/+BernardVatant through which a G+ account owner can claim her "real name" in the google.com namespace is just a confirmation of this intention. "Vanity URLs" as some call them, are not only good at showing off or being cool. My guess is that they have some function in the big picture of the Google indexing scheme, and certainly something to do with the consolidation of the Knowledge Graph.
We need dynamic, social URIs. I already made this point at the end of the previous post. And all the more so for URIs of living and active people. Using URIs of social networks will hopefully make obsolete the overlong debate over "URI of Document" vs "URI of Entity". Such URIs are ambiguous, indeed, because we are ambiguous.
The only strong argument against G+ URIs is that using URIs held in a private company's namespace to identify people in an open knowledge project is a bad choice. Indeed, but the alternatives might turn out to be worse.

2013-12-10

Content negotiation, and beyond

For many years I paid no particular attention to content negotiation, seeing it as just one among those hundreds of technical goodies developed to make the Web more user-friendly, along with JavaScript, Ajax, cookies etc. When the httpRange-14 solution proposed by the TAG in 2006 was based on content negotiation, I was among those quite unhappy to see this deep and quasi-metaphysical issue solved by such a technical twist, but three months later I eventually came to a better view of it. Not only was content negotiation indeed the way to go, but this decision can now be seen as a small piece in a much bigger picture.
Content negotiation has become so pervasive we don't even notice it any more. Even if a URI is supposed to have a permanent referent, what I GET when I submit that URI through a protocol depends on a growing number of parameters of the client-server conversation: traditional parameters pushed by the client are language preference, required MIME type (the latter being used for the httpRange-14 solution), localisation, various cookies, and user login. Look at how http://google.com/+BernardVatant works. This URI is a reference for me on the Web (at least it's the one I give these days to people wanting a reference), but the representation it yields will depend on the user asking for it: an anonymous request, someone logged on G+ but not in my circles, someone in my circles (and depending on which), someone having me in her circles etc., and of course on the interface (mobile, computer). It will also look different if I call this URI indirectly from another page, as in a snippet etc.
This kind of behavior will be the rule tomorrow. Every call to any entity through its URI will result in a chain of operations leading to a different conversation. And not only for profiles in social networks, not only for people, alive or dead, but for every entity on the Web: places, products, events, concepts ...
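To make the mechanism a bit more tangible, here is a minimal Python sketch of the media-type part of such a negotiation only. The function names parse_accept and negotiate and the list of available representations are assumptions made up for this example; a real server would also weigh language, cookies, login and so on, and real Accept handling has more precedence rules than are shown here.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # Stable sort: equal q values keep the order given by the client.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(accept_header, available):
    """Pick one representation from `available` (server order), or None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" -> "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in available:
            return media_type
    return None


# Hypothetical representations a server could hold for one and the same URI.
AVAILABLE = ["text/html", "application/rdf+xml"]
chosen = negotiate("text/html;q=0.5, application/rdf+xml", AVAILABLE)
```

The same URI thus yields a different representation depending on what the client says about itself, which is the whole point: the more parameters enter the conversation, the more the answer is tailored to whoever is asking.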
Imagine the following scenario applied to a VIAF URI for example. VIAF stores various representations of the same authority, names in various languages, preferred and alternative labels for the matching authority in a given library. I can easily imagine customized access to VIAF, where I could set my preferences such as my favourite library or vendor, with default values based on my geolocation (as already today in WorldCat) and/or user experience, and parameters for the selection of works (such as a period in time, only new stuff, only novels ...). A search on a name in VIAF would lead to a disambiguation interface if needed, and once the entity is selected, to a customized representation of the resource identified under the hood.
This kind of customized content negotiation will not necessarily be provided by the original owner of the URI. In fact there certainly are a lot of sustainable business models around such services which would run on top of existing linked data. A temporal service would extract for any entity a time line of events involving this entity, e.g., events in the life of a person or enterprise, or various translations and adaptations of a book, life cycle of a product ... A geographical service would show the locations attached to an entity, like distribution of offices of a company or its organisational structure. And certainly the most popular way to interact with the entity will be to engage in the conversation with it, as we engage in conversation with people. In both pull and push mode. I would not say like Dominiek ter Heide that the Semantic Web has failed. But I agree it could. Things on the Web of Data have to go dynamic, join the global conversation, or die of obsolescence. 

2013-08-09

Thou shalt not take names in vain

This is certainly too serious a subject for a Friday night in the middle of August, but that's a good time for old ideas to be written down. And indeed this has been on my mind for so long, at least since I realized that common nouns such as English timeword, windows, apple, caterpillar, shell, bull, French orange, printemps, champion, géant, carrefour, German kinder, and many more, had been "borrowed" from the language commons to become brands. This is in principle forbidden by various trademark legislations, but there are subtle workarounds. I have always considered such practices as unacceptable enclosures in the knowledge commons. They might look anecdotal, leading to rather silly cases, but some borderline practices from major Web actors show that this affair is more important than it could seem at first sight.
One could argue that the market gives words back to the commons: lists of generic or genericized trademarks are easy to find, in a variety of languages. But curiously enough, I could not find the other way round, systematic lists of common nouns used as trademarks, either in Wikipedia or anywhere else. Not sure if they could get any longer than the former; in any case the lists I proposed to start on Wikipedia were proposed for deletion a few minutes after creation by zealous wardens of the Wikipedia Holy Rules, for lack of notability of the subject. Forget about it, I'm now trying to figure out how to query DBpedia to get such a list, but the distinction between a proper name and a common noun is no more explicit in DBpedia descriptions and ontology than it is in Wikipedia.
Anyway, this is not necessarily the most important aspect of the way information technologies can impact, misuse and abuse our language commons at large. There are quite a lot of rules or guidelines one could imagine for that matter, some already made explicit by laws even if tricky to enforce, some yet to be specified, let alone enforced. There is something deeply anchored in our culture about the fair use of names, coming certainly from the way they are rooted in our religions, hence I have only a slight compunction about taking inspiration below from one of the most holy and ancient sets of rules. Apologies to believers who might read the following as blasphemy uttered by an old agnostic, and disclaimer to everyone else: those were not cast in stone by any god on any mountain. But if the first and main item in this list seems clearly inspired by the Third Commandment, well, yes it is, and not only in form. The underlying claim is that every word, every name, carries along with it enough history and legacy to be honoured. Those who don't care that much about such religious considerations can read this as pragmatic deontological guidelines for a fair, efficient and sustainable use of names in our information systems at large, and on the Web in particular.

Here goes, ten items of course to stick to the original format. 
  1. Thou shalt not take names in vain
  2. Honour the many meanings of a name, for they belong to the Commons
  3. Acknowledge linguistic and semantic diversity, polysemy and synonymy
  4. Do not steal names from the Commons to be your proper names
  5. Do not sell and buy names, for they belong to the Commons
  6. Do not hide yourself or your products under false names
  7. Do not use names against their common meaning 
  8. Do not enforce your own meanings upon others
  9. Expose your meanings to the Commons, for they will be welcome
  10. Share your own names with the Commons, for they will thrive forever
I won't delve today into the details of each of those; some might look quite cryptic and need to be expanded in further posts. Just a remark on the first (and most important) one. The "take in vain" used by the King James version of the Bible has been replaced in more recent translations by "misuse". I prefer the former, which conveys the notion that whenever you use a name, it's not for nothing, nor for something without importance and consequence. When you use a name, you should have thought well about its meaning. In French the best translation would be "Tu ne prendras pas les noms à la légère."

2013-07-16

FRBR and beyond : it's abstraction all the way

When we (Pierre-Yves and myself) decided a year ago to use FRBR to represent versions in Linked Open Vocabularies, we were not completely sure we were on the right track, but pretty convinced it was worth further exploration. FRBR concepts and insights should be powerful enough to be portable outside the library world. Using FRBR only in library science does not help to understand the power of its underlying paradigm, extract its very essence and port it successfully to other domains of knowledge representation. As always, translation is the key to better understanding.
Recent exchanges with old topic maps friends and thinkers Murray Altheim, Jack Park and Patrick Durusau (to name a few) finally convinced me that looking at other domains with a FRBR state of mind would lead to something interesting. What made some FRBR gurus frown at our use of FRBR in LOV is that the concepts of Work, Expression, Manifestation and Item are too much grounded in the library universe to be exactly portable to things like versions of a vocabulary (and beyond that to recipes, places, occurrences of a term in a text, models of a car, whatever). And yes indeed, a vocabulary is not really a Work in the library meaning of the term, and a version not really an Expression, and a version format not really a Manifestation.
But at the end of the day, that is not the main point, nor is the choice of four levels instead of three or five. BibFrame will certainly keep three levels instead of four, and looking closely one could define other levels of representation in the FRBR pile. At the most granular level, an Item has a life cycle, and one could define a level under Item called for example ItemState, which would represent the state of the Item in a given interval of time, having in its description such properties as the place in which this item has been kept during this interval, and the events marking its beginning and end (move, degradation, restoration ...). This level would be useful to archivists, museum curators, restorers ... willing to track the story of their precious Items. At the other end of the spectrum, beyond the Work level, one could imagine something more abstract, like a common idea underlying different works of different creators, like the myth of Orpheus, of which every Work it has inspired (painting, sculpture, poetry, movie, music ...) would be an avatar, so to speak.

What is it, then, that makes FRBR so portable and powerful? It's not its specific levels by themselves, but the general approach saying that any "thing" (anything) in its representation can be considered at different levels of abstraction (or realization), that such levels can be made distinct, named and identified separately, and the links between those levels made explicit. What is extremely portable is the notion of a spectrum extending in one direction towards more "concrete", "tangible" aspects of the thing, and in the other direction towards more "abstract" or "intangible" aspects. Not that there is any thin line you can draw which would clearly separate the (absolutely) "abstract" from the (absolutely) "tangible", as many have tried to do over the ages, and even today try to make the foundation of upper ontologies. "Abstract" and "tangible" are useful concepts if you take them as relative opposite directions, not as essential (ontological) characteristics.
FRBR captures this by using specific predicates "embodiment", "realization", "exemplar" ... depending on the levels of the objects it links. But actually it could use the same generic predicate at all levels with local constraints. A Work is an abstraction of an Expression, which is an abstraction of a Manifestation, itself an abstraction of an Item, itself an abstraction of an ItemState ... itself an abstraction of a stream of data captured by my senses at some moment, helped or not by various devices ... which might be the closest proxy to "reality" I can get.

Every representation in a language or information system (language being the first of them) can actually be seen as an abstraction of something else. And the power of mind and language is the capacity to bind abstractions together to create other levels of abstraction (sometimes ad nauseam).
There are indeed plenty of ways to formally deal with abstraction relationships in knowledge representation, but the most obvious and frequently used one is the class-instance relationship (aka rdf:type), and as a matter of fact many knowledge engineers bred in Description Logics or in RDF are reluctant to use anything else. With the limitation that, for example, in OWL you can't easily use "types of types" (aka metaclasses in F-Logic) without stumbling into OWL-Full assertions and the matching reasoning issues. But as FRBR shows, there are other ways to deal with abstraction. The link between an Expression and a Work could be modelled as a class-instance relationship. This model would fly well in a context where you consider only Work and Expression, but if you want to introduce Manifestation in the same model it will get you out of OWL-DL. The strength of the FRBR pile is to represent abstraction otherwise.

Moving slightly off the native landscape of FRBR, out of library and into the kitchen, consider the general concept of apple crumble, a specific apple crumble recipe, the way your grandmother applied this recipe, and that unforgettable one she made for your tenth birthday. There again, you can debate for ages if the notion of apple crumble is a Work, an apple crumble recipe an Expression etc. Or you can just use specific classes such as "Abstract recipe", "Written recipe", "Recipe in action", and "Unforgettable recipe realization" (whatever you want to call them). And now that you have understood how this hammer works, you will see nails everywhere, such as :
Model of car > Car series > My car > My car as of today
Concept (in a thesaurus) > Term > Term encoding > Occurrence in a text
Software > Version > Distribution > My installation > My configuration today
A formal generalization of FRBR could be as simple as an Ontology Design Pattern making it possible to deal in a uniform way with all such situations. Such a standard pattern would allow humans and machines, without knowing the context or domain beforehand, to identify levels and direction of abstraction.
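As a toy illustration of what such a pattern could look like, here is a minimal Python sketch in which a single generic relation (hypothetically named REALIZATION_OF here) links each thing to the more abstract thing it realizes, whatever the domain. The names and the dictionary encoding are assumptions of this sketch, not part of FRBR or of any published design pattern.

```python
# Hypothetical generic predicate: each key "is a realization of" its value.
# The levels come from the software example given in the text.
REALIZATION_OF = {
    "My configuration today": "My installation",
    "My installation": "Distribution",
    "Distribution": "Version",
    "Version": "Software",
}


def abstraction_chain(thing, realization_of):
    """Walk from a thing towards its more and more abstract counterparts."""
    chain = [thing]
    seen = {thing}
    while chain[-1] in realization_of:
        more_abstract = realization_of[chain[-1]]
        if more_abstract in seen:
            raise ValueError("cycle in abstraction links at " + more_abstract)
        chain.append(more_abstract)
        seen.add(more_abstract)
    return chain


def is_abstraction_of(abstract, concrete, realization_of):
    """True if `abstract` is reachable upwards from `concrete`."""
    return abstract in abstraction_chain(concrete, realization_of)[1:]


chain = abstraction_chain("My installation", REALIZATION_OF)
```

Nothing in the two functions knows about software, cars or recipes: swapping the dictionary for the car or thesaurus examples would work unchanged, which is what a domain-independent pattern is supposed to buy.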

[Added 2013-07-23] Dan Brickley just made a good point on public-vocabs list, quite relevant to this discussion. Don't try to generalize too much the concepts, but a FRBR state of mind can be found with different concepts, hence different vocabularies, either in different branches of schema.org or in completely distinct ontologies.

2013-04-18

Adieu to hubjects.com

This is the end of the story I had updated a while ago. I don't own hubjects.com any more. If you have found this page, you have its new URL so it's OK.

2013-02-22

A small step for Google ...

... a giant leap for the Semantic Web?

For French-reading people, I already published two days ago on Mondeca's blog Leçons de Choses why I think the Google Knowledge Graph, despite all its value, is so far neither part of the Semantic Web, nor even properly interfaced with it.
It's really frustrating, because a very small step for Google could represent indeed a giant leap for both Linked Data visibility at large and general understanding of the role of URIs for identification of things on the Web. This small step would be as simple for Google as doing the following.

- Acknowledge that the Web of things has not been invented by Google, but is the basis of the Semantic Web pile of languages, vocabularies and linked data (which did not invent the concept either, but provided the technical infrastructure enabling it).

- Acknowledge that the natural identification of things on the Web is done by dereferenceable URIs, and that billions of such URIs already exist, provided by scores of trustable sources.

- Acknowledge that the things stored by Google and displayed in the Knowledge Graph have most of the time already been identified by at least one of those URIs (on DBpedia, Freebase, VIAF, Geonames ...) 

- Hence, logically include such URIs in the Knowledge Graph descriptions, as every other regular linked data base does. 

This should be very simple and really cheap since Google certainly already holds such information, if only through Freebase. It does not even need to either coin its own URIs for the Knowledge Graph things, nor provide an API for them (this is a complementary move, but with certainly more technical, legal and business issues). 

And as a search engine, Google could (should) indeed go a step further, by ranking the URIs of things as it has done for the URLs of pages, images, videos, places, books, recipes ... As long as we have to go to a specific Semantic Web engine like Swoogle or Sindice to search a URI for some-thing, the Semantic Web will not be a natural part of the Web. Getting URIs of things as part of a regular search from the major search engine would be a significant milestone.

Added (2013-02-25) Follow-up of discussion with +Luca Matteis on Google+
Google results are currently URLs of pages in which the name of the thing I'm searching for is present, plus a proxy for the thing in the form of the Knowledge Graph item. Among results at https://www.google.com/search?q=Victor+Hugo I would like to find a little box with URIs identifying the thing-Victor-Hugo-himself, such as the following.
And looking more closely at this, even adding DBpedia to the query https://www.google.com/search?q=Victor+Hugo+DBpedia does not really improve the results, but actually shows that the URI explicitly declared by DBpedia as representing the thing I'm looking for, http://dbpedia.org/resource/Victor_Hugo, is plainly ignored. 

2013-02-08

Ontology Loose Coupling Annotation

I've slowly changed my mind since last year about schema.org semantics. The RDFa version of the data model, even if it was still "experimental", has clarified that schema.org notions of "domain" and "range" differ from their RDFS homonyms. I was pleased to see later on the proposal to rename them "domainIncludes" and "rangeIncludes" respectively to avoid further ambiguity, even if this proposal did not address all my questions.
And actually I now look forward to seeing those properties explicitly published and usable for linked vocabularies, because they offer a new way to link classes to properties, an alternative to the hard semantic constraints of rdfs:domain and rdfs:range (often abused for lack of such an alternative), or to local constraints using OWL restrictions, which are more difficult to understand and use. The semantics of domainIncludes and rangeIncludes are indeed better suited to the needs of loose semantic coupling in the linked data universe. They make it possible to suggest or indicate, without enforcing any logical constraint, the classes that are expected or intended to be found for subjects and objects of a given predicate. They offer good guidelines to linked data publishers and consumers using the predicate. They are a nice workaround avoiding the cumbersome construction of complex domains using owl:unionOf.
Since the publication of those properties in the schema.org namespace does not seem to be a top priority, I decided to take a step forward and define them in a namespace of mine, and to use them in the latest version of the lingvoj.org ontology. This small vocabulary is (so far) called Ontology Loose Coupling Annotation. Not sure this is the most appealing title, maybe Loose Ontology Coupling (LoCo) would fly better.
Anyway ... OLCA, as it stands, defines among other properties olca:domainIncludes and olca:rangeIncludes as owl:AnnotationProperty, so that they can be included in OWL ontologies without interfering with the core semantics. 
I hope it will provide a way for lightly constrained popular vocabularies such as Dublin Core, SKOS and FOAF to play nicely together, through loose coupling declarations such as the one given as example.

dcterms:subject a  rdf:Property;
    olca:domainIncludes   foaf:Document;
    olca:rangeIncludes    skos:Concept.

A proposal which seems to address the open range issue for DC terms. One can expect the subject (yes, I know, vocabulary clashes here) of dcterms:subject to be a foaf:Document, and it's OK, and the object to be a skos:Concept, and it's OK, but neither of those is enforced. The same property can link other classes as well, and it's OK.
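To show how a consumer might treat such declarations, here is a minimal Python sketch in which the expectations above are used as hints: a triple whose subject or object falls outside the suggested classes produces an advisory note, but nothing is rejected and nothing is inferred. The function name check_triple, the HINTS dictionary and the bibo:Book and ex:Place classes used in the usage lines are assumptions of this sketch, not part of OLCA.

```python
# Hypothetical in-memory copy of the loose coupling declarations above.
HINTS = {
    "dcterms:subject": {
        "domainIncludes": {"foaf:Document"},
        "rangeIncludes": {"skos:Concept"},
    }
}


def check_triple(predicate, subject_types, object_types, hints=HINTS):
    """Return advisory notes for one triple; an empty list means 'as expected'.

    Unlike rdfs:domain / rdfs:range, nothing is inferred about the subject
    or object, and the triple is never treated as invalid.
    """
    notes = []
    declared = hints.get(predicate, {})
    roles = (
        ("domainIncludes", set(subject_types)),
        ("rangeIncludes", set(object_types)),
    )
    for role, found in roles:
        suggested = declared.get(role, set())
        if suggested and not (suggested & found):
            notes.append(
                f"{predicate}: {role} suggests {sorted(suggested)}, "
                f"found {sorted(found)}"
            )
    return notes


expected_case = check_triple("dcterms:subject", {"foaf:Document"}, {"skos:Concept"})
other_case = check_triple("dcterms:subject", {"bibo:Book"}, {"ex:Place"})
```

In the first call the list of notes is empty; in the second there is one note per role, and the triple is still perfectly acceptable, which is the "and it's OK" of the paragraph above.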

2013-02-04

From 'Long Data' to 'Long Meaning'

I was attracted and actually misled this morning by the title of this article published last week in Wired. I put comments both on the original article and in a Google+ post. But the question certainly deserves more than those quick reactions. Long is definitely climbing the hype curve, and it will be a good thing if the meaning of the concept does not get blurred in the buzz process, a common pitfall accurately pointed out by several comments on the said Wired article.
The Long Now Foundation coined the concept quite a while ago, and two recent Long Now Blog entries are indeed about Long Data. But Long Data is not only about gathering data from the past in consistent series despite all difficulties, but also (to paraphrase the tagline of the next Dublin Core Conference) linking to the future, which means having the data of today still available 10,000 years from now.

2013-01-10

Hubject : the story so far

I coined the word hubject back in 2005. At the time I checked that the pun was new, or at least unknown to Google. All I found were typos in the spelling of subject. The word and underlying concepts did not have the success I secretly expected; adoption was actually limited to a relatively small circle of readers of this blog. Thinking there could be a business model, I bought the domain name hubjects.com, which I never used beyond hosting this blog. I used hubject for a while as a pseudonym on Twitter, but with time came to the conclusion that the idea and the word were bound to slowly fade away.
Last year I discovered hubject.com was the website of a company "connecting emobility across Europe", in plain words dealing with infrastructure for electrical vehicles. What was meant to be a common name is now the proper name of Hubject GmbH, Torgauer Straße, 12-15 10829 Berlin. I sent them a mail asking if they were aware of previous use of the word, and as of today have got no answer [1].
Yet I have not surrendered completely. Yesterday I reinstated the cloud of tags in the left menu. I never liked "tags" too much so I entitled it "hubjects" instead.

That's the story so far. Next summary around 2020.

[1] Update: On 2013-01-29 I eventually received a polite answer from hubject.com. "It is interesting for us to get to know that hubjects is used in a completely different context." 

2013-01-08

Everyone knows what a Semantic Dog looks like

A fierce debate is raging these days on the public-lod list between +Kingsley Idehen and +Hugh Glaser, plus a couple of others. The question is whether the huge efforts to publish mountains of linked data have so far produced any kind of visible and useful applications consuming them. In other words, where are the semantic dogs consuming those heaps of Semantic Web Dog Food? Kingsley holds, of course, that they have invaded all the Web avenues, and Hugh that they are nowhere to be seen. Obviously they are not looking for the same kind of dogs, or they don't agree on what a semantic dog could look like, making me wonder if such dogs might be akin to the dragon of the story.
This debate is to be compared with +Amit Sheth's recent post entitled "Data Semantics and Semantic Web - 2012 year-end reflections and prognosis", suggesting among other things that although another five years or so could be necessary for Linked Open Data to gain enough quality to allow building upon it seriously, on the other hand things like the Google Knowledge Graph are opening the way to pervasive semantic applications.
Beyond those ongoing cries of "Publish more linked data" and "Show me the applications", why not try in this New Year time to think about linked data in a Long Now perspective?