2018-09-28

The stubborn arithmetic of cousins

When you get involved in genealogy, sooner or later, after days, weeks, months or years of patient research (depending on how lucky and obstinate you are), you discover that your best friend, your boss, the old lady next door, your favourite writer or singer, your loyal enemy or the latest serial killer are all, to some degree, your cousins.
Actually, you knew this had to be true, in theory at least. All humans have common ancestors somewhere in the past. But it's a completely different story to be able to identify and name them, and figure out how long ago they lived. It might be quite easy if you and I belong to families who have kept their genealogical records over centuries and can proudly show their lineage back to Charlemagne. No big deal, actually, since anyone tracing her ancestry that far back is likely to be in the same situation. According to the genealogical database Roglo, the identified descendants of Charlemagne number more than 1,500,000, more than 20% of a database of about 7,500,000 people. But if your ancestors are, like mine, obscure and illiterate peasants, we are likely to stumble upon the lack of documents beyond the last few centuries of church and civil registries, and we are lucky if we can reach back as far as around 1600 for more or less reliable information about a handful of ancestors. This seems quite far away, but it's only about a dozen generations, which means a few thousand people.
So, how far back do you have to go to find common ancestors with your best friend? Let's look at the harsh reality of numbers. The number of your ancestors at generation n is 2^n: you have two parents, four grandparents, and so on. Counting thirty years per generation (give or take a few), ten generations span three centuries. At the tenth generation you count 2^10 ancestors, which is about one thousand. Being born in the 1950s means I had around one thousand ancestors living around 1650 (under Louis XIV). Three centuries and ten generations earlier, it was one million around 1350 (under Jean II le Bon), and the same stubborn arithmetic leads to one billion ancestors around 1050 (under Henri Ier). As in the famous wheat and chessboard problem, the exponential law makes the figures explode beyond control at some point. Except that no one can have as many as one billion ancestors in 1050, because the entire world population at that time was less than half this figure. The curve of my theoretical number of ancestors crosses the curve of world population somewhere around the beginning of the 12th century.
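The figures above can be reproduced with a few lines of Python. This is a minimal sketch assuming, as the text does, a birth year of 1950 and exactly 30 years per generation; the function names are illustrative, and the 500 million threshold is only the text's own upper bound for the world population around 1050, treated as a constant rather than as a real demographic curve.

```python
# Minimal sketch of the "stubborn arithmetic" of ancestors.
# Assumptions (taken from the text): born around 1950, 30 years per generation.
BIRTH_YEAR = 1950
YEARS_PER_GENERATION = 30


def ancestors_at(generation: int) -> int:
    """Theoretical number of ancestors n generations back (2**n), ignoring overlaps."""
    return 2 ** generation


def year_of(generation: int, birth_year: int = BIRTH_YEAR) -> int:
    """Approximate year when the ancestors of generation n were living."""
    return birth_year - YEARS_PER_GENERATION * generation


def first_generation_exceeding(population: int) -> int:
    """Smallest n for which 2**n theoretical ancestors exceed the given population."""
    n = 0
    while ancestors_at(n) <= population:
        n += 1
    return n


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(f"generation {n}: around {year_of(n)}, {ancestors_at(n):,} ancestors")
    # Upper bound taken from the text: under half a billion people around 1050.
    n = first_generation_exceeding(500_000_000)
    print(f"2**{n} exceeds 500 million, around {year_of(n)}")
```

This prints 1,024 ancestors around 1650, 1,048,576 around 1350 and 1,073,741,824 around 1050, then reports that 2**29 (about 537 million) is the first power of two above the bound, around 1080. Since the real population was smaller than that bound, the theoretical count overtakes it at generation 29 or fewer, so the crossing cannot fall before about 1080, which is consistent with the text's estimate.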
What does that mean? Any of my ancestors before 1200 is likely to be my ancestor through many different paths, and is probably your ancestor as well. People tracing their genealogy that far back know that they are indeed all cousins. And all of royal descent, of course, since along those millions of different paths it's highly probable to find a king or queen. But whether you know it or not, the figures are relentless. You who read these lines are very probably my cousin, but we will also probably never know precisely either our degree of kinship or the name and era of our last common ancestor. This is both a fascinating and a frustrating conclusion.

2018-09-05

Half the sky of Wikidata

I thought I was done with this blog, where I had not published for almost two years, but I have come back to linked data lately, through a grandfather's interest in genealogy. For what is genealogy, if not the ancestor of linked data science? Family trees are perhaps the first type of semantic graph ever invented: entities (persons) linked to each other by predicates such as has father, has mother, has child, has sibling, married to, and linked to places (of birth, death, marriage), points in time (dates of birth, marriage, death), occupations, works, etc. One could think that genealogical data would be the first candidate to be exposed as linked open data. Far from it. Most genealogical data is locked in proprietary databases and exchanged in formats far removed from Semantic Web standards. The largest of those databases, such as MyHeritage, hold billions of records.

In the linked data world, Person is indeed the most represented type of thing, but the figures are three orders of magnitude below those of the giant genealogical data silos quoted above. As I write, Wikidata contains over 4,500,000 people. The current exact value can be retrieved with a query on the excellent Wikidata SPARQL interface. Another query retrieves the current number of women (declared of gender female), a little more than 700,000. A similar one yields the number of those declared male, more than 3,000,000. This leaves a number of people whose gender is neither male nor female, or not specified in the database, similar to the number of women.
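A counting query of this kind can be sketched in Python. The original queries are not reproduced here; this is an assumed reconstruction using the standard Wikidata identifiers (Q5 human, P31 instance of, P21 sex or gender, Q6581072 female, Q6581097 male, P569 date of birth) and the public endpoint at https://query.wikidata.org/sparql. The function names and the User-Agent string are hypothetical, and a count over all humans may well hit the service's timeout.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Wikidata identifiers: Q5 = human, P31 = instance of, P21 = sex or gender,
# Q6581072 = female, Q6581097 = male, P569 = date of birth.
FEMALE = "Q6581072"
MALE = "Q6581097"


def count_humans_query(gender_qid=None, born_since=None):
    """Build a SPARQL query counting humans, optionally by gender and birth year."""
    lines = [
        "SELECT (COUNT(DISTINCT ?person) AS ?count) WHERE {",
        "  ?person wdt:P31 wd:Q5 .",
    ]
    if gender_qid is not None:
        lines.append(f"  ?person wdt:P21 wd:{gender_qid} .")
    if born_since is not None:
        lines.append("  ?person wdt:P569 ?birth .")
        lines.append(f"  FILTER(YEAR(?birth) >= {born_since})")
    lines.append("}")
    return "\n".join(lines)


def run_count(query):
    """Send the query to the public endpoint and return the count (needs network access)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": "half-the-sky-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return int(data["results"]["bindings"][0]["count"]["value"])
```

For example, `run_count(count_humans_query(FEMALE))` would send the count of women, and `count_humans_query(FEMALE, born_since=1968)` adds a birth-year filter. Counting `DISTINCT ?person` avoids counting twice the people who have several recorded birth dates.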

Let's not nitpick about numbers, and face the obvious fact that Wikidata has a strong gender bias. Fewer than one person out of five in Wikidata is a woman. This is of course not a deliberate Wikidata policy, but a mirror of how the notability process works at large in our world, not only in Wikipedia (the main source of Wikidata) but also in other data sources such as library authorities. If one adds to the previous queries a filter such as people with an ISNI or VIAF identifier, the proportion stays about the same. Is this changing with time? Maybe men were more notable in older times, and the results are more balanced nowadays. Barely. More than half of the people identified in Wikidata were born after 1900, and filtering the above queries to select only people under 50 (born since 1968), one finds about 200,000 women for 550,000 men. The proportion of women has risen slightly above 25%. A little better, but no big deal. Not yet half the sky.
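The proportions can be checked from the rounded figures quoted in the text. This small sketch assumes the share is taken among people declared either female or male; measured against all 4.5 million people, the overall share of women would be lower still, around 15.6%.

```python
def female_share(women: int, men: int) -> float:
    """Share of women among people declared either female or male."""
    return women / (women + men)


# Rounded figures quoted in the text (Wikidata, 2018).
overall = female_share(700_000, 3_000_000)
born_since_1968 = female_share(200_000, 550_000)

print(f"overall: {overall:.1%}")                  # overall: 18.9%
print(f"born since 1968: {born_since_1968:.1%}")  # born since 1968: 26.7%
```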



Many women can certainly be added to Wikidata without stretching the notability policy too much. Reading through many Wikipedia articles about so-called notable people (of either gender), one notices that the women linked to them are often mentioned and named as mother, spouse, daughter or sister, with elements of description such as birth and death dates, and more. But those women have not yet been considered notable enough to be the subject of a separate Wikipedia entry, and therefore have not been entered in Wikidata, although they would often provide a missing genealogical link between existing items.

What about the figures for genealogical relationships? Since they are the most ancient and obvious way of linking people, one would think they are very common in Wikidata. Far from it. Fewer than 10% of all people are linked, as either subject or object, by a family predicate (child, mother, father, sibling, spouse). And focusing on gender again, one finds fewer than 15,000 mother-daughter links (declared both ways) versus more than 90,000 father-son links. The gender bias is even more obvious in the number of relationships than in the number of entities.
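The "declared both ways" condition can be expressed as a SPARQL pattern requiring both the child-to-parent statement and the parent-to-child statement. This Python sketch only builds the query text; it assumes Wikidata's P22 (father), P25 (mother), P40 (child) and P21 (sex or gender) properties, and the function name is hypothetical. The resulting query can be pasted into the Wikidata Query Service.

```python
# Wikidata properties: P22 = father, P25 = mother, P40 = child, P21 = sex or gender.
FEMALE, MALE = "Q6581072", "Q6581097"
PARENT_PROPERTY = {FEMALE: "P25", MALE: "P22"}  # mother for daughters, father for sons


def same_gender_links_query(gender_qid: str) -> str:
    """Count mother-daughter (or father-son) pairs declared in both directions."""
    parent_prop = PARENT_PROPERTY[gender_qid]
    return "\n".join([
        "SELECT (COUNT(*) AS ?links) WHERE {",
        f"  ?child wdt:{parent_prop} ?parent ;",   # child -> parent statement
        f"         wdt:P21 wd:{gender_qid} .",      # child of the same gender
        "  ?parent wdt:P40 ?child .",               # parent -> child statement
        "}",
    ])


print(same_gender_links_query(FEMALE))
```

Since a mother or father statement already implies the parent's gender, only the child's gender needs to be filtered.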

Things can be done to improve this situation, using the many existing tools to query Wikidata and report anomalies. One example is the missing parent report, listing individuals linked directly to a grandparent without being linked to the in-between parent. In many cases, the missing link can be identified and added to the database. Anomaly reports exist for each parenthood relationship. I've started to work on this, one woman at a time. Half the sky is still far off, but I'll do my part.
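The logic of such a report can be sketched on a small in-memory family graph. This is a hypothetical illustration in Python: the dictionaries, the toy names and the way a direct grandparent link is recorded are assumptions, not the structure of the actual report or of Wikidata's data model.

```python
from typing import Dict, List, Set, Tuple


def missing_parent_report(
    parents: Dict[str, Set[str]],
    grandparents: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """List (person, grandparent) pairs for which no recorded parent of the
    person has that grandparent as a recorded parent."""
    report = []
    for person, recorded_grandparents in grandparents.items():
        person_parents = parents.get(person, set())
        for grandparent in sorted(recorded_grandparents):
            # Is there an in-between parent linking person and grandparent?
            linked = any(grandparent in parents.get(p, set()) for p in person_parents)
            if not linked:
                report.append((person, grandparent))
    return report


# Hypothetical toy data: Anne's chain is complete, Paul's parent is missing.
PARENTS = {"Anne": {"Marie"}, "Marie": {"Louise"}, "Paul": set()}
GRANDPARENTS = {"Anne": {"Louise"}, "Paul": {"Louise"}}

print(missing_parent_report(PARENTS, GRANDPARENTS))  # [('Paul', 'Louise')]
```

Each reported pair is a candidate for manual research: the in-between parent may already exist as an item, or may need to be created.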

A more detailed introduction to genealogy and linked data, with examples, is available here (in French).

2016-12-16

Meaning, quantum process and inscrutability

The analogy between meaning and measurement in quantum mechanics has been on my mind for quite a while, as attested by a couple of posts from the early years of this blog. I am therefore walking an old path here, but with a couple of new things in mind, including a rather radical shift in my view of signification since 2005, and the current lively debate around the inscrutability of machine learning algorithms. The following points sum up where I stand today.

Meaning is a process

The Web has been a wide-scale experiment in applied semantics, and more and more in applied semiotics. Our interaction with the Web uses signs, the primordial and main ones being those weird identifiers called URIs. For years, I have, with many others, struggled with the thorny issue of what those URIs actually identify, denote, mean or represent, and spent hours in endless debates with Topic Maps and Semantic Web people trying to figure out the difference or similarity between the subjects and topics of the former and the resources of the latter. Eventually fed up with those intractable ontological issues, I decided to remain resolutely agnostic about them and focus on the dynamic aspects.

To the question "What does it mean?" I now answer "It means what it does." In other words, the meaning of a URI on the Web is a process: whatever happens when you use it. This process can be technically described and tracked. It includes query processing, client-server dialogue, content negotiation and federation, distributed computing, and more and more artificial intelligence. But from the end-user viewpoint, URIs are now hidden under the hood: the interface with the Web uses natural language signs like words and sentences, written or spoken, and more and more those application icons on the touchscreens of our mobile devices, simple signs bringing us back to hieroglyphs and magic symbols. Meaning on the Web is the (more and more complex) processing of (more and more simple) signs.

Is this conception of meaning specific to the Web? Looking closely, the answer is no. The meaning of (often simple) signs outside the Web is also the result of an (often complex) process. Whatever its nature, a sign means nothing outside a process of signification. The Web has simply given us an opportunity to explore this reality in depth because we have engineered those processes, whereas outside the Web those processes are given: we use them without question on a daily basis and are not aware of their complexity. The more complex the Web becomes, and the simpler the signs we use to interact with it, the closer it seems to our "natural" (read: pre-Web) semiotic activity.

Meaning process is similar to quantum process

The evolution of the Web is also tackling the difficult issue of meaning in context. The process triggered by the use of a sign is almost never the same. The time of the query, the state of your client device, the state of the network, your user preferences, interaction history and access rights, content negotiation... all make every URI resolution a unique event. Among all possible meanings, only one is realized.
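Content negotiation is one concrete reason why resolving the same URI does not always produce the same thing. The sketch below is a deliberately simplified version of HTTP proactive negotiation: it picks, among hypothetical available representations, the one with the highest q value in the client's Accept header. It ignores specificity rules and other headers (language, encoding), so it is an illustration, not a conformant implementation.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if media:
            ranges.append((media, q))
    return ranges


def matches(media_range: str, media_type: str) -> bool:
    """True if a media range such as */*, text/* or text/html covers media_type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(accept: str, available: List[str]) -> Optional[str]:
    """Return the available representation with the highest q (None if nothing fits)."""
    best, best_q = None, 0.0
    for media_range, q in parse_accept(accept):
        for candidate in available:
            if matches(media_range, candidate) and q > best_q:
                best, best_q = candidate, q
    return best


# Same resource, different clients: a browser and a linked data agent.
AVAILABLE = ["text/html", "text/turtle", "application/ld+json"]
print(negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8", AVAILABLE))  # text/html
print(negotiate("text/turtle, application/ld+json;q=0.5", AVAILABLE))           # text/turtle
```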

Here comes the analogy with quantum mechanics. Among all possible states of a system, whose probability distribution may be known with great accuracy, only one is realized in any quantum event. Before the event, the system is described as a superposition of all its possible states. The reduction of this set of possibilities to a single realization is technically called the collapse of the wave function.
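The measurement step described above can be illustrated with a toy simulation. This is a minimal sketch of the textbook Born rule for a discrete set of outcomes, assuming an ideal measurement; the two-outcome state and its labels are arbitrary examples, and "collapse" here simply means replacing the superposition by the single observed outcome.

```python
import random
from collections import Counter
from typing import Dict


def measure(state: Dict[str, complex], rng: random.Random) -> Dict[str, complex]:
    """Simulate one ideal measurement of a discrete state.

    The outcome is drawn with probability |amplitude|**2 (Born rule), and the
    returned state has collapsed onto that single outcome.
    """
    outcomes = list(state)
    weights = [abs(amplitude) ** 2 for amplitude in state.values()]
    chosen = rng.choices(outcomes, weights=weights, k=1)[0]
    return {chosen: 1 + 0j}


# Illustrative superposition: |0.6|^2 = 0.36 and |0.8i|^2 = 0.64.
STATE = {"up": 0.6 + 0j, "down": 0.8j}

rng = random.Random(1)
tally = Counter(next(iter(measure(STATE, rng))) for _ in range(10_000))
print(tally)
```

Each call realizes only one outcome; the underlying probabilities only show up in the tally over many repeated events.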

Likewise, before you send a query using a sign, before you click your email application icon, everything is possible. You might have mail or not. Your spam filter might have trashed an important contract. Whatever happens means the collapse of all possible states but one. This collapse process defines the meaning of the sign at the moment you use it.

In the same way, in natural conversation you might say "Will you pass me the bowl?", and all the possible meanings of "bowl" in your interlocutor's mind collapse to zero except the one designating the only bowl sitting on the kitchen table in front of you.

Both meaning and quantum process are inscrutable, and it's OK

The inscrutability of reference was discussed in depth by Quine in Word and Object (1960). Quine wrote mostly before our world of pervasive information networks, before the Web, and although he died on the eve of the 21st century, he did not write anything about the Web, unless I am missing something. This is too bad, because "Word and Object" in the framework of the Web, and particularly the Semantic Web, translates easily into "URI and Resource". But maybe Quine was a bit too old in the early days of the Web to apply his theories to this new and exciting field.

So, as far as I know, Quine did not address reference in the dynamic aspect we discuss here. Reference is inscrutable because each use of a sign triggers a very complex process, inscrutable either in theory or in practice. In the human interpretation of signs, this meaning process involves several parts of our brains and perception/action systems in ways we barely understand. The signs we send to the network are, and will be, processed in more and more complex and practically inscrutable ways, such as the machine learning algorithms we already see implemented in chatbots.

Quantum processes have been known for about a century to be inscrutable, although some of the famous founders of quantum theory did not like this frontal attack on determinism at the very heart of the hardest of all sciences. Albert Einstein, among others, was a fierce opponent of this probabilistic view of the world, defended by the orthodox interpretation of quantum mechanics, and spent a lot of time and energy defending, without success, some "hidden variable" theory. Inscrutability was here to stay in physics. It also seems here to stay in semiotics and in information systems. This is a remarkable convergence, which certainly deserves further consideration and exploration.

[Further reading might include works by Professor Peter Bruza (Queensland University of Technology, Brisbane, Australia) such as Quantum Models of Cognition and Decision or Quantum collapse in semantic space: interpreting natural language argumentation.]

2016-11-18

The right tension of links

In 1990, at the dawn of the Web, Michel Serres published Le Contrat Naturel (later translated into English as The Natural Contract). In this book the philosopher strongly and poetically evokes collective ventures in which contracts are materialized by cords, lines and ropes, such as sailing and climbing. Those lines link people not only to each other, but to their apparatus (sails, winches, harnesses and spikes) and to the harsh natural elements with which they are engaged (wind and waves, ice and rocks). On the high seas as in the high mountains, the lines need to be tightened to ensure the cohesion and safety of the team. And, adds Serres, this tightening is not only a safeguard, it is also a condition for the line to convey information, in a way that is more immediately effective than language in situations where you cannot afford delays in assessing the situation and deciding. If the line is too slack, you do not feel the sail and the wind, you lose connection with your climbing mate. On the other hand, excessive tension means opposition and risk of breaking the line, and being tightly connected must not impede movement. Michel Serres does not mention martial arts, but in his excellent "guide for beginners" Aikido From the Inside Out, Howard Bornstein expresses similar thoughts in his chapter dedicated to connection. Connection has to be maintained at just the right level of tension, by feeling what he calls the point of first resistance.
When you connect like this, you become one with your partner in a very real, experiential way. When you move, your partner moves, at the same time and in the same direction. You are really one, in terms of movement. Your experience of movement is basically the same as if you were moving entirely by yourself.
Of course, understanding those general principles in theory will not make you an experienced sailor, climber or martial artist. You will have to practice and practice to acquire the quality of touch that keeps the lines at the right tension, making everyone safe and giving you this wonderful feeling of being one with your teammates, partners, and the world around you.

Our online experience should abide by the same rules. All the links we are texting should be of the same quality as those of sailors, climbers and martial artists, enabling us to move together. In the stormy events we are facing, we need more than ever to reduce the slack in our connections. 

2016-10-19

More things in heaven and earth

Horatio:
O day and night, but this is wondrous strange!

Hamlet:
And therefore as a stranger give it welcome.
There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy.

Horatio would certainly be as bewildered as we are today by the ever-growing number and diversity of things that modern scientific investigation keeps discovering at a steady pace. A recurrent phrase in the science papers and articles I have stumbled upon lately is "more than expected", as the following short review illustrates, traveling outwards from earth to heaven.

New species, both living and fossil, are discovered almost daily in every corner of our planet, from the soil of our backyards to the most unlikely and remote places, and more and more studies suggest there are far more left to discover than we have found so far. But the number of living things might be dangerously challenged by the growing number of artificial ones, products of our frantic industry cluttering our homes, backyards, cities and eventually landfills.

Though a very populated one, our small planet is itself just a tiny thing in the universe, among a growing number of siblings. The number and variety of bodies in the Solar System, as well as the distances at which we can expect to find them, have been growing beyond expectations. Closer to us, a survey of impacts on the Moon over seven years has yielded more events than expected based on previous models of the distribution of small bodies in the inner Solar System. Images of the solar atmosphere by the SOHO coronagraph have yielded an impressive number of spectacular sungrazing comets. And missions to planets have unveiled a wealth of amazing landscapes, reinforcing hopes of discovering life on some of them.

Beyond the exploration of our home stellar system, the discovery of thousands of exoplanets did not come as a real surprise (our star being an exception would have been a big one), but there again we begin to discover more than expected, from an earth-sized planet around the star next door to improbable configurations such as planets orbiting binary stars. Moreover, free-floating, or so-called rogue, planets, not tied to any specific star, are certainly cruising throughout our galaxy, and although very few of them have actually been detected so far, due to the extreme difficulty of such observations, some studies suggest they may outnumber the "regular" planets, those orbiting a star. As for stars themselves, the most recent catalog contains over one billion of them, which is less than 1% of the estimated total star population of our Milky Way galaxy, while new studies tend to indicate that the number of galaxies in the observable universe is at least one order of magnitude higher than previously thought. Even exotic thingies such as merging black holes, whose detection is now possible through the transient ripples they create in space-time (aka gravitational waves), appear to be more frequent than expected. And the universe certainly has more in store, including the infamous missing mass, dark matter, whose nature remains unknown.

The sheer number of objects unfolding in the depths of space and time is well beyond the grasp of human imagination and cataloguing power, not to mention philosophy. But fortunately the modern Horatio gets a little help from his friends, the machines. The overwhelming tasks of data acquisition, gathering and consolidation, identification, classification and cataloguing are more and more delegated to machines. Artificial intelligence, and particularly machine learning, is beginning to be applied to tasks such as classifying galaxies or transient events. Using such black-box systems for scientific tasks runs again into issues linked to inscrutability, which we addressed in the previous post. Scientific inquiry is a very singular endeavour in which "whatever works" is not easily accepted, and the use of inscrutable information systems can arguably be considered a non-starter.

There are more and more things indeed in heaven and earth that we know of, and we are more and more eager to accept the unknown ones we discover every day. But the ones our poor imagination might be forever unable to fathom are those new ghosts haunting our intelligent machines. Are we ready to welcome those strangers?

[Edited, following +carey g. butler's comments, to strike through "intelligent" above. Let me remain agnostic about whether machine learning systems (or whatever systems are to come) are intelligent or not, because I don't know exactly what intelligent means, be it natural or artificial. The "ghostly" point here is inscrutability.]

2016-10-13

I trust you because I don't know why

The quick and widespread development of neural networks and deep learning systems is triggering many debates and questions, both practical and conceptual. Among the various features of such systems, the most debatable are certainly inscrutability and fallibility. A deep learning system builds up knowledge and expertise, as natural intelligence does, by accumulating experience of a great number of situations. It does better and better with time. But the drawback of this approach is that you can't open the box to understand how it achieves its expertise as you would with a classical step-by-step algorithm (inscrutability), and the expertise is not foolproof: it is bound to fail from time to time (fallibility). I've written on some philosophical aspects of those issues, and how they relate to ancient Chinese philosophy (in French here).

A recent article in Nature entitled "Can we open the black box of AI?" presents a very good review of those issues. And the bottom line of this article confirms my opinion that either this whole debate is moot, or it is not linked to this specific technology, nor even to any kind of technology. The whole debate is about whether we can trust something we don't understand and which is, moreover, bound to fail at some point. This seems to fly in the face of centuries of scientific and technological development, all based on understanding and control.

Do we control and understand everything we trust? Or more exactly, do we need to understand and control before we trust? Most of the time, no. As children, we trust our parents and the adult world to behave properly without understanding the whys and hows of this behavior. And if, growing up, we start questioning those whys and hows, it might happen that for some reason we lose that trust. When I trust a friend to do what she promised, I won't, or at least I should not, try to check whether and how she will do it. Trust, in fact, is exactly the opposite of control. You trust because you cannot afford to understand and control, or lack the technical or conceptual tools to do so, or simply believe it would be useless, counter-productive or just rude.
That line of thought applies to things simpler than people. If I cross a bridge over a river, I don't check, and most of the time do not understand, how it's built. I begin to check it if for some reason it seems poorly built, or rotten, looking like no one has used it for ages. You trust the food you eat because you trust your provider; you generally don't check the food chain again and again. You start to check when you suspect this chain has some serious point of failure. It's not check before trusting, it's check because for some reason you don't trust anymore. The other way round is called paranoia.
Most of the time, you trust things to work safely as expected because so far they mostly have worked safely. Based on experience, not on logical analysis of how they work. This includes, and actually begins with, your own body and brain. Looking further at the world around you, you discover black boxes everywhere, and it's all right. Starting to check and control how they work is likely to lead you into some infinite regress of effects and causes, and you will either reasonably stop at some point saying "well, it's gonna be all right", or spend the rest of your life lost in a metaphysical and ontological mist, afraid of any action.
Let's face it. We trust before and without understanding and controlling. Every second of every day. And most of the time it's OK. Until it fails, at some point. We know that it will. We trust our body and brain in order to live, although we know they are bound to break down at some point. We are aware that things and people we trust are bound to fail once in a while. That's just how life goes. Parents have a second of distraction and a child dies crossing the street. Friends are stuck in a traffic jam, don't show up on time and miss their flight, bridges collapse in sudden earthquakes, hard drives break down, light bulbs explode, lovers betray each other ...
Despite our awareness of such risk of failure, we keep trusting, and call this hope. Without trust we lose hope, and fall into depression and despair. This is a basic existential choice: trust and live, or try to control and understand everything, demand total security, and despair because you can't find it. We trust each other although, and actually because, we don't know why. And knowing that each of us will eventually fail some day, if only once at this ultimate individual failure point called death, should make each of us more prone to forgiveness.

Let me borrow these final words from the brand-new and unexpected Nobel laureate in Literature:

Trust yourself
Trust yourself to do the things that only you know best
...
Trust yourself
And look not for answers where no answers can be found
...

2016-08-30

Immortality, a false good idea

Immortality is trendy. According to some so-called "transhumanists", it is the promise of artificial intelligence in the short or medium term, at the very least before the end of the 21st century. Considering the current advances in this field, we are bound to see amazing achievements which will shake our very notions of identity (what I am) and humanity (what we are). If I can transfer, piece after piece, neuron after neuron, organ after organ, each and every element that makes up my identity into a human or machine clone of myself, supposing this is sound in theory and doable in practice, will this duplicate still be myself? The same one? Another one? And if I make several clones, which one will be the "true" one? Do such questions make any sense at all? All this looks very much like just another, high-tech, version of the Ship of Theseus, and our transhumanists provide no better answers than the ancient philosophers did to the difficult questions about permanence and identity that this old story raised more than two thousand years ago.
None of those dreamers seems to provide a clear idea of how this immortality is supposed to be lived in practice, if we ever achieve it. A never-ending old age? Not really a happy prospect! No, to be sure, immortality is only worth it if it comes with eternal youth! And even so, being alone in this condition, and seeing everyone else grow old and die, friends, family, my children and their children, does that not amount to buying an eternity of sorrow? Not sure how long one could stand that. But wait, don't worry, our transhumanists will claim, this is no problem because just everybody will be immortal! Everybody? You mean every single one of the 10 billion people expected to be living by 2100? Or only a wealthy happy few? But let's assume the (highly unlikely) prospect of generalized immortality by 2050. In that case there will be not 10 but 15 billion immortal people by the end of the century if the birth rate does not abate. That's clearly not sustainable. But maybe when everyone is immortal, there will be no need to have children anymore, and maybe at some point it will even be forbidden due to shrinking resources. Instead of seeing your children die as in the first scenario, you will not see children anymore. Not sure which one is the worse prospect!
Either way, alone or all together, immortality is definitely not a good idea. And if it were, life would certainly have invented and adopted it long ago. But for billions of years, the evolution and resilience of life on this planet despite all kinds of cataclysms (the latest being humanity itself) have been based on a completely different strategy. For a species to survive and evolve, individual beings have to die and be replaced by fresh ones, and for life itself to continue, species have to evolve and eventually disappear, replaced by ones more fit for changing conditions.
So let's forget about actual immortality. We have many technical means to record and keep alive for as long as possible the memory of those who are gone, if they deserved it. I would simply suggest that our transhumanists make their lives worth remembering. It's a proven recipe for the only kind of immortality worth having, the one living in our memories.

[This post is available in French here]