Revisiting Zooko's triangle2014-04-07
In 2001, Zooko Wilcox-O'Hearn conjectured that a key value system, in the way the keys address those values, must make a compromise between three properties. Those properties are:
- distributed: there is no central authority in the system (and the other nodes of the system are potentially untrusted)
- secure: a name lookup cannot return an incorrect value, even in the presence of an attacker
- human usable keys: keys that a human will be able to remember and write without errors
The conjecture stated that you may have at most two out of these three properties in the system.
Examples of those properties at work:
- the DNS is secure (one possible record for a given name) and human usable (domain names can be learned), but not distributed (the server chain is centralized)
- the PGP mapping between email and public keys is distributed (no central authority defining which email has which key) and human usable (public keys are indexed by emails) but not secure (depending on your view of the web of trust, identities could be falsified)
- Tor .onion addresses are secure (the address is the hash of the public key) and distributed (nobody can decide that an address redirects to the wrong server) but not human meaningful (have you seen those addresses?)
Some systems, like NameCoin, emerged lately and exhibited those three properties at the same time. So is the conjecture disproved? Not really.
When considering that triangle, people often make a mistake: assuming that you have to choose two of these properties and will never approach the third. These properties must not be seen as absolute.
Human usability is easily addressed with Petname systems, where a software assistant can help you add human-meaningful names to secure and distributed keys (at risk of collision, but not system wide).
You can add some distributed features in a centralized system like SSL certificate authorities by accepting multiple authorities.
NameCoin is distributed and human meaningful, and creates a secure view of the system through the blockchain.
And so on.
The three properties should be redefined with what we know ten years later.
Human usable names
We now understand that a human meaningful name is a complex concept. We see people typing "facebook" in a search engine because they would not remember "facebook.com". Phone numbers are difficult to remember. Passwords, even when chosen through easy to remember methods, will be repeated at some point. And we could also talk about typosquatting...
Here is a proposition: a human usable key is an identifier that a human can remember easily in large numbers and distinguish as easily from other identifiers. I say identifier because it could be a string, a picture, anything that stands out enough to be remember and for which differences could be easily spotted.
That definition is useful because it is quantifiable. We can measure how many 10 digits phone numbers a human can remember. We can measure how many differences a human can spot in two strings or pictures.
Decentralized and/or secure names
The original text indicates distributed as "there is no central authority which can control the namespace, which is the same as saying that the namespace spans trust boundaries". The trust boundary is defined as the space in which nodes could be vulnerable to each other (because they trust each other). In a fully distributed system, a node may not be able to force any other node to have a different view of the system.
Secure means that there is only one correct answer for a name lookup, whoever does the lookup. This is the same as saying that there is a consensus (at least in a distributed system).
We can see that the decentralized and secure properties are at odds with the consensus problem in a distributed system. This is where systems like NameCoin make the compromise. They exchange an absolute secure view of the system for an eventual consistency, and fully distributed trust boundaries for byzantine problems. They guarantee that, at some point, and unless there are too many traitors in the systems, there will be a universal and unique mapping for a human readable name.
So, we could replace the "secure" property by: the whole system recognizes that there is only one good answer for a record at some point in the future. Again, this is measurable, and it also addresses the oddity in DNS that records have to propagate (so sometimes the view can differ). Systems where identifiers do not give unique responses will not satisfy that definition.
The last property, "decentralized", can be formulated like this: a node's view of the names cannot be influenced by a group of unauthorized nodes. In a centralized system, where there is essentially one node, this definition cannot apply. In a hierarchy of nodes, like DNS, we can easily see that a node's view can be abused, since there is only one node you need to compromise to influence it, its first nameserver (this is used, for good and for bad reasons, in a lot of companies). In a distributed system, this definition becomes quantifiable and joins the byzantine problem: how many traitor nodes do you need to modify a node's view of the system?
Here, we have three new definitions that are more nuanced, allow some compromise and modulation of the needs, and for which the limits can be measured or calculated. These quantified limits can be useful to describe and compare the future naming system propositions.