Telegram, AKA "Stand back, we have Math PhDs!"

Disclaimer: this post is now very old and may not reflect the current state of Telegram's protocol. There has been other research in the meantime, and this post should not be used for your choice of secure messaging app. That said, on a personal note, I still think Telegram's cryptosystem is weird, and its justifications are fallacious. If you want a recommendation on secure messaging apps: use a system based on the Axolotl/Signal protocol. It is well designed and has been audited. Signal and WhatsApp are both using that protocol, and there are others.

Here is the second entry in our serie about weird encryption apps, about Telegram, which got some press recently.

According to their website, Telegram is "cloud based and heavily encrypted". How secure is it?

Very secure. We are based on a new protocol, MTProto, built by our own specialists, employing time-tested security algorithms. At this moment, the biggest security threat to your Telegram messages is your mother reading over your shoulder. We took care of the rest.

(from their FAQ)

Yup. Very secure, they said it.

So, let's take a look around.

Available technical information

Their website details the protocol. They could have added some diagrams, instead of text-only, but that's still readable. There is also an open source Java implementation of their protocol. That's a good point.

About the team (yes, I know, I said I would not do ad hominem attacks, but they insist on that point):

The team behind Telegram, led by Nikolai Durov, consists of six ACM champions, half of them Ph.Ds in math. It took them about two years to roll out the current version of MTProto. Names and degrees may indeed not mean as much in some fields as they do in others, but this protocol is the result of thougtful and prolonged work of professionals

(Seen on Hacker News)

They are not cryptographers, but they have some background in maths. Great!

So, what is the system's architecture? Basically, a few servers everywhere in the world, routing messages between clients. Authentication is only done between the client and the server, not between clients communicating with each other. Encryption happens between the client and the server, but not using TLS (some home made protocol instead). Encryption can happen end to end between clients, but there is no authentication, so the server can perform a MITM attack.

Basically, their threat model is a simple "trust the server". What goes around the network may be safely encrypted, although we don't know anything about their server to server communication, nor about their data storage system. But whatever goes through the server is available in clear. By today's standards, that's boring, unsafe and careless. For equivalent systems, see Lavabit or iMessage. They will not protect your messages against law enforcement eavesdropping or server compromise. Worse: you cannot detect MITM between you and your peers.

I could stop there, but that would not be fun. The juicy bits are in the crypto design. The ideas are not wrong per se, but the algorithm choices are weird and unsafe, and they take the most complicated route for everything.

Network protocol

The protocol has two phases: the key exchange and the communication.

The key exchange registers a device to the server. They wrote a custom protocol for that, because TLS was too slow and complicated. That's true, TLS needs two roundtrips between the client and the server to exchange a key. It also needs x509 certificates, and a combination of a public key algorithm like RSA or DSA, and eventually a key exchange algorithm like Diffie-Hellman.

Telegram greatly simplified the exchange by requiring three roundtrips, using RSA, AES-IGE (some weird mode that nobody uses), and Diffie-Hellman, along with a proof of work (the client has to factor a number, probably a DoS protection). Also, they employ some home made function to generate the AES key and IV from nonces generated by the server and the client (server_nonce appears in plaintext during the communication):

  • key = SHA1(new_nonce + server_nonce) + substr (SHA1(server_nonce + new_nonce), 0, 12);
  • IV = substr (SHA1(server_nonce + new_nonce), 12, 8) + SHA1(new_nonce + new_nonce) + substr (new_nonce, 0, 4);

Note that AES-IGE is not an authenticated encryption mode. So they verify the integrity. By using plain SHA1 (nope, not a real MAC) on the plaintext. And encrypting the hash along with the plaintext (yup, pseudoMAC-Then-Encrypt).

The final DH exchange creates the authorization key that will be stored (probably in plaintext) on the client and the server.

I really don't understand why they needed such a complicated protocol. They could have made something like: the client generates a key pair, encrypts the public key with the server's public key, sends it to the server with a nonce, and the server sends back the nonce encrypted with the client's public key. Simple and easy. And this would have provided public keys for the clients, for end-to-end authentication.

About the communication phase: they use some combination of server salt, message id and message sequence number to prevent replay attacks. Interestingly, they have a message key, made of the 128 lower order bits of the SHA1 of the message. That message key transits in plaintext, so if you know the message headers, there is probably some nice info leak there.

The AES key (still in IGE mode) used for message encryption is generated like this:

The algorithm for computing aes_key and aes_iv from auth_key and msg_key is as follows:

  • sha1_a = SHA1 (msg_key + substr (auth_key, x, 32));
  • sha1_b = SHA1 (substr (auth_key, 32+x, 16) + msg_key + substr (auth_key, 48+x, 16));
  • sha1_с = SHA1 (substr (auth_key, 64+x, 32) + msg_key);
  • sha1_d = SHA1 (msg_key + substr (auth_key, 96+x, 32));
  • aes_key = substr (sha1_a, 0, 8) + substr (sha1_b, 8, 12) + substr (sha1_c, 4, 12);
  • aes_iv = substr (sha1_a, 8, 12) + substr (sha1_b, 0, 8) + substr (sha1_c, 16, 4) + substr (sha1_d, 0, 8);

where x = 0 for messages from client to server and x = 8 for those from server to client.

Since the auth_key is permanent, and the message key only depends on the server salt (living 24h), the session (probably permanent, can be forgotten by the server) and the beginning of the message, the message key may be the same for a potentially large number of messages. Yes, a lot of messages will probably share the same AES key and IV.

Edit: Following Telegram's comment, the AES key and IV will be different for every message. Still, they depend on the content of the message, and that is a very bad design. Keys and initialization vectors should always be generated from a CSPRNG, independent from the encrypted content.

Edit 2: the new protocol diagram makes it clear that the key is generated by a weak KDF from the auth key and some data transmitted as plaintext. There should be some nice statistical analysis to do there.

Edit 3: Well, if you send the same message twice (in a day, since the server salt lives 24h), the key and IV will be the same, and the ciphertext will be the same too. This is a real flaw, that is usually fixed by changing IVs regularly (even broken protocols like WEP do it) and changing keys regularly (cf Forward Secrecy in TLS or OTR). The unencrypted message contains a (time-dependent) message ID and sequence number that are incremented, and the client won't accept replayed messages, or too old message IDs.

Edit 4: Someone found a flaw in the end to end secret chat. The key generated from the Diffie-Hellman exchange was combined with a server-provided nonce: key = (pow(g_a, b) mod dh_prime) xor nonce. With that, the server can perform a MITM on the connection and generate the same key for both peers by manipulating the nonce, thus defeating the key verification. Telegram has updated their protocol description and will fix the flaw. (That nonce was introduced to fix RNG issues on mobile devices).

Seriously, I have never seen anyone use the MAC to generate the encryption key. Even if I wanted to put a backdoor in a protocol, I would not make it so evident...

To sum it up: avoid at all costs. There are no new ideas, and they add their flawed homegrown mix of RSA, AES-IGE, plain SHA1 integrity verification, MAC-Then-Encrypt, and a custom KDF. Instead of Telegram, you should use well known and audited protocols, like OTR (usable in IRC, Jabber) or the Axolotl key ratcheting of TextSecure.

Handling IO failure

Let's talk a bit about IO programming. Filesystem, network, GUI... You cannot write useful code without doing IO these days. So why is it so damn hard to do in "safe" languages like Haskell?

Well, in Haskell, you isolate the unsafe parts to be able to reason safely on the rest of the code. What does "unsafe" mean in that context? Mostly, unsafe code is unpredictable, non deterministic and will fail for a number of reasons independent from your program's code.

Surely, you might think "come on, it cannot be that hard, I'm doing HTTP requests everywhere in my code, and there is no problem". Well, let's see how a simple HTTPS request could fail:

  • you are disconnected from the network (you have no IP)
  • you are connected to the network, but it is not connected to anything
  • your network is connected to Internet, but routers are dropping packets
  • your network is connected to Internet, but very slow
  • your DNS server is unreachable
  • your DNS server drops your packets
  • your DNS server cannot parse your request
  • your DNS server cannot contact other server to get your answer
  • your DNS server sends back an invalid response
  • your DNS server sends back an outdated response
  • you cannot reach the web server's IP from your network
  • the web server drops your packets silently before connecting
  • the web server connects, then drops the connection silently
  • the web server rejects your connection
  • the web server cannot parse your packets, and so, rejects them
  • the web server timeouts
  • the server's certificate is expired
  • the server's certificate is not for the right subject name
  • the server's certification chain has parts missing
  • the server's certification chain has an unknown root
  • the server's certificate was revoked
  • the packet's signatures are invalid
  • your user agent and the server do not support the same versions of TLS
  • your user agent and the server do not have common cipher suites
  • the web server closes the connection without warning
  • the web server timeouts
  • the web server crashes
  • the web server cannot parse your HTTP request and rejects it
  • your request is too large
  • the web server parses your HTTP request correctly, but your cookie or OAuth token is invalid
  • the data you requested does not exist
  • the data you requested is elsewhere
  • your user agent does not support the mime type of the data
  • the data requested is too large for a simple response
  • the server only sends a part of the data, then drops the connection
  • your user agent cannot parse the response
  • your user agent can parse the data, but some way or another, it is invalid

If you have worked for some time with networks, all of those have probably happened to you at some point (and the list is not nearly exhaustive). What did you do in your code? Did you handle all these exceptions? Did you catch all the exceptions (see what I did there)? Do you check for all the error codes? Do you retry the requests where you need to?

Let's face it: most of the network handling code out there is made of big chunks of procedural code, without much error handling, in blocking mode. In most cases, it is ok. But that is sloppy programming.

Safe languages do not allow you to write sloppy code like that. So, we are stuck between correct but overly complex code, and simple but failing code. Choose your weapons.

Personally, I prefer isolating unsafe code in asynchronous systems like futures or actors. I know failure will happen, I know threads will crash, I know I will make errors in my code. That is ok, it happens. So, let's write robust code to handle failure.

For network errors, I just want to know if the server is unreachable. It is ok, I will try later. If my request's authentication is rejected, I want to know, and must handle that failure. Some errors should be handled seriously, others must be put in the "ok, it failed, whatever" bin.

Even if languages like Haskell make it harder to perform IO safely, they are still good tools, because they let you isolate unsafe parts, to let you reason on safe, deterministic parts of the program.

P.S.: ok, the network case was maybe a bit too much. Surely, filesystem usage will be easier? Just for the fun, let's list some possible failures when you want to open a file for reading and writing:

  • invalid path
  • correct path, but you do not have the permission
  • correct path, you have the permission, but the file does not exists
  • you do not have the permission to create the file
  • you check that the file does not exists, then you try to create it, but someone already created it in the meantime (fun security bug, that one)
  • the file exists, but someone is already writing on it, no concurrent access
  • you have the handle you want on the file, but someone just deleted it
  • not enough file descriptors available (oh, please, no)
  • someone is writing to the file at the same time
  • there are so many page faults that your program is slowed down
  • the disk is slow, blocking on a large operation
  • the disk is full
  • you checked that you have enough room, but someone is filling the disk at the same time
  • the file is on a networked file system, and it is slow
  • the file is on a remote disk, and the network just failed
  • hardware failure in the disk
  • hardware failure in the RAID array (and for some reason, redundancy was not enough, you lost the data)
  • the file is on a USB card that someone just unplugged

Basically, IO is a nightmare. Please wake me up now.

SafeChat, P2P encrypted messages?

For the first article in the new post serie about "let's pick apart the new kickstarted secure decentralized software of the week", I chose SafeChat, which started just two days ago. Yes, I like to hunt young preys :p

A note, before we begin: this analysis is based on publicly available information at the time of writing. If the authors of the project give more information, I can update the article to match it. The goal is to assert, with what little we know about the project, if it is a good idea to give money to this project. I will only concentrate on the technical parts, not on the team itself (even if, for some of those projects, I think they're idiots running with scissors in hand).

What is SafeChat?

Open source encryption based instant messaging software

SafeChat is a brilliantly simple deeply secure instant messaging system for mobile phones and computers

SafeChat is an instant messaging software designed by Commercial Free. There is no real indication about who really works there, and where the company is based, except for David Crawford, who created the Kickstarter project and is based in Montreal in Canada.

Note that SafeChat is only a small part of the services they want to provide. Commercial Free will also have plans including an email encryption service (no info about that one) and cloud storage.

Available technical information

There is not much to see. They say they are almost done with the core code, but the only thing they present is some videos of what the interaction with the app could be.

Apparently, it is an instant messaging application with Android and iOS applications and some server components.  Session keys are generated for the communication between users. They will manage the server component, and the service will be available with a yearly subscription.

It seems they don't want to release much information about the cryptographic components they use. They talk about "peer to peer encryption" (lol) which is open source and standard. If anyone understands what algorithm or protocol they refer to, please enlighten me. They also say they will mix in some proprietary code (so much for open source).

I especially like the part about NIST. They mock NIST, telling that they have thrown "all standard encryption commonly used today out the window". I am still wondering what "open source and standard peer to peer encryption" means.

Network protocol

The iOS and Android applications will apparently provide direct communication between users. I guess that from their emphasis on P2P, but also from the price they claim: $10 per user per year would be a bit small to pay for server costs if they had to route all the messages.

P2P communication between phones is technically feasible. They would probably need to implement some TCP hole punching in their solution, but it is doable.

Looking athe the video, it seems there is a key agreement before communication. I do not really like the interaction they chose to represent key agreement (with the colors and the smileys). There are too many different states, while  people only need to know "are we safe now?"

I am not sure if there is a presence protocol. The video does not really show it. If there is no presence system, are messages stored until the person is online? Stored on the server or on the client? Does the server notify the client when the person becomes available?

Cryptography

By bringing together existing theories of cryptography and some proprietary code to bind them together, we are making a deeply encrypted private chatting system that continues to evolve as the field of cryptography does.

Yup, I really feel safe now.

Joke aside, here is what we can guess:

  • session keys for the communication between users. I don't know if it is a Diffie-Hellman based protocol
  • no rekeying, ie no perfect forward secrecy
  • no info on message authentication or integrity verification
  • I am not sure if the app generates some asymmetric keys for authentication, if there is trust on first use, or whatever else
  • the server might not be very safe, because they really, really want to rely on German laws to protect it. If the crypto was fully managed client side, they would not care about servers taken down, they could just pop another somewhere.

There could be a PKI managed by Commercial Free. That would be consistent with the subscription model (short lived certificates is an easy way of limiting the usage of a service).

Threat model

Now, we can draw the rough threat model they are using:

What we want to do is make it impractical for an organization to snoop your communications as it would become very hard to find them and then harder still to decrypt them.

Pro tip: a system with a central server does not make it hard to find communications.

Attacker types:

  • phone thief: I don't think they use client side encryption for credentials and logs. Phone thiefs and forensics engineers won't have a real problem there
  • network operator: they can disrupt the communication, but will probably not be able to decrypt or do MITM (I really think the server is managing the authentication part, along with setting up the communication)
  • law enforcement: they want to rely on German laws to protect their system. At the same time, they do not say they will move out to Germany to operate the system. If they stay in Canada, that changes the legal part. If they use a certificate authority, protecting the server will be useless, because they can just ask the key at the company.
  • server attacker: the server will probably be Windows based (see the core developer's skills). Since that design is really server centric, taking down the server might take down the whole service. And attacking it will reveal lots of interesting metadata, and probably offer MITM capabilities
  • nation state: please, stop joking...

So...

Really, nothing interesting here. I do not see any reason to give money to this project: there is nothing new, it does not solve big problems like anonymous messaging, or staying reliable if one server is down. Worse, it is probably possible to perform a MITM attack if you manage the server. Nowadays, if you create a cryptographic protocol with client side encryption, you must make sure that your security is based on the client, not the server.

Alternatives to this service:

  • Apple iMessage: closed source, only for iOS, encrypted message, MITM is permitted for Apple by the protocol, but "we have not architected the server for this". Already available.
  • Text Secure by OpenWhisperSystems: open source, available for iOS and Android, uses SMS as a transport protocol, uses OTR (Off the Record protocol) to protect the communication, no server component. Choose Text Secure! It is really easy to use, and OTR is well integrated in the interface.

The rules of security by obscurity

The first rule of security by obscurity is: DON'T

There, I said it. Now you can stop reading. Or you can continue. But watch where you step.

Security by obscurity is generally frowned upon, because people have relied on it as their only layer of defense. When you think that nobody will be able to reverse engineer your clever code, or find that specific file holding all the secrets, you have already lost. People got quite efficient at reverse engineering and finding secrets.

That's why it is recommended to rely on safer algorithms and techniques to protect your system. They have been tested, and were created especially for that purpose. That can be encryption algorithms, authorization systems, etc.

The second rule of security by obscurity is: you should not need it

Defense is hard because of information asymmetry: the attacker often has more information than you to approach the system. That could be 0-day vulnerabilities, or the knowledge that you misconfigured one of your servers. The attacker has basically more time and more money than what you can spend on security.

The tools you have at hand? Patching the code, controlling authorizations, verifying your logs... You get a system that should not be easy to attack even if the attacker is familiar with the underlying software.

Except that the attacker might be well informed on the problems of deploying that particular CMS. Or there's a specific vulnerability you don't know about that is exploited automatically by botnets...

The third rule of security by obscurity is: DON'T, but...

The goal of your defense layers is to drive up the cost of an attack. A really motivated attacker will always find a way (with enough ressources, they could break in your office to steal data directly). So, what you want is to make an attack costly, to drive off attackers with less ressources.

And what will security through obscurity provide you? Time! They will provide you with time to defend, and waste the attacker's time. It is in no way enough to protect you, but it can give you a lot of benefits:

  • confuse automated tools: moving some specific files or pages of your CMS out of their default location will prevent most automated attacks. That will not stop human attackers, but they might need to modify their scripts, and we all know that's annoying :p
  • slow down information gathering: removing the server headers is a pretty standard practice.  Some more sophisticated tools might be able to guess server version and/or framework type in other ways, but basic tools will not.
  • lie to the attackers to send them through a honeypot. Then, you can observe them, learn about their process and prepare for other attacks.
  • detect suspect behaviour, like bruteforce queries, and send bogus data instead of rejecting them. That means more data to analyze manually for the adversary.
  • Are you trying to send encrypted data? Do you really need a public handshake protocol, displaying the whole algorithm negotiation? Sometimes, communicating directly with a pre shared key and pre negotiated algorithms will work just fine, and only show garbage to the attacker.

See the pattern there? We already know ressourceful attackers will get past those false defenses. But the point is to make them waste time in basically three ways:

  • forbid automated attacks, they should ressort to manual ways
  • make them work to get useful information
  • mess with their heads

Security by obscurity is basically the fun side of defense, because you're always looking for ways to annoy the adversary ;)

The fourth rule of security by obscurity is: you are not the attacker

While those defense techniques can be useful, be aware that they can significantly hamper your day to day work. They should not prevent you from managing your system correctly. Every "WTF" moment for an attacker might be a "WTF" moment for one of your developers, system administrators, or even worse, one of your users.

Worse, sometimes, it might annoy you, but will be bypassed easily by the adversary (do not underestimate their ability to write clever automated tools).

So, be careful with security by obscurity, do not rely only on it, and have fun annoying the attackers :)

Theoretical definitions for crypto wannabes

Every week, I hear about a new secure software designed to protect your privacy, thwart the NSA/GCHQ and save kittens. Most of the time, though, they're started by people that are very enthusiastic yet unskilled.

They tend to concentrate directly on choosing algorithms and writing code, instead of stepping back and thinking a bit about what they want to develop.

Sure, they probably spent some time saying things like:

  • that piece of data should absolutely be encrypted
  • users will all have key pairs to authenticate themselves
  • we should use AES, that's the safest choice (what are theses "modes" you're talking about?)

That is not how you design a protocol. That is not how you design a software using encryption. And that is not how you will design the next secure distributed social network.

To design your system, you need three things:

  • a good threat model
  • theoretical tools addressing the threats
  • algorithms implementing these theoretical tools

As you see, most of the projects only have the third item, and that's insufficient to design a correct system. If you don't have a good threat model, you don't have a good mental model of your users and attackers, their means and their objectives. If you don't have the theoretical tools, you will try to shoehorn your favorite algorithm on the problem without knowing if it really fits (example: using hash algorithms to store passwords :p).

So, in this post, I'll provide those (simplified) theoretical definitions. You will probably recognize some of them.

High level view

First, you need to forget notions like "privacy", use any of these terms to describe the properties you want to achieve:

  • authentication: you can recognize which entity you are communicating with
  • authorization: the entity cannot get access to data it has no permission on (note that it is different from authentication)
  • confidentiality: the data should not be readable by an entity that has no permission on it (it can be protected by crypto, but also by policies in the code)
  • integrity: unauthorized modification of the data can be detected and marked as invalid
  • non repudiation: an entity cannot deny it has executed an action
  • deniability: an entity _can_ deny it has executed an action

Ok, now that we have some basic properties, let's apply them: think for a long time about the actors of the system (users, malicious users, admins, sysadmins, random attacker on the network, etc), what authorizations they have, what they should not get access to, what data moves on the network and between whom.

You should now have a very basic threat model and a rough overview of your system or protocol: you know what part of the network communications should be confidential, you know where you would need to authenticate.

You will now need some ideas about the type of attacks that could happen to your system, because you probably did not think of everything. Separate your systems in logical parts (like "client", and "server", etc), observe them, and observe how they communicate.

Common attacks

Security properties

Here are some security properties that will be useful when you will try to choose algorithms later:

  • Random oracle: a system that answers deterministically every question with a random answer from its answer space. You cannot predict what it will answer, but if you send the same question twice, it will answer the same both times.
  • Perfect secrecy: for any two plaintext messages of same size, an attacker cannot distinguish which plaintext maps to which ciphertext. Basically, the adversary learns nothing from the ciphertext only.
  • Semantic security: same thing as perfect secrecy, except what the adversary learns on the plaintext is negligible. example: One Time Pad

Security tests

those properties can be tested by creating a "game", where the attacker tries to guess information on the data:

  • IND-CPA (indistinguishability under the chosen plaintext attack) test: the adversary can generate as many ciphertexts as he wants. Then, he chooses two messages m0 and m1 of same length, those messages are encrypted, and one of the ciphertexts is sent back to him. The adversary should not be able to guess which message was used to generate that ciphertext (note that this is just one way of testing for CPA, there are many other schemes, some with stronger properties)
  • IND-CCA (indistinguishability under the chosen ciphertext attack): the attacker can get the decryption of arbitrary ciphertexts, but should not, from this, be able to decrypt any other ciphertext.

Attack patterns

Here are some common attack types that can be applied to crypto protocols. The list is not exhaustive, and covers only crypto attacks: there are many more ways to attack a system.

  • Replay attacks: the attacker has observed some valid (encrypted or not) data going on the wire, and tries to send it again. Obviously, it should not be accepted
  • MITM (Man In The Middle): the attacker can observe and modify live data running between two actors of the system. In the worst case, the attacker should not be able to forge valid data, decrypt data, or impersonate one of the users. In the best case, it should be detectable.
  • Oracle attack: when an algorithm or protocol has a part that can act as an oracle (can be asked something and give answer, like a server), an attacker could exploit flaws in the algorithm to get useful information on the data (then the oracle is not a random oracle). Timing attacks are part of this type of attack. See also padding oracle attacks, or the recent BREACH attack on TLS.
  • Offline attack: the attacker got access to some encrypted data (on the wire, or by accessing a disk somewhere), stored it, and tries to decrypt it for an amount of time

You should now have a better view of the system: what are the parts of the system that need protection, what attacks they must resist, and what properties they should have.

That means we can go to the next part: choosing the tools to implement the solution.

General cryptographic functions

No, we will not choose algorithms right now. That would be too easy :D

We will choose from a list of cryptographic constructions that implement some of the security properties of the system, and combine them to meet all the needed properties:

  • Secure Pseudo Random Function: function from spaces K (keys) and X (message) to Y (other message space). The basic definition is that if you choose randomly one of these functions (like choosing randomly a k from K), its output will appear totally random (testable with IND-CPA).
  • Pseudo Random Permutation: this is a PRF where X and Y are the same space. It is bijective (every y from Y maps to exactly one x from X, and every y is an output of the function), and there exists an efficient inversion function for it (from Y to X). Example: AES (in ECB, which is unsafe for common use)
  • Message Authentication Code: defines a pair of algorithms. One of them takes a key k and a message m and outputs a code c. The other algorithm takes k, m, c and outputs True or False. An attacker should not be able to forge a valid c without knowing k. A PRF could be used to construct a MAC system. A hash function too. Example: HMAC
  • Authenticated encryption: an encryption function (with semantic security under the CPA) where the attacker cannot forge new ciphertexts that decrypt correctly. Example: AES-GCM.
  • Hash function: collision resistance (cannot find different messages m1, m2 so that H(m1) == H(m2), with different collision levels). Not easily invertible. Usually, they are fast. Example: SHA2.
  • Trapdoor function: a function that is easy to compute, for which finding its inverse is hard, unless you have specific information. (secure under IND-CCA). examples: RSA, DSA.
  • Zero knowledge proof: a way to prove something to the other party in a communication, without giving her any info, except the proof.

There are a lot of other constructions, depending on your needs, from low level algorithms like the Diffie Hellman key exchange to higher level protocols like OTR. Again, the constructions will depend on the security properties.

Choosing the algorithms

Can we do it now? YES! But there are rules. You must not choose an algorithm because it's hype or because someone said so in an old book. Basically, you choose algorithms that implement the properties you need (like authenticated encryption), and you choose the parameters of the algorithm (key size, exponent, elliptic curve) depending on the strength you need. Basically, a key size can define how much time encrypted data should remain impossible to decrypt. Those parameters also define the performance of the algorithm. Don't choose them without consulting experts, or you will face problems similar to those encountered by the projects that used low RSA exponents (it looks good from a performance standpoint, but it introduces very bad security).

Am I done now?

Nope. We have only define some very high level parts. Creating a protocol implies a lot of thoughts on:

  • how you establish the communications
  • protocol negotiation (version, algorithms, etc)
  • key exchange
  • authentication
  • nonce usage
  • storing sessions
  • handling lost connections
  • renegotiation
  • closing the connection
  • etc.

As you can see, designing a protocol involves a lot more than choosing a few algorithms. Note that this was only a very rough overview of what you would need to create a safe system. And we did not even start coding!

So, if you want to build the next privacy protecting system, please talk to experts. They don't necessarily want to make you feel bad. They just have a lot of formal tools and the experience needed to see what will not work.