How to rewrite your project in Rust

In a previous post, I explained why rewriting existing software in Rust could be a good idea. The main point being that you should not rewrite the whole application, but replace the weaker parts without disturbing most of the code, to strengthen the codebase without disruption.

I also provided pointers to projects where other people and I did it succesfully, but without giving too many details. So let’s get a real introduction to Rust rewrites now. This article requires a little bit of knowledge about Rust, but you should be able to follow it even as a

As a reminder, here are the benefits Rust bring into a rewrite:

  • it can easily call C code
  • it can easily be called by C code (it can export C compatible functions and structures)
  • it does not need a garbage collector
  • if you want, it does not even need to handle allocations
  • the Rust compiler can produce static and dynamic libraries, and even object files
  • the Rust compiler avoids most of the memory vulnerabilities you get in C (yes, I had to mention it)
  • Rust is easier to maintain than C (this is discutable, but not the point of this article)

As it turns out, this is more or less the plan to replace C code with Rust:

  • import C structures and functions in Rust
  • import Rust structures and functions from C
  • reuse the host application’s memory allocations whenever possible
  • write code (yes, we have to do it at some point)
  • produce artefacts that can be linked with the host application
  • integrate with the build system

We’ll see how to apply this with examples from the Rust VLC plugin.

Import C structures and functions in Rust

Rust can easily use C code directly, by writing functions and structures definitions. A lot of the techniques you would use for this come from the “unsafe Rust” chapter of “The Rust Programming Language” book. For the following C code:

struct vlc_object_t {
    const char   *object_type;
    char         *header;
    int           flags;
    bool          force;
    libvlc_int_t *libvlc;
    vlc_object_t *parent;

You would get the following Rust structure:

extern crate libc;
use libc::c_char;

pub struct vlc_object_t {
  pub psz_object_type: *const c_char,
  pub psz_header:      *mut c_char,
  pub i_flags:         c_int,
  pub b_force:         bool,
  pub p_libvlc:        *mut libvlc_int_t,
  pub p_parent:        *mut vlc_object_t,

the #[repr(C)] tag indicates to the compiler that the structure should have a memory layout similar to the one generated by a C
compiler. We import types from the libc crate, like c_char. Those types are platform dependent (with their different form already handled in libc). Here, we use a lot of raw pointers (indicated by *), which means by using this structure directly, we’re basically writing C, which is no good! A good approach, as we’ll see later, is to write safer wrappers above those C bindings.

Importing C functions is quite straightforward too:

ssize_t  vlc_stream_Peek(stream_t *, const uint8_t **, size_t);
ssize_t  vlc_stream_Read(stream_t *, void *buf, size_t len);
uint64_t vlc_stream_Tell(const stream_t *);

These C function declarations would get translated to:

#[link(name = "vlccore")]
extern {
  pub fn vlc_stream_Peek(stream: *mut stream_t, buf: *mut *const uint8_t, size: size_t) -> ssize_t;
  pub fn vlc_stream_Read(stream: *mut stream_t, buf: *const c_void, size: size_t) -> ssize_t;
  pub fn vlc_stream_Tell(stream: *const stream_t) -> uint64_t;

The #[link(name = "vlccore")] tag indicates to which library we are linking. It is equivalent to passing a -lvlccore argument to the linker. Libvlccore is a library all VLC plugins must link to. Those functions are declared like regular Rust functions, but like the previous structures, will mainly work on raw pointers.


You can always write all your bindings manually like this, but when the amount of code to import is a bit large, it can be a good idea to employ the awesome bindgen tool, that will generate Rust code from C headers.

It can work as a command line tool, but can also work at compile time from a build script. First, add the dependency to your Cargo.toml file:

version = "^0.25"

You can then write your build script like this:

extern crate bindgen;
use std::fs::File;
use std::io::Write;
use std::path::Path;

fn main() {
  let include_arg = concat!("-I", env!("INCLUDE_DIR"));
  let vlc_common_path = concat!(env!("INCLUDE_DIR"), "/vlc_common.h");

  let _ = bindgen::builder()
    .header(concat!(env!("INCLUDE_DIR"), "/vlc_block.h"))
    .raw_line("use ffi::common::vlc_object_t;")

So there’s a lot to unpack here, because bindgen is very flexible:

  • we use clang_arg to pass the include folder path and pre include a header everywhere (vlc_common.h is included pretty puch everywhere in VLC)
  • the header method specifies the header from which we will import definitions
  • hide_type prevents redefinition of elements we already defined (liek the ones from the common header)
  • whitelisted_type and whitelisted_function specify types and functions for which bindgen will create definitions
  • raw_line writes its argument at the top of the file. I apply it to reuse definitions from other files
  • write_to_file writes the whole definition to the specified path

You can apply that process to any C header you must import. With the build script, it can run every time the library is compiled, but be careful, generating a lot of headers can take some time. It might be a good idea to pregenerate them and commit the generated files, and update them from time to time.

It is usually a good idea to separate the imported definitions in another crate with the -sys suffix, and write the safe code in the main crate.
As an example, see the crates openssl and openssl-sys.

Writing safe wrappers

Previously, we imported the C function ssize_t vlc_stream_Read(stream_t *, void *buf, size_t len) as the Rust version pub fn vlc_stream_Read(stream: *mut stream_t, buf: *const c_void, size: size_t) -> ssize_t but kept an unsafe interface. Since we want to use those functions safely, we can now make a better wrapper:

use ffi;

pub fn stream_Read(stream: *mut stream_t, buf: &mut [u8]) -> ssize_t {
  unsafe {
    ffi::vlc_stream_Read(stream, buf.as_mut_ptr() as *mut c_void, buf.len())

Here we replaced the raw pointer to memory and the length with a mutable slice. We still use a raw pointer to the stream_t instance, maybe we can do better:

use ffi;

pub struct Stream(*mut stream_t);

pub fn stream_Read(stream: Stream, buf: &mut [u8]) -> ssize_t {
  unsafe {
    ffi::vlc_stream_Read(stream.0, buf.as_mut_ptr() as *mut c_void, buf.len())

Be careful if you plan to implement Drop for this type: is the Rust code supposed to free that object? Is there some reference counting involved? Here is an example of Drop implementation from the openssl crate:

pub struct SslContextBuilder(*mut ffi::SSL_CTX);

impl Drop for SslContextBuilder {
    fn drop(&mut self) {
        unsafe { ffi::SSL_CTX_free(self.as_ptr()) }

Remember that it’s likely the host application has a lot of infrastructure to keep track of memory, and as a rule, we should reuse the tools it offers for the code at the interface between Rust and C. See the Rust FFI omnibus for more examples of safe wrappers you can write.

Side note: as of now (2017/07/10) custom allocators are still not stable

Exporting Rust code to be called from C

Since the host application is written in C, it might need to call your code. This is quite easy in Rust: you need to write unsafe wrappers.

Here we will use as example the inverted index library for mobile apps I wrote for a conference. In this library, we have an Index type that we want to use from Java. Here is its definition:

pub struct Index {
  pub index: HashMap<String, HashSet<i32>>,

This type has a few method we want to provide:

impl Index {
  pub fn new() -> Index {
    Index {
      index: HashMap::new(),

  pub fn insert(&mut self, id: i32, data: &str) {

  pub fn search_word(&self, word: &str) -> Option<&HashSet<i32>> {

  pub fn search(&self, text: &str) -> HashSet<i32> {

First, we need to write the functions to allocate and deallocate our index. Every use from C will be wrapped in a Box.

pub extern "C" fn index_create() -> *mut Index {

The Box type indicates and owns a heap allocation. When the box is dropped, the underlying data is dropped as well and the memory is freed. The following function takes ownership of its argument, so it is dropped at the end.

pub extern "C" fn index_free(ptr: *mut Index) {
    let _ = unsafe { Box::from_raw(ptr) };

Now that allocation is handled, we can work on a real method. The following method takes an index, and id for a text, and the text itself, as a C string (ie, terminated by a null character).

Since we’re kinda writing C in Rust here, we have to first check if the pointers are null. Then we can transform the C string in a slice. Then we check if it is correctly encoded as UTF-8 before inserting it into our index.

pub extern "C" fn index_insert(index: *mut Index, id: i32, raw_text: *const c_char) {
  unsafe { if index.is_null() || raw_text.is_null() { return } };
  let slice = unsafe { CStr::from_ptr(raw_text).to_bytes() };
  if let Ok(text) = str::from_utf8(slice) {
    (*index).insert(id, text);

Most of the code for those kinds of wrappers is just there to transform between C and Rust types and checking that the arguments coming from C code are correct. Even if we have to trust the host application, we should program defensively at the boundary.

There are other methods we could implement for the index, we’ll leave those as exercise for the reader 🙂

Now, we need to write the C definitions to import those functions and types:

typedef struct Index Index;

Index* index_create();
void   index_free(Index* index);
void   index_insert(Index* index, int32_t id, char const* raw_text);

We defined Index as an opaque type here. Since Rust structures can be compatible with C structures, we could export the real type, but since it only contains a Rust specific type, HashMap, it is better to hide it completely and write accessors and wrappers.

Generating bindings with rusty-cheddar

Writing function imports from C to Rust is tedious, so we have bindgen for this. We also have a great tool to go the other way: rusty-cheddar.

In the same way, it can be used from a build script:

extern crate cheddar;

fn main() {
  cheddar::Cheddar::new().expect("could not read definitions")
  cheddar::Cheddar::new().expect("could not read definitions")
    .module("index").expect("malformed module path")
    .insert_code("#include \"main.h\"")

Here we run rusty-cheddar a first time without specifying the module: it will default to generate a header for the definitions in src/
The second run specifies a different module, and can insert a file inclusion at the top.

It can be a good idea to commit the generated headers, since you will see immediately if you changed the interface in a breaking way.

Integrating with the build system

As you might know, we can make dynamic libraries and executables with rustc and cargo. But often, the host application will have its own build system, and it might disagree with the way cargo builds its projects. So we have multiple strategies:

  • build Rust code separately, store libraries and headers in Maven or something (don’t laugh, I’ve worked with such a system once, and it was actually great)
  • try to let rustc build dynamic libraries from inside the build system. We tried that for VLC and it was not great at all
  • build a static library from inside or outside the build system, include it in the libraries at link. This was done in Rusticata
  • build an object file and let the build system link it. This is what we ended up doing with VLC

Building a static library is as easy as specifying crate-type = ["staticlib"] in your Cargo.toml file. To build an object file, use the command cargo rustc --release -- --emit obj. You can see how we added it to the autotools usage in VLC.

Unfortunately, for this part we still do not have automated ways to fix the issues. Maybe with some time, people will write scripts for autotools,
CMake and others to handle Rust and Cargo.

Side note on reproducible builds: if you want to fix the set of Rust dependencies used in your project and make them always available, you can use cargo-vendor to store them in a specific folder

As you might have guessed, this is the most complex part, for which I have no good generic answer. I’d recommend that you spend the most time on this during the project’s prototyping phase: import very little C code, export very little Rust code, try to make it build entirely from within the host application’s build system. Once this is done, extending the project will get much easier. You really don’t want to discover this task at the end of your project and try to retrofit your code in there.

Going further

While this article just explores the surface of Rust rewrites, I hope it provides a good starting point on the tools and techniques you can apply.
Any rewrite will be a large and complex project, but the result is worth the effort. The code you will write will be stronger, and Rust’s type system will force you to review the assumptions made in the C version. You might even find better ways to write it once you start refactoring your code in a more Rusty way, safely hidden behind your wrappers.


Why you should, actually, rewrite it in Rust

You might have seen those obnoxious “you should rewrite it in Rust comments” here and there:

It’s like at every new memory vulnerability in well known software, there’s that one person saying Rust would have avoided the issue. We get it, it’s annoying, and it does not help us grow Rust. This attitude is generally frowned upon in the Rust community. You can’t just show up into someone’s project telling them to rewrite everything.

so, why am I writing this? Why would I try to convince you, now, that you should actually rewrite your software in Rust?

That’s because I have been working on this subject for a long time now:

  • I did multiple talks on it
  • I even co-wrote a paper
  • I did it both as client and personal work

So, I’m commited to this, and yes, I believe you should rewrite some code in Rust. But there’s a right way to do it.

Why rewrite stuff?

Our software systems are built on sand. We got pretty good at maintaining and fixing them over the years, but the cracks are showing. We still have not fixed definitely most of the low level vulnerabilities: stack buffer overflow (yes, those still exist), heap overflow, use after free, double free, off by one; the list goes on. We have some tools, like DEP, ASLR, stack canaries, control flow integrity, fuzzing. Large projects with funding, like Chrome, can resort to sandboxing parts of their application. The rest of us can still run those applications inside a virtual machine. This situation will not improve. There’s a huge amount of old (think 90s), bad quality, barely maintained code that we reuse everywhere endlessly. The good thing with hardware is that at some point, it gets replaced. Software just gets copied again. Worse, with the development of IoT, a lot of the code that ships will never be updated. It’s likely that some of those old libraries will still be there 15, 20 years from now.

Let’s not shy away from the issue here. Most of those are written in C or C++ (and usually an old version). It is well known that it is hard to write correct, reliable software in those languages. Think of all the security related things you have to keep track of in a C codebase:

  • pointer arithmetic
  • allocations and deallocations
  • data is mutable by default
  • functions return integers to mean pointers and error codes. Errors can be implicitely ignored
  • type casts, overflows and underflows are hard to track
  • buffer bounds in indexing and copying
  • all the undefined behaviours

Of course, some developers can do this work. Of course, there are sanitizers. But it’s an enormous effort to perform everyday for every project.

Those languages are well suited for low level programming, but require extreme care and expertise to avoid most of those issues. And even then, we assume the developers will always be well rested, focused and careful. We’re only humans, after all. Note that in 2017, there are still people claiming that a C developer with sufficient expertise would avoid all those issues. It’s time we put this idea to rest. Yes, some projects can avoid a lot of vulnerabilities, with a team of good developers, frequent code reviews, a restricted set of features, funding, tools, etc. Most projects cannot. And as I said earlier, a lot of the code is not even maintained.

So we have to do something. We must make our software foundations stronger. That means fixing operating systems, drivers, libraries, command line tools, servers, everything. We might not be able to fix most of it today, or the next year, but maybe 10 years from now the situation will have improved.

Unfortunately, we cannot rewrite everything. If you ever attempted to rewrite a project from scratch, you’d know that while you can avoid some of the mistakes you made before, you will probably introduce a lot of regressions and new bugs. It’s also wrong on the human side: if there are maintainers for the projects, they would need to work on the new and old one at the same time. Worse, you would have to teach them the new approach, the new language (which they might not like), and plan for an upgrade to the new project for all users.

This is not doable, and this is the part most people asking for project rewrites in Rust do not understand. What I’m advocating for is much simpler: surgically replace weaker parts but keep most of the project intact.


Most of the issues will happen around IO and input data handling, so it makes sense to focus on it. It happens there because that’s where the code manipulates buffers, parsers, and uses a lot of pointer calculations. It is also the least interesting part for software maintainers, since it is usually not where you add useful features, business logic, etc. And this logic is usually working well, so you do not want to replace it. If we could rewrite a small part of an application or library without disrupting the rest of the code, we would get most of the benefits without the issues of a full rewrite. It is the exact same project, with the same interface, same distribution packaging as before, same developer team. We would just make an annoying part of the software stronger and more maintainable.

This is where Rust comes in. It is focused on providing memory safety, thread safety while keeping the code performant and the developer productive. As such, it is generally easier to get safe, reliable code in production while writing basic Rust, than a competent, well rested C developer using all the tools available could do.

Most of the other safe languages have strong requirements, like a runtime and a garbage collector. And usually, they expect to be the host application (how many languages assume they will handle the process’s entry point?). Here, we are guests in someone else’s house. We must integrate nicely and quietly.

Rust is a strong candidate for this because:

  • it can easily call C code
  • it can easily be called by C code (it can export C compatible functions and structures)
  • it does not need a garbage collector
  • if you want, it does not even need to handle allocations
  • the Rust compiler can produce static and dynamic libraries, and even object files
  • the Rust compiler avoids most of the memory vulnerabilities you get in C (yes, I had to mention it)

So you can actually take a piece of C code inside an existing project, import the C structures and functions to access them from Rust, rewrite the code in Rust, export the functions and structures from Rust, compile it and link it with the rest of the project.

If you don’t believe it’s possible, take a look at these two examples:

  • Rusticata integrates Rust parsers written with nom in Suricata, an intrusion detection system
  • a VLC media player plugin to parse FLV files, written entirely in Rust

You get a lot of benefits from this approach. First, Rust has great package management with Cargo and That means you can separate some of the work in different libraries. See as an example the list of parsers from the Rusticata project. You can test them independently, and even reuse them in other projects. The FLV parser I wrote for VLC can also work in a Rust GStreamer plugin You can also make a separate library for the glue with the host application. I’m working on vlc_module exactly for that purpose: making Rust VLC plugins easier to write.

This approach works well for applications with a plugin oriented architecture, but you can also rewrite core parts of an application or library. The biggest issue is high coupling of C code, but it is usually easy to rewrite bit by bit by keeping a common interface. Whenever you have rewritten some coupled parts of of a project, you can take time to refactor it in a more Rusty way, and leverage the type system to help you. A good example of this is the rewrite of the Zopfli library from C to Rust.

This brings us to another important part of that infrastructure rewrite work: while we can rewrite part of an existing project without being too intrusive, we can also rewrite a library entirely, keeping exactly the same C API. You can have a Rust library, dynamic or static, with the exact same C header, that you could import in a project to replace the C one. This is a huge result. It’s like replacing a load-bearing wall in an existing building. This is not an easy thing to realize, but once it’s done, you can improve a lot of projects at once, provided your distribution’s package manager supports that replacement, or other projects take the time to upgrade.

This is a lot of work, but every time we advance a little, everybody can benefit from it, and it will add up over the years. So we might as well start now.

Currently, I’m focused on VLC. This is a good target because it’s a popular application that’s often part of the basic stack of any computer (browser, office suite, media player). So it’s a big target. But take a look at the list of dependencies in most web applications, or the dependency graph of common distributions. There is a lot of low hanging fruit there.

Now, how would you actually perform those rewrites? You can check out the next post and the paper explaining how we did it in Rusticata and VLC.

PoC: using LLVM’s profile guided optimization in Rust

call graph

What does profile-guided optimization mean?

Some languages have a JIT (Just In Time) compiler available at runtime, that can optimize the executed code depending on current execution patterns. This is, in large part, the cause of the performance of Lua and the JVM. They can start a bit slow, but by accumulating information on actual running code, they make it faster and faster for the current load. PfLua is a great example: the firewall rules are optimized again and again, until the current network traffic is handled as quickly as possible.

When you use other languages, such as C, you usually cannot optimize the application once it is compiled. Except when you use an optimization technique known as Profile-Guided Optimization. From Wikipedia :

Profile-guided optimization (PGO, sometimes pronounced as pogo), also known as profile-directed feedback (PDF), is a compiler optimization technique in computer programming that uses profiling to improve program runtime performance.

It relies on profiling the compiled application, while it runs with the expected, real world load (web traffic, calculations, etc), and feed this profiling information to the compiler. On the next build, the compiler will have more information on which parts of the program are less used, which branches are taken more often, the expected values in a range, etc. Instead of guessing how the program would behave to choose optimizations, the compiler has true information, and can optimize more precisely. There’s one issue with the process: you need two compilations and a profiling run to generate the final executable. But it gets easier when you automate it, as we can see in the Firefox build process.


While it has been available in other systems for a long time (Visual Studio 2005, the Intel compiler ICC for Itanium), it appeared recently in LLVM.  It has since then been applied successfully to XCode (Objective C, Swift) and LDC, the D compiler.

LLVM has a great feature: it uses an Intermediate Representation code (IR), which is a kind of high level assembly language. It applies its optimizations and machine code generation to that representation. If you make a compiler for a new language, targeting the LLVM IR will give you these features (nearly) for free.

In practice, compiler frontends choose which features they use, so you may not access everything LLVM has to offer. In particular, the Rust compiler, as of now (April 2016), provides a llvm-args option, but that option filters what you can send to LLVM, so we cannot use PGO here.

PGO in Rust

Still, with rustc, you can generate directly the IR, or its binary encoding, named bitcode:

rustc –emit llvm-bc
# or, with cargo:
cargo rustc — –emit llvm-bc

The approach I tried here is to take that bitcode, and manually apply LLVM’s transformations until I get a compiled executable. This is not really usable for now, especially because I chose an example with very few dependencies. With more dependencies, the compilation and linking will get more complex and unmanageable manually.

LLVM comes with a few commands that you can use to build code manually. The first one is opt, and it applies optimizations and instrumentation on the bitcode file (here, the file target/release/pgo.bc):

opt-3.8 -O2 -pgo-instr-gen -instrprof target/release/pgo.bc -o pgo.bc

The new bitcode file contains code to profile the end application (mainly by counting how often we use each code path). We can now convert that bitcode file to an object file, and link it using clang:

llc-3.8 -O2 -filetype=obj pgo.bc
clang-3.8 -O2 -flto -fprofile-instr-generate pgo.o -L/usr/local/lib/rustlib/x86_64-apple-darwin/lib -lstd-ca1c970e -o pgo

Note: I built my own rustc from source, so your libstd file may not have the same hash. Since Rust (as of April 2016) uses LLVM 3.7, we can use LLVM 3.8’s PGO features, since the bitcode format is apparently backward compatible. I use OS X, and Homebrew’s LLVM 3.8 has compilation issues, so I needed to build the compiler runtime from source. It’s a proof of concept, not production code 😉

We will now run the program we just built, preferably with production data and traffic. It will automatically generate a default.profraw file, containing the profiling information. This file must be transformed to a format that opt will understand with llvm-profdata:

llvm-profdata-3.8 merge -output=pgo.profdata default.profraw

This .profdata file will now be used in the compilation steps:

opt-3.8 -O2 -pgo-instr-use -pgo-test-profile-file=pgo.profdata target/release/pgo.bc -o pgo-opt.bc
llc-3.8 -O2 -filetype=obj pgo-opt.bc
clang-3.8 -O2 -flto -fprofile-instr-use=pgo.profdata pgo-opt.o -L/usr/local/lib/rustlib/x86_64-apple-darwin/lib -lstd-ca1c970e -o pgo-opt

We now have an executable compiled using profiling information. Is it fast?

The benchmarks

The program I tested is a n-body simulation. It was a great test target since libstd is the only dependency, and the load factor depends on a number given as command line argument. Here is a test with time (I know it’s not the most precise benchmarking tool, but for a tenth of second precision, it works alright):

$ time ./target/release/pgo 1000000000

real    1m22.528s
user    1m22.214s
sys     0m0.173s

$ time ./pgo-opt 1000000000

real    1m9.810s
user    1m9.687s
sys     0m0.070s

As it turns out, we gain nearly 15% in running time on this program. Other examples could have less impact, but this is encouraging! So, what happened inside our program?

The generated code

I provide assembly dumps of the normal program, generated with cargo –release, and the one optimized with PGO. Mostly, the code has been reordered, probably to fit better in cache lines. You can also consult PDF files with call graphs: normal, PGO optimized.

The whole code for this article is available here if you want to reproduce the results or tinker with optimizations yourself.

This is a proof of concept, demonstrating that profile guided optimization could work in Rust. It is probably worthy of integration into rustc, but there’s a lot of work before it could be usable. Still, there’s a github issue where you can weigh in, if you would like this optimization in your applications.


Crypto problems you actually need to solve

To follow up on the small Twitter rant that got people to explain GPG and OTR to me for a whole day, I’ll explain the ideas behind this.

tweet rant screenshotThere is always a new project around self hosting email and PGP, or building a new encrypted chat app. Let’s say it aloud right now: those projects are not trying to improve the situation, they just want to build their own. Otherwise, they would help one of the 150 already existing projects. Worse, a lot of those try to build a business out of it, and end up making a system where you need to completely trust their business (as a side note, building a business is fine, but just be honest about your goals).

There are things that attract new projects, because the concept is easy to get, and this drives developers right into the “I’m smarter than everybody and can do better than existing projects” territory. But there’s a reason they fail.

Making an encrypted chat app is solving the same problem as making a chat app -moving people off their existing platform, onto a new one- and additionally writing a safe protocol. Building a whole new infrastructure is not an easy task.

Making email and PGP easier to use is not only a UX issue. There is a complete mental model of how it works, what are the failure modes, and what are the trust levels. And you add above that the issues of email’s metadata leaks, the lack of forward secrecy, the key management. Even with the simplest interface, it requires that the user builds that mental model, and that she understands other users may have different models. This is pervasive to the way PGP works. It has always been and will always be an expert’s tool, and as such unfit to be deployed massively. You cannot solve infrastructure problems by teaching people.

You cannot solve infrastructure problems by teaching people

There is a pattern here: people want to build tools that will be the basis of safe communications for the years to come. Those are infrastructure problems, like electricity, running water or Internet access. They cannot be solved by teaching people how to use a tool. The tool has to work. They cannot be solved either by decentralization. At some point, someone has to pay for the infrastructure’s maintenance. As much as I like the idea of decentralizing everything, this is not how people solve serious engineering problems. But the difference with other types of business is that infrastructure businesses are dumb pipes. They don’t care what’s running through them, and you can easily replace one with the other.

There is a strong need for “private by default”, authenticated, anonymous communication. This will only come if the basic building blocks are here:

  • multiparty Off-The-Record(or other protocols like Axolotl): forward secret communication currently only works in two-party communication. Adding more members and making the protocols safe against partitions is a real challenge
  • multidevice PGP (or alternate message encryption and authentication system): currently, to use PGP on multiple devices, either you synchronize all your private keys on all your devices, or you accept that some devices will not be able to decrypt messages. This will probably not work unless you have a live server somewhere, or at least some hardware device containing keys
  • redundant key storage: systems where a single key holds everything are very seducing, but they’re a nightmare for common operation. You will lose the key. And the backup. And the other backup. You will end up copying your master key everywhere, encrypted with a password used infrequently. Either it will be too simple and easily crackable, or too complex and you will forget it. How can a friend or family access the key in case of emergency?
  • private information retrieval: PIR systems are databases that you can query for data, and the database will not know (in some margins) which piece of data you wanted. You do not need to go wild on homomorphic encryption to build something useful
  • encrypted search: the tradeoffs in search on encrypted data are well known, but there are not enough implementations
  • usable cryptography primitives: there is some work in that space, but people still rely on OpenSSL as their goto crypto library. Crypto primitives should not even let you make mistakes

Those are worthwhile targets if you know a fair bit of cryptography and security, and know how to build a complete library or product.

But for products, we need better models for infrastructure needs. A like Tahoe-LAFS is a good model: the user is in control of the data, the provider is oblivious to the content, the user can be the client of multiple providers, and switching from one to another is easy. This does not make for shiny businesses. Infrastructure companies compete mainly on cost, scale, availability, speed, support, since they all provide roughly the same features.

Of course, infrastructure requires standards and interoperability, and this usually does not make good material for startup hype. But everyone wins from this in the end.

Frustrating communication

I’m getting less and less satisfied with Twitter to exchange thoughts. The 140 characters is not the obvious problem, since you can chain messages easily. The issue is that those thoughts are ephemeral. This medium does not optimize for smart discussion with relevant people, but for quick wit from currently available people, before being dumped under a stack of comments on the latest news. The retweeting does not help much, since the primary reason for retweeting are 1. it’s funny 2. it is shocking 3. it’s inspiring, and long last “maybe it’s interesting”. They don’t create much discussion.

Until now, I have primarily used this blog for long posts (thus explaining why I don’t write much here). As my friends say “if it’s more than 3 tweets, write a blog post”.

So in the following months, I’ll try to post short, not well researched but spontaneous articles, instead of ranting in 140 characters.

A world without certificate authorities

love locks
When networks began to expand and people saw the need for secure communication, they designed complex systems based on public key cryptography, that worked more or less. Problem: how do you trust that the key a server sent you is the right one? How can you make sure that it is not somebody else trying to impersonate that website?

Multiple solutions were proposed, and the most promising was a public directory of domain names and associated public keys, maintained by a peer to peer network named KeyCoin. It looked better than so called Web Of Trust solutions, because everybody could agree on what was the correct key for a given domain. As long as nobody hold 51% of the network, no change could happen without being validated by a lot of different peers. The network was maintained by 10000 enthusiast system administrators who took their task very seriously (after all, the security of the whole system depended on their honesty), and nobody had enough computing power to take over the network.

After a while, people began using the system, since it was directly integrated in their browsers, but they did not want to run a node on the network themselves. It was too bothersome, and they could trust the administrators. Also, they had to ask one of them to make a change everytime. The whole process was a bit artisanal.

In the meantime, some people demonstrated the 51% attack on networks of reduced size, and that worried people. They wanted a safe system, one that was not only relying on those sysadmins that could do anything. Who were they anyway? Running that system was still too complex for non technical too run it themselves anyway, so they did not worry enough. But some governments found that rewriting the truth of name/key matching was interesting. Maybe to catch pedophiles, terrorists, criminals. Or maybe to censor websites, I do not know, they told me it was for my own good.

Some smart person found a good solution: if controlling the whole system necessitated owning 51% of the system, the easiest way was to have a lot of machines, enough to counteract the sysadmins. That did not seem risky when people designed the system. Nobody could have enough computing power to take over the whole network, and there would be even more nodes every day.

Yet, that person got enough funding to install tens of thousands of machines and make them join the network. They even provided a nice enough interface for people and businesses to input their domain name and public key, as long as they paid some fees. The sysadmins welcomed him at first, since money coming in the system validated their ideas. Atfer a while, they started worrying, since none of them could keep up with the computing power, but that company asssured them it would never attain 51% of the network.

Other companies jumped on the bandwagon and started to profit from that new business opportunity. Governments started their own server farms to participate too. Problem: now that everybody (except the sysadmins) had a lot of computing power, nobody had enough to control the network entirely.

So they started making alliances. If a few major players work as a team, they can do whatever they want on the network. If one of them decided to try and replace a key on the ledger, others could help it. Of course, once they begun doing that, others wanted to participate. So they created a few rules to join their club. First, you needed to have enough machines. That was a good rule, because that made a big barrier to entry. You could not start as a small player. The other rules? You had to submit to an audit, performed by the other players. Yet another barrier to entry. And once they deemed you acceptable, you had to follow the requests of governments, which were arbitrarily refusing candidates.

Even with the big barriers to entry, a few hundred players came up, often backed by governments. Of course, all ended up in the same team, doing whatever they wanted, as long as nobody was complaining, because anytime one of them had something shady to do, all of them followed automatically.

Since building those big companies required money, they made their clients pay more and more, and to make it easier to accept, provided “premium” options where they show they trust you more, since they took the time to phone your company and ask a few questions.

Some found that big system too centralized, too obedient to states, and decided to fork it. There are separate public ledgers, but they do not come directly embedded in browsers, you need to integrate them yourself, and that’s bothersome. Also, most of those networks have a few hundred nodes at best.

From a nice, decentralized, home made system, we ended up with a centralized system controlled by corporations and governments.

Now let me tell you about that system I designed. It is based on a concept named certificate, a cryptographically signed file that links the public key to a domain name. Now here’s the catch: a certificate represents a key, and is signed by another key, which is represented by another certificate, and so on and so forth until a certificate that signs itself. That system is good, because you just have to embed the root certificate that your friends gives you, and you’ll be able to verify the key of his websites, even if those keys change. And this, without even asking the public ledger, so that is a truly decentralized and more anonymous system! Nothing could go wrong with that, right?