Why you should, actually, rewrite it in Rust

You might have seen those obnoxious "you should rewrite it in Rust comments" here and there:

in a blogpost
in a Github issue
you might have heard about that Rust Evangelism Strike Force
it even has a joke twitter account

It's like at every new memory vulnerability in well known software, there’s that one person saying Rust would have avoided the issue. We get it, it’s annoying, and it does not help us grow Rust. This attitude is generally frowned upon in the Rust community. You can't just show up into someone’s project telling them to rewrite everything.

so, why am I writing this? Why would I try to convince you, now, that you should actually rewrite your software in Rust?

That's because I have been working on this subject for a long time now:

I did multiple talks on it
I even co-wrote a paper
I did it both as client and personal work

So, I'm commited to this, and yes, I believe you should rewrite some code in Rust. But there's a right way to do it.

Why rewrite stuff?

Our software systems are built on sand. We got pretty good at maintaining and fixing them over the years, but the cracks are showing. We still have not fixed definitely most of the low level vulnerabilities: stack buffer overflow (yes, those still exist), heap overflow, use after free, double free, off by one; the list goes on. We have some tools, like DEP, ASLR, stack canaries, control flow integrity, fuzzing. Large projects with funding, like Chrome, can resort to sandboxing parts of their application. The rest of us can still run those applications inside a virtual machine. This situation will not improve. There's a huge amount of old (think 90s), bad quality, barely maintained code that we reuse everywhere endlessly. The good thing with hardware is that at some point, it gets replaced. Software just gets copied again. Worse, with the development of IoT, a lot of the code that ships will never be updated. It's likely that some of those old libraries will still be there 15, 20 years from now.

Let's not shy away from the issue here. Most of those are written in C or C++ (and usually an old version). It is well known that it is hard to write correct, reliable software in those languages. Think of all the security related things you have to keep track of in a C codebase:

pointer arithmetic
allocations and deallocations
data is mutable by default
functions return integers to mean pointers and error codes. Errors can be implicitely ignored
type casts, overflows and underflows are hard to track
buffer bounds in indexing and copying
all the undefined behaviours

Of course, some developers can do this work. Of course, there are sanitizers. But it's an enormous effort to perform everyday for every project.

Those languages are well suited for low level programming, but require extreme care and expertise to avoid most of those issues. And even then, we assume the developers will always be well rested, focused and careful. We're only humans, after all. Note that in 2017, there are still people claiming that a C developer with sufficient expertise would avoid all those issues. It's time we put this idea to rest. Yes, some projects can avoid a lot of vulnerabilities, with a team of good developers, frequent code reviews, a restricted set of features, funding, tools, etc. Most projects cannot. And as I said earlier, a lot of the code is not even maintained.

So we have to do something. We must make our software foundations stronger. That means fixing operating systems, drivers, libraries, command line tools, servers, everything. We might not be able to fix most of it today, or the next year, but maybe 10 years from now the situation will have improved.

Unfortunately, we cannot rewrite everything. If you ever attempted to rewrite a project from scratch, you'd know that while you can avoid some of the mistakes you made before, you will probably introduce a lot of regressions and new bugs. It's also wrong on the human side: if there are maintainers for the projects, they would need to work on the new and old one at the same time. Worse, you would have to teach them the new approach, the new language (which they might not like), and plan for an upgrade to the new project for all users.

This is not doable, and this is the part most people asking for project rewrites in Rust do not understand. What I'm advocating for is much simpler: surgically replace weaker parts but keep most of the project intact.

How

Most of the issues will happen around IO and input data handling, so it makes sense to focus on it. It happens there because that's where the code manipulates buffers, parsers, and uses a lot of pointer calculations. It is also the least interesting part for software maintainers, since it is usually not where you add useful features, business logic, etc. And this logic is usually working well, so you do not want to replace it. If we could rewrite a small part of an application or library without disrupting the rest of the code, we would get most of the benefits without the issues of a full rewrite. It is the exact same project, with the same interface, same distribution packaging as before, same developer team. We would just make an annoying part of the software stronger and more maintainable.

This is where Rust comes in. It is focused on providing memory safety, thread safety while keeping the code performant and the developer productive. As such, it is generally easier to get safe, reliable code in production while writing basic Rust, than a competent, well rested C developer using all the tools available could do.

Most of the other safe languages have strong requirements, like a runtime and a garbage collector. And usually, they expect to be the host application (how many languages assume they will handle the process's entry point?). Here, we are guests in someone else's house. We must integrate nicely and quietly.

Rust is a strong candidate for this because:

it can easily call C code
it can easily be called by C code (it can export C compatible functions and structures)
it does not need a garbage collector
if you want, it does not even need to handle allocations
the Rust compiler can produce static and dynamic libraries, and even object files
the Rust compiler avoids most of the memory vulnerabilities you get in C (yes, I had to mention it)

So you can actually take a piece of C code inside an existing project, import the C structures and functions to access them from Rust, rewrite the code in Rust, export the functions and structures from Rust, compile it and link it with the rest of the project.

If you don't believe it's possible, take a look at these two examples:

Rusticata integrates Rust parsers written with nom in Suricata, an intrusion detection system
a VLC media player plugin to parse FLV files, written entirely in Rust

You get a lot of benefits from this approach. First, Rust has great package management with Cargo and crates.io. That means you can separate some of the work in different libraries. See as an example the list of parsers from the Rusticata project. You can test them independently, and even reuse them in other projects. The FLV parser I wrote for VLC can also work in a Rust GStreamer plugin You can also make a separate library for the glue with the host application. I'm working on vlc_module exactly for that purpose: making Rust VLC plugins easier to write.

This approach works well for applications with a plugin oriented architecture, but you can also rewrite core parts of an application or library. The biggest issue is high coupling of C code, but it is usually easy to rewrite bit by bit by keeping a common interface. Whenever you have rewritten some coupled parts of of a project, you can take time to refactor it in a more Rusty way, and leverage the type system to help you. A good example of this is the rewrite of the Zopfli library from C to Rust.

This brings us to another important part of that infrastructure rewrite work: while we can rewrite part of an existing project without being too intrusive, we can also rewrite a library entirely, keeping exactly the same C API. You can have a Rust library, dynamic or static, with the exact same C header, that you could import in a project to replace the C one. This is a huge result. It's like replacing a load-bearing wall in an existing building. This is not an easy thing to realize, but once it's done, you can improve a lot of projects at once, provided your distribution's package manager supports that replacement, or other projects take the time to upgrade.

This is a lot of work, but every time we advance a little, everybody can benefit from it, and it will add up over the years. So we might as well start now.

Currently, I'm focused on VLC. This is a good target because it's a popular application that's often part of the basic stack of any computer (browser, office suite, media player). So it's a big target. But take a look at the list of dependencies in most web applications, or the dependency graph of common distributions. There is a lot of low hanging fruit there.

Now, how would you actually perform those rewrites? You can check out the next post and the paper explaining how we did it in Rusticata and VLC.

security vulnerabilities

Why rewrite stuff?

How

Related Posts

nom 5 is here 17 Jun 2019

FOSS is free as in toilet 27 Nov 2018

No, pest is not faster than nom 04 Oct 2018