<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.8.5">Jekyll</generator><link href="http://unhandledexpression.com/feed.xml" rel="self" type="application/atom+xml" /><link href="http://unhandledexpression.com/" rel="alternate" type="text/html" /><updated>2019-06-24T21:41:19+02:00</updated><id>http://unhandledexpression.com/feed.xml</id><title type="html">Unhandled Expression</title><subtitle>Geoffroy Couprie – software security and architecture consultant</subtitle><author><name>Geoffroy Couprie</name></author><entry><title type="html">nom 5 is here</title><link href="http://unhandledexpression.com/general/2019/06/17/nom-5-is-here.html" rel="alternate" type="text/html" title="nom 5 is here" /><published>2019-06-17T11:00:00+02:00</published><updated>2019-06-17T11:00:00+02:00</updated><id>http://unhandledexpression.com/general/2019/06/17/nom-5-is-here</id><content type="html" xml:base="http://unhandledexpression.com/general/2019/06/17/nom-5-is-here.html">&lt;p&gt;&lt;a href=&quot;https://github.com/geal/nom&quot;&gt;&lt;em&gt;nom&lt;/em&gt;, the Rust parser combinators library&lt;/a&gt;,
is now available at version 5.
This is the most mature version of &lt;em&gt;nom&lt;/em&gt;. This is the one that feels “done”.
This is the parser library that I wanted when I started nom 5 years ago.
It’s here at last.&lt;/p&gt;

&lt;p&gt;&lt;img alt=&quot;hamster eating broccoli: om nom nom nom!&quot; src=&quot;/assets/omnomnom.gif&quot; style=&quot;display: block; margin: auto&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;nom 5&lt;/em&gt; is a complete rewrite of the internal architecture: it uses functions
instead of macros, keeps backward compatibility with existing macro-based
parsers, and makes the error type completely generic.&lt;/p&gt;

&lt;p&gt;As an example, here are some elements of a JSON parser:&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;delimited&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sc&quot;&gt;'\&quot;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;parse_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sc&quot;&gt;'\&quot;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;JsonValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;delimited&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sc&quot;&gt;'['&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;separated_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;nf&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sc&quot;&gt;','&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;json_value&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sc&quot;&gt;']'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;json_value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JsonValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;alt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hash&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;JsonValue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Object&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;JsonValue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;JsonValue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;Str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;from&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))),&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;double&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;JsonValue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;Num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;boolean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;JsonValue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Boolean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;))(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;why&quot;&gt;Why?&lt;/h2&gt;

&lt;p&gt;Why move away from macros? And why use macros in the first place if it could
be done with plain functions?&lt;/p&gt;

&lt;p&gt;The short answer is that it couldn’t be done until recently. The long answer
is that when I started &lt;em&gt;nom&lt;/em&gt; in 2014, Rust was different, less powerful.
At that time I had an intriguing idea: what if we had parser combinators that
can return a reference to the input data instead of copying it?
This could fix some of the performance problems that were frequent with parser
combinators.&lt;/p&gt;
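
&lt;p&gt;To make the idea concrete, here is a minimal sketch that uses only the standard library; it is not &lt;em&gt;nom&lt;/em&gt;’s actual code. The &lt;code class=&quot;highlighter-rouge&quot;&gt;take_digits&lt;/code&gt; function is a hypothetical name chosen for illustration. It returns two slices that borrow from the input, so nothing is copied:&lt;/p&gt;

```rust
/// Hypothetical zero-copy parser (illustration only, not part of nom):
/// splits `input` into its leading run of ASCII digits and the remainder.
/// Both returned slices borrow from `input`, in (remaining, parsed) order.
pub fn take_digits(input: &str) -> Option<(&str, &str)> {
    // Byte index of the first non-digit character, or the end of the input.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        // No leading digit: the parser does not match.
        return None;
    }
    // `split_at` only re-slices the original buffer; no allocation happens.
    let (digits, rest) = input.split_at(end);
    Some((rest, digits))
}
```

&lt;p&gt;With this definition, &lt;code class=&quot;highlighter-rouge&quot;&gt;take_digits(&quot;2019-06-17&quot;)&lt;/code&gt; returns the remaining input &lt;code class=&quot;highlighter-rouge&quot;&gt;&quot;-06-17&quot;&lt;/code&gt; and the parsed slice &lt;code class=&quot;highlighter-rouge&quot;&gt;&quot;2019&quot;&lt;/code&gt;, and the parsed slice points at the start of the original buffer.&lt;/p&gt;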

&lt;p&gt;It turned out that yes, it was possible, but making it usable required
some work. I needed to add lifetimes everywhere; closures were hard to manipulate.
Combining functions directly, which was my initial vision based on Haskell’s Parsec,
was not possible.
Still, declarative macros were already working pretty well, and gave us
a powerful metaprogramming tool.&lt;/p&gt;

&lt;p&gt;So &lt;em&gt;nom&lt;/em&gt; ended up being a macro-based DSL that generated code through
macros calling other macros, rewriting their arguments along the way.&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;// named is used to declare a function and its arguments&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;named!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hex_primary&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;nd&quot;&gt;map_res!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;take_while_m_n!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;is_hex_digit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;from_hex&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;named!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hex_color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;// do_parse applies parsers in sequence&lt;/span&gt;
  &lt;span class=&quot;nd&quot;&gt;do_parse!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
           &lt;span class=&quot;c&quot;&gt;// tag recognizes a specific string&lt;/span&gt;
           &lt;span class=&quot;nd&quot;&gt;tag!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;#&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;red&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;   &lt;span class=&quot;n&quot;&gt;hex_primary&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;green&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hex_primary&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;blue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;hex_primary&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Color&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;red&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;green&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blue&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;assert_eq!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;hex_color&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;#2F14DF&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Color&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;red&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;47&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;green&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;blue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;223&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}))&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For an insight into how it works, see how the &lt;code class=&quot;highlighter-rouge&quot;&gt;opt!&lt;/code&gt; combinator, which wraps
a parser result in an &lt;code class=&quot;highlighter-rouge&quot;&gt;Option&lt;/code&gt; (using &lt;code class=&quot;highlighter-rouge&quot;&gt;None&lt;/code&gt; if there’s an error), is
defined:&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;err&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;macro_export&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;local_inner_macros&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;macro_rules!&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;opt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$i:expr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$submac:ident&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$args:tt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$crate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;lib&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;Result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$crate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;lib&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;option&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;Option&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$crate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

      &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$i&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.clone&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$submac&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;Ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;          &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Some&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))),&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;Error&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;             &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$i:expr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$f:expr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;opt!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;call!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The first argument &lt;code class=&quot;highlighter-rouge&quot;&gt;$i&lt;/code&gt; is the input passed by the calling macro, and it is
passed on as the first argument to the child parser, so &lt;code class=&quot;highlighter-rouge&quot;&gt;opt!(input, parser)&lt;/code&gt;
would expand to the following code:&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.clone&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;parser&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;          &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Some&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))),&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;Error&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;             &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;While macros could become complex, the generated code was quite simple, mainly
using nested pattern matching. This made most combinators easy to write,
and the generated code was fast.&lt;/p&gt;

&lt;p&gt;At the same time, macros had issues that plagued &lt;em&gt;nom&lt;/em&gt; for a long time.
The smallest typo when calling a macro would result in inscrutable error
messages. Macro variants that were not tested were never compiled, which made
mistakes harder to catch. Macro argument parsing limited the syntax.&lt;/p&gt;

&lt;p&gt;If you were willing to learn a bit about &lt;em&gt;nom&lt;/em&gt; and accept some of its quirks,
along with the array of tools that make building and debugging parsers much
easier, writing parsers was a fun, interactive process of figuring out
data byte after byte.&lt;/p&gt;

&lt;p&gt;But macros were hard to learn for many people, and I had to find another way.&lt;/p&gt;

&lt;h2 id=&quot;using-functions&quot;&gt;Using functions&lt;/h2&gt;

&lt;p&gt;Thanks to the development of the “impl Trait” feature in Rust, I had an idea:
could I make a function that accepts a function as argument, and returns
another function, making everything generic?&lt;/p&gt;

&lt;p&gt;As it turns out, yes, I can. See, for example, the &lt;code class=&quot;highlighter-rouge&quot;&gt;pair&lt;/code&gt; combinator, which
takes two parsers as arguments and returns a new parser that produces a tuple
of the results of both child parsers:&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pair&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;O1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;O2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;G&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;G&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Fn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;O1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;O2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;where&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Fn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;O1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;G&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Fn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;O2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ParseError&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;

&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;move&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;second&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(|(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;o1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;// you can call the resulting parser directly, like this:&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;pair&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parser1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parser2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(&lt;code class=&quot;highlighter-rouge&quot;&gt;IResult&lt;/code&gt; is the parser result type of &lt;em&gt;nom&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;Not only is it possible, but the code is reasonably easy to write: I built an
&lt;a href=&quot;https://github.com/geal/nomfun&quot;&gt;example parser library&lt;/a&gt; based on this in a few
days.&lt;/p&gt;

&lt;p&gt;The previous hexadecimal color parser could be rewritten like this:&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;hex_primary&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;map_res&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;take_while_m_n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;is_hex_digit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;from_hex&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;hex_color&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;tag&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;#&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;red&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;green&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;tuple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hex_primary&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hex_primary&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hex_primary&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;nf&quot;&gt;Ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Color&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;red&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;green&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blue&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}))&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And the existing macros were rewritten to use the function combinators under the
hood, so existing parsers will work the same way with &lt;em&gt;nom 5&lt;/em&gt;, except for how errors
are handled, and some details around streaming parsers you’ll see below.&lt;/p&gt;

&lt;h2 id=&quot;error-management&quot;&gt;Error management&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;nom 4&lt;/em&gt; relied on an error type that would change depending on the &lt;code class=&quot;highlighter-rouge&quot;&gt;verbose-errors&lt;/code&gt;
cargo feature. If you used the feature, you would get more context on errors at the
price of performance; if you didn’t, you could only see which combinator failed
and at which input position.&lt;/p&gt;

&lt;p&gt;Unfortunately, cargo features are additive, so if any transitive dependency was using
&lt;em&gt;nom&lt;/em&gt; with that feature, it would be activated everywhere.&lt;/p&gt;

&lt;p&gt;Additionally, this error type supported a “custom” error variant that was generic,
which resulted in painful type inference errors from time to time.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;nom 5&lt;/em&gt; changes this by making the error management completely generic, and the way you
write your code or call the parsers will decide on the error type. The trick is in
the &lt;code class=&quot;highlighter-rouge&quot;&gt;ParseError&amp;lt;Input&amp;gt;&lt;/code&gt; trait you saw earlier in the &lt;code class=&quot;highlighter-rouge&quot;&gt;pair&lt;/code&gt; combinator’s code.&lt;/p&gt;

&lt;p&gt;By default, the error type in &lt;em&gt;nom&lt;/em&gt; is a tuple: &lt;code class=&quot;highlighter-rouge&quot;&gt;(Input, nom::error::ErrorKind)&lt;/code&gt;. But you can
use a more precise error type, like &lt;a href=&quot;https://docs.rs/nom/latest/nom/error/struct.VerboseError.html&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;nom::error::VerboseError&lt;/code&gt;&lt;/a&gt;,
that will provide more context. Or you can define your own error type, with exactly
the information you need. You can learn more about it in the
&lt;a href=&quot;https://github.com/Geal/nom/blob/master/doc/error_management.md&quot;&gt;error management guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;streaming-vs-complete-parsers&quot;&gt;Streaming VS complete parsers&lt;/h2&gt;

&lt;p&gt;Another pain point of previous &lt;em&gt;nom&lt;/em&gt; versions was in the management of streaming or complete
input. In streaming mode, we assume that we do not have the entire data, and might get more
by reading again from a file or socket. In complete mode, we know we have the entire data.&lt;/p&gt;

&lt;p&gt;Some parsers are built differently depending on how the input data works. If you were
writing an integer parser from text, in complete mode you could just read all the digits,
even until the end of input, because you would have all of the data.
With streaming input, if you reach the end of the data, you do not know if more digits
can appear. So you have to wait until you encounter a character that is not a digit.&lt;/p&gt;

&lt;p&gt;From the beginning, &lt;em&gt;nom&lt;/em&gt; has made this distinction explicit, because it was designed
with network protocols and big file formats in mind, through the &lt;code class=&quot;highlighter-rouge&quot;&gt;Incomplete&lt;/code&gt; variant
of parser results, which indicates that the parser cannot decide and needs more data.&lt;/p&gt;
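
&lt;p&gt;As a rough illustration of the difference, here is a minimal sketch that uses only the standard library, not &lt;em&gt;nom&lt;/em&gt; itself. The &lt;code class=&quot;highlighter-rouge&quot;&gt;Digits&lt;/code&gt; enum and the &lt;code class=&quot;highlighter-rouge&quot;&gt;digits_complete&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;digits_streaming&lt;/code&gt; functions are hypothetical names made up for this example, and its &lt;code class=&quot;highlighter-rouge&quot;&gt;Incomplete&lt;/code&gt; variant only mimics the idea behind the one in &lt;em&gt;nom&lt;/em&gt;:&lt;/p&gt;

```rust
// Std-only sketch of the two modes. Every name here is hypothetical and
// made up for illustration; none of it is nom's API.
// Inputs are owned `String`s to keep the sketch free of lifetimes;
// real nom parsers borrow their input instead.

/// Outcome of trying to read a run of ASCII digits.
#[derive(Debug, PartialEq)]
enum Digits {
    /// The parsed digits, followed by the remaining input.
    Done(String, String),
    /// The input ended before we could decide: more data is needed.
    Incomplete,
    /// The input does not start with a digit.
    Error,
}

/// Splits `input` into its leading ASCII digits and the rest.
fn split_digits(input: String) -> (String, String) {
    // ASCII digits are one byte each, so the char count is a byte offset.
    let n = input.chars().take_while(|c| c.is_ascii_digit()).count();
    let (digits, rest) = input.split_at(n);
    (digits.to_string(), rest.to_string())
}

/// Complete mode: the end of input is a valid end for the number.
fn digits_complete(input: String) -> Digits {
    let (digits, rest) = split_digits(input);
    if digits.is_empty() {
        Digits::Error
    } else {
        Digits::Done(digits, rest)
    }
}

/// Streaming mode: reaching the end of input means more digits may follow.
fn digits_streaming(input: String) -> Digits {
    let (digits, rest) = split_digits(input);
    if rest.is_empty() {
        // Nothing after the digits (or no input at all): we cannot decide yet.
        Digits::Incomplete
    } else if digits.is_empty() {
        Digits::Error
    } else {
        Digits::Done(digits, rest)
    }
}
```

&lt;p&gt;With the input &lt;code class=&quot;highlighter-rouge&quot;&gt;123&lt;/code&gt;, the complete version returns the three digits, while the streaming version reports &lt;code class=&quot;highlighter-rouge&quot;&gt;Incomplete&lt;/code&gt;, since a fourth digit could still arrive. With &lt;code class=&quot;highlighter-rouge&quot;&gt;123;&lt;/code&gt;, both return the digits and leave &lt;code class=&quot;highlighter-rouge&quot;&gt;;&lt;/code&gt; as remaining input.&lt;/p&gt;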

&lt;p&gt;Making it usable, though, was challenging, especially when building parsers for
smaller formats, like configuration files or programming languages.&lt;/p&gt;

&lt;p&gt;The first solution was to use the &lt;code class=&quot;highlighter-rouge&quot;&gt;complete!&lt;/code&gt; combinator, which transforms an
&lt;code class=&quot;highlighter-rouge&quot;&gt;Incomplete&lt;/code&gt; result into an error, and then to add specialized versions of some
combinators.
But this made parsers hard to write, and figuring out which parser was returning
&lt;code class=&quot;highlighter-rouge&quot;&gt;Incomplete&lt;/code&gt; was annoying.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;nom 4&lt;/em&gt; introduced a new idea: what if the input type decided if we were in
streaming or complete mode?
&lt;code class=&quot;highlighter-rouge&quot;&gt;CompleteByteSlice&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;CompleteStr&lt;/code&gt; were the complete input versions of
&lt;code class=&quot;highlighter-rouge&quot;&gt;&amp;amp;[u8]&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;&amp;amp;str&lt;/code&gt;, which were themselves treated as streaming input.
Unfortunately, this made things worse: those types were littering the code,
converting back and forth with the underlying byte slices or strings was tedious,
and they inexplicably made parsers slower.&lt;/p&gt;

&lt;p&gt;So &lt;em&gt;nom 5&lt;/em&gt; comes with a cleaner approach:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;CompleteByteSlice&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;CompleteStr&lt;/code&gt; are gone&lt;/li&gt;
  &lt;li&gt;macro-based parsers always work in streaming mode&lt;/li&gt;
  &lt;li&gt;for function combinators that would work differently in streaming or complete
mode, there are different versions in submodules, like &lt;code class=&quot;highlighter-rouge&quot;&gt;nom::bytes::streaming::tag&lt;/code&gt;
and &lt;code class=&quot;highlighter-rouge&quot;&gt;nom::bytes::complete::tag&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This way, you can choose explicitly which version you use, and can even use
a bit of both depending on the context. As an example, in a binary format,
you might use &lt;code class=&quot;highlighter-rouge&quot;&gt;nom::bytes::streaming::take&lt;/code&gt; to get a slice of the input,
then parse that slice with parsers that work on complete input.&lt;/p&gt;

&lt;h2 id=&quot;various-other-changes&quot;&gt;Various other changes&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;nom&lt;/em&gt; now uses the &lt;a href=&quot;https://crates.io/crates/lexical-core&quot;&gt;lexical crate&lt;/a&gt; for
float parsing. Having a good float parser was important, but it is a complex
enough topic to leave it to another crate.&lt;/p&gt;

&lt;p&gt;This release was also a good opportunity to clean things up. Over the years,
&lt;em&gt;nom&lt;/em&gt; accumulated a lot of code, ideas, and bad function names. So a lot of parsers
were renamed or removed entirely, and the code was reorganized. For more
information, check out the &lt;a href=&quot;https://github.com/Geal/nom/blob/master/CHANGELOG.md&quot;&gt;changelog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The documentation was entirely rewritten: documentation and examples are
available for every type, function and macro. The &lt;a href=&quot;https://github.com/Geal/nom/tree/master/doc&quot;&gt;guides&lt;/a&gt;
were updated to follow the new code. And we now have cleaner &lt;a href=&quot;https://github.com/Geal/nom/tree/master/examples&quot;&gt;examples&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The &lt;a href=&quot;https://github.com/Geal/nom/blob/master/examples/json.rs&quot;&gt;JSON parsing example&lt;/a&gt;
explains how to write a parser, from simple parts to more complex parsers,
and shows how the new flexible error management can generate great error messages.&lt;/li&gt;
  &lt;li&gt;The &lt;a href=&quot;https://github.com/Geal/nom/blob/master/examples/s_expression.rs&quot;&gt;S expression example&lt;/a&gt;
demonstrates how we can parse and interpret programming languages.&lt;/li&gt;
  &lt;li&gt;The &lt;a href=&quot;https://github.com/Geal/nom/blob/master/examples/iterator.rs&quot;&gt;iterator example&lt;/a&gt;
shows new patterns that are available in &lt;em&gt;nom 5&lt;/em&gt;. It is now easy to make an iterator
out of a parser and input data, parsing and producing values as needed.
There is even a new &lt;a href=&quot;https://docs.rs/nom/5.0.0-beta2/nom/combinator/fn.iterator.html&quot;&gt;iterator combinator&lt;/a&gt;
to help with that.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;

&lt;p&gt;Switching to functions changes the entire code layout. Now, a lot of the combinator
code can be shared, instead of being generated every time it is called.&lt;/p&gt;

&lt;p&gt;So, in most cases it will improve performance, but in a few others we might get
worse results. Future versions will smooth things out: &lt;em&gt;nom 4&lt;/em&gt; was the peak of what
was possible with the macros system, while I’m just getting started with the new
design!&lt;/p&gt;

&lt;p&gt;To reproduce the experiments, check that:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;you’re using &lt;a href=&quot;https://crates.io/crates/jemallocator&quot;&gt;jemallocator&lt;/a&gt; instead of the
system allocator. Test with and without it and see how it affects the results&lt;/li&gt;
  &lt;li&gt;you’re using link time optimization (LTO) when compiling, which will reduce code size&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is what I add to &lt;code class=&quot;highlighter-rouge&quot;&gt;Cargo.toml&lt;/code&gt; for benchmarks:&lt;/p&gt;

&lt;div class=&quot;language-toml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nn&quot;&gt;[profile.bench]&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;lto&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;codegen-units&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And now, some benchmarks between &lt;em&gt;nom&lt;/em&gt; 4.2.3 and 5.0.0:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;HTTP parser: &lt;strong&gt;20% faster&lt;/strong&gt; (non optimized one, runs at 500MB/s)&lt;/li&gt;
  &lt;li&gt;INI file parser: &lt;strong&gt;20% faster&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;JSON file parser: 20% slower (work in progress, allocations affect it a lot)&lt;/li&gt;
  &lt;li&gt;float parser: &lt;strong&gt;98% faster&lt;/strong&gt; thanks to the lexical crate&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;recognize_float&lt;/code&gt; parser: &lt;strong&gt;62% faster&lt;/strong&gt; (not using the lexical crate)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No comparison between parser libraries this time: I have not yet rewritten all the
&lt;em&gt;nom&lt;/em&gt; examples in the &lt;a href=&quot;https://github.com/rust-bakery/parser_benchmarks/&quot;&gt;parser benchmarks repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;thanks&quot;&gt;Thanks&lt;/h2&gt;

&lt;p&gt;The macros to functions refactoring and the documentation rewrite were huge
and required an enormous effort. I would not have been able to do this release
without all of &lt;a href=&quot;https://github.com/Geal/nom/blob/master/CHANGELOG.md#thanks-1&quot;&gt;&lt;em&gt;nom’s&lt;/em&gt; contributors&lt;/a&gt;
who helped me along the way, and the developers who started using &lt;em&gt;nom 5&lt;/em&gt; while it
was still in beta.
Also I’m thankful for the support from my employer, &lt;a href=&quot;https://clever-cloud.com&quot;&gt;Clever Cloud&lt;/a&gt;,
which is now a large nom user :D&lt;/p&gt;

&lt;p&gt;I am happy that after all this time, this project is still going strong,
helping people write good parsers to power production software or weekend
projects. Happy hacking with &lt;em&gt;nom&lt;/em&gt;!&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/geal/nom&quot;&gt;repository&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://crates.io/crates/nom&quot;&gt;crates.io page&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.rs/nom&quot;&gt;code documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Geal/nom/tree/master/doc&quot;&gt;guides&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Geal/nom/blob/master/CHANGELOG.md&quot;&gt;changelog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Geoffroy Couprie</name></author><category term="development" /><category term="Rust" /><summary type="html">nom, the Rust parser combinators library, is now available at version 5. This is the most mature version of nom. This is the one that feels “done”. This is the parser library that I wanted when I started nom 5 years ago. It’s here at last.</summary></entry><entry><title type="html">FOSS is free as in toilet</title><link href="http://unhandledexpression.com/general/2018/11/27/foss-is-free-as-in-toilet.html" rel="alternate" type="text/html" title="FOSS is free as in toilet" /><published>2018-11-27T21:00:00+01:00</published><updated>2018-11-27T21:00:00+01:00</updated><id>http://unhandledexpression.com/general/2018/11/27/foss-is-free-as-in-toilet</id><content type="html" xml:base="http://unhandledexpression.com/general/2018/11/27/foss-is-free-as-in-toilet.html">&lt;p&gt;I am a bit dissatisfied with the use of the
&lt;a href=&quot;https://en.wikipedia.org/wiki/Tragedy_of_the_commons&quot;&gt;Tragedy of the commons&lt;/a&gt;
to represent issues with free and open source software development.
It is not an abstract resource that can be depleted when overused.
It is not magically maintained if left alone.&lt;/p&gt;

&lt;p&gt;It is based on the work of people, and we should not erase those
people.&lt;/p&gt;

&lt;p&gt;Unfortunately (and it is by design), most of the licences
and the vocabulary around it are focused on the software’s
user. After all, they work by reducing the creator’s right
to empower the user.&lt;/p&gt;

&lt;p&gt;As examples of this vocabulary, we have the distinction between
“free as in beer” and “free as in speech” to show that the “free”
word in “free software” has more to do with freedom and people’s
rights to use, study, modify and share a program, than its actual
price. Although, in practice, the overwhelming majority of
FOSS will not cost you anything.&lt;/p&gt;

&lt;p&gt;This model has won, FOSS is everywhere, companies not only
use it, but even heavily rely on it, millions of devices
run with it.&lt;/p&gt;

&lt;p&gt;But at what cost? Open source developers are burning out.
Some core libraries, on which basically everything relies,
are maintained by very small teams of people working in
their free time. We still have the right to study the
software, but the most interesting parts are now in
the user’s data, which is jealously guarded by a few
huge companies.&lt;/p&gt;

&lt;p&gt;This is not a tragedy, this is a fucking farce.&lt;/p&gt;

&lt;p&gt;Let’s own up to the absurdity of talking about a personal
freedom that depends mainly on hidden people working for
free. Let’s add more ridicule to it. Let’s start using
a new expression to describe it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FOSS IS FREE AS IN TOILET&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Nobody believes that a free toilet will be magically cleaned
up and maintained: somebody has to do it, and that person
had better get paid for it. Sharing a toilet means that
you flush, clean up after yourself, and always leave some
paper, it’s basic manners.
And yet, like toilets, as FOSS gets used by more and more
people, it gets more likely that you will see obnoxious
people that shit all over your commons and then complain
about it. And nobody will want to take care of it.&lt;/p&gt;

&lt;p&gt;Correctly treating the people who work on the software
you use is just basic FOSS hygiene.&lt;/p&gt;</content><author><name>Geoffroy Couprie</name></author><category term="development" /><summary type="html">I am a bit dissatisfied with the use of the Tragedy of the commons to represent issues with free and open source software development. It is not an abstract resource that can be depleted when overused. It is not magically maintained if left alone.</summary></entry><entry><title type="html">No, pest is not faster than nom</title><link href="http://unhandledexpression.com/general/2018/10/04/no-pest-is-not-faster-than-nom.html" rel="alternate" type="text/html" title="No, pest is not faster than nom" /><published>2018-10-04T15:00:00+02:00</published><updated>2018-10-04T15:00:00+02:00</updated><id>http://unhandledexpression.com/general/2018/10/04/no-pest-is-not-faster-than-nom</id><content type="html" xml:base="http://unhandledexpression.com/general/2018/10/04/no-pest-is-not-faster-than-nom.html">&lt;p&gt;As the main developer of &lt;a href=&quot;https://github.com/geal/nom&quot;&gt;nom, the Rust parser combinators
library&lt;/a&gt;, I’m usually happy to
see other parser libraries appear in Rust. The language’s
strengths play well in that space, and writing parsers is
a nice way to explore it.&lt;/p&gt;

&lt;p&gt;I’m also happy to see them &lt;a href=&quot;https://github.com/geal/parser_benchmarks&quot;&gt;compete in benchmarks with nom&lt;/a&gt;, since it keeps me on my toes, and improves performance
for everybody. &lt;a href=&quot;https://github.com/marwes/combine&quot;&gt;Combine&lt;/a&gt; has been
an interesting competitor here, showing how far one can go without using
macros.&lt;/p&gt;

&lt;p&gt;That said, &lt;a href=&quot;https://pest.rs/&quot;&gt;the recent release of pest 2.0&lt;/a&gt;,
a parsing expression grammar library, is pushing me to be petty :D&lt;/p&gt;

&lt;p&gt;That library has been using a benchmark against nom &lt;a href=&quot;https://github.com/pest-parser/pest/blob/f3d208fbf685d932e3939ada9cbf243e79c8950f/README.md#sheer-performance&quot;&gt;in its
readme&lt;/a&gt; for a long time. I wrote &lt;a href=&quot;https://github.com/Geal/pestvsnom&quot;&gt;that benchmark&lt;/a&gt;
to help in the comparison, but the results were slightly misleading:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/pestvsnom_old.svg&quot; alt=&quot;old benchmarks&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The nom benchmark and the “pest (custom AST)” benchmark were
converting the JSON file to Rust types (&lt;code class=&quot;highlighter-rouge&quot;&gt;Vec&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;HashMap&lt;/code&gt; for
objects, booleans, strings and floating point numbers).
The simple pest parser, by contrast, was only validating the file and
generating a list of tokens without converting them.
So, yes, of course it will be faster than nom.&lt;/p&gt;

&lt;p&gt;There was &lt;a href=&quot;https://github.com/Geal/parser_benchmarks/blob/588c2cddf9a625a7af6d34c1b4edd42536023121/json/README.md&quot;&gt;further work on the benchmarks&lt;/a&gt;,
while nom 4 was still in preparation, and pest got interesting
results:&lt;/p&gt;

&lt;p&gt;The benchmarks were run on a late 2013 MacBook Pro, quad core 2.3 GHz Intel Core i7.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;basic&lt;/th&gt;
      &lt;th&gt;canada.json&lt;/th&gt;
      &lt;th&gt;apache_builds.json&lt;/th&gt;
      &lt;th&gt;data.json&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;combine&lt;/td&gt;
      &lt;td&gt;(fails)&lt;/td&gt;
      &lt;td&gt;127,775,522 ns/iter (+/- 11,140,676) = 17 MB/s&lt;/td&gt;
      &lt;td&gt;3,732,534 ns/iter (+/- 795,836) = 34 MB/s&lt;/td&gt;
      &lt;td&gt;241,407 ns/iter (+/- 40,575) = 38 MB/s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;nom&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;1,333 ns/iter (+/- 247) = 57 MB/s&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;62,971,567 ns/iter (+/- 6,311,768) = 35 MB/s&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;1,209,550 ns/iter (+/- 323,936) = 105 MB/s&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;62,008 ns/iter (+/- 11,685) = 149 MB/s&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;pest&lt;/td&gt;
      &lt;td&gt;1,405 ns/iter (+/- 238) = 54 MB/s&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;27,701,820 ns/iter (+/- 3,961,221) = 81 MB/s&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;1,694,463 ns/iter (+/- 338,194) = 75 MB/s&lt;/td&gt;
      &lt;td&gt;131,851 ns/iter (+/- 22,667) = 70 MB/s&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;I could have left it at that.&lt;/p&gt;

&lt;p&gt;But today (October 4th, 2018), &lt;a href=&quot;https://pest.rs/&quot;&gt;the pest website&lt;/a&gt;
featured a very misleading graph.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/pest-graph.png&quot; alt=&quot;the pest benchmark graph&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Yes, it’s very easy to make nom look bad if you:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;conveniently remove the slowest pest parser&lt;/li&gt;
  &lt;li&gt;do not put any link to the benchmark code&lt;/li&gt;
  &lt;li&gt;avoid saying how many iterations are done, on which file, etc.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;start the horizontal axis at 20ms instead of 0ms, so that nom appears twice as slow&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is ridiculous, and deserves all the pettiness I can muster.&lt;/p&gt;

&lt;h2 id=&quot;the-real-benchmark&quot;&gt;The real benchmark&lt;/h2&gt;

&lt;p&gt;I took the &lt;a href=&quot;https://github.com/Geal/pestvsnom&quot;&gt;old benchmark&lt;/a&gt;,
reused the &lt;a href=&quot;https://github.com/Geal/pestvsnom/blob/master/assets/canada.json&quot;&gt;canada.json file&lt;/a&gt;,
added the &lt;a href=&quot;https://github.com/Geal/pestvsnom/blob/master/assets/data.json&quot;&gt;data.json file&lt;/a&gt;
from &lt;a href=&quot;https://github.com/pest-parser/pest/blob/master/grammars/benches/data.json&quot;&gt;pest’s own JSON benchmark&lt;/a&gt;, and applied &lt;a href=&quot;https://github.com/pest-parser/pest/blob/master/grammars/benches/json.rs&quot;&gt;pest’s current way to bench its code&lt;/a&gt; in &lt;a href=&quot;https://github.com/Geal/pestvsnom/blob/master/benches/pest.rs&quot;&gt;my benchmarks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After upgrading to nom 4 and pest 2, and fixing some old bugs in &lt;a href=&quot;https://github.com/Geal/pestvsnom/blob/master/benches/nom.rs&quot;&gt;the nom
parser&lt;/a&gt;,
I could reproduce the old results (still on a late 2013 MacBook Pro,
quad core 2.3 GHz Intel Core i7; yes, my laptop is getting old):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;nom:
    &lt;ul&gt;
      &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;canada.json&lt;/code&gt;: 60,734,229 ns/iter (+/- 17,775,618)&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;data.json&lt;/code&gt;: 23,937 ns/iter (+/- 9,992)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;pest:
    &lt;ul&gt;
      &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;canada.json&lt;/code&gt;: 35,041,472 ns/iter (+/- 5,454,302)&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;data.json&lt;/code&gt;: 14,665 ns/iter (+/- 2,041)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Yes, a pest 2.0 parser that does not convert the input to Rust types
is indeed faster than a nom 4.0 parser that does convert the input to
Rust types.&lt;/p&gt;

&lt;p&gt;But what happens if I write &lt;a href=&quot;https://github.com/Geal/pestvsnom/blob/master/benches/nom_spans.rs&quot;&gt;a nom 4.0 parser that does not convert
its input to Rust types&lt;/a&gt;?
It’s actually a bit easier if I don’t try to generate floats, bools, etc.
It’s not exactly what pest does. Pest stores indexes along with the original
slice, while this one will return sub slices for each JSON element.
Still, here are the results:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;nom:
    &lt;ul&gt;
      &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;canada.json&lt;/code&gt;: 60,734,229 ns/iter (+/- 17,775,618)&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;data.json&lt;/code&gt;: 23,937 ns/iter (+/- 9,992)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;pest:
    &lt;ul&gt;
      &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;canada.json&lt;/code&gt;: 35,041,472 ns/iter (+/- 5,454,302)&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;data.json&lt;/code&gt;: 14,665 ns/iter (+/- 2,041)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;nom_spans (returning slices instead of converting to Rust types):
    &lt;ul&gt;
      &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;canada.json&lt;/code&gt;: 20,623,381 ns/iter (+/- 1,952,297)&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;data.json&lt;/code&gt;: 10,757 ns/iter (+/- 1,462)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Or, represented as graphs that do not actually mislead us with their axes:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/canada-json-benchmark.png&quot; alt=&quot;canada.json benchmark&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/data-json-benchmark.png&quot; alt=&quot;data.json benchmark&quot; /&gt;&lt;/p&gt;
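
&lt;p&gt;To put numbers on the gap, here is a small sketch that computes ratios from the timings listed above. The &lt;code class=&quot;highlighter-rouge&quot;&gt;slowdown&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;apparent_slowdown&lt;/code&gt; helpers are hypothetical names made up for this example, and the second one only illustrates the effect of a truncated axis using this post’s timings, since the values behind pest’s own graph are not given here:&lt;/p&gt;

```rust
// Mean timings in ns/iter, copied from the results listed in this post.
const NOM_CANADA: f64 = 60_734_229.0;
const PEST_CANADA: f64 = 35_041_472.0;
const NOM_SPANS_CANADA: f64 = 20_623_381.0;
const NOM_DATA: f64 = 23_937.0;
const PEST_DATA: f64 = 14_665.0;
const NOM_SPANS_DATA: f64 = 10_757.0;

/// How many times longer `slow` takes compared to `fast`.
fn slowdown(slow: f64, fast: f64) -> f64 {
    slow / fast
}

/// Ratio of the two bar lengths when the axis starts at `axis_start`
/// instead of 0. Hypothetical helper: it assumes both timings are
/// larger than `axis_start`, and it is not taken from any benchmark code.
fn apparent_slowdown(slow: f64, fast: f64, axis_start: f64) -> f64 {
    (slow - axis_start) / (fast - axis_start)
}
```

&lt;p&gt;On &lt;code class=&quot;highlighter-rouge&quot;&gt;canada.json&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;slowdown(PEST_CANADA, NOM_SPANS_CANADA)&lt;/code&gt; gives about 1.7, and on &lt;code class=&quot;highlighter-rouge&quot;&gt;data.json&lt;/code&gt; the same ratio is about 1.36. As for the axis trick: the converting nom parser really takes about 1.7 times as long as pest on &lt;code class=&quot;highlighter-rouge&quot;&gt;canada.json&lt;/code&gt;, but &lt;code class=&quot;highlighter-rouge&quot;&gt;apparent_slowdown(NOM_CANADA, PEST_CANADA, 20_000_000.0)&lt;/code&gt; gives about 2.7, which is how long the bar would look next to pest’s on an axis starting at 20ms.&lt;/p&gt;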

&lt;p&gt;So, here we are, nom is actually still a lot faster than pest.
I still find pest’s parser interesting, because the grammar is
quite readable, but I tend to prefer parser combinators,
because they make it easy to &lt;a href=&quot;https://github.com/pest-parser/pest/issues/197&quot;&gt;reuse parsers and combinators&lt;/a&gt;
and allow you to write your own custom elements as you wish.&lt;/p&gt;

&lt;p&gt;If you see ways to improve the benchmarks,
and in the process make nom or pest faster, please do it :)&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/comeatme.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;</content><author><name>Geoffroy Couprie</name></author><category term="Rust" /><category term="parser" /><category term="security" /><category term="nom" /><summary type="html">As the main developer of nom, the Rust parser combinators library, I’m usually happy to see other parser libraries appear in Rust. The language’s strengths play well in that space, and writing parsers is a nice way to explore it.</summary></entry><entry><title type="html">nom 4.0: faster, safer, simpler parsers</title><link href="http://unhandledexpression.com/general/2018/05/14/nom-4-0-faster-safer-simpler-parsers.html" rel="alternate" type="text/html" title="nom 4.0: faster, safer, simpler parsers" /><published>2018-05-14T13:00:00+02:00</published><updated>2018-05-14T13:00:00+02:00</updated><id>http://unhandledexpression.com/general/2018/05/14/nom-4-0-faster-safer-simpler-parsers</id><content type="html" xml:base="http://unhandledexpression.com/general/2018/05/14/nom-4-0-faster-safer-simpler-parsers.html">&lt;p&gt;I’m delighted to announce that &lt;a href=&quot;https://github.com/geal/nom&quot;&gt;nom&lt;/a&gt;, the extremely
fast Rust parser combinators library, has reached major version 4.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR: the new nom version is simpler, faster, has better documentation, and you can
find a summary of what changed in
&lt;a href=&quot;https://github.com/Geal/nom/blob/master/doc/upgrading_to_nom_4.md&quot;&gt;the upgrade documentation&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;side note: how fast is nom? it can reach &lt;a href=&quot;https://github.com/Geal/parser_benchmarks/tree/master/http&quot;&gt;2GB/s when parsing HTTP requests&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/nom.png&quot; alt=&quot;nom logo&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Since nom is now a well established, serious project, we got a brand new logo,
courtesy of &lt;a href=&quot;https://corkami.github.io/&quot;&gt;Ange Albertini&lt;/a&gt;.
The nom monster will happily eat your data byte by byte :)&lt;/p&gt;

&lt;p&gt;It took nearly 6 months of development and the library went through nearly 5
entire rewrites. Compare that to previous major releases, which took a month at
most to do. But it was worth it! This new release cleans up a lot of old bugs
and unintuitive behaviours, simplifies some common patterns, is faster, uses less
memory, gives better errors, but the way parsers are written stays the same.
It’s like an entirely new engine under the same body work!&lt;/p&gt;

&lt;h2 id=&quot;moving-from-iresult-to-result&quot;&gt;Moving from &lt;code class=&quot;highlighter-rouge&quot;&gt;IResult&lt;/code&gt; to &lt;code class=&quot;highlighter-rouge&quot;&gt;Result&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;This was a long-standing request. nom used a three-legged enum as the return type for its parsers:&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;// example parser signature&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;parser&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;u32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;/// remaining input, result value&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Done&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;/// indicates the parser encountered an error. E is a custom error type you can redefine&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Error&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;/// Incomplete contains a Needed, an enum that can represent a known quantity of input data, or unknown&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Incomplete&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Needed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Needed&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;/// needs more data, but we do not know how much&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;Unknown&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;/// contains the required total data size&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;// if the &quot;verbose-errors&quot; feature is not active&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;u32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ErrorKind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;// if the &quot;verbose-errors&quot; feature is active&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;u32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;/// An error code, represented by an ErrorKind, which can contain a custom error code represented by E&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Code&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ErrorKind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;/// An error code, and the next error&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ErrorKind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;/// An error code, and the input position&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Position&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ErrorKind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;/// An error code, the input position and the next error&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;NodePosition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ErrorKind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That old &lt;code class=&quot;highlighter-rouge&quot;&gt;IResult&lt;/code&gt; structure did not map well onto the commonly used &lt;code class=&quot;highlighter-rouge&quot;&gt;Result&lt;/code&gt;,
and people did not want to see the &lt;code class=&quot;highlighter-rouge&quot;&gt;Incomplete&lt;/code&gt; case (when the parser indicates it does
not have enough data to decide) if they did not need it. And the different error types
depending on the &lt;code class=&quot;highlighter-rouge&quot;&gt;verbose-errors&lt;/code&gt; feature were confusing and caused errors when nom
appeared multiple times in dependency trees.&lt;/p&gt;

&lt;p&gt;So I replaced it with a new, &lt;code class=&quot;highlighter-rouge&quot;&gt;Result&lt;/code&gt; based design:&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;/// There was not enough data&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Incomplete&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Needed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;/// The parser had an error (recoverable)&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Error&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;/// The parser had an unrecoverable error&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Failure&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Needed&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;/// needs more data, but we do not know how much&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;Unknown&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;/// contains the required additional data size&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;// if the &quot;verbose-errors&quot; feature is inactive&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Code&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ErrorKind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;// if the &quot;verbose-errors&quot; feature is active&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;Code&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ErrorKind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ErrorKind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Aside from being more compatible with, like, the whole Rust ecosystem, this new design
has lots of interesting points:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;the &lt;code class=&quot;highlighter-rouge&quot;&gt;Context&lt;/code&gt; enum is now extended by the &lt;code class=&quot;highlighter-rouge&quot;&gt;verbose-errors&lt;/code&gt; feature, so it is the same type&lt;/li&gt;
  &lt;li&gt;errors always store position information&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;Incomplete&lt;/code&gt; has moved to the error case so you can easily ignore it&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;IResult::Done(remaining, value)&lt;/code&gt; has been replaced with &lt;code class=&quot;highlighter-rouge&quot;&gt;Ok((remaining, value))&lt;/code&gt; so you could easily do &lt;code class=&quot;highlighter-rouge&quot;&gt;let (remaining, value) = parser(input)?;&lt;/code&gt; like you would do with other &lt;code class=&quot;highlighter-rouge&quot;&gt;Result&lt;/code&gt; based functions&lt;/li&gt;
  &lt;li&gt;the &lt;code class=&quot;highlighter-rouge&quot;&gt;Err&lt;/code&gt; enum now contains a &lt;code class=&quot;highlighter-rouge&quot;&gt;Failure&lt;/code&gt; case representing an unrecoverable error (combinators like &lt;code class=&quot;highlighter-rouge&quot;&gt;alt!&lt;/code&gt; will not try another branch)&lt;/li&gt;
&lt;/ul&gt;
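
&lt;p&gt;As a rough illustration of the last two points, here is a minimal, self-contained sketch that does not use nom itself. The &lt;code class=&quot;highlighter-rouge&quot;&gt;Err&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;Needed&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;IResult&lt;/code&gt; types below are simplified stand-ins for the ones shown above (no &lt;code class=&quot;highlighter-rouge&quot;&gt;Context&lt;/code&gt;, no custom error type), and &lt;code class=&quot;highlighter-rouge&quot;&gt;tag&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;key_value&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;commit&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;alt&lt;/code&gt; are hypothetical functions written for this example, not nom’s macros:&lt;/p&gt;

```rust
// Simplified stand-ins for the types shown above; these are assumptions
// made for this sketch, not nom's real definitions.
#[derive(Debug, PartialEq)]
pub enum Needed {
    Unknown,
    Size(usize),
}

#[derive(Debug, PartialEq)]
pub enum Err<I> {
    /// There was not enough data
    Incomplete(Needed),
    /// Recoverable error, with the position where it happened
    Error(I),
    /// Unrecoverable error, with the position where it happened
    Failure(I),
}

pub type IResult<I, O> = Result<(I, O), Err<I>>;

/// Hypothetical parser: recognizes a fixed prefix at the start of the input.
fn tag<'a>(prefix: &'static str, input: &'a str) -> IResult<&'a str, &'a str> {
    if input.starts_with(prefix) {
        let (matched, rest) = input.split_at(prefix.len());
        Ok((rest, matched))
    } else if prefix.starts_with(input) {
        // The input is a strict prefix of what we expect: ask for more data.
        Err(Err::Incomplete(Needed::Size(prefix.len() - input.len())))
    } else {
        Err(Err::Error(input))
    }
}

/// Chains parsers with `?`, as with any other `Result` based function.
fn key_value(input: &str) -> IResult<&str, (&str, &str)> {
    let (rest, key) = tag("key", input)?;
    let (rest, _) = tag("=", rest)?;
    let (rest, value) = tag("value", rest)?;
    Ok((rest, (key, value)))
}

/// Hypothetical helper: turns a recoverable `Error` into a `Failure`.
fn commit<I, O>(result: IResult<I, O>) -> IResult<I, O> {
    result.map_err(|e| match e {
        Err::Error(position) => Err::Failure(position),
        other => other,
    })
}

/// Tries `first`, and falls back to `second` only on a recoverable `Error`.
fn alt<'a, O>(
    input: &'a str,
    first: impl Fn(&'a str) -> IResult<&'a str, O>,
    second: impl Fn(&'a str) -> IResult<&'a str, O>,
) -> IResult<&'a str, O> {
    match first(input) {
        Err(Err::Error(_)) => second(input),
        // Ok, Incomplete and Failure are returned as-is
        other => other,
    }
}
```

&lt;p&gt;In this sketch, a caller that does not care about partial input can treat &lt;code class=&quot;highlighter-rouge&quot;&gt;Incomplete&lt;/code&gt; like any other error and let &lt;code class=&quot;highlighter-rouge&quot;&gt;?&lt;/code&gt; propagate it, while the &lt;code class=&quot;highlighter-rouge&quot;&gt;alt&lt;/code&gt; function stops at the first &lt;code class=&quot;highlighter-rouge&quot;&gt;Failure&lt;/code&gt; instead of trying the second branch.&lt;/p&gt;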

&lt;p&gt;And we get all of these benefits while keeping the same memory footprint in “simple” errors mode,
and reducing it in “verbose” errors mode:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;size of &lt;code class=&quot;highlighter-rouge&quot;&gt;IResult&lt;/code&gt;&lt;/th&gt;
      &lt;th&gt;simple errors&lt;/th&gt;
      &lt;th&gt;verbose errors&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;nom 3&lt;/td&gt;
      &lt;td&gt;40 bytes&lt;/td&gt;
      &lt;td&gt;64 bytes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;nom 4&lt;/td&gt;
      &lt;td&gt;40 bytes&lt;/td&gt;
      &lt;td&gt;48 bytes&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;And it gets faster! Depending on the format, I have seen improvements between 4% and 40%!&lt;/p&gt;

&lt;h2 id=&quot;dealing-with-incomplete-usage&quot;&gt;Dealing with Incomplete usage&lt;/h2&gt;

&lt;p&gt;nom’s parsers are designed to work around streaming issues: if there is not enough data to decide, a
parser will return &lt;code class=&quot;highlighter-rouge&quot;&gt;Incomplete&lt;/code&gt; instead of returning a partial value that might be false.&lt;/p&gt;

&lt;p&gt;As an example, if you want to parse alphabetic characters then digits, when you get the whole input
&lt;code class=&quot;highlighter-rouge&quot;&gt;abc123;&lt;/code&gt;, the parser will return &lt;code class=&quot;highlighter-rouge&quot;&gt;abc&lt;/code&gt; for alphabetic characters, and &lt;code class=&quot;highlighter-rouge&quot;&gt;123&lt;/code&gt; for the digits, and &lt;code class=&quot;highlighter-rouge&quot;&gt;;&lt;/code&gt;
as remaining input.&lt;/p&gt;

&lt;p&gt;But if you get that input in chunks, like &lt;code class=&quot;highlighter-rouge&quot;&gt;ab&lt;/code&gt; then &lt;code class=&quot;highlighter-rouge&quot;&gt;c123;&lt;/code&gt;, the alphabetic characters parser will
return &lt;code class=&quot;highlighter-rouge&quot;&gt;Incomplete&lt;/code&gt;, because it does not know if there will be more matching characters afterwards.
If it returned &lt;code class=&quot;highlighter-rouge&quot;&gt;ab&lt;/code&gt; directly, the digit parser would fail on the rest of the input, even though the
input had the valid format.&lt;/p&gt;
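
&lt;p&gt;The chunked case can be sketched without nom as follows. This is only an illustration under simplifying assumptions: &lt;code class=&quot;highlighter-rouge&quot;&gt;ParseError&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;alpha_streaming&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;feed&lt;/code&gt; are hypothetical names, and the error type is reduced to two cases:&lt;/p&gt;

```rust
// Reduced error type for this sketch (an assumption, not nom's `Err`).
#[derive(Debug, PartialEq)]
enum ParseError {
    /// The input ended while characters were still matching
    Incomplete,
    /// The input does not start with an alphabetic character
    Error,
}

/// Hypothetical streaming parser: one or more ASCII letters.
/// Returns (remaining input, matched letters), in the same order as nom.
fn alpha_streaming(input: &str) -> Result<(&str, &str), ParseError> {
    match input.find(|c: char| !c.is_ascii_alphabetic()) {
        Some(0) => Err(ParseError::Error),
        Some(end) => Ok((&input[end..], &input[..end])),
        // Every character matched: more letters could still arrive.
        None => Err(ParseError::Incomplete),
    }
}

/// Accumulates chunks in a buffer and retries until the parser can decide.
fn feed(chunks: &[&str]) -> Option<(String, String)> {
    let mut buffer = String::new();
    for chunk in chunks {
        buffer.push_str(chunk);
        match alpha_streaming(&buffer) {
            Ok((rest, letters)) => return Some((letters.to_string(), rest.to_string())),
            Err(ParseError::Incomplete) => continue,
            Err(ParseError::Error) => return None,
        }
    }
    None
}
```

&lt;p&gt;With the chunks &lt;code class=&quot;highlighter-rouge&quot;&gt;ab&lt;/code&gt; then &lt;code class=&quot;highlighter-rouge&quot;&gt;c123;&lt;/code&gt;, the first attempt asks for more data, and the second one yields &lt;code class=&quot;highlighter-rouge&quot;&gt;abc&lt;/code&gt; with &lt;code class=&quot;highlighter-rouge&quot;&gt;123;&lt;/code&gt; left for the digit parser.&lt;/p&gt;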

&lt;p&gt;For some users, though, the input will never be partial (everything could be loaded in memory at once),
and the solution in nom 3 and before was to wrap parts of the parsers with the &lt;code class=&quot;highlighter-rouge&quot;&gt;complete!()&lt;/code&gt; combinator
that transforms &lt;code class=&quot;highlighter-rouge&quot;&gt;Incomplete&lt;/code&gt; into &lt;code class=&quot;highlighter-rouge&quot;&gt;Error&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;nom 4 is much stricter about the behaviour with partial data, but provides better tools to deal with it.
Thanks to the new &lt;code class=&quot;highlighter-rouge&quot;&gt;AtEof&lt;/code&gt; trait for input types, nom now provides the &lt;code class=&quot;highlighter-rouge&quot;&gt;CompleteByteSlice(&amp;amp;[u8])&lt;/code&gt; and
&lt;code class=&quot;highlighter-rouge&quot;&gt;CompleteStr(&amp;amp;str)&lt;/code&gt; input types, for which the &lt;code class=&quot;highlighter-rouge&quot;&gt;at_eof()&lt;/code&gt; method always returns true.
With these types, there is no need to put a &lt;code class=&quot;highlighter-rouge&quot;&gt;complete!()&lt;/code&gt; combinator everywhere: you can just apply those types
like this:&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nd&quot;&gt;named!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parser&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;// becomes&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;named!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parser&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CompleteByteSlice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nd&quot;&gt;named!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parser&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;// becomes&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;named!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parser&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CompleteStr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And as an example, for a unit test:&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nd&quot;&gt;assert_eq!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;parser&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;abcd123&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;123&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;abcd&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)));&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;// becomes&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;assert_eq!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;parser&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;CompleteStr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;abcd123&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;CompleteStr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;123&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;CompleteStr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;abcd&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;These types allow you to correctly handle cases like text formats for which there might be a last
empty line or not, as seen in &lt;a href=&quot;https://github.com/Geal/nom/blob/87d837006467aebcdb0c37621da874a56c8562b5/tests/multiline.rs&quot;&gt;one of the examples&lt;/a&gt;.&lt;/p&gt;
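
&lt;p&gt;To make the role of &lt;code class=&quot;highlighter-rouge&quot;&gt;at_eof()&lt;/code&gt; more concrete, here is a small sketch of the idea that does not depend on nom. The &lt;code class=&quot;highlighter-rouge&quot;&gt;AtEof&lt;/code&gt; trait and &lt;code class=&quot;highlighter-rouge&quot;&gt;CompleteStr&lt;/code&gt; type are re-created locally in a simplified form, while &lt;code class=&quot;highlighter-rouge&quot;&gt;StrInput&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;ParseError&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;alpha&lt;/code&gt; are hypothetical names that only exist for this example; nom’s real definitions may differ:&lt;/p&gt;

```rust
#[derive(Debug, PartialEq)]
enum ParseError {
    Incomplete,
    Error,
}

/// Local re-creation of the idea behind `AtEof` (an assumption for this sketch).
trait AtEof {
    fn at_eof(&self) -> bool;
}

/// Hypothetical helper trait so one parser body can serve both input types.
trait StrInput<'a>: AtEof + Sized {
    fn text(&self) -> &'a str;
    fn wrap(s: &'a str) -> Self;
}

// A plain `&str` may be followed by more data.
impl<'a> AtEof for &'a str {
    fn at_eof(&self) -> bool { false }
}
impl<'a> StrInput<'a> for &'a str {
    fn text(&self) -> &'a str { *self }
    fn wrap(s: &'a str) -> Self { s }
}

/// Wrapper marking the input as complete: nothing will ever follow it.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CompleteStr<'a>(&'a str);

impl<'a> AtEof for CompleteStr<'a> {
    fn at_eof(&self) -> bool { true }
}
impl<'a> StrInput<'a> for CompleteStr<'a> {
    fn text(&self) -> &'a str { self.0 }
    fn wrap(s: &'a str) -> Self { CompleteStr(s) }
}

/// One or more ASCII letters; returns (remaining input, matched letters).
fn alpha<'a, I: StrInput<'a>>(input: I) -> Result<(I, I), ParseError> {
    let s = input.text();
    let end = match s.find(|c: char| !c.is_ascii_alphabetic()) {
        Some(end) => end,
        // Ran out of input: only a complete input is allowed to stop here.
        None if input.at_eof() => s.len(),
        None => return Err(ParseError::Incomplete),
    };
    if end == 0 {
        return Err(ParseError::Error);
    }
    Ok((I::wrap(&s[end..]), I::wrap(&s[..end])))
}
```

&lt;p&gt;In this sketch, the same parser body returns &lt;code class=&quot;highlighter-rouge&quot;&gt;Incomplete&lt;/code&gt; for &lt;code class=&quot;highlighter-rouge&quot;&gt;alpha(&quot;abcd&quot;)&lt;/code&gt; but succeeds for &lt;code class=&quot;highlighter-rouge&quot;&gt;alpha(CompleteStr(&quot;abcd&quot;))&lt;/code&gt;, because only the input type changes the answer to “could more data arrive?”.&lt;/p&gt;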

&lt;p&gt;If those types feel a bit long to write everywhere in the parsers, it’s possible
to alias them like this:&lt;/p&gt;

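&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;'a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CompleteByteSlice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;'a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;'a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;'a&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;'a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;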
  &lt;span class=&quot;nf&quot;&gt;CompleteByteSlice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;better-documentation&quot;&gt;Better documentation&lt;/h2&gt;

&lt;p&gt;Previous documentation was scattered around and hard to navigate, especially
when trying to find the exact combinator that would work perfectly for what we
want.&lt;/p&gt;

&lt;p&gt;So the &lt;a href=&quot;https://github.com/Geal/nom/blob/master/README.md&quot;&gt;new README&lt;/a&gt; is now
more about what can be done instead of an incomplete reference.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;https://docs.rs/nom&quot;&gt;documentation homepage&lt;/a&gt; is an introduction to parser
combinators and nom’s design, with some examples to show common combinators
like &lt;code class=&quot;highlighter-rouge&quot;&gt;do_parse&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;There’s a brand new &lt;a href=&quot;https://github.com/Geal/nom/blob/master/doc/choosing_a_combinator.md&quot;&gt;“choosing a combinator” doc&lt;/a&gt;
to help you find what you need, arranged by categories, with example usage
and expected results.&lt;/p&gt;

&lt;p&gt;And there are new examples for a lot of parsers and combinators.&lt;/p&gt;

&lt;h2 id=&quot;various-fixes&quot;&gt;Various fixes&lt;/h2&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;no_std&lt;/code&gt; usage is now working correctly. For most of nom’s combinators, you
will not need anything more than &lt;code class=&quot;highlighter-rouge&quot;&gt;core&lt;/code&gt;, and a few basic combinators like
&lt;code class=&quot;highlighter-rouge&quot;&gt;many0&lt;/code&gt; or &lt;code class=&quot;highlighter-rouge&quot;&gt;separated_list&lt;/code&gt; require &lt;code class=&quot;highlighter-rouge&quot;&gt;alloc&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Some parsers sometimes ended up in the middle of UTF-8 characters; those
streams are now handled correctly.&lt;/p&gt;

&lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;

&lt;p&gt;Here is a comparison of nom’s internal benchmarks, between 3.2.1 and 4.0.0, in default mode:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ cargo benchcmp 3.2.1.bench 4.0.0.bench
 name             3.2.1.bench ns/iter  4.0.0.bench ns/iter  diff ns/iter   diff % 
 arithmetic       759                  469                          -290  -38.21% 
 ini              998 (109 MB/s)       897 (122 MB/s)               -101  -10.12% 
 ini_key_value    45 (400 MB/s)        47 (382 MB/s)                   2    4.44% 
 ini_keys_values  91 (483 MB/s)        85 (529 MB/s)                  -6   -6.59% 
 ini_str          1,399 (77 MB/s)      1,396 (78 MB/s)                -3   -0.21% 
 json_bench       2,149                1,694                        -455  -21.17% 
 http_test        700                  659                           -41   -5.86% 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And here is the difference for verbose errors:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ cargo benchcmp 3.2.1.bench 4.0.0.bench 
 name             3.2.1.bench ns/iter  4.0.0.bench ns/iter  diff ns/iter   diff % 
 arithmetic       1,731                1,523                        -208  -12.02% 
 ini              1,199 (90 MB/s)      1,061 (103 MB/s)             -138  -11.51% 
 ini_key_value    70 (257 MB/s)        60 (300 MB/s)                 -10  -14.29% 
 ini_keys_values  133 (330 MB/s)       111 (405 MB/s)                -22  -16.54% 
 ini_str          1,525 (71 MB/s)      1,621 (67 MB/s)                96    6.30% 
 json_bench       2,905                2,193                        -712  -24.51% 
 http_test        854                  941                            87   10.19% 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
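
&lt;p&gt;For reference, the &lt;code class=&quot;highlighter-rouge&quot;&gt;diff %&lt;/code&gt; column in both tables appears to be the change relative to the 3.2.1 timing, so negative values mean nom 4 is faster. Here is a minimal sketch of that arithmetic; the function name &lt;code class=&quot;highlighter-rouge&quot;&gt;diff_percent&lt;/code&gt; is only illustrative and is not part of cargo benchcmp:&lt;/p&gt;

```rust
// Relative change between two timings, in percent.
// A negative result means the new timing is lower (faster).
fn diff_percent(old_ns: f64, new_ns: f64) -> f64 {
    (new_ns - old_ns) / old_ns * 100.0
}

fn main() {
    // "arithmetic" row of the default-mode table: 759 ns/iter, then 469 ns/iter.
    println!("{:.2}%", diff_percent(759.0, 469.0));
}
```

&lt;p&gt;With the &lt;code class=&quot;highlighter-rouge&quot;&gt;arithmetic&lt;/code&gt; row of the first table (759 and 469 ns/iter), this gives the -38.21% shown above.&lt;/p&gt;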

&lt;p&gt;nom 4 is not faster everywhere: there are still some small regressions that will be fixed
soon, but overall we see great improvements.&lt;/p&gt;

&lt;p&gt;With nom 4 and some recent work around integrating lookup tables and vectorization
in nom parsers, we can get &lt;a href=&quot;https://github.com/Geal/parser_benchmarks/tree/master/http&quot;&gt;impressive results in HTTP parsing&lt;/a&gt;
(comparing with &lt;a href=&quot;https://github.com/Marwes/combine&quot;&gt;combine&lt;/a&gt;, &lt;a href=&quot;https://github.com/joyent/http-parser&quot;&gt;Joyent’s http-parser&lt;/a&gt;,
&lt;a href=&quot;https://github.com/seanmonstar/httparse&quot;&gt;httparse&lt;/a&gt; and &lt;a href=&quot;https://github.com/h2o/picohttpparser&quot;&gt;picohttpparser&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/http_benchmarks.png&quot; alt=&quot;HTTP benchmarks&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;the-future-for-nom&quot;&gt;The future for nom&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/geal/nom&quot;&gt;nom 4&lt;/a&gt; is a huge release, and the new design
will probably take some time to settle. There’s probably a lot of low hanging
fruit on the performance side, and I look forward to my next obsessive
profiling sessions.&lt;/p&gt;

&lt;p&gt;Meanwhile, nom will continue to happily munch bytes for you, as one of the
fastest, most reliable parsing libraries available.&lt;/p&gt;

&lt;p&gt;In the future, I’m interested in supporting more use cases, like zero-allocation
parsers or WebAssembly usage, and integrating more SIMD work,
like what was done for the HTTP parser. The goal is to get nom parsers to
secure data access in more and more systems, whatever the language or platform.&lt;/p&gt;</content><author><name>Geoffroy Couprie</name></author><category term="Rust" /><category term="parser" /><category term="security" /><category term="nom" /><summary type="html">I’m delighted to announce that nom, the extremely fast Rust parser combinators library, has reached major version 4.</summary></entry><entry><title type="html">PoC: compiling to eBPF from Rust</title><link href="http://unhandledexpression.com/general/rust/2018/02/02/poc-compiling-to-ebpf-from-rust.html" rel="alternate" type="text/html" title="PoC: compiling to eBPF from Rust" /><published>2018-02-02T21:33:08+01:00</published><updated>2018-02-02T21:33:08+01:00</updated><id>http://unhandledexpression.com/general/rust/2018/02/02/poc-compiling-to-ebpf-from-rust</id><content type="html" xml:base="http://unhandledexpression.com/general/rust/2018/02/02/poc-compiling-to-ebpf-from-rust.html">&lt;blockquote class=&quot;twitter-tweet&quot;&gt;&lt;p lang=&quot;en&quot; dir=&quot;ltr&quot;&gt;guess who has an eBPF tracer written in Rust? This guy👍 &lt;a href=&quot;https://t.co/3aE1giGWeK&quot;&gt;pic.twitter.com/3aE1giGWeK&lt;/a&gt;&lt;/p&gt;&amp;mdash; Geoffroy Couprie (@gcouprie) &lt;a href=&quot;https://twitter.com/gcouprie/status/957332988462882819?ref_src=twsrc%5Etfw&quot;&gt;January 27, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async=&quot;&quot; src=&quot;https://platform.twitter.com/widgets.js&quot; charset=&quot;utf-8&quot;&gt;&lt;/script&gt;

&lt;p&gt;I have been playing with &lt;a href=&quot;https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/filter.txt&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;eBPF (extended Berkeley Packet Filters)&lt;/a&gt;, a neat feature present in recent Linux versions (it evolved from the much older BPF filters). It is a virtual machine running in the kernel, to which you can send code from userland, and that code can be used to filter packets or trace parts of the kernel code.&lt;/p&gt;

&lt;p&gt;What makes eBPF really nice is how the kernel handles it. You send a program in bytecode format to the kernel, which first verifies it (checking, for example, that there are no loops, which guarantees that the program will terminate) and then applies JIT compilation, making the resulting code quite fast. Even better, that code can be loaded and unloaded at any time through a syscall, and you can set up shared data structures between the eBPF program and your own, to efficiently gather data.&lt;/p&gt;

&lt;p&gt;As an example, you can use eBPF (and the &lt;a href=&quot;https://www.iovisor.org/technology/xdp&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;XDP - eXpress Data Path -&lt;/a&gt; feature) to write very efficient firewalls, or employ &lt;a href=&quot;http://www.brendangregg.com/ebpf.html&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;BCC (BPF Compiler Collection)&lt;/a&gt; to trace a process’s IO events.&lt;/p&gt;

&lt;p&gt;I’m looking at how we could use that to trace applications on our infrastructure at &lt;a href=&quot;https://www.clever-cloud.com/&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;Clever Cloud&lt;/a&gt;. There are a few things we should know about the tooling first.&lt;/p&gt;

&lt;p&gt;At the beginning, people wrote their programs &lt;a href=&quot;https://github.com/systemd/systemd/blob/master/src/core/bpf-firewall.c#L89-L129&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;using the bytecode directly&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cm&quot;&gt;/* Compare IPv4 with one word instruction (32bit)*/&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bpf_insn&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;insn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;cm&quot;&gt;/* If skb-&amp;gt;protocol != ETH_P_IP, skip this whole block. The offset will be set later. */&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BPF_JMP_IMM&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BPF_JNE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_REG_7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;htobe16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;protocol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;cm&quot;&gt;/*
   * Call into BPF_FUNC_skb_load_bytes to load the dst/src IP address
   *
   * R1: Pointer to the skb
   * R2: Data offset
   * R3: Destination buffer on the stack (r10 - 4)
   * R4: Number of bytes to read (4)
   */&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BPF_MOV64_REG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BPF_REG_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_REG_6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BPF_MOV32_IMM&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BPF_REG_2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;addr_offset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BPF_MOV64_REG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BPF_REG_3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_REG_10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BPF_ALU64_IMM&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BPF_ADD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_REG_3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;addr_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BPF_MOV32_IMM&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BPF_REG_4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;addr_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BPF_RAW_INSN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BPF_JMP&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_CALL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_FUNC_skb_load_bytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;cm&quot;&gt;/*
   * Call into BPF_FUNC_map_lookup_elem to see if the address matches any entry in the
   * LPM trie map. For this to work, the prefixlen field of 'struct bpf_lpm_trie_key'
   * has to be set to the maximum possible value.
   *
   * On success, the looked up value is stored in R0. For this application, the actual
   * value doesn't matter, however; we just set the bit in @verdict in R8 if we found any
   * matching value.
   */&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BPF_LD_MAP_FD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BPF_REG_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;map_fd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BPF_MOV64_REG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BPF_REG_2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_REG_10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BPF_ALU64_IMM&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BPF_ADD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_REG_2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;addr_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BPF_ST_MEM&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BPF_W&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_REG_2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;addr_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BPF_RAW_INSN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BPF_JMP&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_CALL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_FUNC_map_lookup_elem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BPF_JMP_IMM&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BPF_JEQ&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_REG_0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BPF_ALU32_IMM&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BPF_OR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_REG_8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;verdict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is a bit raw, and somewhat complex to write, so people worked on C to eBPF compilers, and the feature landed in LLVM: we can use clang to write eBPF programs! It generates the bytecode, which can then be loaded through the &lt;a href=&quot;http://man7.org/linux/man-pages/man2/bpf.2.html&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;bpf() syscall&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is still a bit complex, since the eBPF program might need access to some internal data structures of the kernel, and those change depending on kernel versions and configuration options. And we still need to set up the shared data structures with the userland program that will gather data.&lt;/p&gt;

&lt;p&gt;That’s why the &lt;a href=&quot;https://github.com/iovisor/bcc&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;BCC project&lt;/a&gt; provides an easy to use interface to compile and load eBPF programs. They made it so simple that you can write a python script to compile, load and interact with your program:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;bcc&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BPF&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'int kprobe__sys_clone(void *ctx) { bpf_trace_printk(&quot;Hello, World!&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;); return 0; }'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;trace_print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;They provide a lot of &lt;a href=&quot;https://github.com/iovisor/bcc/tree/master/examples&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;useful examples&lt;/a&gt; and &lt;a href=&quot;https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;a nice tutorial&lt;/a&gt; to get started writing eBPF tracers.&lt;/p&gt;

&lt;p&gt;Unfortunately, those tools make a tradeoff that’s slightly annoying for me: they require installing BCC, which requires Python, LLVM and the complete Linux sources, on the target machines. It might be possible to precompile the programs, but that does not look like a common use case with BCC.&lt;/p&gt;

&lt;p&gt;So, maybe there’s a nice way to precompile those programs, store them as bytecode, then load them with a small agent that does not need LLVM and the kernel sources to work? It turns out it is possible, thanks to the &lt;a href=&quot;https://github.com/iovisor/gobpf/&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;gobpf project&lt;/a&gt;, who &lt;a href=&quot;https://github.com/iovisor/gobpf/pull/6/commits/869e637f483f499254d57d443e7aaadad50dce24&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;split their ELF loading code&lt;/a&gt; from the BCC part a year ago.&lt;/p&gt;

&lt;p&gt;And, now, you’ll see where I am going with this. Being one of those annoying Rust developers who want to rewrite everything in their favorite language, I thought “hey, maybe I can Rust that thing too!”&lt;/p&gt;

&lt;p&gt;Since it is possible to compile to eBPF bytecode from C, it is possible to compile LLVM IR (the kind of bytecode LLVM generates from the code before compiling it to the target CPU’s assembly) to eBPF. Look for “LLVM IR debugging” in &lt;a href=&quot;https://cilium.readthedocs.io/en/latest/bpf/#llvm&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;this link&lt;/a&gt; for an example. And I know I can compile Rust to that LLVM IR, and everything should work out, as long as Rust’s LLVM version is the same as the system’s version.&lt;/p&gt;

&lt;p&gt;So I created a small Rust project (&lt;a href=&quot;https://github.com/Geal/rust-ebpf-demo&quot;&gt;code available here&lt;/a&gt;), and wrote the following build script:&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;#!/bin/sh&lt;/span&gt;
cargo rustc &lt;span class=&quot;nt&quot;&gt;--release&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--emit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;llvm-ir
&lt;span class=&quot;nb&quot;&gt;cp &lt;/span&gt;target/release/deps/hello-&lt;span class=&quot;k&quot;&gt;*&lt;/span&gt;.ll hello.ll
cargo rustc &lt;span class=&quot;nt&quot;&gt;--release&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--emit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;llvm-bc
&lt;span class=&quot;nb&quot;&gt;cp &lt;/span&gt;target/release/deps/hello-&lt;span class=&quot;k&quot;&gt;*&lt;/span&gt;.bc hello.bc
llc-4.0 hello.bc &lt;span class=&quot;nt&quot;&gt;-march&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;bpf &lt;span class=&quot;nt&quot;&gt;-filetype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;obj &lt;span class=&quot;nt&quot;&gt;-o&lt;/span&gt; hello.o
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(I generated the ll file to take a look at the IR in text form.)&lt;/p&gt;

&lt;p&gt;And now, the code!
I used the example program from &lt;a href=&quot;https://kinvolk.io/blog/2017/09/an-update-on-gobpf---elf-loading-uprobes-more-program-types/&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;this blog post&lt;/a&gt; as inspiration, and came up with this:&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;mem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transmute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;ffi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CStr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;#[no_mangle]&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;#[link_section&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;license&quot;&lt;/span&gt;&lt;span class=&quot;nd&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_license&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;71u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;80&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;76&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;//b&quot;GPL\0&quot;&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;#[no_mangle]&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;#[link_section&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;version&quot;&lt;/span&gt;&lt;span class=&quot;nd&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_version&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u32&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0xFFFFFFFE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;#[no_mangle]&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;#[link_section&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;kprobe/SyS_clone&quot;&lt;/span&gt;&lt;span class=&quot;nd&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;extern&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;C&quot;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;kprobe__sys_clone&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_FUNC_trace_printk&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;unsafe&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nn&quot;&gt;transmute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;17&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;104u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;101&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;108&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;108&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;111&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;102&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;114&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;111&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;109&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;114&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;117&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;115&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;116&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;//b&quot;hello from rust\n\0&quot;&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;BPF_FUNC_trace_printk&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.as_ptr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;17&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
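
&lt;p&gt;The numeric arrays are hard to read at a glance. As a quick check outside of eBPF, a small host-side sketch can decode the &lt;code class=&quot;highlighter-rouge&quot;&gt;msg&lt;/code&gt; bytes back to text. The &lt;code class=&quot;highlighter-rouge&quot;&gt;MSG&lt;/code&gt; constant and the &lt;code class=&quot;highlighter-rouge&quot;&gt;decode&lt;/code&gt; helper are illustrative names and are not part of the demo project:&lt;/p&gt;

```rust
// Byte values copied from the `msg` array in the listing above.
const MSG: [u8; 17] = [
    104, 101, 108, 108, 111, 32, 102, 114, 111, 109, 32, 114, 117, 115, 116, 10, 0,
];

// Decode the bytes as UTF-8 text so the message can be read directly.
fn decode(bytes: [u8; 17]) -> String {
    String::from_utf8(bytes.to_vec()).expect("the message is plain ASCII")
}

fn main() {
    // The {:?} format escapes control characters, so the trailing
    // newline and NUL terminator stay visible in the output.
    println!("{:?}", decode(MSG));
}
```

&lt;p&gt;The decoded text is all lowercase and ends with a newline followed by the NUL terminator, which accounts for the 17 bytes passed as the size argument.&lt;/p&gt;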

&lt;p&gt;First, each constant lives in its own ELF section, as gobpf’s ELF loader expects. Apparently, I cannot write &lt;code class=&quot;highlighter-rouge&quot;&gt;pub static _license: &amp;amp;'static [u8] = b&quot;GPL\0&quot;&lt;/code&gt;, because the &lt;code class=&quot;highlighter-rouge&quot;&gt;_license&lt;/code&gt; symbol would then be a relocation pointing to the actual string.&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nd&quot;&gt;#[no_mangle]&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;#[link_section&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;license&quot;&lt;/span&gt;&lt;span class=&quot;nd&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_license&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;71u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;80&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;76&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;&lt;span class=&quot;c&quot;&gt;//b&quot;GPL\0&quot;;&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;#[no_mangle]&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;#[link_section&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;version&quot;&lt;/span&gt;&lt;span class=&quot;nd&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_version&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u32&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0xFFFFFFFE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
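
&lt;p&gt;If the numeric bytes bother you, one possible alternative is to dereference a byte-string literal, which still produces a &lt;code class=&quot;highlighter-rouge&quot;&gt;[u8; 4]&lt;/code&gt; array stored in the static itself rather than a reference to it. I have not tested this against gobpf’s loader, so treat it as an assumption. The sketch uses the hypothetical name &lt;code class=&quot;highlighter-rouge&quot;&gt;LICENSE&lt;/code&gt; and leaves out the &lt;code class=&quot;highlighter-rouge&quot;&gt;no_mangle&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;link_section&lt;/code&gt; attributes so that it builds as an ordinary host program:&lt;/p&gt;

```rust
// Same four bytes as the numeric version: 'G', 'P', 'L' and a NUL terminator.
// Dereferencing the literal copies its bytes into the array at compile time.
pub static LICENSE: [u8; 4] = *b"GPL\0";

fn main() {
    // Compare against the numeric spelling used in the post.
    assert_eq!(LICENSE, [71u8, 80, 76, 0]);
    println!("{:?}", LICENSE);
}
```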

&lt;p&gt;Now, the function: gobpf expects a section with the name of the function we will try to hook later.&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nd&quot;&gt;#[no_mangle]&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;#[link_section&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;kprobe/SyS_clone&quot;&lt;/span&gt;&lt;span class=&quot;nd&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;extern&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;C&quot;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;kprobe__sys_clone&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now we might need to import functions. They are &lt;a href=&quot;http://elixir.free-electrons.com/linux/v4.7/source/include/uapi/linux/bpf.h#L153&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;defined in a C enum&lt;/a&gt;, but interpreted as function pointers. So calling the printk function from there amounts to writing the instruction &lt;code class=&quot;highlighter-rouge&quot;&gt;call 6&lt;/code&gt;.
So we transmute the number &lt;code class=&quot;highlighter-rouge&quot;&gt;6&lt;/code&gt; into a function. I know, ewww, but it works :D&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BPF_FUNC_trace_printk&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;unsafe&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
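  &lt;span class=&quot;nn&quot;&gt;transmute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;u64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;extern&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;C&quot;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;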
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
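&lt;p&gt;As a rough userland sketch of the transmute trick above: the alias &lt;code class=&quot;highlighter-rouge&quot;&gt;HelperFn&lt;/code&gt; and the function &lt;code class=&quot;highlighter-rouge&quot;&gt;helper_address&lt;/code&gt; are hypothetical names introduced for this illustration, and the signature is narrowed to two arguments. The sketch only builds the pointer and reads its address back. It never calls it, because address 6 only means something to the BPF loader.&lt;/p&gt;

```rust
use std::mem::transmute;

// Hypothetical alias for the helper signature, narrowed to two arguments.
type HelperFn = extern "C" fn(*const u8, u64) -> i32;

// Turn a helper number into a function pointer, then read the address back.
// The id must be non-zero (function pointers cannot be null), and the
// pointer must never be called in userland: it is not real code there.
fn helper_address(id: usize) -> usize {
    let helper: HelperFn = unsafe { transmute(id) };
    helper as usize
}

fn main() {
    // The pointer value is exactly the helper number we started from.
    println!("helper pointer value: {}", helper_address(6));
}
```

&lt;p&gt;When the compiler targets BPF, a call through such a pointer is what ends up as the &lt;code class=&quot;highlighter-rouge&quot;&gt;call 6&lt;/code&gt; instruction.&lt;/p&gt;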

&lt;p&gt;And now, the last part, actually printing something. To avoid the previous issue of a constant string appearing as a relocated symbol on which gobpf will throw an error, I defined it as a local constant, then called &lt;code class=&quot;highlighter-rouge&quot;&gt;BPF_FUNC_trace_printk&lt;/code&gt; on it. That should be harmless, right?&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;17&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;104u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;101&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;108&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;108&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;111&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;102&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;114&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;111&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;109&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;114&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;117&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;115&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span 
class=&quot;mi&quot;&gt;116&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;//b&quot;hello from rust\n\0&quot;&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;BPF_FUNC_trace_printk&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.as_ptr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;17&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
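&lt;p&gt;To double check what those 17 numbers spell, here is a small userland sketch. The &lt;code class=&quot;highlighter-rouge&quot;&gt;decode&lt;/code&gt; helper is a hypothetical name used only for this illustration. It turns the array back into text: the message in lowercase, followed by a newline and the terminating NUL byte.&lt;/p&gt;

```rust
// Hypothetical helper: turn the raw byte array back into readable text.
fn decode(msg: [u8; 17]) -> String {
    String::from_utf8(msg.to_vec()).expect("the bytes are plain ASCII")
}

fn main() {
    // Same values as the array passed to the printk helper.
    let msg: [u8; 17] = [
        104, 101, 108, 108, 111, 32, 102, 114, 111, 109, 32, 114, 117, 115, 116, 10, 0,
    ];
    // Debug formatting keeps the trailing newline and NUL visible as escapes.
    println!("{:?}", decode(msg));
}
```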

&lt;p&gt;So now, let’s take a look at the generated LLVM IR (the ll file):&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;; ModuleID = 'hello0-b4990b5a434d0f01306c6e79c17f427.rs'
source_filename = &quot;hello0-b4990b5a434d0f01306c6e79c17f427.rs&quot;
target datalayout = &quot;e-m:e-i64:64-f80:128-n8:16:32:64-S128&quot;
target triple = &quot;x86_64-unknown-linux-gnu&quot;
@_license = local_unnamed_addr constant [4 x i8] c&quot;GPL\00&quot;, section &quot;license&quot;, align 1
@_version = local_unnamed_addr constant i32 -2, section &quot;version&quot;, align 4

; Function Attrs: nounwind uwtable
define i32 @kprobe__sys_clone(i8* nocapture readnone %ctx) unnamed_addr #0 section &quot;kprobe/SyS_clone&quot; {
start:
  %msg = alloca [17 x i8], align 16
  %0 = getelementptr inbounds [17 x i8], [17 x i8]* %msg, i64 0, i64 0
  call void @llvm.lifetime.start(i64 17, i8* nonnull %0)
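  %1 = bitcast [17 x i8]* %msg to &amp;lt;16 x i8&amp;gt;*
  store &amp;lt;16 x i8&amp;gt; &amp;lt;i8 104, i8 101, i8 108, i8 108, i8 111, i8 32, i8 102, i8 114, i8 111, i8 109, i8 32, i8 114, i8 117, i8 115, i8 116, i8 10&amp;gt;, &amp;lt;16 x i8&amp;gt;* %1, align 16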
  %2 = getelementptr inbounds [17 x i8], [17 x i8]* %msg, i64 0, i64 16
  store i8 0, i8* %2, align 16
  %3 = call i32 (i8*, i64, ...) inttoptr (i64 6 to i32 (i8*, i64, ...)*)(i8* nonnull %0, i64 17) #2
  call void @llvm.lifetime.end(i64 17, i8* nonnull %0)
  ret i32 0
}

; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.start(i64, i8* nocapture) #1

; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.end(i64, i8* nocapture) #1

attributes #0 = { nounwind uwtable &quot;probe-stack&quot;=&quot;__rust_probestack&quot; }
attributes #1 = { argmemonly nounwind }
attributes #2 = { nounwind }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So, it generates the correct symbols and sections for &lt;code class=&quot;highlighter-rouge&quot;&gt;_license&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;_version&lt;/code&gt;. It apparently generates a big store instruction for the string we’ll print. And the function call looks like this &lt;code class=&quot;highlighter-rouge&quot;&gt;call i32 (i8*, i64, ...) inttoptr (i64 6 to i32 (i8*, i64, ...)*)(i8* nonnull %0, i64 17)&lt;/code&gt; where we cast 6 to a function pointer: &lt;code class=&quot;highlighter-rouge&quot;&gt;inttoptr (i64 6 to i32 (i8*, i64, ...)*)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So that should generate correct BPF bytecode, right? Let’s check that with the command &lt;code class=&quot;highlighter-rouge&quot;&gt;llvm-objdump-4.0 -S hello.o&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;hello.o:    file format ELF64-BPF

Disassembly of section kprobe/SyS_clone:
kprobe__sys_clone:
       0:   b7 01 00 00 0a 00 00 00     r1 = 10
       1:   73 1a ef ff 00 00 00 00     *(u8 *)(r10 - 17) = r1
       2:   b7 01 00 00 74 00 00 00     r1 = 116
       3:   73 1a ee ff 00 00 00 00     *(u8 *)(r10 - 18) = r1
       4:   b7 01 00 00 73 00 00 00     r1 = 115
       5:   73 1a ed ff 00 00 00 00     *(u8 *)(r10 - 19) = r1
       6:   b7 01 00 00 75 00 00 00     r1 = 117
       7:   73 1a ec ff 00 00 00 00     *(u8 *)(r10 - 20) = r1
       8:   b7 01 00 00 6d 00 00 00     r1 = 109
       9:   73 1a e9 ff 00 00 00 00     *(u8 *)(r10 - 23) = r1
      10:   b7 01 00 00 72 00 00 00     r1 = 114
      11:   73 1a eb ff 00 00 00 00     *(u8 *)(r10 - 21) = r1
      12:   73 1a e7 ff 00 00 00 00     *(u8 *)(r10 - 25) = r1
      13:   b7 01 00 00 66 00 00 00     r1 = 102
      14:   73 1a e6 ff 00 00 00 00     *(u8 *)(r10 - 26) = r1
      15:   b7 01 00 00 20 00 00 00     r1 = 32
      16:   73 1a ea ff 00 00 00 00     *(u8 *)(r10 - 22) = r1
      17:   73 1a e5 ff 00 00 00 00     *(u8 *)(r10 - 27) = r1
      18:   b7 01 00 00 6f 00 00 00     r1 = 111
      19:   73 1a e8 ff 00 00 00 00     *(u8 *)(r10 - 24) = r1
      20:   73 1a e4 ff 00 00 00 00     *(u8 *)(r10 - 28) = r1
      21:   b7 01 00 00 6c 00 00 00     r1 = 108
      22:   73 1a e3 ff 00 00 00 00     *(u8 *)(r10 - 29) = r1
      23:   73 1a e2 ff 00 00 00 00     *(u8 *)(r10 - 30) = r1
      24:   b7 01 00 00 65 00 00 00     r1 = 101
      25:   73 1a e1 ff 00 00 00 00     *(u8 *)(r10 - 31) = r1
      26:   b7 01 00 00 68 00 00 00     r1 = 104
      27:   73 1a e0 ff 00 00 00 00     *(u8 *)(r10 - 32) = r1
      28:   b7 01 00 00 00 00 00 00     r1 = 0
      29:   73 1a f0 ff 00 00 00 00     *(u8 *)(r10 - 16) = r1
      30:   bf a1 00 00 00 00 00 00     r1 = r10
      31:   07 01 00 00 e0 ff ff ff     r1 += -32
      32:   b7 02 00 00 11 00 00 00     r2 = 17
      33:   85 00 00 00 06 00 00 00     call 6
      34:   b7 00 00 00 00 00 00 00     r0 = 0
      35:   95 00 00 00 00 00 00 00     exit
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So, the lines 0 to 29 are there to load our string (instead of pointing to a constant somewhere). The expected &lt;code class=&quot;highlighter-rouge&quot;&gt;call 6&lt;/code&gt; instruction is on line 33!&lt;/p&gt;

&lt;p&gt;To load that program, we can now use &lt;a href=&quot;https://kinvolk.io/blog/2017/09/an-update-on-gobpf---elf-loading-uprobes-more-program-types/&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;the example Go program from the blog post&lt;/a&gt; (slightly modified to accept a program name as argument): &lt;code class=&quot;highlighter-rouge&quot;&gt;sudo ./bpf-load hello.o&lt;/code&gt;. The eBPF program will then hook the &lt;code class=&quot;highlighter-rouge&quot;&gt;sys_clone&lt;/code&gt; function and print a hello every time it is called. You can see the trace with the command &lt;code class=&quot;highlighter-rouge&quot;&gt;sudo cat /sys/kernel/debug/tracing/trace_pipe&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And now you can be a happy Rust developer like me because again, you put some Rust where you were not supposed to!&lt;/p&gt;
&lt;blockquote class=&quot;twitter-tweet&quot;&gt;&lt;p lang=&quot;en&quot; dir=&quot;ltr&quot;&gt;you: *sobbing* please stop...you can&amp;#39;t rewrite everything in Rust!&lt;br /&gt;me: *👉video games* Rust&lt;br /&gt;you: nooo😢&lt;br /&gt;me: *👉kernel* Rust&lt;br /&gt;you: stop that😭&lt;br /&gt;me: *👉frontend apps* Rust &amp;amp; wasm&lt;br /&gt;you: 😱&lt;/p&gt;&amp;mdash; Geoffroy Couprie (@gcouprie) &lt;a href=&quot;https://twitter.com/gcouprie/status/951526092279607297?ref_src=twsrc%5Etfw&quot;&gt;January 11, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async=&quot;&quot; src=&quot;https://platform.twitter.com/widgets.js&quot; charset=&quot;utf-8&quot;&gt;&lt;/script&gt;

&lt;p&gt;Now, this is a small hack. To make it more useful, here is what we would need:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;a small library to import BPF functions instead of transmuting the number every time&lt;/li&gt;
  &lt;li&gt;that small library should also have a nice way to interact with BPF maps to transmit data to userland&lt;/li&gt;
  &lt;li&gt;a userland library (in Rust, of course) that can set up maps and load eBPF programs. I should mention here that Julia Evans is currently working on a port of gobpf’s BCC part in Rust &lt;a href=&quot;https://github.com/jvns/ruby-mem-watcher-demo/blob/master/src/bin/bpf.rs&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;for a Ruby profiling tool&lt;/a&gt;! The ELF part might not be too far :)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s all, I’ll post more once I get more useful code working!
PS/ if you want to learn more about BPF, &lt;a href=&quot;https://qmonnet.github.io/whirl-offload/2016/09/01/dive-into-bpf/&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;read this great list&lt;/a&gt;!&lt;/p&gt;</content><author><name>{&quot;login&quot;=&gt;&quot;geal&quot;, &quot;email&quot;=&gt;&quot;geo.couprie@gmail.com&quot;, &quot;display_name&quot;=&gt;&quot;Géal&quot;, &quot;first_name&quot;=&gt;&quot;&quot;, &quot;last_name&quot;=&gt;&quot;&quot;}</name><email>geo.couprie@gmail.com</email></author><summary type="html">guess who has an eBPF tracer written in Rust? This guy👍 pic.twitter.com/3aE1giGWeK&amp;mdash; Geoffroy Couprie (@gcouprie) January 27, 2018</summary></entry><entry><title type="html">Rust 2018: maybe don’t be too stable</title><link href="http://unhandledexpression.com/rust/2018/01/10/rust-2018-maybe-dont-be-too-stable.html" rel="alternate" type="text/html" title="Rust 2018: maybe don't be too stable" /><published>2018-01-10T19:25:14+01:00</published><updated>2018-01-10T19:25:14+01:00</updated><id>http://unhandledexpression.com/rust/2018/01/10/rust-2018-maybe-dont-be-too-stable</id><content type="html" xml:base="http://unhandledexpression.com/rust/2018/01/10/rust-2018-maybe-dont-be-too-stable.html">&lt;p&gt;I initially did not want to write a post with what I want and foresee for &lt;a href=&quot;https://blog.rust-lang.org/2018/01/03/new-years-rust-a-call-for-community-blogposts.html&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;Rust in 2018&lt;/a&gt;, because I'm already very happy with it! I have spent more than 4 years tinkering with the language, experimenting, and I love the freedom I get when playing with low level stuff. In those 4 years, I discovered a wonderful, welcoming community and made some awesome friends. So, yes, I'm happy with Rust as it is :)&lt;/p&gt;
&lt;p&gt;But some of the recent #Rust2018 posts made me react a bit. I'm interested in learning what other people see in Rust, so I read almost all of them, and there's an easy trend to follow. Rust should be stabilized. Rust should be boring and safe. Crates should be stabilized. We should have definitive crates for some purposes like HTTP clients or async programming.&lt;br /&gt;
This is not surprising, since there's already been a lot of focus on stability in 2017, with the &lt;a href=&quot;https://blog.rust-lang.org/2017/09/18/impl-future-for-rust.html&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;impl period&lt;/a&gt;, the merge of the &lt;a href=&quot;https://github.com/rust-lang/rfcs/blob/master/text/2052-epochs.md&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;Rust epochs RFC&lt;/a&gt;, and the fact that more and more companies start relying on Rust.&lt;br /&gt;
We want Rust to be appealing to (big(ger)) companies, and to that end we need good compatibility between Rust versions, a high quality ecosystem of crates that work on stable Rust versions. We want newcomers to have a well prepared toolbox for their first projects.&lt;/p&gt;
&lt;p&gt;Before that stabilization goal appeared, Rust looked a bit chaotic, with new features coming every 6 weeks, new crates popping up here and there, people hacking something quickly and publishing it the next minute. And this is something I love about this language.&lt;br /&gt;
People try stuff, cargo lets them publish it easily, Rust makes sure it's running smoothly. Sure, there's a lot of redundant crates, most of them are far from the big &quot;1.0 stable&quot; target, but it's fine.&lt;br /&gt;
This language and its community are full of that unabashed optimism that makes newcomers go &quot;hey, should I really try to write my own kernel? OF COURSE I SHOULD&quot;. Should I try to make cool stuff with Web Assembly while it barely landed in nightly? YESSSSSS&lt;br /&gt;
I have seen over and over shitposting on twitter that ends up with people hacking on a cool new project. I have seen people publish a crate competing with another well known one, that will then send a PR for their idea to the bigger crate the next day.&lt;br /&gt;
I am overly enthusiastic about this, to the point that opening &lt;a href=&quot;http://reddit.com/r/rust/&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;/r/rust&lt;/a&gt;  often feels like Christmas: what new toys will we get today?&lt;/p&gt;
&lt;p&gt;So, to be clear, I am all for getting more stuff stable. We need a stable, asynchronous hyper. We need futures to work. We need impl trait and various other Rust features that will appear in the following months or years. What we do not need is the attitude that wants everything to crystallize.&lt;br /&gt;
How many times have I seen people criticising the &quot;yet another&quot; asynchronous IO/command line argument system/web framework/parser, with the usual arguments that this is lost focus, redundant, that why didn't they try to do that in $BIG_PROJECT. This is fine.&lt;br /&gt;
Go on, make other parser libraries to compete with nom, keep me on my toes. Try other approaches than tokio. Test different approaches to writing web applications.&lt;/p&gt;
&lt;p&gt;The underlying idea for me is that Rust is still incredibly young, extremely enthusiastic, and we still don't fully know how to write Rust. So, yes, we need some parts of Rust to stabilize, but we must balance that with its movement. What is stable and &quot;the way we do things&quot; now might not be the way to go in a year or so.&lt;/p&gt;
&lt;p&gt;Let people experiment and lose focus. Keep hacking on cool stuff.&lt;/p&gt;</content><author><name>{&quot;login&quot;=&gt;&quot;geaaal&quot;, &quot;email&quot;=&gt;&quot;geo.couprie@gmail.com&quot;, &quot;display_name&quot;=&gt;&quot;Géal&quot;, &quot;first_name&quot;=&gt;&quot;&quot;, &quot;last_name&quot;=&gt;&quot;&quot;}</name><email>geo.couprie@gmail.com</email></author><summary type="html">I initially did not want to write a post with what I want and foresee for Rust in 2018, because I'm already very happy with it! I have spent more than 4 years tinkering with the language, experimenting, and I love the freedom I get when playing with low level stuff. In those 4 years, I discovered a wonderful, welcoming community and made some awesome friends. So, yes, I'm happy with Rust as it is :) But some of the recent #Rust2018 posts made me react a bit. I'm interested in learning what other people see in Rust, so I read almost all of them, and there's an easy trend to follow. Rust should be stabilized. Rust should be boring and safe. Crates should be stabilized. We should have definitive crates for some purposes like HTTP clients or async programming. This is not surprising, since there's already been a lot of focus on stability in 2017, with the impl period, the merge of the Rust epochs RFC, and the fact that more and more companies start relying on Rust. We want Rust to be appealing to (big(ger)) companies, and to that end we need good compatibility between Rust versions, a high quality ecosystem of crates that work on stable Rust versions. We want newcomers to have a well prepared toolbox for their first projects. Before that stabilization goal appeared, Rust looked a bit chaotic, with new features coming every 6 weeks, new crates popping up here and there, people hacking something quickly and publishing it the next minute. And this is something I love about this language. 
People try stuff, cargo lets them publish it easily, Rust makes sure it's running smoothly. Sure, there's a lot of redundant crates, most of them are far from the big &quot;1.0 stable&quot; target, but it's fine. This language and its community are full of that unabashed optimism that makes newcomers go &quot;hey, should I really try to write my own kernel? OF COURSE I SHOULD&quot;. Should I try to make cool stuff with Web Assembly while it barely landed in nightly? YESSSSSS I have seen over and over shitposting on twitter that ends up with people hacking on a cool new project. I have seen people publish a crate competing with another well known one, that will then send a PR for their idea to the bigger crate the next day. I am overly enthusiastic about this, to the point that opening /r/rust often feels like Christmas: what new toys will we get today? So, to be clear, I am all for getting more stuff stable. We need a stable, asynchronous hyper. We need futures to work. We need impl trait and various other Rust features that will appear in the following months or years. What we do not need is the attitude that wants everything to crystallize. How many times have I seen people criticising the &quot;yet another&quot; asynchronous IO/command line argument system/web framework/parser, with the usual arguments that this is lost focus, redundant, that why didn't they try to do that in $BIG_PROJECT. This is fine. Go on, make other parser libraries to compete with nom, keep me on my toes. Try other approaches than tokio. Test different approaches to writing web applications. The underlying idea for me is that Rust is still incredibly young, extremely enthusiastic, and we still don't fully know how to write Rust. So, yes, we need some parts of Rust to stabilize, but we must balance that with its movement. What is stable and &quot;the way we do things&quot; now might not be the way to go in a year or so. Let people experiment and lose focus. 
Keep hacking on cool stuff.</summary></entry><entry><title type="html">Adventures in logging</title><link href="http://unhandledexpression.com/general/2017/08/23/adventures-in-logging.html" rel="alternate" type="text/html" title="Adventures in logging" /><published>2017-08-23T16:26:05+02:00</published><updated>2017-08-23T16:26:05+02:00</updated><id>http://unhandledexpression.com/general/2017/08/23/adventures-in-logging</id><content type="html" xml:base="http://unhandledexpression.com/general/2017/08/23/adventures-in-logging.html">&lt;p&gt;&lt;img src=&quot;/assets/truck_loaded_with_logs_on_logging_road_bc_1936.jpg&quot; alt=&quot;&quot; width=&quot;525&quot; height=&quot;279&quot; class=&quot;aligncenter size-full wp-image-1000&quot; /&gt;&lt;/p&gt;
&lt;p&gt;After working on the &lt;a href=&quot;https://github.com/sozu-proxy/sozu&quot;&gt;Sōzu HTTP reverse proxy&lt;/a&gt; for a while, I came up with &lt;a href=&quot;https://github.com/sozu-proxy/sozu/blob/57f99560a85b707c867de6206635b1f8f166e078/lib/src/logging.rs&quot;&gt;an interesting approach to logging&lt;/a&gt;. Why would I write my own logger when Rust already has solutions, mainly &lt;a href=&quot;https://crates.io/crates/log&quot;&gt;log&lt;/a&gt; and &lt;a href=&quot;https://crates.io/crates/slog&quot;&gt;slog&lt;/a&gt;? That logging library grew out of testing things with &lt;code&gt;log&lt;/code&gt;, and out of requirements changing along the way.&lt;/p&gt;
&lt;h1&gt;Beginning with log and env_logger&lt;/h1&gt;
&lt;p&gt;Like a lot of other Rust developers, I started out with &lt;code&gt;log&lt;/code&gt; and &lt;code&gt;env_logger&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
#[macro_use]&lt;br /&gt;
extern crate log;&lt;br /&gt;
extern crate env_logger;&lt;/p&gt;
&lt;p&gt;fn main() {&lt;br /&gt;
  env_logger::init().unwrap();&lt;/p&gt;
&lt;p&gt;  info!(&amp;quot;starting up&amp;quot;);&lt;br /&gt;
}&lt;br /&gt;
&lt;/p&gt;
&lt;p&gt;It's nice and easy: every library that depends on &lt;code&gt;log&lt;/code&gt; will use that same set of logging macros (error, info, warn, debug, trace) that will use whatever global logger was defined. Here we use &lt;code&gt;env_logger&lt;/code&gt; to define one.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;env_logger&lt;/code&gt; is useful because it can apply a filter to the log, from an&lt;br /&gt;
environment variable:&lt;/p&gt;
&lt;p&gt;[code lang=text]&lt;br /&gt;
# will show the logs above info level&lt;br /&gt;
RUST_LOG=info ./main&lt;/p&gt;
&lt;p&gt;# will show the logs above info level, but also logs above debug level&lt;br /&gt;
# for the dependency &amp;#039;dep1&amp;#039;&lt;br /&gt;
RUST_LOG=info,dep1=debug ./main&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;You can also define the filter &lt;a href=&quot;http://rust-lang-nursery.github.io/log/env_logger/#enabling-logging&quot;&gt;by module&lt;/a&gt; or &lt;a href=&quot;http://rust-lang-nursery.github.io/log/env_logger/#filtering-results&quot;&gt;apply a regular expression&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Custom formatter&lt;/h1&gt;
&lt;p&gt;&lt;code&gt;env_logger&lt;/code&gt; allows you to &lt;a href=&quot;https://docs.rs/env_logger/0.4.2/env_logger/struct.LogBuilder.html&quot;&gt;build your own log formatter&lt;/a&gt;. This feature is especially important for me, as I like to &lt;a href=&quot;https://www.clever-cloud.com/blog/engineering/2016/05/23/let-your-logs-help-you/&quot;&gt;add metadata to my logs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Defining a custom formatter with &lt;code&gt;env_logger&lt;/code&gt; is quite straightforward:&lt;/p&gt;
&lt;p&gt;[code lang=text]&lt;br /&gt;
let format = |record: &amp;amp;LogRecord| {&lt;br /&gt;
  format!(&amp;quot;{} - {}&amp;quot;, record.level(), record.args())&lt;br /&gt;
};&lt;/p&gt;
&lt;p&gt;let mut builder = LogBuilder::new();&lt;br /&gt;
builder.format(format).filter(None, LogLevelFilter::Info);&lt;/p&gt;
&lt;p&gt;if env::var(&amp;quot;RUST_LOG&amp;quot;).is_ok() {&lt;br /&gt;
  builder.parse(&amp;amp;env::var(&amp;quot;RUST_LOG&amp;quot;).unwrap());&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;builder.init().unwrap();&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;It is easily combined with the filtering and usage of the &lt;code&gt;RUST_LOG&lt;/code&gt; environment variable.&lt;/p&gt;
&lt;h1&gt;Where things get annoying: reducing allocations&lt;/h1&gt;
&lt;p&gt;If you take a look at &lt;code&gt;env_logger&lt;/code&gt;, you'll realize that it will allocate a &lt;code&gt;String&lt;/code&gt; for every log line that will be written, using a &lt;a href=&quot;https://docs.rs/env_logger/0.4.2/env_logger/struct.LogBuilder.html#method.format&quot;&gt;formatting closure&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let's get one thing out of the way first: I completely agree with the idea that you should not try to optimize stuff too much. But I'm working on a networking component that will handle a lot of traffic. I had debugging sessions where I generated tens of gigabytes of logs in a few seconds, and needed almost all of them, to debug async IO issues. In those cases, the time spent allocating and deallocating log lines becomes relevant.&lt;/p&gt;
&lt;p&gt;So, how would I get a custom log formatter that does not allocate much? As it turns out, when you tell &lt;code&gt;log&lt;/code&gt; to use your logger with &lt;a href=&quot;https://docs.rs/log/0.3.8/log/fn.set_logger.html&quot;&gt;&lt;code&gt;log::set_logger&lt;/code&gt;&lt;/a&gt;, it requires something that implements &lt;a href=&quot;https://docs.rs/log/0.3.8/log/trait.Log.html&quot;&gt;Log&lt;/a&gt;. The logger's &lt;code&gt;log&lt;/code&gt; method receives a &lt;a href=&quot;https://docs.rs/log/0.3.8/log/struct.LogRecord.html&quot;&gt;LogRecord&lt;/a&gt;, a structure that's created on the fly from &lt;a href=&quot;https://docs.rs/log/0.3.8/log/struct.LogLocation.html&quot;&gt;LogLocation&lt;/a&gt;, &lt;a href=&quot;https://docs.rs/log/0.3.8/log/struct.LogMetadata.html&quot;&gt;LogMetadata&lt;/a&gt; and &lt;a href=&quot;https://doc.rust-lang.org/nightly/core/fmt/struct.Arguments.html&quot;&gt;Arguments&lt;/a&gt;.&lt;br /&gt;
The first two are internal to &lt;code&gt;log&lt;/code&gt;, so I can't create them myself. The last one is interesting.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Arguments&lt;/code&gt; can be created from the &lt;a href=&quot;https://doc.rust-lang.org/nightly/std/macro.format_args.html&quot;&gt;format_args macro&lt;/a&gt;. That structure will roughly contain the format string split in the various substrings that appear between arguments. If you do &lt;code&gt;println!(&quot;hello {}!&quot;, name)&lt;/code&gt;, you would get a structure that contains &lt;code&gt;&quot;hello &quot;&lt;/code&gt;, the content of &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;&quot;!&quot;&lt;/code&gt;. &lt;code&gt;println!&lt;/code&gt; and other macros use this.&lt;/p&gt;
&lt;p&gt;You can then use that &lt;code&gt;Arguments&lt;/code&gt; with &lt;a href=&quot;https://doc.rust-lang.org/nightly/std/io/trait.Write.html#method.write_fmt&quot;&gt;&lt;code&gt;io::Write::write_fmt&lt;/code&gt;&lt;/a&gt; to write it directly to, say, a file or a socket. And it is implemented so that &lt;a href=&quot;https://doc.rust-lang.org/nightly/src/core/fmt/mod.rs.html#948-986&quot;&gt;the individual parts are written one after another instead of allocating one big string&lt;/a&gt;.&lt;/p&gt;
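&lt;p&gt;As a minimal sketch of that idea, the hypothetical &lt;code&gt;greet&lt;/code&gt; function below hands the &lt;code&gt;Arguments&lt;/code&gt; built by &lt;code&gt;format_args!&lt;/code&gt; straight to &lt;code&gt;write_fmt&lt;/code&gt;. A byte vector stands in for the file or socket, and the final conversion to a &lt;code&gt;String&lt;/code&gt; is only there so the result can be inspected.&lt;/p&gt;

```rust
use std::fmt::Display;
use std::io::Write;

// Hypothetical helper: format directly into a writer, with no intermediate
// String for the formatted line. The Vec plays the role of a file or socket.
fn greet(name: impl Display) -> String {
    let mut out = Vec::new();
    out.write_fmt(format_args!("hello {}!", name))
        .expect("writing to a Vec does not fail");
    // Only for inspection: a real backend would keep the raw bytes.
    String::from_utf8(out).expect("formatted output is valid UTF-8")
}

fn main() {
    println!("{}", greet("world"));
}
```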
&lt;p&gt;So, how do I use that?&lt;/p&gt;
&lt;p&gt;Well, it turns out that, basically, I can't. If I implement &lt;a href=&quot;https://docs.rs/log/0.3.8/log/trait.Log.html&quot;&gt;Log&lt;/a&gt;, I can get a &lt;a href=&quot;https://docs.rs/log/0.3.8/log/struct.LogRecord.html&quot;&gt;LogRecord&lt;/a&gt; which gives me a &lt;code&gt;&amp;amp;Arguments&lt;/code&gt;, while &lt;a href=&quot;https://doc.rust-lang.org/nightly/std/fmt/fn.write.html&quot;&gt;write&lt;/a&gt; requires an &lt;code&gt;Arguments&lt;/code&gt;. So now I have to clone it, which somewhat defeats the purpose.&lt;/p&gt;
&lt;h1&gt;So let's write our own then&lt;/h1&gt;
&lt;p&gt;There was another reason for the custom logging library: &lt;a href=&quot;https://github.com/rust-lang-nursery/log/commit/a16173429dab789407328b682c3db30d84c5a03c&quot;&gt;using a custom logging backend&lt;/a&gt;. Having the option between stdout and stderr is fine, but I might want to send them to a file or a socket.&lt;/p&gt;
&lt;p&gt;So I started writing a specific logging library for sozu. First, &lt;a href=&quot;https://github.com/sozu-proxy/sozu/blob/57f99560a85b707c867de6206635b1f8f166e078/lib/src/logging.rs#L95-L343&quot;&gt;copying the log filtering from env_logger&lt;/a&gt;. That part is mostly straightforward, but that's still a lot of code to copy around.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://github.com/sozu-proxy/sozu/blob/57f99560a85b707c867de6206635b1f8f166e078/lib/src/logging.rs#L387-L445&quot;&gt;logging macros&lt;/a&gt; each specify a logging level, then all call the same common macro.&lt;/p&gt;
&lt;p&gt;[code lang=text]&lt;br /&gt;
#[macro_export]&lt;br /&gt;
macro_rules! error {&lt;br /&gt;
    ($format:expr, $($arg:tt)*) =&amp;gt; {&lt;br /&gt;
        log!($crate::logging::LogLevel::Error, $format, &amp;quot;ERROR&amp;quot;, $($arg)*);&lt;br /&gt;
    };&lt;br /&gt;
    ($format:expr) =&amp;gt; {&lt;br /&gt;
        log!($crate::logging::LogLevel::Error, $format, &amp;quot;ERROR&amp;quot;);&lt;br /&gt;
    };&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://github.com/sozu-proxy/sozu/blob/57f99560a85b707c867de6206635b1f8f166e078/lib/src/logging.rs#L348-L365&quot;&gt;main logging macro&lt;/a&gt; has two interesting parts. First, we define some static metadata (that's coming from the log crate):&lt;/p&gt;
&lt;p&gt;[code lang=text]&lt;br /&gt;
static _META: $crate::logging::LogMetadata = $crate::logging::LogMetadata {&lt;br /&gt;
  level:  $lvl,&lt;br /&gt;
  target: module_path!(),&lt;br /&gt;
};&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;That object will not be allocated over and over, all the data in there will be defined at compile time.&lt;/p&gt;
&lt;p&gt;Then we call the logger itself (ignore the line with &lt;code&gt;try_lock&lt;/code&gt; for now):&lt;/p&gt;
&lt;p&gt;[code lang=text]&lt;br /&gt;
if let Ok(mut logger) = LOGGER.try_lock() {&lt;br /&gt;
  logger.log(&lt;br /&gt;
    &amp;amp;_META,&lt;br /&gt;
    format_args!(&lt;br /&gt;
      concat!(&amp;quot;{}\t{}\t{}\t{}\t{}\t&amp;quot;, $format, &amp;#039;\n&amp;#039;),&lt;br /&gt;
        ::time::now_utc().rfc3339(), ::time::precise_time_ns(), *$crate::logging::PID,&lt;br /&gt;
        $level_tag, *$crate::logging::TAG));&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;So we give this metadata structure to our logger, then we make an &lt;code&gt;Arguments&lt;/code&gt; structure with &lt;code&gt;format_args!&lt;/code&gt;. The &lt;a href=&quot;https://doc.rust-lang.org/std/macro.concat.html&quot;&gt;concat!&lt;/a&gt; macro is there to concatenate the formatting string with the custom prefix. That way, I could write &lt;code&gt;debug!(&quot;hello {}&quot;, name)&lt;/code&gt; and have the resulting format string be &lt;code&gt;&quot;{}\t{}\t{}\t{}\t{}\thello {}\n&quot;&lt;/code&gt;, generated at compile time and transformed through the &lt;code&gt;format_args&lt;/code&gt; call.&lt;/p&gt;
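&lt;p&gt;As a minimal sketch of that compile-time concatenation (the &lt;code&gt;prefixed!&lt;/code&gt; macro and its fixed &lt;code&gt;TAG&lt;/code&gt; prefix are hypothetical, not sozu's actual macro), something like this shows the prefix and the caller's format string being merged into one literal before any formatting happens:&lt;/p&gt;

```rust
// Hypothetical macro for illustration only: it prepends a fixed "TAG" column
// to the caller's format string, the same way the post's log! macro prepends
// its date/pid/level columns.
macro_rules! prefixed {
    ($format:expr, $($arg:tt)*) => {
        // concat! builds a single string literal at compile time.
        format!(concat!("{}\t", $format, "\n"), "TAG", $($arg)*)
    };
    ($format:expr) => {
        format!(concat!("{}\t", $format, "\n"), "TAG")
    };
}

fn main() {
    let name = "world";
    let line = prefixed!("hello {}", name);
    // The effective format string was "{}\thello {}\n".
    assert_eq!(line, "TAG\thello world\n");
    print!("{}", line);
}
```

&lt;p&gt;The sketch uses &lt;code&gt;format!&lt;/code&gt; only to make the result easy to inspect; the real macro hands the output of &lt;code&gt;format_args!&lt;/code&gt; to the logger instead, so no intermediate string is built.&lt;/p&gt;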
&lt;p&gt;I added the date in ISO format, along with a monotonic timestamp (which comes in handy when multiple workers might write logs concurrently), the process identifier, the log level and a process-wide logging tag (to better identify workers).&lt;/p&gt;
&lt;p&gt;So this starts looking good, right? Now how do we write this to configurable backends? Some backends already implement &lt;code&gt;io::Write&lt;/code&gt;, &lt;a href=&quot;https://github.com/sozu-proxy/sozu/blob/57f99560a85b707c867de6206635b1f8f166e078/lib/src/logging.rs#L49-L68&quot;&gt;others will need an intermediary buffer&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;[code lang=text]&lt;br /&gt;
pub fn log&amp;lt;&amp;#039;a&amp;gt;(&amp;amp;mut self, meta: &amp;amp;LogMetadata, args: Arguments) {&lt;br /&gt;
    if self.enabled(meta) {&lt;br /&gt;
        match self.backend {&lt;br /&gt;
            LoggerBackend::Stdout(ref mut stdout) =&amp;gt; {&lt;br /&gt;
                stdout.write_fmt(args);&lt;br /&gt;
            },&lt;br /&gt;
            LoggerBackend::Unix(ref mut socket) =&amp;gt; {&lt;br /&gt;
                socket.send(format(args).as_bytes());&lt;br /&gt;
            },&lt;br /&gt;
            LoggerBackend::Udp(ref mut socket, ref address) =&amp;gt; {&lt;br /&gt;
                socket.send_to(format(args).as_bytes(), address);&lt;br /&gt;
            }&lt;br /&gt;
            LoggerBackend::Tcp(ref mut socket) =&amp;gt; {&lt;br /&gt;
                socket.write_fmt(args);&lt;br /&gt;
            },&lt;br /&gt;
        }&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;For Unix sockets and UDP, instead of allocating on the fly, it should probably use a buffer (hey, does anyone want to implement that?). Stdout and a TcpStream can be written to directly. Adding buffers might still be a good idea here, depending on what you want, because that write could fail. Would you like a logger that will send a partial log if it can't write on the socket, or one using a buffer that can be filled up?&lt;/p&gt;
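&lt;p&gt;A rough sketch of what such a buffer could look like, assuming a plain &lt;code&gt;String&lt;/code&gt; that is cleared and refilled for each line (the helper name is made up, and the socket call is only hinted at in a comment):&lt;/p&gt;

```rust
use std::fmt::Write;

// Hypothetical helper: formats `count` log lines into one reused buffer and
// returns the buffer's final capacity, to show that no reallocation is needed
// as long as the lines fit in the initial capacity.
fn format_lines_reusing_buffer(count: usize) -> usize {
    let mut buffer = String::with_capacity(256);
    for i in 0..count {
        // clear() resets the length but keeps the allocation.
        buffer.clear();
        // write! calls write_fmt, which appends the parts one after another.
        write!(buffer, "{}\tworker\tline {}\n", i, i).unwrap();
        // A real logger would now pass buffer.as_bytes() to socket.send().
        assert!(buffer.ends_with('\n'));
    }
    buffer.capacity()
}

fn main() {
    assert!(format_lines_reusing_buffer(3) >= 256);
}
```

&lt;p&gt;With this approach a line is fully formatted before anything is sent, so a failed write never leaves a partial log line on the socket.&lt;/p&gt;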
&lt;p&gt;So, now, what's next? Originally, sozu worked as one process with multiple threads, but it evolved into a bunch of single-threaded processes. That raises an interesting question: how do you write logs concurrently?&lt;/p&gt;
&lt;h1&gt;Highly concurrent logs&lt;/h1&gt;
&lt;p&gt;It turns out that problem is not really easy. Most solutions end up in this list:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;every thread or process writes to stdout or a file at the same time&lt;/li&gt;
&lt;li&gt;synchronized access to the logging output&lt;/li&gt;
&lt;li&gt;one common logger everybody sends to&lt;/li&gt;
&lt;li&gt;every thread or process has its own connection to syslog (or even its own file to write to)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first solution is easy, but has a few issues. First, writing to stdout is slow, and it can quickly overwhelm your terminal (yes, I know you can redirect to a file). Second, it's not really synchronized, so you might end up with incoherently interleaved log lines.&lt;/p&gt;
&lt;p&gt;So we often move to the second solution, where access to the logging stream is protected by a mutex. Now you get multiple threads or processes that might spend their time waiting on each other for serializing and writing logs. Having all threads sharing one lock can quickly affect your performance. It's generally a better idea to have every thread or process running independently from the others (it's one of the principles in sozu's architecture, you can learn more about it &lt;a href=&quot;https://www.youtube.com/watch?v=Cl_fqWZTYUA&quot;&gt;in this French talk&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Alright then, moving on to the third solution: let's have one of the threads or processes handle the logging, and send the logs to it via cross-thread channels or IPC. That will surely be easier than having everybody share a lock, right? This is also interesting because you can offload serialization and filtering to the logging thread. Unfortunately, that means one thread will handle &lt;em&gt;all&lt;/em&gt; the logs, and it can be overwhelming. That also means a lot of data moving between workers (if using processes).&lt;/p&gt;
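&lt;p&gt;For the threaded variant, that third solution could be sketched with a standard library channel. This is only an illustration (the function name and the log line contents are invented here, and sozu does not work this way):&lt;/p&gt;

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: `workers` threads each send one line to a single
// logging thread; returns the number of lines that thread received.
fn run_with_logging_thread(workers: usize) -> usize {
    let (sender, receiver) = mpsc::channel();

    let mut handles = Vec::new();
    for id in 0..workers {
        // Each worker gets its own clone of the sending half.
        let sender = sender.clone();
        handles.push(thread::spawn(move || {
            sender.send(format!("worker {}\thello\n", id)).unwrap();
        }));
    }
    // Drop the original sender so the receiving loop ends once every worker is done.
    drop(sender);

    // The logging thread is the only one that touches the output.
    let logger = thread::spawn(move || {
        let mut received = 0;
        for line in receiver {
            print!("{}", line);
            received += 1;
        }
        received
    });

    for handle in handles {
        handle.join().unwrap();
    }
    logger.join().unwrap()
}

fn main() {
    assert_eq!(run_with_logging_thread(4), 4);
}
```

&lt;p&gt;Every line goes through that one receiving loop, which is exactly where the bottleneck described above shows up under heavy traffic.&lt;/p&gt;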
&lt;p&gt;The last solution relies on external services: use the syslog daemon on your system, or a log aggregator somewhere on another machine, that every worker will send the logs to. Let that service interleave logs, and maybe scale up if necessary. Since in a large architecture, you might have such an aggregator, talking to it directly might be a good idea (oh hey BTW I wrote &lt;a href=&quot;https://github.com/geal/rust-syslog&quot;&gt;a syslog crate&lt;/a&gt; if you need one).&lt;/p&gt;
&lt;p&gt;With sozu, I ended up with a mixed solution. You can send the logs to various backends. If you choose stdout, then all workers will write to it directly without synchronization, which will be mostly fine if you don't have much traffic. But if you want, each worker can open its own connection to the logging service.&lt;/p&gt;
&lt;p&gt;Now that concurrency is taken care of, there's a last issue that has annoyed me for months: how to use the logger from everywhere in the code, when it's obviously one big global mutable object?&lt;/p&gt;
&lt;h1&gt;The dreaded logging singleton&lt;/h1&gt;
&lt;p&gt;One thing I like about the macros from the log crate: you can use them anywhere in your code, and it will work. The other approach, used in languages like Haskell, or proposed by &lt;a href=&quot;https://crates.io/crates/slog&quot;&gt;slog&lt;/a&gt;, is to carry your logger around, in function arguments or structure members. I can understand the idea, but I don't like it much, because I'll often need to add a debug call anywhere in the code, and when it's deep in a series of five method calls, with those methods coming from traits implemented here and there, updating the types quickly gets annoying.&lt;/p&gt;
&lt;p&gt;So, even if the idea of that global mutable logger singleton usually looks like a bad pattern, it can still be useful. In log, the &lt;a href=&quot;https://github.com/rust-lang-nursery/log/blob/master/src/macros.rs#L41-L57&quot;&gt;log macro&lt;/a&gt; calls the &lt;code&gt;log::Log::log&lt;/code&gt; method, getting the logger instance from the &lt;a href=&quot;https://github.com/rust-lang-nursery/log/blob/master/src/lib.rs#L1217-L1226&quot;&gt;logger method&lt;/a&gt;. That method gets the logger instance from &lt;a href=&quot;https://github.com/rust-lang-nursery/log/blob/master/src/lib.rs#L292-L297&quot;&gt;a global pointer to the logger&lt;/a&gt;, with an atomic integer used to indicate if the logger was initialized:&lt;/p&gt;
&lt;p&gt;[code lang=text]&lt;br /&gt;
static mut LOGGER: *const Log = &amp;amp;NopLogger;&lt;br /&gt;
static STATE: AtomicUsize = ATOMIC_USIZE_INIT;&lt;/p&gt;
&lt;p&gt;const UNINITIALIZED: usize = 0;&lt;br /&gt;
const INITIALIZING: usize = 1;&lt;br /&gt;
const INITIALIZED: usize = 2;&lt;/p&gt;
&lt;p&gt;[...]&lt;/p&gt;
&lt;p&gt;pub fn logger() -&amp;gt; &amp;amp;&amp;#039;static Log {&lt;br /&gt;
    unsafe {&lt;br /&gt;
        if STATE.load(Ordering::SeqCst) != INITIALIZED {&lt;br /&gt;
            static NOP: NopLogger = NopLogger;&lt;br /&gt;
            &amp;amp;NOP&lt;br /&gt;
        } else {&lt;br /&gt;
            &amp;amp;*LOGGER&lt;br /&gt;
        }&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;So how can that global mutable pointer be used from any thread? That's because &lt;a href=&quot;https://github.com/rust-lang-nursery/log/blob/c4faf3dbb003d004699f378ee395c81ea03f9619/src/lib.rs#L921&quot;&gt;the Log trait requires Send and Sync&lt;/a&gt;. As a &lt;a href=&quot;https://doc.rust-lang.org/beta/nomicon/send-and-sync.html&quot;&gt;reminder&lt;/a&gt;, &lt;code&gt;Send&lt;/code&gt; means it can be sent to another thread, &lt;code&gt;Sync&lt;/code&gt; means it can be shared between threads.&lt;/p&gt;
&lt;p&gt;That's a cool trick, but usually we employ &lt;a href=&quot;https://github.com/sozu-proxy/sozu/blob/57f99560a85b707c867de6206635b1f8f166e078/lib/src/logging.rs#L13-L17&quot;&gt;another pattern&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;[code lang=text]&lt;br /&gt;
lazy_static! {&lt;br /&gt;
  pub static ref LOGGER: Mutex&amp;lt;Logger&amp;gt; = Mutex::new(Logger::new());&lt;br /&gt;
  pub static ref PID:    i32           = unsafe { libc::getpid() };&lt;br /&gt;
  pub static ref TAG:    String        = LOGGER.lock().unwrap().tag.clone();&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://crates.io/crates/lazy_static&quot;&gt;lazy_static&lt;/a&gt; allows you to define static variables that will be initialized at runtime, at first access (using the same pattern as log with &lt;a href=&quot;https://doc.rust-lang.org/std/sync/struct.Once.html&quot;&gt;std::sync::Once&lt;/a&gt;). Since our logger's &lt;a href=&quot;https://github.com/sozu-proxy/sozu/blob/57f99560a85b707c867de6206635b1f8f166e078/lib/src/logging.rs#L49&quot;&gt;log method&lt;/a&gt; needs an &lt;code&gt;&amp;amp;mut&lt;/code&gt;, it's wrapped in a Mutex. That's where the &lt;a href=&quot;https://github.com/sozu-proxy/sozu/blob/57f99560a85b707c867de6206635b1f8f166e078/lib/src/logging.rs#L356&quot;&gt;call to try_lock&lt;/a&gt; comes from.&lt;/p&gt;
&lt;p&gt;We might think the mutex is costly compared to log's solution, but remember that the logger instance has to be &lt;code&gt;Sync&lt;/code&gt;, so depending on your implementation, there might be some synchronization somewhere. Except that for sozu, it's not the case! Each worker is single-threaded, and has its own instance of the logger (each possibly with its own connection to the logging service). Can't we have a logging system that does not require that mutex everywhere?&lt;/p&gt;
&lt;h1&gt;Removing the last Mutex&lt;/h1&gt;
&lt;p&gt;This is a problem that annoyed me for months. It's not that I really mind the cost of that mutex (since no other thread ever touches it). It's just that I'd feel better not using one when I don't need it :)&lt;/p&gt;
&lt;p&gt;And the solution to that problem got quite interesting. To mess around a bit, here's a playground of the logging solution based on &lt;a href=&quot;http://play.integer32.com/?gist=4e9a8f30cdaae10b096e445f1e2ec8bd&quot;&gt;lazy_static&lt;/a&gt;. You'll see why the code in &lt;code&gt;Foo::modify&lt;/code&gt; is important.&lt;/p&gt;
&lt;p&gt;There's a feature you can use to have a global variable available anywhere in a thread: &lt;a href=&quot;https://doc.rust-lang.org/std/thread/struct.LocalKey.html&quot;&gt;thread_local&lt;/a&gt;. It uses thread local storage with an &lt;a href=&quot;https://doc.rust-lang.org/std/cell/struct.UnsafeCell.html&quot;&gt;UnsafeCell&lt;/a&gt; to initialize and provide a variable specific to each thread:&lt;/p&gt;
&lt;p&gt;[code lang=text]&lt;br /&gt;
thread_local!(static FOO: RefCell&amp;lt;u32&amp;gt; = RefCell::new(1));&lt;/p&gt;
&lt;p&gt;FOO.with(|f| {&lt;br /&gt;
    assert_eq!(*f.borrow(), 1);&lt;br /&gt;
    *f.borrow_mut() = 2;&lt;br /&gt;
});&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;So I tried to use this with my logger, but encountered an interesting bug, as you can see in &lt;a href=&quot;http://play.integer32.com/?gist=fa4995e189ea3394a6f1115b17ba8ee7&quot;&gt;another playground&lt;/a&gt;. I replaced my logging macro with this:&lt;/p&gt;
&lt;p&gt;[code lang=text]&lt;br /&gt;
#[macro_export]&lt;br /&gt;
macro_rules! log {&lt;br /&gt;
    ($format:expr, $($arg:tt)+) =&amp;gt; ({&lt;br /&gt;
        {&lt;br /&gt;
            LOGGER.with(|l| {&lt;br /&gt;
                l.borrow_mut().log(&lt;br /&gt;
                    format_args!(&lt;br /&gt;
                        concat!(&amp;quot;\t{}\t&amp;quot;, $format, &amp;#039;\n&amp;#039;),&lt;br /&gt;
                        &amp;quot;a&amp;quot;, $($arg)+));&lt;br /&gt;
            });&lt;br /&gt;
        }&lt;br /&gt;
    });&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;And got this error:&lt;/p&gt;
&lt;p&gt;[code lang=text]&lt;br /&gt;
error[E0502]: cannot borrow `self` as immutable because `self.opt.0` is also borrowed as mutable&lt;br /&gt;
  --&amp;gt; src/main.rs:36:21&lt;br /&gt;
   |&lt;br /&gt;
36 |         LOGGER.with(|l| {&lt;br /&gt;
   |                     ^^^ immutable borrow occurs here&lt;br /&gt;
...&lt;br /&gt;
54 |         if let Some(ref mut s) = self.opt {&lt;br /&gt;
   |                     --------- mutable borrow occurs here&lt;br /&gt;
55 |             log!(&amp;quot;changing {} to {}&amp;quot;, self.bar, new);&lt;br /&gt;
   |             -----------------------------------------&lt;br /&gt;
   |             |                         |&lt;br /&gt;
   |             |                         borrow occurs due to use of `self` in closure&lt;br /&gt;
   |             in this macro invocation&lt;br /&gt;
56 |         }&lt;br /&gt;
   |         - mutable borrow ends here&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;When implementing this inside sozu's source, I got about 330 errors like this one... So what happened? That &lt;a href=&quot;https://doc.rust-lang.org/std/thread/struct.LocalKey.html#method.with&quot;&gt;with method&lt;/a&gt; requires a closure. Since we use a macro, if we use &lt;code&gt;self.bar&lt;/code&gt; as argument, it will appear inside the closure. That becomes an issue with anything that has been already mutably borrowed somewhere.&lt;/p&gt;
&lt;p&gt;I tried a few things, like calling &lt;code&gt;format_args&lt;/code&gt; outside the closure, but I got the error &lt;code&gt;error[E0597]: borrowed value does not live long enough&lt;/code&gt;. This is apparently &lt;a href=&quot;https://github.com/rust-lang/rust/issues/42253&quot;&gt;a common problem with format_args&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;But the real solution came from &lt;a href=&quot;https://twitter.com/tomaka17&quot;&gt;tomaka&lt;/a&gt;, with some macro trickery, as seen in &lt;a href=&quot;http://play.integer32.com/?gist=ea040e83ea680678c75113a73ad9ef82&quot;&gt;one last playground&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;[code lang=text]&lt;br /&gt;
#[macro_export]&lt;br /&gt;
macro_rules! log {&lt;br /&gt;
    ($format:expr $(, $args:expr)*) =&amp;gt; ({&lt;br /&gt;
        log!(__inner__ $format, [], [a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v]&lt;br /&gt;
             $(, $args)*)&lt;br /&gt;
    });&lt;/p&gt;
&lt;p&gt;    (__inner__ $format:expr, [$($transformed_args:ident),*], [$first_ident:ident $(, $other_idents:ident)*], $first_arg:expr $(, $other_args:expr)*) =&amp;gt; ({&lt;br /&gt;
        let $first_ident = &amp;amp;$first_arg;&lt;br /&gt;
        log!(__inner__ $format, [$($transformed_args,)* $first_ident], [$($other_idents),*] $(, $other_args)*);&lt;br /&gt;
    });&lt;/p&gt;
&lt;p&gt;    (__inner__ $format:expr, [$($final_args:ident),*], [$($idents:ident),*]) =&amp;gt; ({&lt;br /&gt;
        LOGGER.with(move |l| {&lt;br /&gt;
          //let mut logger = *l.borrow_mut();&lt;br /&gt;
          l.borrow_mut().log(&lt;br /&gt;
            format_args!(&lt;br /&gt;
              concat!(&amp;quot;\t{}\t&amp;quot;, $format, &amp;#039;\n&amp;#039;),&lt;br /&gt;
              &amp;quot;a&amp;quot; $(, $final_args)*));&lt;br /&gt;
        });&lt;br /&gt;
    });&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;The basic idea is that we could avoid the borrowing issue by doing an additional borrow. But since some of the arguments might be expressions (like &lt;code&gt;1+1&lt;/code&gt; or &lt;code&gt;self.size&lt;/code&gt;), we will store the reference to each of them in a local variable, with &lt;code&gt;let $first_ident = &amp;amp;$first_arg;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We cannot create variable or function names in macros out of thin air (sadly, because that would be extremely useful), so we instead do recursive macro calls, consuming arguments one after another.&lt;br /&gt;
In &lt;code&gt;[$($transformed_args:ident),*], [$first_ident:ident $(, $other_idents:ident)*], $first_arg:expr $(, $other_args:expr)*&lt;/code&gt;,&lt;br /&gt;
&lt;code&gt;transformed_args&lt;/code&gt; accumulates the idents (variable names) in which we stored the data.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;[$first_ident:ident $(, $other_idents:ident)*]&lt;/code&gt; matches on the list that started as &lt;code&gt;[a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v]&lt;/code&gt;, to get the first one in that list (storing it in &lt;code&gt;$first_ident&lt;/code&gt;), and uses it as a variable name. As you might have guessed, that list holds 22 names, so I won't be able to use a log line with more than 22 arguments. That's a limitation I can live with.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;$first_arg:expr $(, $other_args:expr)*&lt;/code&gt; part matches on the log call's arguments, and gets the first in the list as &lt;code&gt;$first_arg&lt;/code&gt;. We then use those in the line &lt;code&gt;let $first_ident = &amp;amp;$first_arg;&lt;/code&gt; and recursively call &lt;code&gt;log!&lt;/code&gt;, adding the variable name &lt;code&gt;$first_ident&lt;/code&gt; to the list of transformed arguments, along with the rest of the variable name list and the remaining log call arguments:&lt;/p&gt;
&lt;p&gt;[code lang=text]&lt;br /&gt;
log!(__inner__ $format, [$($transformed_args,)* $first_ident], [$($other_idents),*] $(, $other_args)*);&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;Once all of the logger's arguments are consumed, we can call &lt;code&gt;format_args&lt;/code&gt; on the list of &lt;code&gt;$transformed_args&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;[code lang=text]&lt;br /&gt;
(__inner__ $format:expr, [$($transformed_args:ident),*], [$($idents:ident),*]) =&amp;gt; ({&lt;br /&gt;
    LOGGER.with(move |l| {&lt;br /&gt;
        //let mut logger = *l.borrow_mut();&lt;br /&gt;
        l.borrow_mut().log(&lt;br /&gt;
            format_args!(&lt;br /&gt;
                concat!(&amp;quot;\t{}\t&amp;quot;, $format, &amp;#039;\n&amp;#039;),&lt;br /&gt;
                &amp;quot;a&amp;quot; $(, $transformed_args)*));&lt;br /&gt;
    });&lt;br /&gt;
});&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;and it works!&lt;/p&gt;
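&lt;p&gt;To play with the recursion on its own, here is a stripped-down sketch. The &lt;code&gt;log_line!&lt;/code&gt; macro is hypothetical: it has no logger or closure, only four spare identifiers, it returns a &lt;code&gt;String&lt;/code&gt;, and it binds each argument by value rather than by reference, so it only demonstrates how arguments are consumed one by one and paired with identifiers. The internal rules come first so that the entry rule never tries to parse &lt;code&gt;__inner__&lt;/code&gt; as an expression.&lt;/p&gt;

```rust
// Hypothetical, simplified variant of the recursive macro described above.
macro_rules! log_line {
    // Recursive step: bind the first remaining argument to the first unused
    // identifier, then recurse with the shortened lists.
    (__inner__ $format:expr, [$($done:ident),*], [$first_ident:ident $(, $other_idents:ident)*], $first_arg:expr $(, $other_args:expr)*) => ({
        let $first_ident = $first_arg;
        log_line!(__inner__ $format, [$($done,)* $first_ident], [$($other_idents),*] $(, $other_args)*)
    });
    // Final step: every argument now lives in a local variable.
    (__inner__ $format:expr, [$($done:ident),*], [$($idents:ident),*]) => ({
        format!(concat!("\t{}\t", $format, "\n"), "a" $(, $done)*)
    });
    // Entry point: start with no bound identifiers and four available names.
    ($format:expr $(, $args:expr)*) => ({
        log_line!(__inner__ $format, [], [a, b, c, d] $(, $args)*)
    });
}

fn main() {
    let line = log_line!("changing {} to {}", 1, 1 + 1);
    assert_eq!(line, "\ta\tchanging 1 to 2\n");
    print!("{}", line);
}
```

&lt;p&gt;Passing more arguments than there are spare identifiers fails to compile, which is the same limitation as in the full macro.&lt;/p&gt;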
&lt;p&gt;So, that last part may not be completely relevant to your logger implementation, but I thought&lt;br /&gt;
it was quite cool :)&lt;/p&gt;
&lt;p&gt;Despite my issues with the log crate, it's quite useful and supported by a lot of libraries. It is currently getting a lot better as part of the &lt;a href=&quot;https://internals.rust-lang.org/t/crate-evaluation-for-2017-05-16-log/5185&quot;&gt;libz blitz&lt;/a&gt;. I'd also encourage you to check out &lt;a href=&quot;https://crates.io/crates/slog&quot;&gt;slog&lt;/a&gt;. I haven't felt the need to integrate it in sozu yet, but it can be interesting for new projects, as it comes with an ecosystem of composable libraries to extend it.&lt;/p&gt;</content><author><name>{&quot;login&quot;=&gt;&quot;geaaal&quot;, &quot;email&quot;=&gt;&quot;geo.couprie@gmail.com&quot;, &quot;display_name&quot;=&gt;&quot;Géal&quot;, &quot;first_name&quot;=&gt;&quot;&quot;, &quot;last_name&quot;=&gt;&quot;&quot;}</name><email>geo.couprie@gmail.com</email></author><category term="Rust" /><summary type="html">After working on the Sōzu HTTP reverse proxy for a while, I came up with an interesting approach to logging. Now why would I come up with my own logger, when there are existing solutions in Rust? Mainly, log and slog. That logging library grew up from testing things out with log, and changing requirements along the way. Beginning with log and env_logger Like a lot of other Rust developers, I started out with log and env_logger: #[macro_use] extern crate log; extern crate env_logger; fn main() { env_logger::init().unwrap(); info!(&amp;quot;starting up&amp;quot;); } It's nice and easy: every library that depends on log will use that same set of logging macros (error, info, warn, debug, trace) that will use whatever global logger was defined. Here we use env_logger to define one. 
env_logger is useful because it can apply a filter to the log, from an environment variable: [code lang=text] # will show the logs above info level RUST_LOG=info ./main # will show the logs above info level, but also logs above debug level # for the dependency &amp;#039;dep1&amp;#039; RUST_LOG=info,dep1=debug ./main [/code] You can also define the filter by module or apply a regular expression. Custom formatter env_logger allows you to build your own log formatter. This feature is especially important for me, as I like to add metadata to my logs. Defining a custom formatter with env_logger is quite straightforward: [code lang=text] let format = |record: &amp;amp;LogRecord| { format!(&amp;quot;{} - {}&amp;quot;, record.level(), record.args()) }; let mut builder = LogBuilder::new(); builder.format(format).filter(None, LogLevelFilter::Info); if env::var(&amp;quot;RUST_LOG&amp;quot;).is_ok() { builder.parse(&amp;amp;env::var(&amp;quot;RUST_LOG&amp;quot;).unwrap()); } builder.init().unwrap(); [/code] It is easily combined with the filtering and usage of the RUST_LOG environment variable. Where things get annoying: reducing allocations If you take a look at env_logger, you'll realize that it will allocate a String for every log line that will be written, using a formatting closure. Let's get one thing out of the way first: I completely agree with the idea you should not try to optimize stuff too much. But I'm in the case of a networking component that will handle a lot of traffic. I had debugging sessions where I generated tens of gigabytes of logs in a few seconds, and needed almost all of them, to debug async IO issues. In those cases, the time spent allocating and deallocating log lines becomes relevant. So, how would I get a custom log formatter that does not allocate much? As it turns out, when you tell log to use your logger with log::set_logger, it requires something that implements Log. 
The logger's log method receives a LogRecord, a structure that's created on the fly from LogLocation, LogMetadata and Arguments. The first two are internal to log, I can't create them myself. The last one is interesting. Arguments can be created from the format_args macro. That structure will roughly contain the format string split in the various substrings that appear between arguments. if you do println!(&quot;hello {}!&quot;, name), you would get a structure that contains &quot;hello &quot;, the content of name and &quot;!&quot;. println! and other macros use this. You can then use that Arguments with io::Write::write_fmt to write it directly to, say, a file or a socket. And it is implemented so that the individual parts are written one after another instead of allocating one big string. So, how do I use that? Well, it turns out that, basically, I can't. If I implement Log, I can get a Logrecord which gives me a &amp;amp;amp;Arguments, while write requires a Arguments. So now I have to clone it, which defeats a bit the purpose. So let's write our own then There was another reason for the custom logging library: using a custom logging backend. Having the option between stdout and stderr is fine, but I might want to send them to a file or a socket. So I started writing a specific logging library for sozu. First, copying the log filtering from env_logger. That part is mostly straightforward, but that's still a lot of code to copy around. The logging macros specify a logging level then they all call the same common macro. [code lang=text] #[macro_export] macro_rules! error { ($format:expr, $($arg:tt)*) =&amp;gt; { log!($crate::logging::LogLevel::Error, $format, &amp;quot;ERROR&amp;quot;, $($arg)*); }; ($format:expr) =&amp;gt; { log!($crate::logging::LogLevel::Error, $format, &amp;quot;ERROR&amp;quot;); }; } [/code] The main logging macro has two interesting parts. 
First, we define some static metadata (that's coming from the log crate): [code lang=text] static _META: $crate::logging::LogMetadata = $crate::logging::LogMetadata { level: $lvl, target: module_path!(), }; [/code] That object will not be allocated over and over, all the data in there will be defined at compile time. Then we call the logger itself (ignore the line with try_lock for now): [code lang=text] if let Ok(mut logger) = LOGGER.try_lock() { logger.log( &amp;amp;_META, format_args!( concat!(&amp;quot;{}\t{}\t{}\t{}\t{}\t&amp;quot;, $format, &amp;#039;\n&amp;#039;), ::time::now_utc().rfc3339(), ::time::precise_time_ns(), *$crate::logging::PID, $level_tag, *$crate::logging::TAG)); } [/code] So we give this metadata structure to our logger, then we make an Arguments structure with format_args!. The concat! macro is there to concatenate the formatting string with the custom prefix. That way, I could write debug!(&quot;hello {}&quot;, name) and have the resulting format string be &quot;{}\t{}\t{}\t{}\t{}\thello {}\n&quot;, generated at compile time and transformed through the format_args call. I added the date in ISO format, along with a monotonic timestamp (that becomes handy when multiple workers might write logs concurrently), the process identifier, the log level and a process wide logging tag (to better identify workers). So this starts looking good, right? Now how do we write this to configurable backends? 
Some backends already implement io::Write, others will need an intermediary buffer: [code lang=text] pub fn log&amp;lt;&amp;#039;a&amp;gt;(&amp;amp;mut self, meta: &amp;amp;LogMetadata, args: Arguments) { if self.enabled(meta) { match self.backend { LoggerBackend::Stdout(ref mut stdout) =&amp;gt; { stdout.write_fmt(args); }, LoggerBackend::Unix(ref mut socket) =&amp;gt; { socket.send(format(args).as_bytes()); }, LoggerBackend::Udp(ref mut socket, ref address) =&amp;gt; { socket.send_to(format(args).as_bytes(), address); } LoggerBackend::Tcp(ref mut socket) =&amp;gt; { socket.write_fmt(args); }, } } } [/code] For Unix sockets and UDP, instead of allocating on the fly, it should probably use a buffer (hey, anyone wants to implement that?). Stdout and a TcpStream can be written to directly. Adding buffers might still be a good idea here, depending on what you want, because that write could fail. Would you like a logger that will send a partial log if it can't write on the socket, or one using a buffer that can be filled up? So, now, what's next? Originally, sozu worked as one process with multiple threads, but evolved as a bunch of single threaded processes. But that raises an interesting question. How do you write logs concurrently? Highly concurrent logs It turns out that problem is not really easy. Most solutions end up in this list: every thread or process writes to stdout or a file at the same time synchronized access to the logging output one common logger everybody sends to every thread or process has its own connection to syslog (or even its own file to write to) The first solution is easy, but has a few issues. First, writing to stdout is slow, and it can quickly overwhelm your terminal (yes, I know you can redirect to a file). Second, it's not really synchronized, so you might end up with incoherently interleaved log lines. So we often move to the second solution, where access to the logging stream is protected by a mutex. 
Now you get multiple threads or processes that might spend their time waiting on each other for serializing and writing logs. Having all threads sharing one lock can quickly affect your performance. It's generally a better idea to have every thread or process running independently from the others (it's one of the principles in sozu's architecture, you can learn more about it in this french talk). Alright then, moving on to the third solution: let's have one of the threads or processes handle the logging, and send the logs to it via cross thread channels or IPC. That will surely be easier than having everybody share a lock, right? This is also intersting because you can offload serialization and filtering to the logging thread. Unfortunately, that means one thread will handle all the logs, and it can be overwhelming. That also means a lot of data moving between workers (if using processes). The last solution relies on external services: use the syslog daemon on your system, or a log aggregator somewhere on another machine, that every worker will send the logs to. Let that service interleave logs, and maybe scale up if necessary. Since in a large architecture, you might have such an aggregator, talking to it directly might be a good idea (oh hey BTW I wrote a syslog crate if you need). With sozu, I ended up with a mixed solution. You can send the logs to various backends. If you choose stdout, then all workers will write to it directly without synchronization which will be mostly fine if you don't have a large traffic. But if you want, each worker can open its own connection to the logging service. Now that concurrency is taken care of, there's a last issue that has annoyed me for months: how to use the logger from everywhere in the code, when it's obviously one big global mutable object? the dreaded logging singleton One thing I like about the macros from the log crate: you can use them anywhere in your code, and it will work. 
The other approach, used in languages like Haskell, or proposed by slog, is to carry your logger around, in function arguments or structure members. I can understand the idea, but I don't like it much, because I'll often need to add a debug call anywhere in the code, and when it's deep in a serie of five method calls, with those methods coming from traits implemented here and there, updating the types quickly gets annoying. So, even if the idea of that global mutable logger singleton usually looks like a bad pattern, it can still be useful. In log, the log macro calls the log::Log::log method, getting the logger instance from the logger method. That method gets the logger instance from a global pointer to the logger, with an atomic integer used to indicate if the logger was initialized: [code lang=text] static mut LOGGER: *const Log = &amp;amp;NopLogger; static STATE: AtomicUsize = ATOMIC_USIZE_INIT; const UNINITIALIZED: usize = 0; const INITIALIZING: usize = 1; const INITIALIZED: usize = 2; [...] pub fn logger() -&amp;gt; &amp;amp;&amp;#039;static Log { unsafe { if STATE.load(Ordering::SeqCst) != INITIALIZED { static NOP: NopLogger = NopLogger; &amp;amp;NOP } else { &amp;amp;*LOGGER } } } [/code] So how can that global mutable pointer be used from any thread? That's because the Log trait requires Send and Sync. As a reminder, Send means it can be sent to another thread, Sync means it can be shared between threads. That's a cool trick, but usually we employ another pattern: [code lang=text] lazy_static! { pub static ref LOGGER: Mutex&amp;lt;Logger&amp;gt; = Mutex::new(Logger::new()); pub static ref PID: i32 = unsafe { libc::getpid() }; pub static ref TAG: String = LOGGER.lock().unwrap().tag.clone(); } [/code] lazy_static allows you to define static variables that will be initialized at runtime, at first access (using the same pattern as log with std::sync::Once. Since our logger's log method needs an &amp;amp;amp;mut, it's wrapped in a Mutex. 
That's where the call to try_lock comes from. We might think the mutex is costly compared to log's solution, but remember that the logger instance has to be Sync, so depending on your implementation, there might be some synchronization somewhere. Except that for sozu, it's not the case! Each worker is single threaded, and has its own instance of the logger (possibly with each of them a connection to the logging service). Can't we have a logging system that does not require that mutex used everywhere? Removing the last Mutex This is a problem that annoyed me for months. It's not that I really mind the cost of that mutex (since no other thread ever touches it). It's just that I'd feel better not using one when I don't need it :) And the solution to that problem got quite interesting. To mess around a bit, here's a playground of the logging solution based on lazy_static. You'll see why the code in Foo::modify is important. There's a feature you can use to have a global variable available anywhere in a thread: thread_local. It uses thread local storage with an UnsafeCell to initialize and provide a variable specific to each thread: [code lang=text] thread_local!(static FOO: RefCell&amp;lt;u32&amp;gt; = RefCell::new(1)); FOO.with(|f| { assert_eq!(*f.borrow(), 1); *f.borrow_mut() = 2; }); [/code] So I tried to use this with my logger, but encountered an interesting bug, as you can see in another playground. I replaced my logging macro with this: [code lang=text] #[macro_export] macro_rules! log { ($format:expr, $($arg:tt)+) =&amp;gt; ({ { LOGGER.with(|l| { l.borrow_mut().log( format_args!( concat!(&amp;quot;\t{}\t&amp;quot;, $format, &amp;#039;\n&amp;#039;), &amp;quot;a&amp;quot;, $($arg)+)); }); } }); } [/code] And got this error: [code lang=text] error[E0502]: cannot borrow `self` as immutable because `self.opt.0` is also borrowed as mutable --&amp;gt; src/main.rs:36:21 | 36 | LOGGER.with(|l| { | ^^^ immutable borrow occurs here ... 
54 | if let Some(ref mut s) = self.opt { | --------- mutable borrow occurs here 55 | log!(&amp;quot;changing {} to {}&amp;quot;, self.bar, new); | ----------------------------------------- | | | | | borrow occurs due to use of `self` in closure | in this macro invocation 56 | } | - mutable borrow ends here [/code] When implementing this inside sozu's source, I got about 330 errors like this one... So what happened? That with method requires a closure. Since we use a macro, if we use self.bar as argument, it will appear inside the closure. That becomes an issue with anything that has been already mutably borrowed somewhere. I tried a few things, like calling format_args outside the closure, but I get the error error[E0597]: borrowed value does not live long enough. This is apparently a common problem with format_args. But the real solution came from tomaka, with some macro trickery, as seen in one last playground: [code lang=text] #[macro_export] macro_rules! log { ($format:expr $(, $args:expr)*) =&amp;gt; ({ log!(__inner__ $format, [], [a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v] $(, $args)*) }); (__inner__ $format:expr, [$($transformed_args:ident),*], [$first_ident:ident $(, $other_idents:ident)*], $first_arg:expr $(, $other_args:expr)*) =&amp;gt; ({ let $first_ident = &amp;amp;$first_arg; log!(__inner__ $format, [$($transformed_args,)* $first_ident], [$($other_idents),*] $(, $other_args)*); }); (__inner__ $format:expr, [$($final_args:ident),*], [$($idents:ident),*]) =&amp;gt; ({ LOGGER.with(move |l| { //let mut logger = *l.borrow_mut(); l.borrow_mut().log( format_args!( concat!(&amp;quot;\t{}\t&amp;quot;, $format, &amp;#039;\n&amp;#039;), &amp;quot;a&amp;quot; $(, $final_args)*)); }); }); } [/code] The basic idea is that we could avoid the borrowing issue by doing an additional borrow. 
But since some of the arguments might by expressions (like 1+1 or self.size), we will store the reference to it in a local variable, with let $first_ident = &amp;amp;amp;$first_arg;. We cannot create variable or function names in macros out of thin air (sadly, because that would be extremely useful), so we instead do recursive macros calls, consuming arguments one after another. In [$($transformed_args:ident),*], [$first_ident:ident $(, $other_idents:ident)*], $first_arg:expr $(, $other_args:expr)*, transformed_args accumulates the idents (variable names) in which we stored the data. [$first_ident:ident $(, $other_idents:ident)*] is matching on the list that started as [a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v], to get the first one in that list (storing it in $first_ident), and using it as variable names. As you might have guessed, that means I won't be able to use a log line with more than 21 arguments. That's a limitation I can live with. The $first_arg:expr $(, $other_args:expr)* part matches on the log call's arguments, and gets the first in the list as $first_arg. We then use those in the line let $first_ident = &amp;amp;amp;$first_arg; and recursively call log!, adding the variable name $first_ident to the list of transformed arguments, and the rest of the variable names list and the log call's arguments: [code lang=text] log!(__inner__ $format, [$($transformed_args,)* $first_ident], [$($other_idents),*] $(, $other_args)*); [/code] Once all of the logger's arguments are consumed, we can call format_args on the list of $transformed_args: [code lang=text] (__inner__ $format:expr, [$($transformed_args:ident),*], [$($idents:ident),*]) =&amp;gt; ({ LOGGER.with(move |l| { //let mut logger = *l.borrow_mut(); l.borrow_mut().log( format_args!( concat!(&amp;quot;\t{}\t&amp;quot;, $format, &amp;#039;\n&amp;#039;), &amp;quot;a&amp;quot; $(, $transformed_args)*)); }); }); }); [/code] and it works! 
So, that last part may not be completely relevant to your logger implementation, but I thought it was quite cool :) Despite my issues with the log crate, it's quite useful and supported by a lot of libraries. It is currently getting a lot better as part of the libz blitz. I'd also encourage you to check out slog. I haven't felt the need to integrate it in sozu yet, but it can be interesting for new projects, as it comes with an ecosystem of composable libraries to extend it.</summary></entry><entry><title type="html">How to rewrite your project in Rust</title><link href="http://unhandledexpression.com/general/rust/2017/07/12/how-to-rewrite-you-project-in-rust.html" rel="alternate" type="text/html" title="How to rewrite your project in Rust" /><published>2017-07-12T17:15:01+02:00</published><updated>2017-07-12T17:15:01+02:00</updated><id>http://unhandledexpression.com/general/rust/2017/07/12/how-to-rewrite-you-project-in-rust</id><content type="html" xml:base="http://unhandledexpression.com/general/rust/2017/07/12/how-to-rewrite-you-project-in-rust.html">&lt;p&gt;In a &lt;a href=&quot;https://unhandledexpression.com/2017/07/10/why-you-should-actually-rewrite-it-in-rust/&quot;&gt;previous post&lt;/a&gt;, I explained why rewriting existing software in Rust could be a good idea. The main point being that you should not rewrite the whole application, but replace the weaker parts without disturbing most of the code, to strengthen the codebase without disruption.&lt;/p&gt;
&lt;p&gt;I also provided pointers to projects where other people and I did it successfully, but without giving too many details. So let's get a real introduction to Rust rewrites now. This article requires a little bit of knowledge about Rust, but you should be able to follow it even as a beginner.&lt;/p&gt;
&lt;p&gt;As a reminder, here are the benefits Rust brings to a rewrite:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;it can easily call C code&lt;/li&gt;
&lt;li&gt;it can easily be called by C code (it can export C compatible functions and structures)&lt;/li&gt;
&lt;li&gt;it does not need a garbage collector&lt;/li&gt;
&lt;li&gt;if you want, it does not even need to handle allocations&lt;/li&gt;
&lt;li&gt;the Rust compiler can produce static and dynamic libraries, and even object files&lt;/li&gt;
&lt;li&gt;the Rust compiler avoids most of the memory vulnerabilities you get in C (yes, I had to mention it)&lt;/li&gt;
&lt;li&gt;Rust is easier to maintain than C (this is debatable, but not the point of this article)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As it turns out, this is more or less the plan to replace C code with Rust:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;import C structures and functions in Rust&lt;/li&gt;
&lt;li&gt;import Rust structures and functions from C&lt;/li&gt;
&lt;li&gt;reuse the host application's memory allocations whenever possible&lt;/li&gt;
&lt;li&gt;write code (yes, we have to do it at some point)&lt;/li&gt;
&lt;li&gt;produce artefacts that can be linked with the host application&lt;/li&gt;
&lt;li&gt;integrate with the build system&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We'll see how to apply this with examples from the Rust VLC plugin.&lt;/p&gt;
&lt;h2&gt;Import C structures and functions in Rust&lt;/h2&gt;
&lt;p&gt;Rust can use C code directly, once you write the matching function and structure definitions. A lot of the techniques you would use for this come from the &lt;a href=&quot;https://doc.rust-lang.org/book/second-edition/ch19-01-unsafe-rust.html&quot;&gt;&quot;unsafe Rust&quot; chapter&lt;/a&gt; of &quot;The Rust Programming Language&quot; book. For the following C code:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
struct vlc_object_t {&lt;br /&gt;
    const char   *object_type;&lt;br /&gt;
    char         *header;&lt;br /&gt;
    int           flags;&lt;br /&gt;
    bool          force;&lt;br /&gt;
    libvlc_int_t *libvlc;&lt;br /&gt;
    vlc_object_t *parent;&lt;br /&gt;
};&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;You would get the following Rust structure:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
extern crate libc;&lt;br /&gt;
use libc::{c_char, c_int};&lt;/p&gt;
&lt;p&gt;#[repr(C)]&lt;br /&gt;
pub struct vlc_object_t {&lt;br /&gt;
  pub psz_object_type: *const c_char,&lt;br /&gt;
  pub psz_header:      *mut c_char,&lt;br /&gt;
  pub i_flags:         c_int,&lt;br /&gt;
  pub b_force:         bool,&lt;br /&gt;
  pub p_libvlc:        *mut libvlc_int_t,&lt;br /&gt;
  pub p_parent:        *mut vlc_object_t,&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;#[repr(C)]&lt;/code&gt; attribute tells the compiler to lay the structure out in memory the way a C compiler would. We import types from the libc crate, like &lt;code&gt;c_char&lt;/code&gt;. Those types are platform dependent (libc already handles their different forms). Here, we use a lot of raw pointers (indicated by &lt;code&gt;*&lt;/code&gt;), which means that by using this structure directly, we're basically writing C, which is no good! A good approach, as we'll see later, is to write safer wrappers above those C bindings.&lt;/p&gt;
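&lt;p&gt;To make the layout guarantee concrete, here is a minimal sketch using a hypothetical &lt;code&gt;Padded&lt;/code&gt; structure (not taken from VLC). It assumes a platform where &lt;code&gt;u32&lt;/code&gt; is aligned on 4 bytes, which is the case on common desktop and mobile targets:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
use std::mem::{align_of, size_of};&lt;/p&gt;
&lt;p&gt;// Hypothetical structure: the field order is chosen to force padding&lt;br /&gt;
#[repr(C)]&lt;br /&gt;
pub struct Padded {&lt;br /&gt;
  pub a: u8,  // offset 0, followed by 3 bytes of padding&lt;br /&gt;
  pub b: u32, // offset 4&lt;br /&gt;
  pub c: u8,  // offset 8, followed by 3 bytes of tail padding&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;fn main() {&lt;br /&gt;
  // fields stay in declaration order, so the C layout rules give 12 bytes&lt;br /&gt;
  assert_eq!(align_of::&amp;lt;Padded&amp;gt;(), 4);&lt;br /&gt;
  assert_eq!(size_of::&amp;lt;Padded&amp;gt;(), 12);&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;Without the attribute, the compiler is free to reorder the fields, so the size and offsets could differ from what the C side expects.&lt;/p&gt;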
&lt;p&gt;Importing C functions is quite straightforward too:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
ssize_t  vlc_stream_Peek(stream_t *, const uint8_t **, size_t);&lt;br /&gt;
ssize_t  vlc_stream_Read(stream_t *, void *buf, size_t len);&lt;br /&gt;
uint64_t vlc_stream_Tell(const stream_t *);&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;These C function declarations would get translated to:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
#[link(name = &amp;quot;vlccore&amp;quot;)]&lt;br /&gt;
extern {&lt;br /&gt;
  pub fn vlc_stream_Peek(stream: *mut stream_t, buf: *mut *const uint8_t, size: size_t) -&amp;gt; ssize_t;&lt;br /&gt;
  pub fn vlc_stream_Read(stream: *mut stream_t, buf: *mut c_void, size: size_t) -&amp;gt; ssize_t;&lt;br /&gt;
  pub fn vlc_stream_Tell(stream: *const stream_t) -&amp;gt; uint64_t;&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;#[link(name = &quot;vlccore&quot;)]&lt;/code&gt; attribute indicates which library we are linking to. It is equivalent to passing a &lt;code&gt;-lvlccore&lt;/code&gt; argument to the linker. Libvlccore is a library all VLC plugins must link to. Those functions are declared like regular Rust functions but, like the previous structures, they mainly work on raw pointers, and calling them requires an &lt;code&gt;unsafe&lt;/code&gt; block.&lt;/p&gt;
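&lt;p&gt;If you want to try this without a VLC build at hand, the same pattern can be sketched with &lt;code&gt;strlen&lt;/code&gt; from the platform's C library, which Rust programs using the standard library already link against. The &lt;code&gt;c_length&lt;/code&gt; helper is a name made up for this example:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
use std::ffi::CString;&lt;br /&gt;
use std::os::raw::c_char;&lt;/p&gt;
&lt;p&gt;// no #[link] attribute needed: the C library is already linked&lt;br /&gt;
// (the Rust 2024 edition spells this block `unsafe extern &amp;quot;C&amp;quot;`)&lt;br /&gt;
extern &amp;quot;C&amp;quot; {&lt;br /&gt;
  fn strlen(s: *const c_char) -&amp;gt; usize;&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;// hypothetical helper: CString guarantees a valid, NUL-terminated buffer&lt;br /&gt;
pub fn c_length(text: &amp;amp;str) -&amp;gt; Option&amp;lt;usize&amp;gt; {&lt;br /&gt;
  // CString::new fails if the input contains an interior NUL byte&lt;br /&gt;
  let c_text = CString::new(text).ok()?;&lt;br /&gt;
  Some(unsafe { strlen(c_text.as_ptr()) })&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;fn main() {&lt;br /&gt;
  assert_eq!(c_length(&amp;quot;vlc&amp;quot;), Some(3));&lt;br /&gt;
  assert_eq!(c_length(&amp;quot;a\0b&amp;quot;), None);&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;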
&lt;h3&gt;bindgen&lt;/h3&gt;
&lt;p&gt;You can always write all your bindings manually like this, but when the amount of code to import gets large, it can be a good idea to use the awesome &lt;a href=&quot;https://github.com/servo/rust-bindgen&quot;&gt;bindgen&lt;/a&gt; tool, which generates Rust code from C headers.&lt;/p&gt;
&lt;p&gt;It can work as a command line tool, but can also work at compile time from a &lt;a href=&quot;http://doc.crates.io/build-script.html&quot;&gt;build script&lt;/a&gt;. First, add the dependency to your &lt;code&gt;Cargo.toml&lt;/code&gt; file:&lt;/p&gt;
&lt;p&gt;[code lang=toml]&lt;br /&gt;
[build-dependencies.bindgen]&lt;br /&gt;
version = &amp;quot;^0.25&amp;quot;&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;You can then write your build script like this:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
extern crate bindgen;&lt;br /&gt;
use std::fs::File;&lt;br /&gt;
use std::io::Write;&lt;br /&gt;
use std::path::Path;&lt;/p&gt;
&lt;p&gt;fn main() {&lt;br /&gt;
  let include_arg = concat!(&amp;quot;-I&amp;quot;, env!(&amp;quot;INCLUDE_DIR&amp;quot;));&lt;br /&gt;
  let vlc_common_path = concat!(env!(&amp;quot;INCLUDE_DIR&amp;quot;), &amp;quot;/vlc_common.h&amp;quot;);&lt;/p&gt;
&lt;p&gt;  let _ = bindgen::builder()&lt;br /&gt;
    .clang_arg(include_arg)&lt;br /&gt;
    .clang_arg(&amp;quot;-include&amp;quot;)&lt;br /&gt;
    .clang_arg(vlc_common_path)&lt;br /&gt;
    .header(concat!(env!(&amp;quot;INCLUDE_DIR&amp;quot;), &amp;quot;/vlc_block.h&amp;quot;))&lt;br /&gt;
    .hide_type(&amp;quot;vlc_object_t&amp;quot;)&lt;br /&gt;
    .whitelist_recursively(true)&lt;br /&gt;
    .whitelisted_type(&amp;quot;block_t&amp;quot;)&lt;br /&gt;
    .whitelisted_function(&amp;quot;block_Init&amp;quot;)&lt;br /&gt;
    .raw_line(&amp;quot;use ffi::common::vlc_object_t;&amp;quot;)&lt;br /&gt;
    .use_core()&lt;br /&gt;
    .generate().unwrap()&lt;br /&gt;
    .write_to_file(&amp;quot;src/ffi/block.rs&amp;quot;);&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;So there's a lot to unpack here, because bindgen is very flexible:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;we use &lt;code&gt;clang_arg&lt;/code&gt; to pass the include folder path and pre-include a header everywhere (&lt;code&gt;vlc_common.h&lt;/code&gt; is included pretty much everywhere in VLC)&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;header&lt;/code&gt; method specifies the header from which we will import definitions&lt;/li&gt;
&lt;li&gt;&lt;code&gt;hide_type&lt;/code&gt; prevents redefinition of elements we already defined (like the ones from the common header)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;whitelisted_type&lt;/code&gt; and &lt;code&gt;whitelisted_function&lt;/code&gt; specify types and functions for which bindgen will create definitions&lt;/li&gt;
&lt;li&gt;&lt;code&gt;raw_line&lt;/code&gt; writes its argument at the top of the file. I apply it to reuse definitions from other files&lt;/li&gt;
&lt;li&gt;&lt;code&gt;write_to_file&lt;/code&gt; writes the whole definition to the specified path&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can apply that process to any C header you must import. With the build script, it can run every time the library is compiled, but be careful: generating bindings for a lot of headers can take some time. It might be a good idea to pregenerate the bindings, commit the generated files, and update them from time to time.&lt;/p&gt;
&lt;p&gt;It is usually a good idea to separate the imported definitions in another crate with the &lt;code&gt;-sys&lt;/code&gt; suffix, and write the safe code in the main crate.&lt;br /&gt;
As an example, see the crates &lt;a href=&quot;https://crates.io/crates/openssl&quot;&gt;openssl&lt;/a&gt; and &lt;a href=&quot;https://crates.io/crates/openssl-sys&quot;&gt;openssl-sys&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Writing safe wrappers&lt;/h3&gt;
&lt;p&gt;Previously, we imported the C function &lt;code&gt;ssize_t vlc_stream_Read(stream_t *, void *buf, size_t len)&lt;/code&gt; as the Rust version &lt;code&gt;pub fn vlc_stream_Read(stream: *mut stream_t, buf: *mut c_void, size: size_t) -&amp;gt; ssize_t&lt;/code&gt; but kept an unsafe interface. Since we want to use those functions safely, we can now make a better wrapper:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
use ffi;&lt;/p&gt;
&lt;p&gt;pub fn stream_Read(stream: *mut stream_t, buf: &amp;amp;mut [u8]) -&amp;gt; ssize_t {&lt;br /&gt;
  unsafe {&lt;br /&gt;
    ffi::vlc_stream_Read(stream, buf.as_mut_ptr() as *mut c_void, buf.len())&lt;br /&gt;
  }&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;Here we replaced the raw pointer to memory and the length with a mutable slice, which carries both. We still use a raw pointer to the &lt;code&gt;stream_t&lt;/code&gt; instance; maybe we can do better:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
use ffi;&lt;/p&gt;
&lt;p&gt;pub struct Stream(*mut stream_t);&lt;/p&gt;
&lt;p&gt;pub fn stream_Read(stream: &amp;amp;mut Stream, buf: &amp;amp;mut [u8]) -&amp;gt; ssize_t {&lt;br /&gt;
  unsafe {&lt;br /&gt;
    ffi::vlc_stream_Read(stream.0, buf.as_mut_ptr() as *mut c_void, buf.len())&lt;br /&gt;
  }&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;Be careful if you plan to implement &lt;a href=&quot;https://doc.rust-lang.org/std/mem/fn.drop.html&quot;&gt;&lt;code&gt;Drop&lt;/code&gt;&lt;/a&gt; for this type: is the Rust code supposed to free that object? Is there some reference counting involved? Here is an example of &lt;code&gt;Drop&lt;/code&gt; implementation from the openssl crate:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
pub struct SslContextBuilder(*mut ffi::SSL_CTX);&lt;/p&gt;
&lt;p&gt;impl Drop for SslContextBuilder {&lt;br /&gt;
    fn drop(&amp;amp;mut self) {&lt;br /&gt;
        unsafe { ffi::SSL_CTX_free(self.as_ptr()) }&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;Remember that it's likely the host application has a lot of infrastructure to keep track of memory, and as a rule, we should reuse the tools it offers for the code at the interface between Rust and C. See the &lt;a href=&quot;http://jakegoulding.com/rust-ffi-omnibus/&quot;&gt;Rust FFI omnibus&lt;/a&gt; for more examples of safe wrappers you can write.&lt;/p&gt;
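&lt;p&gt;As a rough sketch of that rule, here is a hypothetical &lt;code&gt;CBuffer&lt;/code&gt; wrapper that owns memory allocated on the C side. In this example, &lt;code&gt;calloc&lt;/code&gt; and &lt;code&gt;free&lt;/code&gt; from the C library stand in for whatever allocation functions the host application provides:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
use std::os::raw::c_void;&lt;/p&gt;
&lt;p&gt;// stand-ins for the host application's allocation functions&lt;br /&gt;
extern &amp;quot;C&amp;quot; {&lt;br /&gt;
  fn calloc(count: usize, size: usize) -&amp;gt; *mut c_void;&lt;br /&gt;
  fn free(ptr: *mut c_void);&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;// hypothetical wrapper owning a zeroed buffer allocated by C code&lt;br /&gt;
pub struct CBuffer {&lt;br /&gt;
  ptr: *mut u8,&lt;br /&gt;
  len: usize,&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;impl CBuffer {&lt;br /&gt;
  pub fn new(len: usize) -&amp;gt; Option&amp;lt;CBuffer&amp;gt; {&lt;br /&gt;
    if len == 0 {&lt;br /&gt;
      return None;&lt;br /&gt;
    }&lt;br /&gt;
    let ptr = unsafe { calloc(len, 1) } as *mut u8;&lt;br /&gt;
    if ptr.is_null() { None } else { Some(CBuffer { ptr: ptr, len: len }) }&lt;br /&gt;
  }&lt;/p&gt;
&lt;p&gt;  pub fn as_mut_slice(&amp;amp;mut self) -&amp;gt; &amp;amp;mut [u8] {&lt;br /&gt;
    // valid because ptr points to len zeroed bytes that we own&lt;br /&gt;
    unsafe { std::slice::from_raw_parts_mut(self.ptr, self.len) }&lt;br /&gt;
  }&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;impl Drop for CBuffer {&lt;br /&gt;
  fn drop(&amp;amp;mut self) {&lt;br /&gt;
    // memory obtained from calloc goes back through free, not Rust's allocator&lt;br /&gt;
    unsafe { free(self.ptr as *mut c_void) }&lt;br /&gt;
  }&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;fn main() {&lt;br /&gt;
  let mut buf = CBuffer::new(4).expect(&amp;quot;allocation failed&amp;quot;);&lt;br /&gt;
  buf.as_mut_slice()[0] = 42;&lt;br /&gt;
  assert_eq!(buf.as_mut_slice().to_vec(), vec![42u8, 0, 0, 0]);&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;The important part is the pairing: whatever allocated the memory must also be the one to release it.&lt;/p&gt;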
&lt;p&gt;&lt;em&gt;Side note: as of now (2017/07/10) &lt;a href=&quot;https://github.com/rust-lang/rust/issues/32838&quot;&gt;custom allocators&lt;/a&gt; are still not stable&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Exporting Rust code to be called from C&lt;/h2&gt;
&lt;p&gt;Since the host application is written in C, it might need to call your code. This is quite easy in Rust: you need to write unsafe wrappers.&lt;/p&gt;
&lt;p&gt;As an example, we will use the &lt;a href=&quot;https://github.com/Geal/rust-devoxx2016&quot;&gt;inverted index library for mobile apps&lt;/a&gt; I wrote for a conference. In this library, we have an &lt;code&gt;Index&lt;/code&gt; type that we want to use from Java. Here is its definition:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
#[repr(C)]&lt;br /&gt;
pub struct Index {&lt;br /&gt;
  pub index: HashMap&amp;lt;String, HashSet&amp;lt;i32&amp;gt;&amp;gt;,&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;This type has a few methods we want to provide:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
impl Index {&lt;br /&gt;
  pub fn new() -&amp;gt; Index {&lt;br /&gt;
    Index {&lt;br /&gt;
      index: HashMap::new(),&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;/p&gt;
&lt;p&gt;  pub fn insert(&amp;amp;mut self, id: i32, data: &amp;amp;str) {&lt;br /&gt;
    [...]&lt;br /&gt;
  }&lt;/p&gt;
&lt;p&gt;  pub fn search_word(&amp;amp;self, word: &amp;amp;str) -&amp;gt; Option&amp;lt;&amp;amp;HashSet&amp;lt;i32&amp;gt;&amp;gt; {&lt;br /&gt;
    self.index.get(word)&lt;br /&gt;
  }&lt;/p&gt;
&lt;p&gt;  pub fn search(&amp;amp;self, text: &amp;amp;str) -&amp;gt; HashSet&amp;lt;i32&amp;gt; {&lt;br /&gt;
    [...]&lt;br /&gt;
  }&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;First, we need to write the functions to allocate and deallocate our index. Every use from C will be wrapped in a &lt;a href=&quot;https://doc.rust-lang.org/std/boxed/struct.Box.html&quot;&gt;&lt;code&gt;Box&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
#[no_mangle]&lt;br /&gt;
pub extern &amp;quot;C&amp;quot; fn index_create() -&amp;gt; *mut Index {&lt;br /&gt;
  Box::into_raw(Box::new(Index::new()))&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;Box&lt;/code&gt; type indicates and owns a heap allocation. When the box is dropped, the underlying data is dropped as well and the memory is freed. The following function takes ownership of its argument, so it is dropped at the end.&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
#[no_mangle]&lt;br /&gt;
pub extern &amp;quot;C&amp;quot; fn index_free(ptr: *mut Index) {&lt;br /&gt;
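    // rebuilding a Box from a null pointer is undefined behavior&lt;br /&gt;
    if ptr.is_null() { return }&lt;br /&gt;
    let _ = unsafe { Box::from_raw(ptr) };&lt;br /&gt;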
}&lt;br /&gt;
[/code]&lt;/p&gt;
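&lt;p&gt;To check that the round trip through a raw pointer really frees the value, here is a small self-contained sketch. The &lt;code&gt;Session&lt;/code&gt; type and the &lt;code&gt;DROPPED&lt;/code&gt; counter are hypothetical, and only there to observe when the drop happens:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
use std::sync::atomic::{AtomicUsize, Ordering};&lt;/p&gt;
&lt;p&gt;// counts how many Session values have been dropped&lt;br /&gt;
static DROPPED: AtomicUsize = AtomicUsize::new(0);&lt;/p&gt;
&lt;p&gt;// hypothetical type standing in for Index&lt;br /&gt;
pub struct Session {&lt;br /&gt;
  pub id: u32,&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;impl Drop for Session {&lt;br /&gt;
  fn drop(&amp;amp;mut self) {&lt;br /&gt;
    DROPPED.fetch_add(1, Ordering::SeqCst);&lt;br /&gt;
  }&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;#[no_mangle]&lt;br /&gt;
pub extern &amp;quot;C&amp;quot; fn session_create(id: u32) -&amp;gt; *mut Session {&lt;br /&gt;
  // into_raw gives up ownership: nothing is freed until from_raw is called&lt;br /&gt;
  Box::into_raw(Box::new(Session { id: id }))&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;#[no_mangle]&lt;br /&gt;
pub extern &amp;quot;C&amp;quot; fn session_free(ptr: *mut Session) {&lt;br /&gt;
  if ptr.is_null() {&lt;br /&gt;
    return;&lt;br /&gt;
  }&lt;br /&gt;
  // rebuilding the Box hands ownership back to Rust, which drops it here&lt;br /&gt;
  drop(unsafe { Box::from_raw(ptr) });&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;fn main() {&lt;br /&gt;
  let ptr = session_create(7);&lt;br /&gt;
  assert_eq!(unsafe { (*ptr).id }, 7);&lt;br /&gt;
  assert_eq!(DROPPED.load(Ordering::SeqCst), 0);&lt;br /&gt;
  session_free(ptr);&lt;br /&gt;
  assert_eq!(DROPPED.load(Ordering::SeqCst), 1);&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;The C side must call the free function exactly once per pointer: calling it twice is a double free, and never calling it is a leak.&lt;/p&gt;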
&lt;p&gt;Now that allocation is handled, we can work on a real method. The following function takes an index, an id for a text, and the text itself, as a C string (i.e., terminated by a null character).&lt;/p&gt;
&lt;p&gt;Since we're kinda writing C in Rust here, we first have to check whether the pointers are null. Then we can transform the C string into a byte slice, and check that it is correctly encoded as UTF-8 before inserting it into our index.&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
#[no_mangle]&lt;br /&gt;
pub extern &amp;quot;C&amp;quot; fn index_insert(index: *mut Index, id: i32, raw_text: *const c_char) {&lt;br /&gt;
  if index.is_null() || raw_text.is_null() { return }&lt;br /&gt;
  let slice = unsafe { CStr::from_ptr(raw_text).to_bytes() };&lt;br /&gt;
  if let Ok(text) = str::from_utf8(slice) {&lt;br /&gt;
    // dereferencing a raw pointer is only allowed in an unsafe block&lt;br /&gt;
    unsafe { (*index).insert(id, text) };&lt;br /&gt;
  }&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;Most of the code in those kinds of wrappers is just there to convert between C and Rust types and to check that the arguments coming from C code are correct. Even if we have to trust the host application, we should program defensively at the boundary.&lt;/p&gt;
&lt;p&gt;There are &lt;a href=&quot;https://github.com/Geal/rust-devoxx2016/blob/master/inverted_index/src/lib.rs#L96-L121&quot;&gt;other methods we could implement&lt;/a&gt; for the index; we'll leave those as an exercise for the reader :)&lt;/p&gt;
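&lt;p&gt;Here is a self-contained sketch of that defensive style, with a hypothetical &lt;code&gt;count_words&lt;/code&gt; function that reports invalid input with &lt;code&gt;-1&lt;/code&gt; instead of crashing. The error convention is an assumption made for this example, so pick whatever fits the host application:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
use std::ffi::{CStr, CString};&lt;br /&gt;
use std::os::raw::c_char;&lt;br /&gt;
use std::ptr;&lt;/p&gt;
&lt;p&gt;// hypothetical exported function: number of whitespace separated words,&lt;br /&gt;
// or -1 if the pointer is null or the bytes are not valid UTF-8&lt;br /&gt;
#[no_mangle]&lt;br /&gt;
pub extern &amp;quot;C&amp;quot; fn count_words(raw_text: *const c_char) -&amp;gt; i32 {&lt;br /&gt;
  if raw_text.is_null() {&lt;br /&gt;
    return -1;&lt;br /&gt;
  }&lt;br /&gt;
  // the caller must pass a NUL-terminated string, we cannot verify that here&lt;br /&gt;
  let bytes = unsafe { CStr::from_ptr(raw_text) }.to_bytes();&lt;br /&gt;
  match std::str::from_utf8(bytes) {&lt;br /&gt;
    Ok(text) =&amp;gt; text.split_whitespace().count() as i32,&lt;br /&gt;
    Err(_) =&amp;gt; -1,&lt;br /&gt;
  }&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;fn main() {&lt;br /&gt;
  let text = CString::new(&amp;quot;hello from C&amp;quot;).unwrap();&lt;br /&gt;
  assert_eq!(count_words(text.as_ptr()), 3);&lt;br /&gt;
  assert_eq!(count_words(ptr::null()), -1);&lt;br /&gt;
  // 0xff can never appear in valid UTF-8&lt;br /&gt;
  let invalid = CString::new(vec![0xffu8, 0xfe]).unwrap();&lt;br /&gt;
  assert_eq!(count_words(invalid.as_ptr()), -1);&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;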
&lt;p&gt;Now, we need to write the C definitions to import those functions and types:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
typedef struct Index Index;&lt;/p&gt;
&lt;p&gt;Index* index_create(void);&lt;br /&gt;
void   index_free(Index* index);&lt;br /&gt;
void   index_insert(Index* index, int32_t id, char const* raw_text);&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;We defined &lt;code&gt;Index&lt;/code&gt; as an opaque type here. Since Rust structures can be compatible with C structures, we could export the real type, but since it only contains a Rust specific type, &lt;code&gt;HashMap&lt;/code&gt;, it is better to hide it completely and write accessors and wrappers.&lt;/p&gt;
&lt;h3&gt;Generating bindings with rusty-cheddar&lt;/h3&gt;
&lt;p&gt;Writing function imports from C to Rust is tedious, so we have bindgen for this. We also have a great tool to go the other way: &lt;a href=&quot;https://github.com/Sean1708/rusty-cheddar&quot;&gt;rusty-cheddar&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the same way, it can be used from a build script:&lt;/p&gt;
&lt;p&gt;[code lang=C]&lt;br /&gt;
extern crate cheddar;&lt;/p&gt;
&lt;p&gt;fn main() {&lt;br /&gt;
  cheddar::Cheddar::new().expect(&amp;quot;could not read definitions&amp;quot;)&lt;br /&gt;
    .run_build(&amp;quot;include/main.h&amp;quot;);&lt;br /&gt;
  cheddar::Cheddar::new().expect(&amp;quot;could not read definitions&amp;quot;)&lt;br /&gt;
    .module(&amp;quot;index&amp;quot;).expect(&amp;quot;malformed module path&amp;quot;)&lt;br /&gt;
    .insert_code(&amp;quot;#include \&amp;quot;main.h\&amp;quot;&amp;quot;)&lt;br /&gt;
    .run_build(&amp;quot;include/index.h&amp;quot;);&lt;br /&gt;
}&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;Here we run rusty-cheddar a first time without specifying the module: by default, it generates a header for the definitions in &lt;code&gt;src/lib.rs&lt;/code&gt;.&lt;br /&gt;
The second run specifies a different module, and inserts a file inclusion at the top of the generated header.&lt;/p&gt;
&lt;p&gt;It can be a good idea to commit the generated headers, since you will see immediately if you changed the interface in a breaking way.&lt;/p&gt;
&lt;h2&gt;Integrating with the build system&lt;/h2&gt;
&lt;p&gt;As you might know, we can make dynamic libraries and executables with rustc and cargo. But often, the host application will have its own build system, and it might disagree with the way cargo builds its projects. So we have multiple strategies:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;build Rust code separately, store libraries and headers in Maven or something (don't laugh, I've worked with such a system once, and it was actually great)&lt;/li&gt;
&lt;li&gt;try to let rustc build dynamic libraries from inside the build system. We tried that for VLC and it was not great at all&lt;/li&gt;
&lt;li&gt;build a static library from inside or outside the build system, and include it in the libraries at link time. This was done in &lt;a href=&quot;https://github.com/rusticata/rusticata&quot;&gt;Rusticata&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;build an object file and let the build system link it. This is what we ended up doing with VLC&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Building a static library is as easy as specifying &lt;code&gt;crate-type = [&quot;staticlib&quot;]&lt;/code&gt; in your &lt;code&gt;Cargo.toml&lt;/code&gt; file. To build an object file, use the command &lt;code&gt;cargo rustc --release -- --emit obj&lt;/code&gt;. You can see how we added it to the &lt;a href=&quot;https://github.com/Geal/vlc/blob/rust/modules/demux/Makefile.am#L464-L477&quot;&gt;autotools usage in VLC&lt;/a&gt;.&lt;/p&gt;
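&lt;p&gt;For reference, the static library setup is a small fragment in &lt;code&gt;Cargo.toml&lt;/code&gt;. The library name &lt;code&gt;rust_plugin&lt;/code&gt; is only a placeholder for this example:&lt;/p&gt;
&lt;p&gt;[code lang=toml]&lt;br /&gt;
[lib]&lt;br /&gt;
name = &amp;quot;rust_plugin&amp;quot;&lt;br /&gt;
crate-type = [&amp;quot;staticlib&amp;quot;]&lt;br /&gt;
[/code]&lt;/p&gt;
&lt;p&gt;With that in place, &lt;code&gt;cargo build --release&lt;/code&gt; should leave a &lt;code&gt;librust_plugin.a&lt;/code&gt; archive under &lt;code&gt;target/release&lt;/code&gt; on Unix-like systems, ready to be handed to the host application's linker.&lt;/p&gt;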
&lt;p&gt;Unfortunately, for this part we still do not have automated ways to fix the issues. Maybe with some time, people will write scripts for autotools,&lt;br /&gt;
CMake and others to handle Rust and Cargo.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Side note on reproducible builds: if you want to fix the set of Rust dependencies used in your project and make them always available, you can use &lt;a href=&quot;https://github.com/alexcrichton/cargo-vendor&quot;&gt;cargo-vendor&lt;/a&gt; to store them in a specific folder&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;As you might have guessed, this is the most complex part, for which I have no good generic answer. I'd recommend that you spend the most time on this during the project's prototyping phase: import very little C code, export very little Rust code, try to make it build entirely from within the host application's build system. Once this is done, extending the project will get much easier. You really don't want to discover this task at the end of your project and try to retrofit your code in there.&lt;/p&gt;
&lt;h2&gt;Going further&lt;/h2&gt;
&lt;p&gt;While this article only scratches the surface of Rust rewrites, I hope it provides a good starting point on the tools and techniques you can apply.&lt;br /&gt;
Any rewrite will be a large and complex project, but the result is worth the effort. The code you will write will be stronger, and Rust's type system will force you to review the assumptions made in the C version. You might even find better ways to write it once you start refactoring your code in a more Rusty way, safely hidden behind your wrappers.&lt;/p&gt;</content><author><name>{&quot;login&quot;=&gt;&quot;geaaal&quot;, &quot;email&quot;=&gt;&quot;geo.couprie@gmail.com&quot;, &quot;display_name&quot;=&gt;&quot;Géal&quot;, &quot;first_name&quot;=&gt;&quot;&quot;, &quot;last_name&quot;=&gt;&quot;&quot;}</name><email>geo.couprie@gmail.com</email></author><category term="security" /><summary type="html">In a previous post, I explained why rewriting existing software in Rust could be a good idea. The main point being that you should not rewrite the whole application, but replace the weaker parts without disturbing most of the code, to strengthen the codebase without disruption. I also provided pointers to projects where other people and I did it succesfully, but without giving too many details. So let's get a real introduction to Rust rewrites now. This article requires a little bit of knowledge about Rust, but you should be able to follow it even as a beginner. 
As a reminder, here are the benefits Rust bring into a rewrite: it can easily call C code it can easily be called by C code (it can export C compatible functions and structures) it does not need a garbage collector if you want, it does not even need to handle allocations the Rust compiler can produce static and dynamic libraries, and even object files the Rust compiler avoids most of the memory vulnerabilities you get in C (yes, I had to mention it) Rust is easier to maintain than C (this is discutable, but not the point of this article) As it turns out, this is more or less the plan to replace C code with Rust: import C structures and functions in Rust import Rust structures and functions from C reuse the host application's memory allocations whenever possible write code (yes, we have to do it at some point) produce artefacts that can be linked with the host application integrate with the build system We'll see how to apply this with examples from the Rust VLC plugin. Import C structures and functions in Rust Rust can easily use C code directly, by writing functions and structures definitions. A lot of the techniques you would use for this come from the &quot;unsafe Rust&quot; chapter of &quot;The Rust Programming Language&quot; book. For the following C code: [code lang=C] struct vlc_object_t { const char *object_type; char *header; int flags; bool force; libvlc_int_t *libvlc; vlc_object_t *parent; }; [/code] You would get the following Rust structure: [code lang=C] extern crate libc; use libc::c_char; #[repr(C)] pub struct vlc_object_t { pub psz_object_type: *const c_char, pub psz_header: *mut c_char, pub i_flags: c_int, pub b_force: bool, pub p_libvlc: *mut libvlc_int_t, pub p_parent: *mut vlc_object_t, } [/code] the #[repr(C)] tag indicates to the compiler that the structure should have a memory layout similar to the one generated by a C compiler. We import types from the libc crate, like c_char. 
Those types are platform dependent (with their different form already handled in libc). Here, we use a lot of raw pointers (indicated by *), which means by using this structure directly, we're basically writing C, which is no good! A good approach, as we'll see later, is to write safer wrappers above those C bindings. Importing C functions is quite straightforward too: [code lang=C] ssize_t vlc_stream_Peek(stream_t *, const uint8_t **, size_t); ssize_t vlc_stream_Read(stream_t *, void *buf, size_t len); uint64_t vlc_stream_Tell(const stream_t *); [/code] These C function declarations would get translated to: [code lang=C] #[link(name = &amp;quot;vlccore&amp;quot;)] extern { pub fn vlc_stream_Peek(stream: *mut stream_t, buf: *mut *const uint8_t, size: size_t) -&amp;gt; ssize_t; pub fn vlc_stream_Read(stream: *mut stream_t, buf: *const c_void, size: size_t) -&amp;gt; ssize_t; pub fn vlc_stream_Tell(stream: *const stream_t) -&amp;gt; uint64_t; } [/code] The #[link(name = &quot;vlccore&quot;)] tag indicates to which library we are linking. It is equivalent to passing a -lvlccore argument to the linker. Libvlccore is a library all VLC plugins must link to. Those functions are declared like regular Rust functions, but like the previous structures, will mainly work on raw pointers. bindgen You can always write all your bindings manually like this, but when the amount of code to import is a bit large, it can be a good idea to employ the awesome bindgen tool, that will generate Rust code from C headers. It can work as a command line tool, but can also work at compile time from a build script. 
First, add the dependency to your Cargo.toml file: [code lang=toml] [build-dependencies.bindgen] version = &amp;quot;^0.25&amp;quot; [/code] You can then write your build script like this: [code lang=C] extern crate bindgen; use std::fs::File; use std::io::Write; use std::path::Path; fn main() { let include_arg = concat!(&amp;quot;-I&amp;quot;, env!(&amp;quot;INCLUDE_DIR&amp;quot;)); let vlc_common_path = concat!(env!(&amp;quot;INCLUDE_DIR&amp;quot;), &amp;quot;/vlc_common.h&amp;quot;); let _ = bindgen::builder() .clang_arg(include_arg) .clang_arg(&amp;quot;-include&amp;quot;) .clang_arg(vlc_common_path) .header(concat!(env!(&amp;quot;INCLUDE_DIR&amp;quot;), &amp;quot;/vlc_block.h&amp;quot;)) .hide_type(&amp;quot;vlc_object_t&amp;quot;) .whitelist_recursively(true) .whitelisted_type(&amp;quot;block_t&amp;quot;) .whitelisted_function(&amp;quot;block_Init&amp;quot;) .raw_line(&amp;quot;use ffi::common::vlc_object_t;&amp;quot;) .use_core() .generate().unwrap() .write_to_file(&amp;quot;src/ffi/block.rs&amp;quot;); } [/code] So there's a lot to unpack here, because bindgen is very flexible: we use clang_arg to pass the include folder path and pre include a header everywhere (vlc_common.h is included pretty puch everywhere in VLC) the header method specifies the header from which we will import definitions hide_type prevents redefinition of elements we already defined (liek the ones from the common header) whitelisted_type and whitelisted_function specify types and functions for which bindgen will create definitions raw_line writes its argument at the top of the file. I apply it to reuse definitions from other files write_to_file writes the whole definition to the specified path You can apply that process to any C header you must import. With the build script, it can run every time the library is compiled, but be careful, generating a lot of headers can take some time. 
It might be a good idea to pregenerate them and commit the generated files, and update them from time to time. It is usually a good idea to separate the imported definitions in another crate with the -sys suffix, and write the safe code in the main crate. As an example, see the crates openssl and openssl-sys. Writing safe wrappers Previously, we imported the C function ssize_t vlc_stream_Read(stream_t *, void *buf, size_t len) as the Rust version pub fn vlc_stream_Read(stream: *mut stream_t, buf: *const c_void, size: size_t) -&amp;amp;gt; ssize_t but kept an unsafe interface. Since we want to use those functions safely, we can now make a better wrapper: [code lang=C] use ffi; pub fn stream_Read(stream: *mut stream_t, buf: &amp;amp;mut [u8]) -&amp;gt; ssize_t { unsafe { ffi::vlc_stream_Read(stream, buf.as_mut_ptr() as *mut c_void, buf.len()) } } [/code] Here we replaced the raw pointer to memory and the length with a mutable slice. We still use a raw pointer to the stream_t instance, maybe we can do better: [code lang=C] use ffi; pub struct Stream(*mut stream_t); pub fn stream_Read(stream: Stream, buf: &amp;amp;mut [u8]) -&amp;gt; ssize_t { unsafe { ffi::vlc_stream_Read(stream.0, buf.as_mut_ptr() as *mut c_void, buf.len()) } } [/code] Be careful if you plan to implement Drop for this type: is the Rust code supposed to free that object? Is there some reference counting involved? Here is an example of Drop implementation from the openssl crate: [code lang=C] pub struct SslContextBuilder(*mut ffi::SSL_CTX); impl Drop for SslContextBuilder { fn drop(&amp;amp;mut self) { unsafe { ffi::SSL_CTX_free(self.as_ptr()) } } } [/code] Remember that it's likely the host application has a lot of infrastructure to keep track of memory, and as a rule, we should reuse the tools it offers for the code at the interface between Rust and C. See the Rust FFI omnibus for more examples of safe wrappers you can write. 
Side note: as of now (2017/07/10) custom allocators are still not stable. Exporting Rust code to be called from C Since the host application is written in C, it might need to call your code. This is quite easy in Rust: you need to write unsafe wrappers. Here we will use as example the inverted index library for mobile apps I wrote for a conference. In this library, we have an Index type that we want to use from Java. Here is its definition: [code lang=C] #[repr(C)] pub struct Index { pub index: HashMap&amp;lt;String, HashSet&amp;lt;i32&amp;gt;&amp;gt;, } [/code] This type has a few methods we want to provide: [code lang=C] impl Index { pub fn new() -&amp;gt; Index { Index { index: HashMap::new(), } } pub fn insert(&amp;amp;mut self, id: i32, data: &amp;amp;str) { [...] } pub fn search_word(&amp;amp;self, word: &amp;amp;str) -&amp;gt; Option&amp;lt;&amp;amp;HashSet&amp;lt;i32&amp;gt;&amp;gt; { self.index.get(word) } pub fn search(&amp;amp;self, text: &amp;amp;str) -&amp;gt; HashSet&amp;lt;i32&amp;gt; { [...] } } [/code] First, we need to write the functions to allocate and deallocate our index. Every use from C will be wrapped in a Box. [code lang=C] #[no_mangle] pub extern &amp;quot;C&amp;quot; fn index_create() -&amp;gt; *mut Index { Box::into_raw(Box::new(Index::new())) } [/code] The Box type indicates and owns a heap allocation. When the box is dropped, the underlying data is dropped as well and the memory is freed. The following function takes ownership of its argument, so it is dropped at the end. [code lang=C] #[no_mangle] pub extern &amp;quot;C&amp;quot; fn index_free(ptr: *mut Index) { let _ = unsafe { Box::from_raw(ptr) }; } [/code] Now that allocation is handled, we can work on a real method. The following method takes an index, an id for a text, and the text itself, as a C string (i.e., terminated by a null character). Since we're kinda writing C in Rust here, we have to first check if the pointers are null. Then we can transform the C string into a slice. 
Then we check if it is correctly encoded as UTF-8 before inserting it into our index. [code lang=C] #[no_mangle] pub extern &amp;quot;C&amp;quot; fn index_insert(index: *mut Index, id: i32, raw_text: *const c_char) { if index.is_null() || raw_text.is_null() { return } let slice = unsafe { CStr::from_ptr(raw_text).to_bytes() }; if let Ok(text) = str::from_utf8(slice) { unsafe { (*index).insert(id, text) }; } } [/code] Most of the code for those kinds of wrappers is just there to transform between C and Rust types and to check that the arguments coming from C code are correct. Even if we have to trust the host application, we should program defensively at the boundary. There are other methods we could implement for the index, we'll leave those as an exercise for the reader :) Now, we need to write the C definitions to import those functions and types: [code lang=C] typedef struct Index Index; Index* index_create(); void index_free(Index* index); void index_insert(Index* index, int32_t id, char const* raw_text); [/code] We defined Index as an opaque type here. Since Rust structures can be compatible with C structures, we could export the real type, but since it only contains a Rust specific type, HashMap, it is better to hide it completely and write accessors and wrappers. Generating bindings with rusty-cheddar Writing function imports from C to Rust is tedious, so we have bindgen for this. We also have a great tool to go the other way: rusty-cheddar. 
In the same way, it can be used from a build script: [code lang=C] extern crate cheddar; fn main() { cheddar::Cheddar::new().expect(&amp;quot;could not read definitions&amp;quot;) .run_build(&amp;quot;include/main.h&amp;quot;); cheddar::Cheddar::new().expect(&amp;quot;could not read definitions&amp;quot;) .module(&amp;quot;index&amp;quot;).expect(&amp;quot;malformed module path&amp;quot;) .insert_code(&amp;quot;#include \&amp;quot;main.h\&amp;quot;&amp;quot;) .run_build(&amp;quot;include/index.h&amp;quot;); } [/code] Here we run rusty-cheddar a first time without specifying the module: it will default to generate a header for the definitions in src/lib.rs. The second run specifies a different module, and can insert a file inclusion at the top. It can be a good idea to commit the generated headers, since you will see immediately if you changed the interface in a breaking way. Integrating with the build system As you might know, we can make dynamic libraries and executables with rustc and cargo. But often, the host application will have its own build system, and it might disagree with the way cargo builds its projects. So we have multiple strategies: build Rust code separately, store libraries and headers in Maven or something (don't laugh, I've worked with such a system once, and it was actually great) try to let rustc build dynamic libraries from inside the build system. We tried that for VLC and it was not great at all build a static library from inside or outside the build system, include it in the libraries at link. This was done in Rusticata build an object file and let the build system link it. This is what we ended up doing with VLC Building a static library is as easy as specifying crate-type = [&quot;staticlib&quot;] in your Cargo.toml file. To build an object file, use the command cargo rustc --release -- --emit obj. You can see how we added it to the autotools usage in VLC. 
Unfortunately, for this part we still do not have automated ways to fix the issues. Maybe with some time, people will write scripts for autotools, CMake and others to handle Rust and Cargo. Side note on reproducible builds: if you want to fix the set of Rust dependencies used in your project and make them always available, you can use cargo-vendor to store them in a specific folder As you might have guessed, this is the most complex part, for which I have no good generic answer. I'd recommend that you spend the most time on this during the project's prototyping phase: import very little C code, export very little Rust code, try to make it build entirely from within the host application's build system. Once this is done, extending the project will get much easier. You really don't want to discover this task at the end of your project and try to retrofit your code in there. Going further While this article just explores the surface of Rust rewrites, I hope it provides a good starting point on the tools and techniques you can apply. Any rewrite will be a large and complex project, but the result is worth the effort. The code you will write will be stronger, and Rust's type system will force you to review the assumptions made in the C version. 
You might even find better ways to write it once you start refactoring your code in a more Rusty way, safely hidden behind your wrappers.</summary></entry><entry><title type="html">Why you should, actually, rewrite it in Rust</title><link href="http://unhandledexpression.com/rust/2017/07/10/why-you-should-actually-rewrite-it-in-rust.html" rel="alternate" type="text/html" title="Why you should, actually, rewrite it in Rust" /><published>2017-07-10T16:04:16+02:00</published><updated>2017-07-10T16:04:16+02:00</updated><id>http://unhandledexpression.com/rust/2017/07/10/why-you-should-actually-rewrite-it-in-rust</id><content type="html" xml:base="http://unhandledexpression.com/rust/2017/07/10/why-you-should-actually-rewrite-it-in-rust.html">&lt;p&gt;You might have seen those obnoxious &quot;you should rewrite it in Rust&quot; comments here and there:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;in a &lt;a href=&quot;https://transitiontech.ca/random/RIIR&quot;&gt;blogpost&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;in a &lt;a href=&quot;https://github.com/fc00/go-fc00/issues/1&quot;&gt;GitHub issue&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;you might have heard about &lt;a href=&quot;http://n-gate.com/hackernews/2017/02/21/0/&quot;&gt;that Rust Evangelism Strike Force&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;it even has a joke &lt;a href=&quot;https://twitter.com/rustevangelism&quot;&gt;Twitter account&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It seems that with every new memory vulnerability in well-known software, there’s that one person saying Rust would have avoided the issue. We get it, it’s annoying, and it does not help us grow Rust. This attitude is generally frowned upon in the Rust community. You can't just show up in someone’s project telling them to rewrite everything.&lt;/p&gt;
&lt;p&gt;So, why am I writing this? Why would I try to convince you, now, that you should actually rewrite your software in Rust?&lt;/p&gt;
&lt;p&gt;That's because I have been working on this subject for a long time now:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I gave &lt;a href=&quot;https://www.youtube.com/watch?v=YTy_JOxGOd4&quot;&gt;multiple&lt;/a&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=e92Yrp3W_2I&quot;&gt;talks&lt;/a&gt; on it&lt;/li&gt;
&lt;li&gt;I even co-wrote a &lt;a href=&quot;http://spw17.langsec.org/papers.html#parsers2017&quot;&gt;paper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;I did it both for clients and in personal work&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, I'm committed to this, and yes, I believe you should rewrite some code in Rust. But there's a right way to do it.&lt;/p&gt;
&lt;h2&gt;Why rewrite stuff?&lt;/h2&gt;
&lt;p&gt;Our software systems are built on sand. We got pretty good at maintaining and fixing them over the years, but the cracks are showing. We still have not definitively fixed most of the low-level vulnerabilities: stack buffer overflow (yes, those still exist), heap overflow, use after free, double free, off by one; the list goes on. We have some tools, like DEP, ASLR, stack canaries, control flow integrity, fuzzing. Large projects with funding, like Chrome, can resort to sandboxing parts of their application. The rest of us can still run those applications inside a virtual machine. This situation will not improve. &lt;strong&gt;There's a huge amount of old (think 90s), poor-quality, barely maintained code that we reuse everywhere endlessly&lt;/strong&gt;. The good thing with hardware is that at some point, it gets replaced. Software just gets copied again. Worse, with the development of IoT, a lot of the code that ships will never be updated. It's likely that some of those old libraries will still be there 15, 20 years from now.&lt;/p&gt;
&lt;p&gt;Let's not shy away from the issue here. Most of those are written in C or C++ (and usually an old version). It is well known that it is hard to write correct, reliable software in those languages. Think of all the security related things you have to keep track of in a C codebase:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pointer arithmetic&lt;/li&gt;
&lt;li&gt;allocations and deallocations&lt;/li&gt;
&lt;li&gt;data is mutable by default&lt;/li&gt;
&lt;li&gt;functions return integers to mean pointers and error codes. Errors can be implicitly ignored&lt;/li&gt;
&lt;li&gt;type casts, overflows and underflows are hard to track&lt;/li&gt;
&lt;li&gt;buffer bounds in indexing and copying&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.regehr.org/archives/1520&quot;&gt;all the undefined behaviours&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, some developers can do this work. Of course, there are sanitizers. But it's an enormous effort to perform every day for every project.&lt;/p&gt;
&lt;p&gt;Those languages are well suited for low level programming, but require extreme care and expertise to avoid most of those issues. And even then, we assume the developers will always be well rested, focused and careful. We're only human, after all. Note that in 2017, there are still people claiming that a C developer with sufficient expertise would avoid all those issues. It's time we put this idea to rest. Yes, some projects can avoid a lot of vulnerabilities, with a team of good developers, frequent code reviews, a restricted set of features, funding, tools, etc. Most projects cannot. And as I said earlier, a lot of the code is not even maintained.&lt;/p&gt;
&lt;p&gt;So we have to do something. We must make our software foundations stronger. That means fixing operating systems, drivers, libraries, command line tools, servers, everything. We might not be able to fix most of it today, or the next year, but maybe 10 years from now the situation will have improved.&lt;/p&gt;
&lt;p&gt;Unfortunately, we cannot rewrite everything. If you have ever attempted to rewrite a project from scratch, you know that while you can avoid some of the mistakes you made before, you will probably &lt;a href=&quot;https://daniel.haxx.se/blog/2017/03/27/curl-is-c/&quot;&gt;introduce a lot of regressions and new bugs&lt;/a&gt;. It's also wrong on the human side: if there are maintainers for the projects, they would need to work on the new and old one at the same time. Worse, you would have to teach them the new approach, the new language (which they might not like), and plan for an upgrade to the new project for all users.&lt;/p&gt;
&lt;p&gt;This is not doable, and this is the part most people asking for project rewrites in Rust do not understand. What I'm advocating for is much simpler: &lt;strong&gt;surgically replace weaker parts but keep most of the project intact&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;How&lt;/h2&gt;
&lt;p&gt;Most of the issues will happen around IO and input data handling, so it makes sense to focus on it. It happens there because that's where the code manipulates buffers, parsers, and uses a lot of pointer calculations. It is also the least interesting part for software maintainers, since it is usually not where you add useful features, business logic, etc. And this logic is usually working well, so you do not want to replace it. If we could rewrite a small part of an application or library without disrupting the rest of the code, we would get most of the benefits without the issues of a full rewrite. It is the exact same project, with the same interface, same distribution packaging as before, same developer team. We would just make an annoying part of the software stronger and more maintainable.&lt;/p&gt;
&lt;p&gt;This is where Rust comes in. It is focused on providing memory safety and thread safety while keeping the code performant and the developer productive. As such, it is generally easier to get safe, reliable code into production by writing basic Rust than it is for a competent, well-rested C developer using all the tools available.&lt;/p&gt;
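&lt;p&gt;To make that claim a bit more concrete, here is a minimal sketch of how basic Rust treats two items from the list of C hazards above: integer overflow and buffer bounds. The Parsed enum and the byte_at function are hypothetical names made up for this illustration; only checked_add and the slice get method come from the standard library.&lt;/p&gt;

```rust
// Hypothetical result type for this sketch: every failure mode is a
// named variant that the caller has to handle in a match.
#[derive(Debug, PartialEq)]
enum Parsed {
    Byte(u8),
    OutOfBounds,
    Overflow,
}

// Reads the byte at `start + offset`, the kind of index computation
// that a C parser would do with raw pointer arithmetic.
fn byte_at(data: [u8; 4], start: usize, offset: usize) -> Parsed {
    // checked_add returns None on overflow instead of wrapping silently.
    let index = match start.checked_add(offset) {
        Some(index) => index,
        None => return Parsed::Overflow,
    };
    // get returns None instead of reading past the end of the buffer.
    match data.get(index) {
        Some(byte) => Parsed::Byte(*byte),
        None => Parsed::OutOfBounds,
    }
}
```

&lt;p&gt;Nothing here is advanced Rust. Both failure cases come back as ordinary values, so forgetting to handle one is a compile error in the caller's match rather than a silent memory bug.&lt;/p&gt;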
&lt;p&gt;Most of the other safe languages have strong requirements, like a runtime and a garbage collector. And usually, they expect to be the host application (how many languages assume they will handle the process's entry point?). Here, we are guests in someone else's house. We must integrate nicely and quietly.&lt;/p&gt;
&lt;p&gt;Rust is a strong candidate for this because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;it can easily call C code&lt;/li&gt;
&lt;li&gt;it can easily be called by C code (it can export C compatible functions and structures)&lt;/li&gt;
&lt;li&gt;it does not need a garbage collector&lt;/li&gt;
&lt;li&gt;if you want, it does not even need to handle allocations&lt;/li&gt;
&lt;li&gt;the Rust compiler can produce static and dynamic libraries, and even object files&lt;/li&gt;
&lt;li&gt;the Rust compiler avoids most of the memory vulnerabilities you get in C (yes, I had to mention it)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So you can actually take a piece of C code inside an existing project, import the C structures and functions to access them from Rust, rewrite the code in Rust, export the functions and structures from Rust, compile it and link it with the rest of the project.&lt;/p&gt;
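&lt;p&gt;The export half of that workflow can be sketched as follows. The sum_bytes function and the C prototype in the comment are hypothetical, made up for this illustration and not taken from Rusticata or VLC; the sketch only assumes the standard extern &quot;C&quot; and no_mangle mechanisms.&lt;/p&gt;

```rust
use std::os::raw::c_int;

// Hypothetical export. A C caller would declare it as:
//   int sum_bytes(const unsigned char *data, size_t len, unsigned int *out);
// It returns 0 on success and -1 if a pointer is null.
#[no_mangle]
pub extern "C" fn sum_bytes(data: *const u8, len: usize, out: *mut u32) -> c_int {
    // We are guests in a C application: check what it hands us.
    if data.is_null() || out.is_null() {
        return -1;
    }
    // Safety: the caller promises that `data` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    // From here on this is ordinary safe Rust working on a slice.
    let mut total: u32 = 0;
    for byte in bytes {
        total = total.wrapping_add(u32::from(*byte));
    }
    // Safety: `out` was checked for null above.
    unsafe { *out = total };
    0
}
```

&lt;p&gt;The unsafe code is confined to the two lines that cross the boundary. Once the raw pointer and length have become a slice, the rest of the function gets the usual compiler checks.&lt;/p&gt;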
&lt;p&gt;If you don't believe it's possible, take a look at these two examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/rusticata/rusticata&quot;&gt;Rusticata&lt;/a&gt; integrates Rust parsers written with nom in Suricata, an intrusion detection system&lt;/li&gt;
&lt;li&gt;a &lt;a href=&quot;https://github.com/geal/rust-vlc-demux&quot;&gt;VLC media player plugin&lt;/a&gt; to parse FLV files, written entirely in Rust &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You get a lot of benefits from this approach. First, Rust has great package management with Cargo and crates.io. That means you can separate some of the work in different libraries. See as an example the &lt;a href=&quot;https://github.com/rusticata&quot;&gt;list of parsers&lt;/a&gt; from the Rusticata project. You can test them independently, and even reuse them in other projects. The &lt;a href=&quot;https://github.com/rust-av/flavors&quot;&gt;FLV parser&lt;/a&gt; I wrote for VLC can also work in a &lt;a href=&quot;https://github.com/sdroege/gst-plugin-rs/tree/master/gst-plugin-flv&quot;&gt;Rust GStreamer plugin&lt;/a&gt;. You can also make a separate library for the glue with the host application. I'm working on &lt;a href=&quot;https://github.com/Geal/vlc_module.rs&quot;&gt;vlc_module&lt;/a&gt; exactly for that purpose: making Rust VLC plugins easier to write.&lt;/p&gt;
&lt;p&gt;This approach works well for applications with a plugin oriented architecture, but you can also rewrite core parts of an application or library. The biggest issue is high coupling of C code, but it is usually easy to rewrite bit by bit by keeping a common interface. Whenever you have rewritten some coupled parts of a project, you can take time to refactor it in a more Rusty way, and leverage the type system to help you. A good example of this is the &lt;a href=&quot;https://github.com/carols10cents/rust-out-your-c-talk&quot;&gt;rewrite of the Zopfli library from C to Rust&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This brings us to another important part of that infrastructure rewrite work: while we can rewrite part of an existing project without being too intrusive, we can also rewrite a library entirely, keeping exactly the same C API. You can have a Rust library, dynamic or static, with the exact same C header, that you could import in a project to replace the C one. This is a huge result. It's like replacing a load-bearing wall in an existing building. This is not an easy thing to achieve, but once it's done, you can improve a lot of projects at once, provided your distribution's package manager supports that replacement, or other projects take the time to upgrade.&lt;/p&gt;
&lt;p&gt;This is a lot of work, but every time we advance a little, everybody can benefit from it, and it will add up over the years. So we might as well start now.&lt;/p&gt;
&lt;p&gt;Currently, I'm focused on VLC. This is a good target because it's a popular application that's often part of the basic stack of any computer (browser, office suite, media player). So it's a big target. But take a look at the list of dependencies in most web applications, or the dependency graph of common distributions. There is a lot of low hanging fruit there.&lt;/p&gt;
&lt;p&gt;Now, how would you actually perform those rewrites? You can &lt;a href=&quot;https://unhandledexpression.com/2017/07/12/how-to-rewrite-you-project-in-rust/&quot;&gt;check out the next post&lt;/a&gt; and &lt;a href=&quot;http://spw17.langsec.org/papers/chifflier-parsing-in-2017.pdf&quot;&gt;the paper explaining how we did it in Rusticata and VLC&lt;/a&gt;.&lt;/p&gt;</content><author><name>{&quot;login&quot;=&gt;&quot;geaaal&quot;, &quot;email&quot;=&gt;&quot;geo.couprie@gmail.com&quot;, &quot;display_name&quot;=&gt;&quot;Géal&quot;, &quot;first_name&quot;=&gt;&quot;&quot;, &quot;last_name&quot;=&gt;&quot;&quot;}</name><email>geo.couprie@gmail.com</email></author><category term="security" /><category term="vulnerabilities" /><summary type="html">You might have seen those obnoxious &quot;you should rewrite it in Rust&quot; comments here and there: in a blogpost in a GitHub issue you might have heard about that Rust Evangelism Strike Force it even has a joke Twitter account It seems that with every new memory vulnerability in well-known software, there’s that one person saying Rust would have avoided the issue. We get it, it’s annoying, and it does not help us grow Rust. This attitude is generally frowned upon in the Rust community. You can't just show up in someone’s project telling them to rewrite everything. So, why am I writing this? Why would I try to convince you, now, that you should actually rewrite your software in Rust? That's because I have been working on this subject for a long time now: I gave multiple talks on it I even co-wrote a paper I did it both for clients and in personal work So, I'm committed to this, and yes, I believe you should rewrite some code in Rust. But there's a right way to do it. Why rewrite stuff? Our software systems are built on sand. We got pretty good at maintaining and fixing them over the years, but the cracks are showing. 
We still have not definitively fixed most of the low-level vulnerabilities: stack buffer overflow (yes, those still exist), heap overflow, use after free, double free, off by one; the list goes on. We have some tools, like DEP, ASLR, stack canaries, control flow integrity, fuzzing. Large projects with funding, like Chrome, can resort to sandboxing parts of their application. The rest of us can still run those applications inside a virtual machine. This situation will not improve. There's a huge amount of old (think 90s), poor-quality, barely maintained code that we reuse everywhere endlessly. The good thing with hardware is that at some point, it gets replaced. Software just gets copied again. Worse, with the development of IoT, a lot of the code that ships will never be updated. It's likely that some of those old libraries will still be there 15, 20 years from now. Let's not shy away from the issue here. Most of those are written in C or C++ (and usually an old version). It is well known that it is hard to write correct, reliable software in those languages. Think of all the security related things you have to keep track of in a C codebase: pointer arithmetic allocations and deallocations data is mutable by default functions return integers to mean pointers and error codes. Errors can be implicitly ignored type casts, overflows and underflows are hard to track buffer bounds in indexing and copying all the undefined behaviours Of course, some developers can do this work. Of course, there are sanitizers. But it's an enormous effort to perform every day for every project. Those languages are well suited for low level programming, but require extreme care and expertise to avoid most of those issues. And even then, we assume the developers will always be well rested, focused and careful. We're only human, after all. Note that in 2017, there are still people claiming that a C developer with sufficient expertise would avoid all those issues. 
It's time we put this idea to rest. Yes, some projects can avoid a lot of vulnerabilities, with a team of good developers, frequent code reviews, a restricted set of features, funding, tools, etc. Most projects cannot. And as I said earlier, a lot of the code is not even maintained. So we have to do something. We must make our software foundations stronger. That means fixing operating systems, drivers, libraries, command line tools, servers, everything. We might not be able to fix most of it today, or the next year, but maybe 10 years from now the situation will have improved. Unfortunately, we cannot rewrite everything. If you ever attempted to rewrite a project from scratch, you'd know that while you can avoid some of the mistakes you made before, you will probably introduce a lot of regressions and new bugs. It's also wrong on the human side: if there are maintainers for the projects, they would need to work on the new and old one at the same time. Worse, you would have to teach them the new approach, the new language (which they might not like), and plan for an upgrade to the new project for all users. This is not doable, and this is the part most people asking for project rewrites in Rust do not understand. What I'm advocating for is much simpler: surgically replace weaker parts but keep most of the project intact. How Most of the issues will happen around IO and input data handling, so it makes sense to focus on it. It happens there because that's where the code manipulates buffers, parsers, and uses a lot of pointer calculations. It is also the least interesting part for software maintainers, since it is usually not where you add useful features, business logic, etc. And this logic is usually working well, so you do not want to replace it. If we could rewrite a small part of an application or library without disrupting the rest of the code, we would get most of the benefits without the issues of a full rewrite. 
It is the exact same project, with the same interface, same distribution packaging as before, same developer team. We would just make an annoying part of the software stronger and more maintainable. This is where Rust comes in. It is focused on providing memory safety, thread safety while keeping the code performant and the developer productive. As such, it is generally easier to get safe, reliable code in production while writing basic Rust, than a competent, well rested C developer using all the tools available could do. Most of the other safe languages have strong requirements, like a runtime and a garbage collector. And usually, they expect to be the host application (how many languages assume they will handle the process's entry point?). Here, we are guests in someone else's house. We must integrate nicely and quietly. Rust is a strong candidate for this because: it can easily call C code it can easily be called by C code (it can export C compatible functions and structures) it does not need a garbage collector if you want, it does not even need to handle allocations the Rust compiler can produce static and dynamic libraries, and even object files the Rust compiler avoids most of the memory vulnerabilities you get in C (yes, I had to mention it) So you can actually take a piece of C code inside an existing project, import the C structures and functions to access them from Rust, rewrite the code in Rust, export the functions and structures from Rust, compile it and link it with the rest of the project. If you don't believe it's possible, take a look at these two examples: Rusticata integrates Rust parsers written with nom in Suricata, an intrusion detection system a VLC media player plugin to parse FLV files, written entirely in Rust You get a lot of benefits from this approach. First, Rust has great package management with Cargo and crates.io. That means you can separate some of the work in different libraries. 
See as an example the list of parsers from the Rusticata project. You can test them independently, and even reuse them in other projects. The FLV parser I wrote for VLC can also work in a Rust GStreamer plugin. You can also make a separate library for the glue with the host application. I'm working on vlc_module exactly for that purpose: making Rust VLC plugins easier to write. This approach works well for applications with a plugin oriented architecture, but you can also rewrite core parts of an application or library. The biggest issue is high coupling of C code, but it is usually easy to rewrite bit by bit by keeping a common interface. Whenever you have rewritten some coupled parts of a project, you can take time to refactor it in a more Rusty way, and leverage the type system to help you. A good example of this is the rewrite of the Zopfli library from C to Rust. This brings us to another important part of that infrastructure rewrite work: while we can rewrite part of an existing project without being too intrusive, we can also rewrite a library entirely, keeping exactly the same C API. You can have a Rust library, dynamic or static, with the exact same C header, that you could import in a project to replace the C one. This is a huge result. It's like replacing a load-bearing wall in an existing building. This is not an easy thing to achieve, but once it's done, you can improve a lot of projects at once, provided your distribution's package manager supports that replacement, or other projects take the time to upgrade. This is a lot of work, but every time we advance a little, everybody can benefit from it, and it will add up over the years. So we might as well start now. Currently, I'm focused on VLC. This is a good target because it's a popular application that's often part of the basic stack of any computer (browser, office suite, media player). So it's a big target. 
But take a look at the list of dependencies in most web applications, or the dependency graph of common distributions. There is a lot of low hanging fruit there. Now, how would you actually perform those rewrites? You can check out the next post and the paper explaining how we did it in Rusticata and VLC.</summary></entry><entry><title type="html">This year in nom: 2.0 is here!</title><link href="http://unhandledexpression.com/general/rust/security/2016/11/25/this-year-in-nom-2-0-is-here.html" rel="alternate" type="text/html" title="This year in nom: 2.0 is here!" /><published>2016-11-25T10:02:05+01:00</published><updated>2016-11-25T10:02:05+01:00</updated><id>http://unhandledexpression.com/general/rust/security/2016/11/25/this-year-in-nom-2-0-is-here</id><content type="html" xml:base="http://unhandledexpression.com/general/rust/security/2016/11/25/this-year-in-nom-2-0-is-here.html">&lt;p&gt;Nearly one year ago, on November 15th 2015, I released the 1.0 version of &lt;a href=&quot;https://github.com/Geal/nom&quot;&gt;nom, the fast parser combinators library&lt;/a&gt; I wrote in &lt;a href=&quot;http://rust-lang.org/&quot;&gt;Rust&lt;/a&gt;. A lot happened around that project, and I have been really happy to interact with nom users around the world.&lt;/p&gt;
&lt;p&gt;&lt;!--more--&gt;&lt;/p&gt;
&lt;p&gt;TL;DR: it's new nom day! The 2.0 release is here! Read the &lt;a href=&quot;https://github.com/Geal/nom/blob/master/CHANGELOG.md&quot;&gt;changelog&lt;/a&gt;. Follow the &lt;a href=&quot;https://github.com/Geal/nom/blob/ca1398538b0050b4009f67151063405766e0c84f/doc/upgrading_to_nom_2.md&quot;&gt;upgrade documentation&lt;/a&gt; if it breaks stuff.&lt;/p&gt;
&lt;p&gt;&lt;img class=&quot;aligncenter size-full wp-image-954&quot; src=&quot;/assets/celebrate.gif&quot; alt=&quot;celebrate&quot; width=&quot;499&quot; height=&quot;285&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Interesting usage&lt;/h2&gt;
&lt;p&gt;I wouldn't be able to list &lt;a href=&quot;https://github.com/search?utf8=%E2%9C%93&amp;amp;q=filename%3ACargo.toml+nom&quot;&gt;all the projects using nom&lt;/a&gt; on this page, even &lt;a href=&quot;https://crates.io/crates/nom/reverse_dependencies&quot;&gt;the subset present on crates.io&lt;/a&gt;, but here are a few examples of what people built with nom:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://crates.io/crates/semver&quot;&gt;semver&lt;/a&gt; briefly shipped with nom in February thanks to &lt;a href=&quot;http://twitter.com/steveklabnik&quot;&gt;Steve Klabnik&lt;/a&gt;, until he replaced it with a regexp based solution (no hard feelings, I'd have done the same)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/joelself/tomllib&quot;&gt;tomllib&lt;/a&gt;, a complete TOML implementation written by &lt;a href=&quot;https://twitter.com/JoelSelf&quot;&gt;Joel Self&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;a &lt;a href=&quot;https://github.com/maxmcc/rust-jvm&quot;&gt;JVM&lt;/a&gt;, because why not! Great work coming from a team of students at the University of Pennsylvania&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/tagua-vm/parser&quot;&gt;Tagua VM&lt;/a&gt;, a great PHP implementation in Rust by &lt;a href=&quot;https://twitter.com/mnt_io&quot;&gt;Ivan Enderlin&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/dtolnay/syn&quot;&gt;syn&lt;/a&gt;, the Rust item parser written by &lt;a href=&quot;https://github.com/dtolnay&quot;&gt;David Tolnay&lt;/a&gt; that everybody uses with the &lt;a href=&quot;https://github.com/rust-lang/rfcs/blob/master/text/1681-macros-1.1.md&quot;&gt;macros 1.1 feature&lt;/a&gt; to generate code from structures or enums, &lt;a href=&quot;https://github.com/dtolnay/syn/blob/7184b1381ea1552cd02336775a6fbf47e4bc9dfc/src/nom.rs&quot;&gt;actually ships with its own fork of nom&lt;/a&gt;! It was forked to remove the incomplete data handling and reduce compilation times&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/deech/shen-rust&quot;&gt;shen-rust&lt;/a&gt;, a complete implementation of the Shen language in Rust that was &lt;a href=&quot;https://www.thestrangeloop.com/2016/rusty-runtimes-building-languages-in-rust.html&quot;&gt;presented at Strangeloop 2016&lt;/a&gt; by &lt;a href=&quot;https://twitter.com/deech&quot;&gt;Aditya Siram&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/rusticata&quot;&gt;a series of parsers (DER, NTP, SNMP, IPSec, TLS)&lt;/a&gt; were developed for integration into the &lt;a href=&quot;https://suricata-ids.org/&quot;&gt;Suricata&lt;/a&gt; network analysis tool. This work was presented at &lt;a href=&quot;http://suricon.net/wp-content/uploads/2016/11/SuriCon2016_PierreChifflier.pdf&quot;&gt;Suricon 2016&lt;/a&gt; by &lt;a href=&quot;https://twitter.com/pollux7&quot;&gt;Pierre Chifflier&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And a lot of other projects. As a side note, people apparently like to build parsers for flac, bittorrent and bitcoin stuff, LISP and Scheme tokenizers and, oddly, ASN.1 libraries :D&lt;/p&gt;
&lt;p&gt;I have been really humbled by what people achieved with this little library, and I hope it will enable even more awesome projects!&lt;/p&gt;
&lt;h2&gt;Growth and stabilization&lt;/h2&gt;
&lt;p&gt;The goal before 1.0 was to get a usable parsing library, and after 1.0, to &lt;a href=&quot;https://github.com/Geal/nom/blob/master/CHANGELOG.md&quot;&gt;add features people were missing&lt;/a&gt; and explore new ideas. A lot of code was contributed for bitstream and string parsing, along with many useful combinators like &quot;peek!&quot;, &quot;separated_list!&quot; or &quot;tuple!&quot;.&lt;/p&gt;
&lt;p&gt;Unfortunately, a few parts of nom got increasingly painful to maintain and support, so the 2.0 was a good opportunity to clean them up, and add more features while we're at it.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;http://rust.unhandledexpression.com/nom/macro.chain!.html&quot;&gt;&quot;chain!&quot; combinator&lt;/a&gt;, which everybody uses to parse a sequence of things and accumulate the results in structs or tuples, is now deprecated, and will be replaced by &lt;a href=&quot;http://rust.unhandledexpression.com/nom/macro.do_parse!.html&quot;&gt;&quot;do_parse!&quot;&lt;/a&gt;, a simpler alternative. There are also a lot of specific helpers to make your code nicer, like &lt;a href=&quot;http://rust.unhandledexpression.com/nom/macro.pair!.html&quot;&gt;&quot;pair!&quot;&lt;/a&gt;, &lt;a href=&quot;http://rust.unhandledexpression.com/nom/macro.preceded!.html&quot;&gt;&quot;preceded!&quot;&lt;/a&gt;, &lt;a href=&quot;http://rust.unhandledexpression.com/nom/macro.delimited!.html&quot;&gt;&quot;delimited!&quot;&lt;/a&gt;, &lt;a href=&quot;http://rust.unhandledexpression.com/nom/macro.separated_pair!.html&quot;&gt;&quot;separated_pair!&quot;&lt;/a&gt; and &lt;a href=&quot;http://rust.unhandledexpression.com/nom/macro.separated_list!.html&quot;&gt;&quot;separated_list!&quot;&lt;/a&gt;. Yes, I went to great lengths to make sure you stop using chain :)&lt;/p&gt;
&lt;p&gt;The &quot;length_value!&quot; and other associated combinators were refactored, to have more sensible names and behaviours. &quot;eof&quot;, &quot;eol&quot; and the basic token parsers like &quot;digit&quot; or &quot;alphanumeric&quot; got the same treatment. Those can be a source of issues in the upgrade to 2.0, but if the new behaviour does not work in your project, replacing them is still easy with the &quot;is_a!&quot; combinator and others.&lt;/p&gt;
&lt;p&gt;Finally, I changed the name of the &quot;error!&quot; macro that was conflicting with the one from the log crate. I hoped that by waiting long enough, the log people would change their macro, but it looks like I lost :p&lt;/p&gt;
&lt;h2&gt;New combinators&lt;/h2&gt;
&lt;p&gt;A few new simple combinators are here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the previously mentioned &quot;do_parse!&quot; makes nicer code than &quot;chain!&quot;:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &quot;chain!&quot; version uses this weird closure-like syntax (while not actually using a closure) with a comma ending the parser list:&lt;/p&gt;
&lt;pre&gt;named!(filetype_parser&amp;lt;&amp;amp;[u8],FileType&amp;gt;, 
 chain!( 
 m: brand_name ~ 
 v: take!(4) ~ 
 c: many0!(brand_name) , 
 ||{ FileType{
   major_brand: m,
   major_brand_version:v,
   compatible_brands: c
 } } 
));&lt;/pre&gt;
&lt;p&gt;The &quot;do_parse!&quot; version only uses &quot;&amp;gt;&amp;gt;&quot; as separating token, and returns a value as a tuple. If the tuple contains only one value, (A) is conveniently equivalent to A.&lt;/p&gt;
&lt;pre&gt;named!(filetype_parser&amp;lt;&amp;amp;[u8],FileType&amp;gt;, 
 do_parse!( 
   m: brand_name &amp;gt;&amp;gt; 
   v: take!(4) &amp;gt;&amp;gt; 
   c: many0!(brand_name) &amp;gt;&amp;gt; 
   (FileType{
     major_brand: m,
     major_brand_version:v,
     compatible_brands: c
   }) 
));&lt;/pre&gt;
&lt;p&gt;&quot;chain!&quot; had too many features, like a &quot;?&quot; indicating a parser was optional (which you can now do with &quot;opt!&quot;), and you could declare one of the values as mutable. All of those and the awkward syntax made it hard to maintain. Still, it was one of the first useful combinators in nom, and it can now happily retire.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://rust.unhandledexpression.com/nom/macro.permutation!.html&quot;&gt;&quot;permutation!&quot;&lt;/a&gt; applies its child parsers in any order, as long as all of them succeed once&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;  fn permutation() {
    named!(perm&amp;lt;(&amp;amp;[u8], &amp;amp;[u8], &amp;amp;[u8])&amp;gt;,
      permutation!(tag!(&quot;abcd&quot;), tag!(&quot;efg&quot;), tag!(&quot;hi&quot;))
    );

    let expected = (&amp;amp;b&quot;abcd&quot;[..], &amp;amp;b&quot;efg&quot;[..], &amp;amp;b&quot;hi&quot;[..]);

    let a = &amp;amp;b&quot;abcdefghijk&quot;[..];
    assert_eq!(perm(a), Done(&amp;amp;b&quot;jk&quot;[..], expected));
    let b = &amp;amp;b&quot;efgabcdhijk&quot;[..];
    assert_eq!(perm(b), Done(&amp;amp;b&quot;jk&quot;[..], expected));
    let c = &amp;amp;b&quot;hiefgabcdjk&quot;[..];
    assert_eq!(perm(c), Done(&amp;b&quot;jk&quot;[..], expected));
}&lt;/pre&gt;
&lt;p&gt;This one was very interesting to write :)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://rust.unhandledexpression.com/nom/macro.tag_no_case!.html&quot;&gt;&quot;tag_no_case!&quot;&lt;/a&gt; works like &lt;a href=&quot;http://rust.unhandledexpression.com/nom/macro.tag!.html&quot;&gt;&quot;tag!&quot;&lt;/a&gt;, but compares case-insensitively. This works great for ASCII strings, since the comparison requires no allocation, but the UTF-8 case is trickier, and I'm still looking for a correct way to handle it&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://rust.unhandledexpression.com/nom/macro.named_attr!.html&quot;&gt;&quot;named_attr!&quot;&lt;/a&gt; creates functions like &lt;a href=&quot;http://rust.unhandledexpression.com/nom/macro.named!.html&quot;&gt;&quot;named!&quot;&lt;/a&gt; but can add attributes like documentation. This was a big pain point; now nom parsers can have documentation generated by rustdoc&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://rust.unhandledexpression.com/nom/macro.many_till!.html&quot;&gt;&quot;many_till!&quot;&lt;/a&gt; repeatedly applies its first child parser until the second succeeds&lt;/li&gt;
&lt;/ul&gt;
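&lt;p&gt;To illustrate why the ASCII case of &quot;tag_no_case!&quot; is the easy one, here is a minimal standalone sketch using only the standard library. It is not nom's implementation: the function name &lt;code&gt;tag_no_case_ascii&lt;/code&gt; and its &lt;code&gt;Option&lt;/code&gt; return type are assumptions made for the example. The comparison goes byte by byte and never allocates. With full Unicode, changing the case of a string can change its length (in Rust, uppercasing &quot;ß&quot; gives &quot;SS&quot;), so a simple in-place comparison like this one no longer works.&lt;/p&gt;

```rust
/// Hypothetical stand-in for a case-insensitive tag parser (not nom's code).
/// On success, returns (remaining input, matched slice).
fn tag_no_case_ascii<'a>(input: &'a [u8], tag: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    if input.len() < tag.len() {
        // Not enough data to compare against the tag.
        return None;
    }
    let (head, rest) = input.split_at(tag.len());
    // `eq_ignore_ascii_case` compares the two slices byte by byte,
    // treating ASCII letters as equal regardless of case, without allocating.
    if head.eq_ignore_ascii_case(tag) {
        Some((rest, head))
    } else {
        None
    }
}
```

&lt;p&gt;The matched slice is returned as it appears in the input, so the caller still sees the original casing.&lt;/p&gt;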
&lt;h2&gt;Whitespace separated formats&lt;/h2&gt;
&lt;p&gt;This is one of the biggest new additions, and a feature that people wanted for a long time. A lot of the other Rust parser libraries are designed with programming language parsing in mind, while I started nom mainly to parse binary formats, like video containers. Those libraries usually handle whitespace parsing for you, and you only need to specify the different elements of your grammars. You essentially work on a list of already separated elements.&lt;/p&gt;
&lt;p&gt;Previously, with nom, you had to explicitly parse the spaces, tabs and line endings, which made the parsers harder to maintain. What we want in the following example is to recognize a &quot;(&quot;, an expression, then a &quot;)&quot;, and return the expression, but we have to introduce a lot more code:&lt;/p&gt;
&lt;pre&gt;named!(parens&amp;lt;i64&amp;gt;, delimited!(
    delimited!(opt!(multispace), tag!(&quot;(&quot;), opt!(multispace)),
    expr,
    delimited!(opt!(multispace), tag!(&quot;)&quot;), opt!(multispace))
  )
);&lt;/pre&gt;
&lt;p&gt;This new release introduces &lt;a href=&quot;http://rust.unhandledexpression.com/nom/macro.ws!.html&quot;&gt;&quot;ws!&quot;&lt;/a&gt;, a combinator that will automatically insert the separators everywhere:&lt;/p&gt;
&lt;pre&gt;named!(parens&amp;lt;i64&amp;gt;, ws!(delimited!( tag!(&quot;(&quot;), expr, tag!(&quot;)&quot;) )) );&lt;/pre&gt;
&lt;p&gt;&lt;img class=&quot;aligncenter size-full wp-image-953&quot; src=&quot;/assets/magic.gif&quot; alt=&quot;magic&quot; width=&quot;350&quot; height=&quot;196&quot; /&gt;By default, it removes spaces, tabs, carriage returns and line feeds, but you can easily specify your own separator parser and make your own version of &quot;ws!&quot;.&lt;/p&gt;
&lt;p&gt;This makes whitespace separated formats very easy to write. See for example the &lt;a href=&quot;https://github.com/Geal/nom/blob/ac8fe712b9f8b3da661828ffc5b97a825007b590/tests/json.rs&quot;&gt;quickly put together, probably not spec compliant JSON parser&lt;/a&gt; I added as test.&lt;/p&gt;
&lt;p&gt;If you're working on a language parser, this should help you greatly.&lt;/p&gt;
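&lt;p&gt;The idea behind whitespace handling can be sketched with plain functions and the standard library only. This is an illustration under stated assumptions, not how &quot;ws!&quot; is implemented: the &lt;code&gt;PResult&lt;/code&gt; alias and the &lt;code&gt;skip_ws&lt;/code&gt;, &lt;code&gt;ws&lt;/code&gt;, &lt;code&gt;tag&lt;/code&gt;, &lt;code&gt;number&lt;/code&gt; and &lt;code&gt;parens&lt;/code&gt; functions are all hypothetical names, and here the wrapper is applied by hand to each token instead of being inserted automatically.&lt;/p&gt;

```rust
/// Result of a hypothetical parser: on success, (remaining input, value).
type PResult<'a, T> = Option<(&'a str, T)>;

/// Skips leading spaces, tabs, carriage returns and line feeds.
fn skip_ws(input: &str) -> &str {
    input.trim_start_matches(|c: char| c == ' ' || c == '\t' || c == '\r' || c == '\n')
}

/// Wraps `parser` so that separators are consumed before and after it.
fn ws<'a, T>(parser: impl Fn(&'a str) -> PResult<'a, T>) -> impl Fn(&'a str) -> PResult<'a, T> {
    move |input| {
        let (rest, value) = parser(skip_ws(input))?;
        Some((skip_ws(rest), value))
    }
}

/// Recognizes a fixed string at the start of the input.
fn tag<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input| {
        input
            .strip_prefix(expected)
            .map(|rest| (rest, &input[..expected.len()]))
    }
}

/// Parses a run of ASCII digits as an i64; fails if there are no digits.
fn number(input: &str) -> PResult<'_, i64> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let value = input[..end].parse::<i64>().ok()?;
    Some((&input[end..], value))
}

/// "(" number ")" with optional whitespace around every token.
fn parens(input: &str) -> PResult<'_, i64> {
    let (rest, _) = ws(tag("("))(input)?;
    let (rest, value) = ws(number)(rest)?;
    let (rest, _) = ws(tag(")"))(rest)?;
    Some((rest, value))
}
```

&lt;p&gt;Swapping &lt;code&gt;skip_ws&lt;/code&gt; for another function (one that also skips comments, for example) is the equivalent, in this sketch, of providing your own separator parser.&lt;/p&gt;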
&lt;h2&gt;Architecture changes&lt;/h2&gt;
&lt;h3&gt;Error management&lt;/h3&gt;
&lt;p&gt;The error management system that accumulates errors and input positions as it backtracks through the parser tree is great for some projects like language parsers, but others were not using it and paid a penalty because of vector allocation and deallocation.&lt;/p&gt;
&lt;p&gt;In the 2.0 release, this error management system is now activated by the &quot;verbose-errors&quot; feature. Projects that don't use it should build correctly right away, and their parsers could get 30% to 50% faster!&lt;/p&gt;
&lt;h3&gt;Input types&lt;/h3&gt;
&lt;p&gt;One of nom's original assumptions was that it should work on byte slices and strings instead of byte or char iterators, because the CPU likes contiguous data. As always, the reality is a bit more complex than that, but it worked well and made the code very simple: I only passed subslices from one parser to the next.&lt;/p&gt;
&lt;p&gt;But I wrongly assumed that because of that design, nom could only work on contiguous data. &lt;a href=&quot;https://twitter.com/carllerche&quot;&gt;Carl Lerche&lt;/a&gt; made the interesting point that there are few points where nom actually needs to read a series of bytes or chars and those could accommodate other data structures like ropes or a list of buffers.&lt;/p&gt;
&lt;p&gt;So I got to work on an abstraction for input types that would work for &amp;amp;[u8] and &amp;amp;str, but also for other types. In the process, I was able to factor most of the &amp;amp;str specific combinators with the &amp;amp;[u8] ones. This will make them easier to maintain in the future.&lt;/p&gt;
&lt;p&gt;The result of that work is &lt;a href=&quot;https://github.com/Geal/nom/blob/2e2730cdb451a555f68ff8cc27f852d3d292df42/src/traits.rs&quot;&gt;a list of traits&lt;/a&gt; that any input type should implement to be usable with nom. I &lt;a href=&quot;https://github.com/Geal/nom/blob/2e2730cdb451a555f68ff8cc27f852d3d292df42/tests/blockbuf-arithmetic.rs#L17-L187&quot;&gt;experimented a bit with the BlockBuf type&lt;/a&gt;, and this approach looks promising. I expect that people will find cool applications for this, like parsers returning references to not yet loaded data, or blocking a coroutine on a tag comparison until the data is available.&lt;/p&gt;
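&lt;p&gt;As a rough sketch of what such an abstraction can look like (nom's real traits are different and more numerous; the &lt;code&gt;Input&lt;/code&gt; trait, its two methods and the generic &lt;code&gt;tag&lt;/code&gt; function below are hypothetical names invented for this example), a single combinator can be written once against a trait and then used with both &amp;amp;[u8] and &amp;amp;str:&lt;/p&gt;

```rust
/// Hypothetical minimal input abstraction (not nom's actual trait list).
trait Input: Sized {
    /// Returns true if the input begins with the given bytes.
    fn starts_with_bytes(&self, tag: &[u8]) -> bool;
    /// Splits the input at a byte offset: (prefix, rest).
    fn split_at_byte(self, index: usize) -> (Self, Self);
}

impl<'a> Input for &'a [u8] {
    fn starts_with_bytes(&self, tag: &[u8]) -> bool {
        self.starts_with(tag)
    }
    fn split_at_byte(self, index: usize) -> (Self, Self) {
        self.split_at(index)
    }
}

impl<'a> Input for &'a str {
    fn starts_with_bytes(&self, tag: &[u8]) -> bool {
        self.as_bytes().starts_with(tag)
    }
    fn split_at_byte(self, index: usize) -> (Self, Self) {
        // `str::split_at` panics if `index` is not on a char boundary.
        // `tag` below only splits after a matched, valid UTF-8 prefix.
        self.split_at(index)
    }
}

/// Generic tag parser: written once, usable with any `Input`.
/// On success, returns (remaining input, matched prefix).
fn tag<I: Input>(input: I, expected: &str) -> Option<(I, I)> {
    if input.starts_with_bytes(expected.as_bytes()) {
        let (matched, rest) = input.split_at_byte(expected.len());
        Some((rest, matched))
    } else {
        None
    }
}
```

&lt;p&gt;A rope or a list of buffers would only need its own implementation of the trait; the combinator itself would not change.&lt;/p&gt;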
&lt;h2&gt;A smooth upgrade process&lt;/h2&gt;
&lt;p&gt;For the 1.0 release, I chose a few projects using nom, and tried to build them to test the new version and document the upgrade. This was so useful that I did it again for 2.0, so if you're lucky, you maintain one of the 30 crates I tested, and you received a pull request doing that upgrade for you. Otherwise, I wrote &lt;a href=&quot;https://github.com/Geal/nom/blob/ca1398538b0050b4009f67151063405766e0c84f/doc/upgrading_to_nom_2.md&quot;&gt;an upgrade guide&lt;/a&gt; that you can follow to fix the migration issues. You're still lucky, though, because most crates will build (or only require a one line fix in Cargo.toml).&lt;/p&gt;
&lt;p&gt;&lt;img class=&quot;aligncenter size-full wp-image-955&quot; src=&quot;/assets/fixingstuff.gif&quot; alt=&quot;fixingstuff&quot; width=&quot;240&quot; height=&quot;180&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I'll write soon about that process and the benefits you can get by applying it to your projects.&lt;/p&gt;
&lt;h2&gt;The future&lt;/h2&gt;
&lt;p&gt;I have a lot of ideas for the next version, also a lot of pull requests to merge and issues to fix. Not everything could make it into the 2.0, otherwise I would never have released it.&lt;/p&gt;
&lt;p&gt;In short, the plan:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;completely rewrite the producers and consumers system. It is not very usable right now. It could be replaced by an implementation based on futures&lt;/li&gt;
&lt;li&gt;improve the performance. I got a good enough library by choosing the most naive solutions, but there are lots of points I could improve (especially in helping LLVM generate faster code)&lt;/li&gt;
&lt;li&gt;implement a new serialization library. I believe there is some room for a serialization system that does not rely on automatic code generation, and it would go well with nom&lt;/li&gt;
&lt;li&gt;continue my work on writing &lt;a href=&quot;https://github.com/Geal/rust-vlc-demux&quot;&gt;nom demuxers for VLC media player&lt;/a&gt;. I have a good proof of concept, now I need to make it production ready&lt;/li&gt;
&lt;li&gt;add new, interesting examples: indentation based programming languages, tokio integration, integration in high performance networking systems&lt;/li&gt;
&lt;li&gt;I'll very soon release a large networking tool that relies heavily on nom. Expect some big news :)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That's it, now go and upgrade your code, you'll enjoy this new version!&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</content><author><name>{&quot;login&quot;=&gt;&quot;geaaal&quot;, &quot;email&quot;=&gt;&quot;geo.couprie@gmail.com&quot;, &quot;display_name&quot;=&gt;&quot;Géal&quot;, &quot;first_name&quot;=&gt;&quot;&quot;, &quot;last_name&quot;=&gt;&quot;&quot;}</name><email>geo.couprie@gmail.com</email></author><category term="nom" /><category term="parser" /><category term="performance" /><category term="security" /><summary type="html">Nearly one year ago, on November 15th 2015, I released the 1.0 version of nom, the fast parser combinators library I wrote in Rust. A lot happened around that project, and I have been really happy to interact with nom users around the world. TL;DR: it's new nom day! The 2.0 release is here! Read the changelog. Follow the upgrade documentation if it breaks stuff. Interesting usage I wouldn't be able to list all the projects using nom on this page, even the subset present on crates.io, but here are a few examples of what people built with nom: semver briefly shipped with nom in February thanks to Steve Klabnik, until he replaced it with a regexp based solution (no hard feelings, I'd have done the same) tomllib, a complete TOML implementation written by Joel Self a JVM, because why not! Great work coming from a team of students at the University of Pennsylvania Tagua VM, a great PHP implementation in Rust by Ivan Enderlin  syn, the Rust item parser written by David Tolnay everybody uses with the macros 1.1 feature to generate code from structures or enums, actually ships with its own fork of nom! It was forked to remove the incomplete data handling, and reduce compilation times shen-rust, a complete implementation of the Shen language in Rust that was presented at Strangeloop 2016 by Aditya Siram a series of parsers (DER, NTP, SNMP, IPSec, TLS) were developed for its integration in the Suricata network analysis tool. This work was presented at Suricon 2016 by Pierre Chifflier And a lot of other projects. 
As a side note, people apparently like to build parsers for flac, bittorrent and bitcoin stuff, LISP and Scheme tokenizers and, oddly, ASN.1 libraries :D I have been really humbled by what people achieved with this little library, and I hope it will enable even more awesome projects! Growth and stabilization The goal before 1.0 was to get a usable parsing library, and after 1.0, to add features people were missing and explore new ideas. A lot of code was contributed for bitstream and string parsing, and adding a lot of useful combinators like &quot;peek!&quot;, &quot;separated_list!&quot; or &quot;tuple!&quot;. Unfortunately, a few parts of nom got increasingly painful to maintain and support, so the 2.0 was a good opportunity to clean them up, and add more features while we're at it. The &quot;chain!&quot; combinator, which everybody uses to parse a sequence of things and accumulate the results in structs or tuple, is now deprecated, and will be replaced by &quot;do_parse!&quot;, a simpler alternative. There are also a lot of specific helpers to make your code nicer, like &quot;pair!&quot;, &quot;preceded!&quot;, &quot;delimited!&quot;, &quot;separated_pair!&quot;, &quot;separated_list!&quot; and &quot;delimited!&quot;. Yes, I went to great lengths to make sure you stop using chain :) The &quot;length_value!&quot; and other associated combinators were refactored, to have more sensible names and behaviours. &quot;eof&quot;, eol&quot; and the basic token parsers like &quot;digit&quot; or &quot;alphanumeric&quot; got the same treatment. Those can be a source of issues in the upgrade to 2.0, but if the new behaviour does not work in your project, replacing them is still easy with the &quot;is_a!&quot; combinator and others. At last, I changed the name of the &quot;error!&quot; macro that was conflicting with the one from the log crate. 
I hoped that by waiting long enough, the log people would change their macro, but it looks like I lost :p New combinators A few new simple combinators are here: the previously mentioned &quot;do_parse!&quot; makes nicer code than &quot;chain!&quot;: The &quot;chain!&quot; version uses this weird closure-like syntax (while not actually using a closure) with a comma ending the parser list: named!(filetype_parser&amp;lt;&amp;amp;[u8],FileType&amp;gt;, chain!( m: brand_name ~ v: take!(4) ~ c: many0!(brand_name) , ||{ FileType{ major_brand: m, major_brand_version:v, compatible_brands: c } } )); The &quot;do_parse!&quot; version only uses &quot;&amp;gt;&amp;gt;&quot; as separating token, and returns a value as a tuple. If the tuple contains only value, (A) is conveniently equivalent to A. named!(filetype_parser&amp;lt;&amp;amp;[u8],FileType&amp;gt;, do_parse!( m: brand_name &amp;gt;&amp;gt; v: take!(4) &amp;gt;&amp;gt; c: many0!(brand_name) &amp;gt;&amp;gt; (FileType{ major_brand: m, major_brand_version:v, compatible_brands: c }) )); &quot;chain!&quot; had too many features, like a &quot;?&quot; indicating a parser was optional (which you can now do with &quot;opt!&quot;), and you could declare one of the values as mutable. All of those and the awkward syntax made it hard to maintain. 
Still, it was one of the first useful combinators in nom, and it can now happily retire &quot;permutation!&quot; applies its child parser in any order, as long as all of them succeed once fn permutation() { named!(perm&amp;lt;(&amp;amp;[u8], &amp;amp;[u8], &amp;amp;[u8])&amp;gt;, permutation!(tag!(&quot;abcd&quot;), tag!(&quot;efg&quot;), tag!(&quot;hi&quot;)) ); let expected = (&amp;amp;b&quot;abcd&quot;[..], &amp;amp;b&quot;efg&quot;[..], &amp;amp;b&quot;hi&quot;[..]); let a = &amp;amp;b&quot;abcdefghijk&quot;[..]; assert_eq!(perm(a), Done(&amp;amp;b&quot;jk&quot;[..], expected)); let b = &amp;amp;b&quot;efgabcdhijk&quot;[..]; assert_eq!(perm(b), Done(&amp;amp;b&quot;jk&quot;[..], expected)); let c = &amp;amp;b&quot;hiefgabcdjk&quot;[..]; assert_eq!(perm(c), Done(&amp;amp;b&quot;jk&quot;[..], expected) } This one was very interesting to write :) &quot;tag_no_case!&quot; works like &quot;tag!&quot;, but compares independently from the case. This works great for ASCII strings, since the comparison requires no allocation, but the UTF-8 case is trickier, and I'm still looking for a correct way to handle it &quot;named_attr!&quot; creates functions like &quot;named!&quot; but can add attributes like documentation. This was a big pain point, now nom parsers can have documentation generated by rustdoc &quot;many_till!&quot; applies repeatedly its first child parser until the second succeeds Whitespace separated formats This is one of the biggest new additions, and a feature that people wanted for a long time. A lot of the other Rust parser libraries are designed with programming languages parsing in mind, while I started nom mainly to parse binary formats, like video containers. Those libraries usually handle whitespace parsing for you, and you only need to specify the different elements of your grammars. You essentially work on a list of already separated elements. 
Previously, with nom, you had to explicitely parse the spaces, tabs and end of lines, which made the parsers harder to maintain. What we want in the following example is to recognize a &quot;(&quot;, an expression, then a &quot;)&quot;, and return the expression, but we have to introduce a lot more code: named!(parens&amp;lt;i64&amp;gt;, delimited!( delimited!(opt!(multispace), tag!(&quot;(&quot;), opt!(multispace)), expr, delimited!(opt!(multispace), tag!(&quot;)&quot;), opt!(multispace)) ) ); This new release introduces &quot;ws!&quot;, a combinator that will automatically insert the separators everywhere: named!(parens&amp;lt;i64&amp;gt;, ws!(delimited!( tag!(&quot;(&quot;), expr, tag!(&quot;)&quot;) )) ); By default, it removes spaces, tabs, carriage returns and line feed, but you can easily specify your own separator parser and make your own version of &quot;ws!&quot;. This makes whitespace separated formats very easy to write. See for example the quickly put together, probably not spec compliant JSON parser I added as test. If you're working on a language parsers, this should help you greatly. Architecture changes Error management The error management system that accumulated errors and input positions as it backtracks through the parser tree is great for some projects like language parsers, but others were not using it and got a penalty because of vectors allocation and deallocation. In the 2.0 release, this error management system is now activated by the &quot;verbose-errors&quot; feature. Projects that don't use it should build correctly right away, and their parsers could get 30% to 50% faster! Input types One of nom's original assumptions was that it should work on byte slices and strings instead of byte or char iterators, because the CPU likes contiguous data. As always, the reality is a bit more complex than that, but it worked well and made the code very simple: I only passed subslices from one parser to the next. 
But I wrongly assumed that because of that design, nom could only work on contiguous data. Carl Lerche made the interesting point that there are few points where nom actually needs to read a serie of bytes or chars and those could accomodate other data structures like ropes or a list of buffers. So I got to work on an abstraction for input types that would work for &amp;amp;[u8] and &amp;amp;str, but also for other types. In the process, I was able to factor most of the &amp;amp;str specific combinators with the &amp;amp;[u8] ones. This will make them easier to maintain in the future. The result of that work is a list of traits that any input type should implement to be usable with nom. I experimented a bit with the BlockBuf type, and this approach looks promising. I expect that people will find cool applications for this, like parsers returning references to not yet loaded data, or blocking a coroutine on a tag comparison until the data is available. A smooth upgrade process For the 1.0 release, I choose a few projects using nom, and tried to build them to test the new version and document the upgrade. This was so useful that I did it again for 2.0, so if you're lucky, you maintain one of the 30 crates I tested, and you received a pull request doing that upgrade for you. Otherwise, I wrote an upgrade documentation that you can follow to fix the migration issues. You're still lucky, though, because most crates will build (or only require a one line fix in Cargo.toml). I'll write soon about that process and the benefits you can get by applying it to your projects. The future I have a lot of ideas for the next version, also a lot of pull requests to merge and issues to fix. Not everything could make it into the 2.0, otherwise I would never have released it. In short, the plan: rewrite completely the producers and consumers system. It is not very usable right now. It could be replaced by an implementation based on futures improve the performance. 
I got a good enough library by choosing the most naive solutions, but there are lots of points I could improve (especially in helping LLVM generate faster code) implement a new serialization library. I believe there is some room for a serialization system that does not rely on automatic code generation, and it would go well with nom continue my work on writing nom demuxers for VLC media player. I have a good proof of concept, now I need to make it production ready add new, interesting examples: indentation based programming languages, tokio integration, integration in high performance networking systems I'll release very soon a large networking tool that relies heavily on nom. Expect some big news :) That's it, now go and upgrade your code, you'll enjoy this new version! &amp;nbsp;</summary></entry></feed>