This year in nom: 2.0 is here!

Nearly one year ago, on November 15th 2015, I released the 1.0 version of nom, the fast parser combinators library I wrote in Rust. A lot happened around that project, and I have been really happy to interact with nom users around the world.

TL;DR: it's new nom day! The 2.0 release is here! Read the changelog. Follow the upgrade documentation if it breaks stuff.


Interesting usage

I wouldn't be able to list all the projects using nom on this page, even the subset present on, but here are a few examples of what people built with nom:

And a lot of other projects. As a side note, people apparently like to build parsers for flac, bittorrent and bitcoin stuff, LISP and Scheme tokenizers and, oddly, ASN.1 libraries :D

I have been really humbled by what people achieved with this little library, and I hope it will enable even more awesome projects!

Growth and stabilization

The goal before 1.0 was to get a usable parsing library, and after 1.0, to add features people were missing and explore new ideas. A lot of code was contributed for bitstream and string parsing, and adding a lot of useful combinators like "peek!", "separated_list!" or "tuple!".

Unfortunately, a few parts of nom got increasingly painful to maintain and support, so the 2.0 was a good opportunity to clean them up, and add more features while we're at it.

The "chain!" combinator, which everybody uses to parse a sequence of things and accumulate the results in structs or tuple, is now deprecated, and will be replaced by "do_parse!", a simpler alternative. There are also a lot of specific helpers to make your code nicer, like "pair!", "preceded!", "delimited!", "separated_pair!", "separated_list!" and "delimited!". Yes, I went to great lengths to make sure you stop using chain :)

The "length_value!" and other associated combinators were refactored, to have more sensible names and behaviours. "eof", eol" and the basic token parsers like "digit" or "alphanumeric" got the same treatment. Those can be a source of issues in the upgrade to 2.0, but if the new behaviour does not work in your project, replacing them is still easy with the "is_a!" combinator and others.

At last, I changed the name of the "error!" macro that was conflicting with the one from the log crate. I hoped that by waiting long enough, the log people would change their macro, but it looks like I lost :p

New combinators

A few new simple combinators are here:

  • the previously mentioned "do_parse!" makes nicer code than "chain!":

The "chain!" version uses this weird closure-like syntax (while not actually using a closure) with a comma ending the parser list:

 m: brand_name ~ 
 v: take!(4) ~ 
 c: many0!(brand_name) , 
 ||{ FileType{
   major_brand: m,
   compatible_brands: c
 } } 

The "do_parse!" version only uses ">>" as separating token, and returns a value as a tuple. If the tuple contains only value, (A) is conveniently equivalent to A.

   m: brand_name >> 
   v: take!(4) >> 
   c: many0!(brand_name) >> 
     major_brand: m,
     compatible_brands: c

"chain!" had too many features, like a "?" indicating a parser was optional (which you can now do with "opt!"), and you could declare one of the values as mutable. All of those and the awkward syntax made it hard to maintain. Still, it was one of the first useful combinators in nom, and it can now happily retire

  • "permutation!" applies its child parser in any order, as long as all of them succeed once
  fn permutation() {
    named!(perm<(&[u8], &[u8], &[u8])>,
      permutation!(tag!("abcd"), tag!("efg"), tag!("hi"))

    let expected = (&b"abcd"[..], &b"efg"[..], &b"hi"[..]);

    let a = &b"abcdefghijk"[..];
    assert_eq!(perm(a), Done(&b"jk"[..], expected));
    let b = &b"efgabcdhijk"[..];
    assert_eq!(perm(b), Done(&b"jk"[..], expected));
    let c = &b"hiefgabcdjk"[..];
    assert_eq!(perm(c), Done(&b"jk"[..], expected)

This one was very interesting to write :)

  • "tag_no_case!" works like "tag!", but compares independently from the case. This works great for ASCII strings, since the comparison requires no allocation, but the UTF-8 case is trickier, and I'm still looking for a correct way to handle it
  • "named_attr!" creates functions like "named!" but can add attributes like documentation. This was a big pain point, now nom parsers can have documentation generated by rustdoc
  • "many_till!" applies repeatedly its first child parser until the second succeeds

Whitespace separated formats

This is one of the biggest new additions, and a feature that people wanted for a long time. A lot of the other Rust parser libraries are designed with programming languages parsing in mind, while I started nom mainly to parse binary formats, like video containers. Those libraries usually handle whitespace parsing for you, and you only need to specify the different elements of your grammars. You essentially work on a list of already separated elements.

Previously, with nom, you had to explicitely parse the spaces, tabs and end of lines, which made the parsers harder to maintain. What we want in the following example is to recognize a "(", an expression, then a ")", and return the expression, but we have to introduce a lot more code:

named!(parens<i64>, delimited!(
    delimited!(opt!(multispace), tag!("("), opt!(multispace)),
    delimited!(opt!(multispace), tag!(")"), opt!(multispace))

This new release introduces "ws!", a combinator that will automatically insert the separators everywhere:

named!(parens<i64>, ws!(delimited!( tag!("("), expr, tag!(")") )) );

magicBy default, it removes spaces, tabs, carriage returns and line feed, but you can easily specify your own separator parser and make your own version of "ws!".

This makes whitespace separated formats very easy to write. See for example the