Speeding up fuzzing Rust with shared initialization
Having ported the Pi searcher to Rust, I've started exploring fuzzing frameworks to help find bugs. I settled on Rust's cargo-fuzz using the libFuzzer backend, since I'm familiar with libFuzzer from using it on TensorFlow.
Naturally, since the Pi Searcher had been comparatively stable for 7 years in Go, and the translation from Go to Rust was fairly straightforward... I introduced a bug that my tests didn't find, and the fuzzer found over dinner:
Whoopsie. That bug arose because of an error in handling the boundary condition of searching for just "9" using the index (which doesn't happen normally because the search function uses a simple sequential search for short strings):
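To make the failure mode concrete, here is a toy reconstruction, not the Pi Searcher's actual code. It assumes digits are packed two per byte (which would explain 200 million digits fitting in 100 million bytes) and that the index is a plain sorted suffix array; `PackedDigits`, `build_suffix_array`, and `count_matches` are hypothetical names. Because 9 is the largest digit, suffixes starting with "9" sort to the very end of the index, which is exactly the region a too-small upper bound cuts off.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the compressed digit file:
/// two decimal digits per byte, one per nibble (an assumption).
struct PackedDigits {
    bytes: Vec<u8>,
}

impl PackedDigits {
    fn from_digits(s: &str) -> Self {
        let ds: Vec<u8> = s.bytes().map(|c| c - b'0').collect();
        assert!(ds.len() % 2 == 0, "sketch assumes an even digit count");
        let bytes = ds.chunks(2).map(|p| (p[0] << 4) | p[1]).collect();
        PackedDigits { bytes }
    }

    /// Size of the "file" in bytes: half the number of digits.
    fn num_bytes(&self) -> usize {
        self.bytes.len()
    }

    /// Number of digits actually stored.
    fn num_digits(&self) -> usize {
        self.bytes.len() * 2
    }

    fn digit(&self, i: usize) -> u8 {
        let b = self.bytes[i / 2];
        if i % 2 == 0 { b >> 4 } else { b & 0x0f }
    }
}

/// Naive suffix array: every digit position, sorted by the suffix starting there.
fn build_suffix_array(d: &PackedDigits) -> Vec<u32> {
    let n = d.num_digits();
    let mut sa: Vec<u32> = (0..n as u32).collect();
    sa.sort_by(|&a, &b| {
        let left = (a as usize..n).map(|i| d.digit(i));
        let right = (b as usize..n).map(|i| d.digit(i));
        left.cmp(right)
    });
    sa
}

/// Count occurrences of `pat` by binary-searching the first `limit` entries
/// of the suffix array. The correct `limit` is the number of digits.
fn count_matches(d: &PackedDigits, sa: &[u32], pat: &[u8], limit: usize) -> usize {
    let sa = &sa[..limit];
    let n = d.num_digits();
    // Compare the first pat.len() digits of a suffix against the pattern.
    let prefix_cmp = |start: u32| -> Ordering {
        (start as usize..n)
            .map(|i| d.digit(i))
            .take(pat.len())
            .cmp(pat.iter().copied())
    };
    let lo = sa.partition_point(|&s| prefix_cmp(s) == Ordering::Less);
    let hi = sa.partition_point(|&s| prefix_cmp(s) != Ordering::Greater);
    hi - lo
}
```

With the 14 digits "14159265358979", `count_matches(&d, &sa, &[9], d.num_digits())` finds all three 9s, while passing `d.num_bytes()` (7) as the bound finds none, since the truncated range never reaches the suffixes that start with 9.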
Here, num_digits is 200 million, but they're stored compressed, so the bug was using the size of the file (100 million bytes), which caused the binary search to return a bad range. Fuzzing is awesome, and adapting the fuzzer to the Pi searcher was straightforward:
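As a rough sketch of the shape of such a target (not the original code): a cargo-fuzz target wraps a closure over raw bytes in libfuzzer-sys's `fuzz_target!` macro. To keep the example dependency-free, the closure body is written here as a plain function, `fuzz_one`, and `PiSearcher` is a tiny hypothetical stand-in backed by an in-memory vector rather than two mmapped files. The `OPENS` counter exists only to show how often setup runs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Illustration only: counts how many times the expensive setup ran.
static OPENS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the real PiSearcher, which mmaps two large files.
struct PiSearcher {
    digits: Vec<u8>,
}

impl PiSearcher {
    /// Stand-in for the expensive open-and-mmap step.
    fn open() -> Self {
        OPENS.fetch_add(1, Ordering::Relaxed);
        PiSearcher { digits: b"14159265358979".to_vec() }
    }

    /// Position of the first occurrence of `needle`, if any.
    fn search(&self, needle: &[u8]) -> Option<usize> {
        if needle.is_empty() || needle.len() > self.digits.len() {
            return None;
        }
        self.digits.windows(needle.len()).position(|w| w == needle)
    }
}

/// Body of the fuzz target. In a cargo-fuzz project this would be wrapped as
/// `fuzz_target!(|data: &[u8]| { fuzz_one(data); });`
fn fuzz_one(data: &[u8]) {
    // Only digit strings are meaningful search inputs.
    if data.is_empty() || !data.iter().all(u8::is_ascii_digit) {
        return;
    }
    // Per-call setup: a fresh searcher for every single input.
    let searcher = PiSearcher::open();
    if let Some(pos) = searcher.search(data) {
        // Sanity check: a reported hit must really match the input.
        assert_eq!(&searcher.digits[pos..pos + data.len()], data);
    }
}
```

Every valid input pays for `PiSearcher::open()` again, so three inputs mean three opens.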
But one thing that bugged me: Fuzzing was a little slow.
Why was it slow? Every time the fuzz function ran with a new input, it had to instantiate a new PiSearcher. Each PiSearcher opens and memory-maps (with mmap) two files: 100MB of compressed digits of Pi and 800MB of a suffix array index into those bytes. This adds a bunch of unnecessary overhead compared to the relatively fast searching code.
The libFuzzer documentation suggests that initialization should use a statically initialized global object. But this is Rust. One does not simply willy-nilly toss around globals. Some quick Googling didn't turn up much that was useful, so I put it aside, as this was merely a speed optimization.
Fortunately, the most recent "This Week in Rust" blog pointed to Paul Kernfeld's very useful Guide to Global Data in Rust, which, in turn, linked to the once_cell crate. Just what the fuzzer ordered! A quick rewrite of the fuzzing target yielded:
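A minimal sketch of the shared-initialization pattern, under a few loudly stated assumptions: it uses `std::sync::OnceLock` (in the standard library since Rust 1.70) in place of the once_cell crate so that it builds without dependencies, `PiSearcher` is a hypothetical in-memory stand-in, and `fuzz_one` stands for the closure body that cargo-fuzz's `fuzz_target!` macro would wrap. The `OPENS` counter is there only to show that setup runs once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Illustration only: counts how many times the expensive setup ran.
static OPENS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the real PiSearcher, which mmaps two large files.
struct PiSearcher {
    digits: Vec<u8>,
}

impl PiSearcher {
    /// Stand-in for the expensive open-and-mmap step.
    fn open() -> Self {
        OPENS.fetch_add(1, Ordering::Relaxed);
        PiSearcher { digits: b"14159265358979".to_vec() }
    }

    /// Position of the first occurrence of `needle`, if any.
    fn search(&self, needle: &[u8]) -> Option<usize> {
        if needle.is_empty() || needle.len() > self.digits.len() {
            return None;
        }
        self.digits.windows(needle.len()).position(|w| w == needle)
    }
}

/// Process-wide searcher, created on first use and never mutated afterwards.
static SEARCHER: OnceLock<PiSearcher> = OnceLock::new();

fn searcher() -> &'static PiSearcher {
    // The closure runs at most once; later calls return the same reference.
    SEARCHER.get_or_init(PiSearcher::open)
}

/// Body of the fuzz target. In a cargo-fuzz project this would be wrapped as
/// `fuzz_target!(|data: &[u8]| { fuzz_one(data); });`
fn fuzz_one(data: &[u8]) {
    // Only digit strings are meaningful search inputs.
    if data.is_empty() || !data.iter().all(u8::is_ascii_digit) {
        return;
    }
    let searcher = searcher(); // shared, read-only
    if let Some(pos) = searcher.search(data) {
        // Sanity check: a reported hit must really match the input.
        assert_eq!(&searcher.digits[pos..pos + data.len()], data);
    }
}
```

The static only needs shared references, which works here because searching never mutates the searcher.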
This lets the (read-only) PiSearcher object be reused across fuzzing calls. Simple and easy.
How much did this help?
> cargo fuzz run coretest
Debug build (default):
Old version (new mmap per function call): 304 tests/second.
New version (statically initialized PiSearcher): 340 tests/second.
A little over 10% is nice and all, but that's not a particularly large gain. But wait - that's very slow compared to the normal speed of the Pi searcher. Ahh, yes: cargo fuzz builds in debug mode by default. Let's try release, keeping in mind that coverage may not be quite as good (but with the speed gains, it's probably worth it):
> cargo fuzz run coretest --release --jobs 4
Release build, 4 jobs:
Old version: 10800 tests/second.
New version: 26050 tests/second.
Not bad - a little over 10% in debug mode and about 2.4x faster in release mode with multiple jobs. Thanks, once_cell!