Posts

Speeding up fuzzing Rust with shared initialization

Having ported the Pi Searcher to Rust, I've started exploring fuzzing frameworks to help find bugs.  I settled on Rust's cargo-fuzz using the libfuzzer backend, since I'm familiar with libfuzzer from using it on TensorFlow.

Naturally, since the Pi Searcher had been comparatively stable for 7 years in Go, and the translation from Go to Rust was fairly straightforward... I introduced a bug that my tests didn't find, and the fuzzer found over dinner:


Whoopsie.  That bug arose because of an error in handling the boundary condition of searching for just "9" using the index (which doesn't happen normally because the search function uses a simple sequential search for short strings):

Here, num_digits is 200 million, but the digits are stored compressed (two per byte), so the bug was using the size of the file (100 million bytes) in place of the digit count, which caused the binary search to return a bad range.  Fuzzing is awesome, and adapting the fuzzer to the Pi Searcher was straightforward:
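For flavor, here's roughly what a cargo-fuzz target with shared initialization looks like -- the Searcher struct, its methods, and the file name below are simplified stand-ins rather than the real Pi Searcher code:

```rust
// Sketch of a libfuzzer-sys fuzz target that builds its expensive state once.
// `Searcher` and "pi-digits.bin" are stand-ins, not the real Pi Searcher types.
#![no_main]
use libfuzzer_sys::fuzz_target;
use once_cell::sync::Lazy;

struct Searcher {
    digits: Vec<u8>,
}

impl Searcher {
    fn load() -> Searcher {
        // Expensive one-time setup (reading the digit file, loading the index).
        Searcher {
            digits: std::fs::read("pi-digits.bin").unwrap_or_default(),
        }
    }

    fn search(&self, query: &[u8]) -> Option<usize> {
        // Placeholder scan standing in for the real sequential/suffix-array search.
        if query.is_empty() {
            return None;
        }
        self.digits.windows(query.len()).position(|w| w == query)
    }
}

// Shared initialization: constructed lazily on the first iteration, then
// reused for every subsequent fuzz input instead of being rebuilt each time.
static SEARCHER: Lazy<Searcher> = Lazy::new(Searcher::load);

fuzz_target!(|data: &[u8]| {
    let _ = SEARCHER.search(data);
});
```

The Lazy static means the digit file and index get loaded once, so each fuzz iteration pays only for the search itself rather than redoing the expensive setup.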

But one thing…

Moving the Pi Searcher from Go to Rust

Seven years ago (almost to the day), I wrote about my experience moving the Pi Searcher from C++ to Go.  In that article, I mentioned that while the core searching code got slower, the system as a whole sped up by avoiding the per-request fork overhead of the earlier CGI-based code.

This year I moved the Pi Searcher to Rust, hoping to get the best of both worlds.  The tl;dr is that it worked and gave me an end-to-end throughput increase of at least 22%.  But not entirely in the place I'd expected!  The code is basically a transliteration of the Go into Rust, except for moving from Go's built-in net/http server to actix-web for Rust.  The algorithm is the same:  a sequential search for short search strings, and a suffix array for longer ones.
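In sketch form -- with an assumed length cutoff and assumed names, not the actual source -- the dispatch between the two searches looks like this:

```rust
// Sketch of the two-path search: assumed names and cutoff, not the real code.
const SEQ_SEARCH_CUTOFF: usize = 5;

/// `digits` holds the digit string; `suffix_array` holds the starting offsets
/// of every suffix of `digits`, sorted lexicographically.
fn find(digits: &[u8], suffix_array: &[u32], query: &[u8]) -> Option<usize> {
    if query.is_empty() {
        return None;
    }
    if query.len() <= SEQ_SEARCH_CUTOFF {
        // Short queries: a plain left-to-right scan is simple and fast.
        digits.windows(query.len()).position(|w| w == query)
    } else {
        // Longer queries: binary-search the suffix array, comparing only the
        // first query.len() bytes of each suffix so that any suffix starting
        // with the query compares as equal.
        suffix_array
            .binary_search_by(|&pos| {
                let suffix = &digits[pos as usize..];
                let n = query.len().min(suffix.len());
                suffix[..n].cmp(query)
            })
            .ok()
            .map(|i| suffix_array[i] as usize)
    }
}
```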
Results

Code length isn't too different.  I was a little sloppy in writing the Rust version, so I don't think a comparison is entirely fair, but they're within 50% of each other and would probably be closer if I structur…

Updates on factors that will ease living with a coronavirus pandemic

Three weeks ago, I posted about things that will help us loosen restrictions as we move forward through the coronavirus pandemic.  In this post, I'm following up on a few of those in the form of a progress tracker.
Testing

PCR viral tests

March 23, 2020:  58,500 tests/day
Apr 14, 2020:  145,000 tests/day
Source:  https://covidtracking.com/data/us-daily
Taken as the average of a 5-day window centered on each date.
As with all things, test availability is subject to multiple bottlenecks.  It takes swabs, PCR reagents, PPE for the people doing the tests, machines on which to run tests, trained personnel, etc.  The manufacturing and supply of all of these can take time to spin up. We're not yet to the point where we can test all likely cases, but we're running about 2.4x as many tests as we were three weeks ago.
This is not enough.  Estimates vary widely, but one Harvard study suggests we may need tests in the millions-per-day range.  Others put the numbers in…

Tolerating two years of a pandemic in no easy (but some manageable) steps

First off, a reminder:  Don't listen to random or not-so-random people like me when it comes to predictions about the spread and danger of COVID-19.  Listen to the epidemiologists.  I suggest starting with:
- The Imperial College London team, who've done groundbreaking work on previous coronaviruses including SARS and MERS.  It's an experienced, all-star team.
- Trevor Bedford's twitter feed is full of great stuff tracing the path of the coronavirus in Seattle.
- Adam Kucharski et al. at the London School of Hygiene & Tropical Medicine.
- The work from Christian Althaus's group at the Institute for Social and Preventative Medicine.
- Helen Branswell's reporting on the subject is accurate and lucid.

My goal here is to lay out the case that:  (1)  It's worth doing our collective best at minimizing social contact for a while; and that (2)  It won't be as bad in the long run, due to a combination of social and technical factors.  In other words, we'll get through t…

Crossing a Brdg

Roughly two weeks ago was my last day as a visiting scientist at Google.  It's been four years: one full-time in Mountain View, and three working there one day per week in Pittsburgh after my return to Carnegie Mellon.  I'm also about to hit send on an email requesting a leave from CMU starting in January, because...

We're joining the startup race!
Together with Michael Kaminsky (my attached-at-the-hip co-advisor and co-author of 15 years), Robbie Sedgewick (formerly Apple, Uber, Google), and Ash Munshi (co-founder with me, Mu, and Alex Smola at Marianas Labs five years ago, currently CEO of Pepperdata, and one-time CTO of Yahoo), we're creating a little company that we're very excited about.
BrdgAI, complete with 2019-era vowel dropping, aims to be the connection between cloud-based machine learning and companies that produce enormous amounts of data at the edge from video and other sensors.  Many real-world deployments of modern machine learning operate with ba…

Chili Crisp Showdown: Laoganma and Flybyjing

I've been on a bit of a chili crisp mission lately.  The Internets -- and my friends -- appear convinced that the best chili crisp is Laoganma.  Or Flybyjing.  Certainly one of them.  But which?

My answer, alas, is that you might want a bottle of each.  Or five bottles of each.  Let's jump in:


Laoganma on the left, Flybyjing on the right.  The famous slightly disapproving face of the old grandmother stares at us, contrasted with the apparently very-hard-to-print rainbow label of the newer entrant.  Opening up the containers, one is immediately struck by the color and texture differences:


Laoganma, on the left in both photos, has more "crunchy bits", but also has fewer suspended solids in the oil.  The Laoganma bottle is about 3/4 full of the solid crispy bits; the Flybyjing bottle is only about 1/3 full of crispy bits.

This difference flows through to the spoonfuls below -- much more crispy stuff in the Laoganma, but its oil is merely the color of chili oil, and isn't qu…

Finding Bugs in TensorFlow with LibFuzzer

Over the past year, I've spent some time working on improving the robustness of TensorFlow.  As I mentioned earlier, one of my goals for my time at Google was to dive into industry best-practices for writing good code.  At Google, writing good code starts with careful programmers, requires good tests that get run on a fantastic internal testing infrastructure, is improved through code review, and makes use of several code quality tools and linters.

One part of that testing that's been gaining more visibility recently is fuzzing - throwing random inputs at programs or libraries to try to cause them to crash.  John Regehr has been fuzzing compilers for a while now - very effectively.  (That link has a nice taxonomy of the types of fuzzers.)  Google's Project Zero has been fuzzing the FreeType library for the last 4 years, and has found a tremendous number of security vulnerabilities in all sorts of programs.   (This isn't to suggest fuzzing is new - it's been used f…