No, C++ still isn't cutting it.

A recent blog post asked "Does C++ still deserve its bad rap?" While that title presumes the answer (after all, a "bad rap" implies that the judgement is undeserved), I think C++'s reputation for making it very difficult to achieve security, memory safety, or thread safety remains well-deserved, even though it's gotten a lot better over the years. I mean that: the C++17 version of that program is years better than it would have been with C++0x, and it let a careful programmer write a nice program. The post used a simple multi-file word count as an example: count all words in all ".txt" files in the current directory and all of its subdirectories, where words were defined as things matching the regexp "([a-z]{2,})" in a case-insensitive manner. First, I'll walk through the example from that blog post and identify some remaining bugs. And then we'll build up a (IMO) better version in Rust that's faster, safer, and more concis…
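To make the task concrete, here's a minimal std-only Rust sketch of the counting step, approximating the case-insensitive regexp "([a-z]{2,})" as maximal runs of ASCII letters of length two or more, lowercased. This is an illustrative sketch, not the code from either post, and it skips the directory walk entirely:

```rust
use std::collections::HashMap;

// Count "words": maximal runs of ASCII letters of length >= 2,
// lowercased -- a std-only stand-in for the case-insensitive
// regexp "([a-z]{2,})".
fn count_words(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    let mut word = String::new();
    // Chain a trailing space so the final word gets flushed.
    for ch in text.chars().chain(std::iter::once(' ')) {
        if ch.is_ascii_alphabetic() {
            word.push(ch.to_ascii_lowercase());
        } else {
            if word.len() >= 2 {
                *counts.entry(word.clone()).or_insert(0) += 1;
            }
            word.clear();
        }
    }
    counts
}
```

A real version would walk the directory tree (e.g. with the `walkdir` crate) and merge per-file maps; the single-character word "a" is deliberately excluded, matching the `{2,}` quantifier.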

Tolerating the Pandemic #4: The Ventilation Edition

This is #4 in a series on how we can tolerate the pandemic through a lot of slow, steady progress and hard work. As a quick recap, the first post laid out a bit of a roadmap (March 23, 2020); the second (April 17, 2020) and third (June 26, 2020) noted updates on the categories described in the first. In the two months since my previous post, one of the most exciting advances has been in improved understanding of the transmission dynamics of SARS-CoV-2, particularly with respect to aerosol-like transmission. An Aug 11 New York Times article has a good summary of some of the research in this area. Tl;dr: ballistic (large droplet) transmission is probably still a big factor, and is mitigated quite a bit by masks. Aerosol transmission is a big contributor, particularly to superspreader events (and particularly indoors). Cloth or surgical masks help but are not a silver bullet, particularly in crowded, confined spaces. Why is this exciting? Because it helps us bring more tools to bear agains…

Tolerating the Pandemic #3

In two previous posts, I discussed some ongoing processes that will help us tolerate the pandemic better, and an update 3 weeks after the first.
It's been two months, which is enough time for more progress to have happened.  There hasn't been enough, and some things have had less progress than I've hoped for, but we've also seen a bit more progress in other areas.  Let's stick with the same framework from before:  Testing, PPE/Supplies, Treatments, and Societal changes to reduce contact.
In PCR testing, a picture is worth a thousand words, such as this one taken from Johns Hopkins:
We've gone from 58k tests/day, through 145k tests/day, up to about 450k tests/day. It took us about a month to triple and another two months to triple again. That's good, but it's still not at the levels that the Harvard study recommends. It is, however, about halfway there, and we can probably get to where we need to be in another five months if the federal and state g…

Speeding up fuzzing Rust with shared initialization

Having ported the Pi searcher to Rust, I've started exploring fuzzing frameworks to help find bugs.  I settled on Rust's cargo-fuzz using the libfuzzer backend, since I'm familiar with libfuzzer from using it on TensorFlow.

Naturally, since the Pi Searcher had been comparatively stable for 7 years in Go, and the translation from Go to Rust was fairly straightforward... I introduced a bug that my tests didn't find, and the fuzzer found over dinner:

Whoopsie.  That bug arose because of an error in handling the boundary condition of searching for just "9" using the index (which doesn't happen normally because the search function uses a simple sequential search for short strings):

Here, num_digits is 200 million, but the digits are stored compressed (two per byte), so the bug was using the size of the file (100 million bytes) as the bound, which caused the binary search to return a bad range. Fuzzing is awesome, and adapting the fuzzer to the Pi searcher was straightforward:
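The shape of that bug can be sketched in a few lines. This is an illustrative reconstruction, not the Pi Searcher's actual code, and it assumes high-nibble-first packing: with two 4-bit digits per byte, the valid index range for the binary search is the digit count, which is twice the byte count of the packed file.

```rust
// Two decimal digits packed per byte, 4 bits each (high nibble first
// -- an assumption for this sketch).
fn digit_at(packed: &[u8], i: usize) -> u8 {
    let b = packed[i / 2];
    if i % 2 == 0 { b >> 4 } else { b & 0x0F }
}

// The correct upper bound for a search over digit indices.
// The bug was effectively using packed.len() here -- the byte
// count -- which is only half the real digit count.
fn num_digits(packed: &[u8]) -> usize {
    packed.len() * 2
}
```

Searching for "9" near the end of the digits is exactly the kind of boundary case that a buggy bound breaks: the index can legitimately point past the halfway mark that the byte count implies.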

But one thing…

Moving the Pi Searcher from Go to Rust

Seven years ago (almost to the day), I wrote about my experience moving the Pi Searcher from C++ to Go.  In that article, I mentioned that while the core searching code got slower, the system as a whole sped up by avoiding the per-request fork overhead of the earlier CGI-based code.

This year I moved the Pi Searcher to Rust, hoping to get the best of both worlds.  The tl;dr is that it worked and gave me an end-to-end throughput increase of at least 22%.  But not entirely in the place I'd expected!  The code is basically a transliteration of the Go to Rust, except for moving from Go's internal net/http server to actix-web for Rust.  The algorithm is the same:  sequential search for short search strings, and using a suffix array for longer strings.
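That two-path algorithm can be sketched roughly as follows. This is a simplified illustration, not the Pi Searcher's actual code: the threshold value and function names are made up, the suffix array is built naively (the real one is precomputed), and the suffix-array path returns one occurrence rather than the leftmost.

```rust
// Illustrative cutoff: short queries are common in the digits of pi,
// so a sequential scan is cheap and avoids the suffix-array overhead.
const SEQ_THRESHOLD: usize = 4;

// Naive O(n^2 log n) construction -- fine for a sketch, far too slow
// for 200 million digits, where a precomputed array is used instead.
fn build_suffix_array(text: &[u8]) -> Vec<usize> {
    let mut sa: Vec<usize> = (0..text.len()).collect();
    sa.sort_by(|&a, &b| text[a..].cmp(&text[b..]));
    sa
}

// Returns the position of some occurrence of `query`, if any.
fn find(text: &[u8], sa: &[usize], query: &[u8]) -> Option<usize> {
    if query.len() < SEQ_THRESHOLD {
        // Short query: plain sequential scan.
        return text.windows(query.len()).position(|w| w == query);
    }
    // Long query: binary search for the first suffix >= query, then
    // check whether it actually starts with the query.
    let idx = sa.partition_point(|&s| text[s..].cmp(query).is_lt());
    sa.get(idx).copied().filter(|&s| text[s..].starts_with(query))
}
```

The appeal of the suffix array is that the long-query path costs O(m log n) comparisons regardless of where the match falls in the digits, while the sequential path keeps short, frequent queries cheap.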
Results
Code length isn't too different.  I was a little sloppy in writing the Rust version, so I don't think a comparison is entirely fair, but they're within 50% of each other and would probably be closer if I structur…

Updates on factors that will ease living with a coronavirus pandemic

Three weeks ago, I posted about things that will help us loosen restrictions as we move forward through the coronavirus pandemic.  In this post, I'm following up on a few of those in the form of a progress tracker.
Testing
PCR viral tests
March 23, 2020:  58,500 tests/day
Apr 14, 2020:  145,000 tests/day
Taken as the average of a 5-day window centered around the date of concern.
As with all things, test availability is subject to multiple bottlenecks.  It takes swabs, PCR reagents, PPE for the people doing the tests, machines on which to run tests, trained personnel, etc.  The manufacturing and supply of all of these can take time to spin up. We're not yet to the point where we can test all likely cases, but we're running about 2.4x as many tests as we were three weeks ago.
This is not enough.  Estimates vary widely, but one Harvard study suggests we may need millions of tests per day.  Others put the numbers in…

Tolerating two years of a pandemic in no easy (but some manageable) steps

First off, a reminder:  Don't listen to random or not-so-random people like me when it comes to predictions about the spread and danger of COVID-19.  Listen to the epidemiologists.  I suggest starting with:
The Imperial College London team, who've done groundbreaking work on previous coronaviruses including SARS and MERS. It's an experienced, all-star team.
Trevor Bedford's twitter feed is full of great stuff tracing the path of the coronavirus in Seattle.
Adam Kucharski et al. at the London School of Hygiene & Tropical Medicine.
The work from Christian Althaus's group at the Institute for Social and Preventative Medicine.
Helen Branswell's reporting on the subject is accurate and lucid.
My goal here is to lay out the case that: (1) it's worth doing our collective best at minimizing social contact for a while; and (2) it won't be as bad in the long run, due to a combination of social and technical factors. In other words, we'll get through t…