Posts

Speeding up fuzzing Rust with shared initialization

Having ported the Pi Searcher to Rust, I've started exploring fuzzing frameworks to help find bugs.  I settled on Rust's cargo-fuzz using the libfuzzer backend, since I'm familiar with libfuzzer from using it on TensorFlow.

Naturally, since the Pi Searcher had been comparatively stable for 7 years in Go, and the translation from Go to Rust was fairly straightforward... I introduced a bug that my tests didn't find, and the fuzzer found over dinner:


Whoopsie.  That bug arose because of an error in handling the boundary condition of searching for just "9" using the index (which doesn't happen normally because the search function uses a simple sequential search for short strings):

Here, num_digits is 200 million, but the digits are stored compressed (two per byte), so the bug was using the size of the file (100 million bytes) in place of the digit count, which caused the binary search to return a bad range.  Fuzzing is awesome, and adapting the fuzzer to the Pi Searcher was straightforward:
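For flavor, here's roughly what a cargo-fuzz target with shared initialization looks like -- the Searcher struct, its methods, and the file name below are simplified stand-ins rather than the real Pi Searcher code:

```rust
// Sketch of a libfuzzer-sys fuzz target that builds its expensive state once.
// `Searcher` and "pi-digits.bin" are stand-ins, not the real Pi Searcher types.
#![no_main]
use libfuzzer_sys::fuzz_target;
use once_cell::sync::Lazy;

struct Searcher {
    digits: Vec<u8>,
}

impl Searcher {
    fn load() -> Searcher {
        // Expensive one-time setup (reading the digit file, loading the index).
        Searcher {
            digits: std::fs::read("pi-digits.bin").unwrap_or_default(),
        }
    }

    fn search(&self, query: &[u8]) -> Option<usize> {
        // Placeholder scan standing in for the real sequential/suffix-array search.
        if query.is_empty() {
            return None;
        }
        self.digits.windows(query.len()).position(|w| w == query)
    }
}

// Shared initialization: constructed lazily on the first iteration, then
// reused for every subsequent fuzz input instead of being rebuilt each time.
static SEARCHER: Lazy<Searcher> = Lazy::new(Searcher::load);

fuzz_target!(|data: &[u8]| {
    let _ = SEARCHER.search(data);
});
```

The Lazy static means the digit file and index get loaded once, so each fuzz iteration pays only for the search itself rather than redoing the expensive setup.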

But one thing…

Moving the Pi Searcher from Go to Rust

Seven years ago (almost to the day), I wrote about my experience moving the Pi Searcher from C++ to Go.  In that article, I mentioned that while the core searching code got slower, the system as a whole sped up by avoiding the per-request fork overhead of the earlier CGI-based code.

This year I moved the Pi Searcher to Rust, hoping to get the best of both worlds.  The tl;dr is that it worked and gave me an end-to-end throughput increase of at least 22%.  But not entirely in the place I'd expected!  The code is basically a transliteration of the Go into Rust, except for moving from Go's built-in net/http server to actix-web for Rust.  The algorithm is the same:  a sequential search for short search strings, and a suffix array for longer ones.
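In sketch form -- with an assumed length cutoff and assumed names, not the actual source -- the dispatch between the two searches looks like this:

```rust
// Sketch of the two-path search: assumed names and cutoff, not the real code.
const SEQ_SEARCH_CUTOFF: usize = 5;

/// `digits` holds the digit string; `suffix_array` holds the starting offsets
/// of every suffix of `digits`, sorted lexicographically.
fn find(digits: &[u8], suffix_array: &[u32], query: &[u8]) -> Option<usize> {
    if query.is_empty() {
        return None;
    }
    if query.len() <= SEQ_SEARCH_CUTOFF {
        // Short queries: a plain left-to-right scan is simple and fast.
        digits.windows(query.len()).position(|w| w == query)
    } else {
        // Longer queries: binary-search the suffix array, comparing only the
        // first query.len() bytes of each suffix so that any suffix starting
        // with the query compares as equal.
        suffix_array
            .binary_search_by(|&pos| {
                let suffix = &digits[pos as usize..];
                let n = query.len().min(suffix.len());
                suffix[..n].cmp(query)
            })
            .ok()
            .map(|i| suffix_array[i] as usize)
    }
}
```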
Results

Code length isn't too different.  I was a little sloppy in writing the Rust version, so I don't think a comparison is entirely fair, but they're within 50% of each other and would probably be closer if I structur…

Updates on factors that will ease living with a coronavirus pandemic

Three weeks ago, I posted about things that will help us loosen restrictions as we move forward through the coronavirus pandemic.  In this post, I'm following up on a few of those in the form of a progress tracker.
Testing

PCR viral tests

March 23, 2020:  58,500 tests/day
Apr 14, 2020:  145,000 tests/day
Source:  https://covidtracking.com/data/us-daily
Taken as the average of a 5-day window centered on each date.
As with all things, test availability is subject to multiple bottlenecks.  It takes swabs, PCR reagents, PPE for the people doing the tests, machines on which to run tests, trained personnel, etc.  The manufacturing and supply of all of these can take time to spin up. We're not yet to the point where we can test all likely cases, but we're running about 2.4x as many tests as we were three weeks ago.
This is not enough.  Estimates vary widely, but one Harvard study suggests we may need tests in the millions-per-day range.  Others put the numbers in…

Tolerating two years of a pandemic in no easy (but some manageable) steps

First off, a reminder:  Don't listen to random or not-so-random people like me when it comes to predictions about the spread and danger of COVID-19.  Listen to the epidemiologists.  I suggest starting with:
- The Imperial College London team, who've done groundbreaking work on previous coronaviruses including SARS and MERS.  It's an experienced, all-star team.
- Trevor Bedford's twitter feed is full of great stuff tracing the path of the coronavirus in Seattle.
- Adam Kucharski et al. at the London School of Hygiene & Tropical Medicine.
- The work from Christian Althaus's group at the Institute for Social and Preventative Medicine.
- Helen Branswell's reporting on the subject is accurate and lucid.

My goal here is to lay out the case that:  (1)  It's worth doing our collective best at minimizing social contact for a while; and that (2)  It won't be as bad in the long run, due to a combination of social and technical factors.  In other words, we'll get through t…

Crossing a Brdg

Roughly two weeks ago was my last day as a visiting scientist at Google.  It's been four years: one full-time in Mountain View, and three working there one day per week in Pittsburgh after my return to Carnegie Mellon.  I'm also about to hit send on an email requesting a leave from CMU starting in January, because...

We're joining the startup race!
Together with Michael Kaminsky (my attached-at-the-hip co-advisor and co-author of 15 years), Robbie Sedgewick (formerly Apple, Uber, Google), and Ash Munshi (co-founder with me, Mu, and Alex Smola at Marianas Labs five years ago, currently CEO of Pepperdata, and one-time CTO of Yahoo), we're creating a little company that we're very excited about.
BrdgAI, complete with 2019-era vowel dropping, aims to be the connection between cloud-based machine learning and companies that produce enormous amounts of data at the edge from video and other sensors.  Many real-world deployments of modern machine learning operate with ba…

Chili Crisp Showdown: Laoganma and Flybyjing

I've been on a bit of a chili crisp mission lately.  The Internets -- and my friends -- appear convinced that the best chili crisp is Laoganma.  Or Flybyjing.  Certainly one of them.  But which?

My answer, alas, is that you might want a bottle of each.  Or five bottles of each.  Let's jump in:


Laoganma on the left, Flybyjing on the right.  The famous slightly disapproving face of the old grandmother stares at us, contrasted with the apparently very-hard-to-print rainbow label of the newer entrant.  Opening up the containers, one is immediately struck by the color and texture differences:


Laoganma, on the left in both photos, has more "crunchy bits", but also has fewer suspended solids in the oil.  The Laoganma bottle is about 3/4 full of the solid crispy bits; the Flybyjing bottle is only about 1/3 full of crispy bits.

This difference flows through to the spoonfuls below -- much more crispy stuff in the Laoganma, but its oil is merely the color of chili oil, and isn't qu…

Finding Bugs in TensorFlow with LibFuzzer

Over the past year, I've spent some time working on improving the robustness of TensorFlow.  As I mentioned earlier, one of my goals for my time at Google was to dive into industry best-practices for writing good code.  At Google, writing good code starts with careful programmers, requires good tests that get run on a fantastic internal testing infrastructure, is improved through code review, and makes use of several code quality tools and linters.

One part of that testing that's been gaining more visibility recently is fuzzing - throwing random inputs at programs or libraries to try to cause them to crash.  John Regehr has been fuzzing compilers for a while now - very effectively.  (That link has a nice taxonomy of the types of fuzzers.)  Google's Project Zero has been fuzzing the FreeType library for the last 4 years, and has found a tremendous number of security vulnerabilities in all sorts of programs.   (This isn't to suggest fuzzing is new - it's been used f…