Crossing a Brdg

Roughly two weeks ago was my last day as a visiting scientist at Google.  It's been four years, one full-time in Mountain View and three working there one day per week in Pittsburgh after my return to Carnegie Mellon.  I'm also about to hit send on an email requesting a leave from CMU starting in January, because...

We're joining the startup race!
Together with Michael Kaminsky (my attached-at-the-hip co-advisor and co-author of 15 years), Robbie Sedgewick (formerly of Apple, Uber, and Google), and Ash Munshi (my co-founder, with Mu and Alex Smola, at Marianas Labs five years ago; currently CEO of Pepperdata and a former CTO of Yahoo), we're creating a little company that we're very excited about.
BrdgAI, complete with 2019-era vowel dropping, aims to be the connection between cloud-based machine learning and companies that produce enormous amounts of data at the edge from video and other sensors.  Many real-world deployments of modern machine learning operate with ba…

Chili Crisp Showdown: Laoganma and Flybyjing

I've been on a bit of a chili crisp mission lately.  The Internets -- and my friends -- appear convinced that the best chili crisp is Laoganma.  Or Flybyjing.  Certainly one of them.  But which?

My answer, alas, is that you might want a bottle of each.  Or five bottles of each.  Let's jump in:

Laoganma on the left, Flybyjing on the right.  The famous slightly disapproving face of the old grandmother stares at us, contrasted with the apparently very-hard-to-print rainbow label of the newer entrant.  Opening up the containers, one is immediately struck by the color and texture differences:

Laoganma, on the left in both photos, has more "crunchy bits", but fewer suspended solids in the oil.  The bottle is about 3/4 full of the solid crispy bits.  The Flybyjing bottle is only about 1/3 full of crispy bits.

This difference flows through to the spoonfuls below -- much more crispy stuff in the Laoganma, but its oil is merely the color of chili oil, and isn't qu…

Finding Bugs in TensorFlow with LibFuzzer

Over the past year, I've spent some time working on improving the robustness of TensorFlow.  As I mentioned earlier, one of my goals for my time at Google was to dive into industry best-practices for writing good code.  At Google, writing good code starts with careful programmers, requires good tests that get run on a fantastic internal testing infrastructure, is improved through code review, and makes use of several code quality tools and linters.

One part of that testing that's been gaining more visibility recently is fuzzing - throwing random inputs at programs or libraries to try to cause them to crash.  John Regehr has been fuzzing compilers for a while now - very effectively.  (That link has a nice taxonomy of the types of fuzzers.)  Google's Project Zero has been fuzzing the FreeType library for the last 4 years, and has found a tremendous number of security vulnerabilities in all sorts of programs.   (This isn't to suggest fuzzing is new - it's been used f…
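To make the mechanics concrete, here's a minimal sketch of what a libFuzzer-style harness looks like.  The `ParseRecord` function is a hypothetical stand-in -- a real harness would call a library entry point (a TensorFlow op, an image decoder, etc.) -- but the `LLVMFuzzerTestOneInput` signature is the real libFuzzer interface:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical function under test.  A real harness would exercise a
// library entry point instead of this toy length-prefixed parser.
static bool ParseRecord(const uint8_t* data, size_t size) {
    if (size < 2) return false;      // need a length byte plus payload
    size_t len = data[0];
    return len <= size - 1;          // reject records that claim too much
}

// libFuzzer calls this entry point over and over with mutated inputs,
// keeping any input that reaches new code paths.  Crashes, hangs, and
// sanitizer failures are reported along with the triggering input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    ParseRecord(data, size);
    return 0;  // non-crashing inputs return 0
}
```

Built with something like `clang++ -fsanitize=fuzzer,address harness.cc`, the resulting binary generates and mutates inputs on its own -- no corpus of hand-written test cases required, though seeding it with valid inputs helps.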

Accelerating cryptocurrency Mining with Intel ISPC

Daniel Lemire and Maxime Chevalier just had a great exchange on Twitter about the state of compilers and being able to automatically vectorize code, such as a scalar product.  Of course, hoping the compiler can correctly vectorize your code can be a bit fragile, and as Maxime points out, writing raw intrinsics results in an unreadable pain in the editor, and a GPU-style implementation in CUDA or OpenCL might be more scalable and maintainable.

A few years ago, some folks at Intel wrote a compiler called ISPC, the Intel SPMD Program Compiler.  A possibly unfairly simple way to describe ISPC is that it's OpenCL-style programming for x86 vector units.  You can write code that looks like this:
    export uniform float scalar_product(uniform float a[],
                                        uniform float b[],
                                        uniform int count) {
        float a_times_b = 0.0;
        foreach (i = 0 ... count) {
            a_times_b += a[i] * b[i];
        }
        return reduce_add(a_times_b);
    }
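For comparison, the plain scalar C++ version of the same loop is below.  The compiler may auto-vectorize it -- or may not, and small changes to the loop can silently defeat vectorization, which is exactly the fragility the Twitter exchange was about:

```cpp
#include <cstddef>

// Scalar baseline for the dot product.  Whether this gets vectorized is
// up to the compiler's auto-vectorizer; ISPC makes the parallelism explicit.
float scalar_product(const float* a, const float* b, size_t count) {
    float sum = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```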

Finances for recent CS Ph.Ds headed to academia

As you may have guessed from my recent post analyzing the TCO of my former automobile, and my post on finances for CS Ph.D. students, I've been thinking about finance a bit lately.  After a thought-provoking discussion with a senior colleague in another department, I've come to the conclusion my financial satisfaction graph looked something like this -- and I bet it's similar for many other no-kids-at-completion academics who end up enfamilied (can I make up that word?).

(In case I seem too negative about kids, don't get me wrong.  As the absolutely awesome book All Joy and No Fun notes, being honest about the myriad costs of having a child isn't at odds with also being very glad that the little wriggling worm is in your life.  Great book.  If you haven't read it and you do or may have kids, read it.)

The Taulbee survey suggests that the median assistant professor in Computer Science has a 9mo salary of about $96,055.  Assuming you pay your summers, that's …
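As a back-of-the-envelope sketch of the summer-salary arithmetic -- assuming, and this is an assumption, that all three summer months are paid at the same monthly rate as the nine-month appointment:

```cpp
// Annualize a 9-month academic salary, assuming summer months are paid
// at the same monthly rate (many faculty cover fewer than three).
double annualized(double nine_month_salary, int summer_months) {
    double monthly = nine_month_salary / 9.0;
    return nine_month_salary + monthly * summer_months;
}
```

At the Taulbee median of $96,055, three months of summer salary adds roughly $32,000, for an annualized total of about $128,000 before taxes and benefits.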

22 Months of Voice-Based Assistants

Almost two years ago, I posted on Google+ that I'd purchased an Amazon Echo.  In that post, I wrote:

It's surprisingly cool.  But the best part is what it makes me want, because it also sucks.  It can't recognize my 2.6 y/o daughter's voice, for example -- even though she can say to it fairly clearly "Alexa, play Puff the Magic Dragon" (it will, if I ask it).  What an awesome enabler for kids, if it worked.  A little dangerous, too, but hey. :-)  It can't turn my remote-enabled lights on or off for me.  It can't even send me an email to remind me about something.  Boo.

But - these are all current limitations.  Its speech recognition, albeit within a slightly narrow domain, is really solid.  It's happy with me, it's happy with my wife, and it's happy listening to us with the microwave running and a toddler running around.  The convenience is awesome.  I suddenly want to control more of my life by talking to it.  But I can't.  Yet.

It'…

Nobody ever implements sort

In discussions on Hacker News, I repeatedly see comments about "nobody ever implements sorting algorithms" -- or, generalized, that it's a waste of time studying up on basic data structures and algorithms, because they're all already implemented.

While it's rare, my experience hasn't been that you never re-implement the basics.  I decided to data-mine my and my research group's past and current projects to identify the number of places we've implemented elementary data structures for good reasons.
Sorting

In the Cuckoo Filter, we implement a custom sort to sort the four items that can go in a bucket.  Because this is a high-performance context, and the size of the set to be sorted is fixed and small, it's much faster to do this inline.  This sort only cares about the least-significant byte of the key.

    inline void SortPair(uint32_t& a, uint32_t& b) {
        if ((a & 0x0f) > (b & 0x0f)) {
            std::swap(a, b);
        }
    …
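To show how such pairwise compare-and-swap steps compose into a full fixed-size sort, here's a sketch of a five-comparator sorting network for four items.  The exact comparator sequence the Cuckoo Filter code uses may differ; this is one standard network, keyed on the low nibble per the `0x0f` mask in the excerpt:

```cpp
#include <algorithm>
#include <cstdint>

// Compare-and-swap on the low nibble, matching the 0x0f mask above.
inline void SortPair(uint32_t& a, uint32_t& b) {
    if ((a & 0x0f) > (b & 0x0f)) {
        std::swap(a, b);
    }
}

// A standard 5-comparator sorting network for four elements: always runs
// the same comparisons in the same order, so it's branch-predictable and
// trivially inlined -- exactly what you want for a tiny fixed-size sort.
inline void SortFour(uint32_t& a, uint32_t& b, uint32_t& c, uint32_t& d) {
    SortPair(a, b);
    SortPair(c, d);
    SortPair(a, c);
    SortPair(b, d);
    SortPair(b, c);
}
```

For four elements, a network like this beats calling `std::sort` because there's no function-call overhead, no loop, and no data-dependent control flow beyond the swaps themselves.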