Dave's Data

Posts

Improving the Pi Searcher's speed by moving from C++ to Go

May 15, 2013

From the title, I hope you expect that there's a trick here, and, of course, there is. I'll get to that in a second. None of this is new - it's the same observation about the overhead of fork/exec in CGI that things like FastCGI were designed to address. The reason I'm scribbling about it was the way in which moving to a "slower" language actually made it easier to use the right system architecture for better overall performance. I created the Pi Searcher as a joke in 1996 or thereabouts. In those days, we were still finding out about new and cool websites by word-of-mouth, and a guy named Arthur Bebak ran an email list called Netsurfer Digest. In it, he spread news about cool new things happening on this "web" thing. In one of his issues, he noted a ridiculously useless website, and asked, "What's next? A site where you can search for your birthday in Pi?". It had to happen. Over the years, the Pi searcher has grown, adop...

Mushroom lentil bourguignon [vegan, tasty]

April 15, 2013

Boeuf bourguignon - or its chickenly near-equivalent coq au vin - is one of those rub-your-tummy satisfying dishes that offers a lovely combination of heartiness and deep, rich flavors. We decided to host a real dinner party the other day for the first time since turning our lives a little upside down by having a child, and this seemed like a good way to go. Except, well, for that beef/chicken thing. Which brought me to the question: Is it possible to create a vegan version of the dish that's the equal of the original? Nearly. It lacks just a little of the mouthfeel imparted by the gelatin in an amazing beef stock, but the rest of the flavors are there. This is one of my favorite dishes I've made recently, and I love the way it came out. This isn't my usual type of content for this blog, but I was happy enough with it that I wanted to write down the recipe somewhere. I started from a recipe I found on Treehugger recently ; but, afte...

Rank & Select for Systems Folks + our minor contribution to the area

April 02, 2013

Last Spring, I had the opportunity to teach a "special topics" course. I use that as shorthand for "Dave wants to learn about X, so I'll inflict my learning process upon some willing victims and collaborators called students." I roped in Michael Kaminsky to co-teach a course on "memory and resource efficient big data computing," which we pretty much defined as we went along. In fact, one of the first assignments for the students was to help come up with topics to discuss. CMU students are an awesome resource that way. There's been a lot of exciting progress in memory-efficient data structures from the theory side of things in the last decade or so. Not all of it is usable yet, but some of it has that "this is almost there" feel to it that gets me salivating for the chance to refactor how I think about some aspects of system design. There are two themes that I think are particularly worth considering: The ideas behind succinct d...

Optimistic Cuckoo Hashing for concurrent, read-intensive applications

March 11, 2013

Our FAWN team has been spending a lot of time looking at memory-efficient data and algorithms structures for various things (with a lot of emphasis on key-value stores, as per our first and second papers on FAWN-KV and SILT , respectively). Coming up at NSDI'13 , Bin Fan has a new paper in which we substantially improve the throughput of the memcached distributed DRAM cache. One of the core techniques we use is a new multiple-reader, single-writer concurrent variant of Cuckoo hashing that we call optimistic cuckoo hashing. It combines the refinement of a technique we introduced in SILT ("partial-key cuckoo hashing"), a new way of moving items during cuckoo insertion, and an optimistic variant of lock-striping to create a hash table that is extremely compact and supports extremely high read throughput, while still allowing one thread to update it at high speed (about 2M updates/second in our tests). We've released the code on github to this and you should...

Caring about Causality - now in Cassandra

February 25, 2013

Over the past few years, we've spent a bunch of time thinking about and designing scalable systems that provide causally-consistent wide-area replication. (Here, "we" means the team of Wyatt Lloyd, Michael Freedman, Michael Kaminsky, and myself; but if you know academia, you wouldn't be surprised that about 90% of the project was accomplished by Wyatt, who's a graduating Ph.D. student at the time of this writing.) I'm posting this because we've finally entered the realm of the practical, with the release of both the paper (to appear at NSDI'13) and code for our new implementation of causally-consistent replication (we call it Eiger) within the popular Cassandra key-value store. Why do we care about consistency in wide-area replication? Because there's a fundamental, unavoidable trade-off between having guaranteed low-latency access (meaning not having to send packets back-and-forth across the country) and making sure that every client sees ...

Teaching Distributed Systems in Go

February 12, 2013

It's been a year and a half since I migrated my undergrad distributed systems course (15-440) to the Go programming language . I'm still glad I did it and will keep it there, but we've also learned some important lessons about how to best use it in our context that I hope I'll be able to improve upon next semester. Hopefully, some of this may be useful for those of you considering moving to or creating a Go-based class. The big points I hope to get across are: (1) Go is really good for distributed systems; (2) Students need more intro to the language than I thought; (3) The language has matured a lot, but there are some technical glitches remaining that you'll have to work around. Go is good for teaching (and building) distributed systems I believe this for two reasons: Go provides a good mix of high-level and low-level functionality. Channels and goroutines provide a program structure that works well for distributed progra...

Two examples from the computer science review and publication process

January 30, 2013

A few days ago, I posted about getting the Ph.D. in computer science . As part of that post, I mentioned that I'd publish & discuss some of the reviews my papers have received. These aren't so nasty that they're funny, but I decided to post the full submission/review/submission/review/final as a way to also help show what the publication and revision process looks like. I probably owe a better explanation of the delta between the papers, but - hey, the SIGCOMM, USENIX ATC, and SEA (algorithms) deadlines are all this week. :) I've deliberately picked two papers with very different initial reviews. Both were eventually accepted, and neither was accepted upon first submission. One received a best paper award. Both have over 100 citations. I've picked these not because they're representative -- I've written papers with low citation counts also -- but because they illustrate the wide disparity in review constructiveness and tone that eve...