Teaching Distributed Systems in Go

It's been a year and a half since I migrated my undergrad distributed systems course (15-440) to the Go programming language.  I'm still glad I did it and will keep it there, but we've also learned some important lessons about how best to use Go in our context, lessons I hope to act on next semester.   Hopefully, some of this may be useful for those of you considering moving to or creating a Go-based class.

The big points I hope to get across are:  (1)  Go is really good for distributed systems;  (2)  Students need more intro to the language than I thought;  (3)  The language has matured a lot, but there are some technical glitches remaining that you'll have to work around.

Go is good for teaching (and building) distributed systems

I believe this for two reasons:
  1. Go provides a good mix of high-level and low-level functionality.
  2. Channels and goroutines provide a program structure that works well for distributed programming.
The importance of high-level functionality:  A problem I encountered often in the C++ version of my course was students struggling with the STL or other libraries for data structures and algorithms.  My learning goals for 440 do not include "implement another hash table":  I want the students to take such options for granted and think more about the selection of structures and how they affect the overall structure and communication of their program.  Native maps and lists are key.
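To illustrate the point, here's a minimal sketch (the function and data are illustrative, not from the course) of the kind of bookkeeping that requires writing or fighting with a hash table in C, but is one line per operation with Go's native maps:

```go
package main

import "fmt"

// countRequests tallies requests per host. With Go's native maps there is
// no hash table to implement: missing keys read as zero, so the tally
// needs no existence check.
func countRequests(hosts []string) map[string]int {
	counts := make(map[string]int)
	for _, h := range hosts {
		counts[h]++
	}
	return counts
}

func main() {
	counts := countRequests([]string{"alpha", "beta", "alpha"})
	fmt.Println(counts["alpha"], counts["beta"]) // 2 1
}
```

The student's attention goes to choosing the structure, not building it.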

So, it turns out, are web servers and things like JSON:  We want the students to understand the ideas of marshaling and RPC, but we also wanted a maximally easy-to-debug wire format to keep things simple.  While I'm a fan of binary encodings when performance matters, for teaching, the readability of JSON trumps.  Being able to trivially instantiate a web server makes it easy to set up RPC in a way that's familiar to the students, and gives an easy-to-use interface.

The importance of low-level access:  At the same time, we did want the students to have to reason about issues such as buffering and queues.  In the first assignment, the students implement a library that does reliable, stop-and-wait client-to-server communication on top of UDP (and then implement a distributed password cracker using it).  We want them familiar with the sockets interface in all of its glory and ugliness, and Go exposes it naturally.

Distributed thinking for distributed programming:  This is the big one:  It turns out that structuring programs internally as little distributed systems provides a good match for implementing distributed systems.  As an example, the figure to the right shows the suggested architecture of the project 1 system:  We structure the LSP library as three communicating goroutines.  The first handles the network:  All parsing and UDP is handled there.  It pushes well-defined events into a channel to the event handler.  The epoch handler similarly handles all timing-related issues, and ... pushes well-defined events to the event handler.  The event handler consists of a very simple go select loop:

    select {
    case netd := <-srv.netInChan:
            srv.handleNetMessage(netd) // parsed messages from the network goroutine
    case appm := <-srv.appWriteChan:
            srv.handleAppWrite(appm)   // library calls from the application
    case <-srv.epochChan:
            srv.handleEpoch()          // timer events from the epoch goroutine
    }

(with a bit more after to handle responses).  The beauty of this is that it's a single select loop that both handles and serializes several different "types" of events - messages from the network, library function calls from the application, and timer events.  It's much more difficult to write and integrate a library in C that has its own threads doing things (or its own main loop).  It makes it very easy to reason about the "core" functionality of the program.  You can give a simpler assignment that uses the exact same structure to implement an echo client/server to help the students build up to this structure.

From a 2011 student:
"The theoretical concepts were very interesting and once I got used to Go, I was very happy that the class wasn't in C++ or C. I definitely see the advantages to Go, and I think that it was a very good idea to use it for the class. I feel a lot more comfortable having implemented a bunch of distributed concepts myself and having the code actually work as opposed to messing around for days with memory issues and arcane syntax."

Students need a lot of intro to Go

2011 comments:
"The professors decided to use Go, which is an extremely new and experimental language. There is very little clear documentation for it which forced time to be sunk into attempting to learn the language rather than trying to learn the concepts of the course."

"Go language was pretty interesting; definitely better choice than C++ but I would also suggest Java or wait till Go has better development tools (debugger, IDE, etc). I spent a lot of time getting the code compiled and then fixing small runtime issues which was quite annoying using printf statements since stacktrace is unreadable most of the time."

Go went a bit better in 2012 than it did in 2011, for a few reasons:  The documentation matured, the language stabilized, and the "tour of go" tutorial was written.  I sent out two reminder emails to the entire class over the summer encouraging them to go through the tour before the semester started.  We also taught one lecture introducing students to the language, and both Randy and I tended to use Go for all of the pseudocode we went through in class.  However:
"Shotgun-debugging Go while balancing other courses and first project was a disaster. All the Go tutorial sessions in the world didn't help manage the time it took to actually use Go for first project. Great for lecture pseudocode though. Also setting up a Go environment on a new machine is a nightmare. Compilation in general is a huge, unintuitive mess."
"It would be really helpful to just have a small assignment that is like a go tutorial before the first major project, so [people] like me don't try to take on P1 and learning Go at the same time."
The improvements weren't enough.  But the complaints changed.  It's clear that for 2013, we need a preliminary assignment that builds into writing a simple networked server using sockets, channels, and goroutines.  This shouldn't be hard, but it needs to be done.

Underlying some of the complaints in 2012 was a current, serious limitation in Go for instructional purposes:

You can't hand out precompiled library code and use the go build tools

This was one of the most serious limitations we encountered.  We tried several (hacky) approaches, and none worked reliably enough for the class environment (we need to try the mtime hack again on AFS).  This meant that we had no way to hand out binary reference code for the students to test against, nor reference implementations for students who couldn't get part of the project working.  That matters because the projects in 15-440 are built in a few cumulative parts, by design, to teach students about API and layering design by making them use their own library/server code a few weeks later.  We didn't want it to be all-or-nothing.

Our solution was to hand out our full codebase for students to look at or use (losing a few points for using our version), which means we can't reuse the project again next year.

Plan in advance for not being able to hand out libraries.

This gets us to the third problem, which has nothing to do with Go:

Don't underestimate the difficulty of writing good tests for undergrad distributed systems assignments.

The biggest complaint by far about the course -- and a completely valid one -- was that I botched handing out tests for my part of the projects in time.  I got hammered for this in the course feedback (Variations of "the most poorly-organized course I've ever taken" came up at least twice).  This has an unpleasant interaction with handing out our codebase:  We had redone the projects in full for 2012 to improve them and avoid potential cheating issues, but the time this consumed killed the course staff, and resulted in a poor experience for the students.  Ordinarily, this would be amortized across reusing the project a few times and letting the infrastructure mature.  We're now in the process of trying to create more general project infrastructure that we can reuse while still changing the projects themselves sufficiently.

My friends in industry are probably chuckling about this complaint - designing & testing distributed systems is hard - but it's rarer to be whacked in the face with it this hard in academia.

Lessons from Project 3

The third project was a "design your own" team project in teams of two.  Students proposed and implemented a design of their choice.  The project rules boiled down to:
  1. Must have a core distributed systems challenge of some kind: distributed state management, synchronization, replication, etc.  Must operate reliably under the failure of processes or nodes.
  2. Must do something interesting.  Bonus points for nice UI and being cool.
This is far and away my favorite part of the class.  In general, the students at Carnegie Mellon are both talented and creative, and the projects reflected that.  We saw things ranging from air hockey on an infinite board to multiplayer pacman, geospatial information systems, a system to RAID your data across multiple cloud storage providers (e.g., dropbox, google drive), and more.  Several teams implemented Paxos for state management;  others focused on two-phase commit for consistent updates across different pieces of state;  and a few stared deeply into their problem domains to come up with situation-specific solutions that tolerated weaker semantics.

I've been tempted to require Paxos or 2PC for this, but I really did like some of the more out-of-the-box projects that dodged these conventional approaches.  A few students commented, and I agree, that this is a tradeoff:  we need better sets of "canned" projects for the students who don't want to do their own thing.  For next year, the "canned" project will be Paxos + something.

If you're considering doing this part of the class, just note that it's extremely staff-intensive for both the professors and TAs.  We met with each project group several times to go over the proposal and the final version, plus devoted an in-class demo day (with Go Gophers and a fully-tricked-out Raspberry Pi awarded to the projects voted best by their classmates).

What did this project contribute about Go?  The teams could choose any language they wanted.  About half to two-thirds of the teams ended up using Go for their final project.  The rest used mostly Python or Java.  The biggest reason students switched languages was the lack of good GUI tools for Go.  Several teams that stuck with Go created a client-server architecture with a Java or Python GUI front-end client.  This wasn't surprising for the team that developed a cloud+Android app, but it's a disappointing limitation when moving from purely server apps to GUI apps in Go.  But the number who stuck with Go for their backend servers was very encouraging, despite many of the students being far more expert in other languages before the start of the semester.

But the best lesson from this was watching the number of students who implemented rich, intricate distributed systems algorithms - such as Paxos - compared to the amount of time we'd spent in the 2010 C++ course fighting with syntax and STL complexity without being able to get to some of the theoretical and algorithmic complexity.

The bottom line

"I believe learning and doing projects in Go was an immensely valuable experience."
I'm really glad we switched the course to Go, and I plan to keep it there.  In a year or two - probably with go1.1 and solid binary package support - it's going to be the best option for this class hands-down.  (We can have some arguments about Erlang in the comment thread;  I favored going with a c-like systems language because I'm more familiar with it and was less comfortable teaching in Erlang.)  It needs more progress on the GUI front;  I need a summer without a newborn baby to be able to get the projects to stability; and a bit more polish to the language and available documentation will probably seal the deal.

But if you can live with the binary package limitation and don't mind losing a few weeks of project productivity to letting the students ramp up in Go, I'd strongly encourage you to ... go for it.
