Teaching Distributed Systems in Go

It's been a year and a half since I migrated my undergrad distributed systems course (15-440) to the Go programming language.  I'm still glad I did it and will keep it there, but we've also learned some important lessons about how to best use it in our context that I hope I'll be able to improve upon next semester.   Hopefully, some of this may be useful for those of you considering moving to or creating a Go-based class.

The big points I hope to get across are:  (1)  Go is really good for distributed systems;  (2)  Students need more intro to the language than I thought;  (3)  The language has matured a lot, but there are some technical glitches remaining that you'll have to work around.

Go is good for teaching (and building) distributed systems

I believe this for two reasons:
  1. Go provides a good mix of high-level and low-level functionality.
  2. Channels and goroutines provide a program structure that works well for distributed programming.
The importance of high-level functionality:  A problem I encountered often in the C++ version of my course was students struggling with the STL or other libraries for data structures and algorithms.  My learning goals for 440 do not include "implement another hash table":  I want the students to take such options for granted and think more about the selection of structures and how they affect the bigger structure and communication of their program.  Native maps and lists are key.

So, it turns out, are web servers and things like JSON:  We very much want the students to understand the idea of marshaling and RPC, but we also want a maximally easy-to-debug wire format to keep things simple.  While I'm a fan of binary encodings when performance matters, for teaching, the readability of JSON wins out.  Being able to trivially instantiate a web server makes it easy to set up RPC in a way that's familiar to the students, and gives an easy-to-use interface.

The importance of low-level access:  At the same time, we did want the students to have to reason about issues such as buffering and queues.  In the first assignment, the students implement a library that does reliable, stop-and-wait client-to-server communication on top of UDP (and then implement a distributed password cracker using it).  We want them familiar with the sockets interface in all of its glory and ugliness, and Go exposes it naturally.

Distributed thinking for distributed programming:  This is the big one:  It turns out that structuring programs internally as little distributed systems provides a good match for implementing distributed systems.  As an example, the figure to the right shows the suggested architecture of the project 1 system:  We structure the LSP library as three communicating goroutines.  The first handles the network:  All parsing and UDP I/O happen there.  It pushes well-defined events into a channel to the event handler.  The epoch handler similarly handles all timing-related issues, and ... pushes well-defined events to the event handler.  The event handler consists of a very simple Go select loop:

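    select {
    case netd := <-srv.netInChan: // event parsed by the network goroutine
        srv.handleNetMessage(netd)
    case appm := <-srv.appWriteChan: // library call from the application
        srv.handleAppWrite(appm)
    case <-srv.epochChan: // timer tick from the epoch handler
        srv.handleEpoch()
    }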

(with a bit more after to handle responses).  The beauty of this is that it's a single select loop that both handles and serializes several different "types" of events - messages from the network, library function calls from the application, and timer events.  It's much more difficult to write and integrate a library in C that has its own threads doing things (or its own main loop).  It makes it very easy to reason about the "core" functionality of the program.  You can give a simpler assignment that uses the exact same structure to implement an echo client/server to help the students build up to this structure.

From a 2011 student:
"The theoretical concepts were very interesting and once I got used to Go, I was very happy that the class wasn't in C++ or C. I definitely see the advantages to Go, and I think that it was a very good idea to use it for the class. I feel a lot more comfortable having implemented a bunch of distributed concepts myself and having the code actually work as opposed to messing around for days with memory issues and arcane syntax."

Students need a lot of intro to Go

2011 comments:
"The professors decided to use Go, which is an extremely new and experimental language. There is very little clear documentation for it which forced time to be sunk into attempting to learn the language rather than trying to learn the concepts of the course."

"Go language was pretty interesting; definitely better choice than C++ but I would also suggest Java or wait till Go has better development tools (debugger, IDE, etc). I spent a lot of time getting the code compiled and then fixing small runtime issues which was quite annoying using printf statements since stacktrace is unreadable most of the time."

Go went a bit better in 2012 than it did in 2011, for a few reasons:  The documentation matured, the language stabilized, and the "Tour of Go" tutorial was written.  I sent out two reminder emails to the entire class over the summer encouraging them to go through the tour before the semester started.  We also taught one lecture introducing students to the language, and both Randy and I tended to use Go for all of the pseudocode we went through in class.  However:
"Shotgun-debugging Go while balancing other courses and first project was a disaster. All the Go tutorial sessions in the world didn't help manage the time it took to actually use Go for first project. Great for lecture pseudocode though. Also setting up a Go environment on a new machine is a nightmare. Compilation in general is a huge, unintuitive mess."
"It would be really helpful to just have a small assignment that is like a go tutorial before the first major project, so [people] like me don't try to take on P1 and learning Go at the same time."
The improvements weren't enough.  But the complaints changed.  It's clear that for 2013, we need a preliminary assignment that builds into writing a simple networked server using sockets, channels, and goroutines.  This shouldn't be hard, but it needs to be done.

Underlying some of the complaints in 2012 was a current, serious limitation in Go for instructional purposes:

You can't hand out precompiled library code and use the go build tools

This was one of the most serious limitations we encountered.  We tried several (hacky) approaches and none worked reliably enough for the class environment (we need to try the mtime hack again on AFS).  This meant that we had no way to hand out binary reference code for the students to test against.  Nor could we hand out reference implementations for students who couldn't get part of the project working.  This is also a serious limitation:  The projects in 15-440 are designed in a few parts, but the parts are cumulative by design, in order to help students learn about API and layering design and having to use their own library/server code a few weeks later.  We didn't want it to be all-or-nothing.

Our solution was that we handed out our full codebase for students to look at or use (losing a few points for using our version).  Which means that we can't reuse the project again next year.

Plan in advance for not being able to hand out libraries.

This gets us to the third problem, which has nothing to do with Go:

Don't underestimate the difficulty of writing good tests for undergrad distributed systems assignments.

The biggest complaint by far about the course -- and a completely valid one -- was that I botched handing out tests for my part of the projects in time.  I got hammered for this in the course feedback (variations of "the most poorly-organized course I've ever taken" came up at least twice).  This has an unpleasant interaction with handing out our codebase:  We had redone the projects in full for 2012 to improve them and avoid potential cheating issues, but the time this consumed killed the course staff, and resulted in a poor experience for the students.  Ordinarily, this would be amortized across reusing the project a few times and letting the infrastructure mature.  We're now in the process of trying to create more general project infrastructure that we can reuse while still changing the projects themselves sufficiently.

My friends in industry are probably chuckling about this complaint - designing & testing distributed systems is hard - but it's a bit more rare to be whacked in the face with it this hard in academia.

Lessons from Project 3

The third project was a "design your own" team project in teams of two.  Students proposed and implemented a design of their choice.  The project rules boiled down to:
  1. Must have a core distributed systems challenge of some kind of distributed state management, synchronization, replication.  Must operate reliably under the failure of processes or nodes, etc.
  2. Must do something interesting.  Bonus points for nice UI and being cool.
This is by far and away my favorite part of the class.  In general, the students at Carnegie Mellon are both talented and creative, and the projects reflected that.  We saw things ranging from air hockey on an infinite board to multiplayer Pac-Man, geospatial information systems, a system to RAID your data across multiple cloud storage providers (e.g., Dropbox, Google Drive), and more.  Several teams implemented Paxos for state management;  others focused on two-phase commit for consistent updates to different pieces of state; and a few stared deeply into their problem domains to come up with situation-specific solutions that tolerated weaker semantics.

I've been tempted to require Paxos or 2PC for this, but I really did like some of the more out-of-the-box projects that dodged these conventional approaches.  A few students commented, and I agree, that there's a tradeoff here, and that we need a better set of "canned" projects for the students who don't want to do their own thing.  For next year, the "canned" project will be Paxos + something.

If you're considering doing this part of the class, just note that it's extremely staff-intensive for both the professors and the TAs.  We met with each project group several times to go over the proposal and the final version, and devoted a class session to a demo day (with Go Gophers and a fully tricked-out Raspberry Pi awarded to the projects voted best by their classmates).

What did this project teach us about Go?  The teams could choose any language they wanted for the projects.  About half to two-thirds of the teams ended up using Go.  The rest used mostly Python or Java.  The biggest reason students switched languages was the lack of good GUI tools for Go.  Several that stuck with Go created a client-server architecture with a Java or Python GUI front-end client.  This wasn't surprising for the team that developed a cloud+Android app, but it's a disappointing limitation when moving from purely server apps to GUI apps in Go.  But the number who stuck with it for the backend servers was very encouraging, despite many of the students being far more expert in other languages before the start of the semester.

But the best lesson from this was watching the number of students who implemented rich, intricate distributed systems algorithms - such as Paxos - compared to the amount of time we'd spent in the 2010 C++ course fighting with syntax and STL complexity without being able to get to some of the theoretical and algorithmic complexity.

The bottom line

"I believe learning and doing projects in Go was an immensely valuable experience."
I'm really glad we switched the course to Go, and I plan to keep it there.  In a year or two - probably with go1.1 and solid binary package support - it's going to be the best option for this class hands-down.  (We can have some arguments about Erlang in the comment thread;  I favored going with a C-like systems language because I'm more familiar with it and was less comfortable teaching in Erlang.)  It needs more progress on the GUI front;  I need a summer without a newborn baby to be able to get the projects to stability; and a bit more polish to the language and available documentation will probably seal the deal.

But if you can live with the binary package limitation and don't mind losing a few weeks of project productivity to letting the students ramp up in Go, I'd strongly encourage you to ... go for it.

Comments

  1. For handing out reference implementations for testing... you could always have a package that communicates with a local executable. The executable provides the functionality via a local webserver (or whatever), the package abstracts away the functionality to conform to the proper API, and so it appears to the caller to be calling local code.

  2. Regarding precompiled code: http://code.google.com/p/go/issues/detail?id=2775

  3. 15-440 last semester was my favorite course I've taken at CMU so far. I really enjoyed learning Go, and I agree that it is a great language to teach distributed systems. I'm shocked that the Spring version of the course is still being taught in Java... Go makes RPCs, JSON marshaling, and writing web servers so much easier.

    A couple of things about Go which I think should be covered a little more next semester:

    (1) The difference between values, pointers, and references in Go. It's tough to wrap your head around what's going on without someone explicitly saying it. Arrays are values. When you pass an array to a function, the function receives a copy of the array. Slices, maps, and channels are references. When you pass a slice, map, or channel to a function, the function receives a copy of the reference, and any modifications made to the underlying data structure will persist. Go pointers behave similarly... any changes you make to an interface value won't persist unless you pass it as a pointer. I realized all of this a couple weeks after taking the final exam... it would have been nice to understand it a bit more as I took the class. I would also recommend clarifying that returning pointers to local variables inside of a function is perfectly acceptable in Go (especially if most of the students come from a C/C++ background).

    (2) Go packages, dependencies, and compilation. At some point, I looked this up online and forced myself to understand it. I think in general most people didn't understand the setup and/or environment variables which needed to be set in order to get things working properly.

    (3) String literal tags for struct fields. Why didn't we learn about these! Just when I thought marshaling couldn't get any easier. :P

    Also, I think that the complaints about tests stem more from the general frustration in not being able to pass some of them. When an AutoLab test fails, it's all or nothing and you get little to no feedback as to what went wrong. There were many cases where I (and many others, from what I've heard) got stuck on a single test for hours/days due to a very simple (but specific) bug in my code. It would have been nice if each test described its purpose (i.e. what does it test, how does it test it, is the test expected to run in a short or long amount of time, etc.).

    TL;DR - I really enjoyed the class. :)

  4. I suppose most students just haven't looked for an IDE hard enough. I can name at least LiteIDE and GoWorks (Windows only), which have debugging support. After all, gdb supports debugging Go binaries. If they just need auto-completion and syntax highlighting, SublimeText2 + gocode is good enough, I think.

    For binary RPC you can use the built-in RPC package, which doesn't require a full-fledged HTTP server: you can use it over a plain TCP connection and it will send binary gobs. The only drawback is that you need a Go client on the other side :)

    For tests you can provide your own servers and clients with some predefined interface, which can then be used to test the students' code.

    So there are solutions for almost all of the "problems". Anyway, I wish you good luck. I'd attend your course with pleasure if I were American and a student at your university :)

  5. Very interesting to read about your experience Dave. I've also been using Go for my distributed systems course this semester, and although I've not collected very much written feedback yet, I find that most students have really enjoyed it. Except they think the workload is too high ;-) Anyway, in our project we require the students to implement Paxos, going through several steps (first a failure detector and leader election, then single instance Paxos, multi-Paxos, and finally building an RSM.) In each step, there are several bonus assignments.

    Thanks for sharing your experiences.
    :) Hein

