Don't quit that programming career yet because of AI
A recent Wired article breathlessly predicted the end of code:
Soon We Won't Program Computers. We'll Train Them Like Dogs

Of course, this is the same magazine that declared in 2010 that The Web is Dead. So perhaps we should step back and think before throwing in the towel. Have you looked at a self-driving car recently?
Self-driving car (image from Google)
In this simplified diagram blatantly stolen from Google, there's a laser scanner, a radar, a compass, and speed sensors. Missing are the cameras, the engine computer, the onboard computers, the cellular uplinks, the data recorders, and the onboard entertainment system (which will hopefully get even more use when the driver gets to play, too). And the backup systems, which are often redundantly engineered and even separately programmed to avoid coordinated failure. Each of these devices has substantial embedded firmware controlling it, and per-device processing to control it and make sense of the data it generates.
Oh, and that car depends on Google's map infrastructure, which draws data from satellites, airplanes, streetview cars (and sometimes tricycles), and huge databases. The ability to train those machine learning models depends on an absolutely massive compute infrastructure with enough storage to cover New England 4.5km deep in punch cards.
Advanced machine learning drastically increases the need for heavy-duty compute infrastructure and data management.
If you want to get geeky about it, assume that a self-driving car has something like four HD-camera-quality sensors attached to it. I'm sure that's not accurate, but it's a conservative guess if the massive sensor array on top of Uber's new prototype is any indication. If it drives two hours per day (only two? what a waste!), it's creating about 420 GB of compressed H.264 60fps data per day. At eight hours, it's filling a hard drive every day. YouTube is currently the most likely contender for the most video uploaded (about 300 hours per minute). Put a million self-driving cars on the road full time and you've beaten it by a factor of seventy. If a million sounds like a lot, remember that it's a very small fraction of the 253 million cars on the road in the US today.
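To make the arithmetic concrete, here's a minimal back-of-the-envelope sketch. The ~15 MB/s per-camera rate and the eight-hour reading of "full time" are my assumptions, chosen only to reproduce the ballpark figures above, not measured values:

```python
# Back-of-the-envelope sketch of the numbers above. The per-camera bitrate and
# the 8-hour "full time" day are assumptions, not measurements.

CAMERAS_PER_CAR = 4               # assumed HD-camera-quality sensors per car
BYTES_PER_SEC_PER_CAMERA = 15e6   # assumed ~15 MB/s of compressed 60fps video

def daily_volume_gb(hours_driven: float) -> float:
    """Compressed video generated by one car in a day, in GB."""
    seconds = hours_driven * 3600
    return CAMERAS_PER_CAR * BYTES_PER_SEC_PER_CAMERA * seconds / 1e9

print(f"2 h/day: {daily_volume_gb(2):.0f} GB")   # ~432 GB, i.e. "about 420 GB"
print(f"8 h/day: {daily_volume_gb(8):.0f} GB")   # ~1.7 TB: a hard drive a day

# Compare hours of fleet video against YouTube uploads (~300 hours per minute).
CARS = 1_000_000
HOURS_PER_DAY = 8                 # assumed meaning of "on the road full time"
fleet_video_hours = CARS * CAMERAS_PER_CAR * HOURS_PER_DAY
youtube_hours_per_day = 300 * 60 * 24
print(f"fleet vs. YouTube: {fleet_video_hours / youtube_hours_per_day:.0f}x")  # ~74x
```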
Advanced machine learning techniques don't reduce the need for programming. They make it possible to program systems that were formerly beyond the reach of automation.
There's a fun talk at Google I/O 2016 about machine learning, in which Aparna Chennapragada notes:
There are two ways in which machine learning changes the game. One is that it can turbo-charge existing use cases [such as] speech recognition. ... If the word error rate drops, we actually saw the usage go up. [...] The same thing with translation. As translation gets better, Google Translate scales to 100+ languages. [...] The second way, which is a lot more exciting to see, is where it can unlock new product use cases.

When ML makes it possible to have high-quality translation in 100 languages, suddenly someone needs to build all the infrastructure to support them, from collecting and managing the training data, through disseminating the models to the billion+ users who're now able to take advantage of them. When it unlocks new use cases, suddenly someone needs to write the code that adapts that use case to the ML that can solve it. There's data to be stored, and UIs to be designed, experimented with, and implemented.
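To illustrate how much of that adaptation is plain programming, here's a minimal, hypothetical sketch of the glue around a single translation call. The `TranslationModel` class, `translate` function, and supported-language set are all invented for illustration, standing in for whatever serving stack a real product would use:

```python
# Hypothetical glue code around one ML call: the model invocation is a single
# line, but validation, language handling, and fallback behavior are ordinary
# programming that someone still has to write and maintain.

from dataclasses import dataclass

SUPPORTED_LANGUAGES = {"en", "fr", "de", "hi", "sw"}  # stand-in for 100+ languages

@dataclass
class TranslationModel:
    """Placeholder for a trained model; a real one would be loaded from storage."""
    def predict(self, text: str, source: str, target: str) -> str:
        return f"[{target}] {text}"  # dummy output for the sketch

def translate(model: TranslationModel, text: str, source: str, target: str) -> str:
    # Everything below is glue, not machine learning.
    if not text.strip():
        return ""
    if source not in SUPPORTED_LANGUAGES or target not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language pair: {source}->{target}")
    if source == target:
        return text
    try:
        return model.predict(text, source=source, target=target)
    except Exception:
        # Fall back to the original text rather than failing the whole request.
        return text

print(translate(TranslationModel(), "hello world", "en", "fr"))
```

The specifics don't matter; the point is that the ML call is one line and everything around it is ordinary software engineering.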
Machine learning has been advancing at a tremendous rate, it's true. I hope it keeps up. But anyone who's dealt with ML in the real world can tell you that you're likely to spend more time cleaning and wrangling the data than on the learning itself. One Google paper termed machine learning "The High Interest Credit Card of Technical Debt", noting that:
Machine learning packages have all the basic code complexity issues as normal code, but also have a larger system-level complexity that can create hidden debt.

and
As a special case of glue code, pipeline jungles often appear in data preparation. These can evolve organically, as new signals are identified and new information sources added. Without care, the resulting system for preparing data in an ML-friendly format may become a jungle of scrapes, joins, and sampling steps, often with intermediate files output.

The view that you "just point" a machine learning algorithm at a pile of raw data ignores the very real system engineering challenges that arise any time you start hooking data sources together or using software to control things.
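For a feel of what such a "pipeline jungle" looks like in practice, here's a minimal, hypothetical sketch of data-preparation glue. The file names, columns, and sampling fraction are invented for illustration; the shape of it (ingest, clean, join, sample, write an intermediate file) is the part that tends to sprawl:

```python
# Hypothetical data-prep glue: none of this is "learning", yet it's where much
# of the engineering time in a real ML system goes.
import pandas as pd

# Ingest: two raw sources with different schemas (names are invented).
events = pd.read_csv("raw_events.csv", parse_dates=["timestamp"])
labels = pd.read_csv("human_labels.csv")

# Clean: drop malformed rows and normalize the join key.
events = events.dropna(subset=["session_id"])
events["session_id"] = events["session_id"].str.strip().str.lower()

# Join: attach labels to events. Unlabeled rows are silently dropped here,
# a choice that is easy to forget and hard to audit later.
joined = events.merge(labels, on="session_id", how="inner")

# Sample: downsample the dominant class so training data isn't 99% negatives.
negatives = joined[joined["label"] == 0].sample(frac=0.1, random_state=42)
positives = joined[joined["label"] == 1]
training = pd.concat([negatives, positives]).sort_values("timestamp")

# Intermediate file output: the next stage reads this, so the file format
# itself becomes an interface someone has to maintain.
training.to_csv("training_examples.csv", index=False)
```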
By giving us tools to learn and control things that were formerly impossible, modern AI techniques are opening the way to applying computation even more pervasively, and in more sophisticated ways. But the more we can use our computers to do, the more time, not less, we spend programming the mechanisms that hook them up to data, to the real world, and to users.
Don't give up on that CS degree just yet. Hit the books and learn about deep learning instead!
(As is hopefully obvious, while I'm using Google as an example a lot here, and they're paying my salary while I'm on sabbatical this year, the views in my blog are strictly my own.)