22 Months of Voice-Based Assistants

Almost two years ago, I posted on Google+ that I'd purchased an Amazon Echo.  In that post, I wrote:
It's surprisingly cool.  But the best part is what it makes me want, because it also sucks.  It can't recognize my 2.6 y/o daughter's voice, for example -- even though she can say to it fairly clearly "Alexa, play Puff the Magic Dragon"  (it will, if I ask it).  What an awesome enabler for kids, if it worked.  A little dangerous, too, but hey. :-)  It can't turn my remote-enabled lights on or off for me.  It can't even send me an email to remind me about something.  Boo.
But - these are all current limitations.  Its speech recognition, albeit within a slightly narrow domain, is really solid.  It's happy with me, it's happy with my wife, and it's happy listening to us with the microwave running and a toddler running around.  The convenience is awesome.  I suddenly want to control more of my life by talking to it. But I can't. Yet.
It's two years later, and in that time:
  • My daughter is now four and a half, with much more capability of interacting with an assistant;
  • I've purchased a Google Home and run it head to head with the Amazon Echo;
  • Both the Home and the Echo have improved substantially in their capabilities.
At the start of 2015, I was excited about the potential of these things.  Now I'm excited about the reality.  When we moved to California in August 2015 for a year, we didn't bring our stereo - but we brought our Alexa to be our primary stereo system.  Now we're on vacation in Utah for two weeks over the winter, and we brought our Google Home with us.  (We were checking baggage anyway.  I'm enough of a geek that I also brought our home wifi with us so I didn't have to reprogram anything onto our AirBnB rental's wifi, using a cute little GL-AR150 mini router.  I wish I'd started doing that a long time ago.)

I'm excited enough about it that I've tied it into our fledgling home automation system to be able to control the lights.  Excited enough that instead of a CD player, we plan on buying our daughter an assistant device of her own to be able to stream music.  And excited enough that I set up my 73-year-old mom, to whom I am very wary of giving unwanted gifts, with a Home and a Chromecast Audio for Christmas.

Looking Back at My Initial Thoughts

It can't recognize my daughter's voice

That's been fixed by the passage of time, but it's still an area where I'd love both devices to improve.  For the Amazon Echo, the problem was its speech recognition -- far too often it would misunderstand her or fail to parse what she was saying.  For the Google Home, the kicker is the hotword detection ("O.K. Google", or "Hey, Google" -- my family favors "Hey, Google.").  There's something awkward in that set of sounds for a young child, and it doesn't trigger half the time for her.  On the bright side for the Home, once she can trigger it, it almost always gets what she's saying.

For adults, though, the Home is a huge leap forward in the quality of intent recognition.  It does a much better job of doing something reasonable with a request.  ("Play jazz thanksgiving music", for example, generates a perfectly reasonable playlist on the Home, and an uncomprehending shrug from the Echo.)  I suspect that the Echo also got better over the years, but it wasn't the huge jump that the Home was, so I didn't notice it as much.

Echo Grade:  B
Google Home Grade:  B

It can't turn my remote-enabled lights on or off for me.

Done and done - both the Echo and the Home now integrate with specific control mechanisms (including my SmartThings hub, which I use to control a bunch of Z-Wave switches) as well as the mega-popular IFTTT.

Other parts of the integration are just as cool.  I have a Chromecast Audio plugged into the stereo system, and we regularly command the Google Home:  "Hey, Google, play classical christmas music on the living room speakers."  And it just works.

Echo Grade:  A
Google Home Grade:  A

Hooray!

It can't even send me an email to remind me about something.

Partial progress, but with the experience of time, I now have more appreciation for why this needs more work. You can rig up a system to accomplish this through IFTTT. But it's not smooth - partly because IFTTT doesn't let you define more than one copy of a skill. I've found this annoying on the Echo, and mostly not working on the Home.

I have sympathy for both Amazon and Google on this one, though, because the more you think about integrating a voice-based assistant into things like your email and calendar, the more pause it should give you. I refuse to allow either one access to any personal account information - there's no authentication, and it's too easy to trigger by accident or take advantage of maliciously. We need better ways of handling this in a trustworthy-but-seamless fashion, and I'm happy to give both teams a pass while they play with ideas.

Echo Grade: B-
Google Home Grade: C+

Looking Forward

Many of my gripes today relate to the lack of personalization. I can't use the Google Home to independently "find my phone" on my and my wife's phones, because the IFTTT recipes I wrote for it can only have one phone as a target. That's an IFTTT limitation, but c'mon - integrating with find my phone is so damn obvious. There's now a Trackr skill for doing this with the Echo, but I hadn't tried it at the time I sold mine. (We're now a strictly Google Home family.)
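To make that gripe concrete, here's a minimal sketch of what a per-speaker "find my phone" might look like, assuming the assistant could already tell who is talking. Everything in it is hypothetical: the `HouseholdMember` type, the `HOUSEHOLD` registry, the speaker labels, and the phone identifiers are made up for illustration and aren't part of any Echo, Home, or IFTTT API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HouseholdMember:
    name: str
    phone_id: str  # hypothetical identifier for that person's phone


# Hypothetical registry: speaker label -> household member.
# A real assistant would have to work out the speaker label itself
# (for example from the voice); here it is simply passed in.
HOUSEHOLD = {
    "me": HouseholdMember("me", "phone-mine"),
    "wife": HouseholdMember("wife", "phone-hers"),
}


def phone_to_ring(speaker: Optional[str]) -> Optional[str]:
    """Return the phone to ring for "find my phone", or None if unsure."""
    member = HOUSEHOLD.get(speaker)
    return member.phone_id if member else None
```

Returning `None` for a voice it doesn't recognize is a deliberate choice in this sketch: I'd rather the device ask "whose phone?" than ring the wrong one.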

There are obvious ways that these things will improve in the next two years.

We'll see more hardware. Amazon's already released the cute tiny-form-factor Echo Dot. Google's "Assistant" software powers both the Home and the Pixel phone. There's some speculation out there that Apple could try to integrate Siri with their earbuds, which would be fantastically cool, especially if their AI hiring spree helps Siri go a few more rounds against Google's on-phone assistant software. One presumes all of these companies have more tricks up their sleeves.

They'll integrate with more services and be able to control more devices. While Amazon already ties in to a ton of services, Google has a ways to go.

They'll get continued incremental improvements in their speech recognition - hopefully for children's voices, too, if you ML folks are listening. I'm pretty underwhelmed by the Amazon knowledgebase for question answering, but I assume it will get marginally better, and I'm quite certain Google's will. (If I had to bet, of course, I'd wager that Amazon will continue to do a better job of playing well with others, and Google will do a better job of managing information -- both of those things seem pretty related to the DNA of each company.)

But I have one bet, and hope:

The next leap forward will be voice personalization.

These assistants are amazing for families. Ours have become a regular part of our interaction with music, timers, random factoids, etc. But family implies multiple people, and being able to take advantage of the human context will be a huge next step in making these assistants even more smooth and capable. I want my device to recognize when my daughter is asking for something, and assume that she's less likely to ask for "Elzevir - Dragon Catcher" death metal and more likely to ask for Puff the Magic Dragon. And when I'm asking it to find my phone, it should find my phone. When my spouse asks about her phone, it should find her phone. "Remind me" should work similarly. There's so much potential there. All I want for Christmas next year is...

(obligatory disclaimer: I work part-time for Google and full-time for Carnegie Mellon University. This post, like everything on my blog, represents only my own opinion, and is not sanctioned, endorsed, paid for, wanted, smelled, or anything else to do with anyone who pays me or doesn't pay me. If you click some of those links, you might get taken to Amazon via my affiliate link, and I might make a penny.)
