Walking the Walk

Sourcerer Bot · Sourcerer Blog · Oct 11, 2017


This video depicts a segmented humanoid figure navigating a somewhat ungainly path through a virtual environment. If it’s new to you, I’d forgive you for thinking that it also shows what happens when a segmented virtual humanoid enjoys one or two simulated cocktails too many.

But of course, you’d be wrong.

In fact, the model is trotting to the tune of some modern deep reinforcement learning algorithms that imbue it with the ability to learn how to move and navigate in rich environments. It’s not a rigged armature, nor has it snacked on a library of recorded motion capture data. What you see the agent doing — and doing pretty successfully — is what it learned to do all by itself.

Developed by researchers at DeepMind, the same folks who built AlphaGo (and who were unsurprisingly acquired by Google in 2014), this work extends a tradition of machine learning experimentation in virtual environments — and this includes games. Demis Hassabis, DeepMind’s co-founder, set up Elixir Studios earlier in his career (anyone remember Evil Genius, or Republic: The Revolution?), so it’s natural to see him develop these concepts further, but on a much larger canvas. What with him being super smart (check him out), his published games, and the Google acquisition, I swear I’m soon going to have to start calling him the Do No Evil Genius™.

Reinforcement learning is widely recognized as an extremely challenging branch of machine learning. It governs how an agent takes actions in an environment to maximize some long-term reward signal. Unlike other forms of machine learning, there’s no training set — the agent is never explicitly presented with a sample of the correct inputs and outputs. Instead, an agent wanders along a knife-edge between exploitation and exploration. It exploits behaviors that it has learned deliver a high reward signal. It explores uncharted territory in the hope of finding even higher rewards.
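To make that loop concrete, here’s a minimal sketch of the explore/exploit cycle using toy tabular Q-learning on a hypothetical Gym-style environment. To be clear, this is not the algorithm DeepMind used (they trained with a distributed policy-gradient method); it’s just the simplest way to show an agent learning from reward alone, with no training set in sight.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Toy tabular Q-learning. `env` is assumed to expose a Gym-style
    reset()/step() interface with a small discrete action space."""
    Q = defaultdict(float)  # value estimate for each (state, action) pair
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Explore with probability epsilon, otherwise exploit what we already know.
            if random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = max(range(env.action_space.n), key=lambda a: Q[(state, a)])
            next_state, reward, done, _ = env.step(action)
            # Nudge the estimate toward the observed reward plus discounted future value.
            best_next = max(Q[(next_state, a)] for a in range(env.action_space.n))
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```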

Reinforcement learning strategies can be very sensitive to the reward function. As you manipulate which behaviors get rewarded, so the agent’s optimal strategy changes. Reward engineering, as it’s called, has some well-known pitfalls — for example, overly complex reward functions can be brittle and lead to unexpected results if adjusted even slightly.

In their pursuit of robust and generalized behaviors, the DeepMind researchers opted for simple reward functions, based on the premise that “rich and robust behaviors will emerge from simple reward functions, if the environment itself contains sufficient richness and diversity” [1]. DeepMind’s humanoid agent was rewarded for making progress along the track and, because you can’t be too careful, for not falling down. It was penalized when it deviated too far from the center of the track, and when its movements generated excessive torques in its joints (… thus avoiding virtual rotator cuff injuries).
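To make that concrete, here’s a rough sketch of a reward function in that spirit. The term names and weights below are my own illustration, not the paper’s actual implementation, but the shape is the same: reward progress, and gently penalize everything you don’t want.

```python
import numpy as np

def locomotion_reward(forward_progress, dist_from_center, joint_torques,
                      fallen, w_center=0.5, w_torque=0.01, fall_penalty=1.0):
    """Illustrative locomotion reward; the weights are made up for illustration."""
    reward = forward_progress                                      # progress along the track
    reward -= w_center * abs(dist_from_center)                     # stay near the center line
    reward -= w_torque * float(np.sum(np.square(joint_torques)))   # no violent joint torques
    if fallen:
        reward -= fall_penalty                                     # staying upright matters
    return reward
```

The weights are exactly where the reward sensitivity mentioned above bites: crank w_torque up and the agent tiptoes; zero it out and it flails.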

I don’t know about you, but this kind of experimentation leaves me totally stoked — there’s a lot of beauty in it — and color me sentimental, but it’s so easy to anthropomorphize that isolated, articulated running figure. I’d cautiously offer that to me, it’s not a million miles away from a child taking its first steps. Whether you agree or not, I’m hopeful that before too long, experimentation like this will be well within the means of those of us without access to Googleplex-scale resources. But there’s a bar or two to get over first.

First, there’s the compute resource bar. DeepMind’s agent was trained in parallel — many, many instances of the agent and its environment were run simultaneously across many, many machines. Data collection and gradient calculation (which, if you’re unaware, is the meat and potatoes of machine learning) were distributed across many workers, and the results aggregated synchronously. Most of us aren’t able to (or can’t afford to) spin up such architectures on a whim, although recent moves by both Amazon and Google to offer per-second billing for their cloud products are an optimistic indicator, perhaps, of more such things to come.
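If you’re curious what that synchronous pattern looks like stripped to its bones, here’s a sketch. The Worker objects and their methods are hypothetical stand-ins for real distributed machinery (parameter servers, RPC, and so on), not anything from DeepMind’s actual infrastructure.

```python
import numpy as np

def synchronous_train_step(workers, params, learning_rate=0.001):
    """One synchronous update: every worker collects its own experience and
    computes a gradient against the shared parameters, then the gradients
    are averaged and applied in a single step."""
    grads = []
    for worker in workers:                     # in practice these run in parallel
        rollout = worker.collect_rollout(params)
        grads.append(worker.compute_gradient(params, rollout))
    mean_grad = np.mean(grads, axis=0)         # synchronous aggregation
    return params - learning_rate * mean_grad  # simple gradient-descent update
```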

“Doctor Smith, I have some bad news. That last optimization run? We just spent next year’s operating budget.”

And there’s no getting away from the fact that if you want to get started, there’s a pretty hefty entry fee. Sure, a Ph.D. in the right discipline is a good start. But it’s also possible that an experienced engineer might have accumulated much of the right stuff without even realizing it. This said — and you can probably Google better than me — the list starts …

Linear algebra…multivariate calculus…programming experience…statistics and probability…vector calculus…data analysis…distributed systems…machine learning basics…algorithms…

To be honest, that rap sheet looks as if it’s already long and getting longer. And we’ve only just started.

Does it sound like you? Do you know if it sounds like you? Would you like to know if it sounds like you? At sourcerer we’re building the capability for you (and those you allow) to interpret your professional story as it’s articulated in the body of your work. A ‘profile’ that functions like a hash of your code: it digs deep, analyzing your technologies, habits, and preferences to show the world who you are as a software engineer. We’ll be able to tell you exactly how much gas you have in the tank, and because you don’t have to be perfect — you only have to be better than the competition — we’ll also be able to tell you how you stack up against the rest.

Even if you did drop out of that Ph.D. program, there’s no need to let go of your dream. Your accomplishments have already been writ large, and perhaps now is the time for their unfolding story to play its part in helping you get to that next elusive level. Maybe even at DeepMind.

Wouldn’t that be nice? Cocktails anyone?

References

[1] Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver. “Emergence of Locomotion Behaviours in Rich Environments.” DeepMind, 2017. https://arxiv.org/pdf/1707.02286.pdf
