Be Wary Of Simulations

An old (well, relatively speaking – from March this year) video has recently resurfaced on the Twitterz describing how researchers are (were) using virtual worlds to train Deep Learning systems for possible use in autonomous vehicles:

It reminded me of a demo by Karl Sims at a From Animals to Animats conference years ago in which he’d evolved creatures in a 3D world to perform various forms of movement:

One thing I remember, though it isn’t shown in the video above, relates to a creature being evolved to jump as high as it could. Apparently, it found a flaw in the simulated physics of the world within which the critters were being evolved that meant it could jump to infinity…
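As a toy illustration of how simulated physics can hand out free energy (this is not a claim about the actual flaw Sims' creature found, which isn't described here), consider a frictionless mass on a spring stepped with the naive explicit Euler method. The function name, time step and constants below are all assumptions picked for the sketch. In the real world the energy stays constant; in this simulation it grows on every step, which is exactly the sort of loophole an optimiser rewarded for "jump higher" might stumble on.

```python
def simulate_spring(steps, dt, k=1.0, m=1.0):
    """Explicit (forward) Euler integration of a frictionless mass on a spring.

    Starts from x=1, v=0 and returns the total energy after each step.
    """
    x, v = 1.0, 0.0
    energies = []
    for _ in range(steps):
        a = -k * x / m                     # acceleration from Hooke's law
        x, v = x + v * dt, v + a * dt      # both updates use the OLD state
        energies.append(0.5 * m * v * v + 0.5 * k * x * x)
    return energies


initial_energy = 0.5                       # 0.5 * k * x0**2 with k=1, x0=1
energies = simulate_spring(steps=1000, dt=0.1)

# With k = m = 1, each step multiplies the energy by (1 + dt**2),
# so the "closed" system keeps gaining energy from nowhere.
print(f"initial energy: {initial_energy:.3f}")
print(f"energy after 1000 steps: {energies[-1]:.1f}")
```

Shrinking `dt` slows the drift but does not remove it; a symplectic scheme such as semi-implicit Euler would keep the energy bounded.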

In turn, Sims’ critters reminded me of a parable about neural networks getting image recognition wrong*, retold here: Detecting Tanks. In trying to track down the origins of that story, references are made to this November 1993 Fort Carson RSTA Data Collection Final Report. In passing, I note that the report (on collecting visual scene information to train systems to detect military vehicles in natural settings) refers to a Surrogate Semiautonomous Vehicle (SSV) Program, which in turn makes me think: how many fits and starts has autonomous vehicle research gone through prior to its current incarnation?

* In turn, this reminds me of another possibly apocryphal story – of a robot trained to run a maze being demoed for some important event. The robot ran the maze fine, but then the maze was moved to another part of the lab for the Big Important Demo. At which point, the robot messed up completely: rather than learning the maze, the robot had trained its escape based on things it could see in the lab – such as the windows – that were outside the maze. The problem with training machines is you’re never quite sure what they’re focussing on…
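A minimal sketch of the same failure mode, using made-up data in the spirit of the tanks parable (every value, feature name and helper here is hypothetical): a one-feature "learner" is given photos where the tank shots all happen to be cloudy, and the shape detector is only right 80% of the time. It picks the weather as its rule, scores perfectly in training, and then falls over when the conditions change.

```python
def accuracy(rows, feature):
    """Fraction of rows where the binary feature equals the 'tank' label."""
    return sum(row[feature] == row["tank"] for row in rows) / len(rows)


def fit_stump(rows, feature_names):
    """Pick the single binary feature that best predicts the label on `rows`."""
    return max(feature_names, key=lambda name: accuracy(rows, name))


def photo(tank, shape, cloudy):
    return {"tank": tank, "shape": shape, "cloudy": cloudy}


# Training set: every tank photo was taken on a cloudy day, every
# tank-free photo on a sunny day. The shape feature is noisy (80% right).
train = (
    [photo(1, 1, 1)] * 8 + [photo(1, 0, 1)] * 2
    + [photo(0, 0, 0)] * 8 + [photo(0, 1, 0)] * 2
)

# "Demo day": tanks in sunshine, empty woods under cloud.
field = [photo(1, 1, 0)] * 5 + [photo(0, 0, 1)] * 5

chosen = fit_stump(train, ["shape", "cloudy"])
train_acc = accuracy(train, chosen)
field_acc = accuracy(field, chosen)

print(f"feature the learner relies on: {chosen}")
print(f"training accuracy: {train_acc:.0%}, demo-day accuracy: {field_acc:.0%}")
```

Nothing in the training score hints at the problem; it only shows up when the incidental correlation (weather, or the lab windows) is broken.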

PS via Pete Mitton, another great simulation snafu story: the tale of the kangaroos. Anyone got any more?:-)


  1. deepanalytics

    I am shocked, shocked that there is gambling going on here….

    This problem of feature/response correlation is present in all modeling/characterization/training problems, not just NN-based machine learning. It is also present in biological systems: I can train a dog to roll over by presenting positive feedback like a treat. The same goes for human beings: organizations subject their employees to positive/negative reinforcement, whether through a performance management policy or through the social group.

    And history is full of stories about failure when conditioned for the wrong ‘feature’ in the model.

    • Tony Hirst

      @deepanalytics Agreed that the problem is more general than just training machines and machines with a particular architecture at that. But if we are to start mounting a resistance, we need anecdotes that people will remember. And taking lessons from recent political events, it may be that we need anecdotes – like the tanks – that may be “truthy” rather than true. (As to the ethics and academic legitimacy of such an approach – discuss!;-)

      • deepanalytics

        Indeed, good anecdotes are a good quest.

        The progression I personally have gone through, particularly in light of the changing media landscape and political events of the past decade, is that the past is a great prognosticator until it isn’t. The black swan event, popularized by Nassim Taleb, is a great recent example of collecting failure anecdotes, but human history is rife with them.

        That brings me to the understanding that the physics and chemistry communities have about theoretical models: you want the simplest model to explain a behavior, and nothing more. The subtlety is that this means that predictive models are not trusted until they have been demonstrated to work in a small subset of the world, and it is KNOWN that they are not to be used outside of that validated operating environment.

        • Tony Hirst

          Right – which is where expertise comes in: being able to recognise situations as examples of situations you know how to deal with… One of the problems with trying to use academic results is that it can be too easy to skip over the assumptions, or caveats about when not to apply a particular model / circumstances in which it is unlikely to apply.

          • deepanalytics

            I think it goes much deeper. The experience is that ALL models are approximations only valid for a small subset of inputs. THUS any consumer MUST defend against bad predictions. That means that any errors in the feedback loop need to be attenuated; they cannot be left unconstrained.

            That implies that any ethical autonomous system based on a predictive model must have a PROVEN stability criterion. Since that is an open research question, good luck with preventing future failures that will cost lives.

  2. dougclow

    As another example, there’s Microsoft’s Tay chat bot, which produced a stream of inflammatory and racist messages when released on Twitter, widely presumed to be a result of trying to learn how to Tweet from what it read on Twitter.

    There was also AlphaGo getting stumped by Lee Sedol making a stupid (or at least, extremely unusual and unexpected) move in their fourth game, because the move was outside its expected decision space so it struggled to focus its calculations appropriately. But that didn’t stop AlphaGo winning the series 4-1.