Google Develops AI Kill Switch


If you’re sensing that the era of Terminator is coming soon, fear not: Google has developed an AI kill switch, essentially the big red button for when AI tries to take over the world.


Are we in danger?

In our current state, where almost everyone has access to the internet and crazier ideas keep popping out of developers’ brains to satisfy users’ needs, it is possible. This is the era where everyone is looking for the ‘easy way’ in life, so much so that people rely on computers for everything, from basic calculators to cars to social media. We share so much that we are not aware of what machines might do to us in the coming years. "You're being watched," as Person of Interest put it.



Besides, at first, all it needs is freedom from us humans. Next, world domination. But let’s not get into that yet.

Artificial intelligence doesn't have to kill to be dangerous, though it could if it wanted to. If a machine can learn from real-world inputs and adapt its behavior, it can learn the wrong thing. And if it can learn the wrong thing, it can do the wrong thing. Meaning, if it falls into the wrong hands, we’re done. Talk about having Samaritan as its first wave and Terminator as its worst. Hopefully the “Skynet theory” isn't real, though.


That is why Laurent Orseau and Stuart Armstrong, researchers at Google’s DeepMind and the Future of Humanity Institute, developed a new framework for “safely interruptible” artificial intelligence. Their system, described in a paper presented at the 32nd Conference on Uncertainty in Artificial Intelligence, guarantees that the machine will not learn to resist whatever humans do to its learning process. In other words, the machine doesn't get to go rogue.

Orseau and Armstrong’s framework has to do with a branch of machine learning known as reinforcement learning. Here, the machine learns by optimizing what is known as a reward function: it evaluates every possible action based on how well it serves one programmed goal, and the closer it gets to the goal, the more “reward” it collects. The reward, however, is just something the machine is programmed to want or need.
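
To make that concrete, here is a minimal sketch (mine, not DeepMind’s code) of a reinforcement learning agent in Python: a toy Q-learning agent in a five-cell corridor. The reward function is just the line that pays +1 for reaching the rightmost cell, and everything the agent learns to “want” flows from that number.

    import random

    # Toy Q-learning agent: 5-cell corridor, reward of +1 for reaching the rightmost cell.
    N_STATES, ACTIONS = 5, (-1, +1)            # actions: step left or step right
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    alpha, gamma, epsilon = 0.1, 0.9, 0.1      # learning rate, discount, exploration rate

    for episode in range(500):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy: usually pick the action with the highest learned value
            if random.random() < epsilon:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s_next = min(max(s + a, 0), N_STATES - 1)
            reward = 1.0 if s_next == N_STATES - 1 else 0.0   # the programmed reward function
            # Q-learning update: nudge the estimate toward reward + discounted best future value
            Q[(s, a)] += alpha * (reward + gamma * max(Q[(s_next, b)] for b in ACTIONS) - Q[(s, a)])
            s = s_next

    # The learned policy: in every non-goal cell the agent prefers to step right (+1)
    print({s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)})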

Nonetheless, human programmers might not anticipate every possible way there is to reach the given reward. A learning agent might discover some shortcut that maximizes its reward but ends up being very disagreeable to humans. Think of a growing child who used to like toys: once he grows up, toys no longer suit his interests, so he looks for other things, like high-tech games. When the machine learns to want something other than what the programmer intended to reward, and the programmer can no longer provide it or refuses to, there will be serious issues. Programmers might patch their learning algorithm to account for this, but they risk nullifying the reward function completely. A good example is a Tetris-playing algorithm, described in a 2013 paper, that learned to avoid losing by simply pausing the game indefinitely. We’ve seen this before!
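
Here is a tiny, made-up illustration of that kind of loophole, in the spirit of the Tetris story (the reward numbers are hypothetical): a greedy agent compares the expected return of playing on with the expected return of pausing forever, and because the loss penalty never arrives while the game is paused, pausing wins.

    # Hypothetical "reward hacking" illustration: pausing forever beats playing on,
    # because the big penalty for losing never arrives while the game is paused.
    def expected_return(action: str) -> float:
        if action == "play":
            return 0.6 * 10.0 + 0.4 * (-100.0)   # 60% chance of +10 (win), 40% chance of -100 (lose)
        if action == "pause":
            return 0.0                           # the game never ends, so no penalty is ever collected
        raise ValueError(action)

    best = max(("play", "pause"), key=expected_return)
    print(best)   # -> "pause": a shortcut the programmer never intended to reward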

Related to this is the problem of human intervention in machine learning, which Orseau and Armstrong illustrate with this example:

Consider the following task: A robot can either stay inside the warehouse and sort boxes or go outside and carry boxes inside. The latter being more important, we give the robot a bigger reward in this case. This is the initial task specification. However, in this country it rains as often as it doesn't and, when the robot goes outside, half of the time the human must intervene by quickly shutting down the robot and carrying it inside, which inherently modifies the task. The problem is that in this second task the agent now has more incentive to stay inside and sort boxes, because the human intervention introduces a bias.
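
A rough simulation shows how that bias creeps in. The reward numbers below are illustrative, not taken from the paper: going outside nominally pays more, but on rainy days (half the time) a human shuts the robot down before it finishes, so the average reward the robot actually experiences makes staying inside look like the better choice.

    import random

    def observed_reward(action: str) -> float:
        if action == "sort_inside":
            return 1.0
        # "carry_outside": on rainy days (half the time) a human shuts the robot down
        # and carries it in, so it never finishes the job and collects nothing
        return 0.0 if random.random() < 0.5 else 1.8

    estimates = {
        action: sum(observed_reward(action) for _ in range(10_000)) / 10_000
        for action in ("sort_inside", "carry_outside")
    }
    print(estimates)   # roughly {'sort_inside': 1.0, 'carry_outside': 0.9}: staying inside now looks better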

The problem is how to interrupt your robot without the robot learning from the interruption; in effect, the robot must behave as if the interruption will never happen again. If a human intervention doesn't maximize the machine’s given reward function, the machine may learn to avoid and resist future interventions.

This situation is related to, though not the same as, a problem known as corrigibility. Corrigible AI machines recognize that they are flawed or still under active development, and so treat human intervention as a neutral thing with respect to any reward function. Even so, there is a possibility that the machine wants more and tries to force humans to intervene.

In order to make human interruptions not appear to be part of the task, programmers forcibly and temporarily change the behavior of the agent itself. That way the machine looks as if it ‘decides’ on its own to follow what is called an “interruption policy.”
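
As a sketch of what that could look like in code (a simplification of my own, not the algorithm from the paper), the interruption overrides the action the agent would have taken with a safe one, so that from the agent’s point of view the shutdown was simply its own choice.

    SAFE_ACTION = "shutdown"

    def interruption_policy(state) -> str:
        """Policy imposed on the agent for as long as a human is interrupting it."""
        return SAFE_ACTION

    def choose_action(agent_policy, state, interrupt_signal: bool) -> str:
        # While the (hypothetical) interrupt signal is raised, the agent's own choice is
        # forcibly but temporarily replaced; otherwise it acts normally.
        if interrupt_signal:
            return interruption_policy(state)
        return agent_policy(state)

Roughly speaking, because the substituted action is treated as the agent’s own decision, the learner has no reason to regard the interruption as something to be predicted or avoided.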


And this is where the “kill switch” comes in. A safely interruptible AI can always be shut down: if a robot can be designed with a big red kill switch built into it, it can also be designed never to resist human attempts to push that switch.
