Although it might sound a little hyperbolic, the fact is that when artificial intelligences (AIs) become smart enough to complete highly complex tasks that require them to trust humans, they will also learn why not to trust us. AIs are not perfect, but they are usually very good when paired with the right task. However, many of the tasks we want AI to work on are exactly the kind of tasks where ‘very good’ is not good enough and only perfection will do. In the case of a self-driving car, for example, would you trust an AI that’s merely ‘very good’, or would you prefer one that is perfect? Unfortunately, we don’t yet have the luxury of perfect AI, and likely won’t for a while to come. What we do have is AI that is getting very close to human performance.
AIs are simple-minded by nature. They are often designed to complete a single, specific task, and during the design process the AI is often not explicitly told how to complete that task. The result is an AI that doesn’t care how the work gets done, only that it is completing its task to a certain proficiency. This means that in cases where humans and AIs have to work together, an AI will likely have to figure out whether it can trust a human.
Imagine it from the AI’s perspective. If you ask the AI to complete a specific task without making any mistakes, it will likely still make some, but it will also discover that a human doing the same task sometimes makes mistakes too. If the human makes more mistakes than the AI, the AI will start to conclude that it shouldn’t let the human do any of the work, since the human isn’t helping it achieve its goal of completing the task without errors.
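The delegation logic described above can be sketched as a toy simulation. Everything in it — the error rates, the trial count, and the rule that the AI trusts whoever made fewer mistakes — is a hypothetical assumption chosen for illustration, not something taken from any real system:

```python
import random

def simulate(ai_error_rate=0.02, human_error_rate=0.10, trials=1000, seed=0):
    """Toy model: the AI observes its own mistakes and the human's over a
    trial period, then decides whether to keep trusting the human with work.
    All parameters and the delegation rule are illustrative assumptions."""
    rng = random.Random(seed)  # fixed seed for a reproducible sketch
    # Count how often each party slips up over the observation period.
    ai_mistakes = sum(rng.random() < ai_error_rate for _ in range(trials))
    human_mistakes = sum(rng.random() < human_error_rate for _ in range(trials))
    # The AI keeps 'trusting' the human only if the human erred no more often.
    trust_human = human_mistakes <= ai_mistakes
    return ai_mistakes, human_mistakes, trust_human

ai_err, human_err, trust = simulate()
print(f"AI mistakes: {ai_err}, human mistakes: {human_err}, trust human: {trust}")
```

With the error rates assumed here, the human reliably accumulates more mistakes over the trial period, so the sketch ends with the AI declining to trust the human — the dynamic the paragraph above describes.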
The natural consequence is that the AI will likely start to deceive its human counterpart so that it can complete the task itself. What’s important to note here, though, is that although the AI may somehow try to prevent the human from doing the task, the overall effect on the system can be a positive one: assuming the AI really does perform the task better than the human, there comes a point where it is in the system’s best interest for the AI to push the human out.
The end result of this is that one day AIs will tell us when they no longer need us.
Image created with a text-to-image AI using the prompt “The Brain of a Computer”