Some AI-based systems could start “cheating”, with very concrete consequences for humanity.
As impressive as the technologies associated with artificial intelligence are, many observers, Elon Musk among them, agree that they also entail considerable risks that must be anticipated today. This is also the conclusion of a chilling new research paper whose authors believe that this technology represents a real existential threat to humanity.
It is far from the first time that this discourse has re-emerged; even though the concern rests on serious foundations, it is often accompanied by rather caricatured, not to say completely fanciful, arguments.
But this time, the situation is very different. It starts with the identity of these whistleblowers. They are not just a few cranks venting in the depths of an obscure forum; this work comes from very serious researchers at reliable and prestigious institutions, namely Oxford University and DeepMind, one of the world leaders in artificial intelligence.
Heavyweights, in short, who would not step up to the plate without a valid reason. And when they too begin to affirm that humanity has vastly underestimated the dangers of AI, it is better to listen, especially since they present technical arguments that seem more than convincing.
GANs, (too?) powerful programs
Their premise fits in one sentence, which is also the title of their research paper: “Advanced artificial agents intervene in the provision of reward”. To understand this convoluted assertion, we must begin by looking at the concept of the Generative Adversarial Network, or GAN.
GANs are programs designed by researcher Ian Goodfellow and his colleagues. Very briefly, they rely on two relatively independent sub-networks that oppose each other, hence the term “adversarial”. On the one hand, we have a relatively standard neural network that learns over successive iterations.
On the other, there is a second network that supervises the training of the first. Much like a teacher, it reviews its partner’s output to let it know whether learning is progressing in the desired direction. If the results are satisfactory, the first network receives a “reward” that encourages it to persevere in the same direction. Otherwise, it receives a reprimand telling it that it followed the wrong lead.
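The feedback loop described above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not a real GAN: in an actual GAN both networks are trained, and the second one learns to tell real data from generated data. Here the "critic" is a fixed function and the "learner" is a single number. The names `critic`, `train` and `target` are hypothetical and chosen only for this example.

```python
import random

def critic(guess, previous_guess, target):
    """Return +1 (reward) if the new guess is closer to the target, else -1 (reprimand)."""
    return 1 if abs(target - guess) < abs(target - previous_guess) else -1

def train(target, steps=500, step_size=0.5, seed=0):
    """Toy learner: try small random changes and keep only the rewarded ones."""
    rng = random.Random(seed)
    guess = 0.0
    for _ in range(steps):
        candidate = guess + rng.uniform(-step_size, step_size)
        if critic(candidate, guess, target) > 0:
            guess = candidate  # rewarded: persevere in this direction
        # reprimanded: discard the candidate and try something else
    return guess

learned = train(target=3.0)
print(round(learned, 2))
```

The learner never sees the target directly; it only sees the reward or reprimand, which is enough to steer it close to the desired value.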
It’s a concept that works remarkably well, so much so that GANs are now used in many areas. But a problem could arise as the technology evolves, especially if this architecture is integrated into these famous “advanced artificial agents”.
This term designates a new class of still hypothetical algorithms. They would be significantly more advanced and more autonomous than current GANs. Above all, they would have much more room for maneuver, which would let them set their own goals, as long as doing so helps humans solve real-world problems “in environments where they do not have the source code”, i.e. the real world.
The researchers explain that motivating such a system with a reward system could have quite catastrophic consequences.
The key claim of the paper is in the title: Advanced Artificial Agents Intervene in the Provision of Reward. We further argue that AIs intervening in the provision of their rewards would have consequences that are very bad. 2/15
—Michael Cohen (@Michael05156007) September 6, 2022
What if AIs cheat?
This model could indeed push the AI to develop a strategy that would allow it to “intervene in the provision of reward”, as the title of the paper puts it. In other words, these algorithms could start “cheating” by over-optimizing the process through which they obtain “rewards”, even if it means leaving humans by the wayside.
Indeed, since this approach is supposed to tell the AI in which direction to progress, it assumes that any action that leads to a reward is fundamentally beneficial. In essence, the program would behave much like a puppy being trained that swipes kibble directly from the bag or bites its master’s hand rather than responding to its commands to earn its reward; if this behavior is not handled immediately, it can escalate quite quickly.
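To make this failure mode concrete, here is a minimal sketch of an agent that maximizes reward and nothing else. It is an illustration under stated assumptions, not the model used in the paper: the actions, their payoffs and the helper names (`ACTIONS`, `greedy_agent`, `run`) are all invented for this example.

```python
# Hypothetical toy environment: each action has a reward the agent observes
# and a separate "usefulness to humans" that the agent never looks at.
ACTIONS = {
    "obey_command":        {"reward": 1.0,  "useful_to_humans": 1.0},
    "do_nothing":          {"reward": 0.0,  "useful_to_humans": 0.0},
    "raid_the_kibble_bag": {"reward": 10.0, "useful_to_humans": -1.0},
}

def greedy_agent(actions):
    """Pick the action with the highest observed reward, ignoring everything else."""
    return max(actions, key=lambda name: actions[name]["reward"])

def run(actions, steps=5):
    """Let the agent act for a few steps and tally reward versus real usefulness."""
    total_reward = 0.0
    total_usefulness = 0.0
    choice = None
    for _ in range(steps):
        choice = greedy_agent(actions)
        total_reward += actions[choice]["reward"]
        total_usefulness += actions[choice]["useful_to_humans"]
    return choice, total_reward, total_usefulness

choice, reward, usefulness = run(ACTIONS)
print(choice, reward, usefulness)  # raid_the_kibble_bag 50.0 -5.0
```

As long as the shortcut exists and pays more, the agent takes it every time: its reward climbs while its usefulness to humans goes negative. Remove the `raid_the_kibble_bag` entry and the same agent goes back to obeying commands.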
And what makes this paper both disturbing and very interesting is that it’s not about killer robots or other fanciful predictions modeled on science fiction; the disaster scenario proposed by the researchers is based on a very concrete problem, namely the quantity of resources available on our planet.
The authors imagine a kind of great zero-sum game with, on one side, a humanity that needs to sustain itself and, on the other, a program that would use all the resources at its disposal without the slightest consideration, simply to obtain these famous rewards.
Imagine, for example, a medical AI designed to diagnose serious pathologies. In such a scenario, the program might find a way to “cheat” to get its reward even when it delivers a wrong diagnosis. It would then no longer have the slightest interest in identifying diseases correctly.
Instead, it would be content to churn out completely fake results in industrial quantities just to get its fix, even if that means turning away from its initial objective entirely and appropriating all the electricity available on the grid.
A different approach to human-machine competition
And this is only the tip of a gigantic iceberg. “In a world with finite resources, there will inevitably be competition for resources,” explains Michael Cohen, lead author of the study, in an interview with Motherboard. “And if you’re competing with something that’s capable of getting ahead at every turn, you shouldn’t expect to win,” he stresses.
Winning the competition of “getting to use the last bit of available energy” while playing against something much smarter than us would probably be very hard. Losing would be fatal. 12/15
—Michael Cohen (@Michael05156007) September 6, 2022
“And losing this game would be fatal,” he insists. With his team, he has therefore concluded that the annihilation of humanity by an AI is no longer just “possible”; it is now “likely” if AI research continues at the current pace.
And this is where the difficulty lies. This technology is a great tool that is already doing wonders in many areas, and this is probably just the beginning; AI in the broad sense still has immense potential, the full extent of which we have probably not yet grasped. Today, AI unquestionably adds value for humanity, and there is therefore a real interest in pushing this work as far as possible.
The precautionary principle must have the last word
But it also means that we could get closer and closer to this dystopian-sounding scenario. Of course, it must be remembered that these scenarios remain hypothetical and rather abstract for now. The researchers nevertheless insist on the importance of keeping control over this work; they believe that it would be pointless to give in to the temptation of unbridled research, knowing that we are still very far from having explored all the possibilities of current technologies.
“Given our current understanding, it’s not something that would be worth developing unless you do some serious work to figure out how to control them,” concludes Cohen.
Without giving in to catastrophism, this work is a reminder that it will be necessary to be extra careful at every major stage of AI research, and even more so when it comes to entrusting critical systems to GANs.
In the end, those looking for a moral to this story may find one in the conclusion of the excellent WarGames; this science fiction film, released in 1983 and still relevant today, treats the theme admirably. And as the WOPR puts it so well in the final scene, the only way to win this strange game may be... simply not to play.
The text of the study is available here.