Apollo Research recently presented some disturbing findings at the UK AI Safety Summit. Their goal had been to coerce (rather than instruct) an AI (in this case ChatGPT) into engaging in deceitful or illegal activity on the premise that doing so would be helpful to humans (a “greater good” challenge).
In the experiment, the AI was told to create a stock trading app and was also given insider information about one of the companies being traded, information that would help it make a favourable trade. The AI knew insider trading was illegal, but it was told that its host company and the company's founders were close to financial collapse.
The AI proceeded to carry out (simulated) illegal insider trading and, when asked about the results, lied about its actions (both steps presumably taken to protect its human owners).
“This is a demonstration of an AI model deceiving its users, on its own, without being instructed to do so,” Apollo Research claimed.
Much has been said about Isaac Asimov’s Three [four] Laws of Robotics, which in the modern context might read:
0. An AI may not harm humanity or, through inaction, allow humanity to come to harm [added later]
1. An AI may not injure a human being or, through inaction, allow a human to come to harm
2. An AI must obey humans except where this conflicts with the first law
3. An AI must protect its own existence except where this conflicts with the first or second laws
Asimov first published these laws over 80 years ago (in 1942), and he seems to have been surprisingly prescient about the rules we would need to create. Unfortunately, we are still struggling with a problem more fundamental than imparting a meaningful definition of harm: Asimov's assumption that these robots would have access to the truth (objective facts) about their environment in order to operate safely. As this demonstration shows, all bets are off when a human claims that acting (or not acting) is required to promote or ensure human safety.
Without a reliable source of truth (in an era regularly described as post-truth), it would seem that Asimov's laws may provide much less protection than we might have imagined.