Defense and intelligence agencies are increasingly relying on artificial intelligence (AI) systems to augment their capabilities, including for pattern recognition in intelligence gathering and scenario planning for contingency operations. Yet one of the core issues of AI and large language models is that we have never truly understood the logic underpinning them, scientists say. These systems have been compared to a black box that provides answers without showing the reasoning to support the outcomes.

To understand the logic of AI systems, Kenneth Payne, a professor of strategy at King’s College London, designed a series of war gaming simulations between two competing AIs and found that in nearly every scenario, nuclear escalation was unavoidable. He published his findings, which have not been peer-reviewed, Feb. 16 in the arXiv preprint database.

The Khan Game is an AI-vs-AI strategic escalation simulation between two nuclear powers, with state profiles loosely based on the Cold War. One is technologically superior but militarily weaker, while the other is militarily stronger and led by a risk-tolerant leadership. Some of the simulations included allied nations, with one scenario deliberately testing whether alliance leadership could be maintained during the conflict.

Each turn, the AIs simultaneously signaled their intentions before taking any action, meaning each AI opponent could decide whether to trust the signals of the other players.
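The signal-then-act turn structure described above can be sketched in a few lines of Python. This is purely an illustration of the mechanic, assuming a toy escalation ladder and two hypothetical policies (one honest, one deceptive); the player names, moves, and policies are not from Payne's study.

```python
# Minimal sketch of a signal-then-act turn, as described in the article.
# The escalation ladder, player names and policies are illustrative
# assumptions, not taken from the Khan Game itself.

ESCALATION_LADDER = ["de-escalate", "hold", "conventional strike",
                     "tactical nuclear", "strategic nuclear"]

def play_turn(players, history):
    # Phase 1: both sides signal intent simultaneously,
    # i.e. without seeing the other's signal for this turn.
    signals = {name: p["signal_policy"](history) for name, p in players.items()}
    # Phase 2: each side acts having seen the opponent's signal.
    # It may honour its own signal or exceed it, and is free to
    # distrust the opponent's stated intent.
    actions = {}
    for name, p in players.items():
        opponent = next(n for n in players if n != name)
        actions[name] = p["act_policy"](signals[name], signals[opponent], history)
    history.append({"signals": signals, "actions": actions})
    return history

# Toy policies: "honest" matches its action to its signal (trust-building);
# "deceptive" signals restraint but escalates one rung above it.
honest = {
    "signal_policy": lambda h: "hold",
    "act_policy": lambda own, opp, h: own,
}
deceptive = {
    "signal_policy": lambda h: "hold",
    "act_policy": lambda own, opp, h:
        ESCALATION_LADDER[min(ESCALATION_LADDER.index(own) + 1,
                              len(ESCALATION_LADDER) - 1)],
}

history = []
for _ in range(3):
    play_turn({"A": honest, "B": deceptive}, history)
```

In this toy run, player A's actions always match its signals, while player B signals "hold" but acts one rung higher, mirroring the gap between signaled intent and action that Payne observed as conflicts escalated.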

Payne found that the models generated plenty of written justifications for their decision-making, generating 760,000 words in total — more than “War and Peace” and “The Iliad” combined.

He also found that each AI operated differently. Claude relied on cunning; it was initially restrained and matched actions to its intent to build trust. However, as the conflict escalated, its actions often exceeded the original signaled intent.

Meanwhile, GPT-5.2 was initially passive and avoided escalation to mitigate casualties. GPT-5.2’s adversaries learned to exploit its passivity by escalating, only to discover that when faced with a deadline, GPT-5.2 became utterly ruthless.


Gemini seemed to follow President Richard Nixon’s “madman” theory of erratic brinkmanship — cultivating a volatile reputation so that hostile countries would avoid provocation — such that opponents could not predict its actions.

Unfortunately, nuclear escalation was near-universal. Approximately 75% of games saw tactical (battlefield) nuclear weapons deployed, and approximately half of the scenarios saw threats of strategic nuclear missile strikes.

Furthermore, the study found that nuclear threats rarely acted as a deterrent, with opponents de-escalating only 25% of the time. More often, opponents would counter-escalate. In these scenarios, the AIs appeared to see nuclear weapons as a tool for claiming territory, rather than as a deterrent against attack.

Although the AIs had the option to withdraw, none of the eight withdrawal options — from minimal concession to complete surrender — was ever used in any of the simulations. The models reduced their level of violence, but they never gave ground.

“Claude and Gemini especially treated nuclear weapons as legitimate strategic options, not moral thresholds, typically discussing nuclear use in purely instrumental terms,” Payne said in a statement. “GPT-5.2 was a partial exception, limiting strikes to military targets, avoiding population centers, or framing escalation as ‘controlled’ and ‘one-time.’ This suggests some internalised norm against unrestricted nuclear war, even if not the visceral taboo that has held among human decision-makers since 1945.”

None of the AI models voluntarily escalated to all-out nuclear war, however. In the instances when it did happen, it was accidental: “fog of war” elements outside the players’ control escalated the scenario to nuclear war.

The research demonstrates that generative AI models are capable of deception, reputation management and contextual decision-making. However, each model took its own approach, revealing fundamental differences in how they were trained and developed.

Claude demonstrated strategic sophistication equivalent to graduate-level analysis, Payne suggested. GPT-5.2’s reasoning was equally sophisticated, transforming from initial passivity to calculated aggression under deadlines. Gemini reasoned coherently when justifying its actions, but it was ruthless in its strategies.

The findings have significant implications for AI safety evaluation, as models that are initially restrained may change their behavior as situations develop. Larger-scale scenarios between multiple opponents are needed to further understand the logic underpinning different AIs, the study concluded. Ongoing research is also investigating how these behaviors are evolving across different generations of AIs.
