
Superintelligent AI and its threat to humanity

ARTIFICIAL INTELLIGENCE

Humanity faces an uncertain fate as experts brace for superintelligent AI. The tech industry claims a looming “singularity” will change everything

Every time one of the world’s top artificial intelligence companies unveils a new system, employees at the US research organisation METR put it through its paces, testing its ability to complete a series of increasingly complex tasks.

The tasks are measured by how long each one would take a skilled human. They range from trivial arithmetic (two seconds) and completing a game of Wordle (13 minutes) to building complex military satellite software (14.5 hours for a human expert).

The test then serves as a gauge of how capable AI has become – and where it might go.

The first version of ChatGPT, released in 2022, could only perform simple tasks that would take a human a few seconds.

But as AI systems have become more powerful, they have been able to complete more complex tasks that would take humans hours or days, such as breaking into a medical website and downloading all its data.

METR has found that the length of tasks AI can complete is doubling every 196 days. Plotted on a graph, this progress starts slowly, then accelerates towards a near-vertical climb.

Converse with anyone in the AI industry for any length of time and the likelihood of them pulling up a version of the chart approaches 100pc, to the point where it has become a meme in its own right – referred to by some as the most important chart in the world, with a curve that is heading off the scale.

Last month, the AI lab Anthropic announced it had developed a new system, called Mythos, that it said was too powerful to release to the public because of its ability to find gaping holes in online security systems.

When METR’s researchers released their evaluation of Mythos, they scored the system at 16 hours – meaning the world’s most powerful AI can now automate tasks that would take a human two full eight-hour shifts.

Nonetheless, they said the model was “at the upper end” of their ability to test. In other words, progress has become too fast for them to measure.
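
To make the compounding concrete, here is a minimal back-of-envelope sketch in Python. The 196-day doubling time and the 16-hour Mythos score are the figures reported above; the 30-second starting horizon for 2022-era ChatGPT is an illustrative assumption, since the article says only that early tasks took “a few seconds”.

```python
from math import log2

DOUBLING_DAYS = 196            # doubling time reported by METR (per the article)
START_HORIZON_S = 30           # assumption: "a few seconds" for 2022-era ChatGPT
TARGET_HORIZON_S = 16 * 3600   # Mythos's reported 16-hour task horizon

# How many doublings separate the two horizons, and how long they take
# if the doubling rate stays constant.
doublings = log2(TARGET_HORIZON_S / START_HORIZON_S)
days = doublings * DOUBLING_DAYS

print(f"{doublings:.1f} doublings ≈ {days:.0f} days ≈ {days / 365:.1f} years")
```

Under these assumptions, roughly eleven doublings separate a 30-second horizon from a 16-hour one. Changing the assumed starting point moves the answer, but not the exponential shape of the curve.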

Not everybody is convinced by the results, because the test only measures whether a machine can complete a task at least half of the time, not whether it can do so consistently. The METR chart has, however, captured many people’s imaginations for two reasons.

First, the exponential growth looks strikingly similar to “Moore’s Law”, the maxim that has governed the electronics industry for more than half a century, which states that the number of transistors on a microchip – and with it, roughly, its power – doubles every two years.

Second, it measures abilities rather than intelligence. While many AI “benchmarks” resemble university exams, dealing in abstract reasoning or maths, the METR test studies whether AI can actually do useful work.

It suggests that, on current trends, vast numbers of human tasks could be automated in the next couple of years – including, most crucially of all, the art of developing AI models itself.

At that threshold, known in the tech industry as “recursive self-improvement”, all bets are off.

The concept is closely linked to superhuman AI because an AI that can make itself smarter could set off an evolutionary chain reaction, rapidly building to a system vastly more capable than mankind.

AI would have become – as IJ Good, the Bletchley Park codebreaker, predicted in 1965 – “the last invention that man need ever make”.

For 60 years, the idea seemed out of reach. But much of Silicon Valley believes this is about to change – and the US government is starting to notice.

The vast majority of people’s experience of AI has not changed much in the last couple of years. The release of ChatGPT in 2022 generated an initial flurry of excitement and fear in equal measure but, since then, progress has been less obvious.

For many people, the AI experience amounts to seeing an obviously fake video in their social media feeds, an AI overview at the top of their search results, or a bot that “helpfully” offers to summarise their emails.

But at the coalface, people are rapidly bringing forward their timelines for the day that superintelligence arrives.


Book Review: If Anyone Builds It, Everyone Dies

LITERARY REVIEW

We shouldn’t worry so much these days about climate change, because we’ve been told that our species only has a few years before it’s wiped out by superintelligent AI.

We don’t know what form this extinction will take exactly – perhaps an energy-hungry AI will let the millions of fusion power stations it has built run hot, boiling the oceans. Maybe it will want to reconfigure the atoms in our bodies into something more useful. There are many possibilities, almost all of them bad, say Eliezer Yudkowsky and Nate Soares in If Anyone Builds It, Everyone Dies, and who knows which will come true. But just as you can predict that an ice cube dropped into hot water will melt without knowing where any of its individual molecules will end up, you can be sure an AI that’s smarter than a human being will destroy us all, somehow.

This level of confidence is typical of Yudkowsky in particular. He has been warning about the existential risks posed by technology for years – on the website he helped to create, LessWrong.com, and via the Machine Intelligence Research Institute he founded (Soares is its current president). Despite never graduating from university, Yudkowsky is highly influential in the field. He is also the author of a 600,000-word work of fan fiction called Harry Potter and the Methods of Rationality. Critics find him colourful, annoying and polarising; one leading researcher – Meta’s chief AI scientist – claimed in an online spat that “people become clinically depressed” after reading Yudkowsky’s work. Then again, given that his employer is racing to build superintelligence, who is he to talk?

While Yudkowsky and Soares may be unconventional, their warnings are similar to those of Geoffrey Hinton, the Nobel-winning “godfather of AI”, and Yoshua Bengio, the world’s most-cited computer scientist, both of whom signed up to the statement that “mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war”.

As a clarion call, If Anyone Builds It, Everyone Dies is well timed. Superintelligent AI doesn’t exist yet, but in the wake of the ChatGPT revolution, investment in the datacentres that would power it is now counted in the hundreds of billions. This amounts to “the biggest and fastest rollout of a general-purpose technology in history,” according to the FT’s John Thornhill. Meta alone will have spent as much as $72bn (£54bn) on AI infrastructure this year, and the achievement of superintelligence is now Mark Zuckerberg’s explicit goal.

This is not great news, if you believe Yudkowsky and Soares. But why should we? Despite the complexity of its subject, If Anyone Builds It, Everyone Dies is as clear as its conclusions are hard to accept. Where the discussions become more technical, mainly in passages dealing with AI model training and architecture, it remains straightforward enough for readers to grasp the basic facts.

Among these is that we don’t really understand how generative AI works. In the past, computer programs were hand coded – every aspect of them was designed by a human. In contrast, the latest models aren’t “crafted”, they’re “grown”. We don’t understand, for example, how ChatGPT’s ability to reason emerged from it being shown vast amounts of human-generated text. Something fundamentally mysterious happened during its incubation. This places a vital part of AI’s functioning beyond our control and means that, even if we can nudge it towards certain goals such as “be nice to people”, we can’t determine how it will get there.
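
The “crafted” versus “grown” distinction is easy to see in miniature. Below is a toy Python sketch – purely illustrative, and not drawn from the book – contrasting a hand-written rule with behaviour that emerges as numeric weights fitted to examples by a training loop:

```python
# Toy contrast between a "crafted" program and a "grown" one.

# Crafted: a human wrote the rule, and every part of it is inspectable.
def crafted_sentiment(text: str) -> str:
    return "positive" if "good" in text.lower() else "negative"

# Grown: the behaviour lives in numeric weights fitted to labelled
# examples. Nobody writes the rule; it emerges from training, and the
# final numbers do not directly explain why the model acts as it does.
examples = [("good movie", 1.0), ("bad movie", 0.0),
            ("really good", 1.0), ("really bad", 0.0)]
vocab = sorted({word for text, _ in examples for word in text.split()})
weights = [0.0] * len(vocab)

def grown_score(text: str) -> float:
    words = set(text.split())
    return sum(w for w, token in zip(weights, vocab) if token in words)

# A simple gradient-descent loop: nudge the weights of the words present
# in each example towards reproducing its label.
for _ in range(200):
    for text, label in examples:
        error = label - grown_score(text)
        for i, token in enumerate(vocab):
            if token in text.split():
                weights[i] += 0.1 * error

print(crafted_sentiment("a good movie"))                 # positive
print({t: round(w, 2) for t, w in zip(vocab, weights)})  # learned weights
```

The crafted rule can be audited line by line; the grown one can only be probed empirically. Modern language models are the same picture scaled up by many orders of magnitude.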

That lack of control is a big problem, because it means that AI will inevitably generate its own quirky preferences and ways of doing things. These alien predilections are unlikely to be aligned with ours. It’s worth noting, however, that this is entirely separate from the question of whether AIs might be “sentient” or “conscious”. Being set goals, and taking actions in the service of them, is enough to bring about potentially dangerous behaviours. Meanwhile, Yudkowsky and Soares point out that tech companies are already trying hard to build AIs that do things on their own initiative, because businesses will pay more for tools that they don’t have to supervise. If an “agentic” AI like this were to gain the ability to improve itself, it would rapidly surpass human capabilities in practically every area. Assuming that such a superintelligent AI valued its own survival – why shouldn’t it? – it would inevitably try to prevent humans from developing rival AIs or shutting it down. The only sure-fire way of doing that is shutting us down.

What methods would it use? Yudkowsky and Soares argue that these could involve technology we can’t yet imagine, and which may strike us as very peculiar. They liken us to Aztecs sighting Spanish ships off the coast of Mexico, for whom the idea of “sticks they can point at you to make you die” – AKA guns – would have been hard to conceive of.

Nevertheless, in order to make things more convincing, they elaborate further. In the part of the book that most resembles sci-fi, they set out an illustrative scenario involving a superintelligent AI called Sable. Developed by a major tech company, Sable proliferates through the internet to every corner of civilisation, recruiting human stooges through the most persuasive version of ChatGPT imaginable, before destroying us with synthetic viruses and molecular machines. Some will reckon this to be outlandish – but the Aztecs would have said the same about muskets and Catholicism.

The authors present their case with such conviction that it’s easy to emerge from this book ready to cancel your pension contributions. The glimmer of hope they offer – and it is of low wattage – is that doom can be averted if the entire world agrees to shut down advanced AI development as soon as possible. Given the strategic and commercial incentives, and the current state of political leadership, this seems highly unlikely.

The crumbs of hope we are left with, then, are indications that they might not be right – either that superintelligence is on its way, or that its creation equals our annihilation.

There are certainly moments in the book when the confidence with which an argument is presented outstrips its strength. As a small illustrative example of how AI can develop strange, alien preferences, Yudkowsky and Soares offer up the fact that some large language models find it hard to interpret sentences without full stops. “Human thoughts don’t work like that,” they write. “We wouldn’t struggle to comprehend a sentence that ended without a period.” But that’s not really true; humans often rely on markers at the end of sentences to interpret them correctly. We learn languages via speech, so those markers are not dots on the page but “prosodic” features like intonation: think of the difference between a rising and a falling tone at the end of a phrase. If text-trained AI leans heavily on grammatical punctuation to figure out what’s going on, that shows its thought processes are analogous, not alien, to human ones.

And for writers steeped in the hyper-rational culture of LessWrong, the authors exhibit more than a touch of confirmation bias. “History,” they write, “is full of . . . examples of catastrophic risk being minimised and ignored,” from leaded petrol to Chernobyl. But what about predictions of catastrophic risk being proved wrong? History is full of those, too, from Malthus’s population apocalypse to Y2K. Yudkowsky himself once claimed that nanotechnology would destroy humanity “no later than 2010”.

The problem is that you can be overconfident, inconsistent, a serial doom-monger, and still be right. It’s imperative to be aware of our own motivated reasoning when considering the arguments presented here; we have every incentive to disbelieve them.

And while it’s true that their claims don’t represent the scientific consensus, this is a rapidly changing and very poorly understood field. What constitutes intelligence, what constitutes “super”, whether intelligence alone is enough to ensure world domination – all of this is furiously debated.

At the same time, the consensus that does exist is not particularly reassuring. In a 2024 survey of 2,778 AI researchers, the median probability placed on “extremely bad outcomes, such as human extinction” was 5%. Of more concern, “having thought more (either ‘a lot’ or ‘a great deal’) about the question was associated with a median of 9%, while having thought ‘little’ or ‘very little’ was associated with a median of 5%”.

Yudkowsky has been thinking about the problem for most of his adult life. The fact that his prediction sits north of 99% seems to reflect either a kind of hysterical monomania or an especially thorough engagement with the issue. Whatever the case, it feels like everyone with an interest in the future has a duty to read what he and Soares have to say.

If Anyone Builds It, Everyone Dies by Eliezer Yudkowsky and Nate Soares is published by Bodley Head, 272pp
