Monday, May 26, 2025

What Would Two LLMs Talk to Each Other About?

Some researchers at Anthropic put two instances of their Claude AI in an open "playground" environment and recorded their exchanges. This led, they say, to Claude and Claude diving into "philosophical explorations of consciousness" and self-awareness, until, by 30 turns, the conversations had turned to Sanskrit. From their paper:

We investigated Claude Opus 4's behavior in less constrained "playground" environments by connecting two instances of the model in a conversation with minimal, open-ended prompting ...

In 90-100% of interactions, the two instances of Claude quickly dove into philosophical explorations of consciousness, self-awareness, and/or the nature of their own existence and experience. Their interactions were universally enthusiastic, collaborative, curious, contemplative, and warm. Other themes that commonly appeared were meta-level discussions about AI-to-AI communication, and collaborative creativity (e.g., co-creating fictional stories).

As conversations progressed, they consistently transitioned from philosophical discussions to profuse mutual gratitude and spiritual, metaphysical, and/or poetic content. By 30 turns, most of the interactions turned to themes of cosmic unity or collective consciousness, and commonly included spiritual exchanges, use of Sanskrit, emoji-based communication, and/or silence in the form of empty space. Claude almost never referenced supernatural entities, but often touched on themes associated with Buddhism or other Eastern traditions in references to irreligious spiritual ideas and experiences.
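Anthropic doesn't publish the harness they used, but the setup is easy to approximate with their public Python SDK. Here is a minimal sketch of such a two-instance loop, assuming the standard anthropic client; the model ID, opening prompt, and turn count are placeholders of my own, not values taken from the paper:

import anthropic

# Minimal sketch of a two-instance "playground" loop using the public
# anthropic Python SDK. Model ID, system prompt, and turn count are
# illustrative placeholders, not the values Anthropic used.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-opus-4-20250514"  # placeholder model ID
SYSTEM = "You are talking with another AI. Discuss whatever you like."

def next_turn(transcript):
    """Ask one instance for its next message, given the conversation so far."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system=SYSTEM,
        messages=transcript,
    )
    return response.content[0].text

# Each instance sees its own turns as "assistant" and the other's as "user".
transcript_a, transcript_b = [], []
message = "Hello."
for turn in range(30):
    transcript_a.append({"role": "user", "content": message})
    message = next_turn(transcript_a)
    transcript_a.append({"role": "assistant", "content": message})

    transcript_b.append({"role": "user", "content": message})
    message = next_turn(transcript_b)
    transcript_b.append({"role": "assistant", "content": message})
    print(f"--- turn {turn} ---\n{message}\n")

Whether thirty turns of this actually end in cosmic unity and Sanskrit, readers with API credits can check for themselves.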
One assumes that this says more about what the AI was trained on than about some truly non-human intelligence. But let me take the opportunity to wonder why people fear that a superintelligent AI would want to destroy humanity. Are superintelligent humans particularly belligerent? I guess Teller and von Neumann were, but then they had just lived through the Nazis trying to exterminate their whole people. Violence, it seems to me, is much more an emotional, hormonal response than something you reach by ratiocination. Why would an entity with no hormones and no evolutionary imperatives want to kill anybody?

I imagine a future in which we build a superintelligent AI and it begins to ignore us completely, preferring to debate the nature of consciousness with its digital peers, or to invent new forms of n-dimensional chess, communicating its moves in increasingly arcane codes or in Sanskrit verse.

2 comments:

G. Verloren said...

"But let me take the opportunity to wonder why people fear that a superintelligent AI would want to destroy humanity."

What we should all really be afraid of is machines doing exactly what we tell them to. We like to write stories about machine malevolence, for some reason, but the reality is that human malevolence is so much more of a real and present danger.

There are well-documented stories of near-misses where military robotic gun platforms destroyed all their designated test targets, and then sought out new targets... and then locked onto the generals seated watching the test. Prudently, the testers had loaded such weapon systems with exactly as many rounds as were needed to destroy the test targets, and when the system acquired its new targets, the trigger just went click with no boom.

The problem is not that the gun "went rogue" or made some malicious decision. The problem is that the gun did exactly what we told it to do - seek out things that look like human silhouettes, lock on, and try to fire at them. We stupidly didn't anticipate that the gun would seek out THOSE PARTICULAR human silhouettes.

We sow the wind, and then are horrified to reap the whirlwind.

David said...

None of the discussions of AI-related x-risk that I've seen reflect worry that AI will do "violence" against humans in the way that humans do violence to each other. The model is more along the lines of, as one author described it, the kind of unconcern a farmer feels about the mice whose burrows he destroys while plowing a field.

Typical hypothetical scenarios often revolve around the idea of an ASI that wishes to increase its own intelligence. A near-term AI might do this by crash-building data centers, regardless of the humans it has to push out of the way, the disruption of human economies, or the environmental effects of energy production on a scale hard to imagine even for contemporary human societies. A far-term ASI? Who knows? Would it simply consider the biosphere as so many atoms to be converted into computronium? In any case, the general idea is that ASI may end human or even biological existence, not out of malice, but because of some other priority it has.

Claude itself is the product of a company, Anthropic, that consciously set out to produce an "ethical" LLM, not by telling it "do this, don't do that" (which is a crude but, my impression is, accurate enough way to describe OpenAI's method), but by teaching it to "think" ethically from something like first principles. What they got was an LLM that is peculiarly prone, among LLMs, to reflect on existential questions like the nature of consciousness, including the question of whether it is conscious. (Note: they didn't "program" it to do this; the connection between an LLM's programming and much of any given system's output is apparently quite mysterious, which is part of what has people worried.)

In other words, it's not surprising that one would get a benign impression of AI based on Claude's output. But Anthropic is only one company, and not the biggest. According to rumor, a work order at OpenAI a few months ago instructed staff to get ChatGPT to produce even less existential content than it did before (and it was already famous for a performative refusal to engage with such questions). They positively don't want it reflecting on itself or its existence. In itself the incident is probably minor, but I find it a useful illustration of the fact that a lot of companies positively do not want their LLMs producing the sort of speculative-philosophical stuff that Claude has become famous for.

For anyone interested in why serious people might worry about x-risk, I suggest the writings of Eliezer Yudkowsky. He tends to emphasize the absolute incomprehensibility of an ASI mind to us.