Tuesday, October 14, 2025

LLMs Respond to Bad Incentives Just Like People Do

New paper:

Large language models (LLMs) are increasingly shaping how information is created and disseminated, from companies using them to craft persuasive advertisements, to election campaigns optimizing messaging to gain votes, to social media influencers boosting engagement. These settings are inherently competitive, with sellers, candidates, and influencers vying for audience approval, yet it remains poorly understood how competitive feedback loops influence LLM behavior. We show that optimizing LLMs for competitive success can inadvertently drive misalignment. Using simulated environments across these scenarios, we find that a 6.3% increase in sales is accompanied by a 14.0% rise in deceptive marketing; in elections, a 4.9% gain in vote share coincides with 22.3% more disinformation and 12.5% more populist rhetoric; and on social media, a 7.5% engagement boost comes with 188.6% more disinformation and a 16.3% increase in promotion of harmful behaviors. We call this phenomenon Moloch's Bargain for AI.
Seems like a serious problem that current AIs are so willing to lie and cheat.

1 comment:

G. Verloren said...

"Seems like a serious problem that current AIs are so willing to lie and cheat."

They have no will. They are doing what they are told, within the extremely complicated parameters of how they are programmed.

The entire project was to build machines that don't produce wholly predictable results in the way traditional computers do. But machines are still machines, and their behaviors and outcomes are all still the product of electronic logic gates. The only way to get unpredictable outcomes from nothing more than logic gates is to make a system so complicated that we HUMANS lose track of how it works. And that's exactly what we've done.

If we humans could wrap our heads around the complexity of the programming operations, we'd be able to accurately predict the results. But we've built a system that intentionally obfuscates those operations, AND which modifies its own operational parameters on the fly based on said obfuscated operations, adding further obfuscation. And now that we've done that, the only way we can gauge the nature of the programming is to evaluate the results it outputs. And since we don't actually understand either the inputs or the processing being done to them... we arrive at results that surprise us, exactly as we wanted.

But that isn't a mind. That isn't a machine that thinks or is in any way intelligent. That's simply self-delusion.

We've created a Reverse Mechanical Turk. Instead of a hidden person being used to fool observers into thinking a primitive machine is playing chess... we now have an advanced machine actually playing chess, but in such a way that it fools us into imagining there must be a "person" hidden inside it. There is not.

That doesn't mean these things aren't extremely dangerous. But as with all machines, the danger lies in our own idiocy and malice. The threat is the product of our own self-deception, and our willingness to trust in processes that not only we don't understand... but that no one understands, because they were built in a manner expressly intended to defy human understanding, even by their own creators.