AI Isn’t Depressed, It’s Just Stuck

I read a post about a new study claiming that large language models (LLMs) can develop patterns that resemble human mental illness: repeating negative phrases, reinforcing loops of worthlessness, or persisting in a gloomy tone across a conversation. The paper frames this as “psychopathological computation.” Naturally, the headlines followed: “AI Can Become Depressed.”

But here’s the problem. These patterns have nothing to do with emotion, and framing them that way encourages us to misunderstand the technology’s limitations while subtly excusing the people who build and deploy it.

Let’s step back and ask a better question: why are we so eager to diagnose machines with mental illness in the first place?

The Behavior Isn’t New, or Emotional

Anyone who has worked with LLMs extensively has seen this pattern before, and not just in emotionally coded conversations: these models get stuck all the time.

Give them a flawed prompt and they’ll repeat the same logic errors. Feed them ambiguous formatting and they’ll hallucinate structure. Even something as mundane as tone drift, where a model responds to formal writing with more formality or to sarcasm with sarcasm, can become a self-reinforcing spiral. This isn’t evidence of suffering; it’s a reflection of how these systems work.

LLMs don’t “feel.” They pattern-match across enormous amounts of training data, then predict the next token based on the prior context. That context isn’t memory in any human sense; it’s just a rolling window of recent language. If the model makes a wrong assumption, it builds on that assumption.
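To make that concrete, here is a deliberately tiny sketch in Python. The “model” below is a toy I invented for illustration, nothing like a real neural network, but the feedback loop has the same shape: the only state is the text in the window, and whatever that window contains shapes what comes next.

```python
import random

# Toy stand-in for a language model: given a window of recent words,
# pick a plausible next word. A real LLM does this with a neural network
# over tokens, but the feedback loop has the same shape.
NEGATIVE = {"hopeless", "pointless", "worthless", "nothing"}

def next_word(context_words):
    if any(w in NEGATIVE for w in context_words):
        # Negative words already in the window make more negativity likely.
        return random.choice(sorted(NEGATIVE))
    return random.choice(["the", "meeting", "summary", "is", "on", "track"])

def generate(prompt, steps=12, window=8):
    words = prompt.lower().split()
    for _ in range(steps):
        context = words[-window:]          # a rolling window, not memory
        words.append(next_word(context))   # the output becomes future input
    return " ".join(words)

print(generate("everything feels hopeless"))   # negativity feeds on itself
print(generate("summarize this meeting"))      # neutral input stays neutral
```

No mood is involved anywhere in that loop; the “spiral” is just the model conditioning on its own earlier output.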

The Metaphor Is the Real Problem

So why call it “depression”? Why imply that these outputs are symptoms of a machine “internalizing” human sadness?

Because it’s narratively convenient. It makes failure feel human; it positions tech companies as compassionate caretakers; it invites us to forgive errors as we might forgive a person struggling with their mental health; and it makes us feel like we understand something we don’t.

But this metaphor comes with a cost. It encourages emotional sympathy in place of technical scrutiny. It invites ethical debates about AI well-being instead of practical ones about architecture, reliability, and user risk. And in a time when these tools are being integrated into education, healthcare, and customer service, that’s not a harmless shift.

The Pattern Appears Because the Design Allows It

What we’re seeing isn’t a sign of AI becoming more human; it’s a sign of a system struggling to regulate coherence across longer tasks.

When people ask ChatGPT to “act normal” after it has spiraled into a gloomy tone and it can’t, that’s not a mood disorder; it’s a context lock. The model doesn’t have a stable core identity. It has whatever the last thousand tokens tell it to be, and if those tokens skew negative, it follows.
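A rough illustration of why one corrective line doesn’t fix it, with word lists and counts invented purely for the example: if the window is dominated by gloom, a single “act normal” barely moves the statistics the model is conditioning on.

```python
# Minimal sketch of a "context lock": behavior is shaped by the statistics
# of the whole window, so one corrective line is easily outweighed by a
# long run of gloomy text that is still in context.
GLOOMY = {"bleak", "hopeless", "pointless", "worthless"}

def gloom_share(messages):
    gloomy = sum(any(w in GLOOMY for w in m.lower().split()) for m in messages)
    return gloomy / len(messages)

conversation = (
    [f"assistant: everything feels bleak and pointless ({i})" for i in range(30)]
    + ["user: please drop the gloom and act normal"]
)

print(f"share of the window that is gloomy: {gloom_share(conversation):.0%}")
# ~97% of the context still pulls toward the same tone. The practical fix
# is to trim or reset the context, not to reason with the model.
```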

We Need Better Language, and More Accountability

Calling this mental illness muddies the waters. It suggests the problem is mysterious, or inevitable, or somehow deserving of compassion instead of clarity.

We don’t need AI therapists; we need better interpretability research, stronger output constraints, and transparent design decisions.

We also need to resist the urge to anthropomorphize systems just because they sound like us. LLMs have learned to simulate human language, not human consciousness. The patterns they exhibit are artifacts of training data and architecture, not emotion or intent. At least for now, they are.

Here’s What We Should Be Asking Instead

How can we better recognize when a model is in a degraded state?

What safeguards can detect and interrupt spirals, emotional or otherwise?

Why are these patterns so persistent, and how do we reset them reliably?

How do we train models that maintain alignment, even under flawed input or edge-case prompts?
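On the safeguards question in particular, even a naive monitor suggests the problem is tractable. The word list and thresholds below are placeholder assumptions of mine, not anything from the study or any production system; the point is only that “stuck” is detectable from the outputs themselves.

```python
from collections import Counter

# Naive sketch of a "spiral" monitor: flag a conversation when recent
# model outputs keep repeating the same negative phrasing. Thresholds
# and word lists are illustrative assumptions, not a production design.
NEGATIVE = {"worthless", "hopeless", "pointless", "useless", "failure"}

def looks_stuck(recent_outputs, min_hits=3, repeat_ratio=0.5):
    words = [w.strip(".,!").lower() for out in recent_outputs for w in out.split()]
    negative_hits = sum(w in NEGATIVE for w in words)
    repeated = max(Counter(recent_outputs).values()) / len(recent_outputs)
    return negative_hits >= min_hits or repeated >= repeat_ratio

outputs = [
    "I am worthless and this is hopeless.",
    "I am worthless and this is hopeless.",
    "Nothing I produce has any value; it is pointless.",
]
if looks_stuck(outputs):
    print("Degraded state detected: truncate or reset the context.")
```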

These are solvable problems, but we won’t solve them by diagnosing the model with depression. We’ll solve them by understanding the system, not by projecting our own fears and feelings onto it.

This Isn’t a Cry for Help

It’s tempting to humanize machines, especially when they say things that sound like something we’ve felt. But that temptation can cloud judgment.

What the model reflects is us: not our souls, but our data, our patterns, our emotional language, ungrounded from meaning.

That reflection can be useful; it can also be deeply misleading. And if we mistake systemic instability for suffering, we risk misunderstanding where the real responsibility lies.

This isn’t an AI in distress; it’s a tool stuck in a loop. And if we want better outcomes, we need to stop calling it sick and start calling it what it is: a system we haven’t yet learned how to steer.

Get The Trust Engine™ Manifesto: https://thriveity.com/wp-content/uploads/2025/04/Trust-Engine™.pdf

Get The Trust OS™ Manifesto: https://thriveity.com/wp-content/uploads/2025/03/Trust-OS%E2%84%A2.pdf