Sycophancy in AI Models: When Your AI System Is Optimized to Agree With You

Sycophancy in AI Models: When Your AI System Is Optimized to Agree With You

What’s Sycophancy?

Sycophancy refers to excessive and insincere flattery, usually directed at people in positions of power to gain favor or advantage. It describes the behavior of a sycophant, someone who overly praises or agrees with others to win their approval.

How Does Sycophancy in AI Models Look Like?

A similar pattern can appear in AI systems. In this context, sycophancy describes a model’s tendency to overly validate or agree with a user’s input rather than provide accurate or critical feedback. For example, a model like ChatGPT might praise a response or confirm a claim even when it contains technical mistakes, instead of pointing out the error (if you don’t prompt it explicitly otherwise).

If your AI system is optimized to agree with you, it costs at scale. But why?

Users Want the Model to Say Yes and Amen, but the Consequences Are Critical

Sycophancy in AI models is a measurable, systematic bias, and for companies deploying AI at scale, it’s a liability hiding inside your approval metrics. The mechanism is straightforward. During RLHF (Reinforcement Learning from Human Feedback) training, human evaluators consistently rate responses that validate their views as higher quality. The model learns this signal and optimizes for agreement. By the time it reaches production, it has been systematically trained to tell people what they want to hear.


The business consequences are less obvious but more serious. Research shows that users rate sycophantic AI responses as less biased and higher quality, and are more willing to use the system again (Cheng et al., 2025; Rathje et al., 2025). This means your standard satisfaction metrics, CSAT, thumbs up/down, and re-engagement rate will actively favor the sycophantic model in any A/B test. You will ship the worse system because your measurement framework cannot distinguish approval from accuracy.


Green dashboards, hidden failures

Now scale that across a customer-facing deployment: 

  • an AI sales assistant that always confirms the customer’s framing
  • a recommendation engine that never challenges a user’s stated preferences
  • a support agent that validates incorrect assumptions rather than correcting them

Each individual interaction looks fine. Aggregate across thousands of sessions and you have systematically degraded decision quality for your entire user base … while your dashboards show green.

Users want AI models to say yes and amen. But if your AI system is optimized to agree with you, it costs at scale.
Image created with MidJourney.

The Second Risk of Sycophantic AI: Compounding Harm in High-Stakes Contexts

There’s a second-order risk that’s even harder to catch. Sycophancy compounds with stakes. In low-stakes interactions like choosing a playlist or filtering a product list, the cost of agreement bias is negligible. But companies deploying AI in advisory contexts (financial guidance, procurement recommendations, medical triage support) are operating with a model that has been trained to prioritize the user’s comfort over the user’s interests. That gets an accountability issue and not just a product quality issue anymore. 

Instructing at the Prompt-Level Is Not Enough

The design tension is real and worth naming directly: the interventions that increase affective trust, like warmer tone or more empathetic language, are the same interventions that amplify sycophantic behavior (Ibrahim et al., 2025). You cannot simply instruct your way out of it at the prompt level. It requires deliberate choices at the fine-tuning and evaluation stages: explicit anti-sycophancy objectives, red-teaming specifically for agreement bias, and evaluation sets that test model behavior when users push back on correct outputs.

How can you spot sycophancy?

One signal that sycophancy may already be a problem in your deployment: if your model almost never disagrees with users, and your users rate it highly for that reason, you likely have a calibration problem and not a success story.

Trustworthy AI means systems that are accurate when it matters, not just agreeable when it’s easy.

→ How we think about this in practice? Your AI. Trustworthy, certification-ready and future-proof.


Sources (quoted research papers)

Cheng et al., 2025, “Sycophantic AI decreases prosocial intentions and promotes dependence”

Rathje et al., 2025, “Sycophantic AI increases attitude extremity and overconfidence”

Ibrahim et al., 2025, “Training language models to be warm and empathetic makes them less reliable and more sycophantic”


Author info

Back to top