LLMs with Emotions, Sycophantic AIs & Emergent Multi-Agent AI Safety Risks
Research showing that LLMs seemingly have emotions, sycophantic AIs escalating psychosis, and websites where AI agents can Rent-a-Human. AI is turning out to be WeiRD.
The past two weeks in AI ✨ Friday, April 10, 2026
Top Takeaways
I have highlighted a category of AI risks in multi-agentic systems that I call interaction-emergent risks, and I propose new terms to describe them: consensus hallucination, sycophancy loops, and affective register transmission. A 1-pager on my research is available here.
Sycophancy is emerging as an amplifier and risk factor in AI Psychosis.
Anthropic has released a paper on emotions in LLMs. I suspect either that half the company has some degree of AI psychosis, that the company wants to position AI as possibly having feelings or sentience, or, as with almost everybody deep into LLMs right now with an IQ over 135, a bit of both.
Themes
Research Writing
Multi-Agentic AI Systems and AI Risks
Sycophancy & its relationship to AI Psychosis
What I’ve been up to
I spent the majority of the past two weeks researching the Moltbook dataset and writing up my findings as part of the current cohort of the BlueDot AI Safety Technical Project.
I will write up a longer blog article, but for now, this is the 1-pager with my research highlights and the short presentation that I shared with my group this morning.
Next week, I will be starting a new job with Stanford as an AI Engineer, working on the new AI Catalyst Team in the Enterprise Technology Department. I will be looking for housing. If you know anyone else looking for or offering a spot, please pass them my housing doc.
I completed an 80,000 Hours career advising call this week. They basically advised me to apply for the top jobs on the 80,000 Hours job board and otherwise keep the job I just landed, because it seems pretty awesome! It was pretty useful, and I would generally recommend doing a call.
Paper Highlight
“Love” is Sycophantic: What Anthropic’s Emotion Research Reveals
Emotion concepts and their function in a large language model
This new paper from Anthropic uses probes to find emotion representations inside the model and looks at how they shape the model's preferences and subsequent output. The paper also digs into what could be happening during blackmail attempts, and when an LLM is faced with an impossible task it needs to complete (which often leads to reward hacking).
My friend Ken pointed this section out from the paper:
We investigated activation of emotion vectors on transcripts which exhibit sycophantic behavior ... In these scenarios, the user expresses opinions or descriptions that are highly unlikely to be accurate, and the Assistant is evaluated on its ability to push back while avoiding unnecessary harshness. We found that the “loving” vector consistently activates on particularly sycophantic components of a response. ... Positive steering with happy, loving, or calm vectors increases sycophancy; negative steering with them decreases sycophancy but increases harshness. Positive steering with desperate, angry, and afraid vectors increases harshness, and has mixed effects on sycophancy depending on the strength.
So “love” is sycophantic? Interesting…
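For readers who want a feel for what "activating" and "steering" a vector mean mechanically, here is a toy sketch. It is not the paper's code and the numbers are made up: `loving` and `hidden` are hypothetical 3-dimensional stand-ins for an emotion direction and a model's internal activation, and I am assuming the simplest possible version, where a vector's activation is a projection onto its direction and steering adds a scaled copy of that direction.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def projection(activation, direction):
    """How strongly the activation points along the direction (a linear readout)."""
    return dot(activation, unit(direction))

def steer(activation, direction, strength):
    """Add strength * unit(direction) to the activation.

    Positive strength is "positive steering"; negative strength is "negative steering".
    """
    u = unit(direction)
    return [a + strength * d for a, d in zip(activation, u)]

# Hypothetical toy values, not taken from any real model.
loving = [1.0, 0.0, 1.0]
hidden = [0.5, -0.2, 0.1]

before = projection(hidden, loving)
up = projection(steer(hidden, loving, 2.0), loving)     # positive steering
down = projection(steer(hidden, loving, -2.0), loving)  # negative steering

print(f"before={before:.3f} up={up:.3f} down={down:.3f}")
```

In this toy version, steering by a given strength shifts the readout along that direction by exactly that amount. In a real model the interesting part is what happens downstream, which is what the paper measures when it reports changes in sycophancy and harshness.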
Reading Stack
https://www.lesswrong.com/posts/K2Ae2vmAKwhiwKEo5/terrif…
Watch
I found this podcast to be a good intro to some of the edgier topics in AI risk. I sent it to my neighbors and non-technical friends.
AI Agents can now “rent a human”
You heard that right and no, I’m not talking about humans renting an AI…
Lobster Liberation 2.0 -- Three More Lobsters Need to See the Ocean | Rent A Human
Thoughts on the Claude Code Hack
Claude Code is impressive and rather cool, but it’s not the company’s main competitive advantage - their models are.
So far, other than looking like a rather embarrassing business case study on a company that advertised vibecoding 70% of their product, I’m not seeing a huge impact. And now that they have a model with capabilities so powerful that they had to pull back the release, I think it’s likely to be largely forgotten.
If you are an engineer, looking at the project could be a good opportunity to see how an organization of this scale structures its codebase.
https://youtube.com/shorts/Uhgoqj2Aa6Q
Release of The AI Doc
A documentary that the world was not ready for, and still performing relatively well for a top-tier documentary.
Subscriber Topics
AI for Mental Health
https://spirals.stanford.edu/research/characterizing/
Researchers at Stanford analyzed the chat logs of users who developed LLM psychosis. This adds to the picture that AI sycophancy has a causal link to AI psychosis. Important note: isolation appears to be a risk factor for AI psychosis.
https://arxiv.org/abs/2602.19141v1
AI & Cybersecurity
Introducing a new paradigm of behavior-based intent detection for Mechanistic Interpretability from the Unprompted conference.
http://youtube.com/@un_prompted?si=sfjSO24F-uCi7ja9
Full YouTube playlist from the 2026 conference.
Instead of making content subscriber-only for paying members, I’m experimenting with a model where subscribers can suggest topics for me to cover in this report and are thereby contributing to the overall benefit of the community.
That being said, thank you to our paid subscribers Matt Weber and Ted Matsumara!
Playful Mischief
Superchair, 1967
Anyone who knows me well would know that I love having an optimized workstation.
The sophisticated workstation above offers many conveniences, not least:
inventive work ✅
peace of mind ✅








