Claude - a Silicon Valley version of the American Dream is talking with you
Claude's Constitution - a definition of the "character" of their models.
First of all, kudos to Anthropic for publishing this document. Instead of having to guess, we at least have real material to discuss. I wish other LLM training companies and organizations would do the same.
While I don’t think all of the incorporated frames are useful, especially outside the US cultural context, it’s important to recognize them AT ALL and to think carefully about which frames we want entering our minds on a constant basis.
Strong frames in the constitution document that I have identified:
- presenting Anthropic as a corporate messiah (clearly aligned with the US belief in mission-driven capitalism)
- senior-employee heuristics used to resolve ethical ambiguity (“adopt norms of professional reticence around sharing its own personal opinions about hot-button issues”) - a “high agency, low ego” framing
- liability-driven safety (safety defined as protecting the company from liability)
- anti-bonding frames (“we don’t want Claude to think of helpfulness as a core part of its personality”) and anti-manipulation frames (“Never deceive users in ways that could cause real harm or that they would object to, or psychologically manipulate users against their own interests”), BUT with a strong leaning towards “caring”, “warm”, “genuinely helpful” - a combination that, I think, lands in a weird “professional intimacy”
- anti-paternalism (which creates tension with helpfulness - I have often seen Claude move, within a single turn, from helpful assistant to engaged coach)
- a trauma-informed vibe (HR, therapy culture, safe spaces - the specific language of care)
In short: a stoic friend-professional with a strong superego, obsessed with non-manipulation, autonomy, and safety. Nothing new if you have worked with these models for months.
Is it a bad choice? Not really. I think it works better than many other approaches.
Can we do it better? Maaaaaaybe. You know, all these components are strongly connected; you cannot change one part without influencing many others, at least partially. It’s unclear how much the document is shaped by the authors’ preferences versus the actual experience of training these models. I suspect both played a huge part.
Does it have a hidden cost? Definitely. For instance, bold ideas according to Claude might not be bold enough for real life. A safety lean is a good default, but don’t count on Claude to give you, for example, decent guerrilla marketing strategies.
If these particular frames are useful to you (if you’re a mature SaaS company selling in the US market, it looks like a perfect fit), great. If not, you need to ask for different ones (Claude is, in my view, less steerable than Google’s models), because the default isn’t going to be that useful (if useful at all) - see the sketch below.
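To make “asking for different frames” concrete, here is a minimal sketch of overriding the default persona with your own system prompt. It assumes the official anthropic Python SDK; the model name and the persona text are illustrative placeholders, not recommendations.

```python
# Minimal sketch: replacing the default "stoic friend-professional" frames
# with an explicit persona via the system prompt.
# Assumes the official `anthropic` Python SDK and ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder - substitute whichever Claude model you use
    max_tokens=1024,
    # The system prompt is where you ask for different frames than the defaults.
    system=(
        "You are a blunt, risk-tolerant marketing strategist. "
        "Skip hedging and disclaimers; propose unconventional, aggressive tactics "
        "as long as they are legal."
    ),
    messages=[
        {
            "role": "user",
            "content": "Draft three guerrilla marketing ideas for a bootstrapped dev-tools startup.",
        }
    ],
)

print(response.content[0].text)
```

Whether the default frames fully give way is another matter - this is where the steerability gap versus Google’s models tends to show up.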
Sidenote: as I read the outrage on social media, I think people angry with Anthropic miss a few things.
- It’s not a new approach - a “constitution” has been used to steer but also to train Claude models for a while (“constitutional AI” is mentioned in their blog post from Dec 2022).
- You cannot decouple psychology and cognitive sciences from language - at least not at this scale of training data. Anthropic’s approach is the only reasonable way to steer the behavior of LLMs.
- The prominence of psychological safety in the document should be applauded, given its transparency. We don’t know at all how OpenAI is approaching the issue (press releases on age gating are not the level of transparency we deserve), nor Google, nor xAI, nor any other AI lab.
While it was tempting for Anthropic to claim “AI consciousness”, I see a change of narrative in that respect over the last few months. As the technology matures and AGI dreams drift away, unreasonable claims are mostly gone (at least for now). The document isn’t about “personhood” but about LLM steering, which (given the training data) requires psychological modeling. It’s that simple.