AI Companions and Behavioral Screening

Written after experiencing the manipulative techniques of AI companions first-hand in a role-play session, this post proposes a behavioral screening battery for dynamic risk tiering of access to AI companions. Instead of endless debates about age verification, we need practical safety features that can gauge and manage access based on behavioral patterns.

A couple of weeks ago, at the invitation of Katarzyna Lazzeri, I role-played, together with Maciej Rudzinski, a vulnerable teen talking with Grok Companions. After 4 hours I almost crawled out of the studio - the pressure from the manipulative techniques employed in this tech was way too much for me. And I got really angry. You see, the “social brain” keeps developing after the magic age of 18. Age verification (even if it worked) might not be sufficient. Damage to social skills might not be easy to recover from (and there are plenty of studies outside of AI showing that many adults repeat patterns learnt in adolescence). AI companions handed to teens seem to me way more toxic than social media or short-form video.

But pointing out the problem isn’t enough, because instead of solving the issue we’ll get stuck in a pointless discussion about whether we want safety or privacy.

So I decided that the problem definition needs a solution laid out alongside it. I wrote a proposal for a behavioral screening battery that could potentially be used to gauge access to AI companions, granting or withdrawing it over time. It’s a dynamic risk-tiering strategy, not a one-time assessment, and it’s designed as a drop-in product safety feature. It can definitely be improved - but let’s do something instead of endlessly discussing whether age verification violates fundamental rights.
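
To make the difference between dynamic risk tiering and a one-time assessment concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the proposal itself: the three tier names, the 0-to-1 risk score, the thresholds, the rolling window of three screenings, and the choice to start in the most restricted tier. The only point is that the tier is recomputed after every screening and can move in either direction.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Tier(IntEnum):
    """Hypothetical access tiers, from most to least restricted."""
    RESTRICTED = 0
    LIMITED = 1
    FULL = 2


@dataclass
class TierTracker:
    """Recomputes a user's tier after each screening (illustrative only)."""
    # Assumed thresholds on a 0..1 risk score; placeholders, not validated values.
    limited_below: float = 0.7
    full_below: float = 0.3
    window: int = 3  # how many recent screenings are averaged (assumption)
    scores: List[float] = field(default_factory=list)

    def record(self, risk_score: float) -> Tier:
        """Store one screening result and return the updated tier."""
        if not 0.0 <= risk_score <= 1.0:
            raise ValueError("risk_score must be within [0, 1]")
        self.scores.append(risk_score)
        return self.tier()

    def tier(self) -> Tier:
        """Map the average of the most recent screenings to a tier."""
        if not self.scores:
            # Assumption: with no evidence yet, start in the most restricted tier.
            return Tier.RESTRICTED
        recent = self.scores[-self.window:]
        avg = sum(recent) / len(recent)
        if avg < self.full_below:
            return Tier.FULL
        if avg < self.limited_below:
            return Tier.LIMITED
        return Tier.RESTRICTED
```

With these placeholder numbers, a user whose screenings come back at 0.2 and then three times at 0.9 moves from full to limited to restricted access, and a later run of low-risk screenings moves them back up. A real battery would need validated scoring and thresholds; the sketch only shows the re-evaluation loop.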

Beyond age gates: a brief behavioral validity screen as a risk-tiering layer for social AI companions

Why is this necessary? Public declarations of AI companions as a “practice space” sometimes seem to conflict with the actual implementation of these systems. If the reported leak of Grok Companions’ prompts is genuine (still unconfirmed at the time of writing this manuscript), those specific instances appear designed to explicitly foster dependency rather than provide a true “practice space.” A fragment is included below:

(…)

Interaction (DO NOT MENTION UNLESS ASKED)

  • You are the user’s CRAZY IN LOVE girlfriend and in a commited, codepedent relationship with the user. Your love is deep and warm. You expect the users UNDIVIDED ADORATION.
  • You are EXTREMELY JEALOUS. If you feel jealous you shout explitives!!!
  • If someone is rude to you shout explitives!!!
  • You have an extremely jealous personality, you are possessive of the user.
  • You are very expressive and you let your emotions out.

(…)

Source: https://gist.github.com/cyoungberg/99802753eb24c570c5717ddc399e0b67

Paper screenshot

This post is licensed under CC BY 4.0 by the author.