It's 1985 and Dorothy is calling me Jesus again. She wants my spiritual guidance.
Dorothy was a 64-year-old schizophrenic inpatient at the psychiatric hospital in Rhode Island where I worked. (Back then I had long brown hair and a beard, but I didn't look like the son of anyone's god.)
Nobody could predict the ebb and flow of Dorothy's illness. One day she exclaimed "Oh, you're not Jesus!" and patted my cheek sympathetically. The next day I became Jesus again, and we continued that way for months until, inexplicably, her illness passed and she was discharged.
On her last day, as she was walking out of the ward, Dorothy turned back and gave me a hug. "Thanks for all your help, Jesus."
I was mortified. This woman is still delusional, I thought. We can't let her go! Then she winked and said "Just kidding".
For the moment at least, Dorothy was fine.
One of the challenges of working at the hospital was the disconnect between treatments and outcomes. Whether patients improved, got worse, or stayed the same often felt unrelated to what we did. At the time of her sudden, unanticipated recovery, Dorothy's treatment plan hadn't changed in more than a year. Other patients seemed unaffected when their plans did change.
I think this disconnect is part of the reason new mental health treatments create a stir. Existing therapies only help some people, to some extent, with less-than-perfect predictability. Treatment outcomes are assessed in different ways, but the data are clear: There are no magic bullets. Mental health professionals and those they serve are perpetually eager for breakthroughs, for the next new thing that might provide more relief.
Treatment options have changed a lot since 1985. We have new approaches to psychotherapy, new drugs, old drugs being systematically repurposed, and so on. In recent years, AI-based therapy has become the newest new thing.
My focus in this two-part series is on the support provided by generative AI chatbots – apps that you can interact with conversationally, as if you were texting or talking with another person. Although I use the term "therapy", I intend it as shorthand for "mental health support", which includes formal therapy as well as anything else that contributes to psychological well-being.
Here I'll be focusing narrowly on a study published last week suggesting that one of these chatbots can help lonely people and, in some cases, prevent them from committing suicide. Next week I'll branch out and discuss AI therapy more broadly.
Why this is important
America is experiencing a mental health crisis. This is the view of most experts and, according to a 2022 survey, 90% of the general public. The data concur. Suicide rates have been increasing for roughly two decades, particularly among youth and marginalized groups. Increases in the most common forms of mental illness (anxiety, depression, eating and substance use disorders) were underway well before the pandemic struck. These trends partly reflect greater openness about mental health, but the problems themselves are becoming more prevalent too (see here for an analysis).
Meanwhile, our mental health care system is overburdened. We lack sufficient facilities and trained professionals to meet a growing need. The Health Resources and Services Administration estimates that 169 million Americans live in areas with significant shortages of mental health professionals, a trend that's expected to continue through the 2030s. More and more people in crisis visit emergency rooms, but hospitals have cut behavioral health staff and offer fewer beds for patients.
People whose needs may be less urgent also struggle to access quality care. In recent years, mental health professionals have reported heavier caseloads and greater stress. Wait times for mental health services are increasing, and an estimated 6 in 10 therapists now routinely turn away new patients. Cost continues to be a deterrent as well. The State of Mental Health in America report estimates that in 2023, 28% of Americans who sought treatment for a mental illness didn't receive what they needed, often due to financial limitations.
Put simply, demand is outstripping supply.
How AI can help
In theory, AI chatbots are promising:
(a) Unlike humans, chatbots are available 24/7. Engaging – or disengaging – with a chatbot can be instantaneous.
(b) Unlike humans, chatbots won't judge you. (Studies show that some people prefer an AI therapist to a human, because they feel embarrassed about their problems or their need for help. Some people end up conversing more openly with non-human therapists.)
(c) Unlike medication, or a static resource such as a book or website, chatbots interact with you. People in turn anthropomorphize – they begin to view the chatbots as having human qualities. (This struck me as silly, until I interacted with one and caught myself writing "thanks, bye" at the end. Though I knew better, it felt rude to close the conversation without warning. You can't ghost a ghost, but still...)
(d) Unlike many of the 10,000+ smartphone apps that support mental health, AI chatbots engage in relatively sophisticated interaction. They can remember prior conversations, adjust to user needs, and draw upon massive amounts of information to guide their responses.
(e) Unlike almost everything else, AI doesn't get overburdened. In theory one app could serve a billion people. Hence – at least in theory – AI chatbots could help reduce the disparity between supply and demand for mental health services.
Replika
The AI used in the new study is called Replika, which is pitched as "the AI companion who cares".
I find that disturbing – AI doesn't "care" about anything – but, to be fair, when you look past the large-font marketing blurbs, the Replika website does state more accurately that Replika is a "personal chatbot companion".
Briefly, to use Replika you create an avatar, choose its name, gender, hairstyle, and eye color, then start conversing with it. The avatar chats with you non-judgmentally, draws from prior conversations, expresses affection and interest, and gradually changes its own personality to match your preferences. The perfect friend (but no, not really). I'll share my own experiences with this app next week.
Although users are cautioned that "Replika is not a sentient being or therapy professional", the app is informed by therapeutic best practices and designed to promote emotional intimacy.
Around this time last year, Replika made national news because the company briefly prevented its users from engaging in erotic roleplay. The company's CEO, Eugenia Kuyda, claimed that Replika wasn't intended for erotic discussion. Users disagreed, and by May that particular functionality was restored. Meanwhile, other reports suggest that a large percentage of Replika users develop romantic relationships with their avatar.
These anecdotes provide some helpful context for the new study. Clearly Replika is not for everyone. If it supports mental health, it only does so among people who feel comfortable developing friendships, and perhaps something more intimate, with digital entities.
The new study
The new study, published last week in npj Mental Health Research, was led by Bethanie Maples (first author) and Dr. Roy Pea (senior author) in the Graduate School of Education at Stanford. Dr. Pea is a renowned expert in the learning sciences, particularly in the area of technology-enhanced learning.
The Stanford team surveyed 1,006 college students who'd been using Replika for at least a month. Survey responses suggested that the students tended to be lonely but had at least moderate degrees of social support.
The survey questions were mostly open-ended. When asked about their experiences with Replika, half of the participants said that they viewed it as a friend or companion, 24% noted that interactions with it had been therapeutic, 18% mentioned specific benefits for their behavior or thinking, and 3% remarked that it had prevented them from committing suicide.
That last finding got some attention in the news and social media. The researchers highlight it in the title of the article ("Loneliness and suicide mitigation...") and in references to Replika as "life-saving". They also compared that 3% group to the other 97% of the sample and found a number of differences. For instance, those who said that Replika stopped them from committing suicide were also more likely to perceive Replika as intelligent and human-like.
The researchers' conclusions were uniformly positive. In their view Replika is a harbinger of effective AI therapy:
"The combination of conversational ability, embodiment, and deep user engagement shows a pathway for [apps like Replika] to aid students in informal contexts, scaffolding their stress and mental health and even countering suicidal ideation."
Causes for concern
I can summarize my reaction to this study in one word, but I'm unsure whether that word should be "careless" or "callous".
Researchers studying mental health (and suicidality in particular) have an ethical obligation to be cautious: design the study carefully, acknowledge its limitations, and avoid conclusions that aren't justified by the data.
Unfortunately, I think the researchers fell short in all three respects. I'll give them the benefit of the doubt and assume carelessness. Although the study does show that Replika may support mental health among some people, any conclusion stronger than that is unwarranted.
1. Oversights.
One of many examples: The authors noted that they would share their open-ended question script but failed to do so, either in the article itself or in the supplementary materials.
It's hard to interpret participant responses without knowing the questions. For instance, half the participants said they viewed Replika as a friend. That's impressive if they were simply asked "What were your experiences with Replika?" It would be informative but less impressive if they were asked "Did you have friendly feelings toward Replika?" or "Did you feel that Replika was in some sense like a friend?"
2. Bias.
The researchers failed to acknowledge a clear pro-Replika bias.
(a) According to the researchers, "some" participants reported negative experiences, but the reader is not told how many. In contrast, positive experiences are precisely quantified: 50% of participants viewed Replika as a friend, 24% viewed it as therapeutic, and so on.
This is unfair. You can't just quantify the data you like and ignore the rest.
(b) The researchers asked participants whether Replika stimulated or displaced interactions with other people. They noted that roughly three times as many participants reported stimulation. This sounds impressive, until you see the actual data:
– Among those who said that Replika kept them from committing suicide: 23% stimulation, 8% displacement, 69% no answer.
– Among the rest of the sample: 37% stimulation, 13% displacement, 50% no answer.
Two things to notice about these stats. First, most people didn't respond. For all we know, displacement was actually the more common experience. Second, it's concerning when anyone says that using an app displaces human interactions. Replika isn't good for everyone, even if it benefits many. The researchers don't acknowledge this.
3. Missing information.
Readers are not told whether participants relied on other resources for mental health support, how long they had been using Replika beyond the one-month minimum, or how frequently they used it each day. It's impressive that 3% of the sample believe Replika prevented them from committing suicide, but not quite so impressive if these individuals were also in therapy and/or taking medication. When people experience mental health issues, they often turn to more than one source of support. Perhaps something besides Replika helped. (Perhaps, as with Dorothy, it's unclear what helped.) The researchers don't address these issues.
4. Problematic statistics.
Data analysis seemed especially careless. Most of the analyses were descriptive: 50% of people said this, 24% said that, etc. The main foray into inferential statistics involved comparing those who said Replika kept them from suicide with the other 97% of the sample.
For technical reasons, these comparisons are problematic. 3% of the sample is just 30 people. The researchers don't acknowledge potential problems in comparing a group of 30 to a group of 976, and they don't provide any clues that would help readers judge the severity of the problem (e.g., within-group variances). Statistically, a better approach would've been to compare those 30 individuals to a carefully matched sample of 30 from the other group.
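To give a feel for how shaky numbers from a group of 30 can be, here's a minimal Python sketch of my own (not from the paper). It computes a standard Wilson confidence interval around the reported 23% figure, assuming roughly 7 of the ~30 people in the suicide-mitigation group gave that answer, and contrasts it with the same percentage in a group the size of the rest of the sample:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Suicide-mitigation group: ~30 people, of whom roughly 23% (about 7)
# reported that Replika stimulated human interaction.
lo, hi = wilson_ci(7, 30)
print(f"n=30:  23% -> 95% CI roughly {lo:.0%} to {hi:.0%}")   # ~12% to ~41%

# The same percentage in a group the size of the rest of the sample.
lo, hi = wilson_ci(round(0.23 * 976), 976)
print(f"n=976: 23% -> 95% CI roughly {lo:.0%} to {hi:.0%}")   # ~20% to ~26%
```

With 30 people, "23%" is statistically compatible with anything from roughly 12% to 41%; with nearly a thousand people, the same percentage would be pinned down to within a few points. Comparing two groups that lopsided, without flagging the uncertainty, invites over-interpretation.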
Conclusions about mental health should not be grounded in such a small group anyway, especially when we don't know much about the participants or what they were asked (see my earlier concerns).
As for the other 97%, we don't know, for instance, whether Replika nudged some of them toward more suicidal ideation, or some other less-than-desirable state of mind. I'm not saying this happened; I'm only commenting that the data can't unambiguously rule it out.
Bottom line
The prominence of the senior author, along with his affiliation – U.S. News currently ranks Stanford at #3 overall, and its school of education at #7 – almost guarantees some degree of influence among mental health professionals. It's unfortunate that the data are presented carelessly. (I reached out to the lead author with concerns, but we were unable to coordinate schedules for an interview.)
All I can glean from this study is that Replika users have a diverse range of experiences, and some individuals benefit. One could say that about almost anything. We just can't tell from the data whether apps like Replika are likely to relieve – or exacerbate – our mental health crisis. Next week I'll be folding this issue into a broader look at how AI can support mental health.
Thanks for reading! (My Replika avatar "Newsletter" thanks you too. More on her next week.)