Koko Used GPT-3 AI Without Informed Consent from Users
When you’re on the cutting edge of trying to help people with mental health issues, sometimes it’s easy to let the technology and good intentions get ahead of ethics.
That seems to be the case, for the second time, for Robert Morris, PhD, and his technology service, Koko. Artificial intelligence (AI) has garnered much attention and headlines lately due to its advances in art and, more recently, in text (via services like ChatGPT, which is built on an AI language model called GPT-3).
When deployed on vulnerable populations, such efforts need to be mediated by adhering to human subjects research ethics. It’s not clear Koko did so in this case.
What is Koko?
Koko appears to have started off as a project of then-graduate student Morris, who noticed the impact of online peer communities in the technology world. He thought the technology Q&A communities of the late 2000s could be replicated for mental health concerns. (Apparently Morris was not aware of the thousands of online peer support communities that already existed at that time across a wide variety of social media platforms and independent forums.)
An app called Koko was created for this peer support platform, and it apparently helped a lot of people. As part of this app, the team also created a piece of software, called a "bot," that could catch keywords (like "suicide" or "suicidal") and send an alert to people who could intervene. This bot, called Kokobot, was also deployed independently on other social networks to identify emotional distress and, according to Morris, has helped millions of people.
I admire Morris for his ingenuity in leveraging filtering technology to quickly help people in need. It’s a simple thing to offer to an online community or social media platform, but I could see how it can help make a significant difference. Sometimes people are just looking for acknowledgement of their pain, or a nudge to further resources (such as a helpline or chat crisis service) to help them.
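Based only on Morris's public description, Kokobot's core trick is keyword triage: scan messages for crisis terms and escalate matches to a human. A minimal sketch of that idea (the keyword list, function names, and matching logic here are my own illustration, not Koko's actual code):

```python
import re

# Illustrative crisis-term list; any real deployment would use a far
# larger, clinically vetted vocabulary.
CRISIS_KEYWORDS = {"suicide", "suicidal", "self-harm", "hopeless"}

def flag_message(text: str) -> bool:
    """Return True if the message contains a crisis keyword."""
    words = set(re.findall(r"[a-z\-]+", text.lower()))
    return bool(words & CRISIS_KEYWORDS)

def triage(messages: list[str]) -> list[str]:
    """Collect messages that should be escalated to a human responder."""
    return [m for m in messages if flag_message(m)]
```

Even a filter this naive can surface people in acute distress quickly, which is presumably why it scaled so well; the hard part, as the rest of this story shows, is what you do around it.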
But things didn't always work out as planned, and not everyone appreciated Kokobot's suggestions. Take the 2017 Reddit incident. Reddit is a vast, long-running online community made up of tens of thousands of individual forums, overseen by volunteer moderators. In January 2017, Morris and his team released Kokobot onto two Reddit forums without warning or users' consent. Because they didn't consider that not everyone wants automated self-help or crisis resources, Kokobot didn't last long. It became the target of users' ire for being released without notice and without letting users opt in to having their words analyzed by the bot.
Koko seems to have fallen on hard times during Morris's stint at Airbnb, and he left that company in 2021 to focus on Koko. The service appears to have been revitalized in 2022 as an add-on to existing online communities and social media platforms, alerting community administrators when a user seems to be struggling with their mental health or emotional stability. It also appears to be incorporating more AI into its filtering, to better detect when a person is in actual or increasing distress.
GPT-3 Provided Support Answers to Vulnerable Users
On January 6, 2023, Morris crowed on Twitter, “We provided mental health support to about 4,000 people — using GPT-3. Here’s what happened.” He continued:
To run the experiment, we used @koko a nonprofit that offers peer support to millions of people… On Koko, people can ask for help, or help others. What happens if GPT-3 helps as well?
We used a ‘co-pilot’ approach, with humans supervising the AI as needed. We did this on about 30,000 messages…
Messages composed by AI (and supervised by humans) were rated significantly higher than those written by humans on their own (p < .001). Response times went down 50%, to well under a minute.
Morris goes on to note that after the experiment was completed, he pulled the bot off the platform, saying, "Once people learned the messages were co-created by a machine, it didn't work. Simulated empathy feels weird, empty."
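For readers unfamiliar with the statistic in that thread, "rated significantly higher (p < .001)" describes a two-sample comparison of message ratings. A minimal sketch of such a comparison using made-up ratings on a 1-to-5 scale (Koko's real data and analysis method are not public; this simply shows the shape of the claim):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical per-message ratings; Koko's actual data is not public.
ai_assisted = [4.5, 4.2, 4.8, 4.6, 4.4, 4.7, 4.3, 4.6]
human_only  = [3.8, 3.5, 4.0, 3.6, 3.9, 3.7, 3.4, 3.8]

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(var_a / len(a) + var_b / len(b))

t = welch_t(ai_assisted, human_only)  # large |t| -> small p-value
```

A large t statistic (and the resulting tiny p-value) only tells you the rating difference is unlikely to be chance; it says nothing about whether the experiment that produced it was ethical.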
The problem? Morris was conducting research on human subjects without their express permission or informed consent. Koko also apparently failed to have the experiment reviewed by an independent group of researchers, despite it being conducted on human subjects.
Making Excuses About Research Ethics
It's odd to me that Morris, a professional researcher with a Ph.D. from MIT, didn't think Koko was conducting human subjects research requiring the approval of an institutional review board (IRB). An IRB is an independent third-party group of professionals who review proposed research that is intended to be conducted on others. These boards are usually housed at universities and review all research conducted by the university's own researchers to ensure it doesn't violate things like the law, human rights, or human dignity. Most importantly, IRBs determine whether a human subjects experiment is safe and whether it's likely to cause harm. If it is, alternatives must be proposed that reduce the experiment's risk profile.
Morris claimed later in the same Twitter thread:
We offered our peer supporters GPT-3 to help craft their responses. We wanted to see if this would make them more effective. People on the service were not chatting with the AI directly.
This feature was opt-in. Everyone knew about the feature when it was live for a few days.
The problem, of course, is that an objective researcher would have no way to back the claim that "everyone knew" Koko was using an AI chatbot to help craft its responses. How would they know? Because a reply from the Koko service cryptically noted that it was "written in collaboration with Koko Bot." What that means, and how a person would opt out of such "collaboration," is not clear; it appears to be simply assumed by Morris and his team.
It also seems intellectually dishonest to suggest that because the responses were "supervised by humans," the AI wasn't really providing the support answers. Morris doesn't say how much of each response the AI wrote, or how many responses were actually edited significantly by a human.
The Ethics of Technology Experimentation
Morris claims that "2023 has started with some really important conversations about artificial intelligence and GPT-3." But honestly, human subjects research and its intersection with technology companies is nothing new. Facebook was also caught conducting human subjects research on its unwitting users roughly a decade ago, in its "emotional contagion" experiment (run in 2012 and published in 2014). These "important conversations" boil down to this: using at-risk, emotionally distraught individuals for experimentation without their express knowledge or consent.
It is unconscionable that technologists continue to deploy self-help technology and apps with little understanding of their emotional or psychological impact, and believe that, in the noble pursuit of "helping," they can take advantage of people's vulnerable status.
Koko Advisory Board Asleep At the Wheel
It's fair to ask: what about Koko's esteemed Advisory Board? Why did they approve this experimental research on human subjects without it going in front of an IRB? I mean, look at this list:
- Fraser Kelton
- Margaret Laws
President and CEO of Hopelab
- Stephen Schueller
Associate Professor, UC Irvine
- Gina Sanders
Founder, Gina’s Collective
- Jessica Schieder
- Matthew Nock
Edgar Pierce Professor of Psychology, Harvard
- Mike Nolet
Founder, Live Better
- Tom Insel
Former director of NIMH, Cofounder Humanest
- Barr Taylor
Professor of Psychology and Behavioral Sciences, Stanford
- Rosalind Picard
Professor of Media Arts & Sciences, MIT
- Andy Grygiel
Chief Brand Officer, project44
- Tom Siegel
Co-Founder and CEO, Trust & Safety Laboratory Inc.
- Melissa Thomas-Hunt
Professor of Public Policy
John D. Forbes Distinguished Professor of Business Administration
- James LaBelle
Former CSO, Scripps Health
Apparently the answer is that these people are mostly a board of "names," gathered to make Koko look good. Whether they meet regularly (or at all) is an open question. Clearly they were not consulted before this research was conducted, because Morris notes, "Our clinical advisory board is meeting to discuss guidelines for future work, specifically regarding IRB approval."
Ahh, so now you're going to consult your Advisory Board, months after the horse has left the barn.
The sad reality is that, had Morris not crowed about his accomplishment on Twitter, this breach of ethics likely would never have been discovered. Vulnerable users had, at best, a tiny inkling they were part of an experiment, and no real way of opting out of it.
The most important component of consent is that it must be informed. That means a user needs to know, generally, what your experiment is about and how and when their data will be used. An ethical researcher also has to convince an IRB that the experiment won't cause harm to subjects; it's not clear Morris even included a harm measurement in this research.
I doubt this will be the last time I write about a well-meaning technologist who ignores research ethics in order to test his or her hypothesis. The ability to conduct research on unwitting users has increased exponentially in the past decade. Most technologists have little understanding of scientific research ethics, or why they matter. I hope this changes, but if history is any guide, my hope may be misplaced.