Live Learning, AI and Web3: Trends in Enterprise Education Technologies
In fact, while The Trevor Project has used open source AI models including OpenAI’s GPT-2 and Google’s ALBERT, it does not use tools built with them to hold conversations directly with troubled kids. Instead, the group has deployed those models to build tools it uses internally to train more than 1,000 volunteer crisis counselors and to help triage calls and texts, prioritizing higher-risk patients and connecting them more quickly to real-life counselors.
The Trevor Project fine-tuned GPT-2 to create a crisis contact simulator featuring two AI-based personas. Named Riley and Drew, these personas communicate internally with counselor trainees, helping them prepare for the kinds of conversations they’ll have with real kids and teens.
Each persona represents a different life situation, background, sexual orientation, gender identity, and level of suicide risk. Riley emulates a teen in North Carolina who feels depressed and anxious, while Drew is in his early 20s, lives in California, and deals with bullying and harassment.
Launched in 2021, Riley was the first of the two personas. Rather than simply using GPT-2 out of the box, the organization tailored the deep learning model to its specific purpose by training it on hundreds of role-playing discussions between actual staff counselors, along with an initial set of data reflecting what someone like Riley might say.
“We’ve trained Riley on several hundred past Riley role-plays,” said Dan Fichter, head of AI and engineering at The Trevor Project, which developed the Riley persona through a partnership with Google’s grant program, Google.org. “The model needs to remember everything that has been said and asked so far. When we trained GPT on these conversations, we got something that is very reliably responsive in the way our trainers would respond,” he said.
The Trevor Project, which has a 30-person technical team, some of whom are dedicated to machine learning-related work, then developed the Drew persona on its own.
“When young people reach out, they are always served by a trained, caring human being who is ready to listen and support them no matter what they are going through,” said Fichter.
Retraining AI models for code-switching, and the Texas effect
Although he said the persona models are relatively stable, Fichter said the organization may need to retrain them with new data as the casual language used by kids and teens evolves to incorporate new acronyms, and as current events, such as a new Texas law defining gender-affirming medical care as “child abuse,” become topics of conversation.
“There’s a lot of code-switching that happens because they know they’re reaching out to an adult, [so] it could mean there’s a benefit to retraining regularly,” Fichter said.
The Trevor Project released data from a 2021 national survey that found that over 52% of transgender and nonbinary youth “seriously considered suicide” in the past year, and of those, one in five attempted it.
“Health care is a people-driven industry, and when machine learning crosses paths with people, I think we have to be careful,” said Evan Peterson, a machine learning engineer at health and wellness technology company LifeOmic, which has used open source language models, such as those from Hugging Face and RoBERTa, a version of BERT developed at Facebook, to build chatbots.
To assess performance, fairness and equity with respect to certain identity groups, The Trevor Project evaluated a variety of large natural language processing and deep learning language models before deciding which were best suited to specific tasks. It turned out that when it came to holding a simulated conversation and generating the kind of long, coherent sentences required for a 60- to 90-minute counselor training session, GPT-2 performed best.
AI for hotline triage and risk prioritization
But ALBERT performed better than other models in testing and validation for a separate machine learning system that The Trevor Project built to help assess the risk level of people calling, texting or chatting with its suicide prevention hotline. The risk assessment model is deployed when people in crisis contact the hotline. Based on answers to basic intake questions about a person’s state of mind and suicide history, the model assesses their level of suicide risk and ranks it with a numeric score.
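As a rough illustration of how a numeric risk score could drive prioritization, the sketch below keeps waiting contacts in a priority queue so that the highest-scoring contact is connected first and ties are broken by arrival order. The `TriageQueue` class, the 0-to-1 score scale and the example scores are all assumptions made for illustration; the article does not describe The Trevor Project's actual queueing logic.

```python
import heapq
import itertools


class TriageQueue:
    """Serve waiting contacts highest risk first, breaking ties by arrival order."""

    def __init__(self):
        self._heap = []
        self._arrivals = itertools.count()

    def add(self, contact_id, risk_score):
        # heapq is a min-heap, so the score is negated to pop the highest first.
        heapq.heappush(self._heap, (-risk_score, next(self._arrivals), contact_id))

    def next_contact(self):
        neg_score, _, contact_id = heapq.heappop(self._heap)
        return contact_id, -neg_score

    def __len__(self):
        return len(self._heap)


# Hypothetical scores on an assumed 0.0-1.0 scale.
queue = TriageQueue()
queue.add("contact-a", 0.20)
queue.add("contact-b", 0.85)
queue.add("contact-c", 0.85)
queue.add("contact-d", 0.50)

order = [queue.next_contact()[0] for _ in range(len(queue))]
print(order)
```

Here the two contacts scored 0.85 are served before the others, in the order they arrived.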
The model makes its assessments based on a wide range of statements with varying levels of detail. While it can be difficult for humans, and deep learning models, to gauge suicide risk if someone simply says, “I don’t feel well,” the ALBERT-based model is “pretty good” at learning emotional terms that correlate with suicide risk, such as language describing ideation or details of a plan, Fichter said. When configuring the model to categorize risk, the group erred on the side of caution, classifying someone as higher risk when it wasn’t entirely clear, he said.
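One simple way to express that kind of caution in code is to treat every risk level whose probability is close to the top one as plausible, then pick the highest-risk plausible level. This is only a sketch of the general idea: the level names, the `margin` parameter and the probabilities are hypothetical, and the article does not say how The Trevor Project implemented its rule.

```python
RISK_LEVELS = ["low", "medium", "high"]  # hypothetical labels, lowest to highest


def cautious_label(probabilities, margin=0.15):
    """Pick a risk level, preferring the higher one when the call is close.

    probabilities: dict mapping each level in RISK_LEVELS to a model probability.
    margin: hypothetical tolerance; any level within this distance of the top
        probability counts as a plausible answer.
    """
    top = max(probabilities.values())
    plausible = [lvl for lvl in RISK_LEVELS if top - probabilities[lvl] <= margin]
    return plausible[-1]  # the highest-risk level among the plausible ones


# A close call between "low" and "medium" resolves to the higher level.
print(cautious_label({"low": 0.45, "medium": 0.40, "high": 0.15}))
# A confident "low" prediction stays "low".
print(cautious_label({"low": 0.80, "medium": 0.15, "high": 0.05}))
```

The trade-off is deliberate: some people are scored higher than necessary, which costs counselor time, in exchange for fewer high-risk contacts being scored too low.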
To train the risk assessment model, counselors labeled tens of thousands of anonymized, archived examples of people’s answers to intake questions, determining the level of clinical risk associated with each. If someone said they were very upset and had attempted suicide in the past, for example, the conversation was labeled high priority. That labeled data was used to train the model.
In the past, human counselors used a heuristic, rule-based system to triage callers, said Fichter, who said he believes the AI-based process provides “much more accurate prediction.”
Making fun of TV shows (but avoiding the worst problems)
The Trevor Project balances the benefits of large language models against their potential problems by limiting how it uses them, Fichter said. He pointed to the strictly internal use of the GPT-2-based persona models to generate language for counselor training, and to the use of the ALBERT-based risk assessment model only to prioritize how soon a counselor should speak with a patient.
Still, large open source natural language processing models, including various iterations of OpenAI’s GPT (generative pre-trained transformer), have earned a reputation as toxic language factories. They have been criticized for producing text that perpetuates stereotypes and spews foul language, in part because they were trained on data gleaned from the internet, where such language is common. Groups including OpenAI are continually working to address the toxicity and accuracy problems associated with large language models.
“Research is underway to make them ‘good citizen’ models,” Peterson said. However, he said machine learning systems “can make mistakes [and] there are situations in which this is not acceptable.”
Meanwhile, new large language models regularly burst onto the scene. Microsoft on Tuesday introduced new AI models that it said it has deployed to improve common language understanding tasks such as named entity recognition, text summarization, custom text classification and key phrase extraction.
Tailoring these models for particular purposes with very specific training datasets is one of the ways users like The Trevor Project have worked to leverage their benefits while ensuring that they don’t facilitate more disturbing digital conversations.
“Because we were able to fine-tune it to do a very specific job, and only for our internal [Riley and Drew personas], our model did not generate any offensive output,” Fichter said.
When developing its crisis contact simulator and risk assessment model, the organization removed names and other personally identifiable information from the data it used to train the persona models.
But privacy wasn’t the only reason, Fichter said. His team didn’t want the machine learning models to draw conclusions about people with certain names, which could lead to model bias. For example, they didn’t want the models to conclude that someone named “Jane” was always a bully just because a teen in crisis in a role-playing scenario complained about someone with that name.
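A minimal sketch of that kind of scrubbing, assuming a fixed list of names to replace, might look like the following. The name list, the `[NAME]` placeholder and the `scrub_names` helper are hypothetical; a real pipeline would need a much broader approach (for example, a named entity recognizer) and would also have to handle other identifying details. The article does not describe the method The Trevor Project actually used.

```python
import re

# Hypothetical name list, for illustration only.
KNOWN_NAMES = ["Jane", "Alex", "Sam"]

# \b word boundaries keep "Sam" from matching inside "Samantha".
NAME_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in KNOWN_NAMES) + r")\b"
)


def scrub_names(text, placeholder="[NAME]"):
    """Replace each listed name with a neutral placeholder."""
    return NAME_PATTERN.sub(placeholder, text)


print(scrub_names("Jane keeps picking on me, and Sam joined in."))
```

Replacing every name with the same placeholder means the model sees that someone was described as a bully, but cannot associate that behavior with any particular name.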
So far, Fichter said the characters in the Crisis Contact Simulator haven’t used any inappropriate or strange words. Typically, they might just answer “I don’t know” if they can’t generate relevant language.
Yet he said Drew – the Californian in his 20s – had poked fun at Netflix’s social media competition show ‘The Circle’. “Drew made fun of some TV shows he was watching,” Fichter said.