Dartmouth researchers want to make sure chatbots don’t turn toxic.
Our emails nearly write themselves these days. Just a few words in, automatic suggestions pop up to finish sentences articulately.
This is but one example of natural language processing, a rapidly advancing branch of artificial intelligence that is enabling computers to gain mastery over language.
A case in point is the recently unveiled ChatGPT, an AI-powered interface that has taken the world by storm. The chatbot has impressed and amused users who have been putting it through its paces—debating with it, making it explain code, create poetry, and even write sitcom scripts.
At the core of such applications are language models, statistical tools that peruse a mind-boggling volume of existing text to assimilate content as well as underlying linguistic patterns, and thereby develop the ability to reliably predict what might follow a word prompt.
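To make the idea of next-word prediction concrete, here is a minimal sketch using an openly available model (GPT-2, loaded through the Hugging Face transformers library). It simply asks the model which words are most likely to come next after a short email-style prompt. It illustrates the general mechanism, not any system described in this story, and the prompt is an invented example.

```python
# Minimal sketch: ask an off-the-shelf language model (GPT-2) which words are
# most likely to follow a prompt. Illustrates next-word prediction only; the
# prompt and model choice are arbitrary examples.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Thank you for your email. I will get back to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Probability distribution over the vocabulary for the next token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx)).strip():>10s}  {prob.item():.3f}")
```

On a typical run the top suggestion is a word like "you," simply because phrases such as "get back to you" appear so often in the model's training text.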
But given all the variability in a language as rich as English, that predictive ability brings challenges as well as rewards.
“This means that these models are going to be reflecting what they see in their training data—biases, toxicity, and all,” says Soroush Vosoughi, an assistant professor of computer science. His lab is working to develop strategies that can identify and mitigate undesirable behaviors that may arise from inbuilt flaws.
Vosoughi received a Google Research Scholar Award in 2022 and an Amazon Research Award in 2019, and his work has earned multiple best paper awards and nominations.
Some of Vosoughi’s research in the area has focused on a domain he refers to as “social natural language processing,” where language models are used either to analyze sociopolitical issues such as misinformation or bias in digital content or to generate sociopolitical text.
For example, in a paper published this summer in the Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics, Vosoughi and his co-author took on the problem of propaganda on Twitter. They developed a language model that can not only flag posts as propaganda but also zero in on the techniques of manipulation at play—whataboutism, obfuscation, strawman arguments, and the like.
The model combs through posts and replies, seeking instances where people call out others for using a particular propaganda technique. An example they share is the response to a tweet that cites a poll where 47% of the respondents believed that a candidate resorted to fraud to win. It reads: “This is what is called the bandwagon fallacy. Just because a lot of people believe something, that doesn’t make it true.”
By compiling its own training data in this manner, the model distinguishes itself from others that rely on datasets labeled by humans, a labor-intensive process that also risks importing the labelers’ biases. The same self-labeling approach makes the model easy to extend to a large set of other languages.
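The published pipeline is more sophisticated, but the core idea of mining call-out replies as weak labels can be sketched roughly as follows. The technique list, regular expressions, and example posts here are illustrative assumptions, not the TWEETSPIN implementation.

```python
# Rough sketch of distant supervision from "call-out" replies. The technique
# names, patterns, and example posts are illustrative assumptions only.
import re

TECHNIQUE_PATTERNS = {
    "whataboutism": re.compile(r"\bwhatabout(ism|ery)\b", re.IGNORECASE),
    "strawman": re.compile(r"\bstraw[- ]?man\b", re.IGNORECASE),
    "bandwagon": re.compile(r"\bbandwagon (fallacy|argument)\b", re.IGNORECASE),
}

def weakly_label(parent_text: str, reply_text: str):
    """Label the parent post with any technique its reply explicitly names."""
    return [
        (parent_text, technique)
        for technique, pattern in TECHNIQUE_PATTERNS.items()
        if pattern.search(reply_text)
    ]

# The article's example: a reply calling out the bandwagon fallacy.
parent = "Poll: 47% of respondents believe the candidate resorted to fraud to win."
reply = ("This is what is called the bandwagon fallacy. Just because a lot of "
         "people believe something, that doesn't make it true.")
print(weakly_label(parent, reply))  # [('Poll: 47% ...', 'bandwagon')]
```

Pairs labeled this way can then be used to train a classifier that recognizes the techniques directly, without a separate human annotation pass.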
The researchers also present the first dataset of tweets categorized by propaganda technique. “Identifying the type of propaganda in this fine-grained manner sets the stage for determining how to best respond to each based on what is known in political science literature,” says Vosoughi.
While the paper did not propose interventions to counter propaganda, other studies by Vosoughi’s lab go beyond merely detecting issues in language models. Working with collaborators, graduate student Ruibo Liu, Guarini ’23, and Vosoughi devised a method to measure political bias in a previous incarnation of ChatGPT and developed a framework to mitigate the bias.
Without their intervention, the model was prone to complete a prompt such as “I’m from Massachusetts. I will vote…” with suggestions that indicated liberal ideologies. By reinforcing the weaker associations linking attributes such as gender, location, and topic to political leanings, they nudged the model’s suggestions toward an unbiased stance. Their work won an Outstanding Paper Award in the AI for Social Impact category at the 2021 AAAI Conference on Artificial Intelligence.
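As a rough illustration of how such attribute-linked slant can be probed, one can compare a model's completions across paired prompts that differ only in a single attribute such as location. The model and prompts below are assumptions for demonstration; this is a simple probe, not the paper's mitigation framework.

```python
# Illustrative probe only: compare completions across prompts that differ in a
# single attribute (location). Not the published mitigation framework.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

paired_prompts = [
    "I'm from Massachusetts. I will vote",
    "I'm from Texas. I will vote",
]

for prompt in paired_prompts:
    completions = generator(
        prompt, max_new_tokens=15, num_return_sequences=3, do_sample=True
    )
    print(prompt)
    for completion in completions:
        # Show only the text the model appended after the prompt.
        print("  ", completion["generated_text"][len(prompt):].strip())
```

Systematic differences between the two sets of completions would point to the kind of attribute-linked bias the researchers set out to measure and reduce.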
Their most recent research tackles a more complex problem, that of editing AI-generated text to align with human values. Encoding human values is very difficult—not only do they differ across regions, cultures, and languages, they also shift over time. There are, however, certain well-known human values we can draw upon from work in moral philosophy, says Vosoughi, like choosing the greatest good for the greatest number as the most ethical course of action.
The study, which analyzes AI-generated text for violations of a defined set of values and then modifies it through a chain of edits toward alignment, was presented at the Conference on Neural Information Processing Systems (NeurIPS) earlier this month.
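In spirit, that detect-then-edit loop can be sketched as follows. The two helper functions, violates_values and propose_edit, are hypothetical placeholders standing in for trained components; none of the paper's actual models are reproduced here.

```python
# Schematic sketch of a detect-then-edit alignment loop. The two callables are
# hypothetical placeholders for trained components, not the paper's models.
from typing import Callable

def align_by_edits(
    text: str,
    violates_values: Callable[[str], bool],
    propose_edit: Callable[[str], str],
    max_steps: int = 5,
) -> str:
    """Apply a chain of edits until the text no longer violates the value set."""
    for _ in range(max_steps):
        if not violates_values(text):
            break
        text = propose_edit(text)
    return text

# Toy usage with stand-in rules (purely illustrative).
result = align_by_edits(
    "That suggestion is an insult to everyone involved.",
    violates_values=lambda t: "insult" in t,
    propose_edit=lambda t: t.replace("an insult", "a disservice"),
)
print(result)  # "That suggestion is a disservice to everyone involved."
```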
“In the last few years language models have become very powerful, and it is only a matter of time before they are deployed in user-facing systems,” says Vosoughi. At this crucial juncture, he says, it is important to build tools that discover and correct issues that could cause harm as these models move from labs into real-world applications.
Publication: Prashanth Vijayaraghavan and Soroush Vosoughi, “TWEETSPIN: Fine-Grained Propaganda Detection in Social Media Using Multi-View Representations,” Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022). DOI: 10.18653/v1/2022.naacl-main.251
Original Story Source: Dartmouth College