Left off the record: using AI to help improve source diversity in news coverage

Illustration of a crowd of diverse people

Toronto Metropolitan University (TMU) has partnered with NovaceneAI to create a tool that assesses the diversity of sources quoted in news stories.

The research is clear: people don’t generally trust the news media.

Last year, Canadians’ trust in the news media was at its lowest point in seven years. In the United States, a 2022 Pew Research Center survey found that young adults under 30 trust social media information almost as much as national news outlets. Globally, it’s not much better. According to the 2022 Digital News Report from the Reuters Institute at the University of Oxford, trust fell in almost half of the countries in its survey.

What’s to blame for these levels of distrust? Part of the answer can be found in Bias, Bullshit and Lies, a Reuters Institute report that investigates the reasons for low trust in the news media across nine countries. After analyzing thousands of responses, the researchers found that people value a broad range of sources and views, an area where they feel the news media falls short.

Two TMU researchers have noticed similar trends in news reporting, and are working closely with Novacene to create an algorithm for their Journalism Representation Index (JeRI) – a tool that can “read” a piece of textual journalism to show who is quoted the most in stories, whose voices get the most prominence, and from whose point of view stories are told.
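The article doesn’t describe JeRI’s internals, but the idea of measuring who is quoted and how prominently can be illustrated with a rough sketch. The attribution pattern, the names, and the position-based weighting below are all assumptions for illustration, not JeRI’s actual method:

```python
import re
from collections import defaultdict

# Hypothetical sketch: find closing-quote + "said NAME" attributions and
# weight quotes that appear earlier in the story more heavily, since
# placement near the top of a story signals prominence.
ATTRIBUTION = re.compile(r'["\u201d],?\s*(?:said|says)\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)')

def quote_prominence(paragraphs):
    """Return a {source: score} map; earlier paragraphs score higher."""
    scores = defaultdict(float)
    n = len(paragraphs)
    for i, para in enumerate(paragraphs):
        weight = (n - i) / n  # first paragraph gets weight 1.0
        for name in ATTRIBUTION.findall(para):
            scores[name] += weight
    return dict(scores)

# Invented mini-story for demonstration.
story = [
    '"We need more services," said Jane Doe, a community advocate.',
    'Officials disagreed. "The budget is fixed," said John Smith.',
    '"This can change," said Jane Doe.',
]
print(quote_prominence(story))
```

Here Jane Doe outscores John Smith even though both are quoted, because her quotes open and close the story; a real system would also need to handle indirect attribution, pronouns, and varied verbs.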

Their ultimate goal is for JeRI to serve as an educational tool for newsrooms across the country – raising awareness among journalists about which sources appear in their stories, and which voices are missing, to help improve news coverage of traditionally marginalized communities.

From mental health to racial profiling: how sources impact stories

JeRI was born seven years ago, when Gavin Adamson, an associate professor at TMU’s School of Journalism, was researching mental health coverage in the news and who the sources were in those stories.

“The kinds of sources may deepen the stigmatizing nature of the news covered – so if you think about cops being sources on stories about people living with mental illness, it immediately sets up a context of crime and violence in that coverage,” he explains. “But think of how different an article can be if you have mental health experts like researchers or doctors. Then, the article might be more about rehabilitation and treatment, or maybe lack of social services.”

That’s when he connected with his colleague Asmaa Malik, a fellow associate professor at TMU’s School of Journalism, who had similar concerns about journalists’ sourcing decisions – particularly on stories related to race.

“When we started to develop (JeRI), we were thinking initially about stories dealing with racial profiling and police carding in Toronto, and who gets quoted in these stories,” Professor Malik explains.

Revealing the sources: developing JeRI’s algorithm

In 2020, the researchers won the Google News Initiative’s North American Innovation Challenge and received funding to further JeRI’s development. They have since partnered with the Winnipeg Free Press, the target newsroom for JeRI, and are focusing on stories related to Indigenous issues and communities across Canada.

“We re-focused our work in some way but in essence, it’s still the same – looking at stories related to underrepresented communities and who speaks for them,” Professor Malik says.

The research team also needed to develop JeRI’s algorithm, which sorts a story’s sources into “buckets” – politicians, experts, organizational sources, media, and celebrities – and identifies which sources have more prominence in articles.
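To make the bucketing idea concrete, here is a minimal keyword-matching sketch. The bucket names come from the article; the matching rules, role descriptions, and fallback category are assumptions – a production system like JeRI would rely on natural language processing models rather than keyword lists:

```python
# Illustrative only: map a quoted source's role description to a bucket.
# Bucket names follow the article; the keyword sets are invented examples.
BUCKETS = {
    "politician": {"mayor", "minister", "mp", "councillor", "senator"},
    "expert": {"professor", "researcher", "doctor", "analyst"},
    "organizational": {"spokesperson", "director", "police", "officer"},
    "media": {"journalist", "columnist", "editor"},
    "celebrity": {"actor", "singer", "athlete"},
}

def classify_source(role_description):
    """Assign a source to the first bucket whose keywords match their role."""
    words = set(role_description.lower().split())
    for bucket, keywords in BUCKETS.items():
        if words & keywords:
            return bucket
    return "other"

print(classify_source("associate professor of journalism"))  # expert
print(classify_source("police spokesperson"))                # organizational
```

Counting the bucket labels across all sources in a story would then yield the kind of breakdown the article describes, such as the share of sources who are police or politicians.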

That’s when they turned to Novacene to develop that algorithm.

“Novacene is on top of the cutting edge of algorithm development, and that’s where their expertise came into play. We couldn’t have done this project without them,” says Professor Adamson. “We knew we needed to find someone to help us develop this, because we’re not computer science experts. We either would’ve had to find a different partner or maybe found some graduate students, but it would have taken a much, much longer time. It was always the vision to find an ideal partner like Novacene.”

Marcelo Bursztein, founder and CEO of Novacene, says the collaboration was a natural fit for the AI company.

“Novacene was very excited to get on board with this project,” says Bursztein. “As experts in natural language processing, we feel privileged to help researchers shed light on how we can improve journalism practices across the country.”

Establishing accountability and transparency

Professor Malik says that JeRI isn’t meant to be a punitive tool, but rather an educational one for journalists and newsrooms – helping them discover how they are covering a particular story, and whether they are using a diverse group of sources.

She adds that JeRI could also be useful for readers as a rule of thumb to understand the quality of the journalism they are reading.

For example, JeRI might show journalists and readers that in stories related to Indigenous issues, police or politicians make up 75 per cent of the sources – which sets a context of crime and violence for the reader.

“So much of journalists’ work has been focused on two words: transparency and accountability,” says Professor Malik. “For us, we think it’s important to focus this lens of accountability and transparency on journalism itself.”