Worth a thousand words: enhancing data visualization for researchers and analysts


As analysts rely on data visualization tools to inform their work, a new prototype supported by Novacene offers more opportunities to help them uncover new insights.

Data visualization has major benefits for analysts. Among its many perks, it makes data easier to understand, improves user engagement, and helps identify patterns.

But while data visualization tools can help analysts understand trends and sentiment faster, there is still room to improve and build on them. As this technology continues to evolve, members of Novacene’s team, in collaboration with Carleton University researchers, have designed and developed TextVista, an advanced data visualization tool that offers analysts a new way to make sense of their unstructured text data.

The tool unearths new information. It tells the data’s full story, as it unfolded over time. And it solves mysteries that often loom over analysts – revealing the “who”, “what”, “when”, and “how” of their data. 

TextVista’s design and development are detailed in the newly published paper TextVista: NLP-Enriched Time-Series Text Data Visualizations. The paper was also recognized at the 2024 Graphics Interface (GI) conference, an annual international conference devoted to computer graphics and human-computer interaction. Now, findings from this research, as well as TextVista’s capabilities, are being integrated into the NovaceneAI® Platform.

Learning through time

Beck Langstone, AI Solutions Analyst at NovaceneAI and graduate of the Department of Human-Computer Interaction at Carleton University, designed and developed the prototype for her thesis project in collaboration with Novacene, fellow researchers, and industry experts.

As part of the project, Novacene enriched the industry experts’ text data with NLP techniques such as clustering, cluster labelling, sentiment analysis, and threat detection. Langstone found that all of the industry experts were interested in the temporal nature of their text data – seeking answers to questions like, “What are the trends in topics over time?”, and “What are the trends in sentiment over time?”

Designed iteratively with input from these industry experts, TextVista was developed to help analysts understand important temporal trends in their text data, illustrating through easy-to-digest visualizations how their data’s story unfolded over a period of time.

“A lot of text visualization tools that analyze text, using NLP, don’t consider time. That temporal dimension allows analysts to understand events over a specific timeline, which unlocks valuable insights and new discoveries in their data,” Langstone explains.

In the process of designing the prototype, Langstone and the team discovered that analysts also needed to understand trends in the relationships between people or groups, and between people and topics, as well as the role sentiment plays in these relationships.

Olivier Dupuis, a data engineer and analyst, was an early user of the prototype, which he used to track protest movements across North America.

He says the tool allowed him to focus on specific events and how they unfolded, and that it also added context and fairness to events that divided public opinion, such as an incident that occurred on a New York City subway in 2023.

Dupuis used the tool to analyze the protests that followed this event, and to examine how and why sentiment around events like these changed over time.

“There were so many sides to that story,” Dupuis says. “There were issues about homelessness, about not having access to services for people that were in need in New York. But then on the other side, there were issues about legitimate defense.”

Dupuis says that TextVista’s graphs showed him the main topics of discussion that stemmed from this single event. These visuals also deepened his understanding of what protesters were advocating for, and revealed how media narratives differed over time.

Daniil Kulik, a Machine Learning (ML) Engineer at Novacene and a graduate of Carleton University’s master’s program, co-authored the paper and says that the prototype’s time element is a unique feature.

“We are making the step towards more dynamic visualization,” Kulik says. “Before, data visualization was static – where everything is fixed, and doesn’t have this temporal axis. Now we have something that is dynamic, where analysts can see how things change over time.”

Fostering reading and serendipity

Langstone says that TextVista was also designed to connect the visualization directly to the source material, helping analysts make sense of what the patterns revealed by the visualization actually mean.

Reading is an important part of the sensemaking process, she says, yet many visualizations relegate it to a tiny dialogue box or, worse, leave users to open and read the underlying text in separate software.

“Many [current data visualization tools] fail to support reading within the visualization,” she says. “[This prototype] helps answer a lot of questions about messy data with a very direct link to reading, which is an important part of the sensemaking process – when you’re trying to understand what a trend means, and what it’s about.”

She says the tool was also designed to foster serendipity: it lets analysts make fortuitous or unexpected discoveries and helps them link previously unconnected ideas to uncover new insights.

“By giving users many different avenues to explore their data, to increase the amount of paths available to them, you increase the opportunity for serendipity,” Langstone explains, adding that analysts find reading and serendipity within a data visualization tool to be valuable. “It seems like a very abstract concept, but it’s a really important part of discovery analysis.”

Dupuis adds that the tool was advantageous because he didn’t need a hypothesis going into the analysis. Instead, TextVista let him explore the data and the relationships within it, which raised questions he could then investigate further.

“With this tool, you don’t need to start from scratch. You already have something that gives you an idea of the dynamic of your data, and afterwards, you can go deeper into those questions,” Dupuis says. “With other tools, you need to start from scratch.”

Novacene’s support

Langstone and Kulik’s work was supported by the Novacene team, including Dr. Fateme Rajabiyazdi, a Novacene advisor and assistant professor in Carleton University’s Department of Systems and Computer Engineering.

Partnering through an NSERC Alliance grant, Novacene guided the researchers and helped them refine the tool.

“Novacene gave us really good feedback on user experience, to help the tool meet the needs of future customers who will be using it,” Kulik says.

Marcelo Bursztein, Founder and CEO of Novacene, says the technology company constantly puts its technical capabilities to the test, and that this new development validates its work in AI, ML, and NLP.

“We were excited to support the creation of this technology, which received international recognition at the GI Conference at Dalhousie University,” Bursztein says. “The development of this prototype aligns with Novacene’s ongoing commitment to partnering with, and supporting, the most innovative and creative research projects – as well as applying the results to bring our customers the most advanced solutions in the market.”