Faster, more accurate text analysis

From chaos to structure

NovaceneAI has taken an important step in the journey towards accuracy at scale, offering customers both faster processing and more accurate results.

The NovaceneAI team is constantly working to improve its products, and its latest release is a new clustering algorithm that delivers results up to four times faster than the previous version while still producing highly accurate results. In machine learning, gaining speed without giving up accuracy is often described as “the holy grail.”

How it started

Clustering is at the core of much of Novacene’s technology, and the company offers its clustering algorithm to customers as a central feature of the NovaceneAI Platform. The algorithm helps customers analyze and make sense of their data, a job that can be extremely time-consuming for analysts.

For instance, a company might send a survey to employees to gather information about their concerns. If the survey contains open-ended questions, it could take a long time for a research analyst to organize the responses and identify employees’ top concerns.

Novacene’s previous clustering algorithm could already group this type of text and extract themes from it, giving analysts the information they needed to inform important decisions.

However, not every customer has the same speed requirements. “While some customers are perfectly comfortable waiting 10 minutes to get the analysis back, we have customers using our technology inside their own products. These customers expect near-real-time response times,” says Justin Pontalba, who led the development of the company’s new algorithm. “Being able to meet this demand enables these customers to improve their own products and provide better value to their clients.”

Pontalba, a software engineer at Novacene, was driven to build the new and improved algorithm after speaking with one such customer, who wanted to analyze 10,000 documents in under 30 seconds.

“That was our starting point, and it showed us that we needed to improve our algorithm,” he says.

Pontalba adds that it has always been a priority for the company to create a product that provides customers with accurate results at scale, and he wanted to create an algorithm that met Novacene’s high standards for precision.

Balancing speed and accuracy

Pontalba says analysts who use clustering algorithms often have to choose between accuracy and speed: slower algorithms usually yield more accurate results than their faster counterparts.

Pontalba, who led the project, says the tricky part of creating the new algorithm was balancing the need for high accuracy against the demand for more speed. In the end, he produced an updated model that is significantly faster than its predecessor and also beats it in accuracy.

“With the previous algorithm, to process 10,000 text samples, it would take around 410 seconds,” he explains. “Now with the improvements, it can process 10,000 text samples in 114 seconds.”
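As a quick check on those figures, the short calculation below derives the speedup and throughput from the two timings in the quote. It assumes both runs processed the same 10,000 samples on comparable hardware, which the quote implies but does not state; the variable names are illustrative only.

```python
# Timings quoted above, both for a batch of 10,000 text samples.
# Assumption: both runs used the same data and comparable hardware.
SAMPLES = 10_000
OLD_SECONDS = 410  # previous algorithm ("around 410 seconds")
NEW_SECONDS = 114  # new algorithm

speedup = OLD_SECONDS / NEW_SECONDS  # how many times faster the new version is
old_rate = SAMPLES / OLD_SECONDS     # samples processed per second, before
new_rate = SAMPLES / NEW_SECONDS     # samples processed per second, after

print(f"{speedup:.1f}x faster: {old_rate:.0f} -> {new_rate:.0f} samples per second")
# prints "3.6x faster: 24 -> 88 samples per second"
```

A speedup of about 3.6 times on this workload sits within the “up to four times faster” range described earlier.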

Pontalba notes that the algorithm is also unsupervised, meaning it does not have to be trained before it can organize data and produce results. Instead, it takes the data and organizes the information automatically, eliminating the need to manually code comments to train the algorithm, a task that is cost-prohibitive for most organizations.
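Novacene has not published how its algorithm works, so the following is not its method. It is a minimal, generic sketch of unsupervised text clustering, assuming a plain bag-of-words representation and spherical k-means, a standard technique swapped in here purely for illustration. No labelled examples are involved: the code groups survey-style responses by shared vocabulary and then labels each group with its highest-weighted words. The stop-word list, sample responses and the helper names `vectorize`, `cosine`, `centroid`, `cluster` and `themes` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, just large enough for this toy example.
STOPWORDS = {"a", "an", "and", "are", "before", "do", "for", "have", "i",
             "is", "my", "not", "the", "there", "to", "too", "up", "with"}


def vectorize(text):
    """Hypothetical helper: unit-length bag-of-words vector as {word: weight}."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {w: c / norm for w, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


def centroid(vectors):
    """Sum of the member vectors, re-normalised to unit length."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # adds each word's weight to the running total
    norm = math.sqrt(sum(x * x for x in total.values())) or 1.0
    return {w: x / norm for w, x in total.items()}


def cluster(texts, k, max_iter=20):
    """Hypothetical helper: spherical k-means over the texts, no labels needed."""
    vecs = [vectorize(t) for t in texts]
    # Farthest-first seeding: start with the first text, then keep adding
    # the text that is least similar to every seed chosen so far.
    seeds = [vecs[0]]
    while len(seeds) < k:
        far = min(range(len(vecs)),
                  key=lambda i: max(cosine(vecs[i], s) for s in seeds))
        seeds.append(vecs[far])
    labels = None
    for _ in range(max_iter):
        # Assign every text to its most similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(v, seeds[j]))
                      for v in vecs]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each centroid to the centre of the texts assigned to it.
        seeds = [centroid([v for v, lab in zip(vecs, labels) if lab == j])
                 for j in range(k)]
    return labels, seeds


def themes(centroids, top_n=3):
    """Label each cluster with its highest-weighted words."""
    return [[w for w, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
            for c in centroids]


# Hypothetical open-ended survey responses.
responses = [
    "My pay is too low for the work I do",
    "Pay raises have not kept up with inflation",
    "Salary and pay are below market rates",
    "There is never enough parking near the office",
    "Parking spots fill up before nine",
    "The office parking lot is too small",
]
labels, centroids = cluster(responses, k=2)
print(labels)                # [0, 0, 0, 1, 1, 1]
print(themes(centroids, 2))  # [['pay', 'low'], ['parking', 'office']]
```

Real-world systems usually rely on richer text representations than raw word counts, but the principle illustrated here is the same: the groupings and theme labels come from the data itself rather than from hand-coded training examples.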

Ultimately, he says, the new algorithm gives clients meaningful results and the answers they need when dealing with large volumes of text.

Who can use it?

Pontalba says that Novacene’s improved algorithm can be used across many scenarios – including market research, employee and customer engagement, crisis and reputation management, and more.

“It can be applied by any organization trying to uncover hidden themes in large amounts of free-flowing text,” he explains.

With technologies such as AI and machine learning constantly evolving, Pontalba says Novacene is committed to offering its customers the latest and most innovative products.

He says the updated clustering algorithm is an important step in the journey towards accuracy at scale, and that the company will continue to be guided by its customers as it builds and improves its products. “We are customer-focused and customer-driven, and we are constantly listening to them to help us develop and improve new technologies,” he says.