By: Chloe Kestermont, Product Manager
The news can be a source of overwhelming information, and helping humans manage massive amounts of textual data is what NLP does best. Our news cycle is rapidly shortening, due mostly to our shift to consuming news online (when was the last time you actually held a newspaper?). Time and mental bandwidth are stretched thin, and keeping up with current events can seem impossible. We've got information coming at us literally from left and right, and media bias can influence the direction of our opinions. Join me in a thought experiment exploring an interesting use case of the Codeq NLP API as I delve into how we can manage this rising tide of information with Natural Language Processing.
Let’s create an app focused on consolidating multiple news sources into one convenient location; we’ll call it “Next Level News.” For this theoretical app, we’ll use the Text Summarization, Named Entity Recognition, Named Entity Linking, Named Entity Salience, and Keyphrase Extraction modules. The Codeq API is a robust set of text understanding tools, containing more than 25 modules that extract rich representations from unstructured textual data, and it can be easily customized. Let’s create our own NLP pipeline based on the linguistic tools we’ll require for our application, using Python code along the following lines:
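As a hedged sketch of that pipeline setup: the module keys summarize, ner, nel, and salience are the ones named in this post, while the "keyphrases" key for Keyphrase Extraction, the credentials, and the exact client call shown in the comments are illustrative assumptions to be checked against the Codeq API documentation.

```python
# Sketch of the custom pipeline for our "Next Level News" app.
# Module keys summarize, ner, nel, and salience are taken from this
# post; the "keyphrases" key is an assumption.

NEXT_LEVEL_NEWS_PIPELINE = [
    "summarize",   # Text Summarization: extractive article summaries
    "ner",         # Named Entity Recognition
    "nel",         # Named Entity Linking (Wikipedia disambiguation)
    "salience",    # Named Entity Salience
    "keyphrases",  # Keyphrase Extraction (module key is an assumption)
]

# With the Codeq Python SDK installed, analyzing an article would look
# roughly like this (credentials and exact signature are illustrative):
#
#   from codeq_nlp_api import CodeqClient
#   client = CodeqClient(user_id="YOUR_USER_ID", user_key="YOUR_API_KEY")
#   document = client.analyze(article_text, pipeline=NEXT_LEVEL_NEWS_PIPELINE)
```

Keeping the module list in one place means the rest of the app can be written against a single pipeline definition, and modules can be added or dropped without touching the analysis code.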
Our mutually imagined app pulls in news from sources covering the entire political spectrum. In these increasingly fraught times, our app will help our users evaluate online articles by labeling them with the media bias of the source, as determined by comparing media bias maps from multiple third party organizations. This will give our users a broad view of the news available and help them assess the sources for themselves.
As a key part of our media bias feature, our app will display summaries of news articles to help our users digest and compare information from a vast variety of sources. After preprocessing the content to extract only the text of the article, we’ll call the Text Summarization module (summarize) to automatically generate extractive summaries containing the most relevant sentences of these news stories. Alternatively, if we want to reduce the size of summaries further, we could use the Summarization with Compression module (summarize_compress), a feature you won’t find in other NLP APIs. Where applicable, this module will remove extraneous clauses without disturbing the main point of the sentences in our summaries, generating even more condensed summaries.
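In app code, we might prefer the compressed summary whenever that module produced one, and fall back to the plain extractive summary otherwise. The response shape below (a dict with "summary" and "compressed_summary" lists of sentences) is a hypothetical stand-in for illustration, not the documented Codeq schema:

```python
# Hedged sketch: pick the tightest available summary from an analyzed
# article. The dict keys used here are illustrative assumptions, not
# the documented Codeq response format.

def best_summary(response: dict) -> str:
    """Prefer the compressed summary when the module produced one."""
    sentences = response.get("compressed_summary") or response.get("summary") or []
    return " ".join(sentences)

# Hypothetical analyzed article:
example_response = {
    "summary": [
        "The city council approved the new transit budget on Tuesday.",
        "Officials said, despite earlier objections, that fares will not rise.",
    ],
    "compressed_summary": [
        "The city council approved the new transit budget on Tuesday.",
        "Officials said that fares will not rise.",
    ],
}

print(best_summary(example_response))
```

The `or`-chain fallback keeps the app working even for articles where compression was not applicable and the API returned only the standard extractive summary.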
In order to improve the organization of information in our app, let’s call the Named Entity Recognition module (ner), which will scour the analyzed articles for named entities found in the sentences of our summaries, highlighting people, places, organizations, dates, and more. In conjunction with this module, we’ll call the Named Entity Linking module (nel), which will produce a list of disambiguated named entities and links to their respective Wikipedia pages. This will help us create distinct profiles for homonymous entities (for example, Michelle Williams the actress and Michelle Williams the singer). We’ll also call the Named Entity Salience module (salience), which will automatically detect the named entities most pertinent to the text, so our app will highlight only the most relevant entities found in each news summary. We can use that information to automatically create deep links that will take our users to an index of current content related to that entity.
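Combining the three modules in app code could look something like the sketch below: keep only entities above a salience cutoff and pair each with its Wikipedia link from entity linking. The entity record fields ("text", "salience", "wiki_url") and the threshold value are illustrative assumptions, not the documented Codeq output format:

```python
# Hedged sketch: filter analyzed entities down to the most salient ones
# and attach their Wikipedia links for deep linking. Field names are
# illustrative assumptions.

SALIENCE_THRESHOLD = 0.5  # app-level tuning knob, not an API value

def salient_entities(entities, threshold=SALIENCE_THRESHOLD):
    """Return (name, wiki_url) pairs for entities above the salience cutoff."""
    return [
        (e["text"], e.get("wiki_url"))
        for e in entities
        if e.get("salience", 0.0) >= threshold
    ]

# Hypothetical entities extracted from one news summary:
entities = [
    {"text": "Michelle Williams", "salience": 0.9,
     "wiki_url": "https://en.wikipedia.org/wiki/Michelle_Williams_(actress)"},
    {"text": "Tuesday", "salience": 0.1, "wiki_url": None},
]

for name, url in salient_entities(entities):
    print(name, url)
```

Because linking disambiguates homonymous entities to distinct Wikipedia pages, the same (name, wiki_url) pair can serve as a stable key for the per-entity content indexes the app deep-links into.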
A quick mock-up of our thought experiment