By Rodrigo Alarcón, Computational Linguist
Codeq’s NLP API includes text summarization modules that can help you identify the most relevant content of a text. In this tutorial we will detail how to use the modules for text summarization, sentence compression and keyphrase extraction.
Previous tutorials can be found here:
The complete list of modules we offer can be found in our documentation:
To call Codeq’s NLP API, you first need to create an instance of our Python SDK client, passing your API credentials as input parameters. Then you declare a pipeline containing the annotators you are interested in. With the client and the pipeline you can send a text and receive a Document object containing the output of the selected annotators. For a quick overview of the output, you can use the method document.pretty_print(). For each annotator in this tutorial we will detail:
- the keyword (KEY) used to call the annotator,
- the attribute (ATTR) where the output is stored.
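The steps above can be sketched as follows. This is a minimal sketch, not the official SDK code: we assume that the client exposes a method analyze(text, pipeline=...) and that the pipeline is a comma-separated string of KEYs. Please check the documentation for the exact class name, import path and method signature. The NLPClient protocol and the helpers build_pipeline and annotate are hypothetical names introduced only for this example.

```python
from typing import Any, Iterable, Protocol


class NLPClient(Protocol):
    """Assumed shape of the SDK client: one analyze() call per text."""

    def analyze(self, text: str, pipeline: str) -> Any: ...


def build_pipeline(keys: Iterable[str]) -> str:
    """Join annotator KEYs into one pipeline string (assumed comma-separated)."""
    cleaned = [key.strip() for key in keys if key.strip()]
    if not cleaned:
        raise ValueError("the pipeline needs at least one annotator KEY")
    return ",".join(cleaned)


def annotate(client: NLPClient, text: str, keys: Iterable[str]) -> Any:
    """Send the text through the selected annotators and return the Document."""
    return client.analyze(text, pipeline=build_pipeline(keys))
```

With the real SDK, the client argument would be the instance created from your API credentials, and the returned value is the Document object whose attributes are described for each annotator.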
The summarization module generates an extractive summary made up of the most relevant sentences of the input text. The output of this annotator is stored at the Document level.
- KEY: summarize
- ATTR: document.summary
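To give an idea of what an extractive summary is, the following sketch scores each sentence by the average frequency of its content words across the whole text and keeps the top n sentences in their original order. This frequency-based heuristic is a swapped-in technique for illustration only; it is not the model behind Codeq’s summarize annotator. The names extractive_summary, split_sentences, tokenize and STOPWORDS are hypothetical.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stoplist; real systems use much longer ones.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "to", "was", "with",
}


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Lowercase the sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, n: int = 2) -> str:
    """Return the n highest-scoring sentences, joined in document order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average (not sum) so long sentences are not favoured just for length.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n]))
```

Because the selected sentences are copied verbatim from the input, the result is extractive; restoring document order keeps the summary readable.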
The sentence compression module generates, from a given sentence, a new, shorter one that retains the main point of the original while possibly omitting some less central details. It can be thought of as the single-sentence counterpart to document summarization.
The output of this module is stored at the Sentence level.
- KEY: compress
- ATTR: sentence.compressed_sentence
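Sentence compression is often framed as deleting the less central parts of a sentence. The sketch below applies a few deliberately crude deletion rules: it removes parenthetical asides, comma-delimited relative clauses introduced by which/who/whose, and a handful of leading discourse adverbials. It is a hypothetical stand-in to illustrate the idea, not how Codeq’s compress annotator works; compress_sentence and the rule names are our own.

```python
import re

# Hypothetical deletion rules, applied in order (not Codeq's model).
_PARENTHETICAL = re.compile(r"\s*\([^)]*\)")
_RELATIVE_CLAUSE = re.compile(r",\s*(?:which|who|whose)\b[^,]*,", re.IGNORECASE)
_LEADING_ADVERBIAL = re.compile(
    r"^(?:however|moreover|in fact|for example|additionally),\s*", re.IGNORECASE
)


def compress_sentence(sentence: str) -> str:
    """Return a shorter sentence by deleting material that is usually peripheral."""
    out = _PARENTHETICAL.sub("", sentence)       # drop "(...)" asides
    out = _RELATIVE_CLAUSE.sub("", out)          # drop ", which ... ," clauses
    out = _LEADING_ADVERBIAL.sub("", out)        # drop "However, " and similar
    out = re.sub(r"\s{2,}", " ", out).strip()    # tidy leftover whitespace
    return out[:1].upper() + out[1:]             # re-capitalize the first letter
```

Learned compressors decide what to delete from context; fixed rules like these will sometimes remove information that matters, which is why they only serve to illustrate the task.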
The summarization with compression annotator generates an extractive summary with the most relevant sentences of the input text in their compressed forms, regardless of whether the compress annotator is included in the pipeline.
The output of this module is stored at the Document level.
- KEY: summarize_compress
- ATTR: document.compressed_summary
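Conceptually, summarization with compression can be pictured as two steps: select the most relevant sentences, then compress each selected sentence. The sketch below composes both steps from any scoring function and any compression function passed in as arguments. This decomposition is our assumption for illustration; the tutorial does not describe how the summarize_compress annotator works internally, nor the exact type of document.compressed_summary (here we simply return a string). The function name compressed_summary is hypothetical.

```python
from typing import Callable, List


def compressed_summary(
    sentences: List[str],
    score: Callable[[str], float],
    compress: Callable[[str], str],
    n: int = 2,
) -> str:
    """Pick the n best-scoring sentences, keep document order, compress each one."""
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n])  # restore the original order of the selected sentences
    return " ".join(compress(sentences[i]) for i in chosen)
```

Since compression is applied inside the function, the caller never needs a separate compression step, which mirrors the point that summarize_compress does not require the compress annotator in the pipeline.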
The keyphrase extraction module finds, for a given document, a list of short phrases that give a user a sense of the topics the document covers. For example, for technical documents such as scientific papers, the retrieved keyphrases should include the technical terms most relevant to the topic of the paper, whereas for a news article they should include the names of people, organizations, etc. relevant to the article.
The output of this module is stored at the Document level in two forms: the list of keyphrases as strings, and the list of keyphrases as tuples including their relevance score.
- KEY: keyphrases
- ATTR: document.keyphrases
- ATTR: document.keyphrases_scored
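To illustrate the two output forms, the sketch below implements RAKE (Rapid Automatic Keyword Extraction), a well-known unsupervised method, and returns both a list of keyphrase strings and a list of (keyphrase, score) tuples, loosely mirroring document.keyphrases and document.keyphrases_scored. RAKE is a swapped-in technique; it is not the method behind Codeq’s keyphrases annotator, and the tuple layout and the names rake and candidate_phrases are our own choices.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Small illustrative stoplist; RAKE normally uses a much longer one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "to", "was", "with",
}


def candidate_phrases(text: str) -> List[List[str]]:
    """Split text at punctuation and stopwords; each remaining word run is a candidate."""
    phrases: List[List[str]] = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current: List[str] = []
        for word in re.findall(r"[a-z0-9][a-z0-9'-]*", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text: str, top_k: int = 10) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Return (keyphrases, keyphrases_with_scores), highest score first."""
    phrases = candidate_phrases(text)
    freq: Dict[str, int] = defaultdict(int)
    degree: Dict[str, int] = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences inside the phrase, itself included
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores: Dict[str, float] = {}
    for phrase in phrases:
        scores[" ".join(phrase)] = sum(word_score[w] for w in phrase)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [p for p, _ in ranked], ranked
```

RAKE scores each word by degree divided by frequency, so words that occur inside longer candidate phrases score higher, which tends to favour multi-word technical terms over isolated words.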
In this tutorial we described some modules of the Codeq NLP API that can be used to summarize texts and extract relevant keyphrases: summarize (output in document.summary), compress (sentence.compressed_sentence), summarize_compress (document.compressed_summary) and keyphrases (document.keyphrases and document.keyphrases_scored).
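As a quick reference, the pipeline KEYs and the attributes that store their output can be kept in a small Python dictionary. The mapping below only restates the KEY and ATTR values listed in this tutorial; the names ANNOTATORS, PIPELINE and outputs_for are hypothetical, and the comma-separated pipeline string is an assumption about the SDK's input format.

```python
from typing import Dict, List

# Annotator KEY -> attribute(s) where its output is stored (as listed in this tutorial).
ANNOTATORS: Dict[str, List[str]] = {
    "summarize": ["document.summary"],
    "compress": ["sentence.compressed_sentence"],
    "summarize_compress": ["document.compressed_summary"],
    "keyphrases": ["document.keyphrases", "document.keyphrases_scored"],
}

# Assumed pipeline format: a comma-separated string of KEYs.
PIPELINE = ",".join(ANNOTATORS)


def outputs_for(keys: List[str]) -> List[str]:
    """Return the attributes to inspect for the given annotator KEYs."""
    unknown = [k for k in keys if k not in ANNOTATORS]
    if unknown:
        raise KeyError(f"unknown annotator KEY(s): {unknown}")
    return [attr for k in keys for attr in ANNOTATORS[k]]
```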
Take a look at our documentation to learn more about the NLP tools we provide.
Do you need inspiration? Go to our use case demos and see how you can integrate different tools.
In our NLP demos section you can also try our tools and find examples of the output of each module.