Codeq NLP API Tutorial 7

Part 7. Summarization

By Rodrigo Alarcón, Computational Linguist

Codeq’s NLP API includes text summarization modules that can help you identify the most relevant content in a text. In this tutorial we describe how to use the modules for text summarization, sentence compression and keyphrase extraction.

Previous tutorials can be found here:

The complete list of modules we offer can be found in our documentation:

Codeq NLP API Documentation

Define an NLP pipeline and analyze a text

To call Codeq’s NLP API, create an instance of our Python SDK client using your API credentials as input parameters. Then declare a pipeline containing the annotators you are interested in. With the client and the pipeline you can send a text and receive a Document object containing the output of the selected annotators. For a quick overview of the output, use the method document.pretty_print(). For each annotator in this tutorial we specify:

the keyword (KEY) used to call the annotator,
the attribute (ATTR) where the output is stored.

from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="USER_ID", user_key="USER_KEY")

pipe = [
    "summarize", "compress", "summarize_compress", "keyphrases"
]

text = "A gunman who shot dead a uniformed officer outside the Biloxi police station remained " \
       "on the run on Monday, the subject of an intense manhunt along Mississippi's Gulf coast. " \
       "It was unclear what prompted the killing of Officer Robert McKeithen, a 23-year veteran " \
       "who was scheduled to retire this year. Biloxi's police chief, John Miller, said police did not know " \
       "if he was targeted, or the victim of a random act. The animal that did this is still on the run, " \
       "Miller told reporters. We're going to do everything within our power to bring him to justice " \
       "for Robert and his family. Authorities say the man approached McKeithen in the station's parking " \
       "lot on Sunday night and shot him multiple times, either before or after coming inside the station."

document = client.analyze(text, pipeline=pipe)

print(document.pretty_print())

Summarization

This module generates an extractive summary made up of the most relevant sentences of the input text. The output of this annotator is stored at the Document level.

KEY: summarize
ATTR: document.summary

pipe = [
    "summarize"
]

document = client.analyze(text, pipeline=pipe)

summary = document.summary

print(summary)

# Output:
# 
# It was unclear what prompted the killing of Officer Robert McKeithen, 
# a 23-year veteran who was scheduled to retire this year. 
# Biloxi's police chief, John Miller, said police did not know if he was targeted, 
# or the victim of a random act.
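"Extractive" means that the summary is built from sentences copied verbatim from the input rather than newly generated text. To make the idea concrete, here is a toy frequency-based sketch written only for illustration: it is not how Codeq's summarizer works, and the stopword list, the sentence splitter and the scoring rule are all our own assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {
    "a", "an", "the", "of", "on", "to", "in", "and", "or", "if", "is", "it",
    "was", "who", "that", "this", "did", "not", "he", "him", "his", "for", "we",
}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    # Lowercased word tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def toy_extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, copied verbatim, in their original order."""
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in content_words(s))

    def score(i):
        # Average frequency (across the whole text) of the sentence's content words.
        words = content_words(sentences[i])
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)), key=score, reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(best))

print(toy_extractive_summary("Cats purr. Cats and dogs play. Birds fly.", k=1))
```

Calling toy_extractive_summary(text, k=2) on the tutorial's example text also returns two sentences copied verbatim from the input, though not necessarily the same two that Codeq selects.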

Sentence Compression

This module aims to generate, from a given sentence, a new, shorter one that retains the main point of the original, while possibly omitting some less central details. It can be thought of as the single-sentence counterpart to document summarization.

The output of this module is stored at the Sentence level.

KEY: compress
ATTR: sentence.compressed_sentence

pipe = [
    "compress"
]

document = client.analyze(text, pipeline=pipe)

for sentence in document.sentences:
    raw_sentence = sentence.raw_sentence
    compressed_sentence = sentence.compressed_sentence
    if raw_sentence != compressed_sentence:
        print("original: %s" % raw_sentence)
        print("compressed: %s\n" % compressed_sentence)

# Output:
# 
# original: A gunman who shot dead a uniformed officer outside the Biloxi police station remained on the run on Monday, the subject of an intense manhunt along Mississippi's Gulf coast.
# compressed: A gunman who shot dead a uniformed officer outside the Biloxi police station remained on the run on Monday.
# 
# original: It was unclear what prompted the killing of Officer Robert McKeithen, a 23-year veteran who was scheduled to retire this year.
# compressed: It was unclear what prompted the killing of Officer Robert McKeithen.
# 
# original: Authorities say the man approached McKeithen in the station's parking lot on Sunday night and shot him multiple times, either before or after coming inside the station.
# compressed: Authorities say the man approached McKeithen in the station's parking lot on Sunday night and shot him multiple times.
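If you want to quantify how much the compressor shortens each sentence, a small helper can reuse the same raw_sentence / compressed_sentence comparison as the loop above. This is a minimal sketch: the word-count ratio is our own illustrative metric, not something the API returns, and the SimpleNamespace objects are offline stand-ins for real Sentence objects, filled with sentences from the example output.

```python
from types import SimpleNamespace

def compression_report(sentences):
    """List (original, compressed, ratio) for every sentence the compressor shortened.

    `sentences` can be any iterable of objects exposing `raw_sentence` and
    `compressed_sentence`, e.g. document.sentences from the call above.
    The ratio is compressed word count / original word count (our own metric).
    """
    report = []
    for s in sentences:
        if s.raw_sentence != s.compressed_sentence:
            ratio = len(s.compressed_sentence.split()) / len(s.raw_sentence.split())
            report.append((s.raw_sentence, s.compressed_sentence, ratio))
    return report

# Offline stand-ins built from the example output (not real SDK objects).
stub_sentences = [
    SimpleNamespace(
        raw_sentence="It was unclear what prompted the killing of Officer Robert McKeithen, "
                     "a 23-year veteran who was scheduled to retire this year.",
        compressed_sentence="It was unclear what prompted the killing of Officer Robert McKeithen.",
    ),
    SimpleNamespace(  # left unchanged by the compressor in the example
        raw_sentence="Biloxi's police chief, John Miller, said police did not know "
                     "if he was targeted, or the victim of a random act.",
        compressed_sentence="Biloxi's police chief, John Miller, said police did not know "
                            "if he was targeted, or the victim of a random act.",
    ),
]

for original, compressed, ratio in compression_report(stub_sentences):
    print(f"{ratio:.2f}  {compressed}")
```

With a live response you would call compression_report(document.sentences) instead of passing the stubs.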

Summarization with Compression

This annotator generates an extractive summary containing the most relevant sentences of the input text in their compressed forms, regardless of whether the compress annotator is included in the pipeline.

The output of this module is stored at the Document level.

KEY: summarize_compress
ATTR: document.compressed_summary

pipe = [
    "summarize", "summarize_compress"
]

document = client.analyze(text, pipeline=pipe)

print("summary: %s\n" % document.summary)
print("compressed_summary: %s" % document.compressed_summary)

# Output:
#
# summary: It was unclear what prompted the killing of Officer Robert McKeithen, a 23-year veteran who was scheduled to retire this year. Biloxi's police chief, John Miller, said police did not know if he was targeted, or the victim of a random act.
# 
# compressed_summary: It was unclear what prompted the killing of Officer Robert McKeithen. Biloxi's police chief, John Miller, said police did not know if he was targeted, or the victim of a random act.
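Comparing the two outputs suggests that the compressed summary is the extractive summary with each selected sentence replaced by its compressed form; in this example only the first sentence changes. The sketch below reproduces that relationship offline. It is an assumption inferred from this single example, not a description of Codeq's internals, and compose_compressed_summary is a hypothetical helper.

```python
def compose_compressed_summary(summary_sentences, compressed_by_original):
    """Swap each summary sentence for its compressed form, if one exists."""
    return " ".join(compressed_by_original.get(s, s) for s in summary_sentences)

# Summary sentences and the compressed form taken from the example outputs.
S2 = ("It was unclear what prompted the killing of Officer Robert McKeithen, "
      "a 23-year veteran who was scheduled to retire this year.")
S3 = ("Biloxi's police chief, John Miller, said police did not know "
      "if he was targeted, or the victim of a random act.")
compressed = {S2: "It was unclear what prompted the killing of Officer Robert McKeithen."}

print(compose_compressed_summary([S2, S3], compressed))
```

The printed string matches the compressed_summary output above.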

Keyphrase Extraction

This module finds, for a given document, a list of short phrases that give the reader a sense of the topics it covers. For a technical document, for example, the retrieved keyphrases should include the terms most relevant to its topic, whereas for a news article they should include the names of people, organizations, etc. relevant to the story.

The output of this module is stored at the Document level in two forms: a list of keyphrases as strings, and a list of keyphrase-score pairs that include each keyphrase's relevance score.

KEY: keyphrases
ATTR: document.keyphrases
ATTR: document.keyphrases_scored

pipe = [
    "keyphrases"
]
text = "A gunman who shot dead a uniformed officer outside the Biloxi police station remained " \
       "on the run on Monday, the subject of an intense manhunt along Mississippi's Gulf coast. " \
       "It was unclear what prompted the killing of Officer Robert McKeithen, a 23-year veteran " \
       "who was scheduled to retire this year. Biloxi's police chief, John Miller, said police did not know " \
       "if he was targeted, or the victim of a random act. The animal that did this is still on the run, " \
       "Miller told reporters. We're going to do everything within our power to bring him to justice " \
       "for Robert and his family. Authorities say the man approached McKeithen in the station's parking " \
       "lot on Sunday night and shot him multiple times, either before or after coming inside the station."

document = client.analyze(text, pipeline=pipe)

print("Keyphrases:\n")
for k in document.keyphrases:
    print(k)

print("\nKeyphrases Scored:\n")
for k in document.keyphrases_scored:
    print(k)

# Output:
# 
# 
# Keyphrases:
# 
# Biloxi police station
# Biloxi 's police chief
# Officer Robert McKeithen
# the station 's parking lot
# Mississippi 's Gulf coast
# Sunday night and shot
# him multiple times
# Monday
# John Miller
#
# Keyphrases Scored:
#
# ['Biloxi police station', 0.14236053468171103]
# ["Biloxi 's police chief", 0.12844081612661434]
# ['Officer Robert McKeithen', 0.12583178746051182]
# ["the station 's parking lot", 0.11744640267914488]
# ["Mississippi 's Gulf coast", 0.11721120752467706]
# ['Sunday night and shot', 0.11115943448261123]
# ['him multiple times', 0.09358212954494648]
# ['Monday', 0.08635075551783025]
# ['John Miller', 0.0776169319819529]
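The keyphrases in the example appear to be assembled from tokens, so possessives show up as "Biloxi 's police chief". If you display them to users, you may want to re-attach the clitic and keep only the higher-scoring phrases. This is a minimal post-processing sketch over the scored pairs shown above; clean_keyphrase, top_keyphrases and the 0.1 threshold are our own illustrative choices, and unpacking each item as (phrase, score) works whether the API returns lists or tuples.

```python
# Scored keyphrases copied from the example output above.
SCORED = [
    ["Biloxi police station", 0.14236053468171103],
    ["Biloxi 's police chief", 0.12844081612661434],
    ["Officer Robert McKeithen", 0.12583178746051182],
    ["the station 's parking lot", 0.11744640267914488],
    ["Mississippi 's Gulf coast", 0.11721120752467706],
    ["Sunday night and shot", 0.11115943448261123],
    ["him multiple times", 0.09358212954494648],
    ["Monday", 0.08635075551783025],
    ["John Miller", 0.0776169319819529],
]

def clean_keyphrase(phrase):
    """Naively re-attach a split-off possessive ("Biloxi 's" -> "Biloxi's")."""
    return phrase.replace(" 's", "'s")

def top_keyphrases(scored, min_score=0.0):
    """Return cleaned (phrase, score) pairs with score >= min_score, highest first."""
    kept = [(clean_keyphrase(phrase), score) for phrase, score in scored if score >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)

for phrase, score in top_keyphrases(SCORED, min_score=0.1):
    print(f"{score:.3f}  {phrase}")
```

With a live response, pass document.keyphrases_scored instead of SCORED.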

Wrap Up

In this tutorial we described the Codeq NLP API modules that can be used to summarize texts and extract relevant keyphrases. The code below summarizes the pipeline names used to call each annotator and the attributes where their output is stored:

from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="USER_ID", user_key="USER_KEY")

pipe = [
    "summarize", "compress", "summarize_compress", "keyphrases"
]

# `text` is the example text defined at the beginning of this tutorial
document = client.analyze(text, pipeline=pipe)

# Document Level

# Summary:
print("\nSummary:")
print(document.summary)

# Summary Compressed:
print("\nSummary Compressed:")
print(document.compressed_summary)

# Keyphrases
print("\nKeyphrases:")
for k in document.keyphrases:
    print(k)

# Keyphrases Scored
print("\nKeyphrases Scored:")
for k in document.keyphrases_scored:
    print(k)

# Sentence Level

# Compressed Sentences
print("\nCompressed Sentences:")
for sentence in document.sentences:
    raw_sentence = sentence.raw_sentence
    compressed_sentence = sentence.compressed_sentence
    if raw_sentence != compressed_sentence:
        print("original: %s" % raw_sentence)
        print("compressed: %s\n" % compressed_sentence)
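If you build pipelines dynamically, it can help to look up which attributes to read for a given list of keys. The sketch below simply encodes the KEY/ATTR pairs listed in this tutorial; the ANNOTATORS table and the read_outputs helper are our own illustration, not part of the SDK. Keys not covered here are skipped and returned separately, and the demo uses a SimpleNamespace stand-in instead of a real Document.

```python
from types import SimpleNamespace

# KEY -> (level, attributes), as listed in this tutorial (our own table, not part of the SDK).
ANNOTATORS = {
    "summarize": ("document", ["summary"]),
    "compress": ("sentence", ["compressed_sentence"]),
    "summarize_compress": ("document", ["compressed_summary"]),
    "keyphrases": ("document", ["keyphrases", "keyphrases_scored"]),
}

def read_outputs(document, pipeline):
    """Collect the outputs of the annotators in `pipeline` from a Document-like object.

    Returns ({"document": {...}, "sentence": {...}}, skipped_keys).
    """
    outputs = {"document": {}, "sentence": {}}
    skipped = []
    for key in pipeline:
        if key not in ANNOTATORS:
            skipped.append(key)  # not covered in this tutorial
            continue
        level, attrs = ANNOTATORS[key]
        for attr in attrs:
            if level == "document":
                outputs["document"][attr] = getattr(document, attr)
            else:
                # Sentence-level outputs: one value per sentence, in order.
                outputs["sentence"][attr] = [getattr(s, attr) for s in document.sentences]
    return outputs, skipped

# Offline demo with a stand-in object (hypothetical values, not a real SDK Document).
demo = SimpleNamespace(
    summary="S.",
    compressed_summary="C.",
    keyphrases=["k"],
    keyphrases_scored=[["k", 0.5]],
    sentences=[SimpleNamespace(compressed_sentence="c1"), SimpleNamespace(compressed_sentence="c2")],
)
print(read_outputs(demo, ["summarize", "compress"]))
```

With a live response, read_outputs(document, pipe) should return the same values that the wrap-up code prints.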

Take a look at our documentation to learn more about the NLP tools we provide.

Do you need inspiration? Go to our use case demos and see how you can integrate different tools.

In our NLP demos section you can also try our tools and find examples of the output of each module.
