Part 9. Semantic Similarity

By Rodrigo Alarcón, Computational Linguist

The complete list of modules we offer can be found in our documentation:

Codeq NLP API Documentation

Calling the Semantic Similarity endpoint

The endpoint that computes the semantic similarity between texts can also be called through the client of our Python SDK. As usual, you create an instance of this client by passing your API credentials as input parameters.


Instead of defining a pipeline with the names of some NLP Annotators, as we did in previous tutorials, in this case you call a different method of the client to get the similarity between two texts.

This method takes two strings as input and returns a dict containing the text_similarity_score.
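As a minimal sketch of how such a dict could be read, the helper below extracts the score and fails loudly if the field is missing. The helper name get_similarity_score and the example value are our own placeholders for illustration; only the text_similarity_score key comes from the description above.

```python
from typing import Any, Mapping


def get_similarity_score(response: Mapping[str, Any]) -> float:
    """Return the similarity score stored in a response dict.

    Hypothetical helper: only the 'text_similarity_score' key is taken
    from the tutorial; everything else is an illustrative assumption.
    """
    if "text_similarity_score" not in response:
        raise KeyError("response has no 'text_similarity_score' field")
    return float(response["text_similarity_score"])


# Placeholder dict standing in for a real response; the value is made up.
example_response = {"text_similarity_score": 4.0}
score = get_similarity_score(example_response)
print(f"Similarity score: {score}")
```

Converting to float keeps the rest of your code independent of whether the score arrives as an integer or a decimal number.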

The similarity score indicates the semantic relatedness between the input texts on a scale from 1 to 5, where 1 means the texts are unrelated and 5 means they are highly related:

  • 5 – The two sentences are completely equivalent, as they mean the same thing.
  • 4 – The two sentences are mostly equivalent, but some unimportant details differ.
  • 3 – The two sentences are roughly equivalent, but some important information differs, or is missing from one or the other.
  • 2 – The two sentences are not equivalent, but share some details or are on the same topic.
  • 1 – The two sentences are completely dissimilar.
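The scale above can be turned into a small lookup, which is handy when you want to show a readable label next to the number. This is only a sketch: the function name describe_score is hypothetical, and rounding a fractional score to the nearest level is our assumption, since the scale itself only describes the whole-number levels.

```python
import math

# Short labels paraphrasing the five levels listed above.
SCALE = {
    5: "completely equivalent",
    4: "mostly equivalent, unimportant details differ",
    3: "roughly equivalent, important information differs or is missing",
    2: "not equivalent, but share details or topic",
    1: "completely dissimilar",
}


def describe_score(score: float) -> str:
    """Map a score in the range 1 to 5 to the label of the nearest level."""
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    # Assumption: fractional scores are rounded half up to the nearest level.
    level = math.floor(score + 0.5)
    return SCALE[level]


print(describe_score(4))
print(describe_score(2.5))
```

We use math.floor(score + 0.5) instead of the built-in round because round sends halves to the nearest even number, which would map 2.5 down to level 2.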

Wrap up

In this tutorial we described how to use the Semantic Similarity endpoint of the Codeq NLP API and how to read the similarity score from its output.
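If you want to compare several pairs of texts, a loop along the following lines may help. It is a sketch under stated assumptions rather than the SDK's own code: score_pairs and stand_in_similarity are hypothetical names, and the stand-in function only imitates the shape of the output (a dict with a text_similarity_score) so the example runs offline. In practice you would pass a function that wraps the client's similarity method instead.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Any callable that takes two strings and returns a dict with the score.
SimilarityFn = Callable[[str, str], Dict[str, float]]


def score_pairs(
    pairs: Iterable[Tuple[str, str]], similarity_fn: SimilarityFn
) -> List[Tuple[str, str, float]]:
    """Call similarity_fn on every pair and collect (text1, text2, score)."""
    results = []
    for text1, text2 in pairs:
        response = similarity_fn(text1, text2)
        results.append((text1, text2, response["text_similarity_score"]))
    return results


def stand_in_similarity(text1: str, text2: str) -> Dict[str, float]:
    """Hypothetical offline stand-in, NOT the Codeq model.

    Returns 5.0 when the texts match exactly (ignoring case and
    surrounding whitespace) and 1.0 otherwise.
    """
    same = text1.strip().lower() == text2.strip().lower()
    return {"text_similarity_score": 5.0 if same else 1.0}


pairs = [
    ("A man is playing a guitar.", "A man is playing a guitar."),
    ("A man is playing a guitar.", "The stock market fell today."),
]

for text1, text2, score in score_pairs(pairs, stand_in_similarity):
    print(f"{score:.1f}  {text1!r} vs {text2!r}")
```

Passing the similarity function as a parameter keeps the loop independent of how the score is obtained, so the same code works with a real client call or with a test double.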

Take a look at our documentation to learn more about the NLP tools we provide.

Do you need inspiration? Go to our use case demos and see how you can integrate different tools.

In our NLP demos section you can also try our tools and find examples of the output of each module.