OpenNLP Finding Parts of Speech

How to detect parts of speech?

With OpenNLP you can detect the parts of speech of the words in a given sentence and print them. Instead of the full names of the parts of speech, OpenNLP uses short tags. The following table lists some of the tags produced by OpenNLP and their meanings.

Tag    Meaning
NN     Noun, singular or mass
DT     Determiner
VB     Verb, base form
VBD    Verb, past tense
VBZ    Verb, third person singular present
IN     Preposition or subordinating conjunction
NNP    Proper noun, singular
TO     to
JJ     Adjective
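
When reading tagger output it can help to translate a tag back into its meaning. The following is a minimal sketch in plain Java that stores the table above in a map. The class name PosTagMeanings and the describe() helper are hypothetical and are not part of OpenNLP.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PosTagMeanings {

    // Tag-to-meaning table copied from the tutorial text (illustrative helper, not an OpenNLP API)
    static final Map<String, String> MEANINGS = new LinkedHashMap<>();

    static {
        MEANINGS.put("NN", "Noun, singular or mass");
        MEANINGS.put("DT", "Determiner");
        MEANINGS.put("VB", "Verb, base form");
        MEANINGS.put("VBD", "Verb, past tense");
        MEANINGS.put("VBZ", "Verb, third person singular present");
        MEANINGS.put("IN", "Preposition or subordinating conjunction");
        MEANINGS.put("NNP", "Proper noun, singular");
        MEANINGS.put("TO", "to");
        MEANINGS.put("JJ", "Adjective");
    }

    // Returns the meaning of a tag, or a placeholder for tags not listed in the table
    static String describe(String tag) {
        return MEANINGS.getOrDefault(tag, "unknown tag");
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"DT", "NN", "VBZ"}) {
            System.out.println(tag + " = " + describe(tag));
        }
    }
}
```

The table covers only the tags listed here, so any other tag the tagger produces falls through to the placeholder.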

Tagging the Parts of Speech

To tag the parts of speech of a sentence, OpenNLP uses a model stored in a file named en-pos-maxent.bin. This is a predefined model that has been trained to tag the parts of speech of the given raw text.

The POSTaggerME class of the opennlp.tools.postag package applies this model to tag the parts of speech of the given raw text. To do so, you need to:

  • Load the en-pos-maxent.bin model using the POSModel class.
  • Instantiate the POSTaggerME class.
  • Tokenize the sentence.
  • Generate the tags using the tag() method.
  • Print the tokens and tags using the POSSample class.

The following steps show how to write a program that tags the parts of speech in the given raw text using the POSTaggerME class.

Step 1: Load the model

The model for POS tagging is represented by the class named POSModel, which belongs to the package opennlp.tools.postag.

To load the POS tagger model:

  • Create an InputStream object of the model (Instantiate the FileInputStream and pass the path of the model in String format to its constructor).
  • Instantiate the POSModel class and pass the InputStream (object) of the model as a parameter to its constructor.
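
The two bullets above can be sketched as follows. This is a minimal example, assuming the OpenNLP library is on the classpath and the model file is in the working directory. The class name LoadPosModel and the file path are illustrative assumptions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;

public class LoadPosModel {

    // Reads the binary model file and wraps it in a POSModel
    static POSModel loadModel(String modelPath) throws IOException {
        // try-with-resources closes the stream once the model has been read
        try (InputStream inputStream = new FileInputStream(modelPath)) {
            return new POSModel(inputStream);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the model file; adjust it to where you saved the model
        POSModel model = loadModel("en-pos-maxent.bin");
        System.out.println("POS model loaded: " + (model != null));
    }
}
```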

Step 2: Instantiating the POSTaggerME class

The POSTaggerME class of the package opennlp.tools.postag is used to predict the parts of speech of the given raw text. It uses a maximum entropy model to make its decisions.

Instantiate this class and pass the model object created in the previous step to its constructor.
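
A minimal sketch of this step, under the same assumptions as before (OpenNLP on the classpath, model file in the working directory). The class name CreatePosTagger and the createTagger() helper are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

public class CreatePosTagger {

    // Loads the model from disk and builds a tagger on top of it
    static POSTaggerME createTagger(String modelPath) throws IOException {
        try (InputStream inputStream = new FileInputStream(modelPath)) {
            POSModel model = new POSModel(inputStream);
            // The tagger takes the loaded model as its constructor argument
            return new POSTaggerME(model);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed model path; adjust as needed
        POSTaggerME tagger = createTagger("en-pos-maxent.bin");
        System.out.println("Created tagger of type " + tagger.getClass().getSimpleName());
    }
}
```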

Step 3: Tokenizing the sentence

The tokenize() method of the WhitespaceTokenizer class tokenizes the raw text passed to it by splitting it on whitespace. This method accepts a String as a parameter and returns an array of Strings (tokens).

Obtain the shared WhitespaceTokenizer object through its INSTANCE field (the class has no public constructor) and invoke this method by passing the sentence to it as a String.
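
A minimal sketch of the tokenizing step. It needs only the OpenNLP library, not the model file. The class name TokenizeSentence and the sample sentence are illustrative assumptions.

```java
import opennlp.tools.tokenize.WhitespaceTokenizer;

public class TokenizeSentence {

    // Splits the sentence into tokens wherever whitespace occurs
    static String[] tokenize(String sentence) {
        return WhitespaceTokenizer.INSTANCE.tokenize(sentence);
    }

    public static void main(String[] args) {
        // Sample sentence chosen for illustration
        String sentence = "Hi welcome to OpenNLP";
        for (String token : tokenize(sentence)) {
            System.out.println(token);
        }
    }
}
```

Because this tokenizer splits only on whitespace, punctuation attached to a word stays part of that token.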

Step 4: Generating the tags

The tag() method of the POSTaggerME class assigns POS tags to a sentence of tokens. This method accepts an array of tokens (String) as a parameter and returns an array of tags (String), one for each token.

Invoke the tag() method by passing the tokens created in the previous step to it.
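
A minimal sketch of the tagging step, assuming OpenNLP is on the classpath and en-pos-maxent.bin is in the working directory. The class name GenerateTags and the sample sentence are illustrative assumptions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;

public class GenerateTags {

    public static void main(String[] args) throws IOException {
        // Assumed model path; adjust as needed
        try (InputStream inputStream = new FileInputStream("en-pos-maxent.bin")) {
            POSTaggerME tagger = new POSTaggerME(new POSModel(inputStream));

            // Tokenize first, because tag() expects an array of tokens
            String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize("Hi welcome to OpenNLP");

            // tags[i] is the tag assigned to tokens[i]
            String[] tags = tagger.tag(tokens);

            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + " -> " + tags[i]);
            }
        }
    }
}
```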

Step 5: Printing the tokens and the tags

The POSSample class represents a POS-tagged sentence. To instantiate this class, you need an array of tokens (of the text) and an array of tags.

The toString() method of this class returns the tagged sentence. Instantiate this class by passing the token and tag arrays created in the previous steps and invoke its toString() method.
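
A minimal sketch of this step. To keep it independent of the model file, the tags here are written by hand as a stand-in for the output of tag(); they are not real tagger output. The class name PrintTaggedSentence and the format() helper are hypothetical.

```java
import opennlp.tools.postag.POSSample;

public class PrintTaggedSentence {

    // Pairs each token with its tag and renders the sentence as text
    static String format(String[] tokens, String[] tags) {
        // POSSample expects both arrays to have the same length
        POSSample sample = new POSSample(tokens, tags);
        return sample.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"Hi", "welcome"};
        // Hand-written tags for illustration only
        String[] tags = {"NNP", "JJ"};
        System.out.println(format(tokens, tags));
    }
}
```

In a real program the second array would come from the tag() call of the previous step.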

Example

The following program tags the parts of speech in a given raw text. Save this program in a file with the name PosTaggerExample.java.
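
The program listing is missing from this page, so what follows is a reconstruction that combines the five steps above, not the original code. It assumes OpenNLP is on the classpath and en-pos-maxent.bin is in the working directory. The sample sentence is an illustrative assumption.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;

public class PosTaggerExample {

    public static void main(String[] args) throws IOException {
        // Step 1: load the model (assumed path; adjust as needed)
        try (InputStream inputStream = new FileInputStream("en-pos-maxent.bin")) {
            POSModel model = new POSModel(inputStream);

            // Step 2: instantiate the tagger
            POSTaggerME tagger = new POSTaggerME(model);

            // Step 3: tokenize the sentence
            String sentence = "Hi welcome to OpenNLP";
            String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(sentence);

            // Step 4: generate the tags
            String[] tags = tagger.tag(tokens);

            // Step 5: print the tokens together with their tags
            POSSample sample = new POSSample(tokens, tags);
            System.out.println(sample.toString());
        }
    }
}
```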

Compile and execute the saved Java file from the command prompt, with the OpenNLP library on the classpath: run javac PosTaggerExample.java, then java PosTaggerExample.

On executing, the program reads the given text, detects the parts of speech of its tokens, and displays them, with each token followed by an underscore and its tag.

POS Tagger Performance

The following program tags the parts of speech of a given raw text. It also monitors the performance of the tagger and displays it. Save this program in a file with the name PosTagger_Performance.java.
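
The listing is missing from this page. The sketch below is one way to write such a program, assuming the PerformanceMonitor class from the opennlp.tools.cmdline package is used for the measurement. The unit label "sent" and the sample sentence are assumptions, and where the statistics are printed may depend on the OpenNLP version.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.cmdline.PerformanceMonitor;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;

public class PosTagger_Performance {

    public static void main(String[] args) throws IOException {
        // Assumed model path; adjust as needed
        try (InputStream inputStream = new FileInputStream("en-pos-maxent.bin")) {
            POSTaggerME tagger = new POSTaggerME(new POSModel(inputStream));

            String sentence = "Hi welcome to OpenNLP";
            String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(sentence);

            // Count processed units in sentences ("sent" is an assumed label)
            PerformanceMonitor monitor = new PerformanceMonitor("sent");
            monitor.start();

            String[] tags = tagger.tag(tokens);

            // One sentence has been tagged
            monitor.incrementCounter();

            System.out.println(new POSSample(tokens, tags));

            // Stops the monitor and reports throughput and elapsed time
            monitor.stopAndPrintFinalResult();
        }
    }
}
```

With a single short sentence the measured throughput is not meaningful; tagging many sentences in a loop and incrementing the counter once per sentence gives a more useful figure.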

Compile and execute the saved Java file from the command prompt, with the OpenNLP library on the classpath: run javac PosTagger_Performance.java, then java PosTagger_Performance.

On executing, the program reads the given text, tags the parts of speech of its tokens, and displays them. It also monitors the performance of the POS tagger and displays the result.

POS Tagger Probability

The probs() method of the POSTaggerME class returns the probabilities of the tags assigned to the most recently tagged sentence.

The following program displays the probability of each tag of the last tagged sentence. Save this program in a file with the name PosTaggerProbs.java.
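
The listing is missing from this page. The following is a minimal sketch of such a program, assuming OpenNLP is on the classpath and en-pos-maxent.bin is in the working directory. The sample sentence and the output layout are illustrative assumptions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;

public class PosTaggerProbs {

    public static void main(String[] args) throws IOException {
        // Assumed model path; adjust as needed
        try (InputStream inputStream = new FileInputStream("en-pos-maxent.bin")) {
            POSTaggerME tagger = new POSTaggerME(new POSModel(inputStream));

            String sentence = "Hi welcome to OpenNLP";
            String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(sentence);

            // probs() refers to the most recent tag() call, so tag first
            String[] tags = tagger.tag(tokens);
            double[] probs = tagger.probs();

            // probs[i] is the probability of tags[i] for tokens[i]
            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + "_" + tags[i] + " : " + probs[i]);
            }
        }
    }
}
```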

Compile and execute the saved Java file from the command prompt, with the OpenNLP library on the classpath: run javac PosTaggerProbs.java, then java PosTaggerProbs.

On executing, the program reads the given raw text, tags the parts of speech of each token in it, and displays them. It also displays the probability of the tag assigned to each token in the given sentence.

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd