OpenNLP Chunking Sentences

What is meant by chunking sentences?

Chunking a sentence means dividing it into syntactically related groups of words, such as noun groups and verb groups.

Chunking a Sentence using OpenNLP

OpenNLP uses a model, a file named en-chunker.bin, to chunk sentences. This is a predefined model for chunking the sentences of the given raw text.

The opennlp.tools.chunker package contains classes and interfaces for finding non-recursive syntactic annotations such as noun phrase chunks.

To chunk a sentence, use the chunk() method of the ChunkerME class. This method takes the tokens of a sentence and their POS tags as parameters, so before chunking you need to tokenize the sentence and generate POS tags for its tokens.

To chunk a sentence using OpenNLP library, you need to −

  • Tokenize the sentence.
  • Generate POS tags for it.
  • Load the en-chunker.bin model using the ChunkerModel class.
  • Instantiate the ChunkerME class.
  • Chunk the sentence using the chunk() method of this class.

The following steps show how to write a program that chunks the sentences in the given raw text.

Step 1: Tokenizing the sentence

In the first step, tokenize the sentence using the tokenize() method of the WhitespaceTokenizer class.

Step 2: Generating the POS tags

Generate the POS tags of the sentence by passing its tokens to the tag() method of the POSTaggerME class.

Step 3: Loading the model

The model for chunking a sentence is represented by the class named ChunkerModel, which belongs to the package opennlp.tools.chunker.

To load the chunker model −

  • Create an InputStream object for the model (instantiate FileInputStream and pass the path of the model file, as a String, to its constructor).
  • Instantiate the ChunkerModel class and pass the InputStream object of the model as a parameter to its constructor.

Step 4: Instantiating the ChunkerME class

The ChunkerME class of the package opennlp.tools.chunker provides methods for chunking sentences. It is a maximum-entropy-based chunker.

Instantiate this class, passing the model object created in the previous step to its constructor.

Step 5: Chunking the sentence

The chunk() method of the ChunkerME class chunks the sentence passed to it. It takes two String arrays as parameters, the tokens and their POS tags, and returns a String array holding one chunk tag per token.

Invoke this method with the token array and the tag array created in the earlier steps.
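The tags returned by chunk() are easier to read once they are grouped back into phrases. The following is a minimal sketch of one way to do that, assuming the tags follow the usual B-/I-/O scheme (for example B-NP opens a noun phrase, I-NP continues it, O marks a token outside any chunk). The class name ChunkGrouper and the sample tags are made up for illustration; the tags are hand-written, not real output of en-chunker.bin.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkGrouper {

    /** Groups tokens into bracketed phrases based on one chunk tag per token. */
    static List<String> group(String[] tokens, String[] chunkTags) {
        if (tokens.length != chunkTags.length) {
            throw new IllegalArgumentException("tokens and chunkTags must have the same length");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = null; // chunk being built, or null when none is open
        for (int i = 0; i < tokens.length; i++) {
            String tag = chunkTags[i];
            if (tag.startsWith("B-")) {
                // A B- tag always opens a new chunk; close the open one first.
                if (current != null) {
                    chunks.add(current.append(']').toString());
                }
                current = new StringBuilder("[").append(tag.substring(2))
                        .append(' ').append(tokens[i]);
            } else if (tag.startsWith("I-") && current != null) {
                // Simplification: the I- type is not checked against the open chunk.
                current.append(' ').append(tokens[i]);
            } else {
                // "O", or an I- tag with no open chunk: the token stands alone.
                if (current != null) {
                    chunks.add(current.append(']').toString());
                    current = null;
                }
                chunks.add(tokens[i]);
            }
        }
        if (current != null) {
            chunks.add(current.append(']').toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Hand-written sample data, not real chunker output.
        String[] tokens = {"The", "quick", "fox", "jumped", "over", "the", "fence"};
        String[] tags = {"B-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP"};
        // prints: [NP The quick fox] [VP jumped] [PP over] [NP the fence]
        System.out.println(String.join(" ", group(tokens, tags)));
    }
}
```

In a real program, the tokens and chunkTags arguments would be the token array from Step 1 and the array returned by chunk().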

Example

A program that combines these steps chunks the sentence in the given raw text. Save it in a file named ChunkerExample.java.
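A minimal sketch of such a program might look as follows. It assumes the OpenNLP library is on the classpath and that both model files are available locally. The paths under models/, the POS model file name en-pos-maxent.bin, the sample sentence, and the row() helper are assumptions made for this sketch, not part of the steps above.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;

public class ChunkerExample {

    /** Formats one output line: token, POS tag, and chunk tag, tab-separated. */
    static String row(String token, String posTag, String chunkTag) {
        return token + "\t" + posTag + "\t" + chunkTag;
    }

    public static void main(String[] args) throws IOException {
        // Assumed model locations; adjust them to your own setup.
        String posModelPath = "models/en-pos-maxent.bin";
        String chunkerModelPath = "models/en-chunker.bin";

        String sentence = "Hi welcome to the OpenNLP tutorial"; // sample input

        // Step 1: tokenize the sentence on whitespace.
        String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(sentence);

        // Step 2: generate one POS tag per token.
        String[] posTags;
        try (InputStream in = new FileInputStream(posModelPath)) {
            POSTaggerME tagger = new POSTaggerME(new POSModel(in));
            posTags = tagger.tag(tokens);
        }

        // Steps 3 and 4: load the chunker model and instantiate ChunkerME.
        ChunkerME chunker;
        try (InputStream in = new FileInputStream(chunkerModelPath)) {
            chunker = new ChunkerME(new ChunkerModel(in));
        }

        // Step 5: chunk the sentence; the result holds one chunk tag per token.
        String[] chunkTags = chunker.chunk(tokens, posTags);

        for (int i = 0; i < tokens.length; i++) {
            System.out.println(row(tokens[i], posTags[i], chunkTags[i]));
        }
    }
}
```

With both model files in place, the program prints one line per token containing the token, its POS tag, and its chunk tag.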

Compile and execute the saved Java file from the command prompt, with the OpenNLP library on the classpath, using javac ChunkerExample.java followed by java ChunkerExample.

When executed, the program reads the given String, chunks the sentence in it, and displays the resulting chunk tags.

Detecting the Positions of the Chunks

We can also detect the positions, or spans, of the chunks using the chunkAsSpans() method, which returns an array of objects of the type Span. The class named Span of the opennlp.tools.util package stores the start and end offsets of a chunk as integers.

Store the spans returned by the chunkAsSpans() method in a Span array and print them.
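To make the span semantics concrete, here is a small library-free sketch. For chunks, the start and end of a Span are token indices, with the start inclusive and the end exclusive. The ChunkSpan class below is a hypothetical stand-in for opennlp.tools.util.Span (whose corresponding accessors are getStart(), getEnd(), and getType()), and the spans are hand-written for illustration rather than produced by a model.

```java
import java.util.Arrays;

final class SpanDemo {

    /** Hypothetical stand-in for opennlp.tools.util.Span, so the sketch needs no library. */
    static final class ChunkSpan {
        final int start;   // index of the first token in the chunk (inclusive)
        final int end;     // index one past the last token in the chunk (exclusive)
        final String type; // chunk type such as "NP" or "VP"

        ChunkSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Returns the tokens covered by the span, joined by single spaces. */
    static String coveredText(ChunkSpan span, String[] tokens) {
        return String.join(" ", Arrays.copyOfRange(tokens, span.start, span.end));
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "quick", "fox", "jumped", "over", "the", "fence"};
        // Hand-written spans for illustration, not real chunker output.
        ChunkSpan[] spans = {
            new ChunkSpan(0, 3, "NP"), new ChunkSpan(3, 4, "VP"),
            new ChunkSpan(4, 5, "PP"), new ChunkSpan(5, 7, "NP")
        };
        for (ChunkSpan span : spans) {
            System.out.println(span.type + ": " + coveredText(span, tokens));
        }
    }
}
```

With real chunker output, the same idea applies to each Span in the array returned by chunkAsSpans(tokens, tags).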

Example

A program that detects the spans of the chunks in the given raw text follows the same steps as ChunkerExample.java but calls chunkAsSpans() in place of chunk(). Save it in a file named ChunkerSpansExample.java.

Compile and execute the saved Java file from the command prompt using javac ChunkerSpansExample.java followed by java ChunkerSpansExample.

When executed, the program reads the given String and prints the spans of the chunks in it.

Chunker Probability Detection

The probs() method of the ChunkerME class returns the probabilities of the last decoded sequence, that is, of the tags produced by the most recent call to chunk().

A program that prints these probabilities follows the same steps as ChunkerExample.java: it calls chunk() and then prints the array returned by probs(). Save it in a file named ChunkerProbsExample.java.

Compile and execute the saved Java file from the command prompt using javac ChunkerProbsExample.java followed by java ChunkerProbsExample.

When executed, the program reads the given String, chunks it, and prints the probabilities of the last decoded sequence.
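One way the probabilities might be put to use is to flag tokens whose chunk tag the model was unsure about. The sketch below assumes the array returned by probs() holds one probability per token, in token order. The class name ChunkConfidence, the 0.6 threshold, and all sample values are invented for illustration and are not real model output.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkConfidence {

    /** Returns "token/tag" for every token whose tag probability is below the threshold. */
    static List<String> belowThreshold(String[] tokens, String[] chunkTags,
                                       double[] probs, double threshold) {
        if (tokens.length != chunkTags.length || tokens.length != probs.length) {
            throw new IllegalArgumentException("all three arrays must have the same length");
        }
        List<String> uncertain = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (probs[i] < threshold) {
                uncertain.add(tokens[i] + "/" + chunkTags[i]);
            }
        }
        return uncertain;
    }

    public static void main(String[] args) {
        // Invented sample values; in a real program these would come from
        // the tokenizer, chunk(), and probs() respectively.
        String[] tokens = {"He", "left", "."};
        String[] chunkTags = {"B-NP", "B-VP", "O"};
        double[] probs = {0.97, 0.58, 0.99};
        System.out.println(belowThreshold(tokens, chunkTags, probs, 0.6));
    }
}
```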

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd