Chunking a sentence means dividing it into groups of words that belong together, such as noun groups and verb groups.
OpenNLP uses a model, a file named en-chunker.bin, to chunk sentences. This is a predefined model that OpenNLP provides for chunking the sentences in the given raw text.
The opennlp.tools.chunker package includes various classes and interfaces that are used to find non-recursive syntactic annotations such as noun phrase chunks.
To chunk a sentence, use the chunk() method of the ChunkerME class. This method accepts the tokens of a sentence and their POS tags as parameters. So before you start chunking, you need to tokenize the sentence and generate the POS tags of its tokens.
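To chunk a sentence using the OpenNLP library, you need to tokenize the sentence, generate POS tags for the tokens, load the en-chunker.bin model using the ChunkerModel class, instantiate the ChunkerME class, and chunk the sentence using the chunk() method of that class.
The steps below describe how to write a program that chunks the sentences in the given raw text.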
In the first step, tokenize the sentence with the tokenize() method of the WhitespaceTokenizer class. This method returns the tokens as a String array.
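To make the shape of the token array concrete, here is a minimal plain-Java sketch of whitespace tokenization. It is an approximation built on String.split and does not use OpenNLP; in a real program you would call WhitespaceTokenizer.INSTANCE.tokenize(sentence) instead. The class name WhitespaceSplitSketch and the sample sentence are assumptions made for illustration only.

```java
import java.util.Arrays;

// Illustrative stand-in for whitespace tokenization (not the OpenNLP class).
class WhitespaceSplitSketch {

    // Splits a sentence on runs of whitespace and returns the tokens in order.
    static String[] tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // nothing to tokenize
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical sample sentence; punctuation is a separate token here
        // only because it is surrounded by whitespace in the input.
        String sentence = "The quick fox jumped over the fence .";
        System.out.println(Arrays.toString(tokenize(sentence)));
    }
}
```

Note that splitting on whitespace alone leaves punctuation attached to the neighbouring word unless the input separates it with a space.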
Next, generate the POS tags of the sentence by passing the token array to the tag() method of the POSTaggerME class. This method returns one tag per token as a String array.
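The model for chunking a sentence is represented by the class named ChunkerModel, which belongs to the package opennlp.tools.chunker.
To load a chunker model, create an InputStream for the model file (for example, a FileInputStream constructed with the path to en-chunker.bin) and pass it to the constructor of the ChunkerModel class.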
The ChunkerME class of the package opennlp.tools.chunker includes methods to chunk the sentences. It is a maximum-entropy-based chunker.
Instantiate this class and pass the model object created in the previous step to its constructor.
The chunk() method of the ChunkerME class is used to chunk the sentence passed to it. This method accepts two String arrays, representing the tokens and their POS tags, as parameters and returns a String array with one chunk tag per token.
Invoke this method, passing the token array and the tag array created in the earlier steps as parameters.
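The array returned by chunk() is easier to read once the tokens are grouped into phrases. The following is a minimal sketch in plain Java, assuming the chunk tags follow the usual B-/I-/O convention (B-NP begins a noun phrase, I-NP continues it, O marks a token outside any chunk). The class name ChunkGrouper and the token and tag arrays are hand-written for illustration; they are not output produced by OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: groups tokens into bracketed phrases from BIO chunk tags.
class ChunkGrouper {

    // "B-X" opens a phrase of type X and "I-X" extends the currently open phrase.
    // Any other tag (or an "I-" tag with no open phrase) closes the open phrase
    // and leaves the token on its own.
    static List<String> group(String[] tokens, String[] chunkTags) {
        if (tokens.length != chunkTags.length) {
            throw new IllegalArgumentException("tokens and chunkTags must have the same length");
        }
        List<String> phrases = new ArrayList<>();
        StringBuilder current = null; // phrase being built, or null if none is open
        for (int i = 0; i < tokens.length; i++) {
            String tag = chunkTags[i];
            if (tag.startsWith("B-")) {
                if (current != null) {
                    phrases.add(current.append(']').toString());
                }
                current = new StringBuilder("[" + tag.substring(2) + " " + tokens[i]);
            } else if (tag.startsWith("I-") && current != null) {
                current.append(' ').append(tokens[i]);
            } else {
                if (current != null) {
                    phrases.add(current.append(']').toString());
                    current = null;
                }
                phrases.add(tokens[i]);
            }
        }
        if (current != null) {
            phrases.add(current.append(']').toString());
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Hypothetical tokens and chunk tags, written by hand for this sketch.
        String[] tokens = {"The", "quick", "fox", "jumped", "over", "the", "fence", "."};
        String[] chunkTags = {"B-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP", "O"};
        System.out.println(String.join(" ", group(tokens, chunkTags)));
    }
}
```

In a real program the tokens and chunkTags arrays would come from the tokenizer and from chunk() respectively.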
The complete program chunks the sentences in the given raw text. Save this program in a file with the name ChunkerExample.java.
Compile and execute the saved Java file from the command prompt.
On execution, the program reads the given String, chunks the sentences in it, and displays the result.
We can also detect the positions, or spans, of the chunks using the chunkAsSpans() method, which returns an array of objects of the type Span. The class named Span of the opennlp.tools.util package is used to store the start and end indices of a span.
Store the spans returned by the chunkAsSpans() method in a Span array and print them.
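To show what such spans represent, here is a minimal plain-Java sketch that derives token-index spans from a sequence of chunk tags. It assumes B-/I-/O style tags and half-open ranges, where start is the index of the first token in the chunk and end is one past the last token. TokenSpan is a hypothetical stand-in written for this sketch, not the OpenNLP Span class, and the tag array is hand-written rather than real chunker output.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative conversion from per-token chunk tags to [start, end) token spans.
class ChunkSpanSketch {

    // Hypothetical stand-in for a span: token indices [start, end) plus a chunk type.
    static final class TokenSpan {
        final int start;
        final int end;
        final String type;

        TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    // A chunk continues while the tag is "I-" followed by the open chunk's type.
    // Any other tag closes the open chunk; a "B-" or stray "I-" tag opens a new one.
    static List<TokenSpan> toSpans(String[] chunkTags) {
        List<TokenSpan> spans = new ArrayList<>();
        int start = -1;     // index of the first token of the open chunk, or -1
        String type = null; // type of the open chunk, or null
        for (int i = 0; i < chunkTags.length; i++) {
            String tag = chunkTags[i];
            boolean continues = start >= 0 && tag.equals("I-" + type);
            if (!continues) {
                if (start >= 0) {
                    spans.add(new TokenSpan(start, i, type));
                    start = -1;
                    type = null;
                }
                if (tag.startsWith("B-") || tag.startsWith("I-")) {
                    start = i;
                    type = tag.substring(2);
                }
            }
        }
        if (start >= 0) {
            spans.add(new TokenSpan(start, chunkTags.length, type));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Hypothetical chunk tags for an eight-token sentence.
        String[] chunkTags = {"B-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP", "O"};
        for (TokenSpan span : toSpans(chunkTags)) {
            System.out.println(span);
        }
    }
}
```

Because the end index is exclusive, a span covering tokens 0 to 2 is written as [0..3), and the number of tokens in a span is simply end minus start.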
The complete program detects the spans of the chunks in the given raw text. Save this program in a file with the name ChunkerSpansEample.java.
Compile and execute the saved Java file from the command prompt.
On execution, the program reads the given String, detects the spans of the chunks in it, and prints them.
The probs() method of the ChunkerME class returns the probabilities of the last decoded sequence, that is, of the chunk tags produced by the most recent call to chunk().
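One way to use these probabilities is to find the token whose chunk tag the model was least sure about. The following is a minimal plain-Java sketch of that idea, assuming the probability array holds one value per token in the same order as the tokens. The class name ChunkProbsSketch and all sample values are made up for illustration; they are not produced by OpenNLP.

```java
import java.util.Locale;

// Illustrative use of a per-token probability array such as the one probs() returns.
class ChunkProbsSketch {

    // Returns the index of the lowest probability, or -1 for an empty array.
    static int leastConfident(double[] probs) {
        int index = -1;
        for (int i = 0; i < probs.length; i++) {
            if (index < 0 || probs[i] < probs[index]) {
                index = i;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical tokens, chunk tags and probabilities, written by hand.
        String[] tokens = {"The", "quick", "fox", "jumped"};
        String[] chunkTags = {"B-NP", "I-NP", "I-NP", "B-VP"};
        double[] probs = {0.98, 0.91, 0.67, 0.99};

        for (int i = 0; i < tokens.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%-8s %-5s %.2f",
                    tokens[i], chunkTags[i], probs[i]));
        }
        int weakest = leastConfident(probs);
        System.out.println("Least confident token: " + tokens[weakest]);
    }
}
```

In a real program the probs array would be obtained by calling probs() on the chunker right after chunk().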
The complete program prints the probabilities of the last sequence decoded by the chunker. Save this program in a file with the name ChunkerProbsExample.java.
Compile and execute the saved Java file from the command prompt.
On execution, the program reads the given String, chunks it, and prints the probabilities of the last decoded sequence.
All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd