OpenNLP Named Entity Recognition

What is Named Entity Recognition?

Named Entity Recognition (NER) is the process of finding the names of people, places, organizations, and other entities in a given text. This section explains how to carry out NER in a Java program using the OpenNLP library.

Named Entity Recognition using OpenNLP

OpenNLP uses various predefined models, namely en-ner-date.bin, en-ner-location.bin, en-ner-organization.bin, en-ner-person.bin, and en-ner-time.bin, to perform various NER tasks. Each of these files is a pre-trained model designed to detect the respective entity type (dates, locations, organizations, persons, or times) in raw text.

The opennlp.tools.namefind package includes the classes and interfaces used to perform NER. To perform NER with the OpenNLP library, you need to:

  • Load the respective model using the TokenNameFinderModel class.
  • Instantiate the NameFinderME class.
  • Find the names and print them.

The following steps describe how to write a program that detects named entities in a given raw text.

Step 1: Loading the model

The NER model is represented by the class TokenNameFinderModel, which belongs to the package opennlp.tools.namefind.

To load an NER model −

  • Create an InputStream object for the model (instantiate FileInputStream and pass the path of the appropriate NER model, as a String, to its constructor).
  • Instantiate the TokenNameFinderModel class and pass the InputStream object of the model as a parameter to its constructor.
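A minimal sketch of this step follows. It assumes the person model en-ner-person.bin has been downloaded to the working directory (adjust the path to your setup); the class name LoadNerModel and its loadModel() helper are hypothetical names chosen for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.namefind.TokenNameFinderModel;

public class LoadNerModel {

    // Reads a pre-trained NER model from disk.
    // try-with-resources closes the stream once the model has been read.
    public static TokenNameFinderModel loadModel(String modelPath) throws IOException {
        try (InputStream modelIn = new FileInputStream(modelPath)) {
            return new TokenNameFinderModel(modelIn);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the person model; change it to match your system.
        TokenNameFinderModel model = loadModel("en-ner-person.bin");
        System.out.println("Model language: " + model.getLanguage());
    }
}
```

Closing the stream in a try-with-resources block is a design choice: the model is fully read by the constructor, so the file handle is not needed afterwards.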

Step 2: Instantiating the NameFinderME class

The NameFinderME class of the package opennlp.tools.namefind provides the methods used to perform NER. This class applies a Maximum Entropy model to detect the named entities in the given raw text.

Instantiate this class and pass the model object created in the previous step to its constructor.
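The sketch below combines loading the model with creating the NameFinderME instance. The class name CreateNameFinder, its createFinder() helper, and the model path are assumptions made for illustration. The OpenNLP manual notes that NameFinderME is not thread-safe, while several NameFinderME instances may share one model.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;

public class CreateNameFinder {

    // Loads the model and wraps it in a NameFinderME instance.
    // NameFinderME is not thread-safe: use one instance per thread,
    // sharing the same TokenNameFinderModel between them if needed.
    public static NameFinderME createFinder(String modelPath) throws IOException {
        try (InputStream modelIn = new FileInputStream(modelPath)) {
            TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
            return new NameFinderME(model);
        }
    }

    public static void main(String[] args) throws IOException {
        NameFinderME nameFinder = createFinder("en-ner-person.bin"); // assumed path
        System.out.println("Created " + nameFinder.getClass().getSimpleName());
    }
}
```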

Step 3: Finding the names in the sentence

The find() method of the NameFinderME class finds the names in the text passed to it. This method accepts the sentence as an array of tokens (String[]), not as a single String.

You can invoke this method by passing the tokenized form of the sentence, for example one produced by a tokenizer from the opennlp.tools.tokenize package.
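A minimal sketch of tokenizing a sentence and passing the tokens to find(). It uses OpenNLP's rule-based SimpleTokenizer, which needs no model file; that tokenizer choice, the sample sentence, the model path, and the class name FindNames are assumptions for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.util.Span;

public class FindNames {

    // find() works on tokens, so the raw sentence is split first.
    // SimpleTokenizer splits on whitespace and on changes of character class.
    public static String[] tokenize(String sentence) {
        return SimpleTokenizer.INSTANCE.tokenize(sentence);
    }

    public static void main(String[] args) throws IOException {
        String sentence = "Mike Smith works with Rama at the head office."; // sample text
        String[] tokens = tokenize(sentence);

        try (InputStream modelIn = new FileInputStream("en-ner-person.bin")) { // assumed path
            NameFinderME nameFinder = new NameFinderME(new TokenNameFinderModel(modelIn));
            Span[] nameSpans = nameFinder.find(tokens);
            System.out.println("Found " + nameSpans.length + " name span(s)");
        }
    }
}
```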

Step 4: Printing the spans of the names in the sentence

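The find() method of the NameFinderME class returns an array of objects of the type Span. The class named Span of the opennlp.tools.util package stores the start and end offsets of a span. For names found by NameFinderME, these offsets are token indices (the start is inclusive and the end is exclusive), and getType() returns the entity type.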

You can store the spans returned by the find() method in a Span array and print them.
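As an illustration of reading the returned spans, the sketch below builds one readable line per span. To keep it independent of a model file, it uses a hand-made Span in place of real find() output; the class name PrintSpans and the describe() helper are hypothetical.

```java
import java.util.Arrays;

import opennlp.tools.util.Span;

public class PrintSpans {

    // Builds a readable line for a span returned by find().
    // Span start is inclusive and end is exclusive; both are token indices.
    public static String describe(Span span, String[] tokens) {
        String text = String.join(" ",
                Arrays.copyOfRange(tokens, span.getStart(), span.getEnd()));
        return span.getType() + " [" + span.getStart() + ".." + span.getEnd() + "): " + text;
    }

    public static void main(String[] args) {
        // Hand-made tokens and span standing in for the output of NameFinderME.find().
        String[] tokens = {"Mike", "Smith", "is", "a", "good", "person"};
        Span[] nameSpans = {new Span(0, 2, "person")};

        for (Span span : nameSpans) {
            System.out.println(describe(span, tokens)); // prints "person [0..2): Mike Smith"
        }
    }
}
```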

NER Example

The following program reads the given sentence and recognizes the spans of the person names in it. Save this program in a file named NameFinderME_Example.java.
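A sketch of such a program, assuming en-ner-person.bin is in the working directory and the sentence is supplied already tokenized. The findNames() helper is a hypothetical name; splitting the logic out of main() is only a design choice that makes the search reusable.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

public class NameFinderME_Example {

    // Loads the person model and returns the name spans found in the tokens.
    public static Span[] findNames(String modelPath, String[] tokens) throws IOException {
        try (InputStream modelIn = new FileInputStream(modelPath)) {
            TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
            NameFinderME nameFinder = new NameFinderME(model);
            return nameFinder.find(tokens);
        }
    }

    public static void main(String[] args) throws IOException {
        // The sentence is already split into tokens, as find() expects.
        String[] tokens = {"Mike", "Smith", "is", "a", "good", "person"};

        // Assumed path to the pre-trained person model.
        Span[] nameSpans = findNames("en-ner-person.bin", tokens);

        // Span.toString() prints the token range followed by the entity type.
        for (Span span : nameSpans) {
            System.out.println(span);
        }
    }
}
```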

Compile and run the saved Java file from the command prompt with javac NameFinderME_Example.java followed by java NameFinderME_Example (the OpenNLP jar must be on the classpath).

When you execute the program, it reads the given sentence, detects the names of the persons in it, and displays their positions (spans).

Names along with their Positions

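The substring() method of the String class takes begin and end offsets and returns the corresponding part of the string. Because the spans returned by find() are token indices, they first have to be mapped to character offsets, which a tokenizer's tokenizePos() method provides. You can then use substring() to print the names together with their spans (positions).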
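The mapping from token indices to character offsets can be sketched as follows. The class name SpanToText and the coveredText() helper are hypothetical, SimpleTokenizer is an assumed tokenizer choice, and a hand-made name span stands in for real find() output so no model file is needed.

```java
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.util.Span;

public class SpanToText {

    // Converts a name span (token indices) into the matching slice of the
    // original sentence. tokenPositions holds the character offsets of each token.
    public static String coveredText(String sentence, Span[] tokenPositions, Span nameSpan) {
        int begin = tokenPositions[nameSpan.getStart()].getStart();  // first char of first token
        int end = tokenPositions[nameSpan.getEnd() - 1].getEnd();    // end of last token (exclusive)
        return sentence.substring(begin, end);
    }

    public static void main(String[] args) {
        String sentence = "Mike Smith works at the head office.";
        // tokenizePos() returns character-offset spans instead of token strings.
        Span[] tokenPositions = SimpleTokenizer.INSTANCE.tokenizePos(sentence);

        // Hand-made name span standing in for a NameFinderME.find() result.
        Span nameSpan = new Span(0, 2, "person");
        System.out.println(coveredText(sentence, tokenPositions, nameSpan) + " " + nameSpan);
    }
}
```

Cutting the name out of the original sentence, rather than joining tokens with spaces, preserves the sentence's original spacing and punctuation.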

The following program detects the names in the given raw text and displays them along with their positions. Save this program in a file named NameFinderSentences.java.
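A sketch of such a program, under the same assumptions as before: en-ner-person.bin in the working directory, SimpleTokenizer for tokenization, and a sample sentence. The namesWithPositions() helper is a hypothetical name.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.util.Span;

public class NameFinderSentences {

    // Returns "name span" lines for every person found in the raw sentence.
    public static List<String> namesWithPositions(String modelPath, String sentence)
            throws IOException {
        // Character offsets of each token, used to cut names out of the sentence.
        Span[] tokenPositions = SimpleTokenizer.INSTANCE.tokenizePos(sentence);
        // Token strings for find(); same tokenizer, so indices line up with tokenPositions.
        String[] tokens = SimpleTokenizer.INSTANCE.tokenize(sentence);

        List<String> results = new ArrayList<>();
        try (InputStream modelIn = new FileInputStream(modelPath)) {
            NameFinderME nameFinder = new NameFinderME(new TokenNameFinderModel(modelIn));
            for (Span nameSpan : nameFinder.find(tokens)) {
                // Map the first and last token of the name to character offsets.
                int begin = tokenPositions[nameSpan.getStart()].getStart();
                int end = tokenPositions[nameSpan.getEnd() - 1].getEnd();
                results.add(sentence.substring(begin, end) + " " + nameSpan);
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        String sentence = "Mike is a senior programming manager and Rama is a clerk.";
        for (String line : namesWithPositions("en-ner-person.bin", sentence)) { // assumed path
            System.out.println(line);
        }
    }
}
```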

Compile and execute the saved Java file from the command prompt with javac NameFinderSentences.java followed by java NameFinderSentences (the OpenNLP jar must be on the classpath).

When you execute the program, it reads the given String (raw text), detects the names of the persons in it, and displays them with their positions (spans).

Finding the Names of the Location

By loading a different model, you can detect other types of named entities. The following Java program loads the en-ner-location.bin model and detects the location names in the given sentence. Save this program in a file named LocationFinder.java.
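A sketch of such a location finder. It differs from the person example only in the model it loads; the model path, sample sentence, SimpleTokenizer choice, and the findLocations() helper name are assumptions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.util.Span;

public class LocationFinder {

    // Tokenizes the sentence and returns "location span" lines found by the model.
    public static List<String> findLocations(String modelPath, String sentence)
            throws IOException {
        String[] tokens = SimpleTokenizer.INSTANCE.tokenize(sentence);
        try (InputStream modelIn = new FileInputStream(modelPath)) {
            NameFinderME locationFinder = new NameFinderME(new TokenNameFinderModel(modelIn));
            List<String> locations = new ArrayList<>();
            for (Span span : locationFinder.find(tokens)) {
                // Join the tokens covered by the span (end index is exclusive).
                String text = String.join(" ",
                        Arrays.copyOfRange(tokens, span.getStart(), span.getEnd()));
                locations.add(text + " " + span);
            }
            return locations;
        }
    }

    public static void main(String[] args) throws IOException {
        String sentence = "The team travelled from Berlin to London last year."; // sample text
        for (String line : findLocations("en-ner-location.bin", sentence)) { // assumed path
            System.out.println(line);
        }
    }
}
```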

Compile and execute the saved Java file from the command prompt with javac LocationFinder.java followed by java LocationFinder (the OpenNLP jar must be on the classpath).

When you execute the program, it reads the given String (raw text), detects the location names in it, and displays their positions (spans).

NameFinder Probability

The probs() method of the NameFinderME class returns the probabilities of the last decoded sequence, that is, of the most recent call to find().

The following program prints these probabilities. Save this program in a file named TokenizerMEProbs.java.
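A sketch of such a program follows. It assumes the learned tokenizer model en-token.bin and the person model en-ner-person.bin are both in the working directory. The nameProbabilities() helper and the sample sentence are hypothetical. probs() is called right after find(), because it refers to the most recently decoded sequence.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

public class TokenizerMEProbs {

    // Tokenizes the sentence with the learned tokenizer, prints the person names
    // found, and returns the probabilities of the last decoded sequence.
    public static double[] nameProbabilities(String tokenModelPath, String nerModelPath,
            String sentence) throws IOException {
        String[] tokens;
        try (InputStream tokenIn = new FileInputStream(tokenModelPath)) {
            TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokenIn));
            tokens = tokenizer.tokenize(sentence);
        }
        try (InputStream nerIn = new FileInputStream(nerModelPath)) {
            NameFinderME nameFinder = new NameFinderME(new TokenNameFinderModel(nerIn));
            for (Span span : nameFinder.find(tokens)) {
                String name = String.join(" ",
                        Arrays.copyOfRange(tokens, span.getStart(), span.getEnd()));
                System.out.println(span + " " + name);
            }
            // probs() describes the sequence decoded by the find() call above.
            return nameFinder.probs();
        }
    }

    public static void main(String[] args) throws IOException {
        String sentence = "Robert works as a clerk in the office."; // sample text
        // Both model paths are assumptions; point them at your downloaded files.
        double[] probs = nameProbabilities("en-token.bin", "en-ner-person.bin", sentence);
        for (double p : probs) {
            System.out.println(p);
        }
    }
}
```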

Compile and execute the saved Java file from the command prompt with javac TokenizerMEProbs.java followed by java TokenizerMEProbs (the OpenNLP jar must be on the classpath).

When you execute the program, it reads the given String, tokenizes it, detects the names in it, and prints them along with the probabilities of the last decoded sequence.

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
