Apache Pig - Eval Function COUNT_STAR()

What is the COUNT_STAR() function?

The COUNT_STAR() function of Pig Latin is similar to the COUNT() function: it returns the number of elements in a bag. The difference lies in how nulls are handled. COUNT() skips any tuple whose first field is NULL, whereas COUNT_STAR() counts every tuple, including those with NULL values.
Note:
  • To get the global count (the total number of tuples in a relation), first apply GROUP ALL, then apply COUNT_STAR() to the resulting bag.
  • To get the count for each group (the number of tuples in each group), first group the relation with GROUP BY, then apply COUNT_STAR() to each group's bag.
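The per-group case can be sketched as follows, with COUNT() alongside for comparison. This is a minimal sketch, assuming a relation named student_details has already been loaded. The source does not give the schema, so the grouping key is referenced by position ($0) purely for illustration, and the names by_field, per_group, counted, and counted_star are hypothetical.

```pig
-- Group on the first field ($0 is an assumed key; substitute a real field).
by_field = GROUP student_details BY $0;

-- For each group, count the tuples in its bag with both functions.
-- COUNT skips tuples whose first field is null; COUNT_STAR counts every tuple.
per_group = FOREACH by_field GENERATE
    group,
    COUNT(student_details)      AS counted,
    COUNT_STAR(student_details) AS counted_star;
```

The two columns differ only for bags that contain tuples with a null first field.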

Syntax

Given below is the syntax of the COUNT_STAR() function.
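In general form, following the Pig built-in function reference, the function takes a single expression whose data type is bag:

```pig
COUNT_STAR(expression)
```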

Example

Assume that we have a file named student_details.txt in the HDFS directory /pig_data/. This file contains an empty record.
We have loaded this file into Pig as the relation student_details.
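A load statement for this step might look like the sketch below. The path comes from the text above. The comma delimiter is an assumption, and no schema is declared because the source does not give one.

```pig
-- Load the file from HDFS into the relation student_details.
-- The ',' delimiter is an assumption; adjust it to match the actual file.
student_details = LOAD '/pig_data/student_details.txt' USING PigStorage(',');
```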

Calculating the Number of Tuples

We can use the built-in function COUNT_STAR() to calculate the number of tuples in a relation. First, group the relation student_details with GROUP ALL and store the result in a relation named student_group_all.
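Using the relation names from the text, the grouping step can be written as:

```pig
-- Collect every tuple of student_details into a single bag.
student_group_all = GROUP student_details ALL;
```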
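This produces a relation containing a single tuple: the group key all, followed by a bag that holds every tuple of student_details, including the null tuple from the empty record.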
Now calculate the number of tuples in the relation by applying COUNT_STAR() to the bag in student_group_all, storing the result in student_count.
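With the relation names used in this example, the counting step can be sketched as:

```pig
-- Count every tuple in the grouped bag, null tuples included.
student_count = FOREACH student_group_all GENERATE COUNT_STAR(student_details);
```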

Verification

Verify the relation student_count using the DUMP operator.
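The verification step is a single statement:

```pig
-- Print the contents of student_count to the console.
DUMP student_count;
```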

Output

It produces the following output, displaying the contents of the relation student_count.
(9)
Since we used COUNT_STAR(), the count includes the null tuple produced by the empty record, so the result is 9.
