Apache Pig - Eval Function DIFF()

What is the DIFF() function in Apache Pig?

The DIFF() function of Pig Latin compares two fields of a tuple, typically two bags. If the two bags match, it returns an empty bag. If they do not match, it collects the tuples that appear in one bag but not in the other and returns them wrapped in a bag.


The syntax of the DIFF() function is DIFF(expression, expression), where each expression is a field of the tuple, usually a bag.


DIFF() is generally used to compare two bags in a tuple. In the following example, we create two relations, cogroup them, and calculate the difference between them.
Assume that we have two files, emp_sales.txt and emp_bonus.txt, in the HDFS directory /pig_data/. The file emp_sales.txt contains the details of the employees in the sales department, and emp_bonus.txt contains the details of the employees who received a bonus.
We load these files into Pig as the relations emp_sales and emp_bonus, respectively.
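
The load step can be sketched as follows. This is a minimal sketch under assumptions: it assumes comma-separated files, and the column names name, age, salary and dept are illustrative guesses based on the sample record (1,Robin,22,25000,sales). Only sno is named in the text.

```pig
-- Assumed schema: only sno comes from the text; the other column names are illustrative.
emp_sales = LOAD '/pig_data/emp_sales.txt' USING PigStorage(',')
    AS (sno:int, name:chararray, age:int, salary:int, dept:chararray);

emp_bonus = LOAD '/pig_data/emp_bonus.txt' USING PigStorage(',')
    AS (sno:int, name:chararray, age:int, salary:int, dept:chararray);
```

Giving both relations the same schema keeps the later comparison meaningful, since DIFF() compares whole tuples.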
Group the tuples of the relations emp_sales and emp_bonus by the key sno using the COGROUP operator, and store the result in the relation cogroup_data.
Then verify the relation cogroup_data using the DUMP operator.
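
The grouping and verification steps might look like the sketch below, assuming emp_sales and emp_bonus both expose a field named sno. The comment describing the shape of each output tuple reflects how COGROUP behaves in general; it is not captured output.

```pig
-- Group both relations on sno; each output tuple has the form
-- (group, {matching emp_sales tuples}, {matching emp_bonus tuples}).
cogroup_data = COGROUP emp_sales BY sno, emp_bonus BY sno;

-- Print the cogrouped relation to the console.
DUMP cogroup_data;
```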

Calculating the Difference between Two Relations

Let us now calculate the difference between the two relations using the DIFF() function and store the result in the relation diff_data.
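
A sketch of this step, assuming cogroup_data was produced by cogrouping emp_sales and emp_bonus on sno. After COGROUP, the two bags in each tuple are named after the input relations, so they can be passed to DIFF() directly.

```pig
-- For every sno, compare the bag of emp_sales tuples with the bag of
-- emp_bonus tuples; the result is a bag of the tuples that differ.
diff_data = FOREACH cogroup_data GENERATE DIFF(emp_sales, emp_bonus);
```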


Verify the relation diff_data using the DUMP operator.
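
The verification step is a single statement. The comment notes what to expect for a key whose records are identical in both relations, based on DIFF() returning an empty bag for matching inputs; it is not captured output.

```pig
-- Print one tuple per sno. For a key whose records match in both
-- relations, the tuple holds an empty bag and is printed as ({}).
DUMP diff_data;
```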
For each sno, the diff_data relation holds a tuple containing an empty bag if the records in emp_sales and emp_bonus match. Otherwise, the bag holds the differing tuples from both relations.
For example, the records with sno 1 are the same in both relations: (1,Robin,22,25000,sales) and (1,Robin,22,25000,sales). Therefore, in the diff_data relation, which is the result of the DIFF() function, the tuple for sno 1 contains an empty bag.

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd
