Generated Code - Hadoop

In addition to writing the contents of the database table to HDFS, Sqoop also writes a generated Java source file (widgets.java) to the current local directory. (After running the sqoop import command above, you can see this file by running ls widgets.java.) Code generation is a necessary part of Sqoop’s import process; as you’ll learn in “Database Imports: A Deeper Look”, Sqoop uses generated code to handle the deserialization of table-specific data from the database source before writing it to HDFS.

The generated class (widgets) can hold a single record retrieved from the imported table. It can be used to manipulate such a record in MapReduce or to store it in a SequenceFile in HDFS. (SequenceFiles written by Sqoop during the import process store each imported row in the “value” element of the SequenceFile’s key-value pair format, using the generated class.)
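To make the idea of a per-record class concrete, here is a minimal hand-written sketch of what such a class does. It is not Sqoop’s generated code: the class name WidgetRecord and its three fields (id, widgetName, price) are hypothetical, and instead of implementing Hadoop’s org.apache.hadoop.io.Writable it only mirrors that interface’s write(DataOutput) and readFields(DataInput) methods using plain java.io, so it runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a generated record class: one instance = one row.
// A real Sqoop-generated class implements Hadoop's Writable (among others);
// this sketch only mirrors the write/readFields contract with plain java.io.
class WidgetRecord {
    // Hypothetical columns; a generated class has one field per table column.
    private int id;
    private String widgetName;
    private double price;

    WidgetRecord() {}

    WidgetRecord(int id, String widgetName, double price) {
        this.id = id;
        this.widgetName = widgetName;
        this.price = price;
    }

    int getId() { return id; }
    String getWidgetName() { return widgetName; }
    double getPrice() { return price; }

    // Serialize the row's fields in a fixed order.
    // Note: writeUTF rejects null, so NULL columns are not handled here.
    void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(widgetName);
        out.writeDouble(price);
    }

    // Deserialize the fields in exactly the order write() emitted them.
    void readFields(DataInput in) throws IOException {
        id = in.readInt();
        widgetName = in.readUTF();
        price = in.readDouble();
    }

    // Convenience helper: serialize this record to a byte array.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Convenience helper: rebuild a record from bytes produced by toBytes().
    static WidgetRecord fromBytes(byte[] bytes) {
        WidgetRecord record = new WidgetRecord();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            record.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }
}
```

Round-tripping a record through toBytes and fromBytes yields the same field values. The essential point is that readFields must consume fields in exactly the order write produced them, which is why the code has to be specific to each table.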

You probably don’t want to name your generated class widgets, since each instance of the class refers to only a single record. We can use a different Sqoop tool to generate source code without performing an import; this tool still examines the database table to determine the appropriate data type for each field:

The codegen tool simply generates code; it does not perform the full import. We specified that we’d like it to generate a class named Widget; this will be written to Widget.java. We could also have specified --class-name and other code-generation arguments during the import we performed earlier. This tool can be used to regenerate code if you accidentally remove the source file, or to generate code with different settings than were used during the import.
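At its core, examining the table to pick a data type for each field means mapping each column’s SQL type to a Java field type. The sketch below illustrates that idea under stated assumptions: ColumnTypeMapper and toJavaType are hypothetical names, and the mapping is a simplified illustration, not Sqoop’s actual type table. In a real program the integer type codes would come from JDBC metadata, for example ResultSetMetaData.getColumnType(int).

```java
import java.sql.Types;

// Hypothetical, simplified sketch of the SQL-to-Java type mapping a code
// generator performs after reading a table's column metadata. This is NOT
// Sqoop's actual mapping; it covers only a few common JDBC types.
final class ColumnTypeMapper {
    private ColumnTypeMapper() {}

    // Returns the Java type name a generated field might use for a JDBC type
    // code (a java.sql.Types constant), or null if the type is not mapped.
    static String toJavaType(int sqlType) {
        switch (sqlType) {
            case Types.TINYINT:
            case Types.SMALLINT:
            case Types.INTEGER:
                return "Integer";
            case Types.BIGINT:
                return "Long";
            case Types.REAL:
                return "Float";
            case Types.FLOAT:
            case Types.DOUBLE:
                return "Double";
            case Types.NUMERIC:
            case Types.DECIMAL:
                return "java.math.BigDecimal";
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.LONGVARCHAR:
                return "String";
            case Types.BIT:
            case Types.BOOLEAN:
                return "Boolean";
            case Types.DATE:
                return "java.sql.Date";
            case Types.TIMESTAMP:
                return "java.sql.Timestamp";
            default:
                return null; // unmapped in this sketch
        }
    }
}
```

A code generator would call a function like this once per column and then emit a field, plus matching serialization code, for each result.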

If you’re working with records imported to SequenceFiles, you will need to use the generated classes to deserialize data from the SequenceFile storage. You can work with text file-based records without using generated code, but as we’ll see in “Working with Imported Data”, Sqoop’s generated code can handle some tedious aspects of data processing for you.
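For comparison, here is roughly what handling a text-based record by hand looks like. This is a hypothetical sketch that assumes comma-delimited fields and three columns (id, name, price); the names TextRecordParser and Row are illustrative only. The parser deliberately ignores quoting, escaped delimiters, and NULL values, which are exactly the kind of tedious details generated code can take care of.

```java
// Hypothetical sketch: parsing one line of a comma-delimited text import by
// hand, without generated code. Assumes three columns (id, name, price) and
// that no field contains an embedded comma; quoting, escaping, and NULL
// handling are deliberately left out.
final class TextRecordParser {
    private TextRecordParser() {}

    // Holds the typed fields of one parsed row.
    static final class Row {
        final int id;
        final String name;
        final double price;

        Row(int id, String name, double price) {
            this.id = id;
            this.name = name;
            this.price = price;
        }
    }

    static Row parse(String line) {
        // A limit of -1 keeps trailing empty fields, so short rows are detected.
        String[] fields = line.split(",", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException(
                "expected 3 fields but got " + fields.length + ": " + line);
        }
        // Every column must be converted to its Java type manually.
        int id = Integer.parseInt(fields[0].trim());
        String name = fields[1];
        double price = Double.parseDouble(fields[2].trim());
        return new Row(id, name, price);
    }
}
```

Every consumer of the data would have to repeat this column-by-column conversion, and keep it in sync with the table schema, which is the burden a generated class removes.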

Additional Serialization Systems

As Sqoop continues to develop, the number of ways Sqoop can serialize and interact with your data is expected to grow. At the time of this writing, Sqoop requires generated code that implements the Writable interface. Future versions of Sqoop should also support Avro-based serialization and schema generation (see “Avro”), allowing you to use Sqoop in your project without integrating with generated code.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd