In addition to writing the contents of the database table to HDFS, Sqoop has also provided you with a generated Java source file (widgets.java) written to the current local directory. (After running the sqoop import command above, you can see this file by running ls widgets.java.) Code generation is a necessary part of Sqoop’s import process; as you’ll learn in “Database Imports: A Deeper Look”, Sqoop uses generated code to handle the deserialization of table-specific data from the database source before writing it to HDFS.
The generated class (widgets) is capable of holding a single record retrieved from the imported table. It can manipulate such a record in MapReduce or store it in a SequenceFile in HDFS. (SequenceFiles written by Sqoop during the import process will store each imported row in the “value” element of the SequenceFile’s key-value pair format, using the generated class.)
You probably don’t want to name your generated class widgets, since each instance of the class refers to only a single record. You can use a different Sqoop tool to generate source code without performing an import; this generated code still examines the database table to determine the appropriate data type for each field:
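An invocation along these lines does this; the JDBC URL and the database name hadoopguide are placeholders for your own environment:

```
% sqoop codegen --connect jdbc:mysql://localhost/hadoopguide \
    --table widgets --class-name Widget
```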
The codegen tool simply generates code; it does not perform the full import. We specified that we’d like it to generate a class named Widget; this will be written to Widget.java. We also could have specified --class-name and other code-generation arguments during the import process we performed earlier. This tool can be used to regenerate code if you accidentally remove the source file, or to generate code with different settings than were used during the import.
If you’re working with records imported to SequenceFiles, it is inevitable that you’ll need to use the generated classes (to deserialize data from the SequenceFile storage). You can work with text file-based records without using generated code, but as we’ll see in “Working with Imported Data”, Sqoop’s generated code can handle some tedious aspects of data processing for you.
Additional Serialization Systems
As Sqoop continues to develop, the number of ways it can serialize and interact with your data is expected to grow. At the time of this writing, Sqoop requires generated code that implements the Writable interface. Future versions of Sqoop should also support Avro-based serialization and schema generation (see “Avro”), allowing you to use Sqoop in your project without integrating with generated code.
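As a rough illustration of the Writable contract that Sqoop’s generated classes follow, here is a hand-written sketch. The Widget class and its fields are hypothetical stand-ins for Sqoop’s generated code, and plain java.io streams stand in for Hadoop’s own serialization plumbing; real generated classes implement org.apache.hadoop.io.Writable and handle database NULLs and many more column types:

```java
import java.io.*;

// Hypothetical stand-in for a Sqoop-generated class: one instance
// holds one row, and write()/readFields() mirror the methods of
// Hadoop's Writable interface.
public class Widget {
    public int id;
    public String widgetName;

    public Widget() { }

    public Widget(int id, String widgetName) {
        this.id = id;
        this.widgetName = widgetName;
    }

    // Mirrors Writable.write(DataOutput): serialize fields in order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(widgetName);
    }

    // Mirrors Writable.readFields(DataInput): deserialize in the same order.
    public void readFields(DataInput in) throws IOException {
        id = in.readInt();
        widgetName = in.readUTF();
    }

    // Convenience helpers for a round trip through a byte array.
    public byte[] toBytes() throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        write(new DataOutputStream(baos));
        return baos.toByteArray();
    }

    public static Widget fromBytes(byte[] bytes) throws IOException {
        Widget w = new Widget();
        w.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return w;
    }

    public static void main(String[] args) throws IOException {
        Widget w = Widget.fromBytes(new Widget(1, "sprocket").toBytes());
        System.out.println(w.id + " " + w.widgetName); // prints: 1 sprocket
    }
}
```

The essential point is that serialization and deserialization must read and write the fields in exactly the same order, which is why Sqoop generates this code from the table schema rather than asking you to write it by hand.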