No matter how efficient you make your ETL system, you still stand a chance of having performance issues. However, as Robin Williams says so eloquently in the film Good Will Hunting, “It’s not your fault.” When you are dealing with very large data sets, sometimes they decide to make their own rules. On more than one occasion, we’ve come across a situation where everything is configured correctly, but for some unexplainable reason, it just doesn’t work!
When a job catches you by surprise and performs with lackluster results, don’t fight it. Simply take a pragmatic approach to find the operation within the process causing the bottleneck and address that specific operation. Monitor areas such as CPU, memory, I/O, and network traffic to determine any high-level bottleneck.
If no substantial bottlenecks are detected outside of the actual ETL process, you need to dive inside the code. Use the process of elimination to narrow down potential bottlenecks. To eliminate operations, you must have the ability to isolate each operation and test it separately. Code isolation tends to be quite difficult if you are hand-coding the entire process in SQL or another procedural language. Virtually all of the ETL tools provide a mechanism to isolate components of a process to determine undesired bottlenecking.
The best strategy is to start with the extraction process; then work your way through each calculation, look-up, aggregation, reformatting, filtering, or any other component of the transformation process; and then finally test the I/O of the actual data load into the data warehouse.
To begin the isolation process for detecting bottlenecks, copy the ETL job and modify the copy of the job to include or exclude appropriate components as needed. As you step through the process, you will likely need to delete the copy and recopy the job to restore changes made to test preceding components. Follow these steps to isolate components of the ETL process to identify bottlenecks:
|
|
Data Warehouse ETL Toolkit Related Tutorials |
|
---|---|
Informatica Tutorial | Data Warehousing Tutorial |
Networking Tutorial | Hadoop Tutorial |
Apache Flume Tutorial |
Data Warehouse ETL Toolkit Related Interview Questions |
|
---|---|
Informatica Interview Questions | Data Warehousing Interview Questions |
Networking Interview Questions | System Administration Interview Questions |
Hadoop Interview Questions | MYSQL DBA Interview Questions |
Data modeling Interview Questions | Hadoop Administration Interview Questions |
Apache Flume Interview Questions | Informatica Admin Interview Questions |
Data Warehouse ETL Toolkit Related Practice Tests |
|
---|---|
Informatica Practice Tests | Data Warehousing Practice Tests |
Networking Practice Tests | System Administration Practice Tests |
Hadoop Practice Tests | MYSQL DBA Practice Tests |
Hadoop Administration Practice Tests |
All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd
Wisdomjobs.com is one of the best job search sites in India.