Data Warehouse ETL Toolkit Interview Questions & Answers



Data Warehouse ETL Toolkit helps data warehouse developers handle the ETL (Extract, Transform and Load) phases of the development cycle effectively. It covers the various methods of extracting data, ways to transform it, and techniques for loading it into the warehouse. One can check job availability across cities including Mumbai, Delhi, Bangalore, Pune and Hyderabad. A Data Warehouse ETL Toolkit role needs candidates to have good knowledge of languages such as Java and JavaScript. Wisdomjobs has interview questions exclusively designed to assist job seekers in clearing job interviews. These Data Warehouse ETL Toolkit interview questions and answers are useful for developers attending interviews for Data Warehouse ETL Toolkit positions.

Data Warehouse ETL Toolkit Interview Questions

    1. Question 1. What Is Etl?

      Answer :

      ETL stands for extraction, transformation and loading.
      An ETL tool provides developers with an interface for designing source-to-target mappings, transformations and job control parameters.
      * Extraction
      Takes data from an external source and moves it to the warehouse pre-processor database.
      * Transformation
      The transform data task allows point-to-point generating, modifying and transforming of data.
      * Loading
      The load data task adds records to a database table in the warehouse.

    2. Question 2. What Is A Three Tier Data Warehouse?

      Answer :

      A data warehouse can be thought of as a three-tier system in which a middle system provides usable data in a secure way to end users. On either side of this middle system are the end users and the back-end data stores.

    3. Question 3. What Is The Metadata Extension?

      Answer :

      Informatica allows end users and partners to extend the metadata stored in the repository by associating information with individual objects in the repository. For example, when you create a mapping, you can store your contact information with the mapping. You associate information with repository metadata using metadata extensions.

      Informatica Client applications can contain the following types of metadata extensions:
      Vendor-defined: Third-party application vendors create vendor-defined metadata extensions. You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them.

      User-defined: You create user-defined metadata extensions using PowerCenter/PowerMart. You can create, edit, delete, and view user-defined metadata extensions. You can also change the values of user-defined extensions.

    4. Question 4. Can We Override A Native Sql Query Within Informatica? Where Do We Do It? How Do We Do It?

      Answer :

      Yes, we can override the native SQL query in the Source Qualifier and in the Lookup transformation.

      In the Lookup transformation, the "Sql Override" option appears in the lookup properties; supplying a custom query there performs the override.
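
      As an illustration, a lookup override usually re-states the columns the lookup needs and adds a filter so less data is cached. A minimal sketch; all table and column names are hypothetical:

      -- Hypothetical lookup SQL override: cache only active customers
      -- instead of the whole reference table.
      SELECT customer_id   AS CUSTOMER_ID,
             customer_name AS CUSTOMER_NAME
      FROM   customer_dim
      WHERE  status = 'ACTIVE'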

    5. Question 5. How Can We Use Mapping Variables In Informatica? Where Do We Use Them?

      Answer :

      Yes, we can use mapping variables in Informatica to carry state between session runs.

      The Informatica server saves the value of a mapping variable to the repository at the end of a session run and uses that value the next time we run the session.
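
      For example (a sketch with hypothetical table, column and variable names), a mapping variable such as $$LastRunDate can be referenced in a Source Qualifier SQL override so each run extracts only rows changed since the previous run:

      -- Source Qualifier SQL override; Informatica expands $$LastRunDate
      -- textually before the query is sent to the database.
      SELECT *
      FROM   orders
      WHERE  last_updated > TO_DATE('$$LastRunDate', 'YYYY-MM-DD HH24:MI:SS')

      Inside the mapping, an Expression port can call SETMAXVARIABLE($$LastRunDate, last_updated) so the variable advances to the newest timestamp seen during the run.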

    6. Question 6. What Are Snapshots? What Are Materialized Views & Where Do We Use Them? What Is A Materialized View Log?

      Answer :

      Snapshots are read-only copies of a master table located on a remote node, periodically refreshed to reflect changes made to the master table. Snapshots are mirrors or replicas of tables.

      Views are built using columns from one or more tables. A single-table view can be updated, but a multi-table view cannot: a view can accept inserts, updates and deletes only if it has one base table; if the view is based on columns from more than one table, insert, update and delete are not possible.

      Materialized view:
      A pre-computed table comprising aggregated or joined data from fact and possibly dimension tables, also known as a summary or aggregate table. A materialized view log is a table on the master table that records changes to it, so the materialized view can be refreshed incrementally (fast refresh) instead of fully rebuilt.
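
      A minimal Oracle sketch, with illustrative names, showing a materialized view log and a fast-refreshable aggregate materialized view:

      -- The log on the master table records changes for fast refresh.
      CREATE MATERIALIZED VIEW LOG ON sales
        WITH ROWID, SEQUENCE (store_id, amount)
        INCLUDING NEW VALUES;

      -- COUNT(*) and COUNT(amount) are required for SUM(amount)
      -- to be fast-refreshable.
      CREATE MATERIALIZED VIEW mv_sales_by_store
        BUILD IMMEDIATE
        REFRESH FAST ON COMMIT
      AS
      SELECT store_id,
             COUNT(*)      AS row_cnt,
             COUNT(amount) AS amount_cnt,
             SUM(amount)   AS total_amount
      FROM   sales
      GROUP  BY store_id;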

    7. Question 7. Can Informatica Load Heterogeneous Targets From Heterogeneous Sources?

      Answer :

      No in Informatica 5.2; yes in Informatica 6.1 and later.

    8. Question 8. What Is The Etl Process? How Many Steps Does Etl Contain? Explain With An Example.

      Answer :

      ETL is the extract, transform, load process: you extract data from the source, apply the business rules to it, and then load it into the target.

      The steps are:
      1. Define the source (create the ODBC connection to the source DB).
      2. Define the target (create the ODBC connection to the target DB).
      3. Create the mapping (apply the business rules here by adding transformations, and define how the data flows from the source to the target).
      4. Create the session (a set of instructions that runs the mapping).
      5. Create the workflow (instructions that run the session).

    9. Question 9. What Is Full Load & Incremental Or Refresh Load?

      Answer :

      Full Load: completely erasing the contents of one or more tables and reloading with fresh data.
      Incremental Load: applying ongoing changes to one or more tables based on a predefined schedule.
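
      A sketch of both patterns in SQL, with hypothetical table and column names:

      -- Full load: wipe the target and reload everything.
      TRUNCATE TABLE sales_fact;
      INSERT INTO sales_fact SELECT * FROM stg_sales;

      -- Incremental load: apply only rows changed since the last run,
      -- here keyed on the natural key via MERGE.
      MERGE INTO sales_fact t
      USING (SELECT sale_id, amount
             FROM   stg_sales
             WHERE  last_updated > :last_load_date) s
      ON (t.sale_id = s.sale_id)
      WHEN MATCHED THEN UPDATE SET t.amount = s.amount
      WHEN NOT MATCHED THEN INSERT (sale_id, amount)
                            VALUES (s.sale_id, s.amount);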

    10. Question 10. Is There Any Way To Read Ms Excel Data Directly Into Informatica? Is It Possible To Take An Excel File As A Target?

      Answer :

      We cannot directly import an Excel file into Informatica.
      We have to define the Microsoft Excel ODBC driver on our system and name a range in the Excel sheet; then, in Informatica, open the folder and use Sources -> Import from Database -> select the Excel ODBC driver -> Connect -> select the Excel sheet name.

    11. Question 11. What Is A Staging Area? Do We Need It? What Is The Purpose Of A Staging Area?

      Answer :

      Data staging is actually a collection of processes used to prepare source system data for loading a data warehouse. Staging includes the following steps:
      - Source data extraction
      - Data transformation (restructuring)
      - Data transformation (data cleansing, value transformations)
      - Surrogate key assignments
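
      For instance, surrogate key assignment can be done with a database sequence during the load; a sketch with hypothetical table and sequence names:

      -- Assign a warehouse surrogate key to each new customer row.
      INSERT INTO dim_customer (customer_sk, customer_id, customer_name)
      SELECT customer_sk_seq.NEXTVAL, s.customer_id, s.customer_name
      FROM   stg_customer s
      WHERE  NOT EXISTS (SELECT 1
                         FROM   dim_customer d
                         WHERE  d.customer_id = s.customer_id);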

    12. Question 12. How Do We Call Shell Scripts From Informatica?

      Answer :

      Specify the full path of the shell script in the "Post-Session Command" property of the session/workflow.

    13. Question 13. What Is The Difference Between Power Center & Power Mart?

      Answer :

      PowerCenter - the ability to organize repositories into a data mart domain and share metadata across repositories.
      PowerMart - only a local repository can be created.

    14. Question 14. Can We Look Up A Table From A Source Qualifier Transformation, I.e. An Unconnected Lookup?

      Answer :

      You cannot look up from a Source Qualifier directly. However, you can override the SQL in the Source Qualifier to join with the lookup table and perform the lookup that way, as sketched below.
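
      A hypothetical Source Qualifier override in which the "lookup" is a join done in the database itself (all names are illustrative):

      -- The reference value is fetched by joining, not by a Lookup
      -- transformation.
      SELECT o.order_id,
             o.customer_id,
             c.customer_name
      FROM   orders o
      LEFT JOIN customer_dim c
             ON c.customer_id = o.customer_id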

    15. Question 15. Do We Need An Etl Tool? When Do We Go For The Tools In The Market?

      Answer :

      ETL Tool:
      It is used to extract (E) data from multiple source systems (RDBMS, flat files, mainframes, SAP, XML etc.), transform (T) it based on business requirements, and load (L) it into target locations (tables, files etc.).

      Need of an ETL Tool:
      An ETL tool is typically required when data is scattered across different systems (RDBMS, flat files, mainframes, SAP, XML etc.).

    16. Question 16. What Is Informatica Metadata And Where Is It Stored?

      Answer :

      Informatica metadata is data about data; it is stored in the Informatica repository.

    17. Question 17. Techniques Of Error Handling - Ignore, Rejecting Bad Records To A Flat File, Loading The Records And Reviewing Them (Default Values)

      Answer :

      Records are rejected either by the database, due to constraint or key violations, or by the Informatica server when writing data into the target table. These rejected records can be found in the bad files folder, where a reject file is created per session, and there we can check why a record was rejected. In this bad file, each line starts with a row indicator, followed by a column indicator for every column.

      The column indicators are of four types:
      D - valid data,
      O - overflowed data,
      N - null data,
      T - truncated data.
      Depending on these indicators we can make changes and load the data successfully into the target.

    18. Question 18. What Are The Various Methods Of Getting Incremental Records Or Delta Records From The Source Systems?

      Answer :

      One foolproof method is to maintain a field called 'Last Extraction Date' and then impose a condition in the code saying 'current_extraction_date > last_extraction_date'.
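
      A sketch of this pattern with a load-control table (all names are illustrative):

      -- Pull only rows changed since the stored watermark.
      SELECT s.*
      FROM   src_orders s
      WHERE  s.last_updated >
             (SELECT last_extraction_date
              FROM   etl_control
              WHERE  table_name = 'SRC_ORDERS');

      -- After a successful load, advance the watermark.
      UPDATE etl_control
      SET    last_extraction_date = SYSDATE
      WHERE  table_name = 'SRC_ORDERS';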

    19. Question 19. What Are The Different Versions Of Informatica?

      Answer :

      Here are some popular versions of Informatica:
      Informatica PowerCenter 4.1,
      Informatica PowerCenter 5.1,
      Informatica PowerCenter 6.1.2,
      Informatica PowerCenter 7.1.2,
      Informatica PowerCenter 8.1,
      Informatica PowerCenter 8.5,
      Informatica PowerCenter 8.6.

    20. Question 20. What Is Ods (Operational Data Store)?

      Answer :

      ODS - Operational Data Store.
      The ODS sits between the staging area and the data warehouse. The data in the ODS is at a low level of granularity.
      Once data is populated in the ODS, aggregated data is loaded into the EDW through the ODS.

    21. Question 21. What Is Latest Version Of Power Center / Power Mart?

      Answer :

      The Latest Version is 7.2

    22. Question 22. What Are The Various Tools?

      Answer :

      - Cognos Decision Stream
      - Oracle Warehouse Builder
      - Business Objects XI (Extreme Insight)
      - SAP Business Warehouse
      - SAS Enterprise ETL Server

    23. Question 23. Compare Etl & Manual Development?

      Answer :

      ETL - The process of extracting data from multiple sources (e.g. flat files, XML, COBOL, SAP) is much simpler with the help of tools.
      Manual - Loading data from anything other than flat files and Oracle tables needs more effort.

      ETL - High and clear visibility of the logic.
      Manual - Complex and not so user-friendly visibility of the logic.

      ETL - Contains metadata, and changes can be made easily.
      Manual - No metadata concept, and changes need more effort.

      ETL - Error handling, log summaries and load progress make life easier for the developer and the maintainer.
      Manual - Needs maximum effort from a maintenance point of view.

      ETL - Can handle historic data very well.
      Manual - As data grows, the processing time degrades.

      These are some differences between manual and ETL development.

    24. Question 24. When Do We Analyze The Tables? How Do We Do It?

      Answer :

      The ANALYZE statement allows you to validate and compute statistics for an index, table, or cluster. These statistics are used by the cost-based optimizer when it calculates the most efficient plan for retrieval. In addition to its role in statement optimization, ANALYZE also helps in validating object structures and in managing space in your system. You can choose among the following operations: COMPUTE, ESTIMATE, and DELETE. Early versions of Oracle7 produced unpredictable results when the ESTIMATE operation was used, so it is best to compute your statistics.
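
      For example (classic Oracle syntax; the table and index names are illustrative):

      -- Compute exact statistics, or estimate them from a sample.
      ANALYZE TABLE sales COMPUTE STATISTICS;
      ANALYZE TABLE sales ESTIMATE STATISTICS SAMPLE 20 PERCENT;

      -- Validate the structure of a table or an index.
      ANALYZE TABLE sales VALIDATE STRUCTURE;
      ANALYZE INDEX sales_pk VALIDATE STRUCTURE;

      -- Remove previously gathered statistics.
      ANALYZE TABLE sales DELETE STATISTICS;

      (On later Oracle releases, the DBMS_STATS package is the recommended way to gather optimizer statistics.)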

    25. Question 25. How Do You Calculate Fact Table Granularity?

      Answer :

      Granularity is the level of detail that the fact table describes. For example, if we are doing time-based analysis, the granularity may be day-based, month-based or year-based.

    26. Question 26. What Are The Modules In Power Mart?

      Answer :

      1. PowerMart Designer
      2. Server
      3. Server Manager
      4. Repository
      5. Repository Manager

    27. Question 27. If A Flat File Contains 1000 Records How Can I Get First And Last Records Only?

      Answer :

      By using an Aggregator transformation with the FIRST and LAST functions we can get the first and the last record; a plain-SQL equivalent is sketched below.
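
      For illustration only (FIRST and LAST are Informatica Aggregator functions), the same result in plain SQL with window functions, using hypothetical names:

      -- Keep only the first and the last row of the table.
      WITH numbered AS (
        SELECT t.*,
               ROW_NUMBER() OVER (ORDER BY rec_id) AS rn,
               COUNT(*)     OVER ()                AS total_rows
        FROM   src_records t
      )
      SELECT *
      FROM   numbered
      WHERE  rn = 1 OR rn = total_rows;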

    28. Question 28. Suppose We Have Some 10,000-Odd Records In The Source System And We Load Them Into The Target. How Do We Ensure That The 10,000 Records Loaded Into The Target Don't Contain Any Garbage Values?

      Answer :

      We can apply LTRIM and RTRIM in an Expression transformation, and check for nulls, before inserting the data.
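
      A SQL sketch of the same cleansing checks, with hypothetical names:

      -- Trim stray whitespace and reject rows with missing mandatory
      -- values before they reach the target.
      INSERT INTO tgt_customer (customer_id, customer_name)
      SELECT LTRIM(RTRIM(customer_id)),
             LTRIM(RTRIM(customer_name))
      FROM   stg_customer
      WHERE  customer_id   IS NOT NULL
        AND  customer_name IS NOT NULL;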

    29. Question 29. How Do We Extract Sap Data Using Informatica? What Is Abap? What Are Idocs?

      Answer :

      SAP data can be loaded into Informatica in the form of flat files.
      Condition:
      The Informatica Source Qualifier column sequence must match the SAP source file.
      ABAP is SAP's proprietary programming language, and IDocs (Intermediate Documents) are SAP's standard containers for exchanging data between SAP and other systems.

    30. Question 30. What Is The Difference Between Joiner And Lookup

      Answer :

      A Joiner is used to join two or more tables to retrieve data from them (just like joins in SQL).
      A Lookup is used to check and compare a source table against a target or reference table (just like a correlated sub-query in SQL).

    31. Question 31. What Are The Various Test Procedures Used To Check Whether The Data Is Loaded In The Backend, Performance Of The Mapping, And Quality Of The Data Loaded In Informatica.

      Answer :

      The best procedure is to take the help of the Debugger, where we can monitor each and every step of the mapping and see how data is loaded, based on condition breakpoints.

    32. Question 32. What Is The Difference Between Etl Tool And Olap Tools

      Answer :

      An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some cleansing of the data along the way.
      Ex: Informatica, DataStage etc.

      An OLAP tool is meant for reporting: in OLAP, data is available in a multidimensional model, so you can write simple queries to extract data from the database.
      Ex: Business Objects, Cognos etc.

    33. Question 33. What Are Active Transformation / Passive Transformations?

      Answer :

      Active transformation can change the number of rows that pass through it. (Decrease or increase rows)
      Passive transformation cannot change the number of rows that pass through it.

    34. Question 34. What Are The Different Lookup Methods Used In Informatica?

      Answer :

      1. Connected lookup
      2. Unconnected lookup

      A connected lookup receives input from the pipeline, sends output to the pipeline, and can return any number of values; it does not use a return port.

      An unconnected lookup can return only one column; it uses a return port.

    35. Question 35. What Are Parameter Files ? Where Do We Use Them?

      Answer :

      A parameter file defines the values for parameters and variables used in a workflow, worklet or session.
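
      A minimal sketch of a PowerCenter parameter file (the folder, workflow, session, connection and variable names are all hypothetical):

      [MyFolder.WF:wf_daily_load.ST:s_m_load_orders]
      $DBConnection_Source=ORA_SRC
      $DBConnection_Target=ORA_DWH
      $$LastRunDate=2018-01-01 00:00:00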

    36. Question 36. What Are The Various Transformations Available?

      Answer :

      • Aggregator Transformation
      • Expression Transformation
      • Filter Transformation
      • Joiner Transformation
      • Lookup Transformation
      • Normalizer Transformation
      • Rank Transformation
      • Router Transformation
      • Sequence Generator Transformation
      • Stored Procedure Transformation
      • Sorter Transformation
      • Update Strategy Transformation
      • XML Source Qualifier Transformation
      • Advanced External Procedure Transformation
      • External Procedure Transformation

    37. Question 37. How To Determine What Records To Extract?

      Answer :

      When addressing a table, some dimension key must reflect the need for a record to get extracted. Mostly it will come from the time dimension (e.g. date >= first of the current month) or a transaction status flag (e.g. "order invoiced" status). A foolproof method is adding an archive flag to each record, which gets reset whenever the record changes; a sketch follows.
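
      The archive-flag pattern, with illustrative names:

      -- Extract everything not yet archived ...
      SELECT * FROM src_orders WHERE archive_flag = 'N';

      -- ... and mark it once the load succeeds. The source application
      -- resets the flag to 'N' whenever a record changes.
      UPDATE src_orders SET archive_flag = 'Y' WHERE archive_flag = 'N';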

    38. Question 38. What Are Snapshots? What Are Materialized Views & Where Do We Use Them? What Does A Materialized View Do?

      Answer :

      A materialized view is a view whose data is also stored in a physical table. With an ordinary view we store only the query in the DB, and each time the view is called it extracts data from the base tables; in a materialized view, the data itself is stored in a table.

    39. Question 39. Give Some Popular Tools?

      Answer :

      Popular Tools:
      IBM WebSphere Information Integration (Ascential DataStage)
      Ab Initio
      Informatica
      Talend

    40. Question 40. Give Some Etl Tool Functionalities?

      Answer :

      While the selection of a database and a hardware platform is a must, the selection of an ETL tool is highly recommended, but it's not a must. When you evaluate ETL tools, it pays to look for the following characteristics:

      Functional capability: This includes both the 'transformation' piece and the 'cleansing' piece. In general, the typical ETL tools are either geared towards having strong transformation capabilities or having strong cleansing capabilities, but they are seldom very strong in both. As a result, if you know your data is going to be dirty coming in, make sure your ETL tool has strong cleansing capabilities. If you know there are going to be a lot of different data transformations, it then makes sense to pick a tool that is strong in transformation.

      Ability to read directly from your data source: For each organization, there is a different set of data sources. Make sure the ETL tool you select can connect directly to your source data.

      Metadata support: The ETL tool plays a key role in your metadata because it maps the source data to the destination, which is an important piece of the metadata. In fact, some organizations have come to rely on the documentation of their ETL tool as their metadata source. As a result, it is very important to select an ETL tool that works with your overall metadata strategy.

    41. Question 41. How To Fine Tune The Mappings?

      Answer :

      1. Use a filter condition in the Source Qualifier instead of a Filter transformation.
      2. Use persistent and shared caches in the Lookup transformation.
      3. In the Aggregator transformation, use sorted input and group-by ports.
      4. In expressions, use operators instead of functions.
      5. Increase the cache size.
      6. Increase the commit interval.

    42. Question 42. Where Do We Use Connected And Unconnected Lookups?

      Answer :

      If only one value needs to be returned, we can go for an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one value must be returned, go for a connected lookup.

    43. Question 43. What Are The Various Tools? - Name A Few.

      Answer :

      - Abinitio
      - DataStage
      - Informatica
      - Cognos Decision Stream
      - Oracle Warehouse Builder
      - Business Objects XI (Extreme Insight)
      - SAP Business Warehouse
      - SAS Enterprise ETL Server
