ETL Testing Interview Questions & Answers



If you are looking for a job in Automation Testing combined with ETL (Extract, Transform and Load) Testing using UFT/Selenium, then this is the right platform. ETL stands for Extract-Transform-Load, and it is the process by which data is loaded from the source system into the data warehouse. Data is extracted from an OLTP database, transformed to match the data warehouse schema, and loaded into the data warehouse database. There are many jobs in ETL testing; some of the most popular are software engineer, QA analyst, ETL test engineer, quality analyst, manual tester, ETL developer, data engineer, data consultant, Python developer, data lead, ETL architect, integration engineer, and big data developer and analyst. Please have a look at our ETL testing interview questions and answers page to get shortlisted in your interview.

ETL Testing Interview Questions

    1. Question 1. What Is Etl?

      Answer :

      ETL stands for extraction, transformation and loading.

      ETL tools provide developers with an interface for designing source-to-target mappings, transformations and job control parameters.

      Extraction :

      Take data from an external source and move it to the warehouse pre-processor database.

      Transformation:

      The transform data task allows point-to-point generation, modification and transformation of data.

      Loading:

      Load data task adds records to a database table in a warehouse.
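      As a toy illustration (not part of the original answer), the three steps can be sketched in Python, using in-memory SQLite databases to stand in for the source system and the warehouse; all table and column names here are invented:

```python
import sqlite3

# Set up a throwaway source system and warehouse (both in-memory).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, "10.50"), (2, "20.00"), (3, "5.25")])

dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE fact_orders (id INTEGER, amount REAL)")

# Extract: take data from the external source.
rows = src.execute("SELECT id, amount FROM orders").fetchall()

# Transform: cast the amount to match the warehouse schema.
transformed = [(oid, float(amount)) for oid, amount in rows]

# Load: add the records to the warehouse table.
dwh.executemany("INSERT INTO fact_orders VALUES (?, ?)", transformed)
dwh.commit()

total = dwh.execute("SELECT SUM(amount) FROM fact_orders").fetchone()[0]
print(total)  # 35.75
```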

    2. Question 2. What Is The Difference Between Etl Tool And Olap Tools?

      Answer :

      An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some cleansing of the data.

      ex: Informatica, DataStage, etc.

      An OLAP tool is meant for reporting; in OLAP, data is available in a multidimensional model, so you can write a simple query to extract data from the database.

      ex: Business Objects, Cognos, etc.

    3. Question 3. Can We Lookup A Table From Source Qualifier Transformation. Ie. Unconnected Lookup?

      Answer :

      You cannot perform a lookup from a Source Qualifier directly. However, you can override the SQL in the Source Qualifier to join with the lookup table and perform the lookup.
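      To illustrate the idea (a sketch, with made-up table and column names), the overridden source query simply joins the source with the lookup table, so each extracted row already carries the looked-up value:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, amount REAL);
CREATE TABLE cust_lookup (cust_id INTEGER, cust_name TEXT);
INSERT INTO orders VALUES (1, 100, 50.0), (2, 200, 75.0);
INSERT INTO cust_lookup VALUES (100, 'Acme'), (200, 'Globex');
""")

# Instead of a separate lookup transformation, the source query itself
# joins the lookup table, performing the lookup during extraction.
override_sql = """
SELECT o.order_id, o.amount, c.cust_name
FROM orders o
JOIN cust_lookup c ON o.cust_id = c.cust_id
ORDER BY o.order_id
"""
rows = con.execute(override_sql).fetchall()
print(rows)  # [(1, 50.0, 'Acme'), (2, 75.0, 'Globex')]
```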

    4. Question 4. What Is Ods (operation Data Source)?

      Answer :

      • ODS - Operational Data Store.
      • The ODS sits between the staging area and the data warehouse. The data in the ODS is at a low level of granularity.
      • Once data is populated in the ODS, aggregated data is loaded into the EDW through the ODS.

    5. Question 5. Where Do We Use Connected And Unconnected Lookups?

      Answer :

      • If only one port needs to be returned, we can go for an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one port must be returned, go for a connected lookup.
      • If you require a dynamic cache, i.e. where your data will change dynamically, then you can go for a connected lookup. If your data is static and won't change while the session loads, you can go for an unconnected lookup.

    6. Question 6. Where Do We Use Semi And Non Additive Facts?

      Answer :

      Additive: a measure that can participate in arithmetic calculations using all or any dimensions.

      Ex: sales profit

      Semi-additive: a measure that can participate in arithmetic calculations using only some dimensions.

      Ex: sales amount

      Non-additive: a measure that cannot participate in arithmetic calculations using any dimension.

      Ex: temperature
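      A small, hypothetical example in plain Python shows why the distinction matters when rolling up fact rows:

```python
# Rows of a toy fact table: (region, date, sales_amount, temperature)
facts = [
    ("east", "2024-01-01", 100.0, 20.0),
    ("east", "2024-01-02", 150.0, 22.0),
    ("west", "2024-01-01", 200.0, 15.0),
]

# Additive: sales amounts can be summed across every dimension.
total_sales = sum(row[2] for row in facts)

# Non-additive: summing temperatures across rows is meaningless;
# a rollup would use an average (or a count) instead.
avg_temp = sum(row[3] for row in facts) / len(facts)

print(total_sales, avg_temp)  # 450.0 19.0
```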

    7. Question 7. What Are Non-additive Facts In Detail?

      Answer :

      • A fact may be a measure, a metric or a dollar value. Measures and metrics are non-additive facts.
      • A dollar value is an additive fact: if we want to find the amount for a particular place over a particular period of time, we can add the dollar amounts and come up with the total.
      • For a non-additive fact, e.g. the measure height for 'citizens by geographical location', when we roll up 'city' data to 'state' level we should not add the heights of the citizens; rather we may want to use them to derive a 'count'.

    8. Question 8. What Is A Staging Area? Do We Need It? What Is The Purpose Of A Staging Area?

      Answer :

      Data staging is actually a collection of processes used to prepare source system data for loading a data warehouse. Staging includes the following steps:

      • Source data extraction,
      • Data transformation (restructuring),
      • Data transformation (data cleansing, value transformations),
      • Surrogate key assignment.
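      The surrogate key assignment step can be sketched as follows (a minimal illustration; a real staging job would persist the key map, and the natural keys here are invented):

```python
# Staging step: assign warehouse surrogate keys to incoming natural keys.
surrogate_map = {}   # natural key -> surrogate key
next_key = 1

def assign_surrogate(natural_key):
    """Return the existing surrogate key, or mint a new one."""
    global next_key
    if natural_key not in surrogate_map:
        surrogate_map[natural_key] = next_key
        next_key += 1
    return surrogate_map[natural_key]

# A repeated natural key gets the same surrogate key back.
keys = [assign_surrogate(nk) for nk in ["CUST-9", "CUST-3", "CUST-9"]]
print(keys)  # [1, 2, 1]
```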

    9. Question 9. What Is Latest Version Of Power Center / Power Mart?

      Answer :

      At the time this answer was written, the latest version was 7.2.

    10. Question 10. What Are The Modules In Power Mart?

      Answer :

      1.  PowerMart Designer
      2.  Server
      3.  Server Manager
      4.  Repository
      5.  Repository Manager

    11. Question 11. What Are Active Transformation / Passive Transformations?

      Answer :

      • Active transformation can change the number of rows that pass through it. (Decrease or increase rows)
      • Passive transformation cannot change the number of rows that pass through it.

    12. Question 12. What Are The Different Lookup Methods Used In Informatica?

      Answer :

      Connected lookup:

      A connected lookup receives input from the pipeline and sends output to the pipeline; it can return any number of values and does not contain a return port.

      Unconnected lookup:

      An unconnected lookup can return only one column; it contains a return port.

    13. Question 13. Can Informatica Load Heterogeneous Targets From Heterogeneous Sources?

      Answer :

      • No, in Informatica 5.2 and earlier.
      • Yes, in Informatica 6.1 and later.

    14. Question 14. How Do We Call Shell Scripts From Informatica?

      Answer :

      Specify the full path of the shell script in the post-session properties of the session/workflow.

    15. Question 15. What Is Informatica Metadata And Where Is It Stored?

      Answer :

      Informatica metadata is data about data, and it is stored in Informatica repositories.

    16. Question 16. What Is A Mapping, Session, Worklet, Workflow, Mapplet?

      Answer :

      • A mapping represents dataflow from sources to targets.
      • A mapplet creates or configures a set of transformations.
      • A workflow is a set of instructions that tell the Informatica server how to execute the tasks.
      • A worklet is an object that represents a set of tasks.
      • A session is a set of instructions that describe how and when to move data from sources to targets.

    17. Question 17. What Are Parameter Files? Where Do We Use Them?

      Answer :

      A parameter file defines the values for parameters and variables used in a workflow, worklet or session.

    18. Question 18. Can We Override A Native Sql Query Within Informatica? Where Do We Do It? How Do We Do It?

      Answer :

      Yes, we can override a native SQL query in the Source Qualifier and Lookup transformations.

      In the Lookup transformation we can find the "SQL Override" option in the lookup properties; by using this option we can do it.

    19. Question 19. Can We Use Procedural Logic Inside Infromatica? If Yes How , If No How Can We Use External Procedural Logic In Informatica?

      Answer :

      Yes, you can use an advanced external transformation. You can use the C++ language on UNIX, and C++, VB or VC++ on Windows servers.

    20. Question 20. Do We Need An Etl Tool? When Do We Go For The Tools In The Market?

      Answer :

      ETL Tool:

      It is used to Extract (E) data from multiple source systems (like RDBMS, flat files, mainframes, SAP, XML etc.), Transform (T) it based on business requirements, and Load (L) it into target locations (like tables, files etc.).

      Need of ETL Tool:

      An ETL tool is typically required when data is scattered across different systems (like RDBMS, flat files, mainframes, SAP, XML etc.).

    21. Question 21. How To Determine What Records To Extract?

      Answer :

      When addressing a table, some dimension key must reflect the need for a record to get extracted. Mostly it will come from the time dimension (e.g. date >= 1st of current month) or a transaction flag (e.g. an "order invoiced" status). A foolproof approach would be adding an archive flag to each record, which gets reset when the record changes.
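      For example, the time-dimension filter plus transaction flag might look like this in Python (field names are invented for illustration):

```python
from datetime import date

records = [
    {"order_id": 1, "order_date": date(2024, 5, 28), "invoiced": True},
    {"order_id": 2, "order_date": date(2024, 6, 3),  "invoiced": True},
    {"order_id": 3, "order_date": date(2024, 6, 10), "invoiced": False},
]

# Pretend "today" is mid-June 2024; extract from the 1st of the month.
today = date(2024, 6, 15)
month_start = today.replace(day=1)

# Time-dimension condition plus a transaction flag condition.
to_extract = [r for r in records
              if r["order_date"] >= month_start and r["invoiced"]]
print([r["order_id"] for r in to_extract])  # [2]
```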

    22. Question 22. What Is Full Load & Incremental Or Refresh Load?

      Answer :

      Full Load: completely erasing the contents of one or more tables and reloading with fresh data.

      Incremental Load: applying ongoing changes to one or more tables based on a predefined schedule.
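      The two strategies can be sketched with SQLite (a toy example with an invented dimension table; the incremental load is implemented as an upsert, which assumes SQLite 3.24+ for ON CONFLICT support):

```python
import sqlite3

dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT)")

def full_load(all_rows):
    # Full load: completely erase the table, then reload fresh data.
    dwh.execute("DELETE FROM dim_product")
    dwh.executemany("INSERT INTO dim_product VALUES (?, ?)", all_rows)

def incremental_load(changed_rows):
    # Incremental load: apply only new/changed rows (upsert).
    dwh.executemany(
        "INSERT INTO dim_product VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        changed_rows)

full_load([(1, "bolt"), (2, "nut")])
incremental_load([(2, "lock nut"), (3, "washer")])  # one update, one insert
rows = dwh.execute("SELECT * FROM dim_product ORDER BY id").fetchall()
print(rows)  # [(1, 'bolt'), (2, 'lock nut'), (3, 'washer')]
```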

    23. Question 23. When Do We Analyze The Tables? How Do We Do It?

      Answer :

      The ANALYZE statement allows you to validate and compute statistics for an index, table, or cluster. These statistics are used by the cost-based optimizer when it calculates the most efficient plan for retrieval. In addition to its role in statement optimization, ANALYZE also helps in validating object structures and in managing space in your system. You can choose among the following operations: COMPUTE, ESTIMATE, and DELETE. Early versions of Oracle7 produced unpredictable results when the ESTIMATE operation was used, so it is best to compute your statistics.

      EX:
      select owner,
             sum(decode(nvl(num_rows, 9999), 9999, 0, 1)) analyzed,
             sum(decode(nvl(num_rows, 9999), 9999, 1, 0)) not_analyzed,
             count(table_name) total
      from dba_tables
      where owner not in ('SYS', 'SYSTEM')
      group by owner;

    24. Question 24. Compare Etl & Manual Development?

      Answer :

      These are some differences between ETL and manual development.
      ETL

      • The process of extracting data from multiple sources (e.g. flat files, XML, COBOL, SAP etc.) is simpler with the help of tools.
      • High and clear visibility of logic.
      • Contains metadata, and changes can be done easily.
      • Error handling, log summaries and load-progress reporting make life easier for the developer and the maintainer.
      • Can handle historic data very well.


      Manual

      • Loading data from anything other than flat files and Oracle tables needs more effort.
      • Complex and not so user-friendly visibility of logic.
      • No metadata concept, and changes need more effort.
      • Needs maximum effort from a maintenance point of view.
      • As data grows, processing time degrades.

    25. Question 25. What Is Real Time Data Warehousing?

      Answer :

      • In real time data-warehousing, the warehouse is updated every time the system performs a transaction.
      • It reflects the real time business data.
      • This means that when the query is fired in the warehouse, the state of the business at that time will be returned.

    26. Question 26. Explain The Use Lookup Tables And Aggregate Tables?

      Answer :

      • An aggregate table contains summarized view of data.
      • Lookup tables, using the primary key of the target, allow updating of records based on the lookup condition.

    27. Question 27. Define Slowly Changing Dimensions (scd)?

      Answer :

      SCD are dimensions whose data changes very slowly.
      eg: city or an employee.

      • This dimension will change very slowly.
      • The row of this data in the dimension can be either replaced completely without any track of old record OR a new row can be inserted, OR the change can be tracked.

    28. Question 28. What Is Cube Grouping?

      Answer :

      A transformer built set of similar cubes is known as cube grouping. They are generally used in creating smaller cubes that are based on the data in the level of dimension.

    29. Question 29. What Is Data Warehousing?

      Answer :

      • A data warehouse can be considered as a storage area where relevant data is stored irrespective of the source.
      • Data warehousing merges data from multiple sources into an easy and complete form.

    30. Question 30. What Is Virtual Data Warehousing?

      Answer :

      A virtual data warehouse provides a collective view of the complete data. It can be considered a logical data model containing metadata.

    31. Question 31. What Is Active Data Warehousing?

      Answer :

      An active data warehouse represents a single state of the business. It considers the analytic perspectives of customers and suppliers. It helps to deliver the updated data through reports.

    32. Question 32. What Is Data Modeling And Data Mining?

      Answer :

      Data Modeling is a technique used to define and analyze the requirements of data that support an organization’s business processes. In simple terms, it is used to analyze data objects in order to identify the relationships among them in any business.

      Data Mining is a technique used to analyze datasets to derive useful insights/information. It is mainly used in retail, consumer goods, telecommunication and financial organizations that have a strong consumer orientation in order to determine the impact on sales, customer satisfaction and profitability.

    33. Question 33. What Are Critical Success Factors?

      Answer :

      Key areas of activity in which favorable results are necessary for a company to attain its goals.
      There are four basic types of CSFs which are:

      • Industry CSFs
      • Strategy CSFs
      • Environmental CSFs
      • Temporal CSFs

    34. Question 34. What Is Data Cube Technology Used For?

      Answer :

      Data cubes are commonly used for easy interpretation of data. A cube represents data along dimensions against some measures of business need; each dimension of the cube represents some attribute of the database, e.g. profit per day, month or year.

    35. Question 35. What Is Data Cleaning?

      Answer :

      • Data cleaning is also known as data scrubbing.
      • Data cleaning is a process which ensures a set of data is correct and accurate. Data accuracy, consistency and integration are checked during data cleaning. It can be applied to a single set of records or to multiple sets of data which need to be merged.
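      A minimal sketch of record-level cleaning, with invented validation rules for illustration (an "@" check for emails, type normalization for ages):

```python
raw = [
    {"id": "1", "email": "a@example.com", "age": "34"},
    {"id": "2", "email": "not-an-email",  "age": "41"},
    {"id": "3", "email": "c@example.com", "age": ""},
]

def clean(record):
    """Return a corrected copy of the record, or None if it is unusable."""
    rec = {k: v.strip() for k, v in record.items()}   # trim whitespace
    if "@" not in rec["email"]:
        return None                                   # accuracy check failed
    rec["age"] = int(rec["age"]) if rec["age"] else None  # type consistency
    return rec

cleaned = [c for c in (clean(r) for r in raw) if c is not None]
print(len(cleaned))  # 2
```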

    36. Question 36. Explain How To Mine An Olap Cube?

      Answer :

      Data mining extensions can be used for slicing the data of the source cube into the discovered data mining model.

      The case table is dimensioned at the time of mining a cube.

    37. Question 37. What Are Different Stages Of Data Mining?

      Answer :

      Data mining is a logical process of searching large amounts of information to find important data. Its stages are:

      Stage 1: Exploration: one will want to explore and prepare the data. The goal of the exploration stage is to find the important variables and determine their nature.
      Stage 2: Pattern identification: searching for patterns and choosing the one which allows making the best prediction is the primary action in this stage.
      Stage 3: Deployment: this stage cannot be reached until a consistent, highly predictive pattern has been found in stage 2. The pattern found in stage 2 can then be applied to see whether the desired outcome is achieved.

    38. Question 38. What Are The Different Problems That Data Mining Can Solve?

      Answer :

      Data mining can be used in a variety of fields/industries, like marketing of products and services, AI and government intelligence.

      The US FBI uses data mining for screening security and intelligence and for identifying illegal and incriminating e-information distributed over the internet.

    39. Question 39. What Is Data Purging?

      Answer :

      Deleting data from the data warehouse is known as data purging. Usually junk data, like rows with null values or spaces, is cleaned up; data purging is the process of removing such junk values.
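      A simple purge routine might look like this (the junk criteria are illustrative only — nulls and whitespace-only fields):

```python
rows = [
    ("1001", "Smith"),
    (None,   "Jones"),    # null key: junk
    ("1003", "   "),      # blank name: junk
    ("1004", "Patel"),
]

def is_junk(row):
    """A row is junk if any field is NULL or only whitespace."""
    return any(v is None or str(v).strip() == "" for v in row)

kept = [r for r in rows if not is_junk(r)]
purged = len(rows) - len(kept)
print(kept, purged)  # [('1001', 'Smith'), ('1004', 'Patel')] 2
```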

    40. Question 40. What Is Bus Schema?

      Answer :

      A BUS schema identifies the common dimensions across business processes, i.e. the conforming dimensions. It has conformed dimensions and standardized definitions of facts.

    41. Question 41. Define Non-additive Facts?

      Answer :

      Non-additive facts are facts that cannot be summed up over any of the dimensions present in the fact table. These columns cannot be added to produce any meaningful result.

    42. Question 42. What Is Conformed Fact? What Are Conformed Dimensions Used For?

      Answer :

      A conformed fact keeps the same name and definition in separate tables across a warehouse, so the values can be compared and combined mathematically. Conformed dimensions can be used across multiple data marts; they have a static structure. Any dimension table that is used by multiple fact tables can be a conformed dimension.

    43. Question 43. What Is A Three Tier Data Warehouse?

      Answer :

      A data warehouse can be thought of as a three-tier system in which a middle system provides usable data in a secure way to end users. On either side of this middle system are the end users and the back-end data stores.

    44. Question 44. What Is The Metadata Extension?

      Answer :

      • Informatica allows end users and partners to extend the metadata stored in the repository by associating information with individual objects in the repository. For example, when you create a mapping, you can store your contact information with the mapping. You associate information with repository metadata using metadata extensions.
      • Informatica Client applications can contain the following types of metadata extensions:
      • Vendor-defined. Third-party application vendors create vendor-defined metadata extensions. You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them.
      • User-defined. You create user-defined metadata extensions using PowerCenter/PowerMart. You can create, edit, delete, and view user-defined metadata extensions. You can also change the values of user-defined extensions.

    45. Question 45. How Can We Use Mapping Variables In Informatica? Where Do We Use Them?

      Answer :

      Yes, we can use mapping variables in Informatica.
      The Informatica server saves the value of a mapping variable to the repository at the end of a session run and uses that value the next time we run the session.

    46. Question 46. What Is Etl Process ?how Many Steps Etl Contains Explain With Example?

      Answer :

      ETL is the extraction, transformation and loading process: you extract data from the source, apply the business rules to it, and then load it into the target. The steps are:

      • define the source (create the ODBC connection to the source DB)
      • define the target (create the ODBC connection to the target DB)
      • create the mapping (you apply the business rules here by adding transformations, and define how the data will flow from the source to the target)
      • create the session (a set of instructions that runs the mapping)
      • create the workflow (instructions that run the session)

    47. Question 47. Give Some Popular Tools?

      Answer :

      Popular Tools:

      • IBM WebSphere Information Integration (Ascential DataStage)
      • Ab Initio
      • Informatica
      • Talend

    48. Question 48. Give Some Etl Tool Functionalities?

      Answer :

      While the selection of a database and a hardware platform is a must, the selection of an ETL tool is highly recommended, but it's not a must. When you evaluate ETL tools, it pays to look for the following characteristics:

      • Functional capability: This includes both the 'transformation' piece and the 'cleansing' piece. In general, the typical ETL tools are either geared towards having strong transformation capabilities or having strong cleansing capabilities, but they are seldom very strong in both. As a result, if you know your data is going to be dirty coming in, make sure your ETL tool has strong cleansing capabilities. If you know there are going to be a lot of different data transformations, it then makes sense to pick a tool that is strong in transformation.
      • Ability to read directly from your data source: For each organization, there is a different set of data sources. Make sure the ETL tool you select can connect directly to your source data.
      • Metadata support: The ETL tool plays a key role in your metadata because it maps the source data to the destination, which is an important piece of the metadata. In fact, some organizations have come to rely on the documentation of their ETL tool as their metadata source. As a result, it is very important to select an ETL tool that works with your overall metadata strategy.

    49. Question 49. How Can I Edit The Xml Target, Are There Anyways Apart From The Editing The Xsd File. Can I Directly Edit The Xml Directly In Informatica Designer?

      Answer :

      No, you cannot edit it from Informatica Designer. But you can still change the precision of the ports if the XML source is imported from a DTD file.

    50. Question 50. What Is The Difference Between Informatica 7.0&8.0?

      Answer :

      The main difference between Informatica 7 and 8 is that 8 has a SOA (Service-Oriented Architecture) whereas 7 does not. SOA in Informatica is handled through different grids designed in the server.

    51. Question 51. Where Do We Use Connected And Un Connected Lookups?

      Answer :

      If only one port needs to be returned, we can go for an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one port must be returned, go for a connected lookup.

    52. Question 52. What Are The Various Tools? - Name A Few.

      Answer :

      •  Abinitio
      •  DataStage
      •  Informatica
      •  Cognos Decision Stream
      •  Oracle Warehouse Builder
      •  Business Objects XI (Extreme Insight)
      •  SAP Business Warehouse
      •  SAS Enterprise ETL Server

    53. Question 53. What Are The Various Test Procedures Used To Check Whether The Data Is Loaded In The Backend, Performance Of The Mapping, And Quality Of The Data Loaded In Informatica?

      Answer :

      The best procedure is to take the help of the debugger, where we can monitor each and every process of the mapping and see how data loads based on condition breakpoints.

    54. Question 54. What Is The Difference Between Joiner And Lookup?

      Answer :

      A Joiner is used to join two or more tables to retrieve data from them (just like joins in SQL).

      A Lookup is used to check and compare a source table and a target table (just like a correlated sub-query in SQL).

    55. Question 55. If A Flat File Contains 1000 Records How Can I Get First And Last Records Only?

      Answer :

      By using an Aggregator transformation with the FIRST and LAST functions we can get the first and last records.
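      Outside Informatica, the same requirement can be met with a single pass over the file; here is a Python sketch using an in-memory stand-in for the 1000-record flat file:

```python
import io

# Stand-in for a 1000-record flat file.
flat_file = io.StringIO("\n".join(f"rec{i}" for i in range(1, 1001)))

first = last = None
for line in flat_file:          # single pass, constant memory
    line = line.rstrip("\n")
    if first is None:
        first = line            # remember the very first record
    last = line                 # overwritten until the final record

print(first, last)  # rec1 rec1000
```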

    56. Question 56. How Do You Calculate Fact Table Granularity?

      Answer :

      Granularity is the level of detail that the fact table describes; for example, if we are doing time analysis, the granularity may be day-based, month-based or year-based.
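      For instance, the same day-grain facts roll up to different granularities; this sketch keys the rollup on ISO date prefixes (an illustrative trick, not a general date library):

```python
from collections import defaultdict

# Day-grain fact rows: (date, amount)
facts = [("2024-06-01", 10), ("2024-06-01", 5), ("2024-06-02", 7),
         ("2024-07-01", 3)]

def rollup(rows, grain):
    # grain = number of leading characters of the date key to keep:
    # 10 -> day "YYYY-MM-DD", 7 -> month "YYYY-MM", 4 -> year "YYYY".
    out = defaultdict(int)
    for day, amount in rows:
        out[day[:grain]] += amount
    return dict(out)

print(rollup(facts, 7))  # {'2024-06': 22, '2024-07': 3}
print(rollup(facts, 4))  # {'2024': 25}
```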

    57. Question 57. What Are The Different Versions Of Informatica?

      Answer :

      Here are some popular versions of Informatica.

      • Informatica Powercenter 4.1,
      • Informatica Powercenter 5.1,
      • Powercenter Informatica 6.1.2,
      • Informatica Powercenter 7.1.2,
      • Informatica Powercenter 8.1,
      • Informatica Powercenter 8.5,
      • Informatica Powercenter 8.6.

    58. Question 58. Techniques Of Error Handling - Ignore , Rejecting Bad Records To A Flat File , Loading The Records And Reviewing Them (default Values)?

      Answer :

      Records are rejected either at the database, due to a constraint or key violation, or at the Informatica server when writing data into the target table. These rejected records can be found in the bad files folder, where a reject file is created for each session; there we can check why a record was rejected. The bad file contains a row indicator in its first column and a column indicator in its second column.

      The row indicators are of four types:
      D - valid data,
      O - overflowed data,
      N - null data,
      T - truncated data.
      Depending on these indicators we can make changes to load the data successfully into the target.
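      A sketch of classifying rejected rows by their leading indicator (the file format here is simplified for illustration; real bad files also carry per-column indicators, which this sketch ignores):

```python
# Simplified stand-in for a reject ("bad") file: the first field is the
# row indicator, the remaining fields are the rejected column values.
ROW_INDICATORS = {"D": "valid", "O": "overflowed", "N": "null", "T": "truncated"}

bad_lines = [
    "O,9999999999,Smith",   # overflowed data
    "N,,Jones",             # null data
    "D,42,Patel",           # valid data (rejected for another reason)
]

summary = {}
for line in bad_lines:
    indicator = line.split(",", 1)[0]
    reason = ROW_INDICATORS.get(indicator, "unknown")
    summary[reason] = summary.get(reason, 0) + 1

print(summary)  # {'overflowed': 1, 'null': 1, 'valid': 1}
```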

    59. Question 59. What Is The Difference Between Power Center & Power Mart?

      Answer :

      PowerCenter - has the ability to organize repositories into a data mart domain and share metadata across repositories.
      PowerMart - only a local repository can be created.

    60. Question 60. What Are Snapshots? What Are Materialized Views & Where Do We Use Them? What Is A Materialized View Log?

      Answer :

      Snapshots are read-only copies of a master table located on a remote node, periodically refreshed to reflect changes made to the master table. Snapshots are mirrors or replicas of tables.

      Views are built using the columns from one or more tables. A single-table view can be updated, but a multi-table view cannot: a view allows insert, update and delete only when it has a single base table; if the view is based on columns from more than one table, those operations are not possible.

      Materialized view:
      A pre-computed table comprising aggregated or joined data from fact and possibly dimension tables. Also known as a summary or aggregate table.

