Introduction to Parallel Execution Tuning - Data Warehousing

Parallel execution dramatically reduces response time for data-intensive operations on large databases typically associated with decision support systems (DSS) and data warehouses. You can also implement parallel execution on certain types of online transaction processing (OLTP) and hybrid systems. Parallel execution improves processing for:

  • Queries requiring large table scans, joins, or partitioned index scans
  • Creation of large indexes
  • Creation of large tables (including materialized views)
  • Bulk inserts, updates, merges, and deletes

You can also use parallel execution to access object types within an Oracle database. For example, you can use parallel execution to access large objects (LOBs). Parallel execution benefits systems with all of the following characteristics:

  • Symmetric multiprocessors (SMPs), clusters, or massively parallel systems
  • Sufficient I/O bandwidth
  • Underutilized or intermittently used CPUs (for example, systems where CPU usage is typically less than 30%)
  • Sufficient memory to support additional memory-intensive processes, such as sorts, hashing, and I/O buffers

If your system lacks any of these characteristics, parallel execution might not significantly improve performance. In fact, parallel execution may reduce system performance on overutilized systems or systems with small I/O bandwidth.

When to Implement Parallel Execution
The benefits of parallel execution can be seen in DSS and data warehousing environments. OLTP systems can also benefit from parallel execution during batch processing and during schema maintenance operations such as creation of indexes. The average simple DML or SELECT statements that characterize OLTP applications would not see any benefit from being executed in parallel.

When Not to Implement Parallel Execution
Parallel execution is not normally useful for:

  • Environments in which the typical query or transaction is very short (a few seconds or less). This includes most online transaction systems. Parallel execution is not useful in these environments because there is a cost associated with coordinating the parallel execution servers; for short transactions, the cost of this coordination may outweigh the benefits of parallelism.
  • Environments in which the CPU, memory, or I/O resources are already heavily utilized. Parallel execution is designed to exploit additional available hardware resources; if no such resources are available, then parallel execution will not yield any benefits and indeed may be detrimental to performance.

Operations That Can Be Parallelized
You can use parallel execution for any of the following:

  • Access methods

    Some examples are table scans, index full scans, and partitioned index range scans.

  • Join methods

    Some examples are nested loop, sort merge, hash, and star transformation.

  • DDL statements

    Some examples are CREATE TABLE AS SELECT, CREATE INDEX, REBUILD INDEX, REBUILD INDEX PARTITION, and MOVE/SPLIT/COALESCE PARTITION.

You can normally use parallel DDL where you use regular DDL. There are, however, some additional details to consider when designing your database. One important restriction is that parallel DDL cannot be used on tables with object or LOB columns.

All of these DDL operations can be performed in NOLOGGING mode for either parallel or serial execution.

The CREATE TABLE statement for an index-organized table can be parallelized either with or without an AS SELECT clause.

Different parallelism is used for different operations. Parallel create (partitioned) table as select and parallel create (partitioned) index run with a degree of parallelism equal to the number of partitions.

Parallel operations require accurate statistics to perform optimally.

  • DML statements
  • Some examples are INSERT AS SELECT, updates, deletes, and MERGE operations.

Parallel DML (parallel insert, update, merge, and delete) uses parallel execution mechanisms to speed up or scale up large DML operations against large database tables and indexes. You can also use INSERT ... SELECT statements to insert rows into multiple tables as part of a single DML statement. You can normally use parallel DML where you use regular DML.

Although data manipulation language (DML) normally includes queries, the term parallel DML refers only to inserts, updates, upserts and deletes done in parallel.

  • Miscellaneous SQL operations
  • Some examples are GROUP BY, NOT IN, SELECT DISTINCT, UNION, UNION ALL, CUBE, and ROLLUP, as well as aggregate and table functions.

  • Parallel query
  • You can parallelize queries and subqueries in SELECT statements, as well as the query portions of DDL statements and DML statements (INSERT, UPDATE, DELETE, and MERGE).

  • SQL*Loader
  • You can parallelize the use of SQL*Loader, where large amounts of data are routinely encountered. To speed up your loads, you can use a parallel direct-path load as in the following example:

sqlldr USERID=SCOTT/TIGER CONTROL=LOAD1.CTL DIRECT=TRUE PARALLEL=TRUE
sqlldr USERID=SCOTT/TIGER CONTROL=LOAD2.CTL DIRECT=TRUE PARALLEL=TRUE
sqlldr USERID=SCOTT/TIGER CONTROL=LOAD3.CTL DIRECT=TRUE PARALLEL=TRUE

You can also use a parameter file to achieve the same thing.

An important point to remember is that indexes are not maintained during a parallel load.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Data Warehousing Topics