Parallel Data Load

Parallel data load can refer to the multithreading options you can use to optimize the five stages of the data load pipeline. It can also refer to importing multiple data files concurrently.

Parallelize Threads in the Data Load Pipeline

One aspect of parallel data load is pipeline optimization, which you configure with the settings DLTHREADSPREPARE and DLTHREADSWRITE. The minimum and default number of threads allocated for a data load is 5 (one thread per stage of the pipeline). These settings let you add threads to the Prepare and Write stages. They take effect only when DLSINGLETHREADPERSTAGE is set to FALSE; otherwise Essbase uses a single thread per stage. For example, the following configuration increases the threads used in the Prepare and Write stages from 1 each to 4 each:

DLSINGLETHREADPERSTAGE Sample Basic FALSE
DLTHREADSPREPARE Sample Basic 4
DLTHREADSWRITE Sample Basic 4

With the above configuration, the data load runs with 11 threads: 4 in the Prepare stage, 4 in the Write stage, and 1 in each of the other three stages.
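The thread arithmetic for one pipeline can be sketched in Python. This is a minimal model, not Essbase code: the name `pipeline_threads` and its parameters are hypothetical. It assumes that the three stages other than Prepare and Write always keep one thread each, and that DLTHREADSPREPARE and DLTHREADSWRITE are ignored while DLSINGLETHREADPERSTAGE is TRUE.

```python
PIPELINE_STAGES = 5  # one default thread per stage of the data load pipeline


def pipeline_threads(prepare: int = 1, write: int = 1,
                     single_thread_per_stage: bool = True) -> int:
    """Return the number of threads one data load pipeline uses (hypothetical model).

    prepare / write stand in for DLTHREADSPREPARE / DLTHREADSWRITE;
    single_thread_per_stage stands in for DLSINGLETHREADPERSTAGE.
    """
    if prepare < 1 or write < 1:
        raise ValueError("each stage needs at least one thread")
    if single_thread_per_stage:
        # The Prepare and Write settings are ignored in this mode.
        return PIPELINE_STAGES
    # Assumption: the three remaining stages keep one thread each.
    other_stages = PIPELINE_STAGES - 2
    return other_stages + prepare + write


print(pipeline_threads())                                                  # 5
print(pipeline_threads(prepare=4, write=4, single_thread_per_stage=False))  # 11
```

The second call mirrors the Sample Basic configuration: 3 single-threaded stages plus 4 Prepare threads plus 4 Write threads gives 11.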

Note:

Parallel data load operations do not create threads dynamically; instead, they use a set number of threads from a pre-created thread pool. You can customize the size of the thread pool. For more information, refer to WORKERTHREADS.

Import Data Files Concurrently

Another aspect of parallel data load refers to the concurrent loading of multiple data files into an Essbase cube. When working with large data sets (for example, a set of ten 2 GB files), loading the data sources concurrently enables you to fully utilize the CPU resources and I/O channels of modern servers with multiple processors and high-performance storage subsystems.

You can also apply the pipeline thread settings to multiple-file data loads. For example, combining the above configuration with two data files creates two data load pipelines, each with 11 threads, for 22 threads in total.
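A rough way to estimate the total thread demand of a multiple-file load is to multiply the pipeline count by the threads per pipeline. The sketch below is a hypothetical helper, not part of Essbase; it assumes one pipeline per data file, all running at the same time, as in the two-file example.

```python
def total_threads(num_files: int, threads_per_pipeline: int) -> int:
    """Estimate threads across all pipelines (hypothetical model).

    Assumes one pipeline per data file and that all pipelines run at once.
    """
    if num_files < 1 or threads_per_pipeline < 1:
        raise ValueError("need at least one file and one thread per pipeline")
    return num_files * threads_per_pipeline


# Two files; each pipeline has 11 threads (4 Prepare + 4 Write + 3 other stages).
print(total_threads(2, 11))  # 22
```

Because these threads come from the pre-created pool, an estimate like this can be compared against the pool size you configure with WORKERTHREADS when planning a load.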

To load data in parallel, select an option:

Use Jobs in the Essbase web interface. Refer to Load Data.
Use Cube Designer. Refer to Load Data in Cube Designer.
Use MaxL: in the import data statement, specify multiple data files by using wildcard characters (* and/or ?) that match all the data source files you intend to load. If necessary, limit the number of threads used by the parallel data load with the using max_threads clause.
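Before running a wildcard import, it can help to preview which files a pattern selects. The following Python sketch uses the standard `fnmatch` module, where `*` matches any run of characters and `?` matches exactly one. It assumes that Essbase's wildcard handling behaves like shell-style matching, which this sketch does not verify; the file names and the helper `preview_matches` are hypothetical.

```python
import fnmatch


def preview_matches(file_names: list[str], pattern: str) -> list[str]:
    """Return the file names a shell-style wildcard pattern would select.

    Approximates, but is not guaranteed to match, Essbase's own wildcard rules.
    """
    return sorted(fnmatch.filter(file_names, pattern))


# Hypothetical directory listing.
files = ["sales_01.txt", "sales_02.txt", "sales_10.txt", "budget.txt", "sales_01.err"]

print(preview_matches(files, "sales_0?.txt"))  # ['sales_01.txt', 'sales_02.txt']
print(preview_matches(files, "sales_*.txt"))   # ['sales_01.txt', 'sales_02.txt', 'sales_10.txt']
```

The `?` pattern is narrower than `*`, which is useful when a directory also holds files you do not want in the parallel load.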