Input Data and Data Fields in SQL*Loader

7.4 Input Data and Data Fields in SQL*Loader

Learn how SQL*Loader loads data and identifies record fields.

How SQL*Loader Reads Input Data and Data Files
SQL*Loader reads data from one or more data files (or operating system equivalents of files) specified in the control file.
Fixed Record Format
A file is in fixed record format when all records in a data file are the same byte length.
Variable Record Format and SQL*Loader
A file is in variable record format when the length of each record in a character field is included at the beginning of each record in the data file.
Stream Record Format and SQL*Loader
A file is in stream record format when the records are not specified by size; instead SQL*Loader forms records by scanning for the record terminator.
Logical Records and SQL*Loader
SQL*Loader organizes input data into physical records, according to the specified record format. By default, a physical record is a logical record.
Data Field Setting and SQL*Loader
Learn how SQL*Loader determines the field setting on the logical record after a logical record is formed.

Parent topic: Understanding How to Use SQL*Loader

7.4.1 How SQL*Loader Reads Input Data and Data Files

SQL*Loader reads data from one or more data files (or operating system equivalents of files) specified in the control file.

From SQL*Loader's perspective, the data in the data file is organized as records. A particular data file can be in fixed record format, variable record format, or stream record format. The record format can be specified in the control file with the INFILE parameter. If no record format is specified, then the default is stream record format.

Note:

If data is specified inside the control file (that is, INFILE * was specified in the control file), then the data is interpreted in the stream record format with the default record terminator.

Parent topic: Input Data and Data Fields in SQL*Loader

7.4.2 Fixed Record Format

A file is in fixed record format when all records in a data file are the same byte length.

Although the fixed record format is the least flexible format, using it results in better performance than variable or stream format. Fixed format is also simple to specify. For example:

INFILE datafile_name "fix n"

This example specifies that SQL*Loader should interpret the particular data file as being in fixed record format where every record is n bytes long.

The following example shows a control file that specifies a data file (example1.dat) to be interpreted in the fixed record format. The data file in the example contains five physical records; each record has fields that contain the number and name of an employee. Each of the five records is 11 bytes long, including spaces. For the purposes of explaining this example, periods are used to represent spaces in the records, but in the actual records there would be no periods. With that in mind, the first physical record is 396,...ty,. which is exactly eleven bytes (assuming a single-byte character set). The second record is 4922,beth, followed by the newline character (\n) which is the eleventh byte, and so on. (Newline characters are not required with the fixed record format; it is simply used here to illustrate that if used, it counts as a byte in the record length.)

Example 7-1 Loading Data in Fixed Record Format

Loading data:

load data
infile 'example1.dat'  "fix 11"
into table example
fields terminated by ',' optionally enclosed by '"'
(col1, col2)

Contents of example1.dat:

396,...ty,.4922,beth,\n
68773,ben,.
1,.."dave",
5455,mike,.

Note that the length is always interpreted in bytes, even if character-length semantics are in effect for the file. This is necessary because the file can contain a mix of fields. Some are processed with character-length semantics, and others are processed with byte-length semantics.

Related Topics

Character-Length Semantics

Parent topic: Input Data and Data Fields in SQL*Loader

7.4.3 Variable Record Format and SQL*Loader

A file is in variable record format when the length of each record in a character field is included at the beginning of each record in the data file.

This format provides some added flexibility over the fixed record format and a performance advantage over the stream record format. For example, you can specify a data file that is to be interpreted as being in variable record format as follows:

INFILE "datafile_name" "var n"

In this example, n specifies the number of bytes in the record length field. If n is not specified, then SQL*Loader assumes a length of 5 bytes. Specifying n larger than 40 results in an error.

The following example shows a control file specification that tells SQL*Loader to look for data in the data file example2.dat and to expect variable record format where the record's first three bytes indicate the length of the field. The example2.dat data file consists of three physical records. The first is specified to be 009 (9) bytes long, the second is 010 (10) bytes long (plus a 1-byte newline), and the third is 012 (12) bytes long (plus a 1-byte newline). Note that newline characters are not required with the variable record format. This example also assumes a single-byte character set for the data file. For the purposes of this example, periods in example2.dat represent spaces; the fields do not contain actual periods.

Example 7-2 Loading Data in Variable Record Format

Loading data:

load data
infile 'example2.dat'  "var 3"
into table example
fields terminated by ',' optionally enclosed by '"'
(col1 char(5),
 col2 char(7))

Contents of example2.dat:

009.396,.ty,0104922,beth,01268773,benji,

Note that the lengths are always interpreted in bytes, even if character-length semantics are in effect for the file. This is necessary because the file can contain a mix of fields, some processed with character-length semantics and others processed with byte-length semantics.

Related Topics

Character-Length Semantics

Parent topic: Input Data and Data Fields in SQL*Loader

7.4.4 Stream Record Format and SQL*Loader

A file is in stream record format when the records are not specified by size; instead SQL*Loader forms records by scanning for the record terminator.

Stream record format is the most flexible format, but using it can result in a negative effect on performance. The specification of a data file to be interpreted as being in stream record format looks similar to the following:

INFILE datafile_name ["str terminator_string"]

In the preceding example, str indicates that the file is in stream record format. The terminator_string is specified as either 'char_string' or X'hex_string' where:

'char_string' is a string of characters enclosed in single or double quotation marks
X'hex_string' is a byte string in hexadecimal format

When the terminator_string contains special (nonprintable) characters, it should be specified as a X'hex_string' byte string. However, you can specify some nonprintable characters as ('char_string') by using a backslash. For example:

\n indicates a line feed
\t indicates a horizontal tab
\f indicates a form feed
\v indicates a vertical tab
\r indicates a carriage return

If the character set specified with the NLS_LANG initialization parameter for your session is different from the character set of the data file, then character strings are converted to the character set of the data file. This is done before SQL*Loader checks for the default record terminator.

Hexadecimal strings are assumed to be in the character set of the data file, so no conversion is performed.

On UNIX-based platforms, if no terminator_string is specified, then SQL*Loader defaults to the line feed character, \n.

On Windows-based platforms, if no terminator_string is specified, then SQL*Loader uses either \n or \r\n as the record terminator, depending on which one it finds first in the data file. This means that if you know that one or more records in your data file has \n embedded in a field, but you want \r\n to be used as the record terminator, then you must specify it.

The following example illustrates loading data in stream record format where the terminator string is specified using a character string, '|\n'. The use of the backslash character allows the character string to specify the nonprintable line feed character.

7.4.5 Logical Records and SQL*Loader

SQL*Loader organizes input data into physical records, according to the specified record format. By default, a physical record is a logical record.

For added flexibility, SQL*Loader can be instructed to combine several physical records into a logical record.

SQL*Loader can be instructed to follow one of the following logical record-forming strategies:

Combine a fixed number of physical records to form each logical record.
Combine physical records into logical records while a certain condition is true.

Related Topics

Parent topic: Input Data and Data Fields in SQL*Loader

7.4.6 Data Field Setting and SQL*Loader

Learn how SQL*Loader determines the field setting on the logical record after a logical record is formed.

Field setting is a process in which SQL*Loader uses control-file field specifications to determine which parts of logical record data correspond to which control-file fields. It is possible for two or more field specifications to claim the same data. Also, it is possible for a logical record to contain data that is not claimed by any control-file field specification.

Most control-file field specifications claim a particular part of the logical record. This mapping takes the following forms:

The byte position of the data field's beginning, end, or both, can be specified. This specification form is not the most flexible, but it provides high field-setting performance.
The strings delimiting (enclosing, terminating, or both) a particular data field can be specified. A delimited data field is assumed to start where the last data field ended, unless the byte position of the start of the data field is specified.
You can specify the byte offset, the length of the data field, or both. This way each field starts a specified number of bytes from where the last one ended and continues for a specified length.
Length-value data types can be used. In this case, the first n number of bytes of the data field contain information about how long the rest of the data field is.

Starting with Oracle Database 23ai, you can use SQL*Loader to load schemaless documents (documents that lack a fixed data structure, such as JSON or XML-based application data) into Oracle Database as SODA collections.

Related Topics

Parent topic: Input Data and Data Fields in SQL*Loader