22.10 PARSE Function
This function enables you to parse XML, XLSX, CSV, or JSON files and returns a generic table of the following structure:
LINE_NUMBER COL001 COL002 COL003 COL004 ... COL300
Values are generally returned in VARCHAR2
format. A returned table row can have a maximum of 300 columns. The maximum length for a VARCHAR2 table column is 4000 bytes; there is no line length maximum. 20 out of the 300 supported columns can be handled as a CLOB.
File parsing happens on-the-fly as this function is invoked. Data does not write to a collection nor to a temporary table.
About Parsing File Profiles
If the p_file_profile
parameter is not passed, the function computes a file profile with column information during parsing.
If p_detect_data_types
is passed as Y
(default), the function also detects column data types during parsing. Retrieve the computed file profile using GET_FILE_PROFILE
after the function finishes:
- Invoke
PARSE
- Use this table function to parse the files and get rows and columns in order to display a data preview. - Invoke
GET_FILE_PROFILE
- Retrieve file profile information in JSON format. - Process the data - Generate a SQL query based on the data profile to perform custom processing.
Note:
XLSX parsing occurs in phases:- First,
APEX_ZIP
extracts individual XML files from the XLSX archive. - Then, the
XMLTABLE SQL
function parses the actual XLSX.
About CLOB Support
Starting with APEX release 19.2, this package supports string values larger than 4,000 bytes. 20 out of the 300 supported columns can be handled as a CLOB
. The level of CLOB
support depends upon the file type being parsed.
CSV and XLSX
CLOB
values are supported up to 32K.CLOB
columns can be detected during discovery.- When the data profile is discovered, values below 4000 bytes are normally returned as
COLNNN
.CLOB
values are returned in theCLOBNN
column and the first 1000 characters are returned asCOLNNN
. If a data profile is passed in and that hasCLOB
column defined, all values are returned in theCLOBNN
column only.
XML
CLOB
values with more than 32K are supported.CLOB
columns can be detected during discovery.- When the data profile is discovered, values below 4000 bytes are normally returned as
COLNNN
.CLOB
values are returned in theCLOBNN
column and the first 1000 characters are returned asCOLNNN
. If a data profile is passed in and that hasCLOB
column defined, all values are returned in theCLOBNN
column only.
JSON
CLOB
values with more than 32K are supported.CLOB
columns are not detected during discovery;CLOB
support is only active if a file profile containingCLOB
column is passed in as thep_file_profile
parameter.- Since
JSON_TABLE
does not support CLOBs on 12c databases, the parser uses XMLTYPE-based processing if a file profile with CLOB columns is passed in. Processing will be significantly slower.
About Large CSV Files
If the BLOB passed to APEX_DATA_PARSER.PARSE
is less than 50 MB, Oracle APEX copies the BLOB to an internal, cached temporary LOB. Thus all CSV parsing is done in memory. For larger BLOBs, APEX does CSV parsing on the original BLOB locator. If it is selected from a table, CSV parsing can happen on disk but might be significantly slower. Note that a performance degradation may occur when parsed CSV files grow beyond 50 MB.
However, developers can also use the DBMS_LOB.CREATETEMPORARY
(passing CACHE => TRUE
) and DBMS_LOB.COPY
procedures in order to explicitly create a cached temporary LOB, even for a larger file. Instead of the original BLOB, the cached temporary LOB can be passed to APEX_DATA_PARSER.PARSE
. This approach also enables in-memory parsing for files larger than 50 MB.
See Also:
CREATETEMPORARY Procedures and COPY Procedures in Oracle Database PL/SQL Packages and Types Reference.Syntax
APEX_DATA_PARSER.PARSE (
p_content IN BLOB,
p_file_name IN VARCHAR2 DEFAULT NULL,
p_file_type in t_file_type DEFAULT NULL,
p_file_profile IN CLOB DEFAULT NULL,
--
p_detect_data_types IN VARCHAR2 DEFAULT 'Y',
p_decimal_char IN VARCHAR2 DEFAULT NULL,
--
p_xlsx_sheet_name IN VARCHAR2 DEFAULT NULL,
--
p_row_selector IN VARCHAR2 DEFAULT NULL,
--
p_csv_row_delimiter IN VARCHAR2 DEFAULT LF,
p_csv_col_delimiter IN VARCHAR2 DEFAULT NULL,
p_csv_enclosed IN VARCHAR2 DEFAULT '"',
--
p_skip_rows IN PLS_INTEGER DEFAULT NULL,
p_add_headers_row IN VARCHAR2 DEFAULT 'N',
--
p_nullif IN VARCHAR2 DEFAULT NULL,
p_force_trim_whitespace IN VARCHAR2 DEFAULT 'N',
--
p_file_charset IN VARCHAR2 DEFAULT 'AL32UTF8',
p_max_rows IN NUMBER DEFAULT NULL,
p_return_rows IN NUMBER DEFAULT NULL,
--
p_store_profile_to_collection IN VARCHAR2 DEFAULT NULL,
p_xml_namespaces IN VARCHAR2 DEFAULT NULL,
--
p_fix_excel_precision IN VARCHAR2 DEFAULT 'N' )
RETURN apex_t_parser_table pipelined;
Parameters
Parameter | Description |
---|---|
p_content |
The file content to be parsed as a BLOB. |
p_file_name |
The name of the file; only used to derive the file type. Either p_file_name , p_file_type or p_file_profile must be passed in.
|
p_file_type |
The type of the file to be parsed. Use this to explicitly pass the file type in. Either p_file_name , p_file_type or p_file_profile must be passed in.
|
p_file_profile |
File profile to be used for parsing. The file profile might have been computed in a previous parse() or discover() invocation. If passed in again, the function skips some profile detection logic and use the passed in profile in order to improve performance.
|
p_detect_data_types |
Whether to detect data types ( If If When no data types are detected, all columns are reported as |
p_decimal_char |
Use this decimal character when trying to detect NUMBER data types. If not specified,the procedure will auto-detect the decimal character.
|
p_xlsx_sheet_name |
For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found. |
p_row_selector |
For JSON and XML files. Pointer to the array / list of rows within the JSON or XML file. If omitted, the function will:
|
p_csv_row_delimiter |
Override the default row delimiter for CSV parsing. Limited to one character and defaults to Linefeed (LF). Note that the Linefeed row delimiter also handles "Carriage Return/Linefeed" (CRLF). |
p_csv_col_delimiter |
Use a specific CSV column delimiter. If omitted, the function will detect the column delimiter based on the first row contents. |
p_csv_enclosed |
Override the default enclosure character for CSV parsing. |
p_skip_rows |
Skip the first N rows when parsing. |
p_add_headers_row |
For XML, JSON: Emit the column headers (tag, attr names) as the first row. |
p_nullif |
Similar to SQL NULLIF function. If the column has this value, return NULL .
|
p_force_trim_whitespace |
Whether to force trim enquoted whitespace from parsed values. |
p_file_charset |
Encoding of the file to parse. Defaults to AL32UTF8 if omitted or NULL is explicitly passed in.
|
p_max_rows |
Stop parsing after p_max_rows have been returned.
|
p_return_rows |
Amount of rows to return. This is useful when the parser parses more rows (for data type detection) than it is supposed to return. When the specified amount of rows have been emitted, the function will continue parsing (and refining the detected data types) until p_max_rows has been reached, or until the rownum < x clause of the SQL query kicks in and stops execution.
|
p_store_profile_to_collection |
Store the File profile which has been computed during parse into a collection. The collection will be cleared, if it exists. Only be used for computed profiles. |
p_xml_namespaces |
XML namespaces to use when parsing XML files. |
p_fix_excel_precision |
Whether to round numbers in XLSX files to 15 significant digits. This is useful for XLSX files generated by Microsoft Excel. Excel stores numeric values as floating point numbers with a maximum of 15 significant digits. For calculation results, this can lead to rounding issues, which are fixed using this parameter. See also: Floating-point arithmetic may give inaccurate results in Excel at Microsoft 365. |
Returns
Returns rows of the apex_t_parser_row
type.
LINE_NUMBER COL001 COL002 COL003 COL004 ... COL300
Example
select line_number, col001,col002,col003,col004,col005,col006,col007,col008
from table(
apex_data_parser.parse(
p_content => {BLOB containing XLSX spreadsheet},
p_file_name => 'test.xlsx',
p_xlsx_sheet_name => 'sheet1.xml') ) ;
LINE_NUMBER COL001 COL002 COL003 COL004 COL005 COL006 COL007 COL008
----------- -------- ------------ ------------ -------- --------------- -------- ------------ --------
1 0 First Name Last Name Gender Country Age Date Id
2 1 Dulce Abril Female United States 32 15/10/2017 1562
3 2 Mara Hashimoto Female Great Britain 25 16/08/2016 1582
4 3 Philip Gent Male France 36 21/05/2015 2587
5 4 Kathleen Hanner Female United States 25 15/10/2017 3549
6 5 Nereida Magwood Female United States 58 16/08/2016 2468
7 6 Gaston Brumm Male United States 24 21/05/2015 2554
8 7 Etta Hurn Female Great Britain 56 15/10/2017 3598
9 8 Earlean Melgar Female United States 27 16/08/2016 2456
10 9 Vincenza Weiland Female United States 40 21/05/2015 6548
: : : : : : : : :
select line_number, col001,col002,col003,col004,col005,col006,col007,col008
from table(
apex_data_parser.parse(
p_content => {BLOB containing JSON file},
p_file_name => 'test.json') ) ;
LINE_NUMBER COL001 COL002 COL003 COL004 COL005
----------- --------- ---------------------------------------------- --------------- --------------
1 Feature 1.5 41km E of Cape Yakataga, Alaska 1536513727239 1536514117117
2 Feature 0.21 11km ENE of Aguanga, CA 1536513299520 1536513521231
3 Feature 1.84 5km SSW of Pahala, Hawaii 1536513262940 1536513459610
4 Feature 2.55 9km W of Volcano, Hawaii 1536513100890 1536513446680
5 Feature 1.3 62km ESE of Cape Yakataga, Alaska 1536512917361 1536513322236
6 Feature 1.79 7km SW of Tiptonville, Tennessee 1536512379690 1536512668010
7 Feature 1.9 126km NNW of Arctic Village, Alaska 1536512346186 1536512846567
8 Feature 1.4 105km NW of Arctic Village, Alaska 1536512140162 1536512846334
See Also:
- CREATETEMPORARY Procedures in Oracle Database PL/SQL Packages and Types Reference
- COPY Procedures in Oracle Database PL/SQL Packages and Types Reference
- Floating-point arithmetic may give inaccurate results in Excel at Microsoft 365
Parent topic: APEX_DATA_PARSER