PARSE Function

This function enables you to parse XML, XLSX, CSV, or JSON files and returns a generic table of the following structure:

LINE_NUMBER COL001 COL002 COL003 COL004 ... COL300

Values are generally returned in VARCHAR2 format. A returned table row can have a maximum of 300 columns. The maximum length for a VARCHAR2 table column is 4000 bytes; there is no line length maximum. 20 out of the 300 supported columns can be handled as a CLOB.

File parsing happens on-the-fly as this function is invoked. Data does not write to a collection nor to a temporary table.

About Parsing File Profiles

If the p_file_profile parameter is not passed, the function computes a file profile with column information during parsing.

If p_detect_data_types is passed as Y (default), the function also detects column data types during parsing. Retrieve the computed file profile using GET_FILE_PROFILE after the function finishes:

Invoke PARSE - Use this table function to parse the files and get rows and columns in order to display a data preview.
Invoke GET_FILE_PROFILE - Retrieve file profile information in JSON format.
Process the data - Generate a SQL query based on the data profile to perform custom processing.

Note:

XLSX parsing occurs in phases:

First, APEX_ZIP extracts individual XML files from the XLSX archive.
Then, the XMLTABLE SQL function parses the actual XLSX.

About CLOB Support

Starting with APEX release 19.2, this package supports string values larger than 4,000 bytes. 20 out of the 300 supported columns can be handled as a CLOB. The level of CLOB support depends upon the file type being parsed.

CSV and XLSX

CLOB values are supported up to 32K.
CLOB columns can be detected during discovery.
When the data profile is discovered, values below 4000 bytes are normally returned as COLNNN. CLOB values are returned in the CLOBNN column and the first 1000 characters are returned as COLNNN. If a data profile is passed in and that has CLOB column defined, all values are returned in the CLOBNN column only.

XML

CLOB values with more than 32K are supported.
CLOB columns can be detected during discovery.
When the data profile is discovered, values below 4000 bytes are normally returned as COLNNN. CLOB values are returned in the CLOBNN column and the first 1000 characters are returned as COLNNN. If a data profile is passed in and that has CLOB column defined, all values are returned in the CLOBNN column only.

JSON

CLOB values with more than 32K are supported.
CLOB columns are not detected during discovery; CLOB support is only active if a file profile containing CLOB column is passed in as the p_file_profile parameter.
Since JSON_TABLE does not support CLOBs on 12c databases, the parser uses XMLTYPE-based processing if a file profile with CLOB columns is passed in. Processing will be significantly slower.

About Large CSV Files

If the BLOB passed to APEX_DATA_PARSER.PARSE is less than 50 MB, Oracle APEX copies the BLOB to an internal, cached temporary LOB. Thus all CSV parsing is done in memory. For larger BLOBs, APEX does CSV parsing on the original BLOB locator. If it is selected from a table, CSV parsing can happen on disk but might be significantly slower. Note that a performance degradation may occur when parsed CSV files grow beyond 50 MB.

However, developers can also use the DBMS_LOB.CREATETEMPORARY (passing CACHE => TRUE ) and DBMS_LOB.COPY procedures in order to explicitly create a cached temporary LOB, even for a larger file. Instead of the original BLOB, the cached temporary LOB can be passed to APEX_DATA_PARSER.PARSE. This approach also enables in-memory parsing for files larger than 50 MB.

See Also:

CREATETEMPORARY Procedures and COPY Procedures in Oracle Database PL/SQL Packages and Types Reference.

Syntax

APEX_DATA_PARSER.PARSE (
    p_content                      IN BLOB,
    p_file_name                    IN VARCHAR2     DEFAULT NULL,
    p_file_type                    in t_file_type  DEFAULT NULL,
    p_file_profile                 IN CLOB         DEFAULT NULL,
    --
    p_detect_data_types            IN VARCHAR2     DEFAULT 'Y',
    p_decimal_char                 IN VARCHAR2     DEFAULT NULL,
    --
    p_xlsx_sheet_name              IN VARCHAR2     DEFAULT NULL,
    --
    p_row_selector                 IN VARCHAR2     DEFAULT NULL,
    --
    p_csv_row_delimiter            IN VARCHAR2     DEFAULT LF,
    p_csv_col_delimiter            IN VARCHAR2     DEFAULT NULL,
    p_csv_enclosed                 IN VARCHAR2     DEFAULT '"',
    --
    p_skip_rows                    IN PLS_INTEGER  DEFAULT NULL,
    p_add_headers_row              IN VARCHAR2     DEFAULT 'N',
    --
    p_nullif                       IN VARCHAR2     DEFAULT NULL,
    p_force_trim_whitespace        IN VARCHAR2     DEFAULT 'N',
    --
    p_file_charset                 IN VARCHAR2     DEFAULT 'AL32UTF8',
    p_max_rows                     IN NUMBER       DEFAULT NULL,
    p_return_rows                  IN NUMBER       DEFAULT NULL,
    --
    p_store_profile_to_collection  IN VARCHAR2     DEFAULT NULL,
    p_xml_namespaces               IN VARCHAR2     DEFAULT NULL,
    --
    p_fix_excel_precision          IN VARCHAR2     DEFAULT 'N' )
    RETURN apex_t_parser_table pipelined;

Parameters

Parameter	Description
`p_content`	The file content to be parsed as a BLOB.
`p_file_name`	The name of the file; only used to derive the file type. Either `p_file_name`, `p_file_type` or `p_file_profile` must be passed in.
`p_file_type`	The type of the file to be parsed. Use this to explicitly pass the file type in. Either `p_file_name`, `p_file_type` or `p_file_profile` must be passed in.
`p_file_profile`	File profile to be used for parsing. The file profile might have been computed in a previous `parse()` or `discover()` invocation. If passed in again, the function skips some profile detection logic and use the passed in profile in order to improve performance.
`p_detect_data_types`	Whether to detect data types (`NUMBER, DATE, TIMESTAMP`) during parsing. If `Y` (default), the function computes the file profile and also add data type information to it. If `N`, all columns are `VARCHAR2`. When no data types are detected, all columns are reported as `VARCHAR2(4000)`.
`p_decimal_char`	Use this decimal character when trying to detect `NUMBER` data types. If not specified,the procedure will auto-detect the decimal character.
`p_xlsx_sheet_name`	For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found.
`p_row_selector`	For JSON and XML files. Pointer to the array / list of rows within the JSON or XML file. If omitted, the function will: For XML files: Use `//` (first tag under the root tag) as the row selector. For JSON files: Look for a JSON array and use the first array found.
`p_csv_row_delimiter`	Override the default row delimiter for CSV parsing. Limited to one character and defaults to Linefeed (LF). Note that the Linefeed row delimiter also handles "Carriage Return/Linefeed" (CRLF).
`p_csv_col_delimiter`	Use a specific CSV column delimiter. If omitted, the function will detect the column delimiter based on the first row contents.
`p_csv_enclosed`	Override the default enclosure character for CSV parsing.
`p_skip_rows`	Skip the first N rows when parsing.
`p_add_headers_row`	For XML, JSON: Emit the column headers (tag, attr names) as the first row.
`p_nullif`	Similar to SQL `NULLIF` function. If the column has this value, return `NULL`.
`p_force_trim_whitespace`	Whether to force trim enquoted whitespace from parsed values.
`p_file_charset`	Encoding of the file to parse. Defaults to `AL32UTF8` if omitted or `NULL` is explicitly passed in.
`p_max_rows`	Stop parsing after `p_max_rows` have been returned.
`p_return_rows`	Amount of rows to return. This is useful when the parser parses more rows (for data type detection) than it is supposed to return. When the specified amount of rows have been emitted, the function will continue parsing (and refining the detected data types) until `p_max_rows` has been reached, or until the `rownum` < x clause of the SQL query kicks in and stops execution.
`p_store_profile_to_collection`	Store the File profile which has been computed during parse into a collection. The collection will be cleared, if it exists. Only be used for computed profiles.
`p_xml_namespaces`	XML namespaces to use when parsing XML files.
`p_fix_excel_precision`	Whether to round numbers in XLSX files to 15 significant digits. This is useful for XLSX files generated by Microsoft Excel. Excel stores numeric values as floating point numbers with a maximum of 15 significant digits. For calculation results, this can lead to rounding issues, which are fixed using this parameter. See also: Floating-point arithmetic may give inaccurate results in Excel at Microsoft 365.

Returns

Returns rows of the apex_t_parser_row type.

LINE_NUMBER COL001 COL002 COL003 COL004 ... COL300

Example

select line_number, col001,col002,col003,col004,col005,col006,col007,col008
   from table( 
              apex_data_parser.parse(
                  p_content         => {BLOB containing XLSX spreadsheet},
                  p_file_name       => 'test.xlsx',
                  p_xlsx_sheet_name => 'sheet1.xml') ) ;

LINE_NUMBER COL001   COL002       COL003       COL004   COL005          COL006   COL007       COL008 
----------- -------- ------------ ------------ -------- --------------- -------- ------------ --------
          1 0        First Name   Last Name    Gender   Country         Age      Date         Id
          2 1        Dulce        Abril        Female   United States   32       15/10/2017   1562
          3 2        Mara         Hashimoto    Female   Great Britain   25       16/08/2016   1582
          4 3        Philip       Gent         Male     France          36       21/05/2015   2587
          5 4        Kathleen     Hanner       Female   United States   25       15/10/2017   3549
          6 5        Nereida      Magwood      Female   United States   58       16/08/2016   2468
          7 6        Gaston       Brumm        Male     United States   24       21/05/2015   2554
          8 7        Etta         Hurn         Female   Great Britain   56       15/10/2017   3598
          9 8        Earlean      Melgar       Female   United States   27       16/08/2016   2456
         10 9        Vincenza     Weiland      Female   United States   40       21/05/2015   6548
          : :        :            :            :        :               :        :            :

select line_number, col001,col002,col003,col004,col005,col006,col007,col008
   from table( 
              apex_data_parser.parse(
                  p_content         => {BLOB containing JSON file},
                  p_file_name       => 'test.json') ) ;

LINE_NUMBER COL001    COL002   COL003                                COL004          COL005
----------- --------- ---------------------------------------------- --------------- --------------
          1 Feature   1.5      41km E of Cape Yakataga, Alaska       1536513727239   1536514117117
          2 Feature   0.21     11km ENE of Aguanga, CA               1536513299520   1536513521231
          3 Feature   1.84     5km SSW of Pahala, Hawaii             1536513262940   1536513459610
          4 Feature   2.55     9km W of Volcano, Hawaii              1536513100890   1536513446680
          5 Feature   1.3      62km ESE of Cape Yakataga, Alaska     1536512917361   1536513322236
          6 Feature   1.79     7km SW of Tiptonville, Tennessee      1536512379690   1536512668010
          7 Feature   1.9      126km NNW of Arctic Village, Alaska   1536512346186   1536512846567
          8 Feature   1.4      105km NW of Arctic Village, Alaska    1536512140162   1536512846334