22.4 DISCOVER Function

This is a function to discover the column profile of a file. This function calls parse() and then returns the generated file profile. This function is a shortcut which can be used instead of first calling parse() and then get_file_profile().

Syntax

APEX_DATA_PARSER.DISCOVER (
    p_content                   IN BLOB,
    p_file_name                 IN VARCHAR2,
    --
    p_decimal_char              IN VARCHAR2     DEFAULT NULL,
    --
    p_xlsx_sheet_name           IN VARCHAR2     DEFAULT NULL,
    --
    p_row_selector              IN VARCHAR2     DEFAULT NULL,
    --
    p_csv_row_delimiter         IN VARCHAR2     DEFAULT LF,
    p_csv_col_delimiter         IN VARCHAR2     DEFAULT NULL,
    p_csv_enclosed              IN VARCHAR2     DEFAULT '"',
    --
    p_skip_rows                 IN PLS_INTEGER  DEFAULT NULL,
    --
    p_nullif                    IN VARCHAR2     DEFAULT NULL,
    p_force_trim_whitespace     IN VARCHAR2     DEFAULT 'Y',
    --
    p_file_charset              IN VARCHAR2     DEFAULT 'AL32UTF8',
    p_max_rows                  IN NUMBER       DEFAULT 200,
    --
    p_xml_namespaces            IN VARCHAR2     DEFAULT NULL )
    RETURN CLOB;

Parameter

Parameter Description
p_content The file content to be parsed as a BLOB.
p_file_name The name of the file used to derive the file type.
p_decimal_char Use this decimal character when trying to detect NUMBER data types. If not specified, the procedure will auto-detect the decimal character.
p_xlsx_sheet_name For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found.
p_row_selector

Row selector to use for XML or JSON files. Pointer to the array/list of rows within the JSON or XML file.

If omitted, the function will:

  • For XML files: Use /*/* (first tag under the root tag) as the row selector.
  • For JSON files: Look for a JSON array and use the first array found.
p_csv_row_delimiter Override the default row delimiter for CSV parsing.
p_csv_col_delimiter Use a specific CSV column delimiter. If omitted, the function detects the column delimiter based on the first row contents.
p_csv_enclosed Override the default enclosure character for CSV parsing.
p_skip_rows The amount of rows to skip before parsing. For XML and JSON parsing to get the same behavior as for CSV and XLSX parsing.
p_nullif Similar to SQL NULLIF function: If the column has this value, return NULL.
p_force_trim_whitespace Whether to force trim enquoted whitespace from parsed values.
p_file_charset File encoding, if not UTF-8 (AL32UTF8).
p_max_rows Stop discovery after P_MAX_ROWS rows have been processed.
p_xml_namespaces XML namespaces to use when parsing XML files.

Returns

CLOB containing the file profile in JSON format.

Example

select apex_data_parser.discover(
           p_content => {BLOB containing XLSX file},
           p_file_name=>'large.xlsx' ) as profile_json
from dual;


PROFILE_JSON
-----------------------------------------------------------
{
    "file-encoding" : "AL32UTF8",
    "single-row" : false,
    "file-type" : 1,
    "parsed-rows" : 2189,
    "columns" : [
       {
          "name" : "C0",
          "format-mask" : "",
          "selector" : "",
          "data-type" : 2
       },
       {
          "selector" : "",
          "format-mask" : "",
          "data-type" : 1,
          "name" : "FIRST_NAME"
       },
       {
          "name" : "LAST_NAME",
          "format-mask" : "",
          "selector" : "",
          "data-type" : 1
       },

       :

       {
          "name" : "DATE_",
          "format-mask" : "DD\"/\"MM\"/\"YYYY",
          "data-type" : 3,
          "selector" : ""
       },
       {
          "format-mask" : "",
          "selector" : "",
          "data-type" : 2,
          "name" : "ID"
       }
    ],
    "row-selector" : "",
    "headings-in-first-row" : true,
    "xslx-worksheet" : "sheet1.xml",
    "csv-delimiter" : ""
 }