DISCOVER Function

This is a function to discover the column profile of a file. This function calls parse() and then returns the generated file profile. This function is a shortcut which can be used instead of first calling parse() and then get_file_profile().

Syntax

APEX_DATA_PARSER.DISCOVER (
    p_content                   IN BLOB,
    p_file_name                 IN VARCHAR2,
    --
    p_decimal_char              IN VARCHAR2     DEFAULT NULL,
    --
    p_xlsx_sheet_name           IN VARCHAR2     DEFAULT NULL,
    --
    p_row_selector              IN VARCHAR2     DEFAULT NULL,
    --
    p_csv_row_delimiter         IN VARCHAR2     DEFAULT LF,
    p_csv_col_delimiter         IN VARCHAR2     DEFAULT NULL,
    p_csv_enclosed              IN VARCHAR2     DEFAULT '"',
    --
    p_skip_rows                 IN PLS_INTEGER  DEFAULT NULL,
    --
    p_nullif                    IN VARCHAR2     DEFAULT NULL,
    p_force_trim_whitespace     IN VARCHAR2     DEFAULT 'Y',
    --
    p_file_charset              IN VARCHAR2     DEFAULT 'AL32UTF8',
    p_max_rows                  IN NUMBER       DEFAULT 200,
    --
    p_xml_namespaces            IN VARCHAR2     DEFAULT NULL )
    RETURN CLOB;

Parameter

Parameter	Description
`p_content`	The file content to be parsed as a BLOB.
`p_file_name`	The name of the file used to derive the file type.
`p_decimal_char`	Use this decimal character when trying to detect `NUMBER` data types. If not specified, the procedure will auto-detect the decimal character.
`p_xlsx_sheet_name`	For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found.
`p_row_selector`	Row selector to use for XML or JSON files. Pointer to the array/list of rows within the JSON or XML file. If omitted, the function will: For XML files: Use `//` (first tag under the root tag) as the row selector. For JSON files: Look for a JSON array and use the first array found.
`p_csv_row_delimiter`	Override the default row delimiter for CSV parsing.
`p_csv_col_delimiter`	Use a specific CSV column delimiter. If omitted, the function detects the column delimiter based on the first row contents.
`p_csv_enclosed`	Override the default enclosure character for CSV parsing.
`p_skip_rows`	The amount of rows to skip before parsing. For XML and JSON parsing to get the same behavior as for CSV and XLSX parsing.
`p_nullif`	Similar to SQL `NULLIF` function: If the column has this value, return NULL.
`p_force_trim_whitespace`	Whether to force trim enquoted whitespace from parsed values.
`p_file_charset`	File encoding, if not `UTF-8 (AL32UTF8)`.
`p_max_rows`	Stop discovery after `P_MAX_ROWS` rows have been processed.
`p_xml_namespaces`	XML namespaces to use when parsing XML files.

Returns

CLOB containing the file profile in JSON format.

Example

select apex_data_parser.discover(
           p_content => {BLOB containing XLSX file},
           p_file_name=>'large.xlsx' ) as profile_json
from dual;


PROFILE_JSON
-----------------------------------------------------------
{
    "file-encoding" : "AL32UTF8",
    "single-row" : false,
    "file-type" : 1,
    "parsed-rows" : 2189,
    "columns" : [
       {
          "name" : "C0",
          "format-mask" : "",
          "selector" : "",
          "data-type" : 2
       },
       {
          "selector" : "",
          "format-mask" : "",
          "data-type" : 1,
          "name" : "FIRST_NAME"
       },
       {
          "name" : "LAST_NAME",
          "format-mask" : "",
          "selector" : "",
          "data-type" : 1
       },

       :

       {
          "name" : "DATE_",
          "format-mask" : "DD\"/\"MM\"/\"YYYY",
          "data-type" : 3,
          "selector" : ""
       },
       {
          "format-mask" : "",
          "selector" : "",
          "data-type" : 2,
          "name" : "ID"
       }
    ],
    "row-selector" : "",
    "headings-in-first-row" : true,
    "xslx-worksheet" : "sheet1.xml",
    "csv-delimiter" : ""
 }