22.4 DISCOVER Function
This is a function to discover the column profile of a file. This function calls parse()
and then returns the generated file profile. This function is a shortcut which can be used instead of first calling parse()
and then get_file_profile()
.
Syntax
APEX_DATA_PARSER.DISCOVER (
p_content IN BLOB,
p_file_name IN VARCHAR2,
--
p_decimal_char IN VARCHAR2 DEFAULT NULL,
--
p_xlsx_sheet_name IN VARCHAR2 DEFAULT NULL,
--
p_row_selector IN VARCHAR2 DEFAULT NULL,
--
p_csv_row_delimiter IN VARCHAR2 DEFAULT LF,
p_csv_col_delimiter IN VARCHAR2 DEFAULT NULL,
p_csv_enclosed IN VARCHAR2 DEFAULT '"',
--
p_skip_rows IN PLS_INTEGER DEFAULT NULL,
--
p_nullif IN VARCHAR2 DEFAULT NULL,
p_force_trim_whitespace IN VARCHAR2 DEFAULT 'Y',
--
p_file_charset IN VARCHAR2 DEFAULT 'AL32UTF8',
p_max_rows IN NUMBER DEFAULT 200,
--
p_xml_namespaces IN VARCHAR2 DEFAULT NULL )
RETURN CLOB;
Parameter
Parameter | Description |
---|---|
p_content |
The file content to be parsed as a BLOB. |
p_file_name |
The name of the file used to derive the file type. |
p_decimal_char |
Use this decimal character when trying to detect NUMBER data types. If not specified, the procedure will auto-detect the decimal character.
|
p_xlsx_sheet_name |
For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found. |
p_row_selector |
Row selector to use for XML or JSON files. Pointer to the array/list of rows within the JSON or XML file. If omitted, the function will:
|
p_csv_row_delimiter |
Override the default row delimiter for CSV parsing. |
p_csv_col_delimiter |
Use a specific CSV column delimiter. If omitted, the function detects the column delimiter based on the first row contents. |
p_csv_enclosed |
Override the default enclosure character for CSV parsing. |
p_skip_rows |
The amount of rows to skip before parsing. For XML and JSON parsing to get the same behavior as for CSV and XLSX parsing. |
p_nullif |
Similar to SQL NULLIF function: If the column has this value, return NULL.
|
p_force_trim_whitespace |
Whether to force trim enquoted whitespace from parsed values. |
p_file_charset |
File encoding, if not UTF-8 (AL32UTF8) .
|
p_max_rows |
Stop discovery after P_MAX_ROWS rows have been processed.
|
p_xml_namespaces |
XML namespaces to use when parsing XML files. |
Returns
CLOB containing the file profile in JSON format.
Example
select apex_data_parser.discover(
p_content => {BLOB containing XLSX file},
p_file_name=>'large.xlsx' ) as profile_json
from dual;
PROFILE_JSON
-----------------------------------------------------------
{
"file-encoding" : "AL32UTF8",
"single-row" : false,
"file-type" : 1,
"parsed-rows" : 2189,
"columns" : [
{
"name" : "C0",
"format-mask" : "",
"selector" : "",
"data-type" : 2
},
{
"selector" : "",
"format-mask" : "",
"data-type" : 1,
"name" : "FIRST_NAME"
},
{
"name" : "LAST_NAME",
"format-mask" : "",
"selector" : "",
"data-type" : 1
},
:
{
"name" : "DATE_",
"format-mask" : "DD\"/\"MM\"/\"YYYY",
"data-type" : 3,
"selector" : ""
},
{
"format-mask" : "",
"selector" : "",
"data-type" : 2,
"name" : "ID"
}
],
"row-selector" : "",
"headings-in-first-row" : true,
"xslx-worksheet" : "sheet1.xml",
"csv-delimiter" : ""
}
Parent topic: APEX_DATA_PARSER