About Importing Metadata from XML Data Sources
Learn how to import metadata from Extensible Markup Language (XML) documents.
This section contains the following topics:
About Using XML as a Data Source
The Oracle BI Server supports the use of XML data as a data source for the Physical layer in the repository.
Depending on the method used to access XML data sources, a URL might represent a data source.
The following are data sources:
-
A static XML file or HTML file that contains XML data islands on the Internet including intranet or extranet. For example:
tap://216.217.17.176/[DE0A48DE-1C3E-11D4-97C9-00105AA70303].XML
-
Dynamic XML generated from a server site. For example:
tap://www.aspserver.com/example.asp
-
An XML file or HTML file that contains XML data islands on a local or network drive. For example:
d:\xmldir\example.xml
d:\htmldir\island.htm
You can also specify a directory path for local or network XML files, or you can use the asterisk ( * ) as a wildcard with the filenames. If you specify a directory path without a filename specification like
d:/xmldir
, all files with the XML suffix are imported. For example:d:\xmldir\
d:\xmldir\exam*.xml
d:\htmldir\exam*.htm
d:\htmldir\exam*.html
-
An HTML file that contains tables are wrapped in opening and closing
<table>
and</table>
tags. The HTML file may reside on the Internet including intranet or extranet, or on a local or network drive, see About Using HTML Tables as a Data Source.
URLs can include repository or session variables, providing support for HTTP data sources that accept user IDs and passwords embedded in the URL. For example:
http://somewebserver/cgi.pl?userid=valueof(session_variable1)&password= valueof(session_variable2)
This functionality also lets you create an XML data source with a location that's dynamically determined by some run-time parameters, see Use Variables in the Oracle BI Repository.
If the Oracle BI Server needs to access any non-local files, for example, network files or files on the Internet, you must run the Oracle BI Server using a valid user ID and password with sufficient network privileges to access these remote files.
Import Metadata from XML Data Sources Using the XML Gateway
When you use Oracle BI Server XML Gateway, the metadata import process flattens the XML document to a tabular form, and creates the XML file name using the stem of the table name. The second level element in the XML document is set as the row delimiter.
The stem is the filename without the suffix. All leaf nodes are imported as columns in the table. The hierarchical access path to leaf nodes is also imported.
The Oracle BI Server XML Gateway uses the metadata information contained in an XML schema. The XML schema is contained within the XML document, or is referenced within the root element of the XML document.
When a schema isn't available, all XML data is imported as text data. In building the repository, you can alter the data types of the columns in the Physical layer, overriding the data types for the corresponding columns defined in the schema. The gateway converts the incoming data to the type you specified in the Physical layer. You can also map the text data type to other data types in the Business Model and Mapping layer of the Administration Tool using the CAST
operator.
The Oracle BI Server XML Gateway doesn't support:
-
Resolution of external references contained in an XML document, other than a reference to an external XML schema, see Examples of XML Documents Generated by the Oracle BI Server XML Gateway.
-
Element and attribute inheritance contained within the Microsoft XML schema.
-
Element types of a mixed content model such as XML elements that contain a mixture of elements and CDATA such as
<p>hello <b>Joe</b>, how are you doing?</p>
.
If you're importing metadata into an existing database in the Physical layer, confirm that the COUNT_STAR_SUPPORTED
option is selected in the Database properties dialog. If you import metadata without selecting the COUNT_STAR_SUPPORTED option, the Update Row Count option doesn't display in the right-click menu for the database's physical tables.
The Map to Logical Model and Publish to Warehouse screens are available only for ADF data sources.
URLs for the XML data source can include repository or session variables. If you browse for the XML data source, you can select a single file. For XML documents, you must specify the suffix .xml
as part of the file name in the URL. If you don't specify the xml suffix, the documents are treated as HTML documents.
You can type an optional user name and password for connections to HTTP sites that employ the HTTP Basic Authentication security mode. The Oracle BI Server XML Gateway also supports Secure HTTP protocol and Integrated Windows Authentication (for Windows 2000), formerly called NTLM or Windows NT Challenge/Response authentication.
After you import XML data, you must adjust connection pool settings. See Create or Change Connection Pools. You can do the following:
-
In the Connection Pool dialog, type a name and optional description for the connection on the General tab.
-
Click the XML tab to set additional connection properties, including the URL refresh interval and the length of time to wait for a URL to load before timing out.
Because XML data sources are updated frequently and in real time, you can specify a refresh interval for Oracle BI Server XML Gateway data sources. The default timeout interval for queries (URL loading time-out) is 15 minutes. See About the Refresh Interval for XML Data Sources in Administering Oracle Analytics Server.
Examples of XML Documents Generated by the Oracle BI Server XML Gateway
These examples show sample XML documents and the corresponding columns that are generated by the Oracle BI Server XML Gateway.
XML Schema Contained in an External File
The following sample XML data document (mytest.xml) references an XML schema contained in an external file. The schema file is shown following the data document. The generated XML schema information available for import to the repository is shown at the end.
<?xml version="1.0"?> <test xmlns="x-schema:mytest_sch.xml"> <row> <p1>0</p1> <p2 width="5"> <p3>hi</p3> <p4> <p6>xx0</p6> <p7>yy0</p7> </p4> <p5>zz0</p5> </p2> </row> <row> <p1>1</p1> <p2 width="6"> <p3>how are you</p3> <p4> <p6>xx1</p6> <p7>yy1</p7> </p4> <p5>zz1</p5> </p2> </row> <row> <p1>a</p1> <p2 width="7"> <p3>hi</p3> <p4> <p6>xx2</p6> <p7>yy2</p7> </p4> <p5>zz2</p5> </p2> </row> <row> <p1>b</p1> <p2 width="8"> <p3>how are they</p3> <p4> <p6>xx3</p6> <p7>yy3</p7> </p4> <p5>zz2</p5> </p2> </row> </test>
The corresponding schema file follows:
<Schema xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes"> <ElementType name="test" content="eltOnly" order="many"> <element type="row"/> </ElementType> <ElementType name="row" content="eltOnly" order="many"> <element type="p1"/> <element type="p2"/> </ElementType> <ElementType name="p2" content="eltOnly" order="many"> <AttributeType name="width" dt:type="int" /> <attribute type="width" /> <element type="p3"/> <element type="p4"/> <element type="p5"/> </ElementType> <ElementType name="p4" content="eltOnly" order="many"> <element type="p6"/> <element type="p7"/> </ElementType> <ElementType name="p1" content="textOnly" dt:type="string"/> <ElementType name="p3" content="textOnly" dt:type="string"/> <ElementType name="p5" content="textOnly" dt:type="string"/> <ElementType name="p6" content="textOnly" dt:type="string"/> <ElementType name="p7" content="textOnly" dt:type="string"/> </Schema>
The name of the table generated from the preceding XML data document (mytest.xml) would be mytest
and the column names would be p1
, p3
, p6
, p7
, p5
, and width
.
In addition, to preserve the context in which each column occurs in the document and to distinguish between columns derived from XML elements with identical names but appearing in different contexts, a list of fully qualified column names is generated, based on the XPath proposal of the World Wide Web Consortium, as follows:
//test/row/p1 //test/row/p2/p3 //test/row/p2/p4/p6 //test/row/p2/p4/p7 //test/row/p2/p5 //test/row/p2@width
Nested Table Structures in an XML Document
The following example is a more complex example that demonstrates the use of nested table structures in an XML document. You can omit references to an external schema file, in which case all elements are treated as being of the Varchar
character type.
===Invoice.xml=== <INVOICE> <CUSTOMER> <CUST_ID>1</CUST_ID> <FIRST_NAME>Nancy</FIRST_NAME> <LAST_NAME>Fuller</LAST_NAME> <ADDRESS> <ADD1>507 - 20th Ave. E.,</ADD1> <ADD2>Apt. 2A</ADD2> <CITY>Seattle</CITY> <STATE>WA</STATE> <ZIP>98122</ZIP> </ADDRESS> <PRODUCTS> <CATEGORY> <CATEGORY_ID>CAT1</CATEGORY_ID> <CATEGORY_NAME>NAME1</CATEGORY_NAME> <ITEMS> <ITEM> <ITEM_ID>1</ITEM_ID> <NAME></NAME> <PRICE>0.50</PRICE> <QTY>2000</QTY> </ITEM> <ITEM> <ITEM_ID>2</ITEM_ID> <NAME>SPRITE</NAME> <PRICE>0.30</PRICE> <QTY></QTY> </ITEM> </ITEMS> </CATEGORY> <CATEGORY> <CATEGORY_ID>CAT2</CATEGORY_ID> <CATEGORY_NAME>NAME2</CATEGORY_NAME> <ITEMS> <ITEM> <ITEM_ID>11</ITEM_ID> <NAME>ACOKE</NAME> <PRICE>1.50</PRICE> <QTY>3000</QTY> </ITEM> <ITEM> <ITEM_ID>12</ITEM_ID> <NAME>SOME SPRITE</NAME> <PRICE>3.30</PRICE> <QTY>2000</QTY> </ITEM> </ITEMS> </CATEGORY> </PRODUCTS> </CUSTOMER> <CUSTOMER> <CUST_ID>2</CUST_ID> <FIRST_NAME>Andrew</FIRST_NAME> <LAST_NAME>Carnegie</LAST_NAME> <ADDRESS> <ADD1>2955 Campus Dr.</ADD1> <ADD2>Ste. 300</ADD2> <CITY>San Mateo</CITY> <STATE>CA</STATE> <ZIP>94403</ZIP> </ADDRESS> <PRODUCTS> <CATEGORY> <CATEGORY_ID>CAT22</CATEGORY_ID> <CATEGORY_NAME>NAMEA1</CATEGORY_NAME> <ITEMS> <ITEM> <ITEM_ID>122</ITEM_ID> <NAME>DDDCOKE</NAME> <PRICE>11.50</PRICE> <QTY>2</QTY> </ITEM> <ITEM> <ITEM_ID>22</ITEM_ID> <NAME>PSPRITE</NAME> <PRICE>9.30</PRICE> <QTY>1978</QTY> </ITEM> </ITEMS> </CATEGORY> <CATEGORY> <CATEGORY_ID>CAT24</CATEGORY_ID> <CATEGORY_NAME>NAMEA2</CATEGORY_NAME> <ITEMS> <ITEM> <ITEM_ID>19</ITEM_ID> <NAME>SOME COKE</NAME> <PRICE>1.58</PRICE> <QTY>3</QTY> </ITEM> <ITEM> <ITEM_ID>15</ITEM_ID> <NAME>DIET SPRITE</NAME> <PRICE>9.30</PRICE> <QTY>12000</QTY> </ITEM> </ITEMS> </CATEGORY> </PRODUCTS> </CUSTOMER> <CUSTOMER> <CUST_ID>3</CUST_ID> <FIRST_NAME>Margaret</FIRST_NAME> <LAST_NAME>Leverling</LAST_NAME> <ADDRESS> <ADD1>722 Moss Bay Blvd.</ADD1> <ADD2> </ADD2> <CITY>Kirkland</CITY> <STATE>WA</STATE> <ZIP>98033</ZIP> </ADDRESS> <PRODUCTS> <CATEGORY> <CATEGORY_ID>CAT31</CATEGORY_ID> <CATEGORY_NAME>NAMEA3</CATEGORY_NAME> <ITEMS> <ITEM> <ITEM_ID>13</ITEM_ID> <NAME>COKE33</NAME> <PRICE>30.50</PRICE> <QTY>20033</QTY> </ITEM> <ITEM> <ITEM_ID>23</ITEM_ID> <NAME>SPRITE33</NAME> <PRICE>0.38</PRICE> <QTY>20099</QTY> </ITEM> </ITEMS> </CATEGORY> <CATEGORY> <CATEGORY_ID>CAT288</CATEGORY_ID> <CATEGORY_NAME>NAME H</CATEGORY_NAME> <ITEMS> <ITEM> <ITEM_ID>19</ITEM_ID> <NAME>COLA</NAME> <PRICE>1.0</PRICE> <QTY>3</QTY> </ITEM> <ITEM> <ITEM_ID>18</ITEM_ID> <NAME>MY SPRITE</NAME> <PRICE>8.30</PRICE> <QTY>123</QTY> </ITEM> </ITEMS> </CATEGORY> </PRODUCTS> </CUSTOMER> </INVOICE>
The generated XML schema consists of one table (INVOICE
) with the following column names and their corresponding fully qualified names.
Column | Fully Qualified Name |
---|---|
ADD1 |
//INVOICE/CUSTOMER/ADDRESS/ADD1 |
ADD2 |
//INVOICE/CUSTOMER/ADDRESS/ADD2 |
CITY |
//INVOICE/CUSTOMER/ADDRESS/CITY |
STATE |
//INVOICE/CUSTOMER/ADDRESS/STATE |
ZIP |
//INVOICE/CUSTOMER/ADDRESS/ZIP |
CUST_ID |
//INVOICE/CUSTOMER/CUST_ID |
FIRST_NAME |
//INVOICE/CUSTOMER/FIRST_NAME |
LAST_NAME |
//INVOICE/CUSTOMER/LAST_NAME |
CATEGORY_ID |
//INVOICE/CUSTOMER/PRODUCTS/CATEGORY/CATEGORY_ID |
CATEGORY_NAME |
//INVOICE/CUSTOMER/PRODUCTS/CATEGORY/CATEGORY_NAME |
ITEM_ID |
//INVOICE/CUSTOMER/PRODUCTS/CATEGORY/ITEMS/ITEM/ITEM_ID |
NAME |
//INVOICE/CUSTOMER/PRODUCTS/CATEGORY/ITEMS/ITEM/NAME |
PRICE |
//INVOICE/CUSTOMER/PRODUCTS/CATEGORY/ITEMS/ITEM/PRICE |
QTY |
//INVOICE/CUSTOMER/PRODUCTS/CATEGORY/ITEMS/ITEM/QTY |
Only tags with values are extracted as columns. An XML query generates fully qualified tag names, to help ensure appropriate columns are retrieved.
The following shows the results of a sample query against the INVOICE
table:
SELECT first_name, last_name, price, qty, name FROM invoice ------------------------------------------------------------ FIRST_NAME LAST_NAME PRICE QTY NAME ------------------------------------------------------------ Andrew Carnegie 1.58 3 SOME COKE Andrew Carnegie 11.50 2 DDDCOKE Andrew Carnegie 9.30 12000 DIET SPRITE Andrew Carnegie 9.30 1978 PSPRITE Margar Leverling 0.38 20099 SPRITE33 Margar Leverling 1.0 3 COLA Margar Leverling 30.50 20033 COKE33 Margar Leverling 8.30 123 MY SPRITE Nancy Fuller 0.30 SPRITE Nancy Fuller 0.50 2000 Nancy Fuller 1.50 3000 ACOKE Nancy Fuller 3.30 2000 SOME SPRITE ------------------------------------------------------------ Row count: 12
About Using HTML Tables as a Data Source
The Oracle BI Server XML Gateway also supports the use of tables in HTML files as a data source. The HTML file can be identified as a URL pointing to a file on the internet, including intranet or extranet, or as a file on a local or network drive.
Even though tables, defined by the <table>
and </table>
tag pair, are native constructs of the HTML 4.0 specification, they're often used by Web designers as a general formatting device to achieve specific visual effects rather than as a data structure. The Oracle BI Server XML Gateway is currently the most effective in extracting tables that include specific column headers, defined by <th>
and </th>
tag pairs.
For tables that don't contain specific column headers, the Oracle BI Server XML Gateway employs some simple heuristics to make a best effort to determine the portions of an HTML file that appear to be genuine data tables.
The following is a sample HTML file with one table.
<html> <body> <table border=1 cellpadding=2 cellspacing=0> <tr> <th colspan=1>Transaction</th> <th colspan=2>Measurements</th> </tr> <tr> <th>Quality</th> <th>Count</th> <th>Percent</th> </tr> <tr> <td>Failed</td> <td>66,672</td> <td>4.1%</td> </tr> <tr> <td>Poor</td> <td>126,304</td> <td>7.7%</td> </tr> <tr> <td>Warning</td> <td>355,728</td> <td>21.6%</td> </tr> <tr> <td>OK</td> <td>1,095,056</td> <td>66.6%</td> </tr> <tr> <td colspan=1>Grand Total</td> <td>1,643,760</td> <td>100.0%</td> </tr> </table> </body> </html>
The table name is derived from the HTML filename, and the column names are formed by concatenating the headings, defined by the <th> and </th> tag pairs, for the corresponding columns, separated by an underscore.
Assuming that our sample file is named 18.htm
, the table name would include 18_0
, because it's the first table in that HTML file, with the following column names and their corresponding fully qualified names:
Column | Fully Qualified Name |
---|---|
Transaction_Quality |
\\18_0\Transaction_Quality |
Measurements_Count |
\\18_0\Measurements_Count |
Measurements_Percent |
\\18_0\Measurements_Percent |
If the table column headings appear in more than one row, the column names are formed by concatenating the corresponding field contents of those header rows.
For tables without any heading tag pairs, the Oracle BI Server XML Gateway assumes the field values, as delimited by the <td>
and </td>
tag pairs, in the first row to be the column names. The columns are named by the order in which they appear such as c0, c1, and c2.
See Import Metadata from XML Data Sources Using XML ODBC and Examples of XML Documents.
Import Metadata from XML Data Sources Using XML ODBC
Learn how to import metadata using ODBC.
Using the XML ODBC database type, you can access XML data sources through an ODBC interface. The data types of the XML elements representing physical columns in physical tables are derived from the data types of the XML elements as defined in the XML schema.
In the absence of a proper XML schema, the default data type of string is used. Data Type settings in the Physical layer don't override those defined in the XML data sources. When accessing XML data without XML schema, use the CAST
operator to perform data type conversions in the Business Model and Mapping layer of the Administration Tool.
If you're importing metadata into an existing database in the Physical layer, confirm that the COUNT_STAR_SUPPORTED
option is selected in the Features tab of the Database properties dialog. If you import metadata without selecting the COUNT_STAR_SUPPORTED
option, the Update Row Count option doesn't display in the right-click menu for the database's physical tables.
When you import through the Oracle BI Server, the data source name (DSN) entries are on the Oracle BI Server computer, not on the local computer.
Example of an XML ODBC Data Source
The example shows an XML ODBC data source in the Microsoft ADO persisted file format.
The example in this section shows an XML ODBC data source in the Microsoft ADO persisted file format. Both the data and the schema could be contained inside the same document.
XML ODBC Example
<xml xmlns:s='uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882' xmlns:dt='uuid:C2F41010-65B3-11d1-A29F-00AA00C14882' xmlns:rs='urn:schemas-microsoft-com:rowset' xmlns:z='#RowsetSchema'> <s:Schema id='RowsetSchema'> <s:ElementType name='row' content='eltOnly' rs:CommandTimeout='30' rs:updatable='true'> <s:AttributeType name='ShipperID' rs:number='1' rs:writeunknown='true' rs:basecatalog='Paint' rs:basetable='Shippers' rs:basecolumn='ShipperID'> <s:datatype dt:type='i2' dt:maxLength='2' rs:precision='5' rs:fixedlength='true' rs:benull='false'/> </s:AttributeType> <s:AttributeType name='CompanyName' rs:number='2' rs:writeunknown='true' rs:basecatalog='Paint' rs:basetable='Shippers' rs:basecolumn='CompanyName'> <s:datatype dt:type='string' rs:dbtype='str' dt:maxLength='40' rs:benull='false'/> </s:AttributeType> <s:AttributeType name='Phone' rs:number='3' rs:nullable='true' rs:writeunknown='true' rs:basecatalog='Paint' rs:basetable='Shippers' rs:basecolumn='Phone'> <s:datatype dt:type='string' rs:dbtype='str' dt:maxLength='24' rs:fixedlength='true'/> </s:AttributeType> <s:extends type='rs:rowbase'/> </s:ElementType> </s:Schema> <rs:data> <z:row ShipperID='1' CompanyName='Speedy Express' Phone='(503) 555-9831 '/> <z:row ShipperID='2' CompanyName='United Package' Phone='(503) 555-3199 '/> <z:row ShipperID='3' CompanyName='Federal Shipping' Phone='(503) 555-9931 '/> </rs:data> </xml>
Examples of XML Documents
These examples of several different situations and explains how the Oracle BI Server XML access method handles those situations.
-
The XML documents 83.xml and 8_sch.xml demonstrate the use of the same element declarations in different scope. For example, <p3> could appear within <p2> as well as within <p4>.
Because the element <p3> in the preceding examples appears in two different scopes, each element is given a distinct column name by appending an index number to the second occurrence of the element during the import process. In this case, the second occurrence becomes p3_1. If <p3> occurs in additional contexts, they become p3_2, p3_3.
-
The XML documents 83.xml and 84.xml (shown in demonstrate that multiple XML files can share the same schema (8_sch.xml).
83.xml
===83.xml=== <?xml version="1.0"?> <test xmlns="x-schema:8_sch.xml">| <row> <p1>0</p1> <p2 width="5" height="2"> <p3>hi</p3> <p4> <p3>hi</p3> <p6>xx0</p6> <p7>yy0</p7> </p4> <p5>zz0</p5> </p2> </row> <row> <p1>1</p1> <p2 width="6" height="3"> <p3>how are you</p3> <p4> <p3>hi</p3> <p6>xx1</p6> <p7>yy1</p7> </p4> <p5>zz1</p5> </p2> </row> </test>
8_sch.xml
===8_sch.xml=== <Schema xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes"> <AttributeType name="height" dt:type="int" /> <ElementType name="test" content="eltOnly" order="many"> <AttributeType name="height" dt:type="int" /> <element type="row"/> </ElementType> <ElementType name="row" content="eltOnly" order="many"> <element type="p1"/> <element type="p2"/> </ElementType> <ElementType name="p2" content="eltOnly" order="many"> <AttributeType name="width" dt:type="int" /> <AttributeType name="height" dt:type="int" /> <attribute type="width" /> <attribute type="height" /> <element type="p3"/> <element type="p4"/> <element type="p5"/> </ElementType> <ElementType name="p4" content="eltOnly" order="many"> <element type="p3"/> <element type="p6"/> <element type="p7"/> </ElementType> <ElementType name="test0" content="eltOnly" order="many"> <element type="row"/> </ElementType> <ElementType name="p1" content="textOnly" dt:type="string"/> <ElementType name="p3" content="textOnly" dt:type="string"/> <ElementType name="p5" content="textOnly" dt:type="string"/> <ElementType name="p6" content="textOnly" dt:type="string"/> <ElementType name="p7" content="textOnly" dt:type="string"/> </Schema>
84.xml
===84.xml=== <?xml version="1.0"?> <test0 xmlns="x-schema:8_sch.xml"> <row> <p1>0</p1> <p2 width="5" height="2"> <p3>hi</p3> <p4> <p3>hi</p3> <p6>xx0</p6> <p7>yy0</p7> </p4> <p5>zz0</p5> </p2> </row> <row> <p1>1</p1> <p2 width="6" height="3"> <p3>how are you</p3> <p4> <p3>hi</p3> <p6>xx1</p6> <p7>yy1</p7> </p4> <p5>zz1</p5> </p2> </row> </test0>
Island2.htm
===island2.htm=== <HTML> <HEAD> <TITLE>HTML Document with Data Island</TITLE> </HEAD> <BODY> <p>This is an example of an XML data island in I.E. 5</p> <XML ID="12345"> test> <row> <field1>00</field1> <field2>01</field2> </row> <row> <field1>10</field1> <field2>11</field2> </row> <row> <field1>20</field1> <field2>21</field2> </row> </test> </XML> <p>End of first example.</p> <XML ID="12346"> <test> <row> <field11>00</field11> <field12>01</field12> </row> <row> <field11>10</field11> <field12>11</field12> </row> <row> <field11>20</field11> <field12>21</field12> </row> </test> </XML> <p>End of second example.</p> </BODY> </HTML>