11.3 XML Section Searching with Oracle Text

Like HTML documents, XML documents have tagged text that you can use to define blocks of text for section searching. You can search the contents of a section with the WITHIN or INPATH operators.

The following sections describe the different types of XML searching:

11.3.1 Automatic Sectioning

To set up your indexing operation to automatically create sections from XML documents, use the AUTO_SECTION_GROUP section group. The system creates zone sections for XML tags. Attribute sections are created for the tags that have attributes and for the sections named in the form tag@attribute.

For example, the following statement uses the AUTO_SECTION_GROUP to create the myindex index on a column containing the XML files:

CREATE INDEX myindex
ON xmldocs(xmlfile)
 INDEXTYPE IS ctxsys.context
PARAMETERS ('datastore ctxsys.default_datastore 
             filter ctxsys.null_filter 
             section group ctxsys.auto_section_group'
           );

11.3.2 Attribute Searching

You can search XML attribute text in one of two ways:

  • Creating Attribute Sections

    Create attribute sections with CTX_DDL.ADD_ATTR_SECTION and then index with XML_SECTION_GROUP. If you use AUTO_SECTION_GROUP when you index, attribute sections are created automatically. You can query attribute sections with the WITHIN operator.

    Consider an XML file that defines the BOOK tag with a TITLE attribute:

    <BOOK TITLE="Tale of Two Cities"> 
      It was the best of times. 
    </BOOK> 
    

    To define the title attribute as an attribute section, create an XML_SECTION_GROUP and define the attribute section:

    begin
    ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
    ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title');
    end;
    

    To index:

    CREATE INDEX myindex
    ON xmldocs(xmlfile)
    INDEXTYPE IS ctxsys.context
    PARAMETERS ('datastore ctxsys.default_datastore 
                 filter ctxsys.null_filter 
                 section group myxmlgroup'
               );
    

    To query the booktitle XML attribute section:

    'Cities within booktitle'
  • Searching Attributes with the INPATH Operator

    Index with the PATH_SECTION_GROUP and query attribute text with the INPATH operator.

11.3.3 Document Type Sensitive Sections

For an XML document set that contains the <book> tag declared for different document types, you may want to create a distinct book section for each document type to improve search capability. The following scenario shows you how to create book sections for each document type.

Assume that mydocname1 is declared as an XML document type (root element):

<!DOCTYPE mydocname1 ... [...

Within mydocname1,, the <book> element is declared. For this tag, you can create a section named mybooksec1 that is sensitive to the tag's document type:

begin
ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
ctx_ddl.add_zone_section('myxmlgroup', 'mybooksec1', 'mydocname1(book)');
end;

Assume that mydocname2 is declared as another XML document type (root element):

<!DOCTYPE mydocname2 ... [...

Within mydocname2,, the <book> element is declared. For this tag, you can create a section named mybooksec2 that is sensitive to the tag's document type:

begin
ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
ctx_ddl.add_zone_section('myxmlgroup', 'mybooksec2', 'mydocname2(book)');
end;

To query within the mybooksec1 section, use WITHIN:

'oracle within mybooksec1'

11.3.4 Path Section Searching

XML documents can have parent-child tag structures such as:

<A> <B> <C> dog </C> </B> </A>

In this scenario, tag C is a child of tag B, which is a child of tag A.

With Oracle Text, you can search paths with PATH_SECTION_GROUP. This section group enables you to specify direct parentage in queries, such as to find all documents that contain the term dog in element C, which is a child of element B, and so on.

With PATH_SECTION_GROUP, you can also perform attribute value searching and attribute equality testing.

The new operators associated with this feature are

  • INPATH

  • HASPATH

This section contains the following topics.

11.3.4.1 Creating an Index with PATH_SECTION_GROUP

To enable path section searching, index your XML document set with PATH_SECTION_GROUP. For example:

Create the preference.

begin
ctx_ddl.create_section_group('xmlpathgroup', 'PATH_SECTION_GROUP');
end;

Create the index.

CREATE INDEX myindex
ON xmldocs(xmlfile)
INDEXTYPE IS ctxsys.context
PARAMETERS ('datastore ctxsys.default_datastore 
             filter ctxsys.null_filter 
             section group xmlpathgroup'
           );

When you create the index, you can use the INPATH and HASPATH operators.

11.3.4.2 Top-Level Tag Searching

To find all documents that contain the term dog in the top-level tag <A>:

dog INPATH (/A)

or

dog INPATH(A)

11.3.4.3 Any-Level Tag Searching

To find all documents that contain the term dog in the <A> tag at any level:

dog INPATH(//A)

This query finds the following documents:

<A>dog</A>

and

<C><B><A>dog</A></B></C>

11.3.4.4 Direct Parentage Searching

To find all documents that contain the term dog in a B element that is a direct child of a top-level A element:

dog INPATH(A/B)

This query finds the following XML document:

<A><B>My dog is friendly.</B></A>

but it does not find:

<C><B>My dog is friendly.</B></C>

11.3.4.5 Tag Value Testing

You can test the value of tags. For example, the query:

dog INPATH(A[B="dog"])

Finds the following document:

<A><B>dog</B></A>

But does not find:

<A><B>My dog is friendly.</B></A>

11.3.4.6 Attribute Searching

You can search the content of attributes. For example, the query:

dog INPATH(//A/@B)

Finds the document:

<C><A  B="snoop dog"> </A> </C>

11.3.4.7 Attribute Value Testing

You can test the value of attributes. For example, the query:

California INPATH (//A[@B = "home address"])

Finds the document:

<A B="home address">San Francisco, California, USA</A>

But it does not find:

<A B="work address">San Francisco, California, USA</A>

11.3.4.8 Path Testing

You can test if a path exists with the HASPATH operator. For example, the query:

HASPATH(A/B/C)

finds and returns a score of 100 for the document

<A><B><C>dog</C></B></A>

without the query having to reference dog at all.

11.3.4.9 Section Equality Testing with HASPATH

You can use the HASPATH operator for section quality tests. For example, consider the following query:

dog INPATH A

It finds:

<A>dog</A>

but it also finds:

<A>dog park</A>

To limit the query to the term dog and nothing else, you can use a section equality test with the HASPATH operator. For example,

HASPATH(A="dog")

finds and returns a score of 100 only for the first document, not for the second document.

See Also:

Oracle Text Reference to learn more about using the INPATH and HASPATH operators