11.3 XML Section Searching with Oracle Text
Like HTML documents, XML documents have tagged text that you can use to define blocks of text for section searching. You can search the contents of a section with the WITHIN or INPATH operators.
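The following sections describe the different types of XML searching:
- Automatic Sectioning
- Attribute Searching
- Document Type Sensitive Sections
- Path Section Searching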
11.3.1 Automatic Sectioning
To set up your indexing operation to automatically create sections from XML documents, use the AUTO_SECTION_GROUP section group. The system creates zone sections for XML tags. It also creates attribute sections for the tags that have attributes, and names these attribute sections in the form tag@attribute.
For example, the following statement uses the AUTO_SECTION_GROUP section group to create the myindex index on a column containing the XML files:
CREATE INDEX myindex ON xmldocs(xmlfile) INDEXTYPE IS ctxsys.context PARAMETERS ('datastore ctxsys.default_datastore filter ctxsys.null_filter section group ctxsys.auto_section_group' );
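The following is a minimal sketch of what AUTO_SECTION_GROUP gives you. It loads one document, synchronizes the index, and then queries both an automatically created zone section and an attribute section.

It rests on several assumptions:
- xmlfile is a CLOB, and any other columns in xmldocs are nullable.
- myindex was created with the statement above.
- The client is SQL*Plus or SQLcl, which is why a / follows the PL/SQL block.
- The generated attribute section is named book@title, as the text describes.

The sample document is hypothetical. By default a CONTEXT index is not updated at DML time, so the new row is not searchable until CTX_DDL.SYNC_INDEX runs.

```sql
-- Hypothetical sample document; lowercase tags keep the section names
-- used in the queries below identical to the tags in the document.
INSERT INTO xmldocs (xmlfile)
  VALUES ('<book title="Tale of Two Cities">It was the best of times.</book>');
COMMIT;

-- Make the committed row visible to the CONTEXT index.
BEGIN
  ctx_ddl.sync_index('myindex');
END;
/

-- Zone section created automatically for the <book> tag.
SELECT xmlfile FROM xmldocs
 WHERE CONTAINS(xmlfile, 'times WITHIN book') > 0;

-- Attribute section created automatically, named tag@attribute.
SELECT xmlfile FROM xmldocs
 WHERE CONTAINS(xmlfile, 'Cities WITHIN book@title') > 0;
```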
11.3.2 Attribute Searching
You can search XML attribute text in one of two ways:
- Creating Attribute Sections
Create attribute sections with CTX_DDL.ADD_ATTR_SECTION and then index with XML_SECTION_GROUP. If you use AUTO_SECTION_GROUP when you index, attribute sections are created automatically. You can query attribute sections with the WITHIN operator.
Consider an XML file that defines the BOOK tag with a TITLE attribute:
<BOOK TITLE="Tale of Two Cities"> It was the best of times. </BOOK>
To define the title attribute as an attribute section, create an XML_SECTION_GROUP and define the attribute section:
begin
  ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
  ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title');
end;
To index:
CREATE INDEX myindex ON xmldocs(xmlfile) INDEXTYPE IS ctxsys.context PARAMETERS ('datastore ctxsys.default_datastore filter ctxsys.null_filter section group myxmlgroup' );
To query the booktitle XML attribute section:
'Cities within booktitle'
- Searching Attributes with the INPATH Operator
Index with the PATH_SECTION_GROUP and query attribute text with the INPATH operator.
See Also: 11.3.4 "Path Section Searching"
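Wrapped in a SQL statement, the booktitle query from the first item looks roughly like the following sketch. It assumes that myindex was built with myxmlgroup as shown above and that the BOOK document is already indexed. The label 1 passed to CONTAINS is what lets SCORE(1) return the relevance of each matching row.

```sql
-- Rank documents whose booktitle attribute section contains "Cities".
SELECT SCORE(1) AS relevance, xmlfile
  FROM xmldocs
 WHERE CONTAINS(xmlfile, 'Cities WITHIN booktitle', 1) > 0
 ORDER BY SCORE(1) DESC;
```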
11.3.3 Document Type Sensitive Sections
For an XML document set that contains the <book> tag declared for different document types, you may want to create a distinct book section for each document type to improve search capability. The following scenario shows you how to create book sections for each document type.
Assume that mydocname1 is declared as an XML document type (root element):
<!DOCTYPE mydocname1 ... [...
Within mydocname1, the <book> element is declared. For this tag, you can create a section named mybooksec1 that is sensitive to the tag's document type:
begin
  ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
  ctx_ddl.add_zone_section('myxmlgroup', 'mybooksec1', 'mydocname1(book)');
end;
Assume that mydocname2 is declared as another XML document type (root element):
<!DOCTYPE mydocname2 ... [...
Within mydocname2, the <book> element is declared. For this tag, you can create a section named mybooksec2 that is sensitive to the tag's document type. Because myxmlgroup already exists, add the section to it rather than creating the group again:
begin
  -- myxmlgroup was created in the previous step; creating it again would fail
  ctx_ddl.add_zone_section('myxmlgroup', 'mybooksec2', 'mydocname2(book)');
end;
To query within the mybooksec1 section, use WITHIN:
'oracle within mybooksec1'
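The following sketch shows the doctype-sensitive sections in use. It assumes that xmldocs was indexed with myxmlgroup after both zone sections were added, using the same CREATE INDEX ... section group myxmlgroup statement shown under Attribute Searching. It also assumes that the section group takes the document type from the DOCTYPE name. The two documents are hypothetical and use a minimal DOCTYPE declaration rather than the elided ones above.

```sql
-- Two hypothetical documents with the same <book> tag under different doctypes.
INSERT INTO xmldocs (xmlfile)
  VALUES ('<!DOCTYPE mydocname1><mydocname1><book>oracle</book></mydocname1>');
INSERT INTO xmldocs (xmlfile)
  VALUES ('<!DOCTYPE mydocname2><mydocname2><book>oracle</book></mydocname2>');
COMMIT;

BEGIN
  ctx_ddl.sync_index('myindex');
END;
/

-- Expected to match only the mydocname1 document: mybooksec1 covers
-- <book> only when the document type is mydocname1.
SELECT xmlfile FROM xmldocs
 WHERE CONTAINS(xmlfile, 'oracle WITHIN mybooksec1') > 0;

-- Likewise, expected to match only the mydocname2 document.
SELECT xmlfile FROM xmldocs
 WHERE CONTAINS(xmlfile, 'oracle WITHIN mybooksec2') > 0;
```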
11.3.4 Path Section Searching
XML documents can have parent-child tag structures such as:
<A> <B> <C> dog </C> </B> </A>
In this scenario, tag C is a child of tag B, which is a child of tag A.
With Oracle Text, you can search paths with PATH_SECTION_GROUP.
This section group enables you to specify direct parentage in queries. For example, you can find all documents that contain the term dog in element C, where C is a child of element B, which is in turn a child of element A.
With PATH_SECTION_GROUP, you can also perform attribute value searching and attribute equality testing.
The operators associated with this feature are:
- INPATH
- HASPATH
11.3.4.1 Creating an Index with PATH_SECTION_GROUP
To enable path section searching, index your XML document set with PATH_SECTION_GROUP. For example:
Create the section group:
begin
  ctx_ddl.create_section_group('xmlpathgroup', 'PATH_SECTION_GROUP');
end;
Create the index:
CREATE INDEX myindex ON xmldocs(xmlfile) INDEXTYPE IS ctxsys.context PARAMETERS ('datastore ctxsys.default_datastore filter ctxsys.null_filter section group xmlpathgroup' );
After you create the index, you can use the INPATH and HASPATH operators in queries.
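To try the queries in the rest of this section, you can load their example documents into xmldocs. This sketch assumes that xmlfile is the only required column and that myindex is the PATH_SECTION_GROUP index created above. CONTEXT indexes are synchronized manually by default, so the script commits and then calls CTX_DDL.SYNC_INDEX to make the new rows searchable. With all ten documents loaded, a query may match more documents than the two it is contrasted with in the text.

```sql
-- Example documents taken from the subsections that follow.
INSERT INTO xmldocs (xmlfile) VALUES ('<A>dog</A>');
INSERT INTO xmldocs (xmlfile) VALUES ('<C><B><A>dog</A></B></C>');
INSERT INTO xmldocs (xmlfile) VALUES ('<A><B>My dog is friendly.</B></A>');
INSERT INTO xmldocs (xmlfile) VALUES ('<C><B>My dog is friendly.</B></C>');
INSERT INTO xmldocs (xmlfile) VALUES ('<A><B>dog</B></A>');
INSERT INTO xmldocs (xmlfile) VALUES ('<C><A B="snoop dog"> </A> </C>');
INSERT INTO xmldocs (xmlfile) VALUES ('<A B="home address">San Francisco, California, USA</A>');
INSERT INTO xmldocs (xmlfile) VALUES ('<A B="work address">San Francisco, California, USA</A>');
INSERT INTO xmldocs (xmlfile) VALUES ('<A><B><C>dog</C></B></A>');
INSERT INTO xmldocs (xmlfile) VALUES ('<A>dog park</A>');
COMMIT;

-- Index the committed rows.
BEGIN
  ctx_ddl.sync_index('myindex');
END;
/
```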
11.3.4.2 Top-Level Tag Searching
To find all documents that contain the term dog in the top-level tag <A>:
dog INPATH (/A)
or
dog INPATH(A)
11.3.4.3 Any-Level Tag Searching
To find all documents that contain the term dog in the <A> tag at any level:
dog INPATH(//A)
This query finds the following documents:
<A>dog</A>
and
<C><B><A>dog</A></B></C>
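Running the top-level and any-level forms against the same table shows the difference between them. This sketch assumes that myindex is the PATH_SECTION_GROUP index and that xmldocs holds the two documents listed above.

```sql
-- Top level only: <A> must be the root element.
-- Expected to match <A>dog</A> but not <C><B><A>dog</A></B></C>.
SELECT xmlfile FROM xmldocs
 WHERE CONTAINS(xmlfile, 'dog INPATH(/A)') > 0;

-- Any level: <A> may appear anywhere in the tree, so both documents match.
SELECT xmlfile FROM xmldocs
 WHERE CONTAINS(xmlfile, 'dog INPATH(//A)') > 0;
```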
11.3.4.4 Direct Parentage Searching
To find all documents that contain the term dog in a B element that is a direct child of a top-level A element:
dog INPATH(A/B)
This query finds the following XML document:
<A><B>My dog is friendly.</B></A>
but it does not find:
<C><B>My dog is friendly.</B></C>
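INPATH queries can be combined with the usual Boolean operators. The following sketch assumes the PATH_SECTION_GROUP index above. It looks for dog under a B element whose parent is either a top-level A or a top-level C, so it should return both documents shown above.

```sql
-- Each INPATH clause is parenthesized so that OR applies to whole clauses.
SELECT xmlfile FROM xmldocs
 WHERE CONTAINS(xmlfile, '(dog INPATH(A/B)) OR (dog INPATH(C/B))') > 0;
```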
11.3.4.5 Tag Value Testing
You can test the value of tags. For example, the query:
dog INPATH(A[B="dog"])
finds the following document:
<A><B>dog</B></A>
but does not find:
<A><B>My dog is friendly.</B></A>
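In SQL, the value test goes inside the CONTAINS query string. The double quotes around the tested value sit inside the single-quoted SQL literal, so no extra escaping is needed. The following is a sketch, assuming the PATH_SECTION_GROUP index above:

```sql
-- Expected to match <A><B>dog</B></A>, where the whole value of B is "dog",
-- but not <A><B>My dog is friendly.</B></A>.
SELECT xmlfile FROM xmldocs
 WHERE CONTAINS(xmlfile, 'dog INPATH(A[B="dog"])') > 0;
```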
11.3.4.6 Attribute Searching
You can search the content of attributes. For example, the query:
dog INPATH(//A/@B)
finds the document:
<C><A B="snoop dog"> </A> </C>
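The same pattern applies to attribute text. The following sketch assumes the PATH_SECTION_GROUP index above. It also requests a score so that you can rank matches on the attribute content.

```sql
-- The term is matched against the value of attribute B on any <A> element.
SELECT SCORE(1) AS relevance, xmlfile
  FROM xmldocs
 WHERE CONTAINS(xmlfile, 'dog INPATH(//A/@B)', 1) > 0;
```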
11.3.4.7 Attribute Value Testing
You can test the value of attributes. For example, the query:
California INPATH (//A[@B = "home address"])
finds the document:
<A B="home address">San Francisco, California, USA</A>
but it does not find:
<A B="work address">San Francisco, California, USA</A>
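If the query string is built at run time, you can pass it to CONTAINS as a bind variable rather than embedding it in the SQL text. The following sketch uses the SQL*Plus VARIABLE and EXEC commands, an assumption about the client, and the PATH_SECTION_GROUP index above.

```sql
VARIABLE q VARCHAR2(200)
EXEC :q := 'California INPATH(//A[@B = "home address"])'

-- Expected to return the "home address" document but not the "work address" one.
SELECT xmlfile FROM xmldocs
 WHERE CONTAINS(xmlfile, :q) > 0;
```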
11.3.4.8 Path Testing
You can test whether a path exists with the HASPATH operator. For example, the query:
HASPATH(A/B/C)
finds and returns a score of 100 for the document
<A><B><C>dog</C></B></A>
without the query having to reference dog at all.
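Because HASPATH does not need a search term, it can stand alone as the whole CONTAINS query. The following sketch assumes the PATH_SECTION_GROUP index above. Per the text, matching documents should report a score of 100.

```sql
-- No search term: the document qualifies if the path A/B/C exists.
SELECT SCORE(1) AS relevance, xmlfile
  FROM xmldocs
 WHERE CONTAINS(xmlfile, 'HASPATH(A/B/C)', 1) > 0;
```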
11.3.4.9 Section Equality Testing with HASPATH
You can use the HASPATH operator for section equality tests. For example, consider the following query:
dog INPATH(A)
It finds:
<A>dog</A>
but it also finds:
<A>dog park</A>
To limit the query to the term dog and nothing else, you can use a section equality test with the HASPATH operator. For example,
HASPATH(A="dog")
finds and returns a score of 100 only for the first document, not for the second document.
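To see this difference, you can run the INPATH query and the HASPATH equality test side by side. This sketch assumes the PATH_SECTION_GROUP index above and that xmldocs holds <A>dog</A> and <A>dog park</A>.

```sql
-- Matches both documents, because dog occurs somewhere inside <A>.
SELECT xmlfile FROM xmldocs
 WHERE CONTAINS(xmlfile, 'dog INPATH(A)') > 0;

-- Matches only <A>dog</A>, where the entire content of <A> is "dog".
SELECT SCORE(1) AS relevance, xmlfile
  FROM xmldocs
 WHERE CONTAINS(xmlfile, 'HASPATH(A="dog")', 1) > 0;
```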
See Also:
Oracle Text Reference to learn more about using the INPATH and HASPATH operators