11.2 HTML Section Searching with Oracle Text
HTML has internal structure in the form of tagged text that you can use for section searching. For example, define a section called headings for the <H1> tag, and then search for terms only within these tags across your document set.
To query, you use the WITHIN operator. Oracle Text returns all documents that contain your query term within the headings section. For example, if you want to find all documents that contain the word oracle within headings, enter the following query:
'oracle within headings'
11.2.1 Creating HTML Sections
The following code defines a section group called htmgroup of type HTML_SECTION_GROUP. It then creates a zone section in htmgroup called heading identified by the <H1> tag:
begin
  ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
  ctx_ddl.add_zone_section('htmgroup', 'heading', 'H1');
end;
You can then index your documents as follows:
create index myindex on docs(htmlfile)
  indextype is ctxsys.context
  parameters('filter ctxsys.null_filter section group htmgroup');
After indexing with the htmgroup section group, you can query within the heading section by issuing this query:
'Oracle WITHIN heading'
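The query string above is passed to the CONTAINS operator in a SELECT statement. The following is a minimal sketch, assuming the docs table and the myindex index from the preceding example; the id column is hypothetical and stands in for whatever key column your table has:

```sql
-- Assumes docs(htmlfile) is indexed with the htmgroup section group.
-- The id column is an assumption made for illustration.
SELECT SCORE(1) AS relevance, id
FROM   docs
WHERE  CONTAINS(htmlfile, 'Oracle WITHIN heading', 1) > 0
ORDER  BY SCORE(1) DESC;
```

The label 1 ties SCORE(1) to this CONTAINS call, so the rows come back ordered by relevance.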
11.2.2 Searching HTML Meta Tags
With HTML documents, you can also create sections for NAME/CONTENT pairs in <META> tags. When you do so, you can limit your searches to text within CONTENT.
Consider an HTML document that has the following META tag:
<META NAME="author" CONTENT="ken">
Create a zone section that indexes all CONTENT attributes for the META tag whose NAME value is author:
begin
  ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
  ctx_ddl.add_zone_section('htmgroup', 'author', 'meta@author');
end;
After indexing with the htmgroup section group, you can query the document:
'ken WITHIN author'
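The steps in this section can be combined into one end-to-end sketch. It assumes a fresh schema in which neither the docs table nor the htmgroup section group exists yet, and it defines the heading and author sections in a single section group so that the group is created only once. The table layout (an id key and a CLOB htmlfile column) and the sample document are assumptions made for illustration:

```sql
-- Hypothetical table; the text above names only docs(htmlfile).
CREATE TABLE docs (
  id       NUMBER PRIMARY KEY,
  htmlfile CLOB
);

-- Sample document containing both an <H1> tag and an author META tag.
INSERT INTO docs (id, htmlfile) VALUES (1,
  '<HTML><HEAD><META NAME="author" CONTENT="ken"></HEAD>' ||
  '<BODY><H1>Oracle Text</H1><P>Section searching.</P></BODY></HTML>');
COMMIT;

-- One section group holding both zone sections.
BEGIN
  ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
  ctx_ddl.add_zone_section('htmgroup', 'heading', 'H1');
  ctx_ddl.add_zone_section('htmgroup', 'author', 'meta@author');
END;
/

-- The row is already in the table, so it is indexed at creation time.
CREATE INDEX myindex ON docs(htmlfile)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('filter ctxsys.null_filter section group htmgroup');

-- Restrict the search for ken to the CONTENT of the author META tag.
SELECT id
FROM   docs
WHERE  CONTAINS(htmlfile, 'ken WITHIN author') > 0;
```

Rows inserted after the index is created do not become searchable until the index is synchronized, which is why the sketch loads the sample document first.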