17.5 Performing Sentiment Analysis with the RSI
The XML Query Result Set Interface (RSI) enables you to perform sentiment analysis on a set of documents by using either the default sentiment classifier or a user-defined sentiment classifier. The documents on which sentiment analysis must be performed are stored in a document table.
Use the sentiment element in the input RSI to indicate that sentiment analysis, in addition to other operations specified in the Result Set Descriptor (RSD), must be performed at query time. If you specify a value for the classifier attribute of the sentiment element, then the specified sentiment classifier is used to perform the sentiment analysis. If the classifier attribute is omitted, then Oracle Text performs sentiment analysis by using the default sentiment classifier. The sentiment element contains a child element called item that specifies the topic or concept about which a sentiment must be generated during sentiment analysis.
You can generate either a single sentiment score for each document or separate sentiment scores for each topic within the document. Use the agg attribute of the item element to generate a single aggregated sentiment score for each document.
You can perform sentiment classification by using a keyword query or the ABOUT operator. When you use the ABOUT operator, the result set includes synonyms of the keyword that are identified by using the thesaurus.
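As a rough illustration of the options described above, the following sketch combines an ABOUT driving query, a sentiment element with no classifier attribute (so the default sentiment classifier is used), and an item element with agg="true" (so one aggregated score is returned for the topic). It is not taken from the product examples. It assumes the camera_revidx index used elsewhere in this section, that :rs has already been initialized as a temporary CLOB, and that the index is configured so that ABOUT queries are meaningful.

```sql
-- Sketch only. Assumptions: the camera_revidx index exists, :rs is an
-- initialized temporary CLOB bind variable, and ABOUT queries are supported
-- by the index configuration.
-- The driving query uses ABOUT instead of a plain keyword.
-- The sentiment element has no classifier attribute, so the default
-- sentiment classifier is used.
-- agg="true" requests a single aggregated score for the 'lens' topic.
begin
  ctx_query.result_set(
    'camera_revidx',
    'about(camera)',
    '<ctx_result_set_descriptor>
       <hitlist start_hit_num="1" end_hit_num="10" order="score desc">
         <sentiment>
           <item topic="lens" agg="true" />
         </sentiment>
       </hitlist>
     </ctx_result_set_descriptor>',
    :rs);
end;
/
```

To use a trained classifier instead of the default one, add the classifier attribute to the sentiment element.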
To perform sentiment analysis by using RSI:
Example 17-6 Input the RSD to Perform Sentiment Analysis
The following example performs sentiment analysis and generates a sentiment for the ‘lens’ topic. The driving query is a keyword query for ‘camera.’ The sentiment element specifies that sentiment analysis must be performed by using the clsfier_camera sentiment classifier. This classifier was previously created and trained by using the CTX_CLS.SA_TRAIN_MODEL procedure. The camera_revidx context index is on the document set table.
The sentiment score ranges from -100 to 100. A positive score indicates positive sentiment, whereas a negative score indicates negative sentiment. The absolute value of the score indicates the strength of the sentiment.
To perform sentiment analysis and obtain a sentiment score for each topic within the document:
- Create the rs bind variable, a temporary CLOB that will store the results of the search operation.

  SQL> var rs clob;
  SQL> exec dbms_lob.createtemporary(:rs, TRUE, DBMS_LOB.SESSION);
- Perform sentiment analysis as part of a search query.
  The keyword being searched for is ‘camera.’ The topic for which sentiment analysis is performed is ‘lens.’

  begin
    ctx_query.result_set('camera_revidx', 'camera', '
      <ctx_result_set_descriptor>
        <hitlist start_hit_num="1" end_hit_num="10" order="score desc">
          <sentiment classifier="clsfier_camera">
            <item topic="lens" />
            <item topic="picture quality" agg="true" />
          </sentiment>
        </hitlist>
      </ctx_result_set_descriptor>', :rs);
  end;
  /
- View the results stored in the rs result set.
  Other applications can use the XML result set for further processing. For brevity, some output was removed. Each segment within a document has its own sentiment score. A topic specified with agg="true" has a single aggregated score instead.

  SQL> select xmltype(:rs) from dual;

  XMLTYPE(:RS)
  --------------------------------------------------------------------------------
  <ctx_result_set>
    <hitlist>
      <hit>
        <sentiment>
          <item topic="lens">
            <segment>
              <segment_text>The first time it was sent in was because the <b>lens </b> door failed to turn on the camera and it was almost to come off of its track . Eight months later, the flash quit working in all modes AND the door was failing AGAIN!</segment_text>
              <segment_score>-81</segment_score>
            </segment>
          </item>
          <item topic="picture quality">
            <score> -75 </score>
          </item>
        </sentiment>
      </hit>
      <hit>
        <sentiment>
          <item topic="lens">
            <segment>
              <segment_text>I was actually quite impressed with it. Powerful zoom , sharp <b>lens</b>, decent picture quality. I also played with some other Panasonic models in various stores just to get a better feel for them, as well as spent a few hours on </segment_text>
              <segment_score> 67 </segment_score>
            </segment>
          </item>
          <item topic="picture quality">
            <score>-1</score>
          </item>
        </sentiment>
      </hit>
      . . .
      . . .
    </hitlist>
  </ctx_result_set>
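Because the result set is plain XML, the scores can also be flattened into rows with standard SQL/XML functions. The following is a minimal sketch, not part of the product example: it assumes the element and attribute names shown in the sample output, that :rs still holds the complete (unabridged) result set, and that the column aliases and sizes, which are chosen here only for illustration, are adequate.

```sql
-- Sketch only: list aggregated topic scores, one row per hit and topic.
-- Element names (hit, sentiment, item, score) are taken from the sample
-- output; aliases such as hit_no and score_txt are illustrative.
select h.hit_no,
       i.topic,
       to_number(regexp_replace(i.score_txt, '[[:space:]]', '')) as topic_score
from   xmltable('/ctx_result_set/hitlist/hit'
                passing xmltype(:rs)
                columns hit_no for ordinality,
                        items  xmltype path 'sentiment/item') h,
       xmltable('/item'
                passing h.items
                columns topic     varchar2(100) path '@topic',
                        score_txt varchar2(20)  path 'score') i
where  i.score_txt is not null;   -- topics without agg="true" have no score element

-- Sketch only: list per-segment scores for topics returned segment by segment.
select h.hit_no,
       i.topic,
       to_number(regexp_replace(s.score_txt, '[[:space:]]', '')) as segment_score
from   xmltable('/ctx_result_set/hitlist/hit'
                passing xmltype(:rs)
                columns hit_no for ordinality,
                        items  xmltype path 'sentiment/item') h,
       xmltable('/item'
                passing h.items
                columns topic    varchar2(100) path '@topic',
                        segments xmltype       path 'segment') i,
       xmltable('/segment'
                passing i.segments
                columns score_txt varchar2(20) path 'segment_score') s;
```

Whitespace is stripped before conversion because the sample output shows scores padded with spaces. The hit_no column reflects only the position of the hit in the hitlist, not a document identifier.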