17.4 Performing Sentiment Analysis with the CTX_DOC Package

Use the procedures in the CTX_DOC package to perform sentiment analysis on a single document within a document set. For each document, you can either determine a single sentiment score for the entire document or individual sentiment scores for each topic within the document.

Before you perform sentiment analysis, you must create a context index on the document set. The following command creates a camera_revidx context index on the document set in the camera_reviews table:

create index camera_revidx on camera_reviews(review_text) indextype is
ctxsys.context parameters ('lexer mylexer stoplist ctxsys.default_stoplist');

To perform sentiment analysis with the CTX_DOC package, use one of the following methods:

  • Run the CTX_DOC.SENTIMENT_AGGREGATE procedure with the required parameters.

    This procedure provides a single consolidated sentiment score for the entire document.

    The sentiment score is a value in the range of -100 to +100 that indicates the strength of the sentiment. A negative score represents a negative sentiment and a positive score represents a positive sentiment. You can group scores into labels such as Strongly Negative (-100 to -80), Negative (-80 to -50), Neutral (-50 to +50), Positive (+50 to +80), and Strongly Positive (+80 to +100).

  • Run the CTX_DOC.SENTIMENT procedure with the required parameters.

    This procedure returns the individual segments within the document that contain the search term, and provides an associated sentiment score for each segment.
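The grouping of scores into labels described above can be sketched as a CASE expression. This is only an illustration: the sample_scores subquery and the sentiment_label column are hypothetical names, the three scores are taken from the examples in this section, and because the suggested ranges share their end points, the sketch assumes that a boundary value (such as -80 or +50) belongs to the stronger label.

```sql
-- Hypothetical sample of scores; in practice these would come from
-- CTX_DOC.SENTIMENT_AGGREGATE or from the result table of CTX_DOC.SENTIMENT.
with sample_scores (score) as (
  select 74  from dual union all
  select 2   from dual union all
  select -62 from dual
)
select score,
       case
         when score <= -80 then 'Strongly Negative'  -- -100 to -80
         when score <= -50 then 'Negative'           -- above -80, up to -50
         when score <   50 then 'Neutral'            -- above -50, below +50
         when score <   80 then 'Positive'           -- +50 up to, not including, +80
         else                   'Strongly Positive'  -- +80 to +100
       end as sentiment_label
from sample_scores;
```

The thresholds are a convention rather than part of the CTX_DOC package, so adjust them to suit your document set.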

Example 17-2 Obtaining a Single Sentiment Score for a Document

The following example uses the clsfier_camera sentiment classifier to provide a single aggregate sentiment score for the entire document. The sentiment classifier was created and trained. The table containing the document set has a camera_revidx context index. The doc_id of the document within the document table for which sentiment analysis must be performed is 49. The topic for which a sentiment score is being generated is ‘Nikon’.

select ctx_doc.sentiment_aggregate('camera_revidx','49','Nikon','clsfier_camera') from dual;

CTX_DOC.SENTIMENT_AGGREGATE('CAMERA_REVIDX','49','NIKON','CLSFIER_CAMERA')
--------------------------------------------------------------------------
                            74

1 row selected.

Example 17-3 Obtaining a Single Sentiment Score with the Default Classifier

The following example uses the default sentiment classifier to provide an aggregate sentiment score for the entire document. The table containing the document set has a camera_revidx context index. The doc_id of the document within the document table for which sentiment analysis must be performed is 1.

select ctx_doc.sentiment_aggregate('camera_revidx','1') from dual;

CTX_DOC.SENTIMENT_AGGREGATE('CAMERA_REVIDX','1')
--------------------------------------------
                                           2

1 row selected.

Example 17-4 Obtaining Sentiment Scores for Each Topic Within a Document

The following example uses the clsfier_camera sentiment classifier to generate sentiment scores for each segment within the document. The sentiment classifier was created and trained. The table containing the document set has a camera_revidx context index. The doc_id of the document within the document table for which sentiment analysis must be performed is 49. The topic for which a sentiment score is being generated is ‘Nikon’. The restab result table, which will be populated with the analysis results, was created with the columns snippet (CLOB) and score (NUMBER).
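The result table described above could be created with a statement along the following lines. This is a minimal sketch based only on the two columns named in the text; storage options and constraints are left at their defaults and may need adjusting for your environment.

```sql
-- Result table for CTX_DOC.SENTIMENT, using the column names and types
-- stated in the example (snippet CLOB, score NUMBER).
create table restab (
  snippet clob,
  score   number
);
```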

exec ctx_doc.sentiment('camera_revidx','49','Nikon','restab','clsfier_camera', starttag=>'<<', endtag=>'>>');

SQL> select * from restab;
SNIPPET
--------------------------------------------------------------------------------
     SCORE
----------
It took <<Nikon>> a while to produce a superb compact 85mm lens, but this time they finally got it right.
        65

Without a doubt, this is a fine portrait lens for photographing head-and-shoulder portraits (The only lens which is optically better is <<Nikon>>'s legendary 105mm f2.5 Nikkor lens, and its close optical twin, the 105mm f2.8 Micro Nikkor.
        75

Since the 105mm f2.5 Nikkor lens doesn't have an autofocus version, then this might be the perfect moderate telephoto lens for owners of <<Nikon>> autofocus SLR cameras.
        84

3 rows selected.

Example 17-5 Obtaining a Sentiment Score for a Topic Within a Document

The following example uses the tdrbrtsent03_cl sentiment classifier to generate a sentiment score for each segment within the document. The sentiment classifier was created and trained. The table containing the document set has a tdrbrtsent03_idx context index. The doc_id of the document within the document table for which sentiment analysis must be performed is 1. The topic for which a sentiment score is being generated is ‘movie’. The tdrbrtsent03_rtab result table, which will be populated with the analysis results, was created with the columns snippet and score.

SQL> exec ctx_doc.sentiment('tdrbrtsent03_idx','1','movie','tdrbrtsent03_rtab','tdrbrtsent03_cl');
PL/SQL procedure successfully completed.  

SQL> select * from tdrbrtsent03_rtab;
SNIPPET
--------------------------------------------------------------------------------
     SCORE
----------
the <b>movie</b> is a bit overlong , but nicholson is such good fun that the running time passes by pretty quickly
       -62

1 row selected.

See Also: