Integrated Text Mining
Integrated text mining in Oracle Machine Learning (OML) lets you perform text analysis directly within Oracle Database using SQL and PL/SQL. This integration enables you to extract meaningful insights from unstructured text data without moving the data outside the database environment.
Unstructured text data is neither numerical nor categorical. It includes items such as web pages, document libraries, PowerPoint presentations, product specifications, emails, comment fields in reports, and call center notes. It has been said that unstructured text accounts for more than three quarters of all enterprise data. Extracting meaningful information from unstructured text can be critical to the success of a business. Oracle interprets columns of VARCHAR2 (>4000) and CLOB as text. You can also identify columns of CHAR, VARCHAR2 (<=4000), BFILE, and BLOB as text attributes (unstructured text).
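The data type rule above can be written as a small lookup. The following Python sketch only restates that rule for illustration; the function name `text_treatment` and its return labels are hypothetical and are not part of any Oracle API.

```python
def text_treatment(data_type, length=None):
    """Restate the column-type rule described above (illustrative only)."""
    dt = data_type.upper()
    # CLOB and VARCHAR2 longer than 4000 are interpreted as text.
    if dt == "CLOB" or (dt == "VARCHAR2" and length is not None and length > 4000):
        return "text by default"
    # These types are treated as text only when you identify them as text.
    # A VARCHAR2 with no length given is assumed here to be 4000 or shorter.
    if dt in {"CHAR", "VARCHAR2", "BFILE", "BLOB"}:
        return "text if identified"
    # Any other type is not covered by the rule stated above.
    return "not listed"


for column_type in [("CLOB", None), ("VARCHAR2", 32767), ("VARCHAR2", 2000), ("NUMBER", None)]:
    print(column_type, "->", text_treatment(*column_type))
```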
Machine learning on text applies machine learning techniques to text terms, also called text features or tokens. Text terms are words or groups of words that have been extracted from text documents and assigned numeric weights, which turns the text into a format the algorithms can analyze. Text terms are the fundamental unit of text that can be manipulated and analyzed. Oracle Text is an Oracle Database technology that provides term extraction, word and theme searching, and other utilities for querying text.
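To make the idea of extracting terms and assigning numeric weights concrete, here is a minimal conceptual sketch in Python. It is not Oracle's implementation and does not call Oracle Text. The tokenizer, the TF-IDF weighting scheme, and the sample call center notes are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document (assumed TF-IDF scheme)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency times inverse document frequency.
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical call center notes used only as sample input.
notes = [
    "Customer reports a billing error",
    "Customer asks about a delivery delay",
    "Billing error resolved after refund",
]
for row in tfidf(notes):
    print(row)
```

Each document becomes a row of term weights, which is the kind of structured numeric input that classification or clustering algorithms can consume. Under this scheme, a term that appears in every document receives a weight of zero, because it does not help distinguish one document from another.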
Key features include:
- In-database processing: Perform text mining operations within the database, leveraging Oracle's scalability and performance.
- Text preprocessing functions: Includes functions to clean and tokenize text data, converting it into a structured format suitable for analysis.
- Feature extraction: Convert unstructured text into structured numerical data suitable for machine learning algorithms.
- Machine learning algorithms: Apply algorithms such as classification, clustering, and anomaly detection to text data.
- SQL and PL/SQL integration: Text mining tasks can be run using SQL and PL/SQL procedures, allowing seamless integration with existing data and workflows.