1.5 The Impact of AI Enrichment: Practical Examples
See how enriching your schema directly impacts LLM effectiveness through these real-world examples.
Example 1: Healthcare Events-Tracking Schema
This example uses a schema designed to track healthcare events, focusing on doctor activity and the specific procedures performed during patient events.
- The Practitioner table: Stores information about doctors, including their name, specialization, and status.
- The Event table: Records high-level patient appointments, including when and where they occurred and the responsible doctor.
- The EventDetail table: Logs individual procedures (like surgery) performed during an event, capturing the procedure description, exact time stamp, and duration.
CREATE TABLE Practitioner (
    practitioner_id NUMBER PRIMARY KEY,
    nm VARCHAR2(80),
    speciality VARCHAR2(40), -- This represents 'department'
    status VARCHAR2(20) -- e.g., 'Active', 'Inactive'
);
CREATE TABLE Event (
    event_id NUMBER PRIMARY KEY,
    dt DATE, -- Date or time of the overall event
    practitioner_id NUMBER,
    location VARCHAR2(40), -- e.g., 'OR1', 'ER', 'ICU'
    CONSTRAINT FK_PRACTITIONER_ID FOREIGN KEY (practitioner_id) REFERENCES Practitioner(practitioner_id)
);
CREATE TABLE EventDetail (
    event_id NUMBER,
    act_desc VARCHAR2(200),
    act_time TIMESTAMP,
    duration_minutes NUMBER, -- Duration of this action or procedure
    CONSTRAINT FK_EVENT_ID FOREIGN KEY (event_id) REFERENCES Event(event_id)
);
Note the following about this schema:
- The term practitioner is synonymous with doctor.
- The speciality column represents the doctor's department.
- The act_desc column in the EventDetail table records the procedure performed, such as surgery.
- The act_time column in the EventDetail table provides the time stamp for time-based filtering of procedures.
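These facts amount to structured metadata that gets rendered into the model's context alongside the schema. The following Python sketch shows one way such annotations might be collected and flattened into prompt text; the dictionary layout and the render_context helper are illustrative assumptions, not a real API.

```python
# Illustrative annotation store for the healthcare schema.
# The structure (alias / description / sample_values keys) is an
# assumption for this sketch, not a documented format.
annotations = {
    "Practitioner": {
        "alias": "doctor",
        "columns": {"speciality": {"alias": "department"}},
    },
    "EventDetail": {
        "columns": {
            "act_desc": {
                "description": "Procedure performed during the event",
                "sample_values": ["Surgery", "Consultation"],
            },
            "act_time": {"description": "Time stamp of the individual procedure"},
        },
    },
}

def render_context(annotations):
    """Flatten the annotation dictionary into prompt-ready lines."""
    lines = []
    for table, meta in annotations.items():
        if "alias" in meta:
            lines.append(f"Table {table} is also known as '{meta['alias']}'.")
        for col, info in meta.get("columns", {}).items():
            if "alias" in info:
                lines.append(f"Column {table}.{col} is also known as '{info['alias']}'.")
            if "description" in info:
                lines.append(f"Column {table}.{col}: {info['description']}.")
            if "sample_values" in info:
                values = ", ".join(info["sample_values"])
                lines.append(f"Sample values for {table}.{col}: {values}.")
    return "\n".join(lines)

print(render_context(annotations))
```

Rendered this way, the business terms (doctor, department, surgery) appear verbatim in the model's context, so they can be matched directly against the words in a user's question.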
Business query
Consider a hospital administrator asking the following question to an LLM:
"For each doctor, display their name, department, and the count of surgeries performed after 8 PM within the last six months."
Without AI enrichment
Without any annotations, the LLM must interpret the raw schema. This could lead to a flawed query like the one below:
SELECT
    PRAC.nm,
    PRAC.speciality,
    COUNT(EVT.event_id) AS surgery_count
FROM
    Practitioner PRAC
JOIN
    Event EVT
    ON PRAC.practitioner_id = EVT.practitioner_id
WHERE
    EVT.dt >= ADD_MONTHS(SYSDATE, -6)
    AND EXTRACT(HOUR FROM EVT.dt) >= 20
GROUP BY
    PRAC.nm,
    PRAC.speciality;
This query has three flaws:
- It counts EVT.event_id, incorrectly tallying all appointments after 8 PM instead of only the surgeries.
- It filters on EVT.dt, which is the start time of an appointment, not of a specific procedure. A 1 PM appointment could include a 9 PM surgery, which this query would miss.
- It misses a required join with the EventDetail table.
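The second flaw is easy to demonstrate with a concrete row. The following Python sketch uses an in-memory SQLite database as a stand-in for the real schema (simplified types, ISO-8601 text dates, and strftime in place of Oracle's EXTRACT): a 1 PM appointment that includes a 9:30 PM surgery is invisible to a filter on the event start time but visible to a filter on the procedure time.

```python
import sqlite3

# In-memory stand-in for the healthcare schema: one 1 PM appointment
# that contains one 9:30 PM surgery.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Event (event_id INTEGER PRIMARY KEY, dt TEXT, practitioner_id INTEGER);
CREATE TABLE EventDetail (event_id INTEGER, act_desc TEXT, act_time TEXT);
INSERT INTO Event VALUES (1, '2025-06-01 13:00:00', 10);               -- appointment starts at 1 PM
INSERT INTO EventDetail VALUES (1, 'Surgery', '2025-06-01 21:30:00');  -- surgery at 9:30 PM
""")

# Flawed filter: tests the hour of the appointment start (1 PM),
# so the late surgery is missed.
flawed = con.execute(
    "SELECT COUNT(*) FROM Event WHERE CAST(strftime('%H', dt) AS INTEGER) >= 20"
).fetchone()[0]

# Correct filter: joins to EventDetail and tests the hour of the
# procedure itself.
correct = con.execute(
    """SELECT COUNT(*) FROM Event e
       JOIN EventDetail ed ON e.event_id = ed.event_id
       WHERE ed.act_desc = 'Surgery'
         AND CAST(strftime('%H', ed.act_time) AS INTEGER) >= 20"""
).fetchone()[0]

print(flawed, correct)  # prints "0 1": the flawed filter finds no surgery
```

The flawed filter reports zero surgeries even though one occurred after 8 PM, which is exactly the miscount a hospital administrator would never notice from the query output alone.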
With AI enrichment
Apply the following annotations:
- Set the Practitioner table alias to doctor.
- Set the Practitioner.speciality column alias to department.
- Provide a description and sample values for the EventDetail.act_desc column, clarifying that it contains the specific procedures performed during an event, such as surgery, consultation, and so on.
With this enriched context, the LLM can generate a far more accurate and reliable query:
SELECT
    PRAC.nm,
    PRAC.speciality,
    COUNT(*) AS surgery_count
FROM
    Practitioner PRAC
JOIN
    Event EVT
    ON PRAC.practitioner_id = EVT.practitioner_id
JOIN
    EventDetail ED
    ON EVT.event_id = ED.event_id
WHERE
    ED.act_desc = 'Surgery'
    AND ED.act_time >= ADD_MONTHS(SYSDATE, -6)
    AND EXTRACT(HOUR FROM ED.act_time) >= 20
GROUP BY
    PRAC.nm,
    PRAC.speciality;
- It performs the necessary join to the EventDetail table, which is essential to find specific procedures.
- It filters on ED.act_desc = 'Surgery', correctly understanding from the annotation that surgery is a type of procedure.
- It uses the correct time stamp, ED.act_time, to find surgeries that actually occurred after 8 PM.
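As a sanity check, the shape of the enriched query can be exercised end to end. This Python/SQLite sketch is illustrative throughout: the sample rows are invented, a fixed cutoff date stands in for ADD_MONTHS(SYSDATE, -6), and strftime stands in for EXTRACT. It shows the query counting only the after-8-PM surgery while excluding a consultation and an earlier-evening surgery.

```python
import sqlite3

# Invented sample data: one doctor with a late surgery plus a
# consultation, another doctor with a 7:30 PM surgery.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Practitioner (practitioner_id INTEGER PRIMARY KEY, nm TEXT, speciality TEXT);
CREATE TABLE Event (event_id INTEGER PRIMARY KEY, dt TEXT, practitioner_id INTEGER);
CREATE TABLE EventDetail (event_id INTEGER, act_desc TEXT, act_time TEXT);
INSERT INTO Practitioner VALUES (10, 'Dr. Rao', 'Cardiology'), (11, 'Dr. Lee', 'Orthopedics');
INSERT INTO Event VALUES (1, '2025-06-01 13:00:00', 10), (2, '2025-06-02 18:00:00', 11);
INSERT INTO EventDetail VALUES
  (1, 'Surgery',      '2025-06-01 21:30:00'),  -- after 8 PM: counted
  (1, 'Consultation', '2025-06-01 13:15:00'),  -- not a surgery: excluded
  (2, 'Surgery',      '2025-06-02 19:30:00');  -- before 8 PM: excluded
""")

result = con.execute(
    """SELECT p.nm, p.speciality, COUNT(*) AS surgery_count
       FROM Practitioner p
       JOIN Event e ON p.practitioner_id = e.practitioner_id
       JOIN EventDetail ed ON e.event_id = ed.event_id
       WHERE ed.act_desc = 'Surgery'
         AND ed.act_time >= '2025-01-01 00:00:00'          -- stand-in for ADD_MONTHS(SYSDATE, -6)
         AND CAST(strftime('%H', ed.act_time) AS INTEGER) >= 20
       GROUP BY p.nm, p.speciality"""
).fetchall()

print(result)  # prints "[('Dr. Rao', 'Cardiology', 1)]"
```

Only Dr. Rao's 9:30 PM surgery survives all three predicates, which matches the administrator's intent.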
As this example illustrates, annotations bridge the gap between business terminology and the physical schema, guiding the LLM to understand data relationships and generate correct SQL.
Example 2: Human Resources Schema
In the context of a Human Resources (HR) schema, consider a business analyst asking the following question to an LLM:
List each department and the average salary of employees hired after 2020. Order the result by average salary, highest first.
Without AI enrichment
Without any annotations, the LLM can only read the raw table and column names in the schema. It may guess at names and joins, returning a flawed query:
SELECT
    d.department_name,
    AVG(e.salary) AS avg_salary
FROM
    HR_PER_ALL_ASSIGNMENTS e
JOIN
    DEPT d
    ON e.depart_id = d.id
WHERE
    e.hire_dt > DATE '2020-01-01'
GROUP BY
    d.department_name
ORDER BY
    avg_salary DESC;
This query has several issues. The model has guessed at table and column names (HR_PER_ALL_ASSIGNMENTS, salary, hire_dt) and joined on depart_id, which may not be the correct key, creating a risk of mismatched or missing rows.
With AI enrichment
After you apply the annotations, the LLM receives much richer context. The generated query is more accurate:
SELECT
    d.dept_name,
    ROUND(AVG(e.salary_usd), 2) AS avg_salary_usd
FROM
    EMPLOYEE e
JOIN
    DEPARTMENT d
    ON e.dept_no = d.dept_no -- guided by join column metadata
WHERE
    e.hire_date >= DATE '2020-01-01'
GROUP BY
    d.dept_name
ORDER BY
    avg_salary_usd DESC;
The improvements here are significant:
- The model uses the salary_usd column instead of a generic salary field, guided by the UNITS annotation, which also informs the rounding to two decimal places appropriate for currency.
- It chooses the correct join column (dept_no) as specified in the JOIN COLUMN annotation.
- It selects the correct column (hire_date) for the date predicate, avoiding outdated aliases.
Key Takeaways
As illustrated in these examples, annotations enhance query accuracy by guiding the LLM to select correct join columns, apply appropriate units, use valid predicates, decipher object names, and more.
When you apply annotations consistently, you see a faster turnaround for data questions, fewer faulty joins, and a reduction in production defects from incorrect SQL. For an LLM, each annotation reduces the search space of possible query plans, allowing it to focus on the exact pattern you intended.
For organizations with strict compliance rules, annotations also allow administrators to control the exact contextual information shared with an LLM, which helps reduce the unintentional disclosure of sensitive data that may be present in free-form column comments.