1.5 The Impact of AI Enrichment: Practical Examples

See how enriching your schema directly impacts LLM effectiveness through these real-world examples.

Example 1: Healthcare Events-Tracking Schema

This example uses a schema designed to track healthcare events, focusing on doctor activity and the specific procedures performed during patient events.

The schema consists of the following three tables:
  • The Practitioner table: Stores information about doctors, including their name, specialization, and status.
  • The Event table: Records high-level patient appointments, including when and where they occurred and the responsible doctor.
  • The EventDetail table: Logs individual procedures (like surgery) performed during an event, capturing the procedure description, exact time stamp, and duration.

CREATE TABLE Practitioner (
    practitioner_id NUMBER PRIMARY KEY,
    nm VARCHAR2(80),
    speciality VARCHAR2(40),    -- This represents 'department'
    status VARCHAR2(20)         -- e.g., 'Active', 'Inactive'
);

CREATE TABLE Event (
    event_id NUMBER PRIMARY KEY,
    dt DATE,                    -- Date or time of the overall event
    practitioner_id NUMBER,
    location VARCHAR2(40)       -- e.g., 'OR1', 'ER', 'ICU'
);

CREATE TABLE EventDetail (
    event_id NUMBER,
    act_desc VARCHAR2(200),
    act_time TIMESTAMP,
    duration_minutes NUMBER,    -- Duration of this action or procedure
    CONSTRAINT FK_EVENT_ID FOREIGN KEY (event_id) REFERENCES Event(event_id)
);

Schema interpretation notes
  • The term practitioner is synonymous with doctor.
  • The speciality column represents the doctor's department.
  • The act_desc column in the EventDetail table records the procedure performed, such as surgery.
  • The act_time column in the EventDetail table provides the time stamp for time-based filtering of procedures.

Business query

Consider a hospital administrator asking the following question to an LLM:

"For each doctor, display their name, department, and the count of surgeries performed after 8 PM within the last six months."

Without AI enrichment

Without any annotations, the LLM must interpret the raw schema. This could lead to a flawed query like the one below:

SELECT
    PRAC.nm,
    PRAC.speciality,
    COUNT(EVT.event_id) AS surgery_count
FROM
    Practitioner PRAC
JOIN
    Event EVT 
    ON PRAC.practitioner_id = EVT.practitioner_id
WHERE
    EVT.dt >= ADD_MONTHS(SYSDATE, -6)
    AND EXTRACT(HOUR FROM EVT.dt) >= 20
GROUP BY
    PRAC.nm,
    PRAC.speciality;

This query has several critical errors:
  • It counts EVT.event_id, incorrectly tallying all appointments after 8 PM instead of only the surgeries.
  • It filters on EVT.dt, which is the start time of an appointment and not of a specific procedure. A 1 PM appointment could include a 9 PM surgery, which this query would miss.
  • It misses a required join with the EventDetail table.

With AI enrichment

By applying the following annotations, you can provide the LLM with the necessary semantic context.
  • Set the Practitioner table alias to doctor.
  • Set the Practitioner.speciality column alias to department.
  • Provide a description and sample values for the EventDetail.act_desc column, clarifying that it contains the specific procedures performed during an event, such as surgery, consultation, and so on.
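How you store these annotations depends on your tooling. As one illustration, if your database supports persisting them directly, the three annotations above might be applied with the Oracle Database 23ai ANNOTATIONS clause. This is a hedged sketch, not the only mechanism; the key names (Alias, Description, Sample_Values) are example names, not a fixed vocabulary.

```sql
-- Illustrative sketch only: assumes Oracle Database 23ai schema annotations.
-- Key names (Alias, Description, Sample_Values) are examples; your tool may
-- expect different keys or a different mechanism entirely.
ALTER TABLE Practitioner
    ANNOTATIONS (ADD Alias 'doctor');

ALTER TABLE Practitioner MODIFY speciality
    ANNOTATIONS (ADD Alias 'department');

ALTER TABLE EventDetail MODIFY act_desc
    ANNOTATIONS (
        ADD Description 'Specific procedure performed during an event',
        ADD Sample_Values 'Surgery, Consultation'
    );
```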

With this enriched context, the LLM can generate a far more accurate and reliable query:

SELECT
    PRAC.nm,
    PRAC.speciality,
    COUNT(*) AS surgery_count
FROM
    Practitioner PRAC
JOIN
    Event EVT 
    ON PRAC.practitioner_id = EVT.practitioner_id
JOIN
    EventDetail ED 
    ON EVT.event_id = ED.event_id
WHERE
    ED.act_desc = 'Surgery'
    AND ED.act_time >= ADD_MONTHS(SYSDATE, -6)
    AND EXTRACT(HOUR FROM ED.act_time) >= 20
GROUP BY
    PRAC.nm,
    PRAC.speciality;

Annotations markedly improve the query:
  • Now, it performs the necessary join to the EventDetail table, which is essential to find specific procedures.
  • It filters on ED.act_desc = 'Surgery', correctly understanding from the annotation that surgery is a type of procedure.
  • It uses the correct time stamp, ED.act_time, to find surgeries that actually occurred after 8 PM.

As this example illustrates, annotations bridge the gap between business terminology and the physical schema, guiding the LLM to understand data relationships and generate correct SQL.
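The difference between the two queries is easy to verify on toy data. The following sketch replays the enriched query's join and filter logic in SQLite, with invented sample rows: a 1 PM appointment containing a 9 PM surgery. Oracle's EXTRACT(HOUR FROM ...) is replaced by SQLite's strftime, and the six-month window is omitted because the sample dates are fixed.

```python
# Minimal demonstration of the enriched query's join/filter logic, using
# SQLite and invented sample data. Table and column names mirror the schema
# above; the data rows are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Practitioner (practitioner_id INTEGER PRIMARY KEY,
                           nm TEXT, speciality TEXT, status TEXT);
CREATE TABLE Event (event_id INTEGER PRIMARY KEY, dt TEXT,
                    practitioner_id INTEGER, location TEXT);
CREATE TABLE EventDetail (event_id INTEGER, act_desc TEXT,
                          act_time TEXT, duration_minutes INTEGER);

INSERT INTO Practitioner VALUES (1, 'Dr. Patel', 'Cardiology', 'Active');
-- A 1 PM appointment that includes a 9 PM surgery: filtering on Event.dt
-- would miss it, but filtering on EventDetail.act_time finds it.
INSERT INTO Event VALUES (100, '2025-06-01 13:00:00', 1, 'OR1');
INSERT INTO EventDetail VALUES (100, 'Surgery', '2025-06-01 21:00:00', 90);
INSERT INTO EventDetail VALUES (100, 'Consultation', '2025-06-01 13:15:00', 30);
""")

# SQLite equivalent of the enriched query; EXTRACT(HOUR ...) -> strftime('%H').
rows = cur.execute("""
SELECT PRAC.nm, PRAC.speciality, COUNT(*) AS surgery_count
FROM Practitioner PRAC
JOIN Event EVT ON PRAC.practitioner_id = EVT.practitioner_id
JOIN EventDetail ED ON EVT.event_id = ED.event_id
WHERE ED.act_desc = 'Surgery'
  AND CAST(strftime('%H', ED.act_time) AS INTEGER) >= 20
GROUP BY PRAC.nm, PRAC.speciality
""").fetchall()
print(rows)  # [('Dr. Patel', 'Cardiology', 1)]
```

The flawed query's filter, EXTRACT(HOUR FROM EVT.dt) >= 20, would find no rows here, because the appointment itself started at 1 PM; only the procedure-level timestamp captures the after-hours surgery.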

Example 2: Human Resources Schema

In the context of a Human Resources (HR) schema, consider a business analyst asking the following question to an LLM:

"List each department and the average salary of employees hired after 2020. Order the result by average salary, highest first."

Without AI enrichment

Without any annotations, the LLM can only read the raw table and column names in the schema. It may guess at names and joins, returning a flawed query:

SELECT 
    d.department_name, 
    AVG(e.salary) AS avg_salary 
FROM 
    HR_PER_ALL_ASSIGNMENTS e 
JOIN 
    DEPT d 
    ON e.depart_id = d.id 
WHERE 
    e.hire_dt > DATE '2020-01-01' 
GROUP BY 
    d.department_name 
ORDER BY 
    avg_salary DESC;

This query has several issues:
  • The model has guessed at table and column names (HR_PER_ALL_ASSIGNMENTS, salary, hire_dt) that may not exist in the actual schema.
  • It joins on depart_id, which may not be the correct key, creating a risk of mismatched or missing rows.

With AI enrichment

After you apply annotations, such as column aliases, units, and join-column metadata, the LLM receives much richer context. The generated query is more accurate:

SELECT 
    d.dept_name, 
    ROUND(AVG(e.salary_usd), 2) AS avg_salary_usd
FROM 
    EMPLOYEE e
JOIN 
    DEPARTMENT d 
    ON e.dept_no = d.dept_no  -- guided by join column metadata
WHERE 
    e.hire_date >= DATE '2020-01-01'
GROUP BY 
    d.dept_name
ORDER BY 
    avg_salary_usd DESC;

The improvements here are significant:

  • The model uses the salary_usd column instead of a generic salary field, guided by the UNITS annotation, which also informs the rounding to two decimal places appropriate for currency.
  • It chooses the correct join column (dept_no) as specified in the JOIN COLUMN annotation.
  • It selects the correct column (hire_date) for the date predicate, avoiding outdated aliases.
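As in the first example, the exact mechanism for storing these annotations depends on your tooling. A hedged sketch using the Oracle Database 23ai ANNOTATIONS clause might look like the following; the key names (Units, Join_Column, Description) are illustrative, not a fixed vocabulary.

```sql
-- Illustrative sketch only: assumes Oracle Database 23ai schema annotations.
-- Key names (Units, Join_Column, Description) are examples chosen for this
-- scenario; your annotation tool may use different keys or syntax.
ALTER TABLE EMPLOYEE MODIFY salary_usd
    ANNOTATIONS (ADD Units 'US dollars; round to 2 decimal places');

ALTER TABLE EMPLOYEE MODIFY dept_no
    ANNOTATIONS (ADD Join_Column 'DEPARTMENT.dept_no');

ALTER TABLE EMPLOYEE MODIFY hire_date
    ANNOTATIONS (ADD Description 'Date the employee was hired');
```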

Key Takeaways

As illustrated in these examples, annotations enhance query accuracy by guiding the LLM to select correct join columns, apply appropriate units, use valid predicates, decipher object names, and more.

When you apply annotations consistently, you see a faster turnaround for data questions, fewer faulty joins, and a reduction in production defects from incorrect SQL. For an LLM, each annotation reduces the search space of possible query plans, allowing it to focus on the exact pattern you intended.

For organizations with strict compliance rules, annotations also allow administrators to control the exact contextual information shared with an LLM, which helps reduce the unintentional disclosure of sensitive data that may be present in free-form column comments.