Add AI Guardrails to OCI Generative AI Model Endpoints
- Services: Generative AI
- Release Date: April 16, 2025
You can now add AI guardrails, a suite of safety and governance features to OCI Generative AI model endpoints. Designed for enterprise use cases, AI guardrails can help with moderating model behavior, securing sensitive content, and aligning outputs with business and compliance requirements. Key features:
- Content Moderation Filters
-
AI guardrails aim to classify harmful content, including hate speech, harassment, violence, and explicit material. These filters can be applied to user queries and AI responses, with controls that aim to protect against unsafe content that enter or exit the system. An internal moderation model powers the classification in identifying toxic content.
- Prompt Injection and Jailbreak Prevention
-
AI guardrails help in detecting attempts to override AI safety instructions through prompt injection attacks. The system scans for malicious instructions within user prompts or embedded contexts to prevent unauthorized model behavior. This feature aims to protect against both direct and indirect attacks, such as hidden instructions within uploaded documents.
- Privacy & PII Protection
-
AI guardrails aim to identify personally identifiable information (PII) in both inputs and outputs. Predefined detectors recognize sensitive data such as names, phone numbers, email, and addresses. This feature aims to prevent unintended data leakage and ensures compliance with industry standards.
- API-Based Guardrail Enforcement
-
Adding guardrails to an endpoint, integrates the guardrails directly into the OCI Generative AI models through secure API-based guardrail enforcement. Supports real-time moderation. To add AI guardrails to model endpoints, see Creating an endpoint and Updating an endpoint.
For information about the service, see Generative AI documentation.