6 Frequently Asked Questions
This FAQ chapter includes a product overview, key components and features, deployment and administration details, and answers to common customer questions with resource links.
Product Overview
Growing security threats and ever-expanding privacy regulations have made it necessary to limit the exposure of sensitive data. Copying production data for non-production purposes such as development and data analytics proliferates sensitive data, expanding the security and compliance boundary and increasing the likelihood of data breaches. Oracle Data Masking and Subsetting provides a flexible solution that discovers, masks, and subsets sensitive data, allowing organizations to safely share data across their non-production environments.
What is Data Masking and Subsetting?
Data masking replaces sensitive data such as credit card numbers with fictitious yet realistic-looking data. Data subsetting is the process of retaining or extracting a selected portion of data from a larger database.
Why do I need to mask and subset data?
Copying production data to non-production, outsourced, partner, and cloud environments for test, development, and other purposes proliferates sensitive information such as credit card numbers and social security numbers. This increases the risk of a data breach as non-production environments are generally not as protected or monitored as diligently as production environments. For this reason, data privacy standards such as PCI-DSS recommend rendering sensitive production data unreadable when used for test and development.
Subsetting extracts only the necessary information from a large database for sharing with internal and external teams, reducing the resources required to store and manage that data in test and development. Masking and subsetting sensitive data in non-production environments helps improve security and minimize compliance and infrastructure costs.
How does masking improve security and minimize compliance costs?
Masking sensitive data in test and development environments reduces the overall compliance boundary, restricting it to only production environments. Rendering masked data unreadable in these environments limits the risk of a data breach and helps minimize compliance costs.
Why are data masking and subsetting important for cloud computing?
Organizations understand the advantage of leveraging a cloud platform for test and development. However, they are concerned about uploading sensitive on-premises production data to the cloud because of data privacy and compliance reasons. Other concerns are the storage cost associated with the cloud platform and the network cost due to data transfers.
Oracle Data Masking and Subsetting addresses these concerns by enabling cloud users to mask sensitive data on-premises before uploading it to the cloud. The product helps reduce storage and network costs by extracting a subset of production data for upload to the cloud.
Is data masking reversible?
Data masking usually means permanently replacing the data and ensuring that no one can retrieve the original data. But, sometimes you might want to see the original data. Reversible masking is helpful when businesses need to mask and send their data to a third party for analysis, reporting, or some other business purpose. After the processed data is received from the third party, the original data can be recovered. The Encrypt masking format is the only format that supports reversible masking, enabling the masked data to be reverted to its original form when necessary.
Oracle also offers a product called Data Redaction. Unlike data masking, Data Redaction does not alter the actual data. Instead, it modifies the output during presentation, keeping the original data intact. This feature allows you to control who can see the original data and who sees the redacted version.
Components and Features
What are the main components of Oracle Data Masking and Subsetting?
- Data Discovery provides automated procedures to discover sensitive columns and parent-child relationships. The discovery results are stored as an application data model, which is reusable across multiple databases.
- Data Masking assists in mapping masking formats to discovered sensitive columns, creating reusable masking scripts. It also provides a workflow to mask data.
- Data Subsetting helps create reusable goal/condition-based subsetting rules on a database. It also provides a workflow to generate subsets.
- Masking Format library provides a comprehensive set of predefined masking formats to mask sensitive data such as credit card numbers, national identifiers, and phone numbers. It also allows creating new masking formats to meet domain-specific requirements.
How does the product preserve the relational integrity of the data in an application?
- The product uses automated discovery procedures to gather referential integrity or parent-child relationships between the columns before the masking and subsetting process.
- During the masking and subsetting process, parent and child columns are processed consistently to preserve the integrity between these columns.
- When masking a parent table that has child relationships, the tool automatically masks the child tables to preserve referential integrity.
Can masking support multi-byte or international characters?
Several masking options support multi-byte or international characters, such as UTF-8. The suitable masking formats include Array List, Shuffle, Substitute, Table Column, and User Defined Function.
Does the product include predefined masking formats?
Yes, Oracle Data Masking and Subsetting provides out-of-the-box masking formats covering a broad range of sensitive data, such as national identifiers of multiple countries, credit card numbers of various vendors, phone numbers, and more.
Which masking techniques are supported by the product?
- Shuffle Masking randomly shuffles data within a table/ view. For example, columns containing salaries can be shuffled to break the employee-salary mapping.
- Encryption encrypts sensitive data using a cryptographic key while preserving the data's format. It's a reversible masking option, and you can decrypt your data using the same key. This feature is useful when masked data sent to a third party has to be merged with further updates.
- Conditional Masking masks column data using different masking formats based on user-defined conditions. For example, in a column, the US identifiers can be masked using the Social Security Number format and the UK identifiers using the National Insurance Number format.
- Compound Masking masks related columns as a group, ensuring the masked data across the related columns retain the same. For example, address fields such as city, state,
- Deterministic Masking generates consistent masked output for a given input across application schemas and databases.
- User-defined PL/SQL Masking enables you to define custom masking logic or migrate your existing masking scripts.
Oracle also offers a product called Data Redaction. Unlike data masking, Data Redaction does not alter the actual data. Instead, it modifies the output during display time, keeping the original data intact. This feature allows you to control who can see the original data and who sees the redacted or masked version.
Which subsetting techniques are supported by the product?
Oracle Data Masking and Subsetting simplifies the task of subsetting through its goal or condition-based subsetting techniques. A goal can be a relative table size, such as extracting a 1% subset of a table containing 10 billion rows. Condition-based subsetting is useful for creating a subset using the data itself. For example, you can use a time-based condition, such as discarding all user records created before a particular year. Another example is a region-based condition, where you might extract only Asia Pacific information to support new application development.
Does Oracle Data Masking and Subsetting work with packaged applications like Oracle E-Business Suite and Oracle Fusion Applications?
As Oracle Data Masking and Subsetting is a database-centric solution, it works for all supported databases regardless of the application. However, care is required when setting up data models and masking and subsetting definitions to avoid misconfigurations that could break complex applications.
Can I mask and subset databases running in Oracle Cloud?
You can mask databases in the Oracle Cloud but subsetting is not yet supported. Oracle Data Masking for cloud databases works much like on-premises databases.
What masking formats support deterministic masking?
Currently, Substitute and Encrypt masking formats provide deterministic masking transformation.
-
Encrypt Masking Format: This transformation encrypts and decrypts the original data using a secure key string. The input data format is preserved during encryption and decryption. This transformation uses industry-standard 3DES algorithm. This transformation is helpful when businesses need to mask and send their data to a third-party for analysis, reporting, or any other business processing purpose. After the processed data is received from the third-party, the original data can be recovered using the same key string that was used to encrypt the data.
-
Substitute Masking Format: The Substitute format uses a table of values from which to substitute the original value with a mask value. As long as this table of values does not change, the mask is deterministic or consistent across the three databases.
How does reversibility work with Encrypt masking format?
Reversibility can be performed using Encrypt Masking format which is Key Based Reversible Masking. It uses 3DES algorithm where a seed value is provided to encrypt and decrypt the data.
If I use Oracle Data Masking and Subsetting in Enterprise Manager to generate the masking script, can I run these scripts directly on the Database Server for masking?
Yes, you can run the generated scripts directly on the database but the officially supported and recommended way to perform data masking is through EM.
If a customer masks data on one server (which is on-premises) and then the customer moves this data to another Oracle Database (e.g., on OCI cloud or any other third-party cloud), can Oracle Data Masking and Subsetting perform the decryption on this new server?
Yes, the customer can mask and subset databases in Oracle Database Cloud Service (DBCS). Oracle Data Masking and Subsetting for cloud databases works much like it does for on-premises databases. Oracle Data Masking and Subsetting license is included in DBCS High Performance, Extreme Performance, and Exadata Service.
Deployment and Administration
How do I download and install Oracle Data Masking and Subsetting?
Oracle Data Masking and Subsetting is pre-installed with Oracle Enterprise Manager. To use Oracle Data Masking and Subsetting, you must have the Data Masking and Subsetting license pack.
What are the different ways to mask and subset data?
- In-Database Masking and Subsetting: The target data is
first copied (cloned) to a separate location. Oracle Data Masking and
Subsetting operates on the cloned data. After processing is complete, the
resulting masked data can be cloned and distributed for non-production.
Note:
In-Database masking and subsetting directly operates on the underlying data, on the original copy of the data and is NOT recommended to be performed on production databases. - In-Export Masking and Subsetting: The masking and subsetting rules are applied while the data is extracted from the target database, and the resulting data is written to Oracle Data Pump dump files. In this mode of operation, Oracle Data Masking and Subsetting can run directly on the production system and unmasked data does not leave the production database. After processing is complete, the dump file containing masked data can be imported into non-production databases.
How does Oracle Data Masking and Subsetting compare to other similar Oracle offerings?
Oracle Solutions |
Description |
Oracle Data Masking and Subsetting |
Software for creating masked and subsetted copies of production data for use in non-production environments such as testing and development databases. |
Oracle Data Safe (Data Discovery/ Data Masking) |
Discover and mask sensitive data with a cloud service that supports Oracle Databases everywhere: in the Oracle Cloud, on-premises, and third-party clouds. |
Oracle Data Redaction |
Redacts sensitive data from query results before display through client applications. Enforces redaction at runtime, with low overhead, and according to conditions set in policies. |
Oracle Label Security |
Implements Multi-Level Security (MLS), enabling rows with differing sensitivity to reside in the same table. Explicitly labels rows with group, compartment, and sensitivity levels then matches them with user labels. |
Can Data Masking and Subsetting run without Enterprise Manager?
All Data Masking and Subsetting objects are centrally located in the Oracle Enterprise Manager repository, which facilitates centralized creation and administration of Application Data Models, Data Masking and Subsetting rules or definitions. So, we recommend using it through Enterprise Manager.
Recurring Customer Queries
Recurring Issues | Alternate Questions | Response |
---|---|---|
Masking Reversibility |
Can masked data be restored to its original state? Is it possible to reverse data masking to retrieve the original data? Does data masking allow for the original data to be recovered? Can masked data be decrypted back to its original form? Is there a way to revert masked data to its initial state? |
Data masking usually means permanently replacing the data to ensure that no one can retrieve the original data. We recommend that masking should only be performed directly on non-production databases or in-export to ensure critical data isn't lost. If the original data is required even in a non-production environment, a backup should be taken before masking. However, sometimes you might need to view the original data. The "Encrypt masking format" offered by DMS is the only format that supports masking reversibility, allowing the masked data to be reverted to its original form when necessary. Oracle also offers a product called Data Redaction. Unlike data masking, Data Redaction does not alter the actual data. Instead, it modifies the output during display time, keeping the original data intact. This feature allows you to control who can see the original data and who sees the redacted or masked version. |
Privileges |
What all privileges are required to use Data Masking and Subsetting? What are minimum privileges required to use DMS features? |
Go to the Prerequisites chapter for details on privileges: Prerequisites For Basic set privileges to perform Data masking operation in a Database Vault enabled environment, please refer the following Oracle documentation URL: |
Masking Performance |
|
1. Improving Masking Performance in Oracle Data Masking and Subsetting Based on the best practices and known issues from Oracle Data Masking and Subsetting Performance Tuning, here are key steps to enhance masking performance: Updating Statistics
2. Analyzing Masking Performance in Oracle Data Masking and Subsetting Performance Calculation: The total time for masking can be calculated as: "Time taken for cloning tables to be masked"+"Time taken for CTAS of each table"+"Time taken to disable and enable existing indexes and constraints"+"Time taken to apply masking logic (X)" The variable 'X' depends on the masking format. For example, fixed string masking is faster, while encryption takes longer. Hardware Impact: Masking performance is directly proportional to the underlying hardware capabilities. Parallelism: Adjust the PARALLEL_MAX_SERVERS parameter based on the formula:
Performance Testing: Use masking definitions to analyze the performance impact on a test system. This analysis can provide insights into the time taken to mask data and any changes in the execution plan. Masking Methods: In-Export Masking: Involves additional time for exporting the table to a dump file. In-Place Masking: Involves additional cloning time from the source to the staging environment. The total time for In-Export masking is the sum of the export time and the time taken to apply the masking logic. |
New DMS Customer |
|
|
Masking Formats | What are some of the examples for Personal Information (PI) Columns and their recommended Masking formats? |
Here is the list of available basic Masking Formats: Array List Delete Encrypt Fixed Number Fixed String Null Value Preserve Original Data Random Decimal Numbers Random Digits Random Dates Random Strings Random Numbers SQL Expression Regular Expression Post-Processing Function Shuffle Subsitute Substring Table Column Truncate User-Defined Function |
Regular Expression Basics |
What are some of the examples of Regex? How can we implement Regex for a sensitive column? |
Refer to tables Basic Building Blocks of Regex and Regular Expression Example. |
Table 6-1 Basic Building Blocks of Regex
Basic Characters | Examples | Definition |
---|---|---|
. (dot) |
Pattern: b.t Matches: "bat", "bet", "bit", "bot", "but", etc. |
Matches any single character except newline (\n). |
^ |
Pattern: ^start Matches: "start" at the beginning of a line. |
Matches the start of the string or line. |
$ |
Pattern: end$ Matches: "end" at the end of a line. |
Matches the end of the string or line. |
| |
Pattern: cat|dog Matches: "cat" or "dog" |
Alternation, matches either the expression before or after the |. |
[] |
Pattern: [aeiou] Matches: Any single vowel character ("a", "e", "i", "o", "u") |
Character class, matches any one of the characters inside the brackets. |
() |
Pattern: (ab) Matches: "ab" |
Grouping, groups multiple tokens together and captures matched text. |
\ |
Pattern: \d Matches: The pattern \d matches any digit (0-9) |
Escape character, used to escape a special character or indicate a special sequence. |
Table 6-2 Regular Expression Example
Regular Expression | Explanation | Example |
---|---|---|
[A-Za-z]{3} | Matches exactly three alphabetic characters (uppercase or lowercase). | Fgy |
[0-9]{3} | Matches exactly three digits. | 674 |
[A-Za-z]{3}+[0-9]{3} | Matches exactly three alphabetic characters (uppercase or lowercase) followed by exactly three digits. | Fgy674 |
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} | This pattern matches a typical email address format. It includes character classes ([a-zA-Z0-9._%+-]), the + quantifier for one or more occurrences, the @ literal character, and the . (dot) for domain separation. | f@8.co |
(?:https?://)?(?:www\.)?[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} |
(?:https?://)?: Non-capturing group that matches "http://" or "https://", optionally. optionally.\. (?:www\.)?: Non-capturing group that matches "www." [a-zA-Z0-9.-]+: Matches one or more alphanumeric characters, dots, or hyphens (for the domain name). [a-zA-Z]{2,}: Matches a dot followed by at least two alphabetic characters (for the top-level domain). |
https://www.example.com |
#([a-fA-F0-9]{6}|[a-fA-F0-9]{3}) | This pattern matches a hexadecimal color code starting with #, followed by either six ({6}) or three ({3}) characters from the set [a-fA-F0-9]. It uses () for grouping and {} for specifying repetitions. | #FfF011 |
\d{4}-\d{2}-\d{2} | This pattern matches a date in the format YYYY-MM-DD. It uses \d for digits and {} for specifying exact repetitions. | 2024-07-16 |
Masking Format Examples for Common Sensitive Types
Column Name | Masking Format | Masking Format Entries | Original Data | Masked Data |
CustomerID | Fixed String Random Strings | Fixed String:
'F'
Random Strings: |
Fjg860 | Faqjdnfh623 |
Age | Shuffle | Shuffle | 26 | 34 |
FirstName | Random Strings | Random
Strings:
Start Length: 1 |
Kate | aaabgd |
LastName | Substring | Substring:
Start Position: 1 |
Wilson | Wil |
DateOfBirth | Random Dates | Random Dates:
Start Date: 07/18/50 |
7/3/1998 | 8/5/1963 |
SocialSecurityNumber (SSN) | Random Number Post-Processing Function | Random Number:
Start Integer: 20000000 |
049-66-6786 | 136-76-4899 |
EmailAddress | SQL Expression | SQL
Expression:
%FirstName% || '.' || %LastName% || '@company.com ' |
ktewilson@gmail.com | Kate.Wilson@company.com |
PhoneNumber | Random Digits Random Number Post-Processing Function | Random Digits:
Start Length: 7 |
6033550232 | 619-989-9213 |
AddressLine1 | Shuffle (Choose Group Masking option) | Shuffle Columns | 303 Madison | 26 West |
AddressLine2 | San Street | Wheelock St | ||
City | Redwood City | Hanover | ||
PostalCode | 94002 | 3755 | ||
Country | United States | United States | ||
Gender | Array List | Array List:
Female, Male |
Female | Male |
MaritalStatus | Null Value | Null Value | Single | N/A |
Nationality | Truncate | Truncate | American | N/A |
PassportNumber | Substitute | Regular Expression:
[A-Z]{1}[0-9]{7} |
P2378283 | V4697997 |
DriverLicenseNumber | Encrypt | Regular Expression:
[D][0-9]{7} |
D7482464 | D8208541 |
CreditCardNumber | Random Digits Post-Processing Function | Random Digits:
Start Length: 10 |
4111111111111110 | 4556771680150140 |
BankAccountNumber | Substring | Substring:
Start Position: 3 |
123456789 | 345 |
AccountHolderName | Table Column | Schema:
HR Table: New_table Column: LastName |
Kate Wilson | Wilson |
TaxIdentificationNumber (TIN) | Fixed String Random Digits | Fixed String: TIN
Random Digits: |
TIN123456 | TIN232163 |
EmploymentStatus | Preserve Original Data | Employed | Employed | |
MedicalRecordNumber | Fixed String | MRNxxxxxx | MRN123456 | MRNxxxxxx |
UserName | SQL Expression | SQL
Expression:
%FirstName% || '_' || %LastName% |
katewilson1980 | Kate_Wilson |
Password | delete | N/A | xYdkI34b | N/A |
Where can I find more information on Oracle Data Masking and Subsetting?
For more information, such as product data sheet, documentation, customer references, and blog, please visit the Oracle Data Masking and Subsetting page on oracle.com:
https://www.oracle.com/security/database-security/data-masking/