Federated Distributed Database Reference

SYNC SCHEMA Operations

DDL Synchronization

DDL synchronization is an operation that SYNC SCHEMA runs just after the deployment of the shards in a federated distributed database configuration.

The goal of this operation is to import the object definitions from all of the shards, compare the definitions across the shards, and generate DDLs for the objects that exist on all of the shards (common objects). Once the DDLs are run and the objects are created, you can reference these objects in multi-shard queries.

Import Users

A user or schema is a candidate for import by SYNC SCHEMA if it exists on all of the shards and owns importable schema objects.

You can narrow the list of users to be imported by passing a list of users in the -SCHEMA parameter. For example,

gdsctl> sync schema -schema scott
gdsctl> sync schema -schema scott,myschema

For case-sensitive schema names, use quoted identifiers.

gdsctl> sync schema -schema "O'Brien",scott

To include all non-Oracle schemas, use the value ALL in the SCHEMA parameter.

gdsctl> sync schema -schema all

Before importing the users, SYNC SCHEMA verifies that each discovered user exists on all of the shards and that no user with the same name already exists on the shard catalog. The users are then created on the shard catalog as local users, and their accounts are locked. Because these are local users, they share only their names with the corresponding users on the shards; they are otherwise no different from users that happen to have the same name in separate databases. Note that these imported users cannot log in and issue queries, because they are not all-shard users. To issue multi-shard queries, an all-shard user must be created.

Note:

Only users local to a PDB are imported. Common CDB users are not imported.
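
As noted above, an all-shard user is required to issue multi-shard queries. The following is a minimal sketch of creating one from the shard catalog; the user name, password, and grants are placeholders and must be adapted to your requirements.

-- Connect to the shard catalog as a privileged user, then:
ALTER SESSION ENABLE SHARD DDL;

CREATE USER mshard_query_user IDENTIFIED BY "ChangeMe_1";
GRANT CREATE SESSION TO mshard_query_user;
-- Also grant the privileges needed to query the imported objects,
-- for example SELECT on the specific imported tables.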

Grant User Roles and Privileges

For the imported users, SYNC SCHEMA compares the users' roles and privileges across the shards.

SYNC SCHEMA grants only the privileges that are granted on all of the shards (common grants). For example, a user who has the DBA role on shard1 but does not have the DBA role on shard2 is not granted the DBA role in the shard catalog.
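
To see why a particular role or privilege was not carried over, you can compare the grants directly on each shard. The following is a sketch using the standard dictionary views; the user name SCOTT is only an example.

-- Run on each shard and compare the results; only grants present
-- on every shard become common grants on the shard catalog.
SELECT granted_role FROM dba_role_privs WHERE grantee = 'SCOTT';
SELECT privilege FROM dba_sys_privs WHERE grantee = 'SCOTT';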

Import Object Definitions

The objects compared and imported by SYNC SCHEMA to the shard catalog are the objects that will be referenced in multi-shard queries or used by multi-shard query processing.

These objects are:

  • Tables
  • Views and Materialized Views (imported as tables)
  • Check Constraints
  • Object Types
  • Synonyms

Running SYNC SCHEMA does not import objects related to storage, or objects that have no impact on multi-shard query processing, such as tablespaces, indexes, indextypes, directories, or zone maps.

Schema Object Comparison

The same object can have different definitions from one shard to another. SYNC SCHEMA compares these definitions and creates a common definition that enables multi-shard queries against the imported objects.

SYNC SCHEMA detects the objects' differences at two levels: number of objects, and object definitions.

First, SYNC SCHEMA considers the number of objects. During an application upgrade, for example, objects may be added to the schemas on some shards before others. Only objects that exist on all of the shards are imported into the shard catalog.

Second, the object definitions from one shard to another can have different attributes. For the objects that SYNC SCHEMA imports, the following differences are noted:

Differences in Tables

When comparing objects in a federated distributed database configuration, some differences in tables have an impact on multi-shard queries and some do not.

Column Differences

Only column differences have an impact on multi-shard queries, so they are the only table differences that SYNC SCHEMA addresses.

  • The number of columns can be different.

  • The data type of a given column can be different.

  • The default value of a given column can be different.

  • The expression of a virtual column can be different.

When a table has different numbers of columns across the shards, SYNC SCHEMA creates a table that contains the union of all of the columns. Taking the union of all of the columns, rather than the intersection, spares you from rewriting multi-shard queries after an incremental deployment in which the added shard has fewer columns than indicated in the shard catalog.

When a column has different data types across the shards, SYNC SCHEMA defines it with the highest (largest) data type.

When a column has different data types, and one of the columns is a user-defined object type, then that column is not imported into the shard catalog.

When a column has different default values, SYNC SCHEMA sets NULL as the default value.

Nested table columns are not imported into the shard catalog.

Example: a Customer table is defined on shard1 and shard2 as shown here.

On shard1:

Customer( Cust_id number, Name varchar(30),
    Address varchar(50), Zip_code number)

On shard2:

Customer( Cust_id varchar(20), Name varchar(30),
    Address varchar(50), Zip_code number,
    Country_code number)

Note that the column Cust_id is a number on shard1 and a varchar(20) on shard2. Also, note that Country_code exists on shard2 but does not exist on shard1.

The Customer table created by SYNC SCHEMA in the shard catalog has all of the columns, including Country_code, and the Cust_id type is varchar(20).

Customer( Cust_id varchar(20), Name varchar(30),
    Address varchar(50), Zip_code number,
    Country_code number)

SYNC SCHEMA keeps track of these differences between schemas in the shard catalog. A query issued on the catalog database that accesses these heterogeneous columns is rewritten to address the differences before it is sent to the shards. On a shard, if there is a data type mismatch, the data is cast to the larger data type defined on the catalog. If the column is missing on the shard, the default value set on the catalog is returned.
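
For illustration only (the actual internal rewrite may differ), a query against the heterogeneous columns of the Customer example above could be processed roughly as follows.

-- Query issued on the shard catalog:
SELECT cust_id, country_code FROM customer;

-- Roughly what shard1 evaluates: Cust_id is a number there, so it is cast,
-- and Country_code does not exist, so its catalog default (NULL) is returned:
SELECT CAST(cust_id AS varchar2(20)) AS cust_id, NULL AS country_code FROM customer;

-- Shard2 already matches the catalog definition, so no rewrite is needed:
SELECT cust_id, country_code FROM customer;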

Partition Scheme Differences

Note that these differences have no impact on multi-shard queries and are ignored.

  • The partitioning column can be different.
  • The partition type can be different.
  • The number of partitions can be different.

Storage Attribute Differences

Note that these differences have no impact on multi-shard queries and are ignored.

  • The tablespaces on which the table is created can be different.
  • The encryption can be different.
  • The INMEMORY attribute can be different.

Differences in Views

Views on shards are created and handled as tables in the shard catalog. The same restrictions that apply to tables also apply to views.

Differences in Constraints

Only CHECK constraints are created in the shard catalog. The CHECK constraint condition should be the same on all of the shards.
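
For example, a constraint such as the following (the name and condition are illustrative, based on the Customer example above) is imported only if the same condition is defined on every shard.

ALTER TABLE customer ADD CONSTRAINT customer_zip_chk CHECK (zip_code > 0);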

Differences in Object Types

Object types and type bodies are only created if they have the same definition on all of the shards.
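
For example, a type such as the following (an illustrative name and definition) is created in the shard catalog only when it is defined identically on every shard.

CREATE TYPE address_t AS OBJECT (
    street   varchar2(50),
    zip_code number
);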

Troubleshooting a Federated Distributed Database

Solve common federated distributed database issues with these troubleshooting tips.

ORA-03851: Operation not supported in federated database

ORA-03701: Invalid parameter combination: federated database and ...

Some of the operations and command options that apply to a traditional distributed database are not applicable to a federated distributed database. This is because:

  • There is no concept of a chunk in a federated distributed database. Any chunk-related operation is invalid, for example SPLIT CHUNK and MOVE CHUNK.

  • The Data Guard broker configuration is not set up or managed by the system in a federated distributed database, because the existing shards may already have been set up with their own high availability configurations. Operations such as SET DATAGUARD_PROPERTY or MODIFY SHARDSPACE are not supported.

  • The CREATE SHARD command is not supported.

ORA-03885: Some primary shards are undeployed or unavailable

The SYNC SCHEMA operation requires that all primary shards be available. Check the output of the CONFIG SHARD command, and check the status of all primary shards. Fix any issues and retry the operation when the shards become available.
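
For example:

gdsctl> config shard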

ORA-03871: Some DDL statements are not applied to the catalog

The SYNC SCHEMA operation cannot import object definitions from the shards while some DDL statements generated by a previous run have not yet been applied on the shard catalog. Run SYNC SCHEMA with the -apply option to apply these statements.
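
For example:

gdsctl> sync schema -apply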

Handling Errors During Multi-Shard Queries

If a multi-shard query fails with an error caused by a mismatch between the object definition on a shard and the definition on the shard catalog, make sure that the shard catalog has the latest schema changes imported. Any time there are schema changes in the federated distributed database, you must run SYNC SCHEMA to import the changes from the schemas on the shards.

Note that subsequent runs of SYNC SCHEMA do not drop and recreate an imported object; instead, they generate ALTER statements to incorporate the definition changes. This ensures that queries already running during the SYNC SCHEMA operation do not fail with invalid object errors.
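
For example, if a new column (say, a hypothetical Loyalty_tier column) has been added to the Customer table on all of the shards, a later run of SYNC SCHEMA would typically generate a statement along the lines of the following, rather than dropping and recreating the table.

ALTER TABLE customer ADD (loyalty_tier number);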

Handling Errors During DDL Processing Phase

If DDL fails on the shard catalog, the status of each DDL can be examined with the SYNC SCHEMA -show option.

gdsctl> sync schema -show

Note: The SYNC SCHEMA -show command is different from the SHOW DDL command. SHOW DDL lists DDL statements run by an all-shard user, which are run first on the catalog and then propagated to the shards, whereas the DDL statements listed by SYNC SCHEMA -show are generated from the objects imported from the shards.

By default, SYNC SCHEMA -show lists a fixed number of the latest DDLs. The -count and -ddl options can be used to inspect a specific range of DDLs. For example,

gdsctl> sync schema -show -count 20
gdsctl> sync schema -show -count 20 -ddl 5

To check the complete DDL text and error message, if any, use the -ddl option.

gdsctl> sync schema -show -ddl 5

To list only the failed DDL statements, use the -failed_only option.

gdsctl> sync schema -failed_only

Based on the error message of the failed DDL, fix the cause of the error and perform the apply phase.

gdsctl> sync schema -apply

The SYNC SCHEMA command also has a -restart option to perform the complete operation from the beginning as if it were run for the first time. This option drops all schemas imported during previous runs of SYNC SCHEMA, along with any related metadata. Be aware that this causes any queries currently running against these objects to fail.

gdsctl> sync schema -restart