Configuring a Federated Distributed Database

To deploy a federated distributed database configuration using existing databases, you define the database layout just as you would for the user-defined data distribution method, using GDSCTL commands.

The following is a high-level description of the process for creating and deploying a federated distributed database configuration.

  1. Run the GDSCTL CREATE SHARDCATALOG command with the FOR_FEDERATED_DATABASE option to create the configuration

  2. Add shard directors to the configuration.

  3. Add a shardspace to the configuration. A shardspace is defined as an existing database and its replica.

  4. Add a shard by adding the existing database to the shardspace, then run DEPLOY.

  5. Run GDSCTL SYNC SCHEMA to compare the schemas in the federated distributed database configuration and retrieve the common shared schemas. Use SYNC SCHEMA to inspect and apply the DDLs.

  6. Use SQL ALTER TABLE on the shard catalog to convert tables containing the same data across the federated shards to duplicated tables.

  7. Prepare the shards in the federated distributed database configuration for multi-shard queries.

Create a Federated Distributed Database Configuration

The GDSCTL command CREATE SHARDCATALOG is used to create the federated distributed database configuration, with the FOR_FEDERATED_DATABASE option used instead of selecting a data distribution method in the SHARDING parameter.

The usage for the GDSCTL command CREATE SHARDCATALOG in creating a federated distributed database is similar to how it is used to create the shard catalog with the user-defined data distribution method, except that instead of specifying a method in the SHARDING parameter, you use the FOR_FEDERATED_DATABASE option. That is, the FOR_FEDERATED_DATABASE option is mutually exclusive with the SHARDING option.

CREATE SHARDCATALOG -DATABASE connect_identifier
  [-USER username[/password]]
  [-REGION region_name_list] 
  [-CONFIGNAME config_name]
  [-AUTOVNCR ON/OFF]
  [-FORCE] 
  [-SDB distributed_db_name]
  [-SHARDSPACE shardspace_name_list]
  -FOR_FEDERATED_DATABASE

The CREATE SHARDCATALOG syntax statement above shows which parameters are supported. The parameters not shown are not supported when used with the FOR_FEDERATED_DATABASE data distribution method, for example, –AGENT_PASSWORD, REPFACTOR, and the Oracle Data Guard protection mode PROTECTMODE.

Note:

Only Oracle Data Guard replication is supported for a federated distributed database. Oracle Globally Distributed Database doesn't handle the creation and management of the Data Guard configuration, but you can use Data Guard parameters with the ADD SHARD command so that you can add the primary and standby databases to see the status in GDSCTL.

See Also:

The GDSCTL create shardcatalog topic in Oracle Database Global Data Services Concepts and Administration Guide for usage notes and command options.

Retrieve, Inspect, and Apply the DDLs

Run the GDSCTL SYNC SCHEMA command in phases to create the schema objects common to the existing databases in the shard catalog.

The GDSCTL SYNC SCHEMA syntax shown here illustrates the three phases of the operation.

sync[hronize] schema
  [-schema [schemalist | all] [-retrieve_only] [-restart [-force]]
 | -apply [-skip_first]
 | -show [[-ddl ddlnum] [-count n] | [-failed_only]]]

SYNC SCHEMA should be run in phases, as described here.

  1. Retrieve Phase

    Run SYNC SCHEMA with the -retrieve_only option to inspect and verify the DDLs before they are run on the shard catalog.

    sync schema -schema schemalist –retrieve_only

    When SYNC SCHEMA is run without -retrieve_only, the DDL is retrieved and applied at the same time.

  2. Inspection Phase

    You can examine the DDL statements and their processing status with the -show option. The -ddl ddlnum option shows the specified DDL, and the -count n option specifies the maximum number of entries to show.

    sync schema –show -ddl ddlnum -count n

    Or you can use the -failed_only option to examine only the errored out statements.

    sync schema –show -failed_only
  3. Apply Phase

    In the final phase, you run the DDLs on the shard catalog to create the schemas and their objects.

    sync schema –apply

    If you get an error in the apply phase, there are a couple of ways to work around it:

    • If you can fix the cause of the error, fix and then retry SYNC SCHEMA -apply, which retries the failed DDL.

    • If the DDL cannot be fixed or it is not required, you can run SYNC SCHEMA –apply -skip_first, which resumes the apply phase from the point of the DDL failure.

    For security reasons, Oracle Globally Distributed Database doesn't offer a way to edit the DDLs.

  4. Import Incremental Changes

    If there are changes in the schema at a later point, the previous phases can be run again to import incremental changes. For example, when new objects are added, or a new column is added to a table, which will generate an ALTER TABLE ADD statement.

See Also:

The sync schema (synchronize schema) topic in Oracle Database Global Data Services Concepts and Administration Guide for more SYNC SCHEMA usage notes and option details.

SYNC SCHEMA Operations for information about the tasks performed by SYNC SCHEMA

Convert Tables to Duplicated Tables

Use ALTER TABLE table_name externally duplicated to mark tables as duplicated in a federated distributed database configuration.

Any table created by SYNC SCHEMA is considered by the multi-shard query layer as an externally sharded table. If the table contains the same data on all of the shards, you can alter the table to externally duplicated, so that the multi-shard query retrieves the data from one shard only, even if it is a query on a table with no filter predicates on ORA_SHARDSPACE_NAME.

ALTER TABLE table_name [externally duplicated | externally sharded]

Prepare the Shards For Multi-Shard Queries

Create all shard users and use the ORA_SHARDSPACE_NAME pseudo-column to perform queries on specific shards.

All Shard Users

Before running multi-shard queries from the shard catalog, you must create all shard users and grant them access to the sharded and duplicated tables. These users and their privileges should be created in the shard catalog under shard DDL enabled.

Create Shardspace-Specific Queries

A shardspace in a federated distributed database is a set consisting of a primary shard and zero or more standby shards. To filter query results for a particular shard[space], a pseudo-column called ORA_SHARDSPACE_NAME is added to every externally sharded table. The value of this pseudo column in the tables is the name of the shardspace.

Depending on the value of MULTISHARD_QUERY_DATA_CONSISTENCY, the rows can be fetched from the primary or from any of the standbys in the shardspace. To run a multi-shard query on a given shard, you can filter the query with the predicate ORA_SHARDSPACE_NAME = shardspace_name_shard_belongs_to.

A query like SELECT CUST_NAME, CUST_ID FROM CUSTOMER, where the table CUSTOMER is marked as externally sharded, runs on all of the shards.

A query like SELECT CUST_NAME, CUST_ID FROM CUSTOMER WHERE ora_shardspace_name = ‘EUROPE’ runs on the shards belonging to the shardspace_name Europe. Depending on the MULTISHARD_QUERY_DATA_CONSISTENCY parameter value, the query is run on either the primary shard of the shardspace Europe or on its standbys.

You can join sharded tables from different shardspaces. For example, to find the customers from shardspace Europe with orders in shardspace NA, write a query similar to the following.

SELECT order_id, customer_name FROM customers c , orders o WHERE c.cust_id = o.cust_id and
c.ora_shardspace_name = ‘Europe’ and o.ora_shardspace_name = ‘NA’ 

Querying an externally duplicated table, with or without the ORA_SHARDSPACE_NAME predicate, should go to only one of the shardspaces. The MULTISHARD_QUERY_DATA_CONSISTENCY parameter value determines whether to query a primary shard in the shardspace or its replicas.