Migrate from DynamoDB JSON file in AWS S3 to Oracle NoSQL Database

This example shows how to use the Oracle NoSQL Database Migrator to copy a DynamoDB-formatted JSON file stored in AWS S3 to Oracle NoSQL Database.

Use Case:

After evaluating multiple options, an organization selects Oracle NoSQL Database over DynamoDB. The organization wants to migrate its tables and data from DynamoDB to Oracle NoSQL Database.

See Mapping of DynamoDB table to Oracle NoSQL table for more details.

You can migrate a file containing the exported DynamoDB JSON data from AWS S3 storage by specifying its path in the source configuration template.

A sample DynamoDB-formatted JSON File is as follows:
{"Item":{"Id":{"N":"101"},"Phones":{"L":[{"L":[{"S":"555-222"},{"S":"123-567"}]}]},"PremierCustomer":{"BOOL":false},"Address":{"M":{"Zip":{"N":"570004"},"Street":{"S":"21 main"},"DoorNum":{"N":"201"},"City":{"S":"London"}}},"FirstName":{"S":"Fred"},"FavNumbers":{"NS":["10"]},"LastName":{"S":"Smith"},"FavColors":{"SS":["Red","Green"]},"Age":{"N":"22"}}}
{"Item":{"Id":{"N":"102"},"Phones":{"L":[{"L":[{"S":"222-222"}]}]},"PremierCustomer":{"BOOL":false},"Address":{"M":{"Zip":{"N":"560014"},"Street":{"S":"32 main"},"DoorNum":{"N":"1024"},"City":{"S":"Wales"}}},"FirstName":{"S":"John"},"FavNumbers":{"NS":["10"]},"LastName":{"S":"White"},"FavColors":{"SS":["Blue"]},"Age":{"N":"48"}}}

You export the DynamoDB table to AWS S3 storage as specified in Exporting DynamoDB table data to Amazon S3.

Example:

This demonstration shows how to migrate a DynamoDB JSON file from an AWS S3 source to Oracle NoSQL Database using a manually created configuration file.

Prerequisites

  • Identify the source and sink for the migration.
    • Source: DynamoDB JSON File in AWS S3
    • Sink: Oracle NoSQL Database
  • Identify the table in AWS DynamoDB that needs to be migrated to Oracle NoSQL Database. Log in to your AWS console using your credentials. Go to DynamoDB. Under Tables, choose the table to be migrated.
  • Create an object bucket and export the table to S3. From your AWS console, go to S3. Under Buckets, create a new object bucket. Go back to DynamoDB and click Exports to S3. Provide the source table and the destination S3 bucket, and click Export.
    Refer to the steps in Exporting DynamoDB table data to Amazon S3 to export your table. While exporting, select DynamoDB JSON as the format. You can also trigger the export from the AWS CLI; see the sketch after this list. The exported data contains the DynamoDB table data in multiple gzip files, as shown below.
    / 01639372501551-bb4dd8c3 
    |-- 01639372501551-bb4dd8c3 ==> exported data prefix
    |----data
    |------sxz3hjr3re2dzn2ymgd2gi4iku.json.gz  ==>table data
    |----manifest-files.json
    |----manifest-files.md5
    |----manifest-summary.json
    |----manifest-summary.md5
    |----_started
  • You need AWS credentials (including an access key ID and a secret access key) and the AWS configuration files (credentials and, optionally, config) to access AWS S3 from the migrator. See Set and view configuration settings for more details on the configuration files. See Creating a key pair for more details on creating access keys. A sample credentials file is shown after this list.
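As an alternative to the console steps above, you can trigger the export from the AWS CLI. This is a sketch: the table ARN, region, account ID, and bucket name are placeholders, and point-in-time recovery (PITR) must be enabled on the table before an export can run.

    aws dynamodb export-table-to-point-in-time \
        --table-arn arn:aws:dynamodb:<region>:<account-id>:table/<table-name> \
        --s3-bucket <bucket-name> \
        --export-format DYNAMODB_JSON

A minimal AWS credentials file, referenced by the credentials and credentialsProfile parameters of the source configuration, might look like the following sketch (placeholder values):

    [default]
    aws_access_key_id=<your-access-key-ID>
    aws_secret_access_key=<your-secret-access-key>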

Procedure

To migrate the DynamoDB JSON data to Oracle NoSQL Database, use one of the following options: create the sink table with the default schema (Option 1) or with a user-defined schema file (Option 2).

Option 1: Default schema

  1. Prepare the configuration file (in JSON format) with the identified source and sink details. For details, see Source Configuration Templates and Sink Configuration Templates.

    Note:

    If the items in your DynamoDB JSON file in AWS S3 contain a TTL attribute and you optionally want to import the TTL values, specify the attribute name in the ttlAttributeName configuration parameter of the source configuration template and set the includeTTL configuration parameter to true in the sink configuration template. For more details, see Migrating TTL Metadata for Table Rows. A sketch follows the configuration below.
    Set defaultSchema to true in the sink configuration template. The Migrator utility creates the table with the default schema at the sink. You must specify the DDBPartitionKey and the corresponding NoSQL column type; otherwise, an error is thrown.
    {
      "source" : {
        "type" : "aws_s3",
        "format" : "dynamodb_json",
        "s3URL" : "<https://<bucket-name>.<s3_endpoint>/export_path>",
        "credentials" : "</path/to/aws/credentials/file>",
        "credentialsProfile" : "default"
      },
      "sink" : {
        "type" : "nosqldb",
        "storeName" : "kvstore",
        "helperHosts" : ["phoenix126166:5000"],
        "table" : "sampledynDBImp",
        "includeTTL" : false,
        "schemaInfo" : {
          "DDBPartitionKey" : "<PrimaryKey:Datatype>",
          "defaultSchema" : true
        },
        "overwrite" : true,
        "requestTimeoutMs" : 5000
      },
      "abortOnError" : true,
      "migratorVersion" : "1.7.0"
    }
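    If your items carry a TTL attribute (see the note above), the relevant entries in the source and sink templates might look like the following fragment. This is a sketch; the attribute name expireAt is a hypothetical example.

    "source" : {
      ...,
      "ttlAttributeName" : "expireAt"
    },
    "sink" : {
      ...,
      "includeTTL" : true
    }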
    
    For a DynamoDB JSON source, the default schema for the table will be as shown below:
    CREATE TABLE IF NOT EXISTS <TABLE_NAME>(DDBPartitionKey_name DDBPartitionKey_type, 
    [DDBSortKey_name DDBSortKey_type], DOCUMENT JSON, 
    PRIMARY KEY(SHARD(DDBPartitionKey_name),[DDBSortKey_name]))

    Where:

    TABLE_NAME = value provided for the sink 'table' in the configuration

    DDBPartitionKey_name = value provided for the partition key in the configuration

    DDBPartitionKey_type = value provided for the data type of the partition key in the configuration

    DDBSortKey_name = value provided for the sort key in the configuration if any

    DDBSortKey_type = value provided for the data type of the sort key in the configuration if any

    DOCUMENT = All attributes except the partition and sort key of a DynamoDB table item aggregated into a NoSQL JSON column
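    For example, with the sink table sampledynDBImp from this configuration and DDBPartitionKey set to Id:INTEGER (matching the sample data shown earlier), the migrator generates:

    CREATE TABLE IF NOT EXISTS sampledynDBImp (Id INTEGER, document JSON, 
    PRIMARY KEY(SHARD(Id)))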

  2. Open the command prompt and navigate to the directory where you extracted the NoSQL Database Migrator utility.
  3. Run the runMigrator command, passing the configuration file with the --config or -c option.
    [~/nosqlMigrator]$./runMigrator 
    --config <complete/path/to/the/JSON/config/file>
  4. The utility proceeds with the data migration, as shown below.
    
    creating source from given configuration:
    source creation completed
    creating sink from given configuration:
    sink creation completed
    creating migrator pipeline
    [nosqldb sink] : start loading DDLs
    [nosqldb sink] : executing DDL: CREATE TABLE IF NOT EXISTS sampledynDBImp (Id INTEGER, document JSON, PRIMARY KEY(SHARD(Id)))
    [nosqldb sink] : completed loading DDLs
    migration started
    Start writing data to OnDB Sink
    executing for source:azkzkyynnq7w7j7yqbao5377he
    executing for source:jrnmvjofnu3bzlwql4cpyasuma
    executing for source:kdieseqlwa72xlaenwgo5vb7tq
    executing for source:wp776cslq46nhm5snsi23v4j3a
    [INFO] [DDB S3 source] : start parsing JSON records from object: AWSDynamoDB/01754372757879-92f376b3/data/azkzkyynnq7w7j7yqbao5377he.json.gz
    [INFO] [DDB S3 source] : start parsing JSON records from object: AWSDynamoDB/01754372757879-92f376b3/data/jrnmvjofnu3bzlwql4cpyasuma.json.gz
    [INFO] [DDB S3 source] : start parsing JSON records from object: AWSDynamoDB/01754372757879-92f376b3/data/kdieseqlwa72xlaenwgo5vb7tq.json.gz
    [INFO] [DDB S3 source] : start parsing JSON records from object: AWSDynamoDB/01754372757879-92f376b3/data/wp776cslq46nhm5snsi23v4j3a.json.gz
    [INFO] Writing data to OnDB Sink completed.
    [INFO] migration completed.
    Records provided by source=2, Records written to sink=2, Records failed=0,Records skipped=0.
    Elapsed time: 0min 0sec 747ms
    Migration completed.

Verification

Start the SQL prompt in your KVStore.
 java -jar lib/sql.jar -helper-hosts localhost:5000 -store kvstore
Verify that the sampledynDBImp table is created with the source data:
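For example, you can run a simple query such as the following from the SQL prompt (the table name is the one used in this example); it returns the rows shown below:
SELECT * FROM sampledynDBImp;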
{"Id":102,"document":{"Address":{"City":"Wales","DoorNum":1024,"Street":"32 main","Zip":560014},"Age":48,"FavColors":["Blue"],"FavNumbers":[10],"FirstName":"John","LastName":"White","Phones":[["222-222"]],"PremierCustomer":false}}
{"Id":101,"document":{"Address":{"City":"London","DoorNum":201,"Street":"21 main","Zip":570004},"Age":22,"FavColors":["Red","Green"],"FavNumbers":[10],"FirstName":"Fred","LastName":"Smith","Phones":[["555-222","123-567"]],"PremierCustomer":false}}
Option 2: User-defined schema

  1. Prepare the configuration file (in JSON format) with the identified source and sink details. For details, see Source Configuration Templates and Sink Configuration Templates.

    Note:

    If the items in your DynamoDB JSON file in AWS S3 contain a TTL attribute and you optionally want to import the TTL values, specify the attribute name in the ttlAttributeName configuration parameter of the source configuration template and set the includeTTL configuration parameter to true in the sink configuration template. For more details, see Migrating TTL Metadata for Table Rows.
    To specify a user-defined schema, set defaultSchema to false in the sink configuration template and set schemaPath to a file containing your DDL statement. For details, see Mapping of DynamoDB types to Oracle NoSQL types.

    Note:

    If the DynamoDB table has a data type that is not supported in NoSQL, the migration fails.
    A sample user-defined schema file is shown below.
    CREATE TABLE IF NOT EXISTS sampledynDBImp (AccountId INTEGER,document JSON, 
    PRIMARY KEY(SHARD(AccountId)));
    The schema file is used to create the table at the sink as part of the migration; it can model the DynamoDB attributes as fixed columns. As long as the primary key data is provided, the input JSON record is inserted; otherwise, an error is thrown.

    Note:

    • If the input data does not contain a value for a particular column (other than the primary key), the column's default value is used. The default value must be part of the column definition when you create the table, for example, id INTEGER not null default 0. If the column does not have a default definition, SQL NULL is inserted when no value is provided for the column.
    • If you are modeling the DynamoDB table as a JSON document, ensure that you use the aggregateFields transform to aggregate the non-primary-key data into a JSON column. For details, see aggregateFields.
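    As an illustration of the default-value note above, a user-defined schema could define an additional fixed column with a default. The Age column here is a hypothetical addition; it would also have to be listed in the skipFields of the aggregateFields transform so that it is not folded into the JSON column.

    CREATE TABLE IF NOT EXISTS sampledynDBImp (AccountId INTEGER, 
    Age INTEGER NOT NULL DEFAULT 0, document JSON, 
    PRIMARY KEY(SHARD(AccountId)));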
    {
      "source" : {
        "type" : "aws_s3",
        "format" : "dynamodb_json",
        "s3URL" : "<https://<bucket-name>.<s3_endpoint>/export_path>",
        "credentials" : "</path/to/aws/credentials/file>",
        "credentialsProfile" : "default"
      },
      "sink" : {
        "type" : "nosqldb",
        "storeName" : "kvstore",
        "helperHosts" : ["phoenix126166:5000"],
        "table" : "sampledynDBImp",
        "includeTTL" : false,
        "schemaInfo" : {
          "schemaPath" : "<full path of the schema file with the DDL statement>"
        },
        "overwrite" : true,
        "requestTimeoutMs" : 5000
      },
      "transforms" : {
        "aggregateFields" : {
          "fieldName" : "document",
          "skipFields" : ["AccountId"]
        }
      },
      "abortOnError" : true,
      "migratorVersion" : "1.7.0"
    }
  2. Open the command prompt and navigate to the directory where you extracted the NoSQL Database Migrator utility.
  3. Run the runMigrator command, passing the configuration file with the --config or -c option.
    [~/nosqlMigrator]$./runMigrator 
    --config <complete/path/to/the/JSON/config/file>
  4. The utility proceeds with the data migration, as shown below.
    
    creating source from given configuration:
    source creation completed
    creating sink from given configuration:
    sink creation completed
    creating migrator pipeline
    [nosqldb sink] : start loading DDLs
    [nosqldb sink] : executing DDL: CREATE TABLE IF NOT EXISTS sampledynDBImp (AccountId INTEGER, document JSON, PRIMARY KEY(SHARD(AccountId)))
    [nosqldb sink] : completed loading DDLs
    migration started
    Start writing data to OnDB Sink
    executing for source:azkzkyynnq7w7j7yqbao5377he
    executing for source:jrnmvjofnu3bzlwql4cpyasuma
    executing for source:kdieseqlwa72xlaenwgo5vb7tq
    executing for source:wp776cslq46nhm5snsi23v4j3a
    [INFO] [DDB S3 source] : start parsing JSON records from object: AWSDynamoDB/01754372757879-92f376b3/data/azkzkyynnq7w7j7yqbao5377he.json.gz
    [INFO] [DDB S3 source] : start parsing JSON records from object: AWSDynamoDB/01754372757879-92f376b3/data/jrnmvjofnu3bzlwql4cpyasuma.json.gz
    [INFO] [DDB S3 source] : start parsing JSON records from object: AWSDynamoDB/01754372757879-92f376b3/data/kdieseqlwa72xlaenwgo5vb7tq.json.gz
    [INFO] [DDB S3 source] : start parsing JSON records from object: AWSDynamoDB/01754372757879-92f376b3/data/wp776cslq46nhm5snsi23v4j3a.json.gz
    [INFO] Writing data to OnDB Sink completed.
    [INFO] migration completed.
    Records provided by source=2, Records written to sink=2, Records failed=0,Records skipped=0.
    Elapsed time: 0min 0sec 747ms
    Migration completed.

Verification

Start the SQL prompt in your KVStore.
 java -jar lib/sql.jar -helper-hosts localhost:5000 -store kvstore
Verify that the sampledynDBImp table is created with the source data:
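As in the earlier verification, a simple query such as the following lists the migrated rows shown below:
SELECT * FROM sampledynDBImp;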
{"AccountId":102,"document":{"Address":{"City":"Wales","DoorNum":1024,"Street":"32 main","Zip":560014},"Age":48,"FavColors":["Blue"],"FavNumbers":[10],"FirstName":"John","LastName":"White","Phones":[["222-222"]],"PremierCustomer":false}}
{"AccountId":101,"document":{"Address":{"City":"London","DoorNum":201,"Street":"21 main","Zip":570004},"Age":22,"FavColors":["Red","Green"],"FavNumbers":[10],"FirstName":"Fred","LastName":"Smith","Phones":[["555-222","123-567"]],"PremierCustomer":false}}