9 Use the Oracle Big Data Manager bdm-cli Utility
Use the bdm-cli (Oracle Big Data Manager Command Line Interface) utility to copy data and manage copy jobs at the command line. The bdm-cli utility is installed by default on all nodes of the cluster, so you don't need to install it.
bdm-cli has several commands that duplicate odcp commands, but bdm-cli also includes additional commands for scheduling and managing copy jobs and other administrative tasks.
9.1 Usage
You can use bdm-cli at the command line to create and manage copy jobs.
Syntax
bdm-cli [global_options] subcommand [options] [arguments]...
Supported Storage Protocols and Paths
The protocols and paths to the file systems and storage services supported by bdm-cli are:
- HDFS: hdfs:///
- Oracle Cloud Infrastructure Object Storage (formerly known as Oracle Bare Metal Cloud Object Storage Service): oss:///container
For operations with Oracle Cloud Infrastructure Object Storage, you must specify the provider by using the src-provider and dst-provider options. For example, bdm-cli create_job takes those options when the source or destination is Oracle Cloud Infrastructure Object Storage.
Finding a Job’s UUID
A number of bdm-cli subcommands require that you identify a job by its Universally Unique Identifier (UUID). To find UUIDs, execute bdm-cli list_all_jobs.
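When scripting against the output of bdm-cli list_all_jobs, it can help to pull out anything that looks like a UUID. The following is a minimal sketch of one way to do that. It assumes only that UUIDs appear in the output in the standard 8-4-4-4-12 hexadecimal form, so it does not depend on the exact layout of the output. The function name extract_uuids is hypothetical and is not part of bdm-cli.

```shell
# Hypothetical helper (not part of bdm-cli): print every UUID-shaped token
# found on standard input, one per line, dropping repeated values.
extract_uuids() {
  grep -Eo '[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}' \
    | awk '!seen[$0]++'
}

# Demonstration with made-up text. In practice you would pipe in the
# output of: bdm-cli list_all_jobs
printf 'job 24ef30e8-913b-4402-baf8-74b99c211f50 finished\n' | extract_uuids
```

Because the function matches on the shape of the identifier only, it also returns any other UUID-shaped values in the output, so review the result before passing it to a subcommand such as abort_job.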
Specifying Source and Destination Paths
When specifying sources and destinations, fully qualify the paths:
- source ...
  File name qualified by protocol and full path, for example: hdfs:///user/oracle/test.raw
- destination
  Directory name qualified by protocol and full path, for example: swift://container.storagename/test-dir
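A script that builds source and destination arguments can check up front that each path carries a protocol prefix. The sketch below shows one possible check. The function name is_qualified_path is hypothetical, and the test only looks for a leading "protocol://"; it does not verify that bdm-cli supports the protocol or that the path exists.

```shell
# Hypothetical helper (not part of bdm-cli): succeed only if the argument
# starts with "<protocol>://", for example hdfs:/// or swift://.
is_qualified_path() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z][A-Za-z0-9+.-]*://'
}

# Demonstration: the first two paths come from the examples above,
# the third has no protocol and is reported as not qualified.
for p in hdfs:///user/oracle/test.raw swift://container.storagename/test-dir /user/oracle/test.raw; do
  if is_qualified_path "$p"; then
    echo "qualified: $p"
  else
    echo "not qualified: $p"
  fi
done
```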
Setting Environment Variables
You can set bdm-cli options as environment variables. For example, you can set the Oracle Big Data Manager URL and user password file as follows:
export BDM_URL=https://hostname:8890/bdcs/api && export BDM_PASSWORD=/tmp/password_file
All the bdm-cli options that can be set as environment variables are documented in the sections below.
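If a script relies on these environment variables, a short pre-flight check can report a missing setting before any command runs. The following sketch assumes the two variables from the example above, BDM_URL and BDM_PASSWORD, and treats BDM_PASSWORD as the path to the password file. The function name check_bdm_env is hypothetical.

```shell
# Hypothetical pre-flight check (not part of bdm-cli): return non-zero and
# print a message if BDM_URL or BDM_PASSWORD is unset or empty, or if the
# password file named by BDM_PASSWORD cannot be read.
check_bdm_env() {
  if [ -z "${BDM_URL:-}" ]; then
    echo "BDM_URL is not set" >&2
    return 1
  fi
  if [ -z "${BDM_PASSWORD:-}" ]; then
    echo "BDM_PASSWORD is not set" >&2
    return 1
  fi
  if [ ! -r "${BDM_PASSWORD}" ]; then
    echo "password file is not readable: ${BDM_PASSWORD}" >&2
    return 1
  fi
  return 0
}
```

A script could then guard its first call, for example: check_bdm_env && bdm-cli list_all_jobs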
Getting Help
To get help for bdm-cli, use:
bdm-cli --help
To get help for a specific command, use:
bdm-cli command --help
For example:
bdm-cli edit_job_template --help
9.2 Options
Options that can be used by all bdm-cli commands are explained below.
Option | Description |
---|---|
--bdm-passwd path_to_password_file | Path to the Oracle Big Data Manager user password file. Environment variable: BDM_PASSWORD |
--bdm-url bdm_url | Oracle Big Data Manager server URL. Environment variable: BDM_URL |
--bdm-username username | Oracle Big Data Manager server user name. Default value: Environment variable: |
-f [table\|csv\|json] | Specify the output format: |
--fields fields | Specifies comma-separated fields depending on the type of object. |
--help | Show this message and exit. |
--no-check-certificate | Don't validate the server's certificate. |
--proxy proxy | Proxy server. |
--tenant-name tenant_name | Name of the tenant. Default value: |
-v | Print the REST request body. |
--version | Show the Oracle Big Data Manager version and exit. |
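The examples in the sections that follow repeat the same global options on every call. A small shell function can collect them in one place. This is only a sketch: the function name bdm and the BDM_CLI override variable are hypothetical, while DATA_HOST, DATA_USER, and USER_PASSWORD_FILE are the placeholder variables used in this chapter's examples.

```shell
# Hypothetical wrapper (not part of bdm-cli): prepend the global options used
# in this chapter's examples, then pass the subcommand and its arguments
# through unchanged. BDM_CLI is an assumed override variable that defaults
# to the installed utility.
bdm() {
  "${BDM_CLI:-/usr/bin/bdm-cli}" -f json --no-check-certificate \
    --bdm-url "${DATA_HOST}:8890/bdcs/api" \
    --bdm-username "${DATA_USER}" \
    --bdm-passwd "${USER_PASSWORD_FILE}" \
    "$@"
}
```

With this in place, bdm list_all_jobs expands to the full command line. Setting BDM_CLI=echo prints the arguments instead of running the utility, which is a convenient way to review a command before executing it.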
9.3 Subcommands
The following table summarizes the bdm-cli subcommands. Each subcommand is described in detail in its own section below.
Command | Description |
---|---|
bdm-cli abort_job | Abort a running job. |
bdm-cli copy | Execute a job to copy sources to destination. |
bdm-cli create_job | Execute a new job from an existing template. |
bdm-cli create_job_template | Create a new job template. |
bdm-cli get_data_source | Find a data source by name. |
bdm-cli get_job | Get a job by UUID. |
bdm-cli get_job_log | Get a job log. |
bdm-cli list_all_jobs | List all jobs from the execution history. |
bdm-cli list_template_executions | List all jobs from the execution history for the given template. |
bdm-cli ls | List files from a specific location. |
9.4 bdm-cli abort_job
Abort a running job.
Syntax
bdm-cli abort_job [options] job_uuid
Options
| Option | Description |
|---|---|
| | Force abort job. |
| --help | Show this message and exit. |
Example
Abort a job.
/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8890/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} abort_job 24ef30e8-913b-4402-baf8-74b99c211f50
9.5 bdm-cli copy
Execute a job to copy sources to destination.
Syntax
bdm-cli copy [options] source... destination
Options
| Option | Description |
|---|---|
| | Specify the block size in bytes. |
| | Data source description. |
| | Specify the maximum amount of memory for the Oracle Storage Cloud Service driver. |
| | Specify the provider of the destination when the destination is Oracle Cloud Infrastructure Object Storage Classic. |
| --help | Show this message and exit. |
| | Specify the Spark executors memory limit in GB per node, for example, |
| | Specify the maximum number of Spark executors per node, for example, |
| | Specify the maximum number of threads per node. |
| | Specify the part size in bytes. |
| | Recursively copy (enabled by default). |
| | Retry data transfer in case of failure. |
| | Specify the provider of the source when the source is Oracle Cloud Infrastructure Object Storage Classic. |
| | Synchronize the source with the destination. |
Example
Copy a file from HDFS to Oracle Storage Cloud Service:
/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8890/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} copy hdfs:///user/${DATA_USER}/1MFile.raw oss:///${DATA_USER} --dst-provider ${OSS_PROVIDER}
9.6 bdm-cli create_job
Execute a new job from an existing template.
Syntax
bdm-cli create_job [options] job_template_name
Options
| Option | Description |
|---|---|
| | Execute job immediately if job scheduling is set. Ignored otherwise. |
| | Source file, for example: |
| | The destination directory, for example: |
| | Specify the maximum amount of memory for an Oracle Storage Cloud Service driver. |
| | Specify the Spark executors memory limit in GB per node, for example: |
| | Specify the maximum number of Spark executors per node, for example: |
| | Specify the maximum number of threads per node. |
| | Specify the block size in bytes. |
| | Specify the part size in bytes. |
| | Retry data transfer in case of failure. |
| | Synchronize the source with the destination. |
| | Recursively copy (enabled by default). |
| | Main Java class used for the Spark job execution. |
| | Specify the provider of the source when the source is Oracle Cloud Infrastructure Object Storage. |
| | Specify the provider of the destination when the destination is Oracle Cloud Infrastructure Object Storage. |
| --help | Show this message and exit. |
9.7 bdm-cli create_job_template
Create a new job template.
Syntax
bdm-cli create_job_template [options] job_template_name source ... destination
Options
| Option | Description |
|---|---|
| | Abort an already running execution if the next scheduled execution is started. |
| | Specify the block size in bytes. |
| | Job's data source name. |
| | Job template description. |
| | Specify for |
| | Environment in JSON format: |
| --help | Show this message and exit. |
| | Count of executions history log. |
| | Main Java class used for the Spark job execution. |
| | Specify cron-like job schedule, for example: |
| | Specify job template type. Allowed values are: |
| | Hadoop libraries, for example: This option can have multiple values, for example: |
| | Specify the Spark executors memory limit in GB per node, for example: |
| | Specify the maximum number of Spark executors per node, for example: |
| | Specify the maximum number of threads per node. |
| | Specify the part size in bytes. |
| | Recursively copy (enabled by default). |
| | Retry data transfer in case of failure. |
| | Specify the provider of the source when the source is Oracle Cloud Infrastructure Object Storage Service. |
| | Synchronize the source with the destination. |
| | User defined tag. This option can have multiple values, for example: |
9.8 bdm-cli get_data_source
Find a data source by name.
Syntax
bdm-cli get_data_source [options] data_source_name
Options
Option | Description |
---|---|
--help | Show this message and exit. |
9.9 bdm-cli get_job
Get a job by UUID.
Syntax
bdm-cli get_job [options] job_uuid
Options
Option | Description |
---|---|
--help | Show this message and exit. |
Example
Get information on a job.
/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8890/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} get_job ${JOB_UUID}
9.10 bdm-cli get_job_log
Get a job log.
Syntax
bdm-cli get_job_log [options] job_uuid
Options
Option | Description |
---|---|
--help | Show this message and exit. |
9.11 bdm-cli list_all_jobs
List all jobs from the execution history.
Syntax
bdm-cli list_all_jobs [options]
Options
Option | Description |
---|---|
--help | Show this message and exit. |
--limit | Specify the size of the page. |
--offset | Specify the paging offset. |
Example
List all jobs.
/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8890/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} list_all_jobs
Use the --offset and --limit options to restrict the results. For example, to get the eighth page when there are 20 rows per page, do the following:
bdm-cli list_all_jobs --offset 8 --limit 20
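To walk through several pages in a script, the same command can be generated in a loop. The sketch below only prints the commands, so they can be reviewed before running them. It follows the convention of the example above, where --offset is given the page number and --limit the number of rows per page; that reading is an assumption drawn from that one example. The function name print_page_commands is hypothetical.

```shell
# Hypothetical helper (not part of bdm-cli): print one list_all_jobs command
# per page, from first_page to last_page inclusive.
# Assumes, as in the example above, that --offset takes the page number.
print_page_commands() {
  first_page=$1
  last_page=$2
  rows_per_page=$3
  page=$first_page
  while [ "$page" -le "$last_page" ]; do
    echo "bdm-cli list_all_jobs --offset $page --limit $rows_per_page"
    page=$((page + 1))
  done
}

# Print the commands for pages 7 and 8 with 20 rows per page.
print_page_commands 7 8 20
```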
9.12 bdm-cli list_template_executions
List all jobs from the execution history for the given template.
Syntax
bdm-cli list_template_executions [options] job_uuid
Options
Option | Description |
---|---|
--help | Show this message and exit. |
9.13 bdm-cli ls
List files from a specific location.
Syntax
bdm-cli ls [options] path_1 ... path_n
Options
| Option | Description |
|---|---|
| | Human readable file sizes. |
| | List directories only. |
| | Specify for Oracle Bare Metal Cloud Object Storage Service paths. |
| --help | Show this message and exit. |
Example
List the HDFS content under a selected user's directory.
/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8890/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} ls hdfs:///user/${DATA_USER}/integration_in --provider hdfs