Processing Bulk Data Best Practices
Your automation work might involve processing bulk data, such as a file of input data or a JSON object containing a number of items. You have several options for processing bulk data.
Questions
Understand how the answers to the questions in the flow chart help inform your decision-making process.
Question | Example | Why the question matters |
---|---|---|
[Image: flow chart question] | For example, do you need to update record 1, then record 2, and so on? Or can you update 5 records simultaneously? | Robots can process records in sequence without any issues. However, when your business requirements allow it, you'll find opportunities for efficiency by processing records in parallel. |
[Image: flow chart question] | For example, if you need to update 100 records, and each update takes 30 seconds, the total processing time is 50 minutes. | In general, when the total processing time for all records exceeds 30 minutes, Oracle recommends using an integration to manage the distribution of work across multiple robots. On the other hand, when the total processing time for all records is less than 30 minutes, you can allow the robot to manage the distribution of its own work and don't need a more robust solution architecture. The 30-minute time limit is an Oracle-recommended limit. Your organization can choose a different time period. Consider the amount of time you're willing to wait to determine whether a set of records processed successfully, as well as the service limits. See Service Limits in Provisioning and Administering Oracle Integration 3. |
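The 30-minute guideline reduces to a simple comparison. The following sketch is illustrative only and is not part of Oracle Integration. The function name `needs_integration` and its parameters are hypothetical, and the sketch assumes that every record takes the same amount of time to process.

```python
def needs_integration(record_count, seconds_per_record, limit_minutes=30):
    """Return True when the estimated total processing time exceeds the limit.

    Assumption: every record takes the same time; real response times vary.
    limit_minutes defaults to the 30-minute guideline, but your organization
    can pass a different value.
    """
    total_seconds = record_count * seconds_per_record
    return total_seconds > limit_minutes * 60


# Example from the table: 100 records at 30 seconds each take 50 minutes,
# which exceeds the 30-minute guideline.
print(needs_integration(100, 30))
```

A result of `True` suggests letting an integration distribute the work across multiple robots. A result of `False` suggests that a single robot can manage its own work.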
Processing Options
The flow chart provided several processing options. Review them in more detail.
Processing option | Description | Use cases |
---|---|---|
[Image: processing option from the flow chart] | Create an integration with a foreach loop that handles parallel iterations. The integration processes the data in parallel. Each branch invokes a robot instance to process one or more records. For example, consider a data set with 100 records. An integration supports 5 parallel branches, and each branch calls 1 robot. Therefore, the integration and robot process 5 records at a time. For guidance on the number of robot instances to invoke and the number of records to pass to each robot, see Additional Factors: Number of Robots and Records. | This solution is efficient when the total processing time for your records is high, either because you have a lot of records to process or because each record takes a long time to process, and your business requirements allow you to process the records in any order. |
[Image: processing option from the flow chart] | Create an integration with a foreach loop that handles sequential iterations. The integration iterates over all of the records, one at a time, and invokes a robot instance for each record in turn. For guidance on the number of robot instances to invoke and the number of records to pass to each robot, see Additional Factors: Number of Robots and Records. | This solution is ideal for the following scenarios: |
[Image: processing option from the flow chart] | Create an integration that passes the entire data set to a single robot instance. In the robot, create a foreach loop that iterates over all of the records, one at a time. | This solution is straightforward and is best for records that can be processed relatively quickly and must be processed in a specific order. |
Additional Factors: Number of Robots and Records
Several scenarios require you to determine the number of robots that process records and the number of records that each robot processes.
The following scenarios require you to make these decisions:
Consider the following factors.
Factor | More information |
---|---|
Overhead for calling a robot instance | Each robot accomplishes one or more specific goals, such as updating a record. However, to achieve its goal, a robot must complete other tasks, such as opening an application, signing in, and navigating to the right page. All of the tasks that a robot does to prepare for its specific goal are the robot's overhead. For example, a robot that takes one minute and 15 seconds (1m 15s) to run might spend 1 minute navigating to the right page and then 15 seconds accomplishing its goal. That robot has 1 minute of overhead. |
Total processing time for all records | The following components determine the total processing time for all records: For example, if you pass 3 records to a robot instance, you eliminate the overhead for 2 robots, but you also increase the total running time for the robot instance. |
30-minute (or a different organization-created) time limit | The time limit is the amount of time that you're willing to wait before knowing whether a robot has succeeded. This value becomes the maximum processing time for a set of records and helps you calculate the number of records to send to a given robot instance. To maximize the efficiency of your automation, Oracle recommends passing the maximum number of records to the robot instance to limit the overhead time. Additionally, you can use parallel processing to reduce the clock time that passes before all records are processed. However, remember that each branch of the parallel processing incurs the overhead costs. Depending on the overhead duration and other components, distributing records to 5 branches might be less efficient than distributing records to only 3 branches. |
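The trade-off between clock time and overhead can be made concrete with a small comparison. The following sketch is an illustration under stated assumptions: one robot instance per branch, a fixed 60 seconds of overhead and 15 seconds per record (the values used in this topic's examples), and "efficiency" interpreted as the combined running time of all robot instances. The function name `branch_costs` is hypothetical.

```python
import math


def branch_costs(records, branches, overhead_s=60, per_record_s=15):
    """Return (clock_seconds, robot_seconds) for one robot instance per branch.

    clock_seconds: time until the busiest branch finishes.
    robot_seconds: combined running time of all robot instances, which
    grows with every extra branch because each branch pays the overhead.
    """
    largest_share = math.ceil(records / branches)
    clock_seconds = overhead_s + largest_share * per_record_s
    robot_seconds = branches * overhead_s + records * per_record_s
    return clock_seconds, robot_seconds


# Compare 3 and 5 branches for 100 records.
for branches in (3, 5):
    print(branches, branch_costs(100, branches))
```

With these assumed values, 5 branches finish sooner on the clock than 3 branches, but the robots spend more combined time running because the overhead is paid 5 times instead of 3.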
Sample Calculations
Sample calculations help you understand how to calculate the optimal number of robots to use and the number of records to send them.
Simple Scenarios
A robot takes one minute and 15 seconds (1m 15s) to run. The robot spends 1 minute navigating to the right page and then 15 seconds accomplishing its goal. Different numbers of records and robot instances impact the total processing time for this work.
Scenario | More information |
---|---|
Five robot instances each process one record, either sequentially or in parallel | Each robot requires 1m 15s to run, resulting in a total processing time of 6m 15s: 5 robots x 1m 15s = 6m 15s. The robots can run sequentially or in parallel. |
One robot instance processes five records sequentially | The robot requires 1 minute of overhead, and then 15 seconds of processing time for each record, resulting in a total processing time of 2m 15s: 1m of overhead + (5 records x 15s) = 2m 15s. |
One robot instance processes 150 records sequentially | Reducing overhead costs improves the efficiency of your automation, but passing too many records to a single robot instance can result in longer-than-preferred processing times. For example, if one robot updates 150 records, you save 149 minutes of overhead time. However, the total processing time is 38m 30s, which might be longer than you want to wait to determine whether all the updates completed successfully. |
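The arithmetic behind these scenarios can be checked with a few lines of code. The following sketch assumes the robot described above (60 seconds of overhead, 15 seconds per record) and constant timings. The helper names `total_seconds` and `format_duration` are hypothetical.

```python
def total_seconds(instances, records_per_instance, overhead_s=60, per_record_s=15):
    """Total processing time when each instance pays the overhead once."""
    return instances * (overhead_s + records_per_instance * per_record_s)


def format_duration(seconds):
    """Render a number of seconds as, for example, '6m 15s'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m {secs:02d}s"


print(format_duration(total_seconds(5, 1)))    # five instances, one record each
print(format_duration(total_seconds(1, 5)))    # one instance, five records
print(format_duration(total_seconds(1, 150)))  # one instance, 150 records
```

Five instances that each process one record need 375 seconds in total, one instance that processes five records needs 135 seconds, and one instance that processes 150 records needs 2,310 seconds.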
Sequential Processing
If your business requires you to process records in sequence, determine the optimal number of records that each robot instance should process.
Here's how to complete these calculations.
- Determine the overhead time
For example, consider a robot that spends 1 minute navigating to the right page and then 15 seconds accomplishing its goal. This robot has 1 minute, or 60 seconds, of overhead.
- Determine the maximum time to process records
The 30-minute time limit contains 1,800 seconds:
30 minutes x 60 seconds = 1,800 seconds
Each robot instance requires 60 seconds of overhead. You must subtract the overhead time from the maximum processing time:
1,800 seconds - 60 seconds = 1,740 seconds
This calculation assumes that in 30 minutes, you complete the overhead one time and then use the rest of the time to process records.
- Calculate the number of records that you can process
A robot needs 15 seconds to process each record.
To calculate the maximum number of records that a robot can process, divide the maximum time to process records by the time to process each record:
1,740 seconds maximum time / 15 seconds per record = 116 records
Theoretically, the optimal number of records for each robot to process is 116.
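The three steps above can be sketched as a single function. This is a minimal illustration that assumes constant overhead and processing times. The name `max_records_per_robot` and the optional `reserve_s` parameter (time held back as a safety margin) are hypothetical and are not part of any Oracle API.

```python
def max_records_per_robot(limit_s, overhead_s, per_record_s, reserve_s=0):
    """Return how many records one robot instance can process within the limit.

    limit_s: maximum time you are willing to wait, in seconds.
    overhead_s: one-time preparation cost for the robot instance.
    per_record_s: processing time for each record.
    reserve_s: hypothetical safety margin subtracted from the limit.
    """
    available = limit_s - overhead_s - reserve_s
    if available < 0:
        return 0
    return available // per_record_s


# 30-minute limit, 60 seconds of overhead, 15 seconds per record.
print(max_records_per_robot(30 * 60, 60, 15))
# The same calculation with 5 minutes held in reserve.
print(max_records_per_robot(30 * 60, 60, 15, reserve_s=300))
```

Without a reserve, the function returns the theoretical value of 116 records. Holding back 5 minutes lowers the result to 96 records.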
Note:
This conclusion is theoretical because it makes several potentially faulty assumptions. For example, the calculation assumes that the processing time never changes, but response times vary significantly in the real world. The optimal value according to a calculator doesn't reflect these varying circumstances. When making decisions, consider building in some wiggle room that accommodates requirements that these calculations don't consider, such as network latency.
Parallel Processing
If your business allows you to process records in parallel, you can distribute the work in a way that minimizes the time that the jobs take to complete.
- Calculate the total potential overhead time
For example, if you have 100 records to process, and each record is processed by its own robot instance with 60 seconds of overhead time, the total potential overhead is 6,000 seconds:
100 records x 60 seconds of overhead = 6,000 seconds of potential overhead
You can reduce this value by processing multiple records using a single robot instance.
- Calculate the processing time without any overhead
For example, if each record requires 15 seconds to process (without its overhead time), the total processing time is 1,500 seconds:
15 seconds of processing time x 100 records = 1,500 seconds
You cannot reduce this time. However, you can reduce the amount of time that passes on the clock by processing records in parallel.
- Consider several scenarios to find your preferred number of parallel branches (up to 5) and the number of records that each branch processes
To minimize the processing time, including overhead, you need to reduce your overhead time as much as possible while staying within the 30-minute time limit (or whatever time limit your organization chooses). Processing the records in parallel also minimizes the total time that passes on the clock before the jobs complete.
Calculate several scenarios to find your preferred combination. For example:
Scenario | Total processing time per branch | Calculation |
---|---|---|
2 branches, 50 records per branch | 810 seconds (13 ½ minutes) | 60 seconds of overhead + (50 records x 15 seconds of processing time) = 810 seconds |
3 branches, 33 or 34 records per branch | 570 seconds (9 ½ minutes) | 60 seconds of overhead + (34 records x 15 seconds of processing time) = 570 seconds |
4 branches, 25 records per branch | 435 seconds (7 ¼ minutes) | 60 seconds of overhead + (25 records x 15 seconds of processing time) = 435 seconds |
5 branches, 20 records per branch | 360 seconds (6 minutes) | 60 seconds of overhead + (20 records x 15 seconds of processing time) = 360 seconds |
5 branches, 20 records per branch, sent in 2 different batches | Each batch finishes in 210 seconds (3 ½ minutes), for a total processing time of 420 seconds (7 minutes) per branch | [60 seconds of overhead + (10 records x 15 seconds of processing time)] x 2 = 420 seconds |
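The per-branch figures in this table follow one formula, which can be sketched in code for trying other combinations. The sketch assumes constant timings (60 seconds of overhead, 15 seconds per record), one new robot instance per batch, and that the busiest branch and the largest batch determine the time. The function name `per_branch_seconds` is hypothetical.

```python
import math


def per_branch_seconds(records, branches, batches=1, overhead_s=60, per_record_s=15):
    """Clock time for the busiest branch.

    The branch's share of records is split into equal-sized batches, and
    every batch pays the overhead again because it starts a robot instance.
    Uneven splits are rounded up to the largest share and batch.
    """
    share = math.ceil(records / branches)
    batch_size = math.ceil(share / batches)
    return batches * (overhead_s + batch_size * per_record_s)


# 100 records distributed across 2 to 5 branches, one batch per branch.
for branches in range(2, 6):
    print(branches, per_branch_seconds(100, branches))

# 5 branches, each sending its 20 records in 2 batches of 10.
print(per_branch_seconds(100, 5, batches=2))
```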
With a higher number of records or higher processing times, you might need to consider sending records to each branch in batches. This approach often reduces the processing time for a given batch of records but increases the total processing time. The following table provides sample calculations for processing 500 records in parallel.
Scenario | Total processing time per branch | Calculation |
---|---|---|
2 branches, 250 records per branch | 3,810 seconds (63 ½ minutes). This value exceeds the 30-minute time limit. | 60 seconds of overhead + (250 records x 15 seconds of processing time) = 3,810 seconds |
3 branches, 166 or 167 records per branch | 2,565 seconds (42 ¾ minutes). This value exceeds the 30-minute time limit. | 60 seconds of overhead + (167 records x 15 seconds of processing time) = 2,565 seconds |
4 branches, 125 records per branch | 1,935 seconds (32 ¼ minutes). This value exceeds the 30-minute time limit. | 60 seconds of overhead + (125 records x 15 seconds of processing time) = 1,935 seconds |
4 branches, 125 records per branch, sent in 2 different batches | Each batch finishes in 1,005 seconds (16 ¾ minutes), for a total processing time of 2,010 seconds (33 ½ minutes) per branch | [60 seconds of overhead + (63 records x 15 seconds of processing time)] x 2 = 2,010 seconds |
5 branches, 100 records per branch | 1,560 seconds (26 minutes) | 60 seconds of overhead + (100 records x 15 seconds of processing time) = 1,560 seconds |
5 branches, 100 records per branch, sent in 2 different batches | Each batch finishes in 810 seconds (13 ½ minutes), for a total processing time of 1,620 seconds (27 minutes) per branch | [60 seconds of overhead + (50 records x 15 seconds of processing time)] x 2 = 1,620 seconds |
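One way to narrow down the scenarios is to search for the smallest number of branches whose single-batch time stays within the limit. The following sketch is an illustration, not an Oracle-provided method. It assumes constant timings, one robot instance per branch, the 30-minute limit, and a maximum of 5 branches. The function name `fewest_branches_within_limit` is hypothetical.

```python
import math


def fewest_branches_within_limit(records, limit_s=1800, max_branches=5,
                                 overhead_s=60, per_record_s=15):
    """Return the smallest branch count that meets the limit, or None.

    None means that even max_branches cannot finish a single batch per
    branch within the limit, so sending records in batches is worth
    considering.
    """
    for branches in range(1, max_branches + 1):
        share = math.ceil(records / branches)
        if overhead_s + share * per_record_s <= limit_s:
            return branches
    return None


print(fewest_branches_within_limit(500))
print(fewest_branches_within_limit(600))
```

For 500 records, only the 5-branch scenario fits within 30 minutes. For 600 records, no single-batch scenario fits, and the function returns `None`.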
- Choose the right scenario for your requirements
Consider your calculations. Build in some wiggle room for periods of higher-than-usual volume, network latency, and other unforeseen issues. Then, choose an approach.
Oracle recommends testing an integration and robot under load before going live to confirm that your approach will succeed in the real world.