Configure a Data Flow for Incremental Processing
Apply incremental processing in a data flow to load only new or updated records from a database.
Applying incremental processing in a data flow lets you load only new data instead of performing a full load on every run, which is inefficient and costly. Each time the data flow runs, it processes only the data that was added since the last run.
Before you start, create a connection to one of the supported databases, for example, Oracle, Oracle Autonomous Data Warehouse, Apache Hive, Hortonworks Hive, or MapR Hive. Then, configure a new data indicator for that database. See Specify a New Data Indicator for a Data Source.
- Create or open the data flow in which you want to apply incremental processing.
- In the Data Flow editor, select the Save Data step to display the Step editor pane.
- In the Dataset field, specify the name of the input dataset specified in the Add Data step.
- In the Save data to option, select Database Connection.
- Click Select Connection and select a connection to one of the supported target databases.
- In the Table field, specify the name of the target table that you're writing to.
- In the When run option, select Add new data to existing data.
- Click Save.
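
The effect of selecting Add new data to existing data can be pictured with a small conceptual sketch. This is not how the product implements incremental processing; it only illustrates the general pattern of appending rows that are newer than a stored marker. The table names (source_orders, target_orders), the updated_at column standing in for the new data indicator, and the incremental_load function are all hypothetical, and an in-memory SQLite database is used so the example runs on its own.

```python
import sqlite3


def incremental_load(conn, last_seen):
    """Append source rows newer than last_seen to the target table.

    Returns the new high-water mark (the largest updated_at value loaded),
    or last_seen unchanged if there was nothing new.
    """
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM source_orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # Append only; existing target rows are left untouched.
    conn.executemany(
        "INSERT INTO target_orders (id, amount, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return rows[-1][2] if rows else last_seen


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_orders (id INTEGER, amount REAL, updated_at INTEGER)")
conn.execute("CREATE TABLE target_orders (id INTEGER, amount REAL, updated_at INTEGER)")

# First run: both existing source rows are new, so both are loaded.
conn.executemany(
    "INSERT INTO source_orders VALUES (?, ?, ?)",
    [(1, 10.0, 100), (2, 20.0, 200)],
)
mark = incremental_load(conn, last_seen=0)

# Second run: only the row added after the first run is loaded.
conn.execute("INSERT INTO source_orders VALUES (3, 30.0, 300)")
mark = incremental_load(conn, last_seen=mark)

target_count = conn.execute("SELECT COUNT(*) FROM target_orders").fetchone()[0]
```

In this sketch, the second run reads one row instead of three, which is the saving that incremental processing is meant to provide over a full load.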