Configure a Data Flow for Incremental Processing

Apply incremental processing in a data flow to load only new or updated records from a database.

Applying incremental processing in a data flow lets you load only new data instead of performing a full load on every run, which is inefficient and costly. Each time you run the data flow, it processes only the data that's been added since the last run.

Before you start, create a connection to one of the supported databases, for example, Oracle, Oracle Autonomous Data Warehouse, Apache Hive, Hortonworks Hive, or MapR Hive. Then, configure a new data indicator for that database. See Specify a New Data Indicator for a Data Source.

1. Create or open the data flow in which you want to apply incremental processing.
2. In the Data Flow editor, select the Save Data step to display the Step editor pane.
3. In the Dataset field, enter the name of the input dataset that you specified in the Add Data step.
4. In the Save data to option, select Database Connection.
5. Click Select Connection and select a connection to one of the supported target databases.
6. In the Table field, specify the name of the target table that you're writing to.
7. In the When run option, select Add new data to existing data.
8. Click Save.
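
The idea behind this configuration can be illustrated outside the product. The following is a minimal Python sketch of the general pattern, not how the data flow is implemented internally: a column acts as the new data indicator, each run reads only rows whose indicator value is greater than the value recorded by the previous run, and those rows are appended to the target table. The table names (source_orders, target_orders), the updated_at column, and the incremental_load function are all hypothetical, and SQLite stands in for a supported database.

```python
import sqlite3


def incremental_load(conn, last_seen):
    """Append source rows newer than last_seen to the target table.

    Returns the new high-water mark to use on the next run.
    All table and column names here are hypothetical.
    """
    # Read only rows whose indicator value is greater than the last recorded one.
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM source_orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # Append to the existing target data rather than replacing it.
    conn.executemany(
        "INSERT INTO target_orders (id, amount, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    # If nothing new arrived, keep the previous mark.
    return max((row[2] for row in rows), default=last_seen)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source_orders (id INTEGER, amount REAL, updated_at INTEGER);
        CREATE TABLE target_orders (id INTEGER, amount REAL, updated_at INTEGER);
        INSERT INTO source_orders VALUES (1, 10.0, 100), (2, 20.0, 200);
        """
    )
    mark = incremental_load(conn, 0)      # first run loads both existing rows
    conn.execute("INSERT INTO source_orders VALUES (3, 30.0, 300)")
    mark = incremental_load(conn, mark)   # second run loads only the new row
    count = conn.execute("SELECT COUNT(*) FROM target_orders").fetchone()[0]
    return mark, count
```

In this sketch, a source row whose indicator value changes after it was loaded is appended again on the next run instead of being updated in place, so the target table may hold more than one version of that record.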