7 Scaling a Streams Application
You can scale a single streaming service to run on multiple nodes to handle a high volume of stream events from large Oracle NoSQL Database stores. The streaming service can use multiple subscribers to stream data from the Oracle NoSQL Database store.
The stream processing application does not need to know the topology of the Oracle NoSQL Database store; it can simply add or remove independent subscribers as needed. All that the stream processing application needs to specify is the number of subscribers and a subscriber ID.
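The two values an application supplies can be pictured with a minimal sketch. The class below is a hypothetical stand-in written for illustration, not the Streams API class itself; it assumes an ID is the pair (number of subscribers, index within the group) and prints it in the `2_0` / `2_1` form used for the sharded subscribers later in this chapter.

```java
/**
 * Hypothetical stand-in for a subscriber ID, for illustration only.
 * Assumption: an ID is (group size, index within the group).
 */
final class SubscriberIdSketch {
    private final int total; // number of subscribers in the group
    private final int index; // this subscriber's position, 0 .. total-1

    SubscriberIdSketch(int total, int index) {
        if (total < 1) {
            throw new IllegalArgumentException("group size must be at least 1");
        }
        if (index < 0 || index >= total) {
            throw new IllegalArgumentException("index must be in [0, " + total + ")");
        }
        this.total = total;
        this.index = index;
    }

    int getTotal() { return total; }

    int getIndex() { return index; }

    /** Renders the ID as "<total>_<index>", for example "2_0". */
    @Override
    public String toString() {
        return total + "_" + index;
    }

    public static void main(String[] args) {
        // A group of two subscribers: prints 2_0 and then 2_1.
        for (int i = 0; i < 2; i++) {
            System.out.println(new SubscriberIdSketch(2, i));
        }
    }
}
```

Each process in the group would be started with the same group size and a distinct index; nothing about shards or topology appears in the configuration.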
The following illustration depicts how a stream application can be scaled to use two clients to stream data from the six shards of the Oracle NoSQL Database store:
Description of the illustration multiple_subscribers.png
The Oracle NoSQL Database store strives to distribute streams evenly among its subscribers. As shown in the illustration, there are six shards and two subscribers, so each subscriber receives streams from three shards. A subscriber does not choose which shards it gets streams from. The system determines this automatically, and the decision is transparent to the subscribers. When there are more shards than subscribers (as in this example), some subscribers receive streams from more than one shard.
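The arithmetic of an even split can be sketched as follows. This is only an illustration of the counts: the store decides which shards go to which subscriber, and the rule used here for a remainder (lower indexes take one extra shard) is an assumption, not documented behavior. The class and method names are hypothetical.

```java
/** Illustrative arithmetic only; the store performs the real assignment. */
final class ShardSplitSketch {

    /**
     * Number of shards that subscriber {@code index} would stream from if
     * {@code shards} shards were split as evenly as possible among
     * {@code subscribers} subscribers. Assumption: any remainder goes to
     * the lowest indexes, one extra shard each.
     */
    static int shardCount(int shards, int subscribers, int index) {
        if (subscribers < 1 || index < 0 || index >= subscribers) {
            throw new IllegalArgumentException("invalid group size or index");
        }
        int base = shards / subscribers;  // shards every subscriber gets
        int extra = shards % subscribers; // shards left over after that
        return base + (index < extra ? 1 : 0);
    }

    public static void main(String[] args) {
        // Six shards, two subscribers: each subscriber streams from three.
        for (int i = 0; i < 2; i++) {
            System.out.println("subscriber " + i + ": "
                + shardCount(6, 2, i) + " shards");
        }
    }
}
```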
Note:
- The number of scalable subscribers cannot exceed the number of shards. For example, if the Oracle NoSQL Database store has six shards, subscribers cannot be scaled to more than six clients.
- Oracle NoSQL Database recommends that you scale stream processing applications by running them on different nodes to benefit from newly added resources. Although scalable subscribers can be created and run inside separate JVMs on the same node, such a configuration offers no benefit over running a single subscriber. In our example, running two scalable subscribers in different JVMs on the same node, each streaming from three shards, would bring no benefit over running a single subscriber on that node that is subscribed to the entire data store.
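The first point in the note can be checked by an application before it starts its subscribers. The following is a minimal sketch with hypothetical names; it assumes the application already knows the shard count of the store, which the sketch simply takes as a parameter.

```java
/** Hypothetical pre-flight check for the size of a subscriber group. */
final class GroupSizeCheck {

    /**
     * Returns the requested group size if it is usable, otherwise throws.
     * The upper bound reflects the rule that the number of scalable
     * subscribers cannot exceed the number of shards.
     */
    static int checkGroupSize(int shards, int requestedSubscribers) {
        if (requestedSubscribers < 1) {
            throw new IllegalArgumentException("at least one subscriber is required");
        }
        if (requestedSubscribers > shards) {
            throw new IllegalArgumentException(
                "cannot run " + requestedSubscribers
                + " subscribers against " + shards + " shards");
        }
        return requestedSubscribers;
    }

    public static void main(String[] args) {
        System.out.println(checkGroupSize(6, 2)); // accepted
        System.out.println(checkGroupSize(6, 6)); // accepted, one shard each
        try {
            checkGroupSize(6, 7);                 // rejected: more subscribers than shards
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```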
The Streams API supports elastic operations on the data store whether a single subscriber or multiple sharded subscribers are deployed. Sharded subscribers are part of a subscriber group. When multiple sharded subscribers are deployed, the writes to a shard key are still delivered after an elastic operation (through your implementation of NoSQLSubscriber#onNext) in the same order in which they were made in the data store.
Description of the illustration subscribers_implementation_architecture.png
The above figure shows an example of two sharded subscribers streaming writes to a single key in a partition P1 during migration. There are two shards: the source, shard1, and the target, shard2. Writes to the migrating partition P1 are partly on shard1 and partly on shard2. Partition P1 is removed from shard1 after the migration is complete. Similarly, partition P1 is rebuilt at shard2 after the migration is complete. All writes to partition P1 at shard1 stream from shard1 to the sharded subscriber whose ID is 2_0. Similarly, all writes to partition P1 at shard2 stream from shard2 to the sharded subscriber whose ID is 2_1. As the diagram shows, the writes on shard1 take place before the writes on shard2. At delivery time, all writes delivered to subscriber 2_0 are delivered before all writes delivered to subscriber 2_1. This ensures the correct order of delivery across multiple sharded subscribers.
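What this guarantee means for an application can be sketched with a small self-contained check. Everything here is hypothetical and simulated: `Delivered` stands for one write as seen by an `onNext` callback, and `writeSeq` is an illustrative number for the order in which the store applied the write, not a field of the Streams API. The check passes when, for every key, writes appear in the combined delivery log in the order they were made, regardless of which subscriber received them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simulated check of per-key delivery order across sharded subscribers. */
final class DeliveryOrderSketch {

    /** Hypothetical record of one write delivered to a subscriber. */
    static final class Delivered {
        final String subscriberId; // for example "2_0" or "2_1"
        final String key;          // key that was written
        final long writeSeq;       // assumed order of the write in the store

        Delivered(String subscriberId, String key, long writeSeq) {
            this.subscriberId = subscriberId;
            this.key = key;
            this.writeSeq = writeSeq;
        }
    }

    /** True if, for every key, writeSeq strictly increases through the log. */
    static boolean perKeyOrderPreserved(List<Delivered> log) {
        Map<String, Long> lastSeen = new HashMap<>();
        for (Delivered d : log) {
            Long previous = lastSeen.put(d.key, d.writeSeq);
            if (previous != null && previous >= d.writeSeq) {
                return false; // an older write arrived after a newer one
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Writes made on shard1 reach 2_0 first, then writes on shard2 reach 2_1.
        List<Delivered> log = Arrays.asList(
            new Delivered("2_0", "k1", 1),
            new Delivered("2_0", "k1", 2),
            new Delivered("2_1", "k1", 3),
            new Delivered("2_1", "k1", 4));
        System.out.println(perKeyOrderPreserved(log)); // prints true
    }
}
```

A check of this kind is useful only as a test aid; the ordering itself is provided by the store, not by application code.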