
Deduplication

Mode: Streaming

Description

Deduplicates events based on a grouping key, remembering the last passed entry per key. The first event for a given key always passes through, and subsequent events are forwarded only if they satisfy the filter condition. By default, the filter condition is false, which means only the first event for a given key passes through (simple deduplication).

Parameters and configuration

  • Group by: Expression that groups events for deduplication. Events with the same key share one deduplication entry.
  • Value: Value to track for deduplication. Accessible as #previousEntry.value and #incomingEntry.value in the filter condition.
  • Filter condition: Logical expression that decides whether an event passes through. It is evaluated only when an entry already exists for the key. If it returns false, the event is not passed to further processing. Use #previousEntry (existing state) and #incomingEntry (new value) to compare their timestamp and value. Defaults to false (simple deduplication).
  • TTL: Time after which the deduplication entry expires. The timer resets with each incoming event for a given key, so after this period of inactivity the next event is treated as new.

How it works

  1. An event arrives and is grouped by the Group by expression.
  2. If there is no existing entry for this key (first event or state expired), the event passes through.
  3. If an entry exists, the Filter condition is evaluated with #previousEntry (existing state) and #incomingEntry (new value), each containing a timestamp and a value. The event passes through only when the condition returns true. In that case the incoming entry replaces the stored one and becomes the baseline for later comparisons.
  4. The TTL is an inactivity timeout: the timer resets on every incoming event for a given key, regardless of whether the event passes through or is filtered. After the TTL elapses with no events, the state is cleared.
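The steps above can be modelled with a small in-memory sketch. It is not the component's implementation: the `Deduplicator` class, the `Entry` dataclass, the explicit `now` timestamp in seconds, and the choice to expire state once the gap reaches exactly the TTL are all assumptions made for illustration. Python callables stand in for the Group by, Value and Filter condition expressions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable


@dataclass
class Entry:
    """Mirrors #previousEntry / #incomingEntry: a timestamp plus the tracked value."""
    timestamp: float
    value: Any


class Deduplicator:
    """Hypothetical model of the component: keyed state with an inactivity TTL."""

    def __init__(
        self,
        group_by: Callable[[dict], Hashable],
        value: Callable[[dict], Any],
        filter_condition: Callable[[Entry, Entry], bool] = lambda prev, inc: False,
        ttl: float = 3600.0,
    ) -> None:
        self.group_by = group_by
        self.value = value
        self.filter_condition = filter_condition  # default False: simple deduplication
        self.ttl = ttl
        self.state: Dict[Hashable, Entry] = {}      # last passed entry per key
        self.last_seen: Dict[Hashable, float] = {}  # last event time per key (TTL timer)

    def process(self, event: dict, now: float) -> bool:
        """Return True if the event passes through, False if it is filtered out."""
        key = self.group_by(event)  # step 1: group the event
        # Step 4: clear the state if the key was inactive for at least the TTL
        # (the exact boundary is an assumption).
        last = self.last_seen.get(key)
        if last is not None and now - last >= self.ttl:
            self.state.pop(key, None)
        # The timer resets on every event, whether it passes or is filtered.
        self.last_seen[key] = now

        incoming = Entry(timestamp=now, value=self.value(event))
        previous = self.state.get(key)
        # Step 2: no entry means the event passes.
        # Step 3: otherwise it passes only when the filter condition is true.
        passes = previous is None or self.filter_condition(previous, incoming)
        if passes:
            self.state[key] = incoming  # the passed event becomes the new baseline
        return passes
```

Passing `filter_condition=lambda prev, inc: inc.value["amount"] >= prev.value["amount"] + 20` models the Conditional deduplication example in this sketch.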

Examples

The examples below assume the following input event schema:

{
  "subscriptionId": "sub1",
  "amount": 100
}

Simple deduplication

To pass only the first event per subscription:

  • Group by: #input.subscriptionId
  • Value: #input
  • Filter condition: false (default)
  • TTL: 1 hour

If four events arrive for the same subscription within an hour, only the first one passes through. The remaining three are filtered out. After 1 hour of inactivity, the state expires and the next event is treated as new.
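The following sketch replays this scenario for a single subscription. The timestamps (in seconds) and amounts are made up for illustration, and treating a gap of exactly one hour as expired is an assumption. Because the filter condition is false, only a new or expired key lets an event through.

```python
from typing import Dict, List, Tuple

TTL = 3600  # 1 hour; timestamps below are assumed to be in seconds


def simple_dedup(events: List[Tuple[float, dict]], ttl: float = TTL) -> List[dict]:
    """Pass only the first event per subscriptionId; forget a key after ttl of inactivity."""
    last_seen: Dict[str, float] = {}  # per-key inactivity timer
    passed: List[dict] = []
    for ts, event in events:
        key = event["subscriptionId"]  # Group by: #input.subscriptionId
        prev = last_seen.get(key)
        # Filter condition is false: pass only if the key is new or its state expired.
        if prev is None or ts - prev >= ttl:
            passed.append(event)
        last_seen[key] = ts  # timer resets on every event, passed or filtered
    return passed


events = [
    (0, {"subscriptionId": "sub1", "amount": 100}),
    (600, {"subscriptionId": "sub1", "amount": 110}),
    (1200, {"subscriptionId": "sub1", "amount": 120}),
    (1800, {"subscriptionId": "sub1", "amount": 130}),
    # More than an hour after the previous event, so the state has expired.
    (1800 + 3700, {"subscriptionId": "sub1", "amount": 140}),
]
print([e["amount"] for e in simple_dedup(events)])  # [100, 140]
```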

Conditional deduplication

To pass through events only when the amount has increased by at least 20 compared to the last passed event:

  • Group by: #input.subscriptionId
  • Value: #input
  • Filter condition: #incomingEntry.value.amount >= #previousEntry.value.amount + 20
  • TTL: 1 hour

If events arrive with amount values 0, 18, 19, 20, 38, 39, 40, the output will contain events with amounts 0, 20, 40. The first event (0) always passes. Events 18 and 19 are filtered because they don't reach 0 + 20. Event 20 passes and becomes the new baseline. Events 38 and 39 are filtered because they don't reach 20 + 20. Event 40 passes.
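The walkthrough can be checked with a short replay for a single key. This sketch assumes all events arrive within the TTL and tracks only the amount; the function name `conditional_dedup` is hypothetical.

```python
from typing import List, Optional


def conditional_dedup(amounts: List[int], threshold: int = 20) -> List[int]:
    """Replay the example for one key, assuming every event arrives within the TTL."""
    passed: List[int] = []
    previous: Optional[int] = None  # #previousEntry.value.amount of the last passed event
    for amount in amounts:  # amount plays the role of #incomingEntry.value.amount
        # The first event always passes; later ones must reach previous + threshold.
        if previous is None or amount >= previous + threshold:
            passed.append(amount)
            previous = amount  # a passed event becomes the new baseline
    return passed


print(conditional_dedup([0, 18, 19, 20, 38, 39, 40]))  # [0, 20, 40]
```

Note that filtered events (18, 19, 38, 39) never update the baseline, which is why 38 is compared against 20 rather than against 19.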