Deduplication
Description
Deduplicates events based on a grouping key, remembering the last passed entry per key. The first event for a given key always passes through, and subsequent events are forwarded only if they satisfy the filter condition. By default, the filter condition is false, which means only the first event for a given key passes through (simple deduplication).
Parameters and configuration
| Name | Description |
|---|---|
| Group by | Groups events for deduplication. Events with the same key share one deduplication entry. |
| Value | Value to track for deduplication. Accessible as #previousEntry.value and #incomingEntry.value in the filter condition. |
| Filter condition | Logical expression determining when a record should pass through. If the condition is not met, the record is not passed to further processing. Use #previousEntry (existing state) and #incomingEntry (new value) to compare timestamp and value. Defaults to false (simple deduplication). |
| TTL | Time after which the deduplication entry expires. The timer resets with each incoming event for a given key. After this period of inactivity, the next event is treated as new. |
How it works
- An event arrives and is grouped by the Group by expression.
- If there is no existing entry for this key (first event or state expired), the event passes through.
- If an entry exists, the Filter condition is evaluated with `#previousEntry` (existing state) and `#incomingEntry` (new value), each containing `timestamp` and `value`. The event passes through only when the condition returns `true`.
- The TTL is an inactivity timeout: the timer resets on every incoming event for a given key, regardless of whether the event passes through or is filtered. After the TTL elapses with no events, the state is cleared.
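The steps above can be modeled roughly as follows. This is a minimal in-memory sketch for illustration, not the component's actual implementation: the `Deduplicator` class, its `process` method, the use of seconds for timestamps, and the choice to expire an entry when exactly the TTL has elapsed are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable, Tuple


@dataclass
class Entry:
    timestamp: float  # assumed to be in seconds
    value: Any


def never(previous: Entry, incoming: Entry) -> bool:
    """Default filter condition: false (simple deduplication)."""
    return False


class Deduplicator:
    """Hypothetical model of the behaviour described above."""

    def __init__(self, ttl_seconds: float,
                 condition: Callable[[Entry, Entry], bool] = never) -> None:
        self.ttl_seconds = ttl_seconds
        self.condition = condition
        # key -> (last passed entry, timestamp of the last event seen)
        self._state: Dict[Hashable, Tuple[Entry, float]] = {}

    def process(self, key: Hashable, timestamp: float, value: Any) -> bool:
        """Return True if the event passes through, False if filtered."""
        incoming = Entry(timestamp, value)
        stored = self._state.get(key)
        # Treat the entry as expired after a full TTL of inactivity.
        if stored is not None and timestamp - stored[1] >= self.ttl_seconds:
            stored = None
        if stored is None:
            # First event for the key (or state expired): always passes.
            self._state[key] = (incoming, timestamp)
            return True
        previous, _ = stored
        if self.condition(previous, incoming):
            # Passed: the incoming entry becomes the remembered one.
            self._state[key] = (incoming, timestamp)
            return True
        # Filtered: keep the previous entry, but still reset the timer.
        self._state[key] = (previous, timestamp)
        return False
```

Note that a filtered event leaves the remembered entry unchanged yet still refreshes the inactivity timer, which is why a steady stream of duplicates can keep a key's state alive indefinitely.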
Examples
The examples below assume the following input event schema:
{
"subscriptionId": "sub1",
"amount": 100
}
Simple deduplication
To pass only the first event per subscription:
- Group by: `#input.subscriptionId`
- Value: `#input`
- Filter condition: `false` (default)
- TTL: 1 hour
If four events arrive for the same subscription within an hour, only the first one passes through. The remaining three are filtered out. After 1 hour of inactivity, the state expires and the next event is treated as new.
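The scenario above can be simulated with a short sketch. The function name `first_per_key`, the event timestamps (in seconds), and the `(key, timestamp)` event shape are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Tuple

TTL_SECONDS = 3600  # 1 hour


def first_per_key(events: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Pass only the first event per key until the key has been idle for the TTL."""
    last_seen: Dict[str, int] = {}
    passed: List[Tuple[str, int]] = []
    for key, ts in events:
        seen = last_seen.get(key)
        is_new = seen is None or ts - seen >= TTL_SECONDS
        last_seen[key] = ts  # every event resets the inactivity timer
        if is_new:
            passed.append((key, ts))
    return passed


# Four events within an hour, then one more after over an hour of silence.
events = [("sub1", 0), ("sub1", 600), ("sub1", 1200), ("sub1", 1800), ("sub1", 6000)]
print(first_per_key(events))  # [('sub1', 0), ('sub1', 6000)]
```

The last event passes because more than one hour separates it from the previous event (at 1800 seconds), not from the first passed event.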
Conditional deduplication
To pass through events only when the amount increased by at least 20 compared to the last passed event:
- Group by: `#input.subscriptionId`
- Value: `#input`
- Filter condition: `#incomingEntry.value.amount >= #previousEntry.value.amount + 20`
- TTL: 1 hour
If events arrive with amount values 0, 18, 19, 20, 38, 39, 40, the output will contain events with amounts 0, 20, 40. The first event (0) always passes. Events 18 and 19 are filtered because they don't reach 0 + 20. Event 20 passes and becomes the new baseline. Events 38 and 39 are filtered because they don't reach 20 + 20. Event 40 passes.
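The moving baseline can be checked with a small sketch. It assumes a single subscription and ignores the TTL, and the helper name `dedupe_by_increase` is hypothetical.

```python
from typing import Iterable, List, Optional


def dedupe_by_increase(amounts: Iterable[int], min_increase: int = 20) -> List[int]:
    """Keep an amount only if it exceeds the last passed amount by min_increase."""
    passed: List[int] = []
    baseline: Optional[int] = None  # amount of the last passed event
    for amount in amounts:
        if baseline is None or amount >= baseline + min_increase:
            passed.append(amount)
            baseline = amount  # only passed events move the baseline
    return passed


print(dedupe_by_increase([0, 18, 19, 20, 38, 39, 40]))  # [0, 20, 40]
```

Because filtered events never update the baseline, the comparison is always against the last passed amount rather than the most recent event.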