Full-outer-join

Mode: Streaming

Description

note

In this documentation, data record is the neutral term used across all processing modes.
In purely streaming contexts, we typically use the term event, which simply means a data record with a timestamp that lets Nussknacker apply time-based processing logic.

Full Outer Join is Nussknacker's version of SQL's full outer join. It works much like the Single Side Join component, but it computes aggregates for all branches and emits a new event for every event it receives. Every time a new event arrives, it is matched with events from the other branches that have the same key and fall within the time window of windowLength; then the aggregate for each branch is calculated. A new event is emitted containing the aggregates for all branches. The time window boundaries are determined by the event which has just arrived: the window covers the range [event-time - windowLength, event-time].

If an event cannot be matched, a new event is still emitted, but the aggregate for each branch which had no matching events will have a value of zero or null, depending on the aggregator configured.
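The matching rules above can be illustrated with a small simulation. This is only a sketch of the described semantics, not Nussknacker's implementation: the `Event` class, the `full_outer_join` function, the two aggregators and the branch names `orders` and `payments` are all hypothetical. It also assumes that the arriving event is included in the aggregate of its own branch, and that an empty sum yields zero while an empty max yields null.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    branch: str     # name of the input branch the event arrived on
    key: str        # value of the branch's Key expression
    value: float    # value of the branch's AggregateBy expression
    timestamp: int  # event time in milliseconds


def sum_or_zero(values: List[float]) -> float:
    # Assumption: a sum over no matching events is zero.
    return sum(values)


def max_or_none(values: List[float]) -> Optional[float]:
    # Assumption: a max over no matching events is null (None).
    return max(values) if values else None


def full_outer_join(
    events: List[Event],
    aggregators: Dict[str, Callable[[List[float]], Optional[float]]],
    window_length: int,
) -> List[dict]:
    """Emit one result per incoming event, with an aggregate for every branch."""
    emitted = []
    seen: List[Event] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        seen.append(event)
        lower = event.timestamp - window_length  # window: [lower, event time]
        result = {"key": event.key}
        for branch, aggregate in aggregators.items():
            in_window = [
                e.value
                for e in seen
                if e.branch == branch
                and e.key == event.key
                and lower <= e.timestamp <= event.timestamp
            ]
            result[branch] = aggregate(in_window)
        emitted.append(result)
    return emitted


events = [
    Event("orders", "c1", 10.0, 1000),
    Event("payments", "c1", 7.0, 1500),
    Event("orders", "c1", 5.0, 2500),
    Event("payments", "c2", 3.0, 3000),
]
aggregators = {"orders": sum_or_zero, "payments": max_or_none}
results = full_outer_join(events, aggregators, window_length=1000)

for r in results:
    print(r)
```

In this run the first event has no matching payment, so its `payments` aggregate is `None`; the last event uses a key that no order shares, so its `orders` sum is `0`. The third event arrives at 2500, so the order from 1000 has already left its window, while the payment from 1500 sits exactly on the inclusive lower boundary and is still counted.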

Parameters and configuration

Name | Description
Key | Expression used to match events from different branches; separate expression for each branch. The expression in this field can refer to events and their associated variables (including #input) from their respective branches only.
AggregateBy | The input to the aggregator function; separate for each branch. Only variables associated with the event from the given branch can be used in the AggregateBy expression.
Aggregator | Aggregator function; separate for each branch. See this document for the list of available aggregator functions.
WindowLength | See above for explanation.
Output variable name | The name of the variable which will hold the results of the aggregator function.

Content of the output variable:

{
"key" : "value of the key used to join branches",
"branch-one-name" : "branch-one-aggregator-result",
"branch-two-name" : "branch-two-aggregator-result"
}

Additional considerations

  • Unlike Single Side Join, Full Outer Join can have more than two input branches.
  • The #input variable will not be available downstream.
  • We use slice optimization to reduce the amount of memory used by Flink state. Read here to find out more.
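
To give an intuition for why slicing saves memory, here is a generic sketch of slice-based pre-aggregation. It is not Nussknacker's or Flink's actual implementation: the `SlicedSum` class and the 60-second `SLICE_MS` value are assumptions made for illustration. The idea is to keep one partial aggregate per fixed-size time slice instead of storing every event in the window.

```python
from collections import defaultdict

SLICE_MS = 60_000  # hypothetical slice length, chosen for illustration


class SlicedSum:
    """Keeps one partial sum per time slice instead of one entry per event."""

    def __init__(self, window_ms: int):
        self.window_ms = window_ms
        self.slices = defaultdict(float)  # slice start time -> partial sum

    def add(self, timestamp: int, value: float) -> None:
        slice_start = timestamp - timestamp % SLICE_MS
        self.slices[slice_start] += value

    def result(self, now: int) -> float:
        lower = now - self.window_ms
        # Drop slices that ended at or before the lower window boundary.
        expired = [s for s in self.slices if s + SLICE_MS <= lower]
        for start in expired:
            del self.slices[start]
        return sum(self.slices.values())


agg = SlicedSum(window_ms=60_000)
agg.add(10_000, 1.0)
agg.add(20_000, 2.0)   # falls into the same slice as the previous event
agg.add(70_000, 4.0)

first = agg.result(now=70_000)
stored = len(agg.slices)       # two slices are held for three events
later = agg.result(now=130_000)
print(first, stored, later)
```

The trade-off in this sketch is precision: a slice is kept as long as any part of it overlaps the window, so events slightly older than the window's lower boundary can still contribute to the result.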