Single Side Join

Mode: Streaming

Description

note

In this documentation, data record is the neutral term used across all processing modes.
In purely streaming contexts, we typically use the term event; it simply means a data record with a timestamp, which lets Nussknacker apply time-based processing logic.

Conceptually, single-side-join is the equivalent of a left (or right) SQL join. In SQL, a left join returns all records from the left table together with the matched records from the right table. In Nussknacker, single-side-join joins two ‘branches’ of a scenario, the Main branch and the Joined branch, and returns exactly as many events as there were in the Main branch. Even if no events are matched in the Joined branch, an event is emitted, and the result of the aggregator function depends on the aggregator selected: null for List and Set, 0 for Sum, null for Min and Max. The time window boundaries are determined by the event coming from the Main branch and span the range [main-branch-event-time - windowLength, main-branch-event-time].
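The semantics described above can be illustrated with a minimal batch-style sketch in Python. It is not Nussknacker's implementation: the `Event` class, the `single_side_join` function and the aggregator helpers are hypothetical names introduced for illustration only, and the sketch ignores streaming concerns such as arrival order and state. It shows only that one result is produced per Main-branch event, that the window is derived from the Main event's time, and which values are returned when nothing matches.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Event:
    key: str
    timestamp: int          # event time in milliseconds
    value: Any = None


# Hypothetical aggregators mirroring the empty-window results listed in the text.
def agg_sum(values: List[Any]) -> Any:
    return sum(values)                      # 0 when nothing matched


def agg_max(values: List[Any]) -> Optional[Any]:
    return max(values) if values else None  # null when nothing matched


def agg_list(values: List[Any]) -> Optional[List[Any]]:
    return list(values) if values else None  # null when nothing matched


def single_side_join(
    main_events: List[Event],
    joined_events: List[Event],
    window_length: int,
    aggregator: Callable[[List[Any]], Any],
) -> List[Tuple[Event, Any]]:
    """Return exactly one (main event, aggregate) pair per Main-branch event."""
    results = []
    for m in main_events:
        # Window boundaries come from the Main-branch event's time.
        lo, hi = m.timestamp - window_length, m.timestamp
        matched = [
            j.value
            for j in joined_events
            if j.key == m.key and lo <= j.timestamp <= hi
        ]
        results.append((m, aggregator(matched)))
    return results


main = [Event("a", 10_000), Event("b", 10_000)]
joined = [
    Event("a", 1_000, 7),     # too old for a 6-second window
    Event("a", 5_000, 3),
    Event("a", 9_000, 4),
    Event("a", 11_000, 100),  # later than the Main event
]
sums = single_side_join(main, joined, window_length=6_000, aggregator=agg_sum)
```

In this sketch the Main event with key `a` aggregates only the two Joined events inside its window, while the Main event with key `b` is still emitted with the aggregator's empty-window result.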

Under the hood, single-side-join uses a sliding window on the JOINED branch to deliver events to the MAIN branch. You can choose which aggregate function to use on the JOINED branch; Last is the obvious choice if you want the most recent value seen in the JOINED branch.

Single-side-join can be an attractive and very fast alternative to database lookups if you have enough memory to stream your whole lookup table and, if needed, you are able to stream changes to the lookup table records. To use single-side-join as a very fast lookup, configure the topic containing the lookup table values as the JOINED branch. Make sure that you set the window length to a value high enough that there are always events in the JOINED branch which fall within the window.
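The effect of the window length on a lookup-style join can be sketched as follows. This is an illustration only, with assumptions stated up front: `last_in_window` is a hypothetical helper, "Last" is taken here to mean the record with the highest event time inside the window, and returning `None` when nothing matches is an assumption (the text above lists empty-window results only for List, Set, Sum, Min and Max).

```python
from typing import Any, List, Optional, Tuple

# A lookup record as (key, event_time_ms, value); the layout is hypothetical.
Record = Tuple[str, int, Any]


def last_in_window(
    lookup_records: List[Record],
    key: str,
    event_time: int,
    window_length: int,
) -> Optional[Any]:
    """Return the most recent lookup value for `key` inside the window, if any."""
    lo = event_time - window_length
    candidates = [
        (ts, value)
        for k, ts, value in lookup_records
        if k == key and lo <= ts <= event_time
    ]
    if not candidates:
        return None  # assumption: no match yields an empty result
    # Pick the candidate with the highest event time.
    return max(candidates, key=lambda c: c[0])[1]


lookup = [
    ("c1", 1_000, "silver"),
    ("c1", 50_000, "gold"),     # a later change to the same lookup record
    ("c2", 2_000, "bronze"),
]

# A short window misses the old "c2" record; a long window finds it.
short = last_in_window(lookup, "c2", event_time=60_000, window_length=10_000)
long_ = last_in_window(lookup, "c2", event_time=60_000, window_length=100_000)
```

With the short window the record for `c2` has already dropped out, so the lookup comes back empty; this is the situation the advice about a sufficiently long window is meant to avoid.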

Parameters and configuration

| Name | Description |
|---|---|
| branchType | Either MAIN or JOINED; see above for a description. |
| key | Expression used to match events from different branches; there is a separate expression for each branch. The expression in this field can refer to events and their associated variables (including #input) from their respective branches only. |
| Output variable name | The name of the variable which will hold the result of the aggregator function. |
| Aggregator | Aggregator function; see this document for the list of available aggregator functions. |
| WindowLength | See the Description section for an explanation. |
| AggregateBy | An input to the aggregator function; for each event with the same key value which falls within the time window, the aggregateBy expression is evaluated and fed to the aggregator function. As the aggregator function is applied to events from the joined branch, you can refer to variables present in the joined branch only. |

Additional considerations

  • Because there are no tables and table names to refer to, Nussknacker derives the names of the branches to join from the names of the nodes taking part in the single-side-join.
  • We use slice optimization to reduce the amount of memory used by Flink state. Read here to find out more.
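The general idea behind slicing can be sketched as below. This is a generic illustration of pre-aggregating events into fixed time slices, not a description of how Nussknacker or Flink implement it: the `SlicedSum` class and the 60-second slice length are hypothetical. Instead of keeping every event in state, the sketch keeps one partial sum per key and slice, and a window result is assembled from the slices that start inside the window (so boundaries are only accurate to slice granularity).

```python
from collections import defaultdict
from typing import Dict

SLICE_MS = 60_000  # hypothetical slice length, chosen for illustration


class SlicedSum:
    """Keeps one partial sum per (key, slice) instead of every event."""

    def __init__(self, slice_ms: int = SLICE_MS) -> None:
        self.slice_ms = slice_ms
        # key -> {slice_start_ms: partial_sum}
        self.slices: Dict[str, Dict[int, float]] = defaultdict(dict)

    def add(self, key: str, timestamp: int, value: float) -> None:
        # Round the event time down to the start of its slice.
        start = timestamp - timestamp % self.slice_ms
        per_key = self.slices[key]
        per_key[start] = per_key.get(start, 0) + value

    def query(self, key: str, timestamp: int, window_length: int) -> float:
        # Combine the partial sums of slices that start inside the window.
        lo = timestamp - window_length
        per_key = self.slices.get(key, {})
        return sum(s for start, s in per_key.items() if lo <= start <= timestamp)

    def stored_entries(self, key: str) -> int:
        return len(self.slices.get(key, {}))


state = SlicedSum()
for i in range(1000):
    state.add("a", i * 59, 1)   # 1000 events, all within the first slice
state.add("a", 61_000, 5)       # one event in the second slice
```

Here a thousand events in the same slice occupy a single state entry, which is where the memory saving comes from.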