Deduplicator examples
Use the following examples to choose a Deduplicator configuration that matches your use case.
Replay protection in a streaming meter
Use Processing Time with Rolling Duration when you want to block duplicate retries or replayed events over a moving time window.
Example configuration:
- Reference Time: Processing Time.
- Deduplication Strategy: Rolling Duration.
- Duration: 30 days.
- Reset window on duplicate: Enabled.
Behavior:
- The event is accepted on Jan 1.
- Repeat events on Jan 10 and Jan 20 are identified as duplicates.
- A later duplicate on Feb 15 is still treated as a duplicate because the window keeps extending when duplicates arrive: after the Jan 20 duplicate, the 30-day window runs until Feb 19.
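The behavior above can be modeled with a small in-memory sketch. This is only an illustration, not the product's implementation: the class `RollingDeduplicator`, the key `"event-1"`, and the year 2024 are hypothetical, and the sketch assumes that a duplicate restarts the window from its own processing time.

```python
from datetime import datetime, timedelta


class RollingDeduplicator:
    """Hypothetical in-memory model of Rolling Duration deduplication."""

    def __init__(self, duration, reset_on_duplicate):
        self.duration = duration
        self.reset_on_duplicate = reset_on_duplicate
        self._expires_at = {}  # dedup key -> time the current window ends

    def check(self, key, processed_at):
        expiry = self._expires_at.get(key)
        if expiry is not None and processed_at < expiry:
            if self.reset_on_duplicate:
                # Assumption: a duplicate restarts the window from its own time.
                self._expires_at[key] = processed_at + self.duration
            return "duplicate"
        # First sighting, or the previous window has lapsed.
        self._expires_at[key] = processed_at + self.duration
        return "accepted"


arrivals = [
    datetime(2024, 1, 1),
    datetime(2024, 1, 10),
    datetime(2024, 1, 20),
    datetime(2024, 2, 15),
]

with_reset = RollingDeduplicator(timedelta(days=30), reset_on_duplicate=True)
results_with_reset = [with_reset.check("event-1", t) for t in arrivals]

without_reset = RollingDeduplicator(timedelta(days=30), reset_on_duplicate=False)
results_without_reset = [without_reset.check("event-1", t) for t in arrivals]
```

In this model, the Feb 15 arrival is a duplicate only when the reset option is on. With it off, the original window would end 30 days after Jan 1, so the Feb 15 arrival would be accepted as new.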
Batch file deduplication
Use Processing Time with Rolling Duration when you need deduplication only within the current batch run.
Example configuration:
- Reference Time: Processing Time.
- Deduplication Strategy: Rolling Duration.
- Duration: 30 days.
Example batch content:
- B
- B
- C
Result:
- B
- C
Important notes:
- The configured duration does not persist after the batch run ends.
- In batch mode, TTL has no practical long-term effect.
- In batch mode, reset-window behavior is also not meaningful after the batch completes.
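As a rough illustration of the notes above, the sketch below keeps deduplication state only for the lifetime of one function call, which stands in for one batch run. The function name `dedupe_batch` is hypothetical. The second call reflects one reading of "does not persist": a later batch starts with empty state.

```python
def dedupe_batch(records):
    """Keep the first occurrence of each record within a single batch run."""
    seen = set()  # state lives only for this call, i.e. one batch run
    accepted = []
    for record in records:
        if record in seen:
            continue  # duplicate within the same batch
        seen.add(record)
        accepted.append(record)
    return accepted


first_run = dedupe_batch(["B", "B", "C"])
# A later run starts with empty state, so "B" is accepted again.
second_run = dedupe_batch(["B"])
```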
Batch plus calendar windows
Use Processing Time with Fixed Calendar Window when you want deduplication to reset by calendar boundary inside the same batch.
Example configuration:
- Reference Time: Processing Time.
- Deduplication Strategy: Fixed Calendar Window.
- Calendar Unit: Day.
Example batch records:
- Event A on Monday.
- Event A on Monday.
- Event A on Tuesday.
Result:
- Monday A: Accepted.
- Monday A: Duplicate.
- Tuesday A: Accepted again.
This happens because each calendar window has its own deduplication scope, even inside a single batch.
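One way to picture this partitioning is to treat the calendar day as part of the deduplication key. The sketch below is a simplified model under that assumption. The function `dedupe_by_calendar_day` and the sample dates are hypothetical, and each record is assumed to carry the timestamp that the deduplicator uses as its reference time.

```python
from datetime import datetime


def dedupe_by_calendar_day(records):
    """records: iterable of (key, reference_time) pairs from one batch."""
    seen = set()
    statuses = []
    for key, reference_time in records:
        window = reference_time.date()  # Calendar Unit: Day
        scope = (window, key)  # the window is part of the dedup scope
        if scope in seen:
            statuses.append("duplicate")
        else:
            seen.add(scope)
            statuses.append("accepted")
    return statuses


batch = [
    ("A", datetime(2024, 1, 1, 9, 0)),   # Monday
    ("A", datetime(2024, 1, 1, 17, 0)),  # same Monday
    ("A", datetime(2024, 1, 2, 9, 0)),   # Tuesday
]
statuses = dedupe_by_calendar_day(batch)
```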
Historical replay or late-arriving events
Use Event Time with Fixed Calendar Window when events can arrive late, out of order, or from historical backfills.
Event Time determines which calendar window is used for duplicate checking, but deduplication state is retained only for a limited period. By default, the system keeps that state for 60 days. If a matching historical record arrives after the retained state has expired, it is treated as new.
Example behavior:
- A record with event date 2023-01-01 is processed in January 2023 and stored in that calendar window.
- If the same record with the same deduplication fields arrives again a few days later, it is identified as a duplicate.
- If the same record arrives again after the retained state has expired, it is treated as new even though the event date is still 2023-01-01.
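The interaction between the event-date window and state retention can be sketched as follows. This is an illustrative model only: the class `EventTimeDeduplicator`, the key `"rec-1"`, and the processing dates are hypothetical, and it assumes that retention is counted from the moment the state was written, which this page does not specify.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=60)  # default retention period stated above


class EventTimeDeduplicator:
    """Hypothetical model: state keyed by event date, kept for a limited time."""

    def __init__(self, retention=RETENTION):
        self.retention = retention
        self._stored_at = {}  # (event_date, key) -> processing time when stored

    def check(self, key, event_time, processed_at):
        scope = (event_time.date(), key)  # window chosen by Event Time
        stored = self._stored_at.get(scope)
        # Assumption: retention is measured from when the state was written.
        if stored is not None and processed_at - stored < self.retention:
            return "duplicate"
        self._stored_at[scope] = processed_at
        return "new"


dedup = EventTimeDeduplicator()
event_time = datetime(2023, 1, 1)
first = dedup.check("rec-1", event_time, processed_at=datetime(2023, 1, 1))
retry = dedup.check("rec-1", event_time, processed_at=datetime(2023, 1, 4))
replay = dedup.check("rec-1", event_time, processed_at=datetime(2023, 6, 1))
```

In this model, the June replay arrives well after the 60-day retention period, so the stored state no longer applies and the record is treated as new.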
Example configuration:
- Reference Time: Event Time.
- Deduplication Strategy: Fixed Calendar Window.
- Calendar Unit: Day.
- Event Timestamp Field: eventTimestamp.
Behavior:
- A Monday event arriving late on Tuesday is still treated as part of Monday's deduplication window.
- This configuration is useful for historical imports, replay processing, backfills, and out-of-order event streams.
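The difference between the two reference times for a late event can be shown with a small sketch. The helper `window_for` and the sample record are hypothetical. Only the field name `eventTimestamp` comes from the configuration above, and its ISO 8601 format is an assumption.

```python
from datetime import datetime


def window_for(record, reference_time, processed_at):
    """Return the daily window a record falls into (hypothetical helper)."""
    if reference_time == "event":
        # Event Time: the window comes from the record's own timestamp.
        return datetime.fromisoformat(record["eventTimestamp"]).date()
    # Processing Time: the window comes from when the record is handled.
    return processed_at.date()


late_event = {"id": "A", "eventTimestamp": "2024-01-01T23:50:00"}  # Monday
arrived = datetime(2024, 1, 2, 0, 5)  # processed on Tuesday

event_window = window_for(late_event, "event", arrived)
processing_window = window_for(late_event, "processing", arrived)
```

With Event Time, the late record is checked against Monday's window. With Processing Time, it would be checked against Tuesday's window, and a Monday duplicate could go undetected.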