Deduplicator processor
The Deduplicator processor helps you identify duplicate records in a Mediation pipeline based on all fields or selected fields.
Use this processor when you want to evaluate records within a deduplication window before passing them downstream in the meter flow. For configuration steps, see Configure Deduplicator as a processor.
How the Deduplicator processor works
The Deduplicator uses two settings together to determine how duplicate records are evaluated:
- Reference Time: determines which timestamp is used for deduplication.
- Deduplication Strategy: determines how the deduplication window is applied.
Within a given window, only the first event to arrive for a deduplication key is processed. Arrival order alone decides which event wins: event timestamps do not reorder events or change which event is kept.
When the Deduplicator identifies a duplicate record, the record is marked as an error and discarded.
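The first-arrival rule can be illustrated with a minimal Python sketch for a single window. The function name `first_arrival_wins` and the `id` and `ts` field names are hypothetical and chosen only for illustration; they are not part of the product.

```python
def first_arrival_wins(events, key_field):
    """Split events into accepted and duplicate lists, in arrival order.

    The list order stands in for arrival order. Timestamps inside the
    events are never consulted, so they cannot change which event wins.
    """
    seen = set()
    accepted, duplicates = [], []
    for event in events:
        key = event[key_field]
        if key in seen:
            duplicates.append(event)  # a later arrival for the same key
        else:
            seen.add(key)
            accepted.append(event)    # the first arrival for this key
    return accepted, duplicates


# The second arrival carries an earlier timestamp but is still the duplicate.
events = [
    {"id": "A", "ts": "2024-05-18T10:00:00"},
    {"id": "A", "ts": "2024-05-18T09:00:00"},
]
accepted, duplicates = first_arrival_wins(events, "id")
```

Here `accepted` holds only the 10:00 event because it arrived first, even though the 09:00 event has the earlier timestamp.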
Match options
You can configure the Deduplicator to compare records in one of the following ways:
- All Fields: compares the entire record.
- Specific Fields: compares only the fields that you select.
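One way to picture the two match options is as two ways of deriving a deduplication key. The sketch below is an assumption about how such a key could be built (a hash of a canonical JSON form); the source does not describe the actual mechanism, and the `dedup_key` helper and the field names are hypothetical.

```python
import hashlib
import json


def dedup_key(record, fields=None):
    """Return a stable key for a record.

    fields=None models All Fields: the whole record is hashed.
    A list of names models Specific Fields: only those fields are hashed.
    """
    selected = record if fields is None else {f: record.get(f) for f in fields}
    # sort_keys makes the key independent of field order in the record.
    canonical = json.dumps(selected, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {"account": "a1", "usage": 10, "source": "east"}
second = {"account": "a1", "usage": 10, "source": "west"}

all_fields_match = dedup_key(first) == dedup_key(second)
specific_fields_match = (
    dedup_key(first, ["account", "usage"]) == dedup_key(second, ["account", "usage"])
)
```

The two records differ only in `source`, so they are distinct under All Fields but count as duplicates when only `account` and `usage` are compared.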
Reference Time
You can select one of the following Reference Time options:
- Processing Time: uses the time when the event is received by the system.
- Event Time: uses a timestamp field from the event payload to determine window placement.
If you select Event Time, you must specify the event timestamp field and the time format. If the event timestamp is missing or invalid, the record is sent to the error output.
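The handling of missing or invalid timestamps can be sketched as follows. This is an illustration only: the `extract_event_time` helper and the `event_ts` field name are hypothetical, and the format string uses Python `strptime` directives, which may differ from the time format syntax that the product accepts.

```python
from datetime import datetime


def extract_event_time(record, field, time_format):
    """Return (timestamp, error). Exactly one of the two is None."""
    raw = record.get(field)
    if raw is None:
        return None, "missing event timestamp"
    try:
        return datetime.strptime(str(raw), time_format), None
    except ValueError:
        return None, "invalid event timestamp"


FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, for illustration only

valid = extract_event_time({"event_ts": "2024-05-18 10:00:00"}, "event_ts", FORMAT)
missing = extract_event_time({"usage": 5}, "event_ts", FORMAT)
invalid = extract_event_time({"event_ts": "18/05/2024"}, "event_ts", FORMAT)
```

A record that yields an error here corresponds to one that would be routed to the error output rather than placed in a window.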
Deduplication Strategy
You can select one of the following Deduplication Strategy options:
- Rolling Duration: starts a per-key time window when the first event arrives and keeps that window open for the configured duration.
- Fixed Calendar Window: groups records into calendar periods such as hour, day, week, or month.
Calendar windows use the configured timezone so that boundaries match your business timezone.
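The effect of the timezone on window boundaries can be shown with a short sketch. The `calendar_window_start` helper is hypothetical, and the Monday week start is an assumption that the source does not confirm. A fixed UTC offset keeps the example self-contained; a `zoneinfo.ZoneInfo` instance could be passed instead for daylight-saving-aware boundaries.

```python
from datetime import datetime, timedelta, timezone


def calendar_window_start(ts, unit, tz):
    """Return the start of the calendar window that contains ts, in tz."""
    local = ts.astimezone(tz)
    if unit == "hour":
        return local.replace(minute=0, second=0, microsecond=0)
    midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "day":
        return midnight
    if unit == "week":
        # Assumption: weeks start on Monday.
        return midnight - timedelta(days=midnight.weekday())
    if unit == "month":
        return midnight.replace(day=1)
    raise ValueError(f"unsupported unit: {unit}")


ts = datetime(2024, 5, 19, 3, 30, tzinfo=timezone.utc)
utc_minus_5 = timezone(timedelta(hours=-5))

day_in_utc = calendar_window_start(ts, "day", timezone.utc)
day_in_business_tz = calendar_window_start(ts, "day", utc_minus_5)
```

The same instant falls in the May 19 window in UTC but in the May 18 window at UTC-5, which is why the configured timezone matters for daily, weekly, and monthly boundaries.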
Supported combinations
The following combinations are supported:
- Rolling Duration with Processing Time.
- Fixed Calendar Window with Processing Time.
- Fixed Calendar Window with Event Time.
Rolling Duration with Event Time is not supported. Rolling Duration relies on an expiration timer that runs on system time, so it can be used only with Processing Time.
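If you generate or review pipeline configurations programmatically, a check like the following could catch the unsupported combination early. The identifiers `rolling_duration`, `fixed_calendar_window`, `processing_time`, and `event_time` are hypothetical labels for this sketch, not documented configuration values.

```python
SUPPORTED_COMBINATIONS = {
    ("rolling_duration", "processing_time"),
    ("fixed_calendar_window", "processing_time"),
    ("fixed_calendar_window", "event_time"),
}


def validate_combination(strategy, reference_time):
    """Raise ValueError if the strategy and reference time cannot be combined."""
    if (strategy, reference_time) not in SUPPORTED_COMBINATIONS:
        raise ValueError(
            f"{strategy} with {reference_time} is not a supported combination"
        )


validate_combination("fixed_calendar_window", "event_time")  # passes silently
```

Calling `validate_combination("rolling_duration", "event_time")` raises `ValueError`.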
Rolling Duration behavior
You can choose whether duplicate arrivals extend the deduplication window.
- If Reset window on duplicate is enabled, each duplicate arrival resets the expiration timer.
- If Reset window on duplicate is disabled, the deduplication window remains fixed from the first time the record is seen.
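The difference between the two settings is easiest to see with a small simulation. The `RollingDeduplicator` class below is a hypothetical sketch of per-key expiration timers, not the product's implementation; the 3-day duration is chosen only to keep the example short.

```python
from datetime import datetime, timedelta


class RollingDeduplicator:
    """Per-key rolling window driven by processing time."""

    def __init__(self, duration, reset_on_duplicate=False):
        self.duration = duration
        self.reset_on_duplicate = reset_on_duplicate
        self._expires_at = {}  # key -> time at which the window closes

    def is_duplicate(self, key, now):
        expires_at = self._expires_at.get(key)
        if expires_at is not None and now < expires_at:
            if self.reset_on_duplicate:
                # A duplicate arrival restarts the expiration timer.
                self._expires_at[key] = now + self.duration
            return True
        # First arrival, or the previous window has expired: open a new one.
        self._expires_at[key] = now + self.duration
        return False


start = datetime(2024, 5, 20, 9, 0)  # a Monday
arrivals = [start, start + timedelta(days=2), start + timedelta(days=4)]

fixed = RollingDeduplicator(timedelta(days=3), reset_on_duplicate=False)
resetting = RollingDeduplicator(timedelta(days=3), reset_on_duplicate=True)

fixed_results = [fixed.is_duplicate("rec", t) for t in arrivals]
resetting_results = [resetting.is_duplicate("rec", t) for t in arrivals]
```

With the setting disabled, the third arrival falls outside the original 3-day window and is treated as new. With the setting enabled, the second arrival pushes the expiration out, so the third arrival is still a duplicate.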
Default behavior
If you do not change the settings, the Deduplicator uses Rolling Duration with Processing Time and a 30-day window.
This preserves the existing behavior for current pipelines.
Rolling Duration with Processing Time
Use this option for replay protection or retry suppression in real-time pipelines.
Example behavior:
- A record is accepted on Monday.
- The same record is identified as a duplicate on Wednesday and Friday.
- After the configured duration expires, the same record is treated as new again.
Fixed Calendar Window with Processing Time
Use this option when deduplication should reset on calendar boundaries such as each hour, day, week, or month.
Example behavior with a daily window:
- A record is accepted on May 18.
- The same record arrives again on May 18 and is identified as a duplicate.
- The same record arrives on May 19 and is treated as new.
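The example can be reproduced with a minimal sketch that keys deduplication state on the calendar day of arrival. The `CalendarDayDeduplicator` class is hypothetical, and a daily window in UTC is assumed for simplicity.

```python
from datetime import datetime, timezone


class CalendarDayDeduplicator:
    """Daily calendar window keyed on arrival (processing) time."""

    def __init__(self, tz=timezone.utc):
        self.tz = tz
        self._seen = set()  # (window date, dedup key) pairs

    def is_duplicate(self, key, arrival_time):
        window = arrival_time.astimezone(self.tz).date()
        entry = (window, key)
        if entry in self._seen:
            return True
        self._seen.add(entry)
        return False


dedup = CalendarDayDeduplicator()
results = [
    dedup.is_duplicate("rec", datetime(2024, 5, 18, 9, 0, tzinfo=timezone.utc)),
    dedup.is_duplicate("rec", datetime(2024, 5, 18, 21, 0, tzinfo=timezone.utc)),
    dedup.is_duplicate("rec", datetime(2024, 5, 19, 0, 5, tzinfo=timezone.utc)),
]
```

The third arrival is only a few hours after the second, but it crosses the day boundary and therefore starts a new window.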
Fixed Calendar Window with Event Time
Use this option when deduplication should follow the event timestamp instead of the arrival time. This option is useful for late-arriving events, historical replay, and out-of-order data.
Example behavior:
- A Monday event arrives on Tuesday.
- When Event Time is used, the event is still treated as part of Monday's deduplication window.
For Fixed Calendar Window with Event Time, the event timestamp determines which calendar window is used for duplicate checking.
Deduplication state is not retained indefinitely. By default, the system keeps deduplication state for 60 days. If a matching record arrives after that retained state has expired, the processor no longer remembers the earlier record and treats the new record as unique.
For example, if a record with event date 2023-01-01 is processed in January 2023, it is stored under that calendar window. If the same record arrives again a few days later, it is identified as a duplicate. If the same record arrives again after the retained state has expired, it is treated as new even though the event date is still 2023-01-01.
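The interaction between event-time windows and state retention can be sketched as follows. The `EventTimeDeduplicator` class is hypothetical. It assumes a daily window and assumes that retention is measured from the moment the state was first stored; the source does not say how the retention clock is defined.

```python
from datetime import datetime, timedelta


class EventTimeDeduplicator:
    """Daily event-time window with bounded state retention."""

    def __init__(self, retention=timedelta(days=60)):
        self.retention = retention
        self._stored_at = {}  # (event date, dedup key) -> time the state was stored

    def is_duplicate(self, key, event_time, now):
        entry = (event_time.date(), key)
        stored_at = self._stored_at.get(entry)
        if stored_at is not None and now - stored_at < self.retention:
            return True
        # No state, or the retained state has expired: treat the record as new.
        self._stored_at[entry] = now
        return False


dedup = EventTimeDeduplicator()
event_time = datetime(2023, 1, 1, 12, 0)

first = dedup.is_duplicate("rec", event_time, now=datetime(2023, 1, 2))
few_days_later = dedup.is_duplicate("rec", event_time, now=datetime(2023, 1, 5))
after_retention = dedup.is_duplicate("rec", event_time, now=datetime(2023, 4, 1))
```

The event date is 2023-01-01 in all three calls. Only the arrival time changes, and the last arrival comes after the 60-day retention period, so the earlier state is gone and the record is accepted again.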
Batch and streaming behavior
The Deduplicator behaves differently in streaming meters and batch meters.
- In streaming meters, deduplication state persists over time, so time windows and retention settings continue to apply across events.
- In batch meters, deduplication applies only within the current batch run.
- In batch meters, deduplication state is cleared when the batch run ends.
- In batch meters, Fixed Calendar Window can still partition records within the batch by the selected calendar window.
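The contrast between the two modes comes down to the lifetime of the deduplication state. The sketch below is illustrative only: `run_batch` and `StreamingDeduplicator` are hypothetical names, and time windows and retention are omitted to keep the focus on state lifetime.

```python
def run_batch(records, key_field):
    """Deduplicate within one batch run; state does not outlive the run."""
    seen = set()  # fresh state for every run
    accepted = []
    for record in records:
        key = record[key_field]
        if key not in seen:
            seen.add(key)
            accepted.append(record)
    return accepted  # `seen` is discarded when the run ends


class StreamingDeduplicator:
    """State persists across events for as long as the object lives."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, key):
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


batch = [{"id": "A"}, {"id": "A"}, {"id": "B"}]
first_run = run_batch(batch, "id")
second_run = run_batch(batch, "id")  # "A" and "B" are accepted again

stream = StreamingDeduplicator()
stream_results = [stream.is_duplicate(r["id"]) for r in batch + batch]
```

Each batch run accepts one copy of every key regardless of earlier runs, whereas the streaming object flags every repeat after the first arrival.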
For more information, see Deduplicator behavior in batch and streaming meters.