Deduplicate a stream

pd uniq creates a transform pipe that reads from a source pipe and emits each record only once. Duplicates are detected by the parsed record, not by the raw JSON text.

TL;DR

$ pd uniq orders-unique --from orders
$ pd pull orders-unique

How it works

pd uniq <name> --from <source> creates a new pipe <name> that reads everything from <source> and emits each unique record once. The new pipe starts running immediately. It is a regular pipe, so you can pull from it, pause it, or use it as the source for pd dest http.

What counts as a duplicate

Records are hashed after JSON parsing, so formatting differences are ignored. Object keys are sorted before hashing, and types are not coerced. These three lines all hash to the same value, so they are treated as one record:

{"foo":"bar","n":1}
{ "foo" : "bar" , "n" : 1 }
{"n":1,"foo":"bar"}

These are different records (different keys, different types):

{"foo":"bar"}
{"foo ":"bar"}   ← trailing space inside the key string
{"n":1}
{"n":1.0}        ← integer vs float

Flags:

Flag	Default	Purpose
`--from`		Source pipe to dedupe. Required on create.
`--duplicates`	`false`	Invert: emit only the duplicates (one per group) instead of uniques.
`--start-id`	`0`	First record ID in the source to consider.
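
To make the flag semantics concrete, here is a small Python model of them. It is a sketch under stated assumptions, not a description of pd internals: the function uniq and its parameters are hypothetical, record IDs are assumed to be 0-based positions in the source, and a duplicate group is assumed to be reported when its second occurrence arrives.

```python
import json
from typing import Iterable, Iterator


def uniq(lines: Iterable[str], duplicates: bool = False, start_id: int = 0) -> Iterator[str]:
    """Model of --duplicates and --start-id (hypothetical, for illustration only)."""
    seen: dict = {}  # canonical record -> number of times seen so far
    for record_id, raw in enumerate(lines):
        if record_id < start_id:
            continue  # assumed reading of --start-id: earlier records are not considered
        key = json.dumps(json.loads(raw), sort_keys=True)
        seen[key] = seen.get(key, 0) + 1
        if not duplicates and seen[key] == 1:
            yield raw  # default: emit the first occurrence of each record
        elif duplicates and seen[key] == 2:
            yield raw  # inverted: emit one record per repeated group


stream = ['{"a":1}', '{"b":2}', '{ "a": 1 }', '{"a":1}', '{"c":3}']
print(list(uniq(stream)))                   # ['{"a":1}', '{"b":2}', '{"c":3}']
print(list(uniq(stream, duplicates=True)))  # ['{ "a": 1 }']
print(list(uniq(stream, start_id=1)))       # ['{"b":2}', '{ "a": 1 }', '{"c":3}']
```

Which copy of a repeated record the real tool emits with --duplicates is not specified here; the model picks the second occurrence only because that is the earliest point at which a duplicate can be known.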

Common patterns

Dedupe then forward

$ pd source http raw
$ pd uniq clean --from raw
$ pd dest http webhook \
  --source clean \
  --method POST \
  --url https://example.com/hook \
  --retry

Inspect duplicates only

$ pd uniq dup-only --from orders --duplicates
$ pd pull dup-only | jq .
