pd uniq creates a transform pipe that reads from a source pipe and emits each record only once. Duplicates are detected by the parsed record, not by the raw JSON text.

TL;DR

$ pd uniq orders-unique --from orders
$ pd pull orders-unique

How it works

pd uniq <name> --from <source> creates a new pipe <name> that reads every record from <source> and emits each unique record once. The new pipe starts running immediately. It is a regular pipe: you can pull from it, pause it, or use it as the source for pd dest http.

What counts as a duplicate

Records are hashed after JSON parsing, so whitespace and other formatting differences are ignored. Object keys are sorted before hashing, so key order does not matter either. Types are not coerced. These three lines all hash the same, so they count as one record:
{"foo":"bar","n":1}
{ "foo" : "bar" , "n" : 1 }
{"n":1,"foo":"bar"}
These are different records (different keys, different types):
{"foo":"bar"}
{"foo ":"bar"}   ← trailing space inside the key string
{"n":1}
{"n":1.0}        ← integer vs float
Flags:

| Flag | Default | Purpose |
| --- | --- | --- |
| --from | | Source pipe to dedupe. Required on create. |
| --duplicates | false | Invert: emit only the duplicates (one per group) instead of uniques. |
| --start-id | 0 | First record ID in the source to consider. |
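
To make the flag semantics concrete, here is a small in-memory model in Python. It is a sketch under stated assumptions, not pd's implementation: it assumes record IDs are integers that increase in arrival order, that records below the start ID are skipped entirely, that the first copy of a record is the one emitted as unique, and that with duplicates enabled a group is emitted once, at the moment its second copy arrives. The names uniq and canonical are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def canonical(raw: str) -> str:
    """Parsed-record key: sorted object keys, no insignificant whitespace."""
    return json.dumps(json.loads(raw), sort_keys=True, separators=(",", ":"))


def uniq(
    source: Iterable[Tuple[int, str]],
    duplicates: bool = False,
    start_id: int = 0,
) -> Iterator[str]:
    """Model of the flags; source yields (record_id, raw_json) pairs."""
    counts: Dict[str, int] = {}
    for record_id, raw in source:
        if record_id < start_id:
            continue  # assumption: earlier records are not considered at all
        key = canonical(raw)
        counts[key] = counts.get(key, 0) + 1
        if not duplicates and counts[key] == 1:
            yield raw  # default mode: first occurrence of each record
        elif duplicates and counts[key] == 2:
            yield raw  # assumption: one emit per group, on the second copy


orders = [
    (0, '{"id":1}'),
    (1, '{"id":2}'),
    (2, '{ "id": 1 }'),  # same parsed record as ID 0
    (3, '{"id":1}'),     # third copy, same group
    (4, '{"id":3}'),
]

print(list(uniq(orders)))                   # ['{"id":1}', '{"id":2}', '{"id":3}']
print(list(uniq(orders, duplicates=True)))  # ['{ "id": 1 }']
print(list(uniq(orders, start_id=2)))       # ['{ "id": 1 }', '{"id":3}']
```

In this model, raising the start ID changes which copy counts as the first one, so the same source can yield different output for different start IDs.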

Common patterns

Dedupe then forward
$ pd source http raw
$ pd uniq clean --from raw
$ pd dest http webhook \
  --source clean \
  --method POST \
  --url https://example.com/hook \
  --retry
Inspect duplicates only
$ pd uniq dup-only --from orders --duplicates
$ pd pull dup-only | jq .

See also