octa --unique-columns¶
Find columns, and optional small combinations, whose values are unique across a file. Useful for spotting primary-key candidates in undocumented data.
Synopsis¶
| Flag | Required | Meaning |
|---|---|---|
--unique-columns FILE |
yes | The file to scan. |
--table NAME |
no | Specific table for multi-table sources. |
--max-combo N |
no | Max combo size (default 1; clamped to [1, 3]). |
-f, --format FORMAT |
no | Output format: tsv (default), json, or csv. |
Uniqueness rule¶
A column is is_unique only when:
distinct_count == total_rows, ANDnull_count == 0, ANDtotal_rows > 0.
Rule 2 is deliberate: most databases reject NULL in a primary key,
so a column with a single null and otherwise distinct values is not
a PK candidate.
Combo strategy¶
When --max-combo > 1, the tool tests pairs (and triples if the
value is 3). To avoid pointless work, only columns whose own
distinct_count is in (1, total_rows) are combined. Already-unique
columns are skipped, since they'd make any combo trivially unique too.
Output¶
A five-column table:
| Column | Meaning |
|---|---|
scope |
single for one column, combo for a pair / triple. |
columns |
Column name (singles) or +-joined names (combos). |
distinct_count |
Number of distinct keys observed. |
null_count |
Single columns only; empty for combos. |
is_unique |
Boolean. |
Examples¶
Single columns¶
$ octa --unique-columns users.csv
scope columns distinct_count null_count is_unique
single id 10000 0 true
single email 9998 2 false
single region 5 0 false
email has 9998 distinct values across 10000 rows AND two nulls, so it is
not a PK candidate.
Test pairs too¶
$ octa --unique-columns orders.parquet --max-combo 2
scope columns distinct_count null_count is_unique
single user_id 4231 0 false
single ordered_at 9874 0 false
single payment_method 3 0 false
combo user_id + ordered_at 10000 true
The (user_id, ordered_at) pair is unique → that's the composite key.
JSON for downstream tooling¶
Performance notes¶
--max-combo 3 can be slow on wide tables (C(50, 3) ≈ 19,600
combinations to test). Start with --max-combo 2 and only escalate
if no pair turns out to be unique.
See also¶
- MCP
unique_columns: same feature over MCP. octa --sql:SELECT COUNT(*), COUNT(DISTINCT col) FROM datais the manual equivalent for one column.