`sample`

Read a tabular data file and return a random N-row sample. Sampling is without replacement, the chosen rows keep their original order, and the draw is reproducible for a given seed. The response has the same shape as that of `read_table`.
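The three properties above (no replacement, original order, seed-reproducible) can be illustrated with a minimal Python sketch. This is not Octa's implementation; `sample_rows` is a hypothetical helper, and Python's RNG will not pick the same rows as Octa for a given seed.

```python
import random

def sample_rows(rows, limit, seed=0):
    """Hypothetical helper: draw up to `limit` rows without replacement,
    keeping them in their original order."""
    k = min(limit, len(rows))
    rng = random.Random(seed)                 # private RNG: the draw depends only on the seed
    picked = rng.sample(range(len(rows)), k)  # k distinct row indices
    return [rows[i] for i in sorted(picked)]  # sort indices to restore file order

rows = [f"row-{i}" for i in range(1000)]
first = sample_rows(rows, 5, seed=7)
second = sample_rows(rows, 5, seed=7)
print(first == second)  # same seed and same input give the same sample
```

Sorting the chosen indices, rather than the rows themselves, is what keeps the sample in file order regardless of the row contents.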

When to use

To give Claude a representative slice of a large table without the bias of always taking the first N rows (which over-represents whatever the file happens to be sorted by).

Input schema

| Parameter | Type | Required? | Default | Description |
| --- | --- | --- | --- | --- |
| `path` | string | yes | (no default) | Absolute or working-directory-relative path to the file |
| `limit` | int | no | server default (1000) | Sample size. `0` = every row (no sampling) |
| `seed` | int | no | `0` | RNG seed. Same seed + file = same sample |
| `table` | string | no | (no default) | Specific table to read for multi-table sources |
| `unlimited` | bool | no | `false` | Lift the 5,000,000-row file-loader cap so the sample sees every row |

Response shape

Identical to `read_table`: `{ schema, rows, row_count, truncated, total_rows_available, cell_truncated }`. The `rows` field holds the sampled rows, in original order.
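A client can check that a response carries the six fields listed above before using it. The sketch below is only illustrative: `missing_keys` is a hypothetical helper, and the values in `example` are placeholders, not real Octa output (the field types are not specified here).

```python
# Field names taken from the response shape above.
EXPECTED_KEYS = {
    "schema", "rows", "row_count",
    "truncated", "total_rows_available", "cell_truncated",
}

def missing_keys(response):
    """Hypothetical helper: return the expected fields absent from `response`."""
    return sorted(EXPECTED_KEYS - set(response))

# Placeholder values only; the real types and contents are assumptions.
example = {
    "schema": [], "rows": [], "row_count": 0,
    "truncated": False, "total_rows_available": 0,
}
print(missing_keys(example))  # ['cell_truncated']
```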

Notes

For streaming formats the sample is drawn only from the rows within the 5,000,000-row cap; pass `unlimited: true` to sample from the whole file.
A fixed seed makes repeated calls on the same file deterministic, which is handy for reproducible analysis.
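The effect of the cap on a streaming source can be sketched as follows. `sampling_pool_size` is a hypothetical helper that models only how many rows are eligible for the draw, under the assumption that the cap behaves exactly as the note describes; it says nothing about how Octa enforces it.

```python
FILE_LOADER_CAP = 5_000_000  # cap quoted in the note above

def sampling_pool_size(total_rows, unlimited=False):
    """Hypothetical helper: rows eligible for sampling from a streaming source."""
    if unlimited:
        return total_rows                    # cap lifted: every row is eligible
    return min(total_rows, FILE_LOADER_CAP)  # otherwise only rows within the cap

print(sampling_pool_size(12_000_000))                  # 5000000
print(sampling_pool_size(12_000_000, unlimited=True))  # 12000000
```

For a 12,000,000-row file, a default call can never return a row beyond the cap, so the sample is representative of the capped portion only.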

Example call

{
  "name": "sample",
  "arguments": { "path": "/tmp/events.parquet", "limit": 100, "seed": 7 }
}
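A payload of this shape can be built programmatically. The sketch below assumes a hypothetical helper, `sample_call`, that omits optional arguments left unset so the server-side defaults from the input schema apply; how the payload is sent to the server is outside its scope.

```python
import json

def sample_call(path, limit=None, seed=None, table=None, unlimited=None):
    """Hypothetical helper: build the `sample` call payload as a dict."""
    arguments = {"path": path}  # the only required argument
    optional = {"limit": limit, "seed": seed, "table": table, "unlimited": unlimited}
    # Leave unset options out so the server's defaults are used.
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return {"name": "sample", "arguments": arguments}

call = sample_call("/tmp/events.parquet", limit=100, seed=7)
print(json.dumps(call, indent=2))
```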

See also

`read_table` / `tail`.
CLI: `octa --sample`.