# count_rows
Count rows in a tabular file. Loads the table and reports the row count.
## When to use
- Answering "How big is this file?" prompts.
- A sanity check before deciding whether to call `read_table` with its default cap (1000 rows) or with `limit: 0`.
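To illustrate that sanity check, a client could choose `read_table` arguments from the count. This is a minimal Python sketch under stated assumptions: the helper `plan_read_table_args` and its `max_full_read` budget are hypothetical, not Octa settings, and the sketch assumes `limit: 0` asks `read_table` for every row, as the bullet above implies.

```python
from typing import Any, Dict

READ_TABLE_DEFAULT_CAP = 1000  # read_table's default row cap, per this page


def plan_read_table_args(path: str, row_count: int, max_full_read: int = 50_000) -> Dict[str, Any]:
    """Choose read_table arguments from a count_rows result.

    max_full_read is a hypothetical client-side budget, not an Octa setting.
    """
    if row_count <= READ_TABLE_DEFAULT_CAP:
        # The default cap already returns every row.
        return {"path": path}
    if row_count <= max_full_read:
        # Assumption: limit: 0 asks read_table for every row.
        return {"path": path, "limit": 0}
    # Too large to pull in full: keep the default capped read.
    return {"path": path}
```

For example, a count of 20,000 yields `{"path": ..., "limit": 0}`, while a count of 10,000,000 keeps the default capped read.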
## Input schema
| Parameter | Type | Required? | Default | Description |
|---|---|---|---|---|
| `path` | string | yes | (no default) | Path to the file |
| `table` | string | no | (no default) | Specific table for multi-table sources |
| `unlimited` | bool | no | `false` | Lift the file-loader cap (5,000,000 rows by default) so the count reflects every row in the file |
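The schema maps onto a plain argument object. The Python sketch below builds one; the helper `count_rows_args` and the file and table names used with it are hypothetical. Leaving `unlimited` out rather than sending `false` is a stylistic choice, since `false` is the documented default.

```python
from typing import Any, Dict, Optional


def count_rows_args(path: str, table: Optional[str] = None, unlimited: bool = False) -> Dict[str, Any]:
    """Build a count_rows argument object, leaving optional fields at their defaults."""
    if not path:
        raise ValueError("path is required")
    args: Dict[str, Any] = {"path": path}
    if table is not None:
        # Only meaningful for multi-table sources, for example a SQLite database.
        args["table"] = table
    if unlimited:
        # Lift the file-loader cap for this call only.
        args["unlimited"] = True
    return args
```

`count_rows_args("app.db", table="orders")` produces `{"path": "app.db", "table": "orders"}`.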
## Response shape
### `initial_load_capped`
For streaming formats (Parquet, CSV, TSV), Octa applies an
initial-load row cap (default 5,000,000) when it loads the file;
once the cap is reached, `read_table` and related tools stop pulling
further rows. `count_rows` works on the same loaded table, so for
these formats it counts the loaded rows, which are not necessarily
every row in the source file. When the loaded count reaches the cap,
`initial_load_capped` is `true` and `initial_load_cap` echoes the
current cap, so the model knows the count may be an underestimate.
Pass `unlimited: true` to disable the cap for this call and get the
true total.
For non-streaming formats (Excel, SQLite, JSON, etc.), the whole
file is loaded, so the count is exact and `initial_load_capped` is
`false`.
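The cap semantics above can be modelled in a few lines. This Python sketch is not Octa's loader: `loaded_row_count` is a hypothetical helper that treats the row source as an iterable, stops pulling rows at the cap for streaming formats, and returns the count together with the value `initial_load_capped` would take under this model.

```python
from typing import Iterable, Tuple

DEFAULT_INITIAL_LOAD_CAP = 5_000_000  # documented default cap


def loaded_row_count(rows: Iterable[object], streaming: bool, unlimited: bool = False,
                     cap: int = DEFAULT_INITIAL_LOAD_CAP) -> Tuple[int, bool]:
    """Return (row_count, initial_load_capped) for a hypothetical loader."""
    if not streaming or unlimited:
        # Whole source is loaded: the count is exact and the flag is false.
        return sum(1 for _ in rows), False
    count = 0
    for _ in rows:
        if count == cap:
            # Cap reached: stop pulling rows, leaving the rest unread.
            break
        count += 1
    # Reaching the cap means the count is only a lower bound.
    return count, count == cap
```

In this model a streaming file with exactly `cap` rows also reports `true`, which is why a capped count is best read as a lower bound rather than as confirmed truncation.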
## Example calls
### Count an Excel file's rows
Response (small file, exact count):
### Count rows in a SQLite table
Response:
(SQLite is non-streaming, so the cap does not apply even though the count exceeds it.)
### Count a huge Parquet file (cap applied)
Response (cap was hit):
A model seeing `initial_load_capped: true` should tell the user
that the count may be an underestimate and offer to re-call with
`unlimited: true`:
Response (cap lifted; whole file read):
Note that `run_sql` with `SELECT count(*) FROM data`
is subject to the same initial-load cap unless it is also called
with `unlimited: true`. Parquet files with a very large number of
row groups (more than 32,767) automatically fall back to a
DuckDB-backed reader and open without manual recompaction.
## Why a dedicated tool
`count_rows` exists separately from `read_table` because:
- The response is small (~50 bytes) regardless of file size.
- It surfaces the streaming cap, which is invisible from a `read_table` response.
- Some models prefer a dedicated tool for "how many rows" over parsing a `read_table` response.
## See also
- `run_sql` is also subject to the initial-load cap; it is useful for aggregation but does not bypass the cap unless called with `unlimited: true`.
- Limits & truncation covers the same initial-load cap that affects this tool.