describe_file¶
One-shot orientation snapshot of a tabular file: format, file size,
row count, column schema, and a small sample of rows. Replaces the
common three-call dance of list_tables → schema → read_table
when first meeting an unfamiliar file.
CLI mirror: octa --describe.
When to use¶
- The very first call when handed an unfamiliar file.
- Anywhere a quick "what is this?" answer is more useful than the full schema or row data.
- As a lighter-weight alternative to
profilewhen statistics aren't needed.
Input schema¶
| Parameter | Type | Required? | Default | Description |
|---|---|---|---|---|
path |
string | yes | (no default) | Path to the file. |
table |
string | no | (no default) | Specific table for multi-table sources. |
sample_rows |
integer | no | 5 |
Sample-row count. Clamped to [0, 100]. |
unlimited |
boolean | no | false |
Lift the 5,000,000-row file-loader cap so the row count is exact. |
For multi-table sources called without table, the reader's default
behaviour applies, so call list_tables first if you're unsure.
Response shape¶
{
"path": "/data/sales.parquet",
"format_name": "Parquet",
"file_size_bytes": 1048576,
"table": null,
"row_count": 47832,
"initial_load_capped": false,
"initial_load_cap": 5000000,
"columns": [
{ "name": "id", "type": "Int64" },
{ "name": "region", "type": "Utf8" },
{ "name": "amount", "type": "Float64" }
],
"column_count": 3,
"sample_rows": [
[1, "EU", 1234.5],
[2, "US", 9876.0]
],
"sample_row_count": 2,
"cell_truncated": false
}
Sample rows obey the server's per-cell byte cap; any cell that
overflows is replaced with a [truncated: N bytes; cap M bytes ...]
marker and cell_truncated flips to true.
Example call¶
Multi-table source:
Force an exact row count on a very large file:
{
"name": "describe_file",
"arguments": {
"path": "/data/huge.parquet",
"unlimited": true,
"sample_rows": 0
}
}
See also¶
schema,count_rows,read_table: the three callsdescribe_filecollapses.profile: the heavier-weight cousin, returns per-column statistics via DuckDB SUMMARIZE.octa --describe, the CLI mirror.