schema¶
Return the column schema of a file: names + data types only, with
no rows in the response. Cheap discovery step before deciding
whether to call read_table or
run_sql.
When to use¶
- Answering "what columns does this file have?"
- Probing an unknown file before deciding what to do with it.
- Building a SQL query, since knowing the column types up-front lets Claude pick the right operators.
Input schema¶
| Parameter | Type | Required? | Default | Description |
|---|---|---|---|---|
path |
string | yes | (no default) | Path to the file |
table |
string | no | (no default) | Specific table for multi-table sources |
Response shape¶
The schema entries are typed in the same Arrow-derived vocabulary
read_table uses: Int64, Float64, Utf8,
Boolean, Date32, Timestamp(Microsecond, None), Binary, etc.
Example calls¶
Basic schema lookup¶
Response:
{
"columns": [
{ "name": "region", "type": "Utf8" },
{ "name": "quarter", "type": "Utf8" },
{ "name": "amount", "type": "Float64" },
{ "name": "order_id", "type": "Int64" }
],
"column_count": 4
}
Schema of a specific table¶
Performance¶
schema shares its load path with read_table: it
routes through the same FormatRegistry::reader_for_path and reads
the file into a DataTable, then projects only the column metadata
into the response. Compared to read_table it saves the
serialisation cost of every row, not the load cost.
- For streaming formats (Parquet, CSV, TSV) the server's initial-load cap is applied at load time; even a multi-GB Parquet is usually sub-second.
- For non-streaming formats (Excel, JSON, Stata, etc.) the
whole file is loaded. On small files this is fast; on multi-GB
Excel workbooks
schemais no cheaper thanread_tablewithlimit: 1.
See also¶
read_tablereturns schema + rows together.list_tablesreturns every table's schema at once for multi-table databases.