`schema`¶

Return the column schema of a file: column names and data types only, with no rows in the response. Use it as a lightweight discovery step before deciding whether to call read_table or run_sql.

When to use¶

Answering "what columns does this file have?"
Probing an unknown file before deciding what to do with it.
Building a SQL query: knowing the column types up front lets Claude pick the right operators.
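As an illustration of the last point, here is a minimal Python sketch that uses column types to decide how a literal is written in a filter. The helper names (sql_literal, equality_filter) are hypothetical, the column list is copied from the basic lookup example on this page, and the quoting rules are generic SQL conventions rather than anything run_sql is documented to require.

```python
# Column entries as returned by schema (taken from the basic lookup example).
COLUMNS = [
    {"name": "region", "type": "Utf8"},
    {"name": "quarter", "type": "Utf8"},
    {"name": "amount", "type": "Float64"},
    {"name": "order_id", "type": "Int64"},
]


def sql_literal(value, arrow_type):
    """Render a Python value as a SQL literal suited to the column type."""
    if arrow_type.startswith(("Int", "UInt", "Float")):
        float(value)  # raises ValueError if the value is not numeric
        return str(value)
    if arrow_type == "Boolean":
        return "TRUE" if value else "FALSE"
    # Everything else is treated as text: single-quote it and double any quotes.
    return "'" + str(value).replace("'", "''") + "'"


def equality_filter(columns, column, value):
    """Build `"column" = literal`, refusing columns the schema does not list."""
    types = {c["name"]: c["type"] for c in columns}
    if column not in types:
        raise KeyError(f"unknown column: {column}")
    quoted_name = '"' + column.replace('"', '""') + '"'
    return f"{quoted_name} = {sql_literal(value, types[column])}"


print(equality_filter(COLUMNS, "amount", 100))
print(equality_filter(COLUMNS, "region", "O'Brien"))
```

Checking the column name against the schema first means a typo fails locally instead of producing a query that errors on the server.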

Input schema¶

Parameter	Type	Required?	Default	Description
`path`	string	yes	(no default)	Path to the file
`table`	string	no	(no default)	Table to inspect when the source contains several tables (for example, a SQLite database)

Response shape¶

{
  "columns": [
    { "name": "<column>", "type": "<arrow_type>" },
    …
  ],
  "column_count": <n>
}

Column types use the same Arrow-derived vocabulary as read_table: Int64, Float64, Utf8, Boolean, Date32, Timestamp(Microsecond, None), Binary, etc.
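A client consuming this response may want to check that it is internally consistent and strip the parameters from types such as Timestamp(Microsecond, None). The following Python sketch shows one way to do that. The function names (base_type, parse_schema) and the sample column names are made up for illustration, and the consistency check is an assumption about what a careful client might verify, not documented server behaviour.

```python
import json


def base_type(arrow_type: str) -> str:
    """Drop any parameter list: 'Timestamp(Microsecond, None)' -> 'Timestamp'."""
    return arrow_type.split("(", 1)[0].strip()


def parse_schema(text: str) -> dict:
    """Parse a schema response and map each column name to its base type."""
    payload = json.loads(text)
    columns = payload["columns"]
    if payload["column_count"] != len(columns):
        raise ValueError("column_count does not match the number of columns")
    return {col["name"]: base_type(col["type"]) for col in columns}


# Illustrative response in the documented shape; the columns are invented.
sample = json.dumps({
    "columns": [
        {"name": "order_id", "type": "Int64"},
        {"name": "created_at", "type": "Timestamp(Microsecond, None)"},
    ],
    "column_count": 2,
})

parsed = parse_schema(sample)
print(parsed)
```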

Example calls¶

Basic schema lookup¶

{
  "name": "schema",
  "arguments": { "path": "/tmp/sales.parquet" }
}

Response:

{
  "columns": [
    { "name": "region", "type": "Utf8" },
    { "name": "quarter", "type": "Utf8" },
    { "name": "amount", "type": "Float64" },
    { "name": "order_id", "type": "Int64" }
  ],
  "column_count": 4
}

Schema of a specific table¶

{
  "name": "schema",
  "arguments": {
    "path": "/data/app.sqlite",
    "table": "users"
  }
}
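
The two calls above differ only in the optional table argument. A minimal Python sketch of a helper that builds either payload follows. The name schema_call is hypothetical, and sending the payload is left out because the transport is not described on this page.

```python
import json
from typing import Optional


def schema_call(path: str, table: Optional[str] = None) -> dict:
    """Build a schema tool call; `table` is included only when provided."""
    arguments = {"path": path}
    if table is not None:
        arguments["table"] = table
    return {"name": "schema", "arguments": arguments}


basic = schema_call("/tmp/sales.parquet")
single_table = schema_call("/data/app.sqlite", table="users")
print(json.dumps(single_table, indent=2))
```

Omitting the key entirely when no table is given mirrors the basic example, which sends only path.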

Performance¶

schema shares its load path with read_table: it routes through the same FormatRegistry::reader_for_path and reads the file into a DataTable, then projects only the column metadata into the response. Compared with read_table, it saves the cost of serialising every row but not the cost of loading the file.

For streaming formats (Parquet, CSV, TSV) the server's initial-load cap is applied at load time, so schema on even a multi-GB Parquet file usually returns in under a second.
For non-streaming formats (Excel, JSON, Stata, etc.) the whole file is loaded. On small files this is fast; on multi-GB Excel workbooks, schema is no cheaper than read_table with limit: 1.
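
A caller that wants to anticipate this cost could classify a file before probing it. The Python sketch below guesses from the file extension whether the load will be capped. The extension set and the function name load_is_capped are assumptions made for illustration; the server picks its reader through FormatRegistry::reader_for_path, and its actual extension rules are not documented here.

```python
from pathlib import Path

# Assumed extensions for the streaming formats listed above (Parquet, CSV, TSV).
STREAMING_EXTENSIONS = {".parquet", ".csv", ".tsv"}


def load_is_capped(path: str) -> bool:
    """Guess whether schema will hit the initial-load cap rather than a full load."""
    return Path(path).suffix.lower() in STREAMING_EXTENSIONS


for candidate in ("/tmp/sales.parquet", "/data/book.xlsx"):
    kind = "capped load" if load_is_capped(candidate) else "full load"
    print(f"{candidate}: {kind}")
```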
