run_sql¶
Run a DuckDB SQL query against one or more files. The primary file
is loaded and registered as a temp table called data; your query
runs against it.
Additional files can be registered as workspace tables via
extra_tables and whole DuckDB / SQLite databases can be ATTACH-ed
via attach, so the same call can JOIN across formats. The SELECT
result can also be written back to a DuckDB or SQLite file via
write_to instead of being returned.
When to use¶
- Filtering ("rows where amount > 1000").
- Aggregations ("total amount per region", "median order size").
- Window functions, joins to other DuckDB-accessible sources, etc.
- Multi-source JOINs across formats (Parquet + CSV + SQLite, etc.).
- Persisting the result back into a DuckDB schema or a SQLite table.
- Most "find me X in this file" prompts.
For raw schema discovery, schema is cheaper. For
"read every row" with no filtering,
read_table is cleaner, since the SQL overhead
isn't free.
Input schema¶
| Parameter | Type | Required? | Default | Description |
|---|---|---|---|---|
path |
string | yes | (no default) | Path to the primary file (registered as data). |
query |
string | yes | (no default) | SQL query. Always reference the primary file as data. |
limit |
int | no | server default (1000) | Maximum rows to return in the response. 0 = unlimited. |
table |
string | no | (no default) | For multi-table sources, the specific table to expose as data. |
unlimited |
bool | no | false |
Lift the 5,000,000-row file-loader cap so the query sees every row in every loaded file. Combine with limit: 0 for full output. |
extra_tables |
object[] | no | [] |
Additional tables to register before the query. Each entry: { "name": "...", "path": "...", "table": null }. name is sanitised to a valid SQL identifier. |
attach |
object[] | no | [] |
Databases to ATTACH throughout the call. Each entry: { "alias": "...", "path": "..." }. .duckdb / .ddb selects DuckDB; everything else (.sqlite, .db) SQLite. |
write_to |
object | no | (none) | When set, the SELECT result is written to a DuckDB or SQLite file instead of being returned. See Write-back below. |
Response shape¶
For SELECT queries:
{
"kind": "select",
"result": {
"schema": [{ "name": "<col>", "type": "Utf8" }, …],
"rows": [[<v>, <v>, …], …],
"row_count": <n>,
"truncated": <bool>,
"total_rows_available": <n>,
"cell_truncated": <bool>
}
}
For mutation queries (INSERT / UPDATE / DELETE):
{
"kind": "mutation",
"affected": <n>,
"result": {
"schema": [...],
"rows": [...], // post-mutation contents of `data`
...
}
}
For calls with write_to:
{
"kind": "write_back",
"rows_written": <n>,
"created_schema": <bool>,
"target": "/path/to/db.duckdb | schema.table"
}
Mutations don't persist
The in-memory DuckDB connection is created fresh per call and
discarded at the end. Mutations don't write back to the file,
and the mutated state isn't visible to any subsequent tool call:
convert, read_table, and
follow-up run_sql calls all re-read the original file from disk.
The post-mutation rows are returned only so you can inspect the
effect of the query.
Practical takeaway: treat run_sql as read-only. Use mutations
only for "what would the result look like if I did this" probes.
Example calls¶
Filter rows¶
{
"name": "run_sql",
"arguments": {
"path": "/tmp/users.parquet",
"query": "SELECT email, country FROM data WHERE active = true LIMIT 50"
}
}
Aggregation¶
{
"name": "run_sql",
"arguments": {
"path": "/tmp/sales.csv",
"query": "SELECT region, SUM(amount) AS total FROM data GROUP BY region ORDER BY total DESC"
}
}
Count rows on a Parquet file¶
{
"name": "run_sql",
"arguments": {
"path": "/tmp/events-2024.parquet",
"query": "SELECT count(*) FROM data"
}
}
Note: data is the in-memory snapshot Octa loaded through its
streaming reader, which honours the initial-load cap. On files that exceed the cap,
SELECT count(*) FROM data returns the loaded count, not the
file count, the same limitation as count_rows.
Pass unlimited: true to lift the cap for this call so the query
operates on every row:
{
"name": "run_sql",
"arguments": {
"path": "/tmp/events-2024.parquet",
"query": "SELECT count(*) FROM data",
"unlimited": true
}
}
Parquet files with very many row groups (> 32,767) fall back to a DuckDB-backed reader automatically.
Querying a specific table in a database¶
{
"name": "run_sql",
"arguments": {
"path": "/data/app.sqlite",
"table": "users",
"query": "SELECT country, COUNT(*) AS n FROM data GROUP BY country ORDER BY n DESC LIMIT 10"
}
}
JOIN across formats with extra_tables¶
Register one or more extra files into the same DuckDB connection
that hosts data, then JOIN them in the query. The chosen name
is sanitised to a valid SQL identifier (lowercased, non-alphanumeric
characters replaced with _).
{
"name": "run_sql",
"arguments": {
"path": "/tmp/sales.parquet",
"extra_tables": [
{ "name": "customers", "path": "/tmp/customers.csv" }
],
"query": "SELECT c.name, SUM(s.amount) AS total FROM data s JOIN customers c ON s.cid = c.cid GROUP BY c.name ORDER BY total DESC"
}
}
For multi-table sources (SQLite / DuckDB / Excel / ODS), set
extra_tables[*].table to pick the inner table, or use attach
instead so every inner table is reachable in one shot.
ATTACH a database with attach¶
Each attach entry runs DuckDB's ATTACH against the target file.
Every table inside is queryable as alias.schema.tbl (DuckDB) or
alias.tbl (SQLite via the DuckDB sqlite extension when bundled).
For SQLite builds without the extension, the workspace falls back to
per-table loading under names like alias__table.
{
"name": "run_sql",
"arguments": {
"path": "/tmp/sales.parquet",
"attach": [
{ "alias": "wh", "path": "/tmp/warehouse.duckdb" }
],
"query": "SELECT count(*) FROM data d JOIN wh.main.products p ON d.cid = p.cid"
}
}
Write-back¶
write_to persists the SELECT result to a DuckDB or SQLite file
instead of returning rows. The target file is created if missing.
{
"name": "run_sql",
"arguments": {
"path": "/tmp/sales.parquet",
"query": "SELECT region, SUM(amount) AS total FROM data GROUP BY region",
"write_to": {
"path": "/tmp/analytics.duckdb",
"schema": "reports",
"table": "q4_summary",
"mode": "create",
"create_schema_if_missing": true
}
}
}
Fields:
| Field | Type | Required? | Default | Meaning |
|---|---|---|---|---|
path |
string | yes | - | Target DuckDB or SQLite file. Extension picks the kind (.duckdb / .ddb → DuckDB; everything else → SQLite). |
table |
string | yes | - | Target table name. |
schema |
string | no | null |
Target schema (DuckDB only). null writes to main. SQLite has no schemas; pass null or "main". |
mode |
string | no | create |
create (errors if table exists), replace (drop + recreate), or append (INSERT into existing). |
create_schema_if_missing |
bool | no | false |
When true and the target schema doesn't exist, create it (DuckDB only; ignored for SQLite). |
SQLite writes go through rusqlite directly, so they work even on
builds where the DuckDB sqlite extension isn't bundled.
DESCRIBE / EXPLAIN¶
DuckDB's DESCRIBE is useful for schema discovery from inside SQL:
What's available¶
The full DuckDB SQL surface (as exposed by the bundled DuckDB library):
- All standard SQL features (SELECT / FROM / WHERE / GROUP BY / HAVING / ORDER BY / LIMIT).
- Common Table Expressions (CTEs /
WITH … AS). - Window functions (
ROW_NUMBER(),RANK(),LAG(), percentiles). - JSON functions (
json_extract,unnest, …). - Date / time / string functions.
- Regex functions.
- PIVOT / UNPIVOT.
- DESCRIBE / EXPLAIN.
Notes¶
- Identifiers with spaces or special characters need to be
double-quoted:
"My Column". Octa's column registration uses this convention to keep weird names round-tripping cleanly. - Errors come back as
invalid_paramswith a DuckDB-formatted message. - A query that returns zero rows is reported with
row_count: 0and an emptyrowsarray, not an error.
See also¶
- Limits & truncation covers
truncated/total_rows_availablesemantics. read_tableis for when you want everything, no filtering.- Examples has SQL-heavy worked prompts.