Skip to content

run_sql

Run a DuckDB SQL query against one or more files. The primary file is loaded and registered as a temp table called data; your query runs against it.

Additional files can be registered as workspace tables via extra_tables and whole DuckDB / SQLite databases can be ATTACH-ed via attach, so the same call can JOIN across formats. The SELECT result can also be written back to a DuckDB or SQLite file via write_to instead of being returned.

When to use

  • Filtering ("rows where amount > 1000").
  • Aggregations ("total amount per region", "median order size").
  • Window functions, joins to other DuckDB-accessible sources, etc.
  • Multi-source JOINs across formats (Parquet + CSV + SQLite, etc.).
  • Persisting the result back into a DuckDB schema or a SQLite table.
  • Most "find me X in this file" prompts.

For raw schema discovery, schema is cheaper. For "read every row" with no filtering, read_table is cleaner, since the SQL overhead isn't free.

Input schema

Parameter Type Required? Default Description
path string yes (no default) Path to the primary file (registered as data).
query string yes (no default) SQL query. Always reference the primary file as data.
limit int no server default (1000) Maximum rows to return in the response. 0 = unlimited.
table string no (no default) For multi-table sources, the specific table to expose as data.
unlimited bool no false Lift the 5,000,000-row file-loader cap so the query sees every row in every loaded file. Combine with limit: 0 for full output.
extra_tables object[] no [] Additional tables to register before the query. Each entry: { "name": "...", "path": "...", "table": null }. name is sanitised to a valid SQL identifier.
attach object[] no [] Databases to ATTACH throughout the call. Each entry: { "alias": "...", "path": "..." }. .duckdb / .ddb selects DuckDB; everything else (.sqlite, .db) SQLite.
write_to object no (none) When set, the SELECT result is written to a DuckDB or SQLite file instead of being returned. See Write-back below.

Response shape

For SELECT queries:

{
  "kind": "select",
  "result": {
    "schema": [{ "name": "<col>", "type": "Utf8" }, ],
    "rows": [[<v>, <v>, ], ],
    "row_count": <n>,
    "truncated": <bool>,
    "total_rows_available": <n>,
    "cell_truncated": <bool>
  }
}

For mutation queries (INSERT / UPDATE / DELETE):

{
  "kind": "mutation",
  "affected": <n>,
  "result": {
    "schema": [...],
    "rows": [...],   // post-mutation contents of `data`
    ...
  }
}

For calls with write_to:

{
  "kind": "write_back",
  "rows_written": <n>,
  "created_schema": <bool>,
  "target": "/path/to/db.duckdb | schema.table"
}

Mutations don't persist

The in-memory DuckDB connection is created fresh per call and discarded at the end. Mutations don't write back to the file, and the mutated state isn't visible to any subsequent tool call: convert, read_table, and follow-up run_sql calls all re-read the original file from disk. The post-mutation rows are returned only so you can inspect the effect of the query.

Practical takeaway: treat run_sql as read-only. Use mutations only for "what would the result look like if I did this" probes.

Example calls

Filter rows

{
  "name": "run_sql",
  "arguments": {
    "path": "/tmp/users.parquet",
    "query": "SELECT email, country FROM data WHERE active = true LIMIT 50"
  }
}

Aggregation

{
  "name": "run_sql",
  "arguments": {
    "path": "/tmp/sales.csv",
    "query": "SELECT region, SUM(amount) AS total FROM data GROUP BY region ORDER BY total DESC"
  }
}

Count rows on a Parquet file

{
  "name": "run_sql",
  "arguments": {
    "path": "/tmp/events-2024.parquet",
    "query": "SELECT count(*) FROM data"
  }
}

Note: data is the in-memory snapshot Octa loaded through its streaming reader, which honours the initial-load cap. On files that exceed the cap, SELECT count(*) FROM data returns the loaded count, not the file count, the same limitation as count_rows. Pass unlimited: true to lift the cap for this call so the query operates on every row:

{
  "name": "run_sql",
  "arguments": {
    "path": "/tmp/events-2024.parquet",
    "query": "SELECT count(*) FROM data",
    "unlimited": true
  }
}

Parquet files with very many row groups (> 32,767) fall back to a DuckDB-backed reader automatically.

Querying a specific table in a database

{
  "name": "run_sql",
  "arguments": {
    "path": "/data/app.sqlite",
    "table": "users",
    "query": "SELECT country, COUNT(*) AS n FROM data GROUP BY country ORDER BY n DESC LIMIT 10"
  }
}

JOIN across formats with extra_tables

Register one or more extra files into the same DuckDB connection that hosts data, then JOIN them in the query. The chosen name is sanitised to a valid SQL identifier (lowercased, non-alphanumeric characters replaced with _).

{
  "name": "run_sql",
  "arguments": {
    "path": "/tmp/sales.parquet",
    "extra_tables": [
      { "name": "customers", "path": "/tmp/customers.csv" }
    ],
    "query": "SELECT c.name, SUM(s.amount) AS total FROM data s JOIN customers c ON s.cid = c.cid GROUP BY c.name ORDER BY total DESC"
  }
}

For multi-table sources (SQLite / DuckDB / Excel / ODS), set extra_tables[*].table to pick the inner table, or use attach instead so every inner table is reachable in one shot.

ATTACH a database with attach

Each attach entry runs DuckDB's ATTACH against the target file. Every table inside is queryable as alias.schema.tbl (DuckDB) or alias.tbl (SQLite via the DuckDB sqlite extension when bundled). For SQLite builds without the extension, the workspace falls back to per-table loading under names like alias__table.

{
  "name": "run_sql",
  "arguments": {
    "path": "/tmp/sales.parquet",
    "attach": [
      { "alias": "wh", "path": "/tmp/warehouse.duckdb" }
    ],
    "query": "SELECT count(*) FROM data d JOIN wh.main.products p ON d.cid = p.cid"
  }
}

Write-back

write_to persists the SELECT result to a DuckDB or SQLite file instead of returning rows. The target file is created if missing.

{
  "name": "run_sql",
  "arguments": {
    "path": "/tmp/sales.parquet",
    "query": "SELECT region, SUM(amount) AS total FROM data GROUP BY region",
    "write_to": {
      "path": "/tmp/analytics.duckdb",
      "schema": "reports",
      "table": "q4_summary",
      "mode": "create",
      "create_schema_if_missing": true
    }
  }
}

Fields:

Field Type Required? Default Meaning
path string yes - Target DuckDB or SQLite file. Extension picks the kind (.duckdb / .ddb → DuckDB; everything else → SQLite).
table string yes - Target table name.
schema string no null Target schema (DuckDB only). null writes to main. SQLite has no schemas; pass null or "main".
mode string no create create (errors if table exists), replace (drop + recreate), or append (INSERT into existing).
create_schema_if_missing bool no false When true and the target schema doesn't exist, create it (DuckDB only; ignored for SQLite).

SQLite writes go through rusqlite directly, so they work even on builds where the DuckDB sqlite extension isn't bundled.

DESCRIBE / EXPLAIN

DuckDB's DESCRIBE is useful for schema discovery from inside SQL:

{
  "name": "run_sql",
  "arguments": {
    "path": "/tmp/messy.parquet",
    "query": "DESCRIBE data"
  }
}

What's available

The full DuckDB SQL surface (as exposed by the bundled DuckDB library):

  • All standard SQL features (SELECT / FROM / WHERE / GROUP BY / HAVING / ORDER BY / LIMIT).
  • Common Table Expressions (CTEs / WITH … AS).
  • Window functions (ROW_NUMBER(), RANK(), LAG(), percentiles).
  • JSON functions (json_extract, unnest, …).
  • Date / time / string functions.
  • Regex functions.
  • PIVOT / UNPIVOT.
  • DESCRIBE / EXPLAIN.

Notes

  • Identifiers with spaces or special characters need to be double-quoted: "My Column". Octa's column registration uses this convention to keep weird names round-tripping cleanly.
  • Errors come back as invalid_params with a DuckDB-formatted message.
  • A query that returns zero rows is reported with row_count: 0 and an empty rows array, not an error.

See also