octa --sql¶
Run a DuckDB SQL query against a file and print the result.
The file is loaded once, registered as a temp table called data,
and the query runs against it.
Extra tables can be loaded into the same query with --sql-table and
whole DuckDB / SQLite databases can be ATTACH-ed with --sql-attach,
so the same invocation can JOIN across formats. The SELECT result can
also be written back to a DuckDB or SQLite file via --sql-write-to.
Synopsis¶
octa --sql FILE -q '<query>' [-f tsv|json|csv]
[--sql-table NAME=PATH ...]
[--sql-attach ALIAS=PATH ...]
[--sql-write-to PATH --sql-write-table TABLE
[--sql-write-schema SCHEMA] [--sql-write-mode create|append|replace]]
| Flag | Required | Meaning |
|---|---|---|
--sql FILE |
yes | The primary file to query, registered as data. |
-q QUERY, --query QUERY |
yes | The SQL query string. Always reference the primary file as data. |
-f, --format |
no | Output format (default tsv). Ignored when --sql-write-to is set. |
--sql-table NAME=PATH |
no | Register an extra table from any supported file under the SQL name NAME. Repeatable. |
--sql-attach ALIAS=PATH |
no | ATTACH a DuckDB or SQLite database under ALIAS. Repeatable. Every inner table becomes queryable as ALIAS.schema.tbl (DuckDB) or ALIAS.tbl (SQLite). |
--sql-write-to PATH |
no | Persist the SELECT result to this DuckDB or SQLite file. Created if missing. |
--sql-write-table TABLE |
with --sql-write-to |
Target table name. |
--sql-write-schema SCHEMA |
no | Target schema (DuckDB only; defaults to main). Pass for SQLite only as main or leave unset. |
--sql-write-mode MODE |
no | create (default; errors if the table exists), replace (drop + recreate), or append (INSERT into existing). |
Examples¶
Basic select¶
$ octa --sql sales.parquet -q 'SELECT region, SUM(amount) AS total FROM data GROUP BY region'
region total
EU 4856732.50
US 9201443.75
APAC 3219008.00
Filter + limit¶
Describe a file's schema¶
$ octa --sql data.parquet -q 'DESCRIBE data'
column_name column_type null ...
region VARCHAR YES
quarter VARCHAR YES
amount DOUBLE YES
order_id BIGINT NO
Top-N¶
$ octa --sql users.parquet -q '
SELECT country, COUNT(*) AS n
FROM data
GROUP BY country
ORDER BY n DESC
LIMIT 5
'
country n
US 483291
DE 119008
UK 104223
FR 87502
JP 76140
Window functions (DuckDB's full SQL)¶
$ octa --sql events.parquet -q '
SELECT user_id,
event,
timestamp,
LAG(event) OVER (PARTITION BY user_id ORDER BY timestamp) AS prev_event
FROM data
'
JSON output for downstream tooling¶
$ octa --sql data.parquet -q 'SELECT * FROM data LIMIT 3' -f json
[
{ "id": 1, "name": "Alice", "active": true },
{ "id": 2, "name": "Bob", "active": false },
{ "id": 3, "name": "Charlie", "active": true }
]
Piping into jq or another tool works seamlessly:
Multi-table JOIN¶
--sql-table NAME=PATH registers an extra file into the same DuckDB
connection that hosts data. The file is loaded via the standard
format registry, so any supported format works (CSV, Parquet, JSON,
Excel, SQLite, ...). The flag is repeatable; each pair gets its own
entry in the workspace.
$ octa --sql sales.parquet \
--sql-table customers=customers.csv \
-q '
SELECT c.name, SUM(d.amount) AS total
FROM data d
JOIN customers c ON d.cid = c.cid
GROUP BY c.name
ORDER BY total DESC
'
name total
Carol 300
Bob 200
Alice 150
Several --sql-table flags can be combined:
octa --sql sales.parquet \
--sql-table customers=customers.csv \
--sql-table regions=regions.json \
-q '
SELECT r.region_name, c.name, SUM(d.amount) AS total
FROM data d
JOIN customers c ON d.cid = c.cid
JOIN regions r ON d.region_id = r.id
GROUP BY r.region_name, c.name
'
The chosen NAME is sanitised to a valid SQL identifier (lowercased,
non-alphanumeric characters replaced with _). For multi-table sources
prefer --sql-attach so every inner table is reachable.
Attaching whole databases¶
--sql-attach ALIAS=PATH runs DuckDB's ATTACH against the target
file. Every table inside is queryable as ALIAS.schema.tbl (DuckDB)
or ALIAS.tbl (SQLite via the DuckDB sqlite extension when bundled;
otherwise the workspace falls back to per-table loading under names
like ALIAS__table).
$ octa --sql sales.parquet \
--sql-attach wh=warehouse.duckdb \
-q '
SELECT p.name, SUM(d.amount) AS total
FROM data d
JOIN wh.main.products p ON d.cid = p.cid
GROUP BY p.name
'
The extension .duckdb / .ddb selects DuckDB; everything else
(.sqlite, .db, .sqlite3) selects SQLite.
Write-back to DuckDB / SQLite¶
--sql-write-to PATH persists the SELECT result instead of printing
it. Pair it with --sql-write-table, optionally --sql-write-schema
(DuckDB only), and --sql-write-mode. The target file is created if
missing.
# Create a fresh table in a new schema.
$ octa --sql sales.parquet -q '
SELECT region, SUM(amount) AS total
FROM data
GROUP BY region
ORDER BY region
' --sql-write-to analytics.duckdb \
--sql-write-schema reports \
--sql-write-table q4_summary
wrote 2 row(s) to analytics.duckdb | reports.q4_summary
Write modes:
| Mode | Behaviour |
|---|---|
create |
Default. Errors if the target table already exists. |
replace |
DROP TABLE IF EXISTS followed by CREATE TABLE. |
append |
INSERT INTO an existing table. Column count and order must match the SELECT shape. |
SQLite has no schemas, so --sql-write-schema must be omitted (or
explicitly main). The write goes through rusqlite directly, so it
works even on builds where the DuckDB sqlite extension isn't bundled.
# Append into an existing SQLite table.
octa --sql data.csv -q 'SELECT * FROM data WHERE active=1' \
--sql-write-to users.sqlite \
--sql-write-table active_users \
--sql-write-mode append
The success line is printed to stdout:
For SQLite the <schema>. prefix is omitted. On success, nothing else
is written to stdout, so CI pipelines can compare against an expected
string. Errors print to stderr with a non-zero exit code.
Mutations¶
INSERT / UPDATE / DELETE are accepted but do not persist
back to disk (same as in the GUI if you don't save), since the in-memory
DuckDB connection is discarded at the end of the run. After a mutation,
the post-mutation contents of data are printed so you can pipe them through
--convert if you want to persist:
# Filter rows and write the result to a new Parquet file
octa --sql in.csv -q 'SELECT * FROM data WHERE region = "EU"' -f csv > eu.csv
octa --convert eu.csv eu.parquet
A "rows affected" count is written to stderr for mutations:
How files become data¶
Octa's standard reader produces a DataTable. The CLI registers
that as a DuckDB temp table named data in a fresh in-memory
connection, then runs your query. Extra --sql-table entries land
as additional temp tables in the same connection, and --sql-attach
runs ATTACH on it, so JOINs across all of them are plain DuckDB
work. The connection is single-use; the next octa --sql
invocation starts from scratch.
This is the same execution path as the GUI's SQL view, so:
- Column types match DuckDB's type system:
BIGINT,DOUBLE,VARCHAR,TIMESTAMP,BLOB, etc. - Quoted identifiers (
"some column") work for column names with spaces.
Performance¶
- For streaming formats (Parquet, CSV, TSV), the initial-load row
cap (5,000,000 rows by default) is applied before the query runs.
octa --sql huge.parquet -q 'SELECT count(*) FROM data'counts the cap, not the full file. Override per-call with--rows N|all:
# Raise the cap to 10 million for this query
octa --sql huge.parquet -q 'SELECT count(*) FROM data' --rows 10,000,000
# Read every row in the file
octa --sql huge.parquet -q 'SELECT count(*) FROM data' --rows all
- Parquet files with very many row groups (> 32,767, common with Spark / streaming ingest) fall back to a DuckDB-backed reader automatically, so they open without manual recompaction.
- DuckDB itself is highly optimised, so queries on millions of rows typically run in under a second.
Notes¶
- The query is case-insensitive on keywords but case-sensitive
on identifiers (column names). Use double quotes for names with
unusual casing:
"MyColumn". - Errors (parse errors, type mismatches) print to stderr with a
DuckDB-formatted error message. The exit code is
1. EXPLAIN/EXPLAIN ANALYZEwork and print the DuckDB plan.
See also¶
- SQL panel is the same engine inside the GUI.
octa --convertwrites the result to a different format.- MCP
run_sqltool is the same query path via MCP.