DQL — Queries

The SQL Gateway provides a full Data Query Language on top of Elasticsearch, translating SQL into search, aggregation, and scroll APIs.

SELECT

SELECT [DISTINCT] expr1, expr2, ...
FROM table_name [alias]
[WHERE condition]
[GROUP BY expr1, expr2, ...]
[HAVING condition]
[ORDER BY expr1 [ASC|DESC] [NULLS FIRST|NULLS LAST], ...]
[LIMIT n]
[OFFSET m];

Nested Fields and Aliases

SELECT id,
       name AS full_name,
       profile.city AS city,
       profile.followers AS followers
FROM dql_users
ORDER BY id ASC;

profile is a STRUCT column
Dot notation accesses nested fields
Aliases (AS) are returned as column names

WHERE

Supports comparison operators (=, !=, <, <=, >, >=), logical operators (AND, OR, NOT), IN, NOT IN, BETWEEN, IS NULL, IS NOT NULL, LIKE, RLIKE (regex), and conditions on nested fields.

SELECT id, name, age
FROM dql_users
WHERE (age > 20 AND profile.followers >= 100)
   OR (profile.city = 'Lyon' AND age < 50)
ORDER BY age DESC;

SELECT id, age + 10 AS age_plus_10, name
FROM dql_users
WHERE age BETWEEN 20 AND 50
  AND name IN ('Alice', 'Bob', 'Chloe')
  AND name IS NOT NULL
  AND (name LIKE 'A%' OR name RLIKE '.*o.*');

ORDER BY

Supports multiple sort keys, ASC/DESC, NULLS FIRST / NULLS LAST, expressions, and nested fields.

SELECT id, name, age
FROM dql_users
ORDER BY age DESC, name ASC
LIMIT 2 OFFSET 1;

NULLS FIRST / NULLS LAST

Each sort key may declare where NULL values appear:

SELECT id, name, bonus
FROM dql_users
ORDER BY bonus DESC NULLS LAST;

Mapped to Elasticsearch’s sort.missing parameter:

NULLS FIRST → "missing": "_first"
NULLS LAST → "missing": "_last"

When omitted, defaults follow the Elasticsearch convention: ASC → nulls last, DESC → nulls first.

Different null orderings can be combined within a single query:

SELECT id, name, bonus, hire_date
FROM dql_users
ORDER BY bonus DESC NULLS LAST, hire_date ASC NULLS FIRST;

Caveat (ES6 Jest client): scroll / search_after queries in the ES6 Jest client do not propagate NULLS FIRST / NULLS LAST reliably across batches. For ES6 scroll/search_after, prefer client-side null-bucketing or upgrade to ES7+.

LIMIT / OFFSET

LIMIT n restricts returned rows
OFFSET m skips the first m rows
Translated to Elasticsearch from + size

UNION ALL

Combines results of multiple SELECT queries without removing duplicates. All SELECT statements must have the same number of columns with the same names.

SELECT id, name FROM dql_users WHERE age > 30
UNION ALL
SELECT id, name FROM dql_users WHERE age <= 30;

Executed using Elasticsearch Multi-Search (_msearch). ORDER BY and LIMIT apply per SELECT, not globally.

JOIN UNNEST

The Gateway supports JOIN UNNEST on ARRAY<STRUCT> columns.

SELECT
  o.id,
  items.product,
  items.quantity,
  SUM(items.price * items.quantity) OVER (PARTITION BY o.id) AS total_price
FROM dql_orders o
JOIN UNNEST(o.items) AS items
WHERE items.quantity >= 1
ORDER BY o.id ASC;

JOIN UNNEST produces one output row per array element, with parent fields duplicated — exactly like a standard SQL UNNEST. It supports expressions, filtering, and aggregations via window functions. Multi-level nesting is handled recursively.

Aggregations

Supported aggregate functions: COUNT(*), COUNT(expr), SUM, AVG, MIN, MAX, STDDEV / STDDEV_SAMP / STDDEV_POP, VARIANCE / VAR_SAMP / VAR_POP.

STDDEV defaults to sample standard deviation (STDDEV ≡ STDDEV_SAMP, Bessel-corrected) and VARIANCE defaults to sample variance (VARIANCE ≡ VAR_SAMP). This matches PostgreSQL and Snowflake; users coming from MySQL 5.5 or earlier should note that those releases defaulted STDDEV to population.

SELECT department,
       STDDEV(salary)   AS sd,
       VAR_POP(salary)  AS vp
FROM emp
GROUP BY department;

All six map to a single Elasticsearch extended_stats aggregation per call. Sample variants require Elasticsearch 7.7+; population variants work on Elasticsearch 6+.

Percentiles — `PERCENTILE_CONT` / `PERCENTILE_DISC`

-- p99 request latency per endpoint (SRE latency analysis)
SELECT endpoint,
       PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY duration_ms) AS p99
FROM requests
GROUP BY endpoint;

Forms accepted (for both PERCENTILE_CONT and PERCENTILE_DISC):

PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY column) — ANSI ordered-set aggregate (optionally with a top-level GROUP BY)
PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY column) OVER (PARTITION BY ...) — value column from WITHIN GROUP, partition from OVER
PERCENTILE_CONT(p) OVER (PARTITION BY ... ORDER BY column) — value column from the OVER ORDER BY
PERCENTILE_CONT(column, p) — column-first shorthand (many BI tools emit it)

The percentile literal p is a value in [0, 1] (e.g. 0.99 for p99); out-of-range values are rejected at parse time. The value column comes from the ORDER BY clause or the shorthand’s first argument; grouping comes from OVER (PARTITION BY ...) or a top-level GROUP BY. Both functions map to the Elasticsearch percentiles aggregation (TDigest). Elasticsearch has no native discrete percentile, so PERCENTILE_DISC is continuous-backed — it returns the same interpolated value as PERCENTILE_CONT rather than the nearest actual data point. All forms work on Elasticsearch 6+.

GROUP BY and HAVING

SELECT profile.city AS city,
       COUNT(*) AS cnt,
       AVG(age) AS avg_age
FROM dql_users
GROUP BY profile.city
HAVING COUNT(*) >= 1
ORDER BY COUNT(*) DESC;

GROUP BY supports nested fields
HAVING filters groups based on aggregate conditions
Translated to Elasticsearch aggregations

Parent-Level Aggregations on Nested Arrays

Compute aggregations over nested arrays while keeping one row per parent document (the original nested array is preserved):

SELECT
  o.id,
  o.items,
  SUM(items.price * items.quantity) OVER (PARTITION BY o.id) AS total_price
FROM dql_orders o
JOIN UNNEST(o.items) AS items
WHERE items.quantity >= 1
ORDER BY o.id ASC;

Returns one row per parent with the original nested array preserved and the aggregated value added as a top-level field.

Window Functions

Window functions operate over a logical window defined by OVER (PARTITION BY ... ORDER BY ...).

Supported: SUM, COUNT, STDDEV / STDDEV_SAMP / STDDEV_POP, VARIANCE / VAR_SAMP / VAR_POP, PERCENTILE_CONT / PERCENTILE_DISC, FIRST_VALUE, LAST_VALUE, ARRAY_AGG, ROW_NUMBER, RANK, DENSE_RANK.

SELECT
  product,
  customer,
  amount,
  SUM(amount) OVER (PARTITION BY product) AS sum_per_product,
  COUNT(_id) OVER (PARTITION BY product) AS cnt_per_product,
  FIRST_VALUE(amount) OVER (PARTITION BY product ORDER BY ts ASC) AS first_amount,
  LAST_VALUE(amount) OVER (PARTITION BY product ORDER BY ts ASC) AS last_amount,
  ARRAY_AGG(amount) OVER (PARTITION BY product ORDER BY ts ASC LIMIT 10) AS amounts_array
FROM dql_sales
ORDER BY product, ts;

Ranking windows (`ROW_NUMBER`, `RANK`, `DENSE_RANK`)

ORDER BY is REQUIRED inside OVER for ranking functions (ANSI). PARTITION BY is optional — when absent, the entire result set is treated as one partition.

SELECT name, salary,
  ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn,
  RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS r,
  DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dr
FROM emp;

Tie semantics:

ROW_NUMBER — sequential within partition; no ties (1, 2, 3, 4, …)
RANK — ties share rank, next rank skips (1, 2, 2, 4, …)
DENSE_RANK — ties share rank, next rank does NOT skip (1, 2, 2, 3, …)

Top-N per group: inline LIMIT N inside OVER to keep only the top-N rows per partition. The engine pushes N into the Elasticsearch top_hits.size parameter, so only the top-N rows per partition are materialised:

SELECT name, salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC LIMIT 3) AS r
FROM emp;

Without an explicit LIMIT, top_hits.size defaults to 100 — the Elasticsearch index.max_inner_result_window default. For larger partitions either supply LIMIT N inline or raise the index setting.

Functions

Numeric & Trigonometric

Function	Description
`ABS(x)`	Absolute value
`CEIL(x)` / `CEILING(x)`	Round up
`FLOOR(x)`	Round down
`ROUND(x, n)`	Round to n decimals
`SQRT(x)`	Square root
`POW(x, y)` / `POWER(x, y)`	Power
`EXP(x)`	Exponential
`LOG(x)` / `LN(x)`	Natural logarithm
`LOG10(x)`	Base-10 logarithm
`SIGN(x)` / `SGN(x)`	Sign of x
`SIN(x)`, `COS(x)`, `TAN(x)`	Trigonometric
`ASIN(x)`, `ACOS(x)`, `ATAN(x)`	Inverse trigonometric
`ATAN2(y, x)`	Arc-tangent of y/x
`PI()`	Pi constant
`RADIANS(x)`, `DEGREES(x)`	Angle conversion

String

Function	Description
`CONCAT(a, b, ...)`	Concatenate strings
`SUBSTRING(str, start, len)`	Extract substring
`LOWER(str)` / `LCASE(str)`	Lowercase
`UPPER(str)` / `UCASE(str)`	Uppercase
`TRIM(str)`, `LTRIM(str)`, `RTRIM(str)`	Trim whitespace
`LENGTH(str)` / `LEN(str)`	String length
`REPLACE(str, from, to)`	Replace substring
`LEFT(str, n)`, `RIGHT(str, n)`	Left/right n chars
`REVERSE(str)`	Reverse string
`POSITION(substr IN str)` / `STRPOS(str, substr)`	Position of substring
`REGEXP_LIKE(str, pattern)`	Regex match
`MATCH(str) AGAINST (query)`	Full-text search

Date & Time

Current:

Function	Description
`CURRENT_DATE` / `TODAY()` / `CURDATE()`	Current date (UTC)
`CURRENT_TIMESTAMP` / `NOW()` / `CURRENT_DATETIME`	Current timestamp (UTC)
`CURRENT_TIME` / `CURTIME()`	Current time (UTC)

Extraction:

Function	Description
`YEAR(date)`, `MONTH(date)`, `DAY(date)`	Date components
`HOUR(ts)`, `MINUTE(ts)`, `SECOND(ts)`	Time components
`MILLISECOND(ts)`, `MICROSECOND(ts)`, `NANOSECOND(ts)`	Sub-second components
`EXTRACT(unit FROM date)`	Extract any date/time unit

Arithmetic:

Function	Description
`DATE_ADD(date, INTERVAL n unit)`	Add interval
`DATE_SUB(date, INTERVAL n unit)`	Subtract interval
`DATETIME_ADD(ts, INTERVAL n unit)`	Add interval to timestamp
`DATETIME_SUB(ts, INTERVAL n unit)`	Subtract from timestamp
`DATE_DIFF(date1, date2, unit)`	Difference in units
`DATE_TRUNC(date, unit)`	Truncate to unit

Formatting & Parsing:

Function	Description
`DATE_FORMAT(ts, pattern)`	Format date as string
`DATE_PARSE(str, pattern)`	Parse string into date
`DATETIME_FORMAT(ts, pattern)`	Format timestamp as string
`DATETIME_PARSE(str, pattern)`	Parse string into timestamp

Special:

Function	Description
`LAST_DAY(date)`	Last day of month
`EPOCHDAY(date)`	Days since epoch (1970-01-01)
`OFFSET_SECONDS(date)`	Epoch seconds

Geospatial

-- Create point
POINT(longitude, latitude)

-- Calculate distance
ST_DISTANCE(location, POINT(2.3522, 48.8566))

Supported distance units: km, m, cm, mm, mi, yd, ft, in, nmi.

Conditional

Function	Description
`CASE WHEN ... THEN ... ELSE ... END`	Conditional expression
`COALESCE(a, b, c)`	First non-null value
`NULLIF(a, b)`	NULL if a = b
`GREATEST(e1, e2, ...)`	Largest non-null numeric value (NULL only when every arg is NULL)
`LEAST(e1, e2, ...)`	Smallest non-null numeric value (NULL only when every arg is NULL)
`ISNULL(expr)`	TRUE if NULL
`ISNOTNULL(expr)`	TRUE if NOT NULL

Type Conversion

Function	Description
`CAST(value AS TYPE)`	Convert type (error on failure)
`TRY_CAST(value AS TYPE)` / `SAFE_CAST(...)`	Convert type (NULL on failure)
`CONVERT(value, TYPE)`	Alias for CAST
`value::TYPE`	PostgreSQL-style cast

System

Function	Description
`VERSION()`	Engine version string

Scroll & Pagination

For large result sets, the Gateway uses Elasticsearch scroll or search-after mechanisms depending on backend capabilities.

LIMIT and OFFSET are applied after retrieving documents from Elasticsearch
Deep pagination may require scroll
search_after requires an explicit ORDER BY clause

SHOW and DESCRIBE Commands

-- Tables
SHOW TABLES [LIKE 'pattern'];
SHOW TABLE table_name;
SHOW CREATE TABLE table_name;
DESCRIBE TABLE table_name;

-- Pipelines
SHOW PIPELINES;
SHOW PIPELINE pipeline_name;
SHOW CREATE PIPELINE pipeline_name;
DESCRIBE PIPELINE pipeline_name;

-- Watchers
SHOW WATCHERS;
SHOW WATCHER STATUS watcher_name;

-- Enrich Policies
SHOW ENRICH POLICIES;
SHOW ENRICH POLICY policy_name;

-- Cluster
SHOW CLUSTER NAME;

-- License
SHOW LICENSE;
REFRESH LICENSE;

SHOW LICENSE

SHOW LICENSE;

Returns the current license type, quota values, expiration date, and grace status.

Column	Description
`license_type`	Current license tier (Community, Pro, Enterprise). Shows “(trial)” suffix for trial licenses, “(degraded)” suffix if degraded.
`trial`	`true` if the license is a Pro trial, `false` otherwise
`platform`	Platform scope of the current license key (PRODUCTION, STAGING, DEVELOPMENT, INTEGRATION). Defaults to “PRODUCTION” when not platform-scoped.
`max_materialized_views`	Maximum materialized views allowed, or “unlimited”
`max_clusters`	Maximum federated clusters allowed, or “unlimited”
`max_result_rows`	Maximum rows returned per query, or “unlimited”
`max_concurrent_queries`	Maximum concurrent queries allowed, or “unlimited”
`max_joins`	Maximum number of JOIN operations allowed per query, or “unlimited”
`expires_at`	License expiration timestamp, or “never” for Community
`days_remaining`	Days until expiration, or -1 for Community (no expiry)
`status`	”Active”, or grace period details if expired

REFRESH LICENSE

REFRESH LICENSE;

Forces an immediate license refresh from the backend (API key fetch). Returns the previous and new tier information.

Column	Description
`previous_tier`	License tier before refresh
`new_tier`	License tier after refresh
`trial`	`true` if the new license is a Pro trial, `false` otherwise
`expires_at`	New expiration timestamp
`status`	”Refreshed” on success, “Failed” on error
`message`	Error details (empty on success)

Requires API key configuration. Without an API key, returns an informational failure message.

Version Compatibility

Feature	ES6	ES7	ES8	ES9
Basic SELECT	Yes	Yes	Yes	Yes
Nested fields	Yes	Yes	Yes	Yes
UNION ALL	Yes	Yes	Yes	Yes
JOIN UNNEST	Yes	Yes	Yes	Yes
Aggregations	Yes	Yes	Yes	Yes
Parent-level nested array aggs	Yes	Yes	Yes	Yes
Window functions	Yes	Yes	Yes	Yes
Geospatial functions	Yes	Yes	Yes	Yes
Date/time functions	Yes	Yes	Yes	Yes
String / math functions	Yes	Yes	Yes	Yes

Limitations

Traditional SQL joins are supported only through the use of Materialized Views (only JOIN UNNEST on ARRAY<STRUCT> is available natively)
No correlated subqueries
No arbitrary subqueries in SELECT or WHERE
No GROUPING SETS, CUBE, ROLLUP
No DISTINCT ON
No explicit window frame clauses (ROWS BETWEEN ...)