Elasticsearch Cheatsheet — 80+ Query DSL, Mapping, Analyzer, Aggregation, Replication with Real Pitfalls

Elasticsearch cheat sheet — 80+ Query DSL, mapping, analyzer, aggregation, replication, with real examples.

172 commands
Index management (19)
PUT /<index>

Create an index. With no body, you get default 1 primary / 1 replica and dynamic mapping — fine for prototyping, almost never right for production.

Common pitfall: Default 1 shard is fine for < 50GB indices; oversharding ("100 shards for 1GB of logs") wrecks cluster state and recovery time. Size on actual data volume, not "what if we grow".

Examples
PUT /products
PUT /logs-2026.05 {"settings":{"number_of_shards":3,"number_of_replicas":1}}
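As a rough illustration of sizing on actual data volume, the sketch below derives a primary shard count from an expected index size. The 50 GB per-shard ceiling comes from the pitfall above; the function names and the ceiling-division rule are assumptions for illustration, not an official formula.

```python
import math

def primary_shards(expected_size_gb: float, max_shard_gb: float = 50.0) -> int:
    """Smallest shard count that keeps each primary at or below max_shard_gb."""
    if expected_size_gb <= 0:
        return 1  # an empty or tiny index still needs one primary
    return max(1, math.ceil(expected_size_gb / max_shard_gb))

def create_index_body(expected_size_gb: float, replicas: int = 1) -> dict:
    """Request body for PUT /<index>, sized from the expected data volume."""
    return {
        "settings": {
            "number_of_shards": primary_shards(expected_size_gb),
            "number_of_replicas": replicas,
        }
    }

print(create_index_body(120))
```

A 1 GB log index stays at one shard under this rule, which is the point of the pitfall: the count follows measured volume, not hoped-for growth.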
DELETE /<index>

Delete an index and free disk space immediately. Permanent — no soft delete, no recycle bin. Combine with action.destructive_requires_name in elasticsearch.yml to refuse wildcards.

Common pitfall: DELETE /* wipes every index instantly when action.destructive_requires_name is false, which was the default before 8.0. Set action.destructive_requires_name: true so explicit index names are required.

Examples
DELETE /products
DELETE /logs-2025.*
GET /<index>

Inspect an index — returns mapping, settings, aliases in one shot. Use ?include_defaults=true to see what every unset setting actually resolves to.

Examples
GET /products
GET /products?include_defaults=true
POST /<index>/_close · _open

Close an index to free heap (mappings stay, shards unload) and reopen it later. Settings that require a closed index (analyzers, similarity) only apply after close → update → open.

Common pitfall: A closed index occupies disk but cannot be searched or written. Closing the wrong production index = full-blown outage. Use index name explicitly, not a pattern.

Examples
POST /products/_close
POST /products/_open
POST /_aliases

Atomically add/remove aliases across multiple indices. The canonical zero-downtime reindex pattern: write to old index, build new index, swap alias in one atomic action.

Examples
POST /_aliases {"actions":[{"remove":{"index":"products_v1","alias":"products"}},{"add":{"index":"products_v2","alias":"products"}}]}
POST /_aliases {"actions":[{"add":{"index":"logs-2026.05","alias":"logs-current","is_write_index":true}}]}
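The swap step of the zero-downtime pattern can be generated rather than hand-written. This is a minimal sketch that only builds the request body for POST /_aliases; the helper name alias_swap is hypothetical and sending the request is left to whatever HTTP client you use.

```python
import json

def alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that moves `alias` from old_index to new_index.

    Both actions travel in one request, so readers never see the alias
    pointing at zero indices or at both.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

body = alias_swap("products", "products_v1", "products_v2")
print(json.dumps(body))
```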
PUT /_index_template/<name>

Index template (composable, 7.8+). Auto-applies mappings + settings to any new index whose name matches the pattern — the right way to standardize daily log indices, time-based rollovers, and tenant indices.

Common pitfall: Templates only apply to indices created AFTER you save the template. Existing indices keep their old mapping — reindex if you need the new shape applied retroactively.

Examples
PUT /_index_template/logs-template {"index_patterns":["logs-*"],"template":{"settings":{"number_of_shards":1,"number_of_replicas":1},"mappings":{"properties":{"@timestamp":{"type":"date"},"message":{"type":"text"}}}}}
POST /<index>/_rollover

Roll an alias to a new index when size / age / doc-count thresholds are hit. Pair with an ILM policy so daily/weekly indices stay bounded and old ones move to warm/cold tier automatically.

Examples
POST /logs-current/_rollover {"conditions":{"max_age":"7d","max_size":"50gb","max_docs":100000000}}
POST /_reindex

Copy all documents from one index to another, optionally remapping fields or filtering by query. The only way to change a field type or add a new analyzer to existing data.

Common pitfall: Default reindex runs synchronously and blocks. For > 100k docs, use ?wait_for_completion=false to get a task id, then poll GET /_tasks/<id>. Tune slices=auto for parallelism.

Examples
POST /_reindex {"source":{"index":"products_v1"},"dest":{"index":"products_v2"}}
POST /_reindex?wait_for_completion=false&slices=auto {"source":{"index":"logs-2025.*"},"dest":{"index":"logs-archive"}}
POST /<index>/_forcemerge

Merge segments down to N (typically max_num_segments=1) — reduces shard segment count, frees deleted-doc space, speeds searches on read-only indices.

Common pitfall: NEVER force-merge an actively written index. It generates massive merge IO and the next refresh creates new segments anyway. Only run on indices that have stopped receiving writes.

Examples
POST /logs-2025.12/_forcemerge?max_num_segments=1
POST /<index>/_refresh

Manually refresh an index — makes recent writes visible to search immediately. Default auto-refresh is every 1s; for high-throughput write workloads, raise refresh_interval to 30s and call _refresh on demand.

Examples
POST /products/_refresh
PUT /products/_settings {"index":{"refresh_interval":"30s"}}
POST /<index>/_flush

Flush the translog and commit Lucene segments to disk. Routine flushes happen automatically; manual flush is mostly useful before snapshot or shutdown.

Examples
POST /products/_flush
POST /_flush
PUT /<index>/_settings

Update dynamic index settings on a live index: refresh_interval, number_of_replicas, max_result_window, blocks. Static settings cannot be changed on an open index: the analysis section needs close → update → open, and number_of_shards needs _split, _shrink, or a reindex.

Common pitfall: number_of_shards is fixed at creation and cannot be updated in place. Plan it up front, or use _split / _shrink to build a new index with a different count. Of the two sizing settings, only number_of_replicas can be changed freely on a live index.

Examples
PUT /products/_settings {"index":{"number_of_replicas":2,"refresh_interval":"30s"}}
PUT /products/_settings {"index":{"max_result_window":50000}}
POST /<index>/_clone

Clone an index into a new one with the same mapping and the same shard count, hard-linking segments so it is near-instant and uses almost no extra disk. The source must be read-only (index.blocks.write) first.

Examples
PUT /products/_settings {"settings":{"index.blocks.write":true}}
POST /products/_clone/products_copy
POST /<index>/_shrink

Shrink an index to fewer primary shards (target count must divide the source). Used to consolidate an over-sharded index after rollover so old read-only indices stop wasting cluster-state overhead.

Common pitfall: Before _shrink the index must be read-only and a copy of every shard (primary or replica) must sit on the SAME node. Set index.routing.allocation.require._name first and wait for relocation to finish. Forgetting this leaves the shrink stuck.

Examples
PUT /logs-old/_settings {"settings":{"index.blocks.write":true,"index.routing.allocation.require._name":"node-1"}}
POST /logs-old/_shrink/logs-old-shrunk {"settings":{"index.number_of_shards":1,"index.blocks.write":null}}
POST /<index>/_split

Split an index into MORE primary shards without reindexing. Source must be read-only; target shard count must be a multiple of the source. The escape hatch when an index outgrew its original shard count.

Examples
PUT /events/_settings {"settings":{"index.blocks.write":true}}
POST /events/_split/events_split {"settings":{"index.number_of_shards":6}}
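The shard-count rules for _shrink and _split are plain divisibility checks, so they are easy to validate before calling the API. A small sketch with hypothetical helper names; it does not model the additional index.number_of_routing_shards limit that applies to splits.

```python
from typing import List

def shrink_targets(source_shards: int) -> List[int]:
    """Shard counts a _shrink may target: smaller factors of the source count."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]

def is_valid_split(source_shards: int, target_shards: int) -> bool:
    """A _split target must be a larger multiple of the source count."""
    return target_shards > source_shards and target_shards % source_shards == 0

print(shrink_targets(6))
print(is_valid_split(3, 6), is_valid_split(4, 6))
```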
GET /<index>/_stats

Per-index operational stats — doc count, store size, indexing/search/merge/refresh/flush rates, segment count, query cache hit ratio. The first stop when one index feels slow.

Examples
GET /products/_stats
GET /products/_stats/search,indexing,merge
GET /<index>/_segments

List the Lucene segments inside each shard — count, size, doc count, deleted-doc count. A high segment count or many deleted docs is the signal that a force-merge (on a read-only index) would help.

Examples
GET /products/_segments
GET /logs-2025.12/_segments?verbose=false
PUT /_component_template/<name>

Reusable building block for composable index templates — define mappings or settings once, then reference it from many index templates via composed_of. Keeps shared field definitions DRY across log/metric/trace templates.

Examples
PUT /_component_template/base-settings {"template":{"settings":{"number_of_shards":1,"number_of_replicas":1}}}
PUT /_index_template/logs {"index_patterns":["logs-*"],"composed_of":["base-settings"]}
PUT /<data-stream> (data stream)

A data stream is an append-only abstraction over time-series backing indices, auto-rolling via its index template + ILM. The modern replacement for "manage your own daily index + alias" — you write to one name, ES handles rollover.

Common pitfall: Data streams only accept create (append) ops — you cannot update or delete a single doc by id through the stream name; you must target the concrete backing index (.ds-<name>-<date>-NNNN).

Examples
PUT /_index_template/metrics {"index_patterns":["metrics-*"],"data_stream":{},"template":{"mappings":{"properties":{"@timestamp":{"type":"date"}}}}}
PUT /_data_stream/metrics-app
Documents (17)
PUT /<index>/_doc/<id>

Index a document with a known id (creates or fully replaces). To insert only if absent, use op_type=create and you get a 409 if the id exists.

Examples
PUT /products/_doc/1 {"name":"Headphones","price":99,"in_stock":true}
PUT /products/_doc/1?op_type=create {"name":"Headphones"}
POST /<index>/_doc

Index a document without specifying id — Elasticsearch auto-generates a base64 id. Slightly faster than PUT/_doc/<id> because it skips the "is this an update?" check.

Examples
POST /products/_doc {"name":"Mouse","price":25}
GET /<index>/_doc/<id>

Fetch a single document by id. Returns _source plus metadata. Add _source_includes / _source_excludes to project only the fields you care about — much cheaper on big docs.

Examples
GET /products/_doc/1
GET /products/_doc/1?_source_includes=name,price
POST /<index>/_update/<id>

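Partial update: merges the supplied fields into the existing _source in a get-modify-index cycle on the shard, guarded by a version check. A concurrent write in between yields a 409, which retry_on_conflict can retry automatically. Use doc for simple merges, script for conditional logic.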

Common pitfall: ES has no true in-place update; every "update" rewrites the whole document and marks the old one as deleted. Heavy update workloads bloat segments — reindex or use rollover indices.

Examples
POST /products/_update/1 {"doc":{"price":89}}
POST /products/_update/1 {"script":{"source":"ctx._source.views += params.n","params":{"n":1}}}
POST /<index>/_update_by_query

Update every document that matches a query in one server-side pass — no round-trip per doc. Combine with a script to backfill or migrate fields without writing a client program.

Examples
POST /products/_update_by_query {"script":{"source":"ctx._source.active = true"},"query":{"term":{"in_stock":true}}}
DELETE /<index>/_doc/<id>

Delete a single document by id. Disk is only reclaimed after Lucene segment merge — so disk usage does not drop immediately after a delete spike.

Examples
DELETE /products/_doc/1
POST /<index>/_delete_by_query

Delete every document matching a query. Safer than DROP TABLE because the index stays, mapping stays, only the data goes — and you can preview with a search first.

Common pitfall: Marks docs as deleted; disk is only reclaimed after segment merge. For "wipe the index", DELETE /<index> + recreate is faster than _delete_by_query.

Examples
POST /logs/_delete_by_query {"query":{"range":{"@timestamp":{"lt":"now-30d"}}}}
POST /_bulk

Batch index / update / delete in one request: the only way to hit ES write throughput at scale. NDJSON body: one action line per op, followed by a source line for index, create, and update (delete has none). The body must end with a newline.

Common pitfall: Sweet spot: 5-15MB per bulk request. < 1MB = too many round-trips; > 100MB = timeouts and HTTP 413. Set request timeout > 30s for big bulks.

Examples
POST /_bulk
{"index":{"_index":"products","_id":"1"}}
{"name":"A"}
{"index":{"_index":"products","_id":"2"}}
{"name":"B"}
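Building the NDJSON by hand is where most bulk errors come from (a missing trailing newline, pretty-printed JSON spanning lines). The sketch below serializes index operations and groups them into bodies under a byte budget; the 10 MB default sits inside the 5-15 MB range mentioned above, and the function name bulk_bodies is an assumption for illustration.

```python
import json

def bulk_bodies(index, docs, max_bytes=10 * 1024 * 1024):
    """Yield NDJSON bodies for POST /_bulk, each at most max_bytes when possible.

    docs is an iterable of (doc_id, source_dict) pairs. A single operation
    larger than max_bytes is still sent on its own.
    """
    batch, size = [], 0
    for doc_id, source in docs:
        action = json.dumps({"index": {"_index": index, "_id": doc_id}})
        # Each line is one compact JSON object terminated by a newline.
        op = action + "\n" + json.dumps(source) + "\n"
        op_size = len(op.encode("utf-8"))
        if batch and size + op_size > max_bytes:
            yield "".join(batch)
            batch, size = [], 0
        batch.append(op)
        size += op_size
    if batch:
        yield "".join(batch)

sample = [("1", {"name": "A"}), ("2", {"name": "B"})]
bodies = list(bulk_bodies("products", sample))
print(bodies[0], end="")
```

Each yielded string can be sent as-is with Content-Type: application/x-ndjson.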
GET /_mget

Multi-get — fetch many documents by id across one or more indices in a single round-trip. Much faster than N separate GET /_doc calls.

Examples
GET /_mget {"docs":[{"_index":"products","_id":"1"},{"_index":"products","_id":"2"}]}
GET /<index>/_count

Lightweight count of documents matching a query. Cheaper than a search with size=0 that reads hits.total, because there is no scoring and no source loading.

Examples
GET /products/_count
GET /products/_count {"query":{"term":{"in_stock":true}}}
GET /<index>/_source/<id>

Fetch ONLY the _source of a document, skipping the metadata envelope (_index, _id, _version). Slightly lighter than GET /_doc/<id> when you just need the raw stored JSON.

Examples
GET /products/_source/1
GET /products/_source/1?_source_includes=name,price
HEAD /<index>/_doc/<id>

Existence check — returns 200 if the document exists, 404 if not, with no body transferred. The cheapest way to ask "is this id already indexed?" before deciding to create vs update.

Examples
HEAD /products/_doc/1
PUT /<index>/_doc/<id>?if_seq_no=&if_primary_term=

Optimistic concurrency control — only apply the write if the doc still has the seq_no + primary_term you last read. If another writer changed it first, you get a 409 instead of a silent lost update.

Common pitfall: ES has no row locks. Without if_seq_no / if_primary_term, two concurrent read-modify-write clients silently clobber each other. Always pass both for compare-and-set updates.

Examples
PUT /products/_doc/1?if_seq_no=12&if_primary_term=2 {"name":"Headphones","stock":5}
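To make the lost-update scenario concrete, here is a toy in-memory model of the compare-and-set check. TinyStore and VersionConflict are hypothetical stand-ins (a single primary with primary_term fixed at 1), not client-library classes; a real cluster performs the same comparison on the primary shard and answers with HTTP 409.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 returned when the check fails."""

class TinyStore:
    """Toy model of one index: id -> (seq_no, primary_term, source)."""

    def __init__(self):
        self.docs = {}
        self.next_seq = 0

    def get(self, doc_id):
        return self.docs[doc_id]

    def put(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None:
            # Reject the write unless the doc is unchanged since it was read.
            if current is None or (current[0], current[1]) != (if_seq_no, if_primary_term):
                raise VersionConflict(doc_id)
        seq = self.next_seq
        self.next_seq += 1
        self.docs[doc_id] = (seq, 1, source)
        return seq

store = TinyStore()
store.put("1", {"stock": 5})
seq, term, doc = store.get("1")  # clients A and B both read this version

# Client A writes first and succeeds.
store.put("1", {"stock": doc["stock"] - 1}, if_seq_no=seq, if_primary_term=term)

# Client B still holds the old seq_no, so its write is refused.
try:
    store.put("1", {"stock": doc["stock"] - 2}, if_seq_no=seq, if_primary_term=term)
    conflict = False
except VersionConflict:
    conflict = True
print(conflict, store.get("1"))
```

On a conflict the usual response is to re-read the document, reapply the change, and retry with the fresh seq_no and primary_term.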
POST /<index>/_update/<id> (upsert)

Update-or-insert in one call: run the script/doc if the id exists, otherwise index the upsert body as-is. The idiomatic "increment a counter, creating it if absent" pattern (the first example starts the counter at 1 for the first hit).

Examples
POST /counters/_update/page-1 {"script":{"source":"ctx._source.hits += params.n","params":{"n":1}},"upsert":{"hits":1}}
POST /products/_update/1 {"doc":{"price":89},"doc_as_upsert":true}
POST /<index>/_doc?routing=<value>

Custom routing — pin a document to a shard chosen by your routing value (default is a hash of _id). Co-locate related docs (all of one tenant) on one shard so queries with the same routing hit a single shard.

Common pitfall: Custom routing can create hot shards if one routing value owns most of the data. And you MUST pass the same routing on GET/DELETE/update or ES looks on the wrong shard and reports 404.

Examples
POST /orders/_doc?routing=tenant-42 {"tenant":"tenant-42","total":99}
GET /orders/_search?routing=tenant-42 {"query":{"match_all":{}}}
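Why the same routing value must be passed on every operation follows from how a shard is picked: hash the routing value, take it modulo the primary shard count. The sketch below is a simplified model; zlib.crc32 is a stand-in hash chosen only because it is in the standard library (Elasticsearch uses its own Murmur3-based hash and extra routing factors), so the shard numbers will not match a real cluster.

```python
import zlib

def shard_for(routing: str, num_primary_shards: int) -> int:
    """Simplified shard selection: hash(routing) % number of primaries."""
    # crc32 is an illustrative stand-in, not the hash Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

orders = [("o1", "tenant-42"), ("o2", "tenant-42"), ("o3", "tenant-7")]

# With custom routing, every order of a tenant lands on that tenant's shard.
placement = {order_id: shard_for(tenant, 5) for order_id, tenant in orders}

# Without ?routing=, a lookup hashes the _id instead and may probe another shard.
default_lookup = {order_id: shard_for(order_id, 5) for order_id, _ in orders}
print(placement, default_lookup)
```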
GET /<index>/_explain/<id>

Explain exactly why one specific document does or does not match a query, and how its BM25 score was computed term by term. The right tool for "why is this result ranked here?" relevance debugging.

Examples
GET /products/_explain/1 {"query":{"match":{"description":"wireless"}}}
POST /<index>/_termvectors/<id>

Return the term vectors of a document — per-field terms with frequency, position, offset, and payload. Used to debug analysis, build "more like this" features, or inspect what actually got indexed.

Examples
POST /products/_termvectors/1 {"fields":["description"],"term_statistics":true}
Mapping (18)
PUT /<index>/_mapping

Add new fields or sub-fields to an existing index mapping. Adding a new field is allowed; changing a field type is NOT — reindex into a new index for type changes.

Common pitfall: Once a field is mapped as keyword, you cannot turn it into text in place — you have to create a new index with the right mapping and reindex.

Examples
PUT /products/_mapping {"properties":{"tags":{"type":"keyword"},"description":{"type":"text","analyzer":"english"}}}
GET /<index>/_mapping

Inspect the current mapping of an index — every property, its type, its analyzer, its sub-fields. Use this before writing any query so you query the right field type.

Examples
GET /products/_mapping
GET /products/_mapping/field/name
type: text vs keyword

text = analyzed into tokens for full-text search (match queries); keyword = stored exactly as one token for filtering, sorting, aggregations. Most string fields want BOTH via a multi-field.

Common pitfall: You CANNOT aggregate or sort on a text field by default (fielddata is off). Always declare strings as text with a keyword sub-field: {"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}.

Examples
{"properties":{"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}
type: date

Date field. Accepts ISO 8601, millis-since-epoch, or any pattern declared in format. Internally stored as long millis — range queries are extremely fast.

Examples
{"properties":{"@timestamp":{"type":"date","format":"strict_date_optional_time||epoch_millis"}}}
type: nested

Nested object — preserves the relationship between sub-fields within an array of objects. Required when you need to query "any one element matches BOTH fields A and B at once".

Common pitfall: Plain object arrays flatten — {"tags":[{"k":"x","v":1},{"k":"y","v":2}]} becomes {tags.k:[x,y], tags.v:[1,2]} and a bool query "k=x AND v=2" matches falsely. Use nested when AND inside one object matters.

Examples
{"properties":{"variants":{"type":"nested","properties":{"sku":{"type":"keyword"},"price":{"type":"double"}}}}}
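The false match described in the pitfall can be reproduced without a cluster. This sketch mimics, in plain Python, how an object array is flattened into one value list per leaf and compares that with per-element (nested) evaluation; the flatten helper is illustrative and not how Lucene stores data internally.

```python
def flatten(objects, prefix):
    """Mimic a plain object array at index time: one multi-valued list per leaf."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{prefix}.{key}", []).append(value)
    return flat

tags = [{"k": "x", "v": 1}, {"k": "y", "v": 2}]

# object mapping: each condition is checked against an independent value list
flat = flatten(tags, "tags")
object_match = "x" in flat["tags.k"] and 2 in flat["tags.v"]

# nested mapping: both conditions must hold inside the same array element
nested_match = any(t["k"] == "x" and t["v"] == 2 for t in tags)

print(flat, object_match, nested_match)
```

The object mapping reports a match for "k=x AND v=2" even though no single element has both values; the nested evaluation correctly reports none.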
type: geo_point

Geo-point field — supports lat/lon, [lon,lat] array, "lat,lon" string, geohash. Enables geo_distance, geo_bounding_box, geo_polygon queries and geo_distance sorts.

Examples
{"properties":{"location":{"type":"geo_point"}}}
POST /stores/_doc {"location":{"lat":40.71,"lon":-74.0}}
dynamic: true · false · strict

Controls what happens when a doc with an unknown field arrives. true = auto-create (default, dangerous); false = ignore unknown fields; strict = reject the whole document with an error.

Common pitfall: Default dynamic:true on user-controlled input causes "mapping explosion" — millions of auto-created fields blow up cluster state. Always set dynamic:strict on user-supplied JSON.

Examples
{"mappings":{"dynamic":"strict","properties":{"name":{"type":"text"}}}}
multi-field (fields)

Index one source field as multiple sub-fields with different analyzers/types — full-text search on name, exact match on name.keyword, ngram autocomplete on name.autocomplete.

Examples
{"properties":{"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256},"english":{"type":"text","analyzer":"english"}}}}}
index: false

Store but do not index a field — saves disk and memory, but the field becomes unsearchable. Use for fields you only ever return in _source (display labels, raw HTML, blobs).

Examples
{"properties":{"raw_html":{"type":"text","index":false}}}
numeric types (long, integer, double, scaled_float)

Pick the smallest numeric type that fits — byte/short/integer/long for ints, float/double/half_float for reals. scaled_float stores a float as a scaled long (price × 100), which is smaller and faster than double for fixed-precision money.

Common pitfall: Do NOT map an identifier like an order_id or phone number as a numeric type just because it looks like digits — you will never do range math on it, and keyword indexes and aggregates it far more efficiently.

Examples
{"properties":{"price":{"type":"scaled_float","scaling_factor":100}}}
{"properties":{"order_id":{"type":"keyword"}}}
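What scaled_float stores can be shown with two lines of arithmetic: multiply by the scaling factor and round to an integer. A minimal sketch with hypothetical helper names; Python's round() is used here for illustration, and exact half-way cases may round differently from Elasticsearch.

```python
def to_scaled(value: float, scaling_factor: int = 100) -> int:
    """Integer kept by scaled_float: value times the factor, rounded."""
    return round(value * scaling_factor)

def from_scaled(stored: int, scaling_factor: int = 100) -> float:
    """Value as seen by queries and aggregations."""
    return stored / scaling_factor

print(to_scaled(19.99), from_scaled(to_scaled(19.99)))
print(to_scaled(0.126), from_scaled(to_scaled(0.126)))
```

With scaling_factor 100, anything finer than one cent is lost at index time (0.126 is treated as 0.13), which is fine for prices and wrong for values that need more precision.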
type: ip

IP field — stores IPv4 and IPv6, and supports CIDR range queries directly ("everything in 10.0.0.0/8"). Far better than storing IPs as keyword strings, which cannot do subnet matching.

Examples
{"properties":{"client_ip":{"type":"ip"}}}
{"query":{"term":{"client_ip":"10.0.0.0/8"}}}
type: boolean

Boolean field — accepts true/false, plus the JSON strings "true"/"false". Indexed as a single term, so term/filter queries on it are extremely cheap and cacheable.

Examples
{"properties":{"in_stock":{"type":"boolean"}}}
{"query":{"term":{"in_stock":true}}}
type: object vs flattened

A plain object maps every sub-key as its own field (good for known shapes, bad for unbounded keys). flattened indexes an entire JSON object as ONE field of keyword-like leaves — perfect for arbitrary metadata/labels that would otherwise explode the mapping.

Common pitfall: flattened fields are exact-match only — no full-text analysis, no per-leaf type, no numeric range. Use it to TAME unbounded keys, not to search them like text.

Examples
{"properties":{"labels":{"type":"flattened"}}}
{"query":{"term":{"labels.env":"prod"}}}
type: completion (suggester)

Purpose-built type backing the completion suggester — an in-memory FST optimized for fast prefix autocomplete with weights. Use it for type-ahead search boxes instead of edge_ngram when you want ranked suggestions.

Examples
{"properties":{"suggest":{"type":"completion"}}}
POST /products/_search {"suggest":{"s":{"prefix":"head","completion":{"field":"suggest","size":5}}}}
type: dense_vector (kNN)

Store float vectors for semantic / kNN search (8.0+). With index:true and an HNSW similarity (cosine, dot_product, l2_norm) you get approximate nearest-neighbour search — the backbone of vector / embedding retrieval in ES.

Common pitfall: For dot_product similarity ES requires unit-length (normalized) float vectors, and vectors that are not unit length are rejected rather than rescaled. Normalize embeddings before indexing or use cosine.

Examples
{"properties":{"embedding":{"type":"dense_vector","dims":768,"index":true,"similarity":"cosine"}}}
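Normalizing before indexing is a one-liner per vector, and once vectors are unit length the dot product equals the cosine similarity. A stdlib-only sketch (a real pipeline would more likely use NumPy); all function names are illustrative.

```python
import math

def normalize(vec):
    """Scale vec to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [4.0, 3.0]
ua, ub = normalize(a), normalize(b)
print(dot(ua, ub), cosine(a, b))
```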
type: alias

Field alias — a virtual name that points at a real field for queries and aggregations, without copying data. Lets you rename a field in your query layer (e.g. "ts" → "@timestamp") without reindexing.

Common pitfall: Aliases work for read paths (query, agg, sort) only — you cannot index INTO an alias field, and they cannot point at an object or another alias.

Examples
{"properties":{"@timestamp":{"type":"date"},"ts":{"type":"alias","path":"@timestamp"}}}
copy_to

Copy several source fields into one combined searchable field at index time — search "John Smith" across first_name + last_name without a multi_match. The combined field is searchable but not returned in _source.

Examples
{"properties":{"first_name":{"type":"text","copy_to":"full_name"},"last_name":{"type":"text","copy_to":"full_name"},"full_name":{"type":"text"}}}
runtime field

Define a field computed by a Painless script AT QUERY TIME, not stored on disk (schema-on-read). Add or fix a field on existing data with zero reindex — pay a small per-query CPU cost instead of disk.

Common pitfall: Runtime fields are computed per matching doc on every query — cheap for filtering a few results, expensive when sorted/aggregated over millions. Promote a hot runtime field to an indexed field once the shape stabilizes.

Examples
PUT /logs/_mapping {"runtime":{"status_class":{"type":"keyword","script":{"source":"emit(doc['status'].value >= 500 ? '5xx' : 'ok')"}}}}
Query DSL (31)
match

Full-text query — analyzes the query string with the field analyzer, matches any token, scores by BM25. The default tool for searching text fields.

Examples
GET /products/_search {"query":{"match":{"description":"wireless headphones"}}}
GET /_search {"query":{"match":{"description":{"query":"wireless headphones","operator":"and","minimum_should_match":"75%"}}}}
match_phrase

Phrase query — tokens must appear in the same order, with no gaps (or up to slop). Use for "exact phrase" search; combine with slop=2 to allow a couple of words between.

Examples
{"query":{"match_phrase":{"description":"wireless noise cancelling"}}}
{"query":{"match_phrase":{"description":{"query":"wireless headphones","slop":2}}}}
multi_match

Run the same query string against multiple fields at once, optionally with per-field boosts. Type "best_fields" picks the best-scoring field; "cross_fields" treats them as one big field.

Examples
{"query":{"multi_match":{"query":"wireless","fields":["name^3","description"],"type":"best_fields"}}}
term

Exact value query — NOT analyzed. Use on keyword / numeric / boolean / date fields. Searching "Apple" with term against a text field will miss everything because text was lowercased on index.

Common pitfall: term on a text field almost always returns 0 hits — text was lowercased and tokenized at index time, so "Apple" became "apple". Either query against name.keyword or use a match query.

Examples
{"query":{"term":{"status":"active"}}}
{"query":{"term":{"name.keyword":"Apple"}}}
terms

Match if the field equals ANY of the supplied values — the ES equivalent of SQL IN (...). Default cap is 65,536 terms; raise index.max_terms_count if you really need more.

Examples
{"query":{"terms":{"status":["active","pending","review"]}}}
range

Range query for numeric, date, or ip fields. Date math is supported — now-1d, now/d (rounded to day), now+5m, etc. Inclusive (gte/lte) and exclusive (gt/lt) bounds.

Examples
{"query":{"range":{"price":{"gte":10,"lte":100}}}}
{"query":{"range":{"@timestamp":{"gte":"now-7d/d","lt":"now/d"}}}}
bool (must / should / must_not / filter)

Boolean combinator. must = AND (scored), should = OR (scored, contributes to score), must_not = NOT (unscored), filter = AND (unscored, cacheable — fastest). Use filter wherever scoring does not matter.

Common pitfall: Putting equality / range clauses in must instead of filter wastes scoring CPU AND skips the filter cache. Move every "is this true?" clause to filter and only put "rank these" clauses in must.

Examples
{"query":{"bool":{"must":[{"match":{"description":"wireless"}}],"filter":[{"term":{"in_stock":true}},{"range":{"price":{"lte":200}}}],"must_not":[{"term":{"discontinued":true}}]}}}
wildcard

Wildcard match with * (zero or more chars) and ? (single char). Runs on keyword (or text with caveats). Leading-wildcard ("*foo") forces a full index scan — slow on large indices.

Common pitfall: Leading wildcard ("*foo*") is the slowest possible query — it cannot use the inverted index. For "contains" search, use a wildcard field (7.9+) or ngram analyzer instead.

Examples
{"query":{"wildcard":{"sku.keyword":{"value":"SKU-*-RED"}}}}
regexp

Regex match against a keyword field. Supports Lucene flavor regex (not full PCRE). Always anchored to the full term — no need for ^ or $.

Common pitfall: Regex queries can be exponentially slow on long terms; cap with max_determinized_states. For very high-cardinality fields, an ngram analyzer is faster than regex.

Examples
{"query":{"regexp":{"sku.keyword":{"value":"[A-Z]{3}-[0-9]{4}","max_determinized_states":10000}}}}
prefix

Prefix match — find terms starting with the given string. Cheap on keyword fields because the inverted index is sorted. Use this for type-ahead, not wildcard.

Examples
{"query":{"prefix":{"name.keyword":{"value":"Apple"}}}}
fuzzy

Levenshtein-distance fuzzy match — tolerates spelling mistakes. AUTO picks 0/1/2 edits based on term length. Slow on long terms; cap with prefix_length and max_expansions.

Examples
{"query":{"fuzzy":{"name":{"value":"helo","fuzziness":"AUTO","prefix_length":1}}}}
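The AUTO rule is a simple length threshold. The sketch below mirrors the documented defaults (AUTO is equivalent to AUTO:3,6): terms shorter than 3 characters must match exactly, terms of 3 to 5 characters allow one edit, longer terms allow two. The function name is illustrative.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Maximum edit distance that fuzziness AUTO:low,high allows for a term."""
    length = len(term)
    if length < low:
        return 0
    if length < high:
        return 1
    return 2

for term in ("hi", "helo", "hello", "search", "wireless"):
    print(term, auto_fuzziness(term))
```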
exists

Match documents where the field exists (has a non-null, non-empty value). The closest thing to IS NOT NULL in ES. Combine in must_not for IS NULL.

Examples
{"query":{"exists":{"field":"email"}}}
{"query":{"bool":{"must_not":[{"exists":{"field":"email"}}]}}}
ids

Fetch a small set of documents by id from one or more indices through the search API. Prefer it over _mget when you want the query interface (scoring, highlighting, combining with other clauses).

Examples
{"query":{"ids":{"values":["1","2","42"]}}}
nested query

Query inside a nested field — required to enforce "all conditions match the same nested object". Use inner_hits to return which nested element actually matched.

Examples
{"query":{"nested":{"path":"variants","query":{"bool":{"must":[{"term":{"variants.color":"red"}},{"range":{"variants.price":{"lte":50}}}]}},"inner_hits":{}}}}
geo_distance

Match documents within distance D of a point. Pair with sort: _geo_distance for "nearest first". Default distance unit is meters; supports km, mi.

Examples
{"query":{"geo_distance":{"distance":"5km","location":{"lat":40.71,"lon":-74.0}}}}
function_score

Custom scoring — modify BM25 by recency, popularity, geo distance, random, or a script. Use score_mode and boost_mode to blend the original score with the modifier.

Examples
{"query":{"function_score":{"query":{"match":{"description":"wireless"}},"functions":[{"gauss":{"@timestamp":{"origin":"now","scale":"30d"}}}],"boost_mode":"multiply"}}}
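The gauss function in the example multiplies each BM25 score by a factor between 0 and 1. The sketch below follows the decay formula from the Elasticsearch documentation, with the default decay of 0.5 and offset of 0; distances and scale are plain numbers in the same unit (days here), which is a simplification of the date handling.

```python
import math

def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Multiplier applied to the score for a value `distance` from the origin."""
    # sigma^2 is chosen so the multiplier equals `decay` at distance offset + scale.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    effective = max(0.0, abs(distance) - offset)
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))

for age_days in (0, 30, 60):
    print(age_days, gauss_decay(age_days, scale=30))
```

With scale 30d and boost_mode multiply, a document from today keeps its full score, a 30-day-old one keeps half, and a 60-day-old one keeps about 6%.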
highlight

Return per-hit snippets with matched terms wrapped in <em>...</em> (configurable). Three highlighters: unified (default, balanced), plain (slow but accurate), fvh (fast, needs term_vector=with_positions_offsets).

Examples
{"query":{"match":{"description":"wireless"}},"highlight":{"fields":{"description":{"fragment_size":150,"number_of_fragments":3}}}}
search_after (deep pagination)

Stable deep pagination — pass the sort values of the last hit as search_after on the next request. Replaces from + size for paging past 10,000.

Common pitfall: from + size deep paging breaks at index.max_result_window=10000 by default. Even before that limit, every shard has to collect from + size hits and the coordinating node has to merge them all. Use search_after, ideally with a PIT.

Examples
{"size":20,"sort":[{"@timestamp":"desc"},{"_id":"asc"}],"search_after":["2026-05-26T10:00:00Z","abc123"]}
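The paging loop is easier to see against an in-memory list than against a cluster. This sketch imitates a search sorted by timestamp descending with an id tiebreaker, and resumes each page from the sort values of the previous page's last hit. The search function and the numeric ts field are stand-ins for illustration, not client code.

```python
def search(docs, size, search_after=None):
    """One page sorted by (ts desc, id asc), starting after the given sort values."""
    def sort_key(d):
        return (-d["ts"], d["id"])  # negate ts to get descending order

    ordered = sorted(docs, key=sort_key)
    if search_after is not None:
        ts, doc_id = search_after
        ordered = [d for d in ordered if sort_key(d) > (-ts, doc_id)]
    return ordered[:size]

# Five docs, some sharing a timestamp, so the tiebreaker matters.
docs = [{"id": f"d{i}", "ts": 100 + i // 2} for i in range(5)]

seen, after = [], None
while True:
    page = search(docs, size=2, search_after=after)
    if not page:
        break
    seen.extend(d["id"] for d in page)
    after = (page[-1]["ts"], page[-1]["id"])  # sort values of the last hit

print(seen)
```

Without the second sort key, documents sharing a timestamp with the last hit could be skipped or repeated across pages, which is why the example sort carries a tiebreaker.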
scroll · point_in_time

Scroll = legacy snapshot for offline export. PIT (point-in-time, 7.10+) = the modern replacement, pairs with search_after for resumable, snapshot-consistent paging.

Examples
POST /products/_pit?keep_alive=5m
GET /_search {"pit":{"id":"<pit-id>","keep_alive":"5m"},"size":100,"sort":[{"_shard_doc":"asc"}]}
match_all · match_none

match_all returns every document with a constant score of 1 — the default query and the right way to "give me everything" (paginated). match_none returns nothing, useful as a placeholder in templated queries.

Examples
{"query":{"match_all":{}}}
{"query":{"match_none":{}}}
terms_set

Match if a minimum number of the supplied terms are present, where the threshold comes from a field or script — "match if at least N of these skills are listed". The DSL way to express "any 3 of 5".

Examples
{"query":{"terms_set":{"skills":{"terms":["es","sql","python"],"minimum_should_match_field":"required_matches"}}}}
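The matching rule behind terms_set with minimum_should_match_field can be stated in a few lines: count how many of the query terms the document holds and compare that with the threshold stored on the document itself. A plain-Python sketch with made-up sample documents; the function name is illustrative.

```python
def terms_set_match(doc, field, terms, minimum_should_match_field):
    """True if doc[field] contains at least doc[threshold field] of the terms."""
    matched = len(set(doc[field]) & set(terms))
    return matched >= doc[minimum_should_match_field]

query_terms = ["es", "sql", "python"]
candidates = [
    {"name": "a", "skills": ["es", "python"], "required_matches": 2},
    {"name": "b", "skills": ["es", "go", "rust"], "required_matches": 2},
    {"name": "c", "skills": ["sql"], "required_matches": 1},
]

hits = [d["name"] for d in candidates
        if terms_set_match(d, "skills", query_terms, "required_matches")]
print(hits)
```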
constant_score

Wrap a filter so every match gets the same fixed score (boost), skipping BM25 entirely. Use it when you want filter-cache speed but still need the clause inside a should to contribute a flat boost.

Examples
{"query":{"constant_score":{"filter":{"term":{"in_stock":true}},"boost":1.2}}}
dis_max

Disjunction-max — run several queries, take the single best-scoring one as the result score (plus tie_breaker × the rest). Stops a document from being double-counted when the same term matches multiple fields.

Examples
{"query":{"dis_max":{"queries":[{"match":{"title":"wireless"}},{"match":{"body":"wireless"}}],"tie_breaker":0.3}}}
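The scoring rule is short enough to write out: the best sub-query score, plus tie_breaker times the sum of the others. A minimal sketch with illustrative scores; the function name is hypothetical.

```python
def dis_max_score(scores, tie_breaker=0.0):
    """Combine per-query scores the way dis_max does."""
    if not scores:
        return 0.0
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)

title_score, body_score = 2.0, 1.0
print(dis_max_score([title_score, body_score]))                   # best field only
print(dis_max_score([title_score, body_score], tie_breaker=0.3))  # partial credit
print(title_score + body_score)                                   # bool/should would add both
```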
query_string · simple_query_string

Parse a Lucene-syntax string ("wireless AND (headphones OR earbuds) -wired") into a query. query_string is powerful but throws on bad syntax; simple_query_string never errors — safe for raw end-user input.

Common pitfall: Never expose raw query_string to untrusted users — a malformed or hostile query (deep wildcards, huge boolean trees) can error or hammer the cluster. Use simple_query_string for public search boxes.

Examples
{"query":{"simple_query_string":{"query":"wireless +headphones -wired","fields":["name^2","description"]}}}
more_like_this (MLT)

Find documents similar to a given text or set of seed documents, based on shared significant terms. The classic "related articles" / "more like this product" feature without embeddings.

Examples
{"query":{"more_like_this":{"fields":["title","body"],"like":[{"_index":"articles","_id":"42"}],"min_term_freq":1,"max_query_terms":12}}}
knn search (vector)

Approximate k-nearest-neighbour search over a dense_vector field using HNSW (8.x top-level knn). Returns the k closest vectors to a query vector — the retrieval half of semantic / RAG search. Combine with a filter to restrict the candidate set.

Examples
POST /docs/_search {"knn":{"field":"embedding","query_vector":[0.12,0.83],"k":10,"num_candidates":100,"filter":{"term":{"lang":"zh"}}}}
distance_feature

Boost documents by how close a date or geo_point field is to an origin, decaying with distance — promote recent or nearby results cheaply inside a bool query. Faster than function_score for the common recency/proximity boost.

Examples
{"query":{"bool":{"must":{"match":{"description":"coffee"}},"should":{"distance_feature":{"field":"@timestamp","origin":"now","pivot":"7d"}}}}}
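The score contributed by distance_feature follows the formula given in the Elasticsearch documentation: boost × pivot / (pivot + distance). The sketch uses days as the unit for both pivot and distance, which is a simplification of the date arithmetic.

```python
def distance_feature_score(distance, pivot, boost=1.0):
    """Score added by distance_feature for a value `distance` from the origin."""
    return boost * pivot / (pivot + distance)

for age_days in (0, 7, 21):
    print(age_days, distance_feature_score(age_days, pivot=7))
```

The pivot is the distance at which the contribution halves: with pivot 7d, a week-old document adds 0.5 × boost on top of its match score.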
sort (multi-field, missing, mode)

Sort hits by one or more fields, each asc/desc, with missing:_first/_last for null handling and mode (min/max/avg/sum/median) to collapse array fields. Sorting by a field switches off score computation unless you ask for track_scores.

Examples
{"sort":[{"price":{"order":"asc","missing":"_last"}},{"_score":"desc"}]}
{"sort":[{"ratings":{"order":"desc","mode":"avg"}}]}
_source filtering · stored_fields

Control what each hit returns — _source:false drops the body entirely, _source:{includes,excludes} projects fields, fields:[...] returns formatted values (good for runtime fields and dates). Trims payload on wide documents.

Examples
{"_source":{"includes":["name","price"]},"query":{"match_all":{}}}
{"_source":false,"fields":["name","@timestamp"],"query":{"match_all":{}}}
GET /_search/template

Run a stored Mustache search template with runtime params — keep the query shape on the server, pass only values from the client. Cleaner and safer than building query JSON by string concatenation.

Examples
POST /_scripts/find_by_status {"script":{"lang":"mustache","source":{"query":{"term":{"status":"{{s}}"}}}}}
GET /products/_search/template {"id":"find_by_status","params":{"s":"active"}}
GET /_search?profile=true

Profile a query — get a per-shard, per-component breakdown of where time went (which subquery, rewrite, collector). The definitive tool for "why is this search slow?" once you have ruled out mapping mistakes.

Examples
GET /products/_search {"profile":true,"query":{"bool":{"must":[{"match":{"description":"wireless"}}]}}}
Aggregations (22)
terms agg

Group by field value, like SQL GROUP BY. Returns the top N buckets by doc count (default size=10). The single most-used aggregation — every facet, every "top sellers" report goes through this.

Common pitfall: terms agg on a high-cardinality field with default size=10 misses the long tail AND is approximate — see doc_count_error_upper_bound. Use composite agg when you need ALL buckets paginated.

Examples
{"size":0,"aggs":{"by_status":{"terms":{"field":"status","size":20}}}}
avg · sum · min · max · stats

Single-value metric aggregations on numeric fields. stats returns all five (count, min, max, avg, sum) in one pass — almost always preferable to running them separately.

Examples
{"size":0,"aggs":{"price_stats":{"stats":{"field":"price"}}}}
{"size":0,"aggs":{"avg_price":{"avg":{"field":"price"}}}}
cardinality

Approximate distinct-count using HyperLogLog++ — bounded memory regardless of cardinality. Tunable via precision_threshold (default 3000, max 40000) to trade accuracy for RAM.

Common pitfall: cardinality is APPROXIMATE — typical error 1-6%. If you need exact counts < 100k, use a terms agg + count buckets. Never use it for billing or compliance numbers.

Examples
{"size":0,"aggs":{"unique_users":{"cardinality":{"field":"user_id","precision_threshold":40000}}}}
date_histogram

Bucket docs by date interval — calendar_interval (1d, 1M, 1y, handles DST and month length) or fixed_interval (30s, 1h, 7d). The backbone of every time-series dashboard.

Examples
{"size":0,"aggs":{"daily":{"date_histogram":{"field":"@timestamp","calendar_interval":"1d","time_zone":"Asia/Shanghai"}}}}
histogram

Numeric histogram — bucket docs by fixed-width interval on a numeric field. Used for price ranges, latency distributions, etc.

Examples
{"size":0,"aggs":{"price_dist":{"histogram":{"field":"price","interval":50,"min_doc_count":1}}}}
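As a rough illustration of how a value lands in a bucket, the sketch below applies the rounding rule the histogram aggregation documents (floor the value to a multiple of the interval, shifted by an optional offset). `histogram_bucket_key` is a hypothetical helper and the prices are invented sample data.

```python
import math


def histogram_bucket_key(value, interval, offset=0.0):
    """Key of the fixed-width bucket that contains value."""
    return math.floor((value - offset) / interval) * interval + offset


prices = [12.5, 49.99, 50, 149, 150]
keys = [histogram_bucket_key(p, interval=50) for p in prices]
print(keys)
```

Each bucket includes its lower edge and excludes its upper edge, which is why 50 starts a new bucket while 49.99 stays in the first one.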
range

Custom range buckets on numeric or date fields — define your own thresholds. Useful when histogram-fixed widths do not match business buckets (e.g. price tiers $0-49, $50-199, $200+).

Examples
{"size":0,"aggs":{"price_tier":{"range":{"field":"price","ranges":[{"to":50},{"from":50,"to":200},{"from":200}]}}}}
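One detail worth seeing in code: each range includes its from value and excludes its to value. The sketch below mimics that rule in plain Python using the same three ranges as the example; `range_bucket_counts` is a hypothetical helper and the prices are invented.

```python
def range_bucket_counts(values, ranges):
    """Count values per range; 'from' is inclusive, 'to' is exclusive."""
    counts = []
    for r in ranges:
        lower = r.get("from", float("-inf"))
        upper = r.get("to", float("inf"))
        counts.append(sum(1 for v in values if lower <= v < upper))
    return counts


ranges = [{"to": 50}, {"from": 50, "to": 200}, {"from": 200}]
prices = [49.99, 50, 199.99, 200]
print(range_bucket_counts(prices, ranges))
```

Because to is exclusive, adjacent tiers can share a boundary value (50, 200) without any document being counted twice.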
percentiles · percentile_ranks

Approximate percentile aggregations using t-digest or HDR algorithms. percentiles gives p50/p95/p99; percentile_ranks gives "what % is below X". The backbone of latency SLO dashboards.

Examples
{"size":0,"aggs":{"latency_p":{"percentiles":{"field":"latency_ms","percents":[50,95,99]}}}}
composite agg (paginate all buckets)

Walk EVERY bucket of one or more fields in stable order via after_key — the only safe way to "list all unique combinations" without loading them into memory at once.

Examples
{"size":0,"aggs":{"all_combos":{"composite":{"size":1000,"sources":[{"status":{"terms":{"field":"status"}}},{"day":{"date_histogram":{"field":"@timestamp","calendar_interval":"1d"}}}]}}}}
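Paging works by copying the after_key from each response into the after parameter of the next request, and stopping once a page comes back with no buckets. A minimal sketch of that step, assuming the standard response shape; `next_composite_request` is a hypothetical helper and the response below is hand-written, not real cluster output.

```python
import copy


def next_composite_request(request_body, response, agg_name):
    """Return the request body for the next page, or None when paging is done."""
    agg = response["aggregations"][agg_name]
    if not agg.get("buckets") or "after_key" not in agg:
        return None
    next_body = copy.deepcopy(request_body)  # leave the caller's body untouched
    next_body["aggs"][agg_name]["composite"]["after"] = agg["after_key"]
    return next_body


first_request = {
    "size": 0,
    "aggs": {"all_combos": {"composite": {
        "size": 1000,
        "sources": [{"status": {"terms": {"field": "status"}}}],
    }}},
}
# Hand-written stand-in for a search response.
fake_response = {"aggregations": {"all_combos": {
    "after_key": {"status": "shipped"},
    "buckets": [{"key": {"status": "shipped"}, "doc_count": 42}],
}}}
second_request = next_composite_request(first_request, fake_response, "all_combos")
print(second_request["aggs"]["all_combos"]["composite"]["after"])
```

In a real client you would send each returned body to _search in a loop until the helper returns None.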
sub-aggregations (nested aggs)

Aggregations nest — every bucket can host metric or bucket sub-aggs. "Average price per status, broken down by day" is two levels deep and reads naturally.

Examples
{"size":0,"aggs":{"by_status":{"terms":{"field":"status"},"aggs":{"by_day":{"date_histogram":{"field":"@timestamp","calendar_interval":"1d"},"aggs":{"avg_price":{"avg":{"field":"price"}}}}}}}}
filter agg

Restrict a sub-aggregation to a subset of docs — like a one-off WHERE without affecting the outer query. Cheaper than re-running the query for each subset.

Examples
{"size":0,"aggs":{"in_stock_avg":{"filter":{"term":{"in_stock":true}},"aggs":{"avg_price":{"avg":{"field":"price"}}}}}}
top_hits agg

Inside each bucket, return the top N actual documents — by score or custom sort. Used for "show 3 sample docs per category" or "latest event per user" patterns.

Examples
{"size":0,"aggs":{"per_cat":{"terms":{"field":"category"},"aggs":{"sample":{"top_hits":{"size":3,"sort":[{"@timestamp":"desc"}]}}}}}}
significant_terms

Find terms that are statistically over-represented in a subset compared to the full corpus — surface "what makes this group different" without manual tuning.

Examples
{"size":0,"query":{"term":{"status":"fraud"}},"aggs":{"why_fraud":{"significant_terms":{"field":"ip_country"}}}}
value_count

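Count the values extracted for a field, the metric counterpart of SQL COUNT(field). It counts values rather than documents: docs where the field is missing contribute nothing and a multi-valued field contributes one per value, so the result can differ from the bucket doc_count in either direction.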

Examples
{"size":0,"aggs":{"with_email":{"value_count":{"field":"email"}}}}
extended_stats

Like stats but adds variance, standard deviation, sum of squares, and standard-deviation bounds — the metrics you need for outlier detection and statistical anomaly thresholds.

Examples
{"size":0,"aggs":{"lat":{"extended_stats":{"field":"latency_ms","sigma":3}}}}
date_range

Bucket dates into named, explicit ranges (with date-math bounds like now-1M/M) — "this month vs last month vs older". Cleaner than date_histogram when you only need a handful of meaningful buckets.

Examples
{"size":0,"aggs":{"buckets":{"date_range":{"field":"@timestamp","ranges":[{"key":"last_7d","from":"now-7d/d"},{"key":"older","to":"now-7d/d"}]}}}}
nested agg

Step INTO a nested field so its sub-aggregations see one document per nested object, not one per parent doc. Required to aggregate "average price across all variants" correctly when variants is a nested type. Pair with reverse_nested to climb back out.

Examples
{"size":0,"aggs":{"variants":{"nested":{"path":"variants"},"aggs":{"avg_price":{"avg":{"field":"variants.price"}}}}}}
filters agg (multi-bucket)

Define several named filters and get one bucket per filter in a single pass — "errors vs warnings vs info" counted together without three separate queries. The plural sibling of the single filter agg.

Examples
{"size":0,"aggs":{"levels":{"filters":{"filters":{"errors":{"term":{"level":"error"}},"warnings":{"term":{"level":"warn"}}}}}}}
global agg

Break a sub-aggregation out of the surrounding query so it sees ALL documents, not just the matched ones — compute "average price of everything" alongside "average price of the search results" in one request.

Examples
{"query":{"match":{"name":"wireless"}},"aggs":{"all":{"global":{},"aggs":{"avg_price":{"avg":{"field":"price"}}}}}}
bucket_script (pipeline)

A pipeline aggregation that computes a value FROM other sibling aggregations — e.g. conversion rate = sales ÷ visits per bucket. Lets you derive ratios and deltas in ES instead of post-processing in the client.

Examples
{"size":0,"aggs":{"by_day":{"date_histogram":{"field":"@timestamp","calendar_interval":"1d"},"aggs":{"sales":{"sum":{"field":"sale"}},"visits":{"sum":{"field":"visit"}},"cvr":{"bucket_script":{"buckets_path":{"s":"sales","v":"visits"},"script":"params.v > 0 ? params.s / params.v : 0"}}}}}}
derivative · cumulative_sum (pipeline)

Pipeline aggs over an ordered histogram — derivative gives the change between consecutive buckets ("daily growth"), cumulative_sum gives a running total ("total signups to date"). Both require a parent date_histogram or histogram.

Examples
{"size":0,"aggs":{"daily":{"date_histogram":{"field":"@timestamp","calendar_interval":"1d"},"aggs":{"signups":{"sum":{"field":"new_users"}},"growth":{"derivative":{"buckets_path":"signups"}},"total":{"cumulative_sum":{"buckets_path":"signups"}}}}}}
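To show what the two pipelines produce, here is a plain-Python sketch over a list of per-bucket sums. The function names mirror the aggregation names but are hypothetical helpers, and the signup numbers are invented.

```python
from itertools import accumulate


def derivative(values):
    """Change versus the previous bucket; the first bucket has nothing to compare to."""
    if not values:
        return []
    return [None] + [curr - prev for prev, curr in zip(values, values[1:])]


def cumulative_sum(values):
    """Running total up to and including each bucket."""
    return list(accumulate(values))


daily_signups = [10, 15, 12, 20]  # one value per date_histogram bucket, in order
print(derivative(daily_signups))
print(cumulative_sum(daily_signups))
```

Both depend on bucket order, which is why they only make sense under a histogram-style parent.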
moving_fn (pipeline)

Run a window function (moving average, min, max, stdDev, linear-weighted) over a sliding window of histogram buckets — smooth a noisy time series or compute a 7-day trailing average right in the query.

Examples
{"size":0,"aggs":{"daily":{"date_histogram":{"field":"@timestamp","calendar_interval":"1d"},"aggs":{"v":{"sum":{"field":"sale"}},"ma7":{"moving_fn":{"buckets_path":"v","window":7,"script":"MovingFunctions.unweightedAvg(values)"}}}}}}
geohash_grid · geotile_grid

Bucket geo_point docs into a grid of cells at a chosen precision — the backend for heatmaps and clustered map markers. geotile_grid aligns to standard web-map tiles (z/x/y), so cells line up perfectly with map tiles.

Examples
{"size":0,"aggs":{"heat":{"geohash_grid":{"field":"location","precision":5}}}}
{"size":0,"aggs":{"tiles":{"geotile_grid":{"field":"location","precision":10}}}}
Analyzers (17)
standard analyzer

Default text analyzer — Unicode word boundaries (UAX #29), lowercase. Solid for most Western languages. NOT for CJK — Chinese / Japanese / Korean need ik / kuromoji / nori.

Examples
POST /_analyze {"analyzer":"standard","text":"Quick brown FOX 2026!"}
POST /_analyze

Test how a given analyzer tokenizes a string — the single most useful debugging tool when "my match query returns nothing". Always run this before opening a ticket.

Examples
POST /_analyze {"analyzer":"english","text":"Running quickly through the parks"}
POST /products/_analyze {"field":"name","text":"Wireless Headphones"}
language analyzers (english, french, ...)

Built-in language analyzers — stemming, stop words, lowercase per locale. english turns "running" into "run", "parks" into "park". Use these on user-facing text search for the right language.

Examples
{"properties":{"description":{"type":"text","analyzer":"english"}}}
custom analyzer

Compose char_filter + tokenizer + token_filter into your own analyzer. The recipe for autocomplete, search-as-you-type, dialect handling, and any non-default tokenization need.

Examples
PUT /demo {"settings":{"analysis":{"analyzer":{"my_a":{"type":"custom","tokenizer":"standard","filter":["lowercase","asciifolding","stop"]}}}}}
edge_ngram (autocomplete)

Generate prefixes of each token at index time — index "wireless" as ["wi","wir","wire",...,"wireless"]. Lets a simple match query power autocomplete with no leading wildcard.

Common pitfall: Use edge_ngram on the index analyzer ONLY. Set search_analyzer to standard / lowercase, or every search-time token also explodes into prefixes and you match way too much.

Examples
{"settings":{"analysis":{"filter":{"edge":{"type":"edge_ngram","min_gram":2,"max_gram":15}},"analyzer":{"ac_idx":{"tokenizer":"standard","filter":["lowercase","edge"]},"ac_search":{"tokenizer":"standard","filter":["lowercase"]}}}}}
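For intuition, the prefix expansion can be reproduced in plain Python. This is a simplified sketch of what the filter emits for a single token, with the same min_gram and max_gram as the example; `edge_ngrams` is a hypothetical helper, not part of any client library.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Prefixes of token from min_gram up to max_gram characters long."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("wireless"))
print(edge_ngrams("a"))  # shorter than min_gram, so nothing is emitted
```

The number of terms grows with token length, which is the index-size cost you pay for wildcard-free prefix matching.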
ngram

Generate all substrings of length min_gram..max_gram from each token — enables "contains" search without wildcards. Costs disk and index time; reserve for short fields.

Examples
{"settings":{"analysis":{"filter":{"ng":{"type":"ngram","min_gram":3,"max_gram":4}},"analyzer":{"ng_a":{"tokenizer":"standard","filter":["lowercase","ng"]}}}}}
synonym filter

Map words to synonyms at index or search time — "tv" ⇄ "television", "laptop" ⇄ "notebook". Search-time synonyms cost no extra disk; index-time synonyms cost no extra query work but require reindex to update.

Examples
{"settings":{"analysis":{"filter":{"syn":{"type":"synonym","synonyms":["tv,television","laptop,notebook"]}},"analyzer":{"syn_a":{"tokenizer":"standard","filter":["lowercase","syn"]}}}}}
stop filter

Remove common stop words ("the", "a", "is") at index or search time. Saves index size and improves relevance — but breaks phrase queries that contain stop words ("to be or not to be").

Examples
{"settings":{"analysis":{"analyzer":{"my_a":{"tokenizer":"standard","filter":["lowercase","english_stop"]}},"filter":{"english_stop":{"type":"stop","stopwords":"_english_"}}}}}
asciifolding filter

Convert non-ASCII characters to their ASCII equivalents — "café" → "cafe", "naïve" → "naive". Lets users find accented terms by typing un-accented forms.

Examples
{"settings":{"analysis":{"analyzer":{"folding":{"tokenizer":"standard","filter":["lowercase","asciifolding"]}}}}}
CJK analyzers (ik, kuromoji, nori)

The standard analyzer does no word segmentation for CJK text: Chinese, for example, comes out as single-character tokens, which gives poor relevance for Chinese/Japanese/Korean search. Install the right plugin: ik for Chinese, kuromoji for Japanese, nori for Korean.

Common pitfall: On managed services (AWS OpenSearch, Elastic Cloud), check whether the analyzer plugin is preinstalled before you ship — installing custom plugins may require a different tier.

Examples
PUT /zh_demo {"settings":{"analysis":{"analyzer":{"ik_a":{"type":"custom","tokenizer":"ik_smart"}}}}}
keyword analyzer

A no-op analyzer that emits the whole input as one token — no splitting, no lowercasing. Used on a text field when you actually want exact, un-tokenized matching but still want it to behave like text (e.g. for highlighting).

Examples
POST /_analyze {"analyzer":"keyword","text":"Wireless Headphones X1"}
whitespace · pattern · simple analyzers

Built-in lightweight analyzers: whitespace splits only on whitespace (keeps case and punctuation), pattern splits on a regex you supply (and lowercases by default), simple splits on non-letters and lowercases. Handy for log tokens, codes, and CSV-like fields.

Examples
POST /_analyze {"analyzer":"whitespace","text":"ERROR 500 /api/v1"}
PUT /demo {"settings":{"analysis":{"analyzer":{"by_comma":{"type":"pattern","pattern":","}}}}}
search_as_you_type field

A field type that builds the prefix and shingle sub-fields for you (._index_prefix holds edge n-grams, ._2gram and ._3gram hold word shingles) so a single multi_match powers as-you-type search without hand-wiring an edge_ngram analyzer.

Examples
{"properties":{"q":{"type":"search_as_you_type"}}}
{"query":{"multi_match":{"query":"head","type":"bool_prefix","fields":["q","q._2gram","q._3gram"]}}}
char_filter (mapping, html_strip, pattern_replace)

Pre-process the raw text BEFORE tokenization — html_strip removes tags, mapping swaps characters ("&" → "and"), pattern_replace runs a regex substitution. Runs before the tokenizer, so it can fix input the tokenizer would mangle.

Examples
PUT /demo {"settings":{"analysis":{"analyzer":{"clean":{"tokenizer":"standard","char_filter":["html_strip"]}}}}}
normalizer (keyword case-insensitive)

A token-filter-only pipeline for keyword fields — no tokenizer, just lowercase/asciifolding so "USA" and "usa" match exactly as one term. The right way to get case-insensitive exact match without switching to text.

Examples
PUT /demo {"settings":{"analysis":{"normalizer":{"lc":{"type":"custom","filter":["lowercase","asciifolding"]}}},"mappings":{"properties":{"country":{"type":"keyword","normalizer":"lc"}}}}}
analyzer vs search_analyzer

A field can use one analyzer at index time and a different one at search time. The classic pairing: edge_ngram on index, standard on search — so "wir" is indexed as prefixes but the query "wir" stays a single token.

Common pitfall: If you only set "analyzer", it applies to BOTH index and search. Forgetting to set search_analyzer with an edge_ngram index analyzer is the #1 cause of "autocomplete matches way too much".

Examples
{"properties":{"name":{"type":"text","analyzer":"ac_idx","search_analyzer":"standard"}}}
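The over-matching problem is easy to reproduce outside Elasticsearch. The sketch below is a toy model, assuming whitespace tokenization, lowercasing, and a match query with the default OR operator; every function in it is hypothetical.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def index_terms(text):
    """Terms stored at index time: edge n-grams of every lowercased token."""
    return {gram for tok in text.lower().split() for gram in edge_ngrams(tok)}


def matches(query, doc_text, ngrams_at_search_time):
    """True if any query term is among the indexed terms (OR semantics)."""
    tokens = query.lower().split()
    if ngrams_at_search_time:
        query_terms = {gram for tok in tokens for gram in edge_ngrams(tok)}
    else:
        query_terms = set(tokens)
    return bool(query_terms & index_terms(doc_text))


# Same analyzer on both sides: "wireless" expands to "wi", which also prefixes "wine".
print(matches("wireless", "Wine glasses", ngrams_at_search_time=True))
# Plain search analyzer: only the full query token has to be an indexed prefix.
print(matches("wireless", "Wine glasses", ngrams_at_search_time=False))
print(matches("wir", "Wireless Headphones", ngrams_at_search_time=False))
```

The first call is the false positive the pitfall describes; the other two show the behaviour you get once search_analyzer is set.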
stemmer · kstem · porter

Token filters that reduce words to a root form: "running"/"runs" → "run" (algorithmic stemmers generally do not map irregular forms such as "ran"). Choose algorithm by aggressiveness: kstem (light, keeps real words), porter/porter2 (classic), or a language-specific stemmer. Over-stemming hurts precision.

Examples
{"settings":{"analysis":{"filter":{"en_stem":{"type":"stemmer","language":"light_english"}},"analyzer":{"a":{"tokenizer":"standard","filter":["lowercase","en_stem"]}}}}}
Cluster & _cat (18)
GET /_cluster/health

Cluster health summary — status (green/yellow/red), node count, shard count, unassigned shards. green = all primaries + replicas assigned, yellow = primaries OK but some replicas unassigned, red = some primaries unassigned (data loss risk).

Examples
GET /_cluster/health
GET /_cluster/health?level=indices&wait_for_status=green&timeout=30s
GET /_cluster/stats

Cluster-wide stats — total docs, store size, JVM heap, OS load, indexing / search rates. The "top-level health snapshot" before drilling into nodes/indices.

Examples
GET /_cluster/stats
GET /_cluster/settings

Show cluster-wide persistent and transient settings (add ?include_defaults=true to see defaults as well). persistent survives a full cluster restart; transient is cleared on restart, so you almost always want persistent.

Examples
GET /_cluster/settings?include_defaults=true
PUT /_cluster/settings {"persistent":{"cluster.routing.allocation.disk.watermark.low":"85%"}}
GET /_cluster/allocation/explain

Explain why a shard is or is not assigned to a node — the single best tool for debugging "yellow cluster forever" or "shard stuck in INITIALIZING". Returns the actual reason per node.

Examples
GET /_cluster/allocation/explain {"index":"products","shard":0,"primary":true}
GET /_cat/health

_cat APIs — human-friendly text tables, perfect for terminals and scripts. /_cat/health is the at-a-glance "is the cluster OK" check.

Examples
GET /_cat/health?v
GET /_cat/health?h=status,node.total,shards
GET /_cat/nodes

List every node — IP, role (m=master, d=data, i=ingest), heap %, CPU, load. Add ?v for a header row and ?h=name,role,heap.percent,ram.percent for tighter output.

Examples
GET /_cat/nodes?v
GET /_cat/nodes?h=name,role,heap.percent,ram.percent,cpu&v
GET /_cat/indices

Per-index summary — health, status, primary/replica counts, doc count, store size. Add ?s=store.size:desc to find your biggest indices fast.

Examples
GET /_cat/indices?v
GET /_cat/indices?v&s=store.size:desc&bytes=gb
GET /_cat/shards

List every shard with state (STARTED, INITIALIZING, RELOCATING, UNASSIGNED), node, size. Use ?h=index,shard,prirep,state,unassigned.reason to debug unassigned shards.

Examples
GET /_cat/shards?v
GET /_cat/shards?h=index,shard,prirep,state,unassigned.reason&v
GET /_cat/aliases · /_cat/templates · /_cat/recovery

More _cat helpers: aliases lists alias→index mappings, templates lists index templates by pattern, recovery shows shard recovery progress (useful during rolling restart).

Examples
GET /_cat/aliases?v
GET /_cat/templates?v
GET /_cat/recovery?v&active_only=true
GET /_nodes/stats

Detailed per-node stats — JVM, OS, process, fs, indices, thread_pool. Drill in with /_nodes/stats/thread_pool to find a saturated search / write thread pool.

Examples
GET /_nodes/stats/jvm,thread_pool
GET /_nodes/<node>/stats/indices/search
GET /_tasks

List long-running tasks (reindex, update_by_query, force-merge, snapshot) with progress. Cancel via POST /_tasks/<id>/_cancel — your kill-switch for runaway operations.

Examples
GET /_tasks?actions=*reindex&detailed=true
POST /_tasks/<task-id>/_cancel
GET /_nodes/hot_threads

Sample the hottest (most CPU-busy) threads across nodes — shows the actual stack traces eating CPU right now. The go-to when a node is pegged at 100% and you need to know which operation is to blame.

Examples
GET /_nodes/hot_threads?threads=3&interval=500ms
GET /_cat/thread_pool

Per-node thread-pool stats — active, queue, rejected counts for write, search, and other pools. A growing queue or non-zero rejected on the write/search pool is the clearest signal the cluster is overloaded.

Examples
GET /_cat/thread_pool/write,search?v&h=node_name,name,active,queue,rejected
POST /_cluster/reroute?retry_failed=true

Ask the cluster to re-attempt assigning shards that hit the max allocation-retry limit (5 by default) — the manual nudge after you have fixed the underlying cause (freed disk, restarted a node) of an UNASSIGNED shard.

Common pitfall: reroute?retry_failed only helps once the ROOT cause is gone — running it while disk is still full or a node is still down just burns the retry budget again. Check allocation/explain first.

Examples
POST /_cluster/reroute?retry_failed=true
GET /_cat/segments · /_cat/fielddata

More _cat diagnostics — segments shows per-shard Lucene segment counts/sizes (high count → consider force-merge on read-only indices), fielddata shows heap consumed by fielddata per field (find the field bloating heap).

Examples
GET /_cat/segments/products?v
GET /_cat/fielddata?v&s=size:desc
GET /_cat/count · /_cat/master · /_cat/allocation

Quick one-liners — count gives total docs across an index pattern, master shows the elected master node, allocation shows disk used/available and shard count per node (spot a lopsided node fast).

Examples
GET /_cat/count/logs-*?v
GET /_cat/master?v
GET /_cat/allocation?v
cluster.routing.allocation.enable

Temporarily control shard allocation. Before a rolling restart, set it to "primaries" (the value Elastic's rolling-restart procedure uses) or "none" so the cluster does not start rebuilding replicas for a node you are about to bring right back. Set back to "all" afterwards.

Common pitfall: Forgetting to set it back to "all" after maintenance leaves new/replica shards permanently UNASSIGNED — a yellow/red cluster that no amount of waiting fixes. Always pair the disable with a re-enable.

Examples
PUT /_cluster/settings {"persistent":{"cluster.routing.allocation.enable":"none"}}
PUT /_cluster/settings {"persistent":{"cluster.routing.allocation.enable":"all"}}
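One way to make the re-enable hard to forget is to wrap the pair in a context manager. This is a sketch under assumptions: `put_cluster_settings` is a hypothetical callable that sends its argument as the body of PUT /_cluster/settings; here a list's append method stands in for it so the snippet runs without a cluster.

```python
from contextlib import contextmanager

ALLOCATION_KEY = "cluster.routing.allocation.enable"


@contextmanager
def allocation_paused(put_cluster_settings, mode="none"):
    """Restrict allocation for the duration of the block, then restore 'all'."""
    put_cluster_settings({"persistent": {ALLOCATION_KEY: mode}})
    try:
        yield
    finally:
        # Runs even if the maintenance step raises.
        put_cluster_settings({"persistent": {ALLOCATION_KEY: "all"}})


sent_bodies = []  # stand-in for real HTTP calls
with allocation_paused(sent_bodies.append):
    pass  # restart the node here
print(sent_bodies)
```

The finally clause is the point of the design: a failed restart script still leaves allocation enabled.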
GET /_cluster/pending_tasks

List cluster-state update tasks still waiting on the master — mapping updates, index creation, settings changes. A long pending-tasks queue means the master is a bottleneck (often from too many shards or rapid mapping churn).

Examples
GET /_cluster/pending_tasks
Replication & snapshots (14)
number_of_replicas

Replicas per primary shard — 1 is the production minimum (data survives one node loss), 2+ for higher availability or read throughput. Set 0 ONLY for throw-away test indices.

Common pitfall: replicas=0 means a single node loss = data loss. Lots of "test" indices accidentally become "kinda production" — set replicas=1 by default in every index template.

Examples
PUT /products/_settings {"index":{"number_of_replicas":2}}
PUT /_index_template/default {"index_patterns":["*"],"template":{"settings":{"number_of_replicas":1}}}
PUT /_snapshot/<repo>

Register a snapshot repository — S3, GCS, Azure, NFS, HDFS, or shared filesystem. Required before you can take or restore any snapshot. The only built-in disaster-recovery path.

Examples
PUT /_snapshot/s3_backups {"type":"s3","settings":{"bucket":"my-es-backups","region":"us-east-1"}}
PUT /_snapshot/<repo>/<snap>

Take a snapshot. Incremental — only new segments since the last snapshot in the same repo are uploaded. Safe to run on a live cluster.

Examples
PUT /_snapshot/s3_backups/2026-05-26?wait_for_completion=false {"indices":"products,logs-*","include_global_state":false}
POST /_snapshot/<repo>/<snap>/_restore

Restore a snapshot — optionally rename indices on restore so you can compare with current data before swapping aliases. The standard "oh shit" rollback path.

Common pitfall: Restore refuses to overwrite an existing OPEN index. Either close the index first or use rename_pattern + rename_replacement to restore to a side-by-side name.

Examples
POST /_snapshot/s3_backups/2026-05-26/_restore {"indices":"products","rename_pattern":"(.+)","rename_replacement":"restored_$1"}
GET /_snapshot/<repo>/_all

List all snapshots in a repo with state, start/end time, shard counts, failures. Required for any backup audit or scripted "find latest good snapshot" logic.

Examples
GET /_snapshot/s3_backups/_all
SLM (snapshot lifecycle management)

Built-in scheduler for periodic snapshots + retention. Beats hand-rolled cron because retention is policy-driven (keep 30 daily, 12 monthly, etc.) and ES enforces it.

Examples
PUT /_slm/policy/daily {"schedule":"0 30 1 * * ?","name":"<daily-{now/d}>","repository":"s3_backups","config":{"indices":["*"]},"retention":{"expire_after":"30d","min_count":7,"max_count":50}}
cross-cluster replication (CCR)

Continuously replicate indices from a leader cluster to one or more follower clusters — DR across regions, read-scaling, or geo-locality. Platinum/Enterprise license.

Examples
PUT /products-replica/_ccr/follow?wait_for_active_shards=1 {"remote_cluster":"leader_cluster","leader_index":"products"}
ILM (index lifecycle management)

Move indices through hot → warm → cold → frozen → delete phases automatically by age, size, or doc count. Pair with rollover for time-based logging at any scale.

Examples
PUT /_ilm/policy/logs_policy {"policy":{"phases":{"hot":{"actions":{"rollover":{"max_age":"7d","max_size":"50gb"}}},"warm":{"min_age":"30d","actions":{"forcemerge":{"max_num_segments":1}}},"delete":{"min_age":"90d","actions":{"delete":{}}}}}}
DELETE /_snapshot/<repo>/<snap>

Delete a snapshot. Because snapshots are incremental, ES only frees the segments not referenced by any other snapshot in the repo — deleting an old snapshot may free far less than its nominal size.

Examples
DELETE /_snapshot/s3_backups/2026-04-01
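The space accounting can be modelled as set arithmetic over segment files. This is a toy model only: the snapshot names and segment names are invented, and `segments_freed_by_delete` is a hypothetical helper rather than anything the snapshot API exposes.

```python
def segments_freed_by_delete(snapshots, name):
    """Segments referenced by `name` and by no other snapshot in the repo."""
    still_referenced = set()
    for other, segments in snapshots.items():
        if other != name:
            still_referenced |= segments
    return snapshots[name] - still_referenced


repo = {
    "2026-04-01": {"seg_a", "seg_b", "seg_c"},
    "2026-05-01": {"seg_b", "seg_c", "seg_d"},
}
print(segments_freed_by_delete(repo, "2026-04-01"))
```

In this model the April snapshot nominally holds three segments, but deleting it frees only the one that May no longer references.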
GET /_snapshot/<repo>/<snap>/_status

Detailed, real-time progress of a running or finished snapshot — per-shard bytes done vs total, file counts, state. Use this to watch a big snapshot, not _all (which only shows coarse state).

Examples
GET /_snapshot/s3_backups/2026-05-26/_status
POST /_snapshot/<repo>/_verify · _cleanup

verify checks that every node can read/write the repository (catches credential or network problems before a backup silently fails). cleanup removes orphaned data left in the repo by interrupted snapshot deletes.

Examples
POST /_snapshot/s3_backups/_verify
POST /_snapshot/s3_backups/_cleanup
searchable snapshots

Mount a snapshot directly as a searchable index without a full restore — the data stays in object storage (S3) and is fetched on demand. Powers the ILM cold/frozen tiers, cutting storage cost for rarely queried old data. Enterprise license.

Examples
POST /_snapshot/s3_backups/2026-01/_mount?wait_for_completion=true {"index":"logs-2026.01","renamed_index":"logs-2026.01-cold"}
wait_for_active_shards (write consistency)

On a write, require N copies of the target shard to be active before ES accepts it. Default 1 (just the primary); set "all" or a quorum when a single-copy ack is not durable enough for your data.

Common pitfall: Setting it higher than the number of available copies makes writes BLOCK until timeout, then fail — do not set "all" on an index with replicas=2 unless you can guarantee all replicas are up.

Examples
PUT /orders/_doc/1?wait_for_active_shards=2 {"total":99}
allocation awareness (rack / zone)

Tell ES which rack or availability-zone each node is in, and it will spread primary + replica across zones so one zone failure never takes out both copies of a shard. Essential for multi-AZ production clusters.

Examples
# elasticsearch.yml
node.attr.zone: zone-a
cluster.routing.allocation.awareness.attributes: zone
PUT /_cluster/settings {"persistent":{"cluster.routing.allocation.awareness.attributes":"zone"}}
Common pitfalls (16)
Heap > 32GB hits compressed-oops cliff

JVM compressed object pointers stop working above ~32GB heap — pointer size doubles, GC pressure spikes, throughput drops. Stay at OR below 30-31GB heap even on a 256GB box; run multiple nodes per host instead.

Examples
# jvm.options (good)
-Xms30g
-Xmx30g
# jvm.options (bad)
-Xms64g
-Xmx64g  # crossed the cliff
Mapping explosion from dynamic:true

Default dynamic:true on user JSON auto-creates a field per unique key. Bad client = millions of fields, gigabytes of cluster state, full GC, cluster unresponsive. The #1 reason production ES dies.

Common pitfall: Always set dynamic:strict on user input. Cap with index.mapping.total_fields.limit (default 1000) and refuse writes once approached. Audit GET /_cluster/state/metadata weekly for field count.

Examples
PUT /events_strict {"mappings":{"dynamic":"strict","properties":{"@timestamp":{"type":"date"},"event":{"type":"keyword"}}}}
Deep paging with from + size

Every from + size request asks every shard for (from + size) docs, then merges. from=10000 + size=10 on 5 shards = each shard ships 10010 docs = blown heap. Default cap: index.max_result_window=10000.

Common pitfall: Never raise max_result_window to "fix" deep paging. Use search_after, ideally with a point_in_time for a consistent view. Each page then costs every shard only size docs, no matter how deep you are.

Examples
# wrong — will blow up
GET /products/_search {"from":100000,"size":10}
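# right: sort + search_after (in real use, sort on a doc_values field with a unique tiebreaker; sorting on _id is restricted in recent versions)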
GET /products/_search {"size":10,"sort":[{"_id":"asc"}],"search_after":["abc123"]}
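The arithmetic behind the warning is simple enough to sketch. Both functions below are hypothetical helpers that only count documents handed to the coordinating node; they ignore sort cost and network overhead.

```python
def from_size_cost(from_, size, shards):
    """(docs each shard must return, docs the coordinating node must merge)."""
    per_shard = from_ + size
    return per_shard, per_shard * shards


def search_after_cost(size, shards):
    """With search_after each shard returns only one page past the cursor."""
    return size, size * shards


print(from_size_cost(from_=10000, size=10, shards=5))
print(search_after_cost(size=10, shards=5))
```

The from + size cost grows with page depth, while the search_after cost stays flat at one page per shard.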
Refresh interval at default 1s

Default 1s refresh is fine for low-write apps but eats CPU on heavy write/log workloads — each refresh creates new segments, which later have to be merged. For log indices, raise refresh_interval to 30s.

Examples
PUT /logs-2026.05/_settings {"index":{"refresh_interval":"30s"}}
PUT /products/_settings {"index":{"refresh_interval":"-1"}}  # disable; manual _refresh only
Disk watermark blocks writes

At 85% disk full (low watermark), ES stops allocating new shards to a node. At 90% (high) it actively relocates shards away. At 95% (flood_stage) it puts a read_only_allow_delete block on every index that has a shard on that node. Free disk fast or remove the block manually.

Common pitfall: Unlock with PUT /<index>/_settings {"index.blocks.read_only_allow_delete":null}, but ONLY after freeing disk, otherwise it locks again on the next watermark check. Recent versions (7.4+) release the block automatically once disk usage drops back below the high watermark.

Examples
PUT /_cluster/settings {"persistent":{"cluster.routing.allocation.disk.watermark.low":"85%","cluster.routing.allocation.disk.watermark.high":"90%","cluster.routing.allocation.disk.watermark.flood_stage":"95%"}}
PUT /*/_settings {"index.blocks.read_only_allow_delete":null}
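As a quick reference, the three thresholds map to stages like this. A sketch only: `disk_stage` is a hypothetical helper, the defaults are the percentages quoted above, and treating a value exactly on a threshold as having crossed it is an assumption.

```python
def disk_stage(used_percent, low=85.0, high=90.0, flood_stage=95.0):
    """Name the most severe watermark that the given disk usage has crossed."""
    if used_percent >= flood_stage:
        return "flood_stage"  # indices with shards on the node get read_only_allow_delete
    if used_percent >= high:
        return "high"         # shards are relocated away from the node
    if used_percent >= low:
        return "low"          # no new shards are allocated to the node
    return "ok"


for usage in (80, 87, 92, 96):
    print(usage, disk_stage(usage))
```

A check like this fits in a monitoring script, so the alert fires at the low watermark rather than after writes are already blocked.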
Searching wrong field type

A term query against a text field usually misses (the indexed terms were lowercased and tokenized, the query value is not). A match query against a keyword field hits only when the whole value matches exactly (keyword is not analyzed). Always check GET /<index>/_mapping before writing a query.

Examples
# wrong — term on text
{"query":{"term":{"name":"Apple"}}}  # 0 hits
# right — term on keyword sub-field
{"query":{"term":{"name.keyword":"Apple"}}}
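The mismatch can be simulated with a toy analyzer. In the sketch below `standard_like_tokens` is a rough stand-in for the standard analyzer (split on non-word characters, lowercase), not the real implementation, and the other helpers are equally hypothetical.

```python
import re


def standard_like_tokens(text):
    """Rough approximation of the standard analyzer: split on non-word chars, lowercase."""
    return set(re.findall(r"\w+", text.lower()))


doc_value = "Apple"
text_terms = standard_like_tokens(doc_value)  # a text field indexes {"apple"}
keyword_terms = {doc_value}                   # a keyword field indexes {"Apple"}


def term_query(indexed_terms, value):
    """term looks the value up as is, with no analysis."""
    return value in indexed_terms


def match_query(indexed_terms, value, field_is_analyzed):
    """match runs the query text through the field's analyzer first."""
    query_terms = standard_like_tokens(value) if field_is_analyzed else {value}
    return bool(query_terms & indexed_terms)


print(term_query(text_terms, "Apple"))            # term on text
print(term_query(keyword_terms, "Apple"))         # term on keyword sub-field
print(match_query(text_terms, "Apple", True))     # match on text
print(match_query(keyword_terms, "apple", False)) # match on keyword, wrong case
```

The rule of thumb it illustrates: the query term must equal what was actually indexed, so pick the query type that applies the same (or no) analysis as the field.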
Bulk size too small / too big

Bulk requests have a sweet spot around 5-15MB. Smaller = network round-trips dominate. Bigger = HTTP 413, OOM, or coordinating node falls over. Measure your payload — pick a doc count that lands in the band.

Examples
# python elasticsearch helpers
from elasticsearch.helpers import bulk
bulk(client, actions, chunk_size=2000, max_chunk_bytes=10*1024*1024)
Split brain (pre-7.x) / quorum loss (7.x+)

Pre-7.x: misconfigured discovery.zen.minimum_master_nodes could split a cluster into two "masters". 7.x+ uses voting config (auto-managed) — safer, but you still need ≥ 3 master-eligible nodes for quorum.

Common pitfall: Two master-eligible nodes is the WORST setup — any one going down loses quorum. Use 1 (test only), 3, 5, or 7 — odd numbers ≥ 3 for production.

Examples
# 3 master-eligible nodes, no voting-only nodes
node.roles: [ master, data, ingest ]
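The preference for odd counts falls out of simple majority arithmetic. The sketch below is a simplified model: it treats every master-eligible node as a voter and ignores the automatic voting-configuration adjustments that 7.x+ performs; both helpers are hypothetical.

```python
def quorum(master_eligible_nodes):
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes):
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


for nodes in (1, 2, 3, 4, 5):
    print(nodes, quorum(nodes), tolerated_failures(nodes))
```

Going from 1 to 2 nodes, or from 3 to 4, adds hardware without adding any failure tolerance.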
fielddata: true on text bloats heap

Enabling fielddata:true on a text field to "make it sortable" loads every term + every doc-id into heap. Easy way to OOM a node. Instead, add a keyword sub-field and aggregate/sort on that.

Examples
# wrong
{"properties":{"name":{"type":"text","fielddata":true}}}
# right
{"properties":{"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}
Too many shards per node

Every shard carries fixed heap and cluster-state overhead regardless of how small it is. Thousands of tiny shards bloat the master, slow recovery, and exhaust heap. Rule of thumb: aim for shards of 10-50GB and keep well under ~20 shards per GB of heap.

Common pitfall: Oversharding usually comes from "1 index per customer per day" with default shard counts. Consolidate with data streams + rollover by size, and use _shrink on old read-only indices.

Examples
GET /_cat/allocation?v&h=node,shards,disk.indices
# consolidate: rollover by 50gb instead of 1 index/day
POST /logs/_rollover {"conditions":{"max_size":"50gb"}}
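The rule of thumb turns into quick arithmetic. The helpers below are a sketch that plugs in the 50GB target and the ~20 shards per GB of heap ceiling quoted above; both defaults are rules of thumb, not hard limits.

```python
import math


def primary_shards_needed(data_gb: float, target_shard_gb: float = 50.0) -> int:
    """Primaries required to keep each shard at or under the target size."""
    return max(1, math.ceil(data_gb / target_shard_gb))


def shard_budget(heap_gb: float, shards_per_gb_heap: int = 20) -> int:
    """Upper bound on shards for a node with the given heap (stay well under it)."""
    return int(heap_gb * shards_per_gb_heap)
```

For example, 1GB of logs needs a single primary, and a node with 30GB of heap should stay well below 600 shards.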
Aggregating on a high-cardinality keyword

A terms agg over millions of distinct keyword values builds a huge bucket map in heap, and at the default size its counts are still only approximate. Alongside fielddata on text, it is a common source of OOMs.

Common pitfall: For "list ALL unique values" use a composite agg (paginated, bounded memory), not a terms agg with size:1000000. For distinct COUNT only, use cardinality (HLL) instead.

Examples
# wrong — builds a giant bucket map
{"aggs":{"u":{"terms":{"field":"user_id","size":1000000}}}}
# right — paginate with composite
{"aggs":{"u":{"composite":{"size":1000,"sources":[{"uid":{"terms":{"field":"user_id"}}}]}}}}
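Composite pagination works by feeding each response's after_key back in as after. The loop below is a minimal sketch: `search` is a caller-supplied callable (hypothetical here) that takes a request body and returns the parsed response, so the same code works with any client wrapper.

```python
def iter_composite_buckets(search, field, page_size=1000,
                           source_name="uid", agg_name="u"):
    """Yield every bucket of a composite terms agg, one page at a time."""
    after = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{source_name: {"terms": {"field": field}}}],
        }
        if after is not None:
            composite["after"] = after  # resume where the last page ended
        body = {"size": 0, "aggs": {agg_name: {"composite": composite}}}
        agg = search(body)["aggregations"][agg_name]
        yield from agg["buckets"]
        after = agg.get("after_key")
        if after is None or not agg["buckets"]:
            break
```

Memory stays bounded by page_size on both the cluster and the client, at the cost of one round-trip per page.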
Wildcard / regexp / leading-wildcard on big fields

wildcard:"*foo*" and unanchored regexp cannot use the sorted inverted index — they scan every term in the field. On a high-cardinality field this turns a 5ms query into a multi-second cluster-stressing scan.

Common pitfall: For "contains" search use a wildcard field type (7.9+) or an ngram analyzer at index time; for "starts with" use prefix or a completion suggester. Reserve real wildcard queries for small, low-cardinality fields.

Examples
# slow on large fields
{"query":{"wildcard":{"msg.keyword":"*timeout*"}}}
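# better: index an ngram-analyzed sub-field and use match (msg.ngram is a hypothetical sub-field name)
{"query":{"match":{"msg.ngram":{"query":"timeout","operator":"and"}}}}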
Scripts in hot query paths

Painless in a script_score, script query, or runtime field runs per matching doc, every query — handy but CPU-heavy at scale. A script that touches doc-values across millions of hits will dominate your latency.

Common pitfall: Pre-compute the value at index time into a real field whenever the inputs are known at write time. Reserve scripts for genuinely dynamic, per-request logic — and always filter down the candidate set first.

Examples
# expensive
{"query":{"script_score":{"query":{"match_all":{}},"script":{"source":"Math.log(2 + doc['votes'].value)"}}}}
# cheaper: precompute log_votes at index time, sort on it
{"query":{"match_all":{}},"sort":[{"log_votes":"desc"}]}
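Precomputing at index time can be as small as one enrichment step in the ingest code. A minimal sketch, assuming documents are plain dicts with a numeric votes field and that log_votes is the target field name used above:

```python
import math


def with_log_votes(doc: dict) -> dict:
    """Return a copy of the doc with log_votes precomputed from votes."""
    enriched = dict(doc)
    # Same formula as the script: log(2 + votes), computed once at write time.
    enriched["log_votes"] = math.log(2 + doc.get("votes", 0))
    return enriched
```

The trade-off is that changing the formula later means reindexing or an _update_by_query, so this fits values whose definition is stable.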
now in a filter kills the request cache

A range filter using bare now changes value every millisecond, so the shard request cache never gets a hit. Rounding to now/d (or now/h) makes the value stable within the day/hour and lets the cache work.

Common pitfall: Always round date-math in cacheable filters: now-7d/d instead of now-7d. The difference is the entire shard request cache being usable or dead for time-window dashboards.

Examples
# cache-busting
{"filter":{"range":{"@timestamp":{"gte":"now-7d"}}}}
# cache-friendly
{"filter":{"range":{"@timestamp":{"gte":"now-7d/d"}}}}
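Why rounding helps is clearer when the date math is resolved by hand. The sketch below mimics how a gte bound of now-Nd versus now-Nd/d resolves, assuming UTC and round-down semantics for gte; it is an illustration, not Elasticsearch's own date-math parser.

```python
from datetime import datetime, timedelta, timezone


def resolve_gte(now: datetime, days_back: int, round_to_day: bool) -> datetime:
    """Resolve 'now-Nd' (or 'now-Nd/d' when round_to_day is True) to a UTC time."""
    value = now - timedelta(days=days_back)
    if round_to_day:
        # /d on a gte bound rounds down to the start of that day.
        value = value.replace(hour=0, minute=0, second=0, microsecond=0)
    return value
```

Two requests sent hours apart on the same day resolve to the same rounded bound, so they can share a cache entry; the unrounded bounds never coincide.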
Index-time vs search-time analyzer mismatch

If a field was indexed with one analyzer and you query it expecting another, tokens never line up and match returns nothing. Common after changing an analyzer in the mapping WITHOUT reindexing the existing data.

Common pitfall: Changing an analyzer only affects docs indexed AFTER the change. Run POST /<index>/_analyze with both analyzers on the same text to confirm the tokens match, and reindex old data if they do not.

Examples
POST /products/_analyze {"field":"name","text":"Wireless"}
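Comparing two analyzers on the same text can be scripted. In this sketch, `analyze(analyzer, text)` is a hypothetical caller-supplied wrapper around the _analyze API that returns the parsed response body; only the "tokens" list with its "token" strings is assumed from the response.

```python
def tokens_of(analyze_response: dict) -> list[str]:
    """Pull the token strings out of an _analyze response body."""
    return [t["token"] for t in analyze_response.get("tokens", [])]


def analyzers_agree(analyze, text: str, index_analyzer: str,
                    search_analyzer: str) -> bool:
    """True when both analyzers produce the same token sequence for the text."""
    index_tokens = tokens_of(analyze(index_analyzer, text))
    search_tokens = tokens_of(analyze(search_analyzer, text))
    return index_tokens == search_tokens
```

Strict equality is deliberately conservative: setups such as edge_ngram autocomplete use different index and search analyzers on purpose, so a False here is a prompt to look at the tokens, not proof of a bug.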
Returning huge _source per hit

Pulling a 50KB _source for every one of 1000 hits ships 50MB you may not need. If you only render a title and price, project them — the network and JSON-parse cost on big hit sets is real.

Common pitfall: Use _source includes/excludes, or fields + _source:false, to return only what the UI shows. For aggregation-only requests set size:0 so no hits are returned at all.

Examples
{"size":20,"_source":["title","price"],"query":{"match_all":{}}}
{"size":0,"aggs":{"by_cat":{"terms":{"field":"category"}}}}

What this tool does

Searchable Elasticsearch cheat sheet with 80+ entries that SREs type into Kibana Dev Tools, in nine sections:

  • Index management: PUT/DELETE, aliases for zero-downtime reindex, composable templates, _rollover, _reindex with slices=auto, _forcemerge, refresh/flush.

  • Documents: _doc op_type=create, _update / _update_by_query, _bulk sweet-spot, _mget, _count.

  • Mapping: text vs keyword multi-fields, date, nested vs flattened object, geo_point, dynamic:strict to stop mapping explosion.

  • Query DSL: match, match_phrase with slop, multi_match best_fields/cross_fields, term/terms, range with date math, bool must/should/must_not/filter, wildcard/regexp/prefix/fuzzy, exists, nested with inner_hits, geo_distance, function_score with gauss decay, highlight, search_after, scroll vs point_in_time.

  • Aggregations: terms + composite, avg/sum/stats, cardinality HyperLogLog++, date_histogram with time_zone, histogram, range, percentiles, sub-aggs, filter agg, top_hits, significant_terms.

  • Analyzers: standard, language, custom char_filter + tokenizer + token_filter, edge_ngram for autocomplete, ngram, synonym, stop, asciifolding, CJK plugins ik/kuromoji/nori.

  • Cluster ops: _cluster/health, allocation explain, every _cat API with the right columns, _tasks _cancel.

  • Replication and DR: number_of_replicas, snapshot repo, _snapshot create/restore with rename, SLM, CCR, ILM hot/warm/cold/frozen/delete.

  • Production pitfalls: heap > 32GB cliff, mapping explosion from dynamic:true, deep paging with from + size, default 1s refresh on heavy writes, disk watermark flood_stage read-only lock, wrong field type, bulk size sweet spot, split-brain, fielddata:true heap OOM.

Every entry: command + EN/ZH description + 1-3 pasteable JSON examples + common pitfall. Pure client-side: no connection, no upload. Pair with Redis, PostgreSQL, MongoDB and Docker cheat sheets.

Tool details

Input
Text
The page exposes text boxes, numeric controls, file pickers, or structured inputs depending on the tool.
Output
Live result + Copy + Preview
The result area focuses on usable output, with copy, download, or preview actions when supported.
Privacy
Browser-side processing
The main tool logic does not call an external API, so inputs normally stay in the current tab.
Save / share
No account required
Open the page and use it; whether results survive refresh depends on the tool.
Performance budget
Initial JS <= 32 KB
No WASM budget is declared, keeping the tool quick to open on mobile.
Best fit
Developer & DevOps · Developer
Category and role tags drive related tools, internal links, and quick fit checks.

How to use

  1. Input

    Paste or drop your content into the tool panel.

  2. Process

    Click the button. All processing is local in your browser.

  3. Copy / Download

    Copy the result or download to disk in one click.

How Elasticsearch Cheatsheet fits into your work

Use it in the small gaps between coding, reviewing, debugging, and shipping.

Developer jobs

  • Formatting, validating, shrinking, or inspecting code-adjacent text.
  • Preparing snippets for documentation, tickets, commits, or handoff.
  • Checking a small payload quickly without switching tools.

Developer checks

  • Run irreversible transforms like minify or obfuscate on a copy.
  • Keep secrets out of pasted snippets unless the tool explicitly stays local.
  • Use your normal tests or linter before shipping transformed code.

Good next steps

These links move the current task into a more complete workflow.

  1. JSON Formatter & Validator: Format, validate, and minify JSON instantly, right in your browser.
  2. Redis Cheatsheet: Redis cheat sheet with 80+ commands across strings, hashes, lists, sets, sorted sets, pub/sub, streams, scripts, with examples and pitfalls.
  3. PostgreSQL Cheatsheet: PostgreSQL cheat sheet with 80+ commands & functions for psql, JSONB, CTEs, window functions, indexing, partitioning, advanced extensions.

Real-world use cases

  • A term filter returns zero hits on a 40M-doc product index in prod

    A "status:active" filter that worked in staging returns nothing in prod because the field got mapped as text, not keyword, so it was lowercased and tokenized. You filter to the term query entries plus the "term returns nothing" pitfall, confirm the recipe is term on status.keyword, and run POST /_analyze in Dev Tools to prove how the value tokenized. Fix shipped in ten minutes instead of an afternoon of guessing.

  • Cutting a 6-shard 80GB index over to a new mapping with no downtime

    Marketing needs name searchable as full text, but it was mapped keyword and you cannot change type in place. You pull the zero-downtime reindex entry, create products_v2, run _reindex with slices=auto and wait_for_completion=false, poll _tasks, then do the atomic _aliases remove+add swap in one payload. The 80GB cutover happens while the app keeps reading and writing through the products alias.

  • A 12-node log cluster goes red after a node hits 96% disk

    Every index with a shard on that node flips read-only and ingest stalls. You grab the disk watermark flood_stage entry, confirm 95% triggers the read-only lock, run the _cat/allocation and allocation/explain commands from the cluster ops section to find the hot node, free space, then clear the read_only_allow_delete block. The cheat sheet hands you the exact PUT _settings call so you are not editing YAML at 2am.

  • Building a top-10-sellers-per-category dashboard panel

    You need group-by category, then top sellers each with their revenue, in sub-second time over 30M orders. You filter to aggregations, copy the terms agg nested with top_hits and a sum sub-agg, paste it into Dev Tools, and tune size and shard_size from the pitfall note about terms-agg accuracy. The panel ships against live ES instead of a nightly export to a separate analytics database.

Common pitfalls

  • Running term on a text field and getting zero hits. Map strings as text plus a keyword sub-field and query term on field.keyword, or use match for full text.

  • Deep paging with from + size past 10000, which makes every shard return from+size docs and blows heap. Use search_after with a sort, or point_in_time for stable cursors.

  • Leaving refresh_interval at the 1s default on a heavy log ingest, burning most CPU on tiny segments. Raise it to 30s and force a _refresh only when you must read-your-write.
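The deep-paging fix above can be sketched as a loop that feeds the last hit's sort values back in as search_after. `search` is a hypothetical caller-supplied callable that takes a request body and returns the parsed response; the sort should end in a unique tiebreaker field so no hits are skipped or repeated.

```python
def iter_hits(search, query, sort, page_size=1000):
    """Yield all hits page by page using search_after instead of from + size."""
    search_after = None
    while True:
        body = {"size": page_size, "query": query, "sort": sort}
        if search_after is not None:
            body["search_after"] = search_after
        hits = search(body)["hits"]["hits"]
        if not hits:
            break
        yield from hits
        # The sort values of the last hit are the cursor for the next page.
        search_after = hits[-1]["sort"]
```

Each page costs the shards only page_size docs regardless of how deep the cursor is; pairing it with point_in_time keeps the view stable while the index is being written to.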

Privacy

This cheat sheet is a single static page. Search runs entirely in your browser against an in-memory array of entries, so nothing you type is sent anywhere, no Elasticsearch is contacted, and no index names, queries, or field values leave the tab. Nothing is written to the URL either, so a shared link carries no query text. Safe inside bastion-only, air-gapped, or proxied networks where installing Kibana is not an option.

Made by Toolora · 100% client-side · Updated 2026-06-13