
Prometheus Cheatsheet: 80+ PromQL Queries, Alerting Rules, and HTTP API Reference

Prometheus cheat sheet with 82 entries covering PromQL selectors, aggregations, functions, operators, metric types, alerting rules, recording rules, the HTTP API, and relabeling.

82 entries
Selectors (11)
metric_name

Bare metric name. Selects an instant vector of all time series with that name and returns the most recent sample for each series (series with no sample inside the default 5-minute lookback window are left out).

Examples
up
node_cpu_seconds_total
http_requests_total
metric{label="value"}

Exact label equality matcher. Filters the instant vector to series where `label` equals `value`.

Gotcha: Label values are case-sensitive. `{job="API"}` and `{job="api"}` are different series.

Examples
up{job="prometheus"}
http_requests_total{method="POST", status="200"}
node_filesystem_avail_bytes{mountpoint="/"}
metric{label!="value"}

Negative equality matcher. Keeps series where `label` does NOT equal `value`. Also matches series where the label is absent.

Examples
up{job!="blackbox"}
http_requests_total{env!="dev"}
metric{label=~"regex"}

RE2 regex matcher. Matches series where `label` matches the regex. Regex is anchored at both ends — `"5.."` matches exactly three chars.

Gotcha: A regex matcher has to test every known value of that label in the index, so it is slower than `=`. Prefer exact matches for high-cardinality labels.

Examples
http_requests_total{status=~"5.."}
node_cpu_seconds_total{mode=~"user|system"}
up{instance=~"prod-.*:9090"}
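Because matchers are fully anchored, `=~` behaves like a whole-string match. The Python sketch below emulates `=~`/`!~` with `re.fullmatch`. Python's `re` is not RE2 (it supports extras such as backreferences), but for simple patterns like these the results agree. The `select` helper and the sample label sets are made up for illustration.

```python
import re

def label_matches(value, pattern):
    """=~ semantics: the pattern must match the WHOLE label value.
    (Prometheus anchors the regex as ^(?:pattern)$; fullmatch does the same.)"""
    return re.fullmatch(pattern, value) is not None

def select(series, label, pattern, negate=False):
    """Keep series matching =~ (or !~ when negate=True).
    A missing label is treated as the empty string, as in Prometheus."""
    return [s for s in series if label_matches(s.get(label, ""), pattern) != negate]

series = [{"status": "500"}, {"status": "503"}, {"status": "5000"}, {"status": "200"}]
print([s["status"] for s in select(series, "status", "5..")])               # ['500', '503']
print([s["status"] for s in select(series, "status", "5..", negate=True)])  # ['5000', '200']
```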
metric{label!~"regex"}

Negative regex matcher. Keeps series where `label` does NOT match the regex.

Examples
http_requests_total{env!~"dev|staging"}
node_cpu_seconds_total{mode!~"idle|iowait"}
metric[5m]

Range vector selector. Returns all samples within the past 5 minutes for each series. Required by range functions like `rate()`, `increase()`, `delta()`.

Gotcha: A range vector cannot be graphed directly — it must be passed to a function like `rate()` first.

Examples
rate(http_requests_total[5m])
increase(errors_total[1h])
delta(cpu_temp[10m])
metric offset 5m

Offset modifier. Shifts the evaluation time back by the given duration — the query reads data from 5 minutes ago instead of "now".

Examples
http_requests_total offset 5m
rate(http_requests_total[5m] offset 1h)
# compare current vs one week ago:
rate(requests[5m]) / rate(requests[5m] offset 7d)
metric @ 1609746000

Timestamp modifier. Evaluates the selector at a specific Unix timestamp regardless of the query time. Added in Prometheus 2.25 behind `--enable-feature=promql-at-modifier` and enabled by default since 2.33.

Examples
http_requests_total @ 1609746000
rate(http_requests_total[5m] @ start())
rate(http_requests_total[5m] @ end())
{__name__=~"go_.*"}

Use `__name__` as a regular label to select metrics by name pattern. Useful for exploring or for cross-metric operations.

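Gotcha: Selecting many metrics at once, e.g. `{__name__=~".+"}`, is extremely expensive, so always narrow the pattern. Prometheus rejects `{__name__=~".*"}` outright because a selector needs at least one matcher that does not match the empty string.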

Examples
{__name__=~"go_.*", job="api"}
{__name__=~"node_memory_.*"}
metric{job="api", env="prod"}

Multiple label matchers are AND-ed together. All conditions must match for a series to be selected.

Examples
up{job="api", env="prod", region="eu-west-1"}
http_requests_total{method="GET", status=~"2..", handler!="/health"}
rate(metric[1m])[10m:30s]

Subquery syntax. Evaluates the inner expression every `30s` over the past `10m` and returns the results as a range vector. Needed to apply `_over_time` functions to an expression such as `rate(...)` rather than to a plain selector.

Gotcha: Subqueries are expensive because they re-evaluate the inner expression many times. Use recording rules for frequently-used subqueries.

Examples
max_over_time(rate(http_requests_total[1m])[10m:30s])
avg_over_time(node_load1[1h:5m])
Aggregation (11)
sum(metric)

Sum all values across all label dimensions. Returns a single series with no labels (an instant vector, not a scalar).

Gotcha: Without a `by` clause, `sum()` drops ALL labels. The result has no labels you can join on.

Examples
sum(http_requests_total)
sum(rate(http_requests_total[5m]))
sum by (job) (metric)

Aggregate while KEEPING the listed labels. All unlisted labels are dropped. Equivalent to SQL `GROUP BY job`.

Examples
sum by (job) (rate(http_requests_total[5m]))
max by (instance, device) (node_disk_read_bytes_total)
sum without (instance) (metric)

Aggregate while DROPPING the listed labels. All other labels are kept. The inverse of `by`.

Examples
sum without (instance) (rate(http_requests_total[5m]))
avg without (cpu) (rate(node_cpu_seconds_total[5m]))
avg(metric)

Arithmetic mean across all series or within each group defined by `by`/`without`.

Examples
avg by (job) (rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m]))
max(metric) / min(metric)

Maximum or minimum value across all matching series. Useful for cluster-wide alerting thresholds.

Examples
max by (job) (rate(errors_total[5m]))
min(node_filesystem_avail_bytes / node_filesystem_size_bytes)
count(metric)

Count the number of series in the result vector. Great for "how many instances are up" style queries.

Examples
count(up{job="api"} == 1)
count by (job) (up)
topk(5, metric)

Return the K series with the highest values. Useful for finding the most active endpoints or noisiest instances.

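Gotcha: `topk` is evaluated independently at every step, so a graph over time can show more than K series as the top set changes. In alert rules, series entering and leaving the top K make alerts resolve and re-fire; alert on a fixed threshold instead.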

Examples
topk(5, sum by (handler) (rate(http_requests_total[5m])))
topk(10, rate(node_cpu_seconds_total{mode="user"}[5m]))
bottomk(3, metric)

Return the K series with the lowest values. Useful for finding the least-utilized instances or slowest responders.

Examples
bottomk(3, sum by (instance) (rate(requests_total[5m])))
quantile(0.95, metric)

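φ-quantile across the current values of all series (or of each `by`/`without` group). It aggregates across SERIES at one point in time, not across samples over time. For a percentile over time use `quantile_over_time()`; for a percentile of a distribution recorded in a Histogram use `histogram_quantile()`.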

Gotcha: Do NOT confuse this with `histogram_quantile()`. This gives a percentile across current instance values, not across a distribution.

Examples
quantile(0.95, rate(http_request_duration_seconds_sum[5m]))
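To make the difference from `histogram_quantile()` concrete, here is a Python sketch of how `quantile()` interpolates between the sorted per-series values: the rank is `phi * (n - 1)` and the result is a linear blend of the two neighbouring values. It is modelled on my reading of the PromQL behaviour, not copied from it, and the per-instance latencies are invented.

```python
import math

def promql_quantile(phi, values):
    """Quantile across series values, interpolating between sorted neighbours."""
    if not values:
        return math.nan
    if phi < 0:
        return -math.inf
    if phi > 1:
        return math.inf
    vals = sorted(values)
    rank = phi * (len(vals) - 1)
    lower = max(0, math.floor(rank))
    upper = min(len(vals) - 1, lower + 1)
    weight = rank - math.floor(rank)
    return vals[lower] * (1 - weight) + vals[upper] * weight

# One current value per instance, e.g. each instance's mean latency in seconds.
per_instance = [0.12, 0.30, 0.18, 0.95, 0.21]
print(promql_quantile(0.5, per_instance))  # 0.21: the median instance
```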
count_values("label", metric)

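Count series by their sample value, writing each distinct value into a new label. Useful when the value itself carries meaning, such as a numeric version or a state code.

Gotcha: `_info` metrics always have the value 1, so `count_values` on them yields a single group. To count instances per version, use `count by (version) (app_build_info)` instead.

Examples
count_values("state", up)  # how many targets are up (1) vs down (0)
count_values("status_code", http_response_code)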
stddev(metric) / stdvar(metric)

Standard deviation or variance across all series. Useful for detecting outlier instances in a fleet.

Examples
stddev by (job) (rate(http_request_duration_seconds_sum[5m]))
Functions (18)
rate(counter[5m])

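Per-second average rate of increase over the range window: the increase between the first and last sample (adjusted for counter resets), extrapolated toward the window edges and divided by the window length. It is not a linear regression; that is `deriv()`. The correct function for dashboards and alerts on counters.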

Gotcha: The range window should be at least 4× the scrape interval. For a 15s scrape interval, use `[1m]` minimum — shorter windows become noisy.

Examples
rate(http_requests_total[5m])
rate(node_network_receive_bytes_total[5m])
sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
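A simplified Python sketch of the counter arithmetic behind `rate()` and `increase()`. It handles resets by treating any drop as a restart from zero, then divides by the time span the samples actually cover. Real `rate()` also extrapolates toward the window edges using rules this sketch leaves out, so treat its output as an approximation. Function names and sample data are hypothetical.

```python
def reset_adjusted_increase(samples):
    """Sum the rises between consecutive (timestamp, value) samples.
    A drop is treated as a counter reset: the counter restarted from zero,
    so the new value itself is the rise."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total

def approx_rate(samples):
    """Reset-adjusted increase divided by the time span the samples cover."""
    if len(samples) < 2:
        return None  # rate() needs at least two samples
    span = samples[-1][0] - samples[0][0]
    return reset_adjusted_increase(samples) / span

# Scraped every 15s; the process restarts between t=30 and t=45.
samples = [(0, 0), (15, 30), (30, 60), (45, 5), (60, 35)]
print(reset_adjusted_increase(samples))  # 95.0
print(approx_rate(samples))              # about 1.58 per second
```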
irate(counter[5m])

Instantaneous rate — computed from the last two data points only. Captures very short spikes that `rate()` would smooth over.

Gotcha: One slow scrape cycle creates a fake spike in the graph. Avoid in dashboards; use only for live debugging of active spikes.

Examples
irate(http_requests_total[1m])
increase(counter[1h])

Total increase in a counter over the range window. Equivalent to `rate(c[window]) * window_in_seconds`. Handles counter resets.

Examples
increase(http_requests_total[1h])
increase(errors_total{job="api"}[24h])
resets(counter[1h])

Number of counter resets within the range window. Any decrease between two consecutive samples counts as a reset; usually the process restarted and the counter began again from zero.

Examples
resets(http_requests_total[1h])
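# alert on more than 3 restarts in an hour; process_start_time_seconds is a
# gauge that jumps UP on each restart, so count changes, not resets:
changes(process_start_time_seconds[1h]) > 3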
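The difference between `resets()` and `changes()` is easy to check with a small Python sketch that applies each function's counting rule to a hypothetical run of `process_start_time_seconds` values. Because each restart records a later start time, the series never decreases, so `resets()` sees nothing.

```python
def resets(values):
    """PromQL resets(): count decreases between consecutive samples."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)

def changes(values):
    """PromQL changes(): count any change between consecutive samples."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur != prev)

# Hypothetical process_start_time_seconds samples over one hour: two restarts.
start_times = [1700000000, 1700000000, 1700001200, 1700001200, 1700002400]
print(resets(start_times))   # 0: the value never goes down
print(changes(start_times))  # 2: one change per restart
```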
delta(gauge[1h])

Difference between the first and last value in the range window, extrapolated to the window edges. Works on gauges, not counters.

Gotcha: Do NOT use `delta()` on counters — counters can reset and `delta()` does not handle resets. Use `increase()` instead.

Examples
delta(node_memory_MemFree_bytes[1h])
delta(cpu_temp_celsius[10m])
predict_linear(gauge[1h], 3600)

Predicts the value `3600` seconds from now using linear regression on the range window. Great for "disk will fill in X hours" alerts.

Examples
# alert: disk full in 4 hours
predict_linear(node_filesystem_free_bytes[1h], 4 * 3600) < 0
predict_linear(node_memory_MemAvailable_bytes[30m], 3600)
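Conceptually, `predict_linear()` fits a least-squares line to the samples in the window and extends it past the evaluation time. A minimal Python sketch of that idea follows; the gauge values are made up, and details of the real implementation such as NaN handling are left out.

```python
def predict_linear(samples, eval_time, horizon_seconds):
    """Fit value = a + b * (t - eval_time) by least squares over the
    (timestamp, value) samples, then extrapolate horizon_seconds ahead."""
    n = len(samples)
    if n < 2:
        return None
    xs = [t - eval_time for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x  # fitted value at eval_time
    return intercept + slope * horizon_seconds

# Hypothetical free-bytes gauge losing 10 bytes/s, sampled every 60s for an hour.
samples = [(t, 100_000 - 10 * t) for t in range(0, 3601, 60)]
forecast = predict_linear(samples, eval_time=3600, horizon_seconds=4 * 3600)
print(forecast < 0)  # True: the disk would fill within 4 hours
```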
changes(gauge[1h])

Number of times the value changed within the range window. Useful for detecting flapping services or config changes.

Examples
changes(up[1h]) > 5  # service is flapping
changes(kube_deployment_spec_replicas[1h])
histogram_quantile(0.95, sum(rate(h_bucket[5m])) by (le))

Compute the φ-quantile (0.95 = 95th percentile) from a Histogram metric. The `le` label (less-than-or-equal) marks bucket boundaries and must be preserved in the aggregation.

Gotcha: The `by (le)` is REQUIRED. Omitting it drops the `le` bucket label, and `histogram_quantile()` returns no result because it has no buckets to work with.

Examples
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.99, sum by (le, job) (rate(grpc_server_handling_seconds_bucket[5m])))
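The interpolation `histogram_quantile()` performs on one histogram can be sketched in Python: find the first bucket whose cumulative count reaches `phi * total`, then assume observations are spread evenly inside it. Buckets are `(le, cumulative_count)` pairs. This follows my understanding of the classic-histogram logic but skips Prometheus's monotonicity fixes and native-histogram support. Note the `+Inf` case: when the rank lands there, the result is the highest finite bound.

```python
import math

def histogram_quantile(phi, buckets):
    """Estimate the phi-quantile of one classic histogram.

    buckets: (upper_bound, cumulative_count) pairs taken from the `le`
    label and value of each _bucket series, including le="+Inf".
    """
    if phi < 0:
        return -math.inf
    if phi > 1:
        return math.inf
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan  # need at least one finite bucket plus +Inf
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = phi * total
    # First finite bucket whose cumulative count reaches the rank;
    # defaults to the +Inf bucket when none does.
    b = next((i for i, (_, count) in enumerate(buckets[:-1]) if count >= rank),
             len(buckets) - 1)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # rank is in +Inf: report the highest finite bound
    upper, count = buckets[b]
    if b == 0 and upper <= 0:
        return upper
    lower, below = (0.0, 0.0) if b == 0 else buckets[b - 1]
    if count == below:
        return lower  # only reachable when phi == 0 and the bucket is empty
    # Assume observations are spread evenly inside the bucket and interpolate.
    return lower + (upper - lower) * (rank - below) / (count - below)

buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, buckets))  # 0.75
```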
absent(metric)

Returns an empty vector when the expression has samples; returns a single element (value 1) when the expression has NO samples. Used to alert when a metric disappears.

Gotcha: `absent()` does not propagate labels from the missing series. Hard-code important labels in the alert `labels:` block.

Examples
absent(up{job="api"} == 1)
absent(http_requests_total{env="prod"})
absent_over_time(metric[5m])

Like `absent()` but requires the metric to be absent for the FULL range window before returning 1. Avoids alerts on a single missed scrape.

Examples
absent_over_time(up{job="api"}[5m])
label_replace(m, "dst", "$1", "src", "(.*)")

Apply a regex substitution to a label and write the result to a new label. `$1` refers to the first capture group in the regex.

Examples
# extract "host" from "host:port" in instance label:
label_replace(up, "host", "$1", "instance", "([^:]+):.*")
label_replace(metric, "short_name", "$1", "handler", "/api/v[0-9]+/(.*)")
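A Python sketch of `label_replace()` on a single label set, assuming labels are a plain dict. It anchors the regex with `re.fullmatch`, leaves the series untouched when there is no match, and removes the destination label when the expansion is empty. Only `$1`-style references are translated to Python's `\g<1>` syntax; Go's `${name}` forms are not handled.

```python
import re

def label_replace(labels, dst, replacement, src, regex):
    """Return a copy of `labels` with `dst` set from the regex expansion."""
    m = re.fullmatch(regex, labels.get(src, ""))
    if m is None:
        return dict(labels)  # no match: series returned unchanged
    # Translate $1, $2, ... into Python's \g<1>, \g<2>, ... before expanding.
    value = m.expand(re.sub(r"\$(\d+)", r"\\g<\1>", replacement))
    out = dict(labels)
    if value:
        out[dst] = value
    else:
        out.pop(dst, None)  # an empty result removes the destination label
    return out

series = {"__name__": "up", "instance": "web-1:9100", "job": "node"}
print(label_replace(series, "host", "$1", "instance", "([^:]+):.*")["host"])  # web-1
```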
label_join(m, "new", ",", "l1", "l2")

Concatenate multiple existing label values with a separator and write the result to a new label.

Examples
label_join(up, "node_region", "/", "instance", "region")
abs() / ceil() / floor() / round(m, 0.5)

Absolute value, ceiling, floor, or round to the nearest multiple. Applied element-wise to each series value.

Examples
abs(node_filesystem_avail_bytes - node_filesystem_size_bytes / 2)
round(rate(http_requests_total[5m]) * 100, 0.01)
clamp(m, 0, 100) / clamp_min / clamp_max

Clamp all values to the range [min, max]. `clamp_min(m, 0)` forces values ≥ 0; `clamp_max(m, 100)` forces values ≤ 100.

Examples
clamp(some_ratio, 0, 1)
clamp_min(node_load1 - 1, 0)   # never negative
sort(m) / sort_desc(m)

Sort the result vector by value (ascending or descending). Useful in tables to show the worst offenders at the top. Sorting only affects instant queries; range queries (graphs) ignore it.

Examples
sort_desc(rate(http_requests_total[5m]))
sort(node_filesystem_avail_bytes)
time() / timestamp(m)

`time()` returns the current evaluation timestamp as a scalar. `timestamp(v)` returns the timestamp of each sample in the vector.

Examples
# age of the most recent sample in seconds:
time() - timestamp(up{job="api"})
# alert: latest sample is over 2 min old (keep the threshold below the
# 5-minute lookback window, otherwise the series just disappears):
time() - timestamp(up) > 120
hour() / minute() / day_of_week() / month()

Time-based functions. `hour()` returns 0-23 UTC; `day_of_week()` returns 0 (Sunday) to 6; `month()` returns 1-12. Useful for business-hours inhibit conditions.

Examples
# only alert during business hours in UTC+8 (09:00 to 18:00 local):
hour() >= 1 and hour() < 10
day_of_week() != 0 and day_of_week() != 6  # not weekends
sum_over_time(m[1h]) / avg_over_time / max_over_time

Aggregate a gauge over time within a range window. `sum_over_time` sums all samples; `avg_over_time` averages; `max_over_time` takes the peak.

Examples
avg_over_time(node_load1[1h])
max_over_time(go_goroutines[30m])
quantile_over_time(0.95, http_response_time_seconds[1h])
Operators (9)
m1 + m2 / m1 - m2 / m1 * scalar

Arithmetic operators: `+` `-` `*` `/` `%` `^`. When applied between two instant vectors, label sets must match exactly (except for `__name__`).

Gotcha: Arithmetic between two vectors requires matching labels. Use `on()` or `ignoring()` to control matching.

Examples
# error ratio:
rate(errors_total[5m]) / rate(requests_total[5m])
# bytes to megabytes:
node_memory_MemAvailable_bytes / 1024 / 1024
# fraction of memory used (0 to 1):
1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
m1 > bool m2

Comparison operators: `==` `!=` `>` `<` `>=` `<=`. Without `bool`, they FILTER series (non-matching series are dropped). With `bool`, they CONVERT to 0/1.

Examples
# filter: only series where value > 0.9:
rate(errors_total[5m]) / rate(requests_total[5m]) > 0.9
# convert to 0/1 for arithmetic:
(up == bool 1) * 100
m1 and m2

Set intersection. Returns series from m1 that have a matching label set in m2. Does not merge values — only uses m1 values.

Examples
# only show CPU metrics for up instances:
node_cpu_seconds_total and on(instance) up == 1
m1 or m2

Set union. Returns all series from m1, plus series from m2 that have no matching label set in m1.

Examples
# combine metrics from two jobs when one may be absent:
metric{job="a"} or metric{job="b"}
m1 unless m2

Set difference. Returns series from m1 that do NOT have a matching label set in m2.

Examples
# exclude maintenance windows:
rate(errors_total[5m]) unless on(instance) maintenance_mode == 1
m1 * on(instance) m2

`on()` restricts vector matching to only the specified labels. All other labels are ignored for the purpose of pairing samples.

Examples
# join error rate with instance metadata:
rate(errors_total[5m]) * on(instance) group_left(version) app_info
m1 * ignoring(env) m2

`ignoring()` excludes the listed labels from the matching key. Use when series differ only in a label that should not affect pairing.

Examples
# errors_total has a status label that requests_total lacks:
rate(errors_total{status="500"}[5m]) / ignoring(status) rate(requests_total[5m])
m1 * on(instance) group_left(version) m2

`group_left()` allows many-to-one matching: multiple series from the left can match one series on the right. Listed labels are copied from the right.

Gotcha: Without `group_left` or `group_right`, many-to-one matches produce an error: "multiple matches for labels".

Examples
# enrich metrics with build version from info metric:
rate(http_requests_total[5m]) * on(instance) group_left(version) app_build_info
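Many-to-one matching with `on(...)` and `group_left(...)` can be sketched in Python with series as `(labels_dict, value)` pairs. The helper name, error text, and sample series are hypothetical; the logic (key both sides by the `on` labels, require the right side to be unique, copy the `group_left` labels from the right, drop `__name__`) reflects how the operator is documented to behave.

```python
def multiply_group_left(lhs, rhs, on, include):
    """Sketch of `lhs * on(<on>) group_left(<include>) rhs`.

    lhs and rhs are lists of (labels_dict, value). Many lhs series may match
    the same rhs series, but each rhs match key must be unique.
    """
    def key(labels):
        return tuple(labels.get(name, "") for name in on)

    right = {}
    for labels, value in rhs:
        k = key(labels)
        if k in right:
            # Prometheus also rejects duplicate series on the "one" side.
            raise ValueError(f"duplicate right-hand series for {dict(zip(on, k))}")
        right[k] = (labels, value)

    out = []
    for labels, value in lhs:
        match = right.get(key(labels))
        if match is None:
            continue  # left-hand series without a partner are dropped
        rlabels, rvalue = match
        # Arithmetic drops the metric name; other left-hand labels are kept.
        result = {name: v for name, v in labels.items() if name != "__name__"}
        for name in include:
            if rlabels.get(name, ""):
                result[name] = rlabels[name]  # copied from the "one" side
            else:
                result.pop(name, None)
        out.append((result, value * rvalue))
    return out

requests = [
    ({"instance": "a:9090", "handler": "/api"}, 10.0),
    ({"instance": "a:9090", "handler": "/health"}, 2.0),
    ({"instance": "b:9090", "handler": "/api"}, 5.0),
]
build_info = [
    ({"__name__": "app_build_info", "instance": "a:9090", "version": "1.2"}, 1.0),
    ({"__name__": "app_build_info", "instance": "b:9090", "version": "1.3"}, 1.0),
]
for labels, value in multiply_group_left(requests, build_info, ["instance"], ["version"]):
    print(labels, value)
```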
scalar(m) / vector(s)

`scalar(v)` converts a single-element vector to a scalar value (it returns NaN if the vector does not have exactly one element). `vector(s)` converts a scalar to a one-element vector with no labels.

Examples
# normalize by cluster total:
rate(requests_total[5m]) / scalar(sum(rate(requests_total[5m])))
vector(1)  # always returns 1 with no labels
Metric Types (5)
Counter

Monotonically increasing value. Always use `rate()` or `increase()` — the raw value is meaningless by itself. Suffix convention: `_total`.

Gotcha: Never use `delta()` or gauge-style functions on Counters — they do not handle counter resets correctly.

Examples
# good:
rate(http_requests_total[5m])
# bad (raw counter value):
http_requests_total  # only useful at a fixed moment in time
Gauge

A value that can go up or down: memory usage, temperature, queue length, number of goroutines. Use directly or with `delta()`, `avg_over_time()`, `predict_linear()`.

Examples
node_memory_MemAvailable_bytes   # current free memory
avg_over_time(go_goroutines[5m])  # average over window
predict_linear(node_filesystem_free_bytes[1h], 4 * 3600)  # forecast
Histogram

Counts observations in configurable buckets. Exposes three kinds of series per base name: one `_bucket{le="..."}` series per bucket (including `le="+Inf"`), plus `_count` and `_sum`. Use `histogram_quantile()` for percentiles.

Gotcha: Bucket boundaries must be configured at instrumentation time. If your P99 always hits the highest bucket, your histogram needs higher buckets.

Examples
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
# request rate via histogram count:
rate(http_request_duration_seconds_count[5m])
Summary

Pre-computes quantiles on the client. Exposes `_count`, `_sum`, and one series per configured quantile, labelled like `{quantile="0.99"}`. The quantile series cannot be aggregated across instances; `_sum` and `_count` can.

Gotcha: NEVER `sum()` Summary quantile series across instances — summing pre-computed quantiles is mathematically incorrect. Use Histograms for anything needing aggregation.

Examples
# correct: per-instance summary:
go_gc_duration_seconds{quantile="0.99"}
# wrong:
sum by (job) (go_gc_duration_seconds{quantile="0.99"})  # DO NOT DO THIS
Naming conventions

Metric names use snake_case. Suffix conventions: `_total` (counter), `_seconds` (duration), `_bytes` (size), `_ratio` (0–1 fraction), `_info` (metadata gauge always = 1).

Examples
http_requests_total           # counter
http_request_duration_seconds # histogram or summary
process_resident_memory_bytes # gauge
app_build_info{version="1.2"} # metadata info gauge
Alerting (7)
Alert rule YAML structure

A complete Prometheus alerting rule defined in a rule file under a `groups` block. The `expr` field is evaluated at each `evaluation_interval`.

Examples
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on {{ $labels.job }}"
          description: "Error rate is {{ $value | humanizePercentage }}"
for: 5m (pending state)

The `for` clause keeps an alert in PENDING state until the condition has been true for the specified duration before transitioning to FIRING. Prevents alerts on transient spikes.

Gotcha: Without `for`, a single evaluation cycle where the condition is true immediately fires the alert. Always use `for` except for the most critical "instant" alerts.

Examples
for: 5m    # must be true for 5 minutes
for: 0m    # fire immediately (no pending)
for: 1h    # sustained disk pressure
labels: { severity: critical }

Static labels added to every alert from this rule. Used by Alertmanager routing to route to the right receiver. Label values can use Go template syntax.

Examples
labels:
  severity: critical
  team: platform
# a runbook link is better kept in annotations than in alert labels:
annotations:
  runbook_url: "https://wiki.example.com/runbooks/high-error-rate"
{{ $labels.instance }} / {{ $value | humanize }}

Go template syntax in alert annotations. `$labels` accesses the alert's label set; `$value` is the current expression value. Built-in Prometheus template functions: `humanize`, `humanizePercentage`, `humanizeDuration`, `title`, `toUpper`.

Examples
annotations:
  summary: "Instance {{ $labels.instance }} is down"
  description: >
    Error rate is {{ $value | humanizePercentage }}
    (threshold: 5%). Job: {{ $labels.job }}.
Multi-window alert (burn rate)

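Google SRE burn-rate pattern: alert only when both a long window (1h, proving the error rate is significant) and a short window (5m, confirming it is still happening) exceed the threshold. This filters out brief blips and lets the alert clear soon after recovery.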

Examples
- alert: HighErrorBurnRate
  expr: |
    (
      sum by (job) (rate(http_requests_total{status=~"5.."}[1h]))
        / sum by (job) (rate(http_requests_total[1h])) > 0.02
    ) and (
      sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
        / sum by (job) (rate(http_requests_total[5m])) > 0.02
    )
  for: 2m
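The threshold in a burn-rate rule is usually derived from an SLO. A small Python sketch, assuming a hypothetical 99.9% availability SLO: the burn rate is the observed error ratio divided by the allowed error ratio `1 - SLO`, and the multi-window rule fires only when both windows exceed the threshold.

```python
def burn_rate(error_ratio, slo_target):
    """How many times faster than budget-neutral the error budget is spent."""
    return error_ratio / (1 - slo_target)

def should_alert(long_ratio, short_ratio, threshold):
    """Multi-window rule: BOTH windows must exceed the threshold."""
    return long_ratio > threshold and short_ratio > threshold

# Hypothetical 99.9% SLO: a 2% error ratio burns the budget about 20x too fast.
print(round(burn_rate(0.02, 0.999), 1))  # 20.0
# The 1h window is still high, but the 5m window shows the incident is over:
print(should_alert(long_ratio=0.03, short_ratio=0.001, threshold=0.02))  # False
```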
ALERTS{alertname="X", alertstate="firing"}

Prometheus synthesizes an `ALERTS` metric for every active alert. Query it to check alert state, build alert-on-alert rules, or join with other metrics.

Examples
ALERTS{job="api", alertstate="firing"}
# how many alerts are currently firing:
count(ALERTS{alertstate="firing"})
Inhibit rule (Alertmanager)

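Alertmanager inhibit rules suppress certain alerts while another alert is firing. Example: suppress service alerts for an instance while that whole node is down. Since Alertmanager 0.22, `source_matchers`/`target_matchers` are preferred; the older `source_match`/`target_match_re` keys are deprecated.

Examples
inhibit_rules:
  - source_matchers:
      - alertname = NodeDown
    target_matchers:
      - alertname =~ ".+"
    equal:
      - instance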
Recording Rules (4)
Recording rule YAML structure

A recording rule pre-computes an expensive expression and stores the result as a new metric. Recording rules live in the same rule files as alerting rules, in a `rules:` list under a `groups` key, and can share a group with them.

Examples
groups:
  - name: request_rates
    interval: 1m  # optional: override global evaluation_interval
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
level:metric:operations (naming convention)

`level` = aggregation level and labels of the output (job, instance, cluster); `metric` = the metric name, unchanged except that `_total` is stripped when applying `rate()` or `irate()`; `operations` = the PromQL operations applied (e.g. `rate5m`).

Examples
job:http_requests:rate5m              # per-job rate over 5m
cluster:http_requests:rate5m_sum      # cluster-level sum
instance:node_cpu:rate5m              # per-instance CPU rate
job:http_request_duration_seconds:p95_5m  # P95 latency
When to create recording rules

Create recording rules for: (1) queries that take > 1s to evaluate, (2) `rate + sum` over many series (expensive), (3) expressions used in both dashboards AND alerts, (4) subqueries that are evaluated repeatedly.

Examples
# before (in every dashboard panel and alert):
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, job))

# after (one recording rule, referenced everywhere):
job:http_request_duration_seconds:p95_5m
Rule file reload

Reload Prometheus rule files without restarting: send SIGHUP, or call `POST /-/reload` (only available when Prometheus runs with `--web.enable-lifecycle`). Validate the files first with `promtool check rules`.

Examples
kill -HUP <prometheus_pid>
curl -X POST http://localhost:9090/-/reload
promtool check rules /etc/prometheus/rules/*.yml
HTTP API (9)
GET /api/v1/query

Instant query. Evaluates the PromQL expression at a single point in time. Params: `query` (required), `time` (Unix or RFC3339, default: now), `timeout`.

Examples
curl "http://localhost:9090/api/v1/query?query=up&time=2024-01-01T00:00:00Z"
curl -G --data-urlencode 'query=rate(http_requests_total[5m])' http://localhost:9090/api/v1/query
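The same instant query from Python, using only the standard library. The sketch URL-encodes the expression (brackets and parentheses must be escaped, which is what `--data-urlencode` does for curl) and unpacks the documented JSON shape, where each sample's `value` is `[unix_time, "string value"]`. The base URL `http://localhost:9090` is an assumption, and `instant_query` needs a running server.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

def instant_query_url(base_url: str, promql: str, at: Optional[str] = None) -> str:
    """Build a GET URL for /api/v1/query with the expression URL-encoded."""
    params = {"query": promql}
    if at is not None:
        params["time"] = at  # Unix timestamp or RFC3339 string
    return f"{base_url}/api/v1/query?{urllib.parse.urlencode(params)}"

def parse_vector(body: str) -> list:
    """Turn an instant-query JSON response into (labels, float_value) pairs."""
    payload = json.loads(body)
    if payload["status"] != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's "value" is [unix_timestamp, "value encoded as a string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]

def instant_query(base_url: str, promql: str) -> list:
    """Run the query against a live Prometheus server at base_url."""
    url = instant_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(resp.read().decode("utf-8"))

url = instant_query_url("http://localhost:9090", "rate(http_requests_total[5m])")
print(url)
# http://localhost:9090/api/v1/query?query=rate%28http_requests_total%5B5m%5D%29
```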
GET /api/v1/query_range

Range query. Evaluates the expression over a time range and returns a matrix. Params: `query`, `start`, `end` (Unix or RFC3339), `step` (duration or seconds).

Examples
curl "http://localhost:9090/api/v1/query_range?query=up&start=2024-01-01T00:00:00Z&end=2024-01-01T01:00:00Z&step=60"
GET /api/v1/series

Return all series matching one or more selectors. Params: `match[]` (one or more selector expressions), `start`, `end`.

Examples
curl -g "http://localhost:9090/api/v1/series?match[]=http_requests_total&match[]=up"
GET /api/v1/label/<name>/values

List all known values for a given label name across all time series. Useful for building dynamic dashboards and autocomplete.

Examples
curl http://localhost:9090/api/v1/label/job/values
curl http://localhost:9090/api/v1/label/instance/values
GET /api/v1/targets

Return information about all current scrape targets: health state, labels, last scrape time, and last error. Filter with `state=active|dropped|any`.

Examples
curl "http://localhost:9090/api/v1/targets?state=active"
GET /api/v1/rules

Return all loaded alerting and recording rules. Filter with `type=alert|record`. Includes rule state, last evaluation time, and last error.

Examples
curl "http://localhost:9090/api/v1/rules?type=alert"
GET /api/v1/alerts

Return all currently active (pending or firing) alerts. Each entry includes the alert name, labels, state, activeAt timestamp, and current value.

Examples
curl http://localhost:9090/api/v1/alerts
POST /api/v1/admin/tsdb/delete_series

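Delete all data for series matching the given selectors. Requires the `--web.enable-admin-api` flag. Deleted data is only marked with tombstones; disk space is reclaimed at a later compaction or by calling `POST /api/v1/admin/tsdb/clean_tombstones`.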

Gotcha: This API is destructive and irreversible. Use `GET /api/v1/series` to verify what will be deleted first.

Examples
curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=up{job="test"}'
GET /api/v1/metadata

Return metric metadata (type, help text) as registered by scrape targets. Params: `metric` to filter by name, `limit` to cap results.

Examples
curl "http://localhost:9090/api/v1/metadata?metric=http_requests_total"
Relabeling (8)
replace (default action)

Evaluate `regex` against the concatenated `source_labels` (joined by `separator`). If it matches, write the expanded `replacement` into `target_label`.

Examples
# extract hostname from "host:port" in __address__:
- source_labels: [__address__]
  regex: '([^:]+)(:\d+)?'  # single quotes: "\d" is an invalid escape in double-quoted YAML
  target_label: instance
  replacement: "$1"
keep

Keep ONLY the targets/series where `source_labels` concatenated matches `regex`. All other targets are dropped.

Examples
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
  action: keep
  regex: "true"
drop

Drop targets/series where `source_labels` concatenated matches `regex`. The inverse of `keep`.

Examples
- source_labels: [__meta_kubernetes_namespace]
  action: drop
  regex: "kube-system|monitoring"
labelmap

Copy all labels whose name matches `regex` to new labels, replacing the name with `replacement`. Useful for promoting Kubernetes annotations/labels.

Examples
# promote k8s labels to Prometheus labels:
- action: labelmap
  regex: "__meta_kubernetes_pod_label_(.+)"
  replacement: "$1"
labeldrop / labelkeep

`labeldrop` removes all labels whose name matches `regex`; `labelkeep` removes all labels whose name does NOT match `regex`. Both look at label names, not values, and run in order with the other relabeling steps.

Gotcha: Dropping too many labels can cause metric collision — two formerly distinct series may become identical without their distinguishing labels.

Examples
# drop all labels starting with "tmp_":
- action: labeldrop
  regex: "tmp_.*"
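# keep only essential labels (in metric_relabel_configs, also keep __name__,
# otherwise the metric name itself is removed):
- action: labelkeep
  regex: "__name__|job|instance|env"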
hashmod

Hash `source_labels` and take the modulo. Write the result to `target_label`. Used for sharding Prometheus scrape pools across multiple Prometheus instances.

Examples
- source_labels: [__address__]
  modulus: 4         # 4 Prometheus shards
  target_label: __tmp_hash
  action: hashmod
- source_labels: [__tmp_hash]
  regex: "0"         # this shard handles only hash==0 targets
  action: keep
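A Python sketch of the shard assignment. It assumes the hashing scheme used by Prometheus's relabel code as I understand it: MD5 of the source label values joined with `;`, with the last 8 bytes of the digest read as a big-endian integer and taken modulo `modulus`. Verify against your Prometheus version before relying on exact shard numbers; the target addresses are made up.

```python
import hashlib

def hashmod(source_values, modulus, separator=";"):
    """Shard number for a target: MD5 of the joined source label values,
    last 8 bytes of the digest read as a big-endian integer, modulo `modulus`."""
    joined = separator.join(source_values).encode("utf-8")
    digest = hashlib.md5(joined).digest()
    return int.from_bytes(digest[8:], "big") % modulus

def keep_for_shard(address, shard, modulus=4):
    """Mirror of the `keep` rule: scrape only if __tmp_hash equals this shard."""
    return hashmod([address], modulus) == shard

# Hypothetical targets, split across 4 Prometheus shards.
targets = [f"10.0.0.{i}:9100" for i in range(1, 101)]
shards = {s: [t for t in targets if keep_for_shard(t, s)] for s in range(4)}
print({s: len(ts) for s, ts in shards.items()})  # targets kept by each shard
```

Every target lands in exactly one shard, so the shards together still scrape the whole fleet.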
lowercase / uppercase

`lowercase` converts `source_labels` to lowercase and writes to `target_label`. `uppercase` does the reverse. Available in Prometheus ≥ 2.36.

Examples
- source_labels: [__meta_kubernetes_pod_name]
  action: lowercase
  target_label: pod_name
Common pattern: port from __address__

Extract just the host or port from the `__address__` label using `replace` + regex capture groups. A very common pattern in Kubernetes service discovery.

Examples
# set the scrape port from an annotation:
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
  action: replace
  regex: '([^:]+)(?::\d+)?;(\d+)'  # single-quoted so YAML keeps "\d" literal
  replacement: "$1:$2"
  target_label: __address__
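How `replace` evaluates the rule above, sketched in Python: the source label values are joined with the default `;` separator, the regex must match the whole joined string, and `$1`/`$2` are expanded into `__address__`. The function name and sample target are hypothetical. If the port annotation is missing, the joined string ends in `;`, the regex fails, and the labels are left unchanged.

```python
import re

def relabel_replace(labels, source_labels, regex, replacement, target_label, separator=";"):
    """Sketch of the `replace` relabel action for one target's label set."""
    # Missing source labels contribute an empty string to the joined value.
    joined = separator.join(labels.get(name, "") for name in source_labels)
    m = re.fullmatch(regex, joined)  # relabel regexes are fully anchored
    if m is None:
        return dict(labels)  # no match: labels are left unchanged
    out = dict(labels)
    value = m.expand(re.sub(r"\$(\d+)", r"\\g<\1>", replacement))
    if value:
        out[target_label] = value
    else:
        out.pop(target_label, None)  # empty result removes the target label
    return out

SOURCES = ["__address__", "__meta_kubernetes_pod_annotation_prometheus_io_port"]
PATTERN = r"([^:]+)(?::\d+)?;(\d+)"

target = {"__address__": "10.0.0.5:8080", SOURCES[1]: "9102"}
relabeled = relabel_replace(target, SOURCES, PATTERN, "$1:$2", "__address__")
print(relabeled["__address__"])  # 10.0.0.5:9102
```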

What this tool does

Searchable Prometheus cheat sheet with 82 entries across nine sections. Selectors: instant vector `metric{label="val"}`, range vector `metric[5m]`, matchers `=` `!=` `=~` `!~`, offset, `@` anchor, subquery syntax. Aggregation: `sum` `avg` `max` `min` `count` `topk` `bottomk` with `by`/`without`. Functions: `rate` `irate` `increase` for counters; `delta` `predict_linear` for gauges; `histogram_quantile` for Histograms; `label_replace` `label_join`; `absent` `absent_over_time`; `time()` `hour()` `day_of_week()`. Binary operators: arithmetic `+ - * / % ^`; comparison with the `bool` modifier; set operators `and or unless`; vector matching `on()` `ignoring()` `group_left()` `group_right()`. Metric types: Counter vs Gauge vs Histogram vs Summary, when to use each, the `_total` `_bucket` `_count` `_sum` suffixes, and the Summary aggregation gotcha. Alerting rules: full YAML structure, `for` clause, labels/annotations with Go templates `{{ $value | humanize }}`, multi-window burn rate pattern, inhibit rules. Recording rules: naming convention `level:metric:operations`, when to pre-compute, rule file reload. HTTP API: `/api/v1/query`, `/api/v1/query_range`, series/label/target/rules endpoints. Relabeling: `replace` `keep` `drop` `labelmap` `labeldrop` `labelkeep` `hashmod` `lowercase` `uppercase`. Every entry has bilingual text, copy-ready examples, and pitfall callouts. Search, category chips, and one-click copy all run in the browser.


Real-world use cases

  • Debugging a sudden spike in error rate during an incident

    It is 3am and the error rate alert fired. You open the cheat sheet, grab `rate(http_requests_total{status=~"5.."}[5m])`, wrap it in `sum by (handler) (…)` to find the noisy endpoint, then use `topk(5, …)` to surface the worst offenders. All from memory? No: from copy-paste in under two minutes.

  • Writing your first multi-window alert rule

    You want an alert that is sensitive to short outages but does not page for a single bad scrape. You look up the multi-window pattern, copy the rule that ANDs a 1h error-ratio window with a 5m window under `for: 2m`, and fill in your metric name. The annotations section shows you exactly how to use `{{ $labels.job }}` and `{{ $value | humanizePercentage }}` without guessing the syntax.

  • Pre-computing an expensive query as a recording rule

    Your dashboard's 95th-percentile latency panel takes 8 seconds to load because it runs `histogram_quantile(0.95, sum(rate(…)) by (le))` over 400 series on every refresh. You look up the recording-rule naming convention, create `job:http_request_duration_seconds:p95_5m`, and reference the recorded series in both the dashboard and the alert. Load time drops to under 200ms.

Common pitfalls

  • Using `irate()` in dashboards — it shows instantaneous spikes that look dramatic but are mostly scrape-timing noise. Use `rate()` for trends.

  • Writing `sum(rate(hist_bucket[5m]))` without `by (le)` before passing to `histogram_quantile()` — the `le` label must survive the aggregation.

  • Using Summary metrics and then trying to aggregate quantiles across instances — pre-computed quantiles are not additive, only Histogram buckets are.

  • Forgetting the `_total` suffix in counter names — Prometheus convention is `http_requests_total`, not `http_requests`.

Privacy

Everything runs in your browser. The cheat sheet is a static in-memory array and the search box, category chips, and copy button never make a network request. Nothing you type is logged or sent anywhere, and no input is written to the URL. Works offline, behind a corporate proxy, or on an air-gapped jump host.


Made by Toolora · 100% client-side · Updated 2026-07-01