SIEM design — two planes: config-as-code + operational query/act
Design for the Google SecOps SIEM surface of secopsctl. SIEM splits into two
planes that need fundamentally different models. All identifiers here are
placeholders; the public repo stays tenant-neutral.
The split. Config is desired state: rules, lists, tables, parsers, feeds, dashboards. It’s detection-as-code: pull → review in `git diff` → push, reconciled. Operational is live data: events, alerts, cases. You don’t reconcile a case from a file; you query a subset and act on it, the way a SOC analyst triages. Two planes, two models, one CLI.
```
secopsctl
├── CONFIG plane       (desired state, files, git)   pull → diff → push (reconcile)
│     rules · reference_lists · data_tables · feeds · parsers · dashboards · curated …
│
└── OPERATIONAL plane  (live data, no files)         query → review → act
      events (read-only) · alerts (triage) · cases (triage) · entities/iocs (enrich)
```
Auth note: the SIEM API needs ADC / `SECOPS_ACCESS_TOKEN` (the SOAR AppKey does
NOT work here). The official v1alpha REST surface returns HTTP 500 intermittently
(Google is still building SecOps); surface a clean error on a 500 and never retry
forever (see CLAUDE.md).
Plane 1 — Config as code (reconcile)
The user’s framing: “rules, curated rules, dashboards, etc. — similar things.”
They are. This plane reuses the proven product-neutral reconcile engine
(internal/mirror/reconcile, already shipped for SOAR + SIEM reference_lists):
identity + canonical diff + redaction + additive/--prune guard.
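To make "identity + canonical diff + additive/--prune guard" concrete, here is a sketch of the planning step only. `Object`, `Plan` and `plan` are hypothetical names, not the `internal/mirror/reconcile` API. Each object is reduced to its identity and an already-canonicalized, redacted body. A remote-only object becomes a delete only when prune is requested.

```go
package main

import (
	"fmt"
	"sort"
)

// Object is a hypothetical config resource reduced to what the diff needs:
// a stable identity and its canonical (normalized, redacted) body.
type Object struct {
	ID        string
	Canonical string
}

// Plan is what a push would do; nothing here touches the tenant.
type Plan struct {
	Create, Update, Delete []string
}

// plan compares desired state (files) with live state (tenant) by identity.
// Without prune the plan is additive: remote-only objects are left alone.
func plan(desired, live []Object, prune bool) Plan {
	liveByID := make(map[string]string, len(live))
	for _, o := range live {
		liveByID[o.ID] = o.Canonical
	}
	var p Plan
	inFiles := make(map[string]bool, len(desired))
	for _, o := range desired {
		inFiles[o.ID] = true
		cur, exists := liveByID[o.ID]
		switch {
		case !exists:
			p.Create = append(p.Create, o.ID)
		case cur != o.Canonical:
			p.Update = append(p.Update, o.ID)
		}
	}
	if prune {
		for id := range liveByID {
			if !inFiles[id] {
				p.Delete = append(p.Delete, id)
			}
		}
		sort.Strings(p.Delete) // map order is random; keep the plan stable
	}
	return p
}

func main() {
	desired := []Object{{"allowlist", "a\nb"}, {"blocklist", "x"}}
	live := []Object{{"allowlist", "a"}, {"legacy", "old"}}
	fmt.Printf("additive: %+v\n", plan(desired, live, false))
	fmt.Printf("--prune:  %+v\n", plan(desired, live, true))
}
```

Diffing canonical bodies rather than raw API payloads keeps server-populated noise (timestamps, ordering) out of the plan.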
| Surface | Shape | Plan |
|---|---|---|
| rules (YARA-L) | source `.yaral` + deployment state machine | bespoke (two-resource; keep push rules-create/disable, extend with update/enable/retrohunt); not a single canonical body |
| reference_lists | typed, `.txt` + `.yaml` | done (engine, NoDelete) |
| data_tables | `.csv` + `.yaml`, rows via a separate API | done + live-validated (engine, `push data_tables`): columns immutable after create (update rejects a column change); rows = wholesale `ReplaceDataTableRows`; not prune-eligible (whole-table delete is high-blast). Gated write smoke `TestLiveReconcileDataTableWriteSmoke` passed on the tenant |
| feeds | typed, secrets in settings | engine surface; redact on pull, overlay on update; resolve the `assetNamespace` (read) vs `namespace` (write) mismatch with a live smoke first |
| parsers | versioned/immutable (create new, no update) | engine surface, Create+Delete only (no update) |
| dashboards (native) | typed, charts as JSON | engine surface, full CUD |
| curated / curated_rules | Google-managed (read-mostly) | read + enable/disable/alerting toggle + exclusions only; not full CUD |
| watchlists, rule_exclusions, forwarders, log_pipelines | typed | engine surfaces where per-object CUD fits |
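This is one plausible reading of "redact on pull, overlay on update" for feeds, sketched in Go over a flat settings map. The secret key names, the `<redacted>` placeholder and the flat shape are all assumptions; real feed settings are nested and source-specific. On pull, secret values are replaced with a placeholder before anything reaches git. On update, any field that still holds the placeholder is dropped, so the stored secret is never overwritten with it.

```go
package main

import (
	"fmt"
	"sort"
)

// placeholder is a hypothetical marker written to git in place of a secret.
const placeholder = "<redacted>"

// secretKeys is a hypothetical set of feed settings that hold credentials.
var secretKeys = map[string]bool{"secretAccessKey": true, "clientSecret": true}

// redact copies live feed settings, replacing secret values, before the
// result is written to a file on pull.
func redact(live map[string]string) map[string]string {
	out := make(map[string]string, len(live))
	for k, v := range live {
		if secretKeys[k] {
			v = placeholder
		}
		out[k] = v
	}
	return out
}

// overlay turns the file into an update: fields still holding the placeholder
// are dropped, so the stored secret is kept rather than overwritten with the
// placeholder. The sorted field names can back an update mask.
func overlay(file map[string]string) (map[string]string, []string) {
	update := make(map[string]string, len(file))
	var fields []string
	for k, v := range file {
		if v == placeholder {
			continue
		}
		update[k] = v
		fields = append(fields, k)
	}
	sort.Strings(fields)
	return update, fields
}

func main() {
	live := map[string]string{"uri": "s3://example-bucket", "secretAccessKey": "s3cr3t"}
	file := redact(live)              // what lands in git
	file["uri"] = "s3://other-bucket" // an operator edit reviewed in git diff
	update, fields := overlay(file)
	fmt.Println(file, update, fields)
}
```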
Discipline (same as SOAR, proven): workflow-spec the shape → verify SDK
signatures by hand → wire as a reconcile.Surface → live read-validate →
gated write-smoke on an inert throwaway. The SIEM write-smoke harness lives in
internal/mirror/reconcile_smoke_siem_test.go, gated by SECOPS_SIEM_SMOKE (read
round-trip of every SIEM surface) and SECOPS_SIEM_SMOKE_WRITE (the
create/update/delete cycle). No surface is trusted for --yes until its write loop
is live-validated.
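Below is the shape of a gated write smoke. It uses a hypothetical four-method `surface` interface and an in-memory fake, so it runs standalone. The real harness lives in `internal/mirror/reconcile_smoke_siem_test.go` and would use `t.Skip` rather than the early return in `main`. The point illustrated is the ordering of cleanup. The delete is deferred only after a successful create, so a failed update still removes the throwaway, and an object the smoke did not create is never deleted.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// surface is a hypothetical minimal CUD interface for one SIEM resource kind.
type surface interface {
	Create(id, body string) error
	Get(id string) (string, error)
	Update(id, body string) error
	Delete(id string) error
}

// writeSmoke runs create → read-back → update → read-back → delete on a
// throwaway id. Cleanup is deferred only after Create succeeds, so a later
// failure still deletes the throwaway, and a pre-existing object is never
// touched.
func writeSmoke(s surface, id string) (err error) {
	if err = s.Create(id, "v1"); err != nil {
		return fmt.Errorf("create: %w", err)
	}
	defer func() {
		if derr := s.Delete(id); derr != nil && err == nil {
			err = fmt.Errorf("delete: %w", derr)
		}
	}()
	if err = readBack(s, id, "v1"); err != nil {
		return err
	}
	if err = s.Update(id, "v2"); err != nil {
		return fmt.Errorf("update: %w", err)
	}
	return readBack(s, id, "v2")
}

func readBack(s surface, id, want string) error {
	got, err := s.Get(id)
	if err != nil {
		return fmt.Errorf("get: %w", err)
	}
	if got != want {
		return fmt.Errorf("read-back mismatch: got %q, want %q", got, want)
	}
	return nil
}

// memSurface is an in-memory fake so the sketch runs without a tenant.
type memSurface map[string]string

func (m memSurface) Create(id, body string) error {
	if _, ok := m[id]; ok {
		return errors.New("already exists")
	}
	m[id] = body
	return nil
}

func (m memSurface) Get(id string) (string, error) {
	body, ok := m[id]
	if !ok {
		return "", errors.New("not found")
	}
	return body, nil
}

func (m memSurface) Update(id, body string) error {
	if _, ok := m[id]; !ok {
		return errors.New("not found")
	}
	m[id] = body
	return nil
}

func (m memSurface) Delete(id string) error {
	delete(m, id)
	return nil
}

func main() {
	// Opt-in gate, mirroring SECOPS_SIEM_SMOKE_WRITE (a real test would t.Skip).
	if os.Getenv("SECOPS_SIEM_SMOKE_WRITE") == "" {
		fmt.Println("skip: SECOPS_SIEM_SMOKE_WRITE not set")
		return
	}
	fmt.Println("write smoke:", writeSmoke(memSurface{}, "secopsctl-smoke-throwaway"))
}
```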
Plane 2 — Operational query/act (the SOC workflow)
This is the part that needs the new design. Events/alerts/cases are live security data, not desired state. The loop is query → review → act on each or a subset — exactly how an analyst triages. The SDK is largely already built; this plane is about the operator model and safety, not new API code.
The three act surfaces (and one read-only)
| Domain | Query (read) | Act (mutate) | Mutability |
|---|---|---|---|
| events (UDM) | `SearchUDM` / `NLSearch` / `GetStats` / `FindUDMFieldValues` | none | immutable telemetry: read-only, never mutate |
| alerts | `GetAlerts` (list) · `GetAlert` · `ListDetections` · `SearchRuleAlerts` | `UpdateAlert` · `BulkUpdateAlerts` (status / verdict / priority / reason / comment) | per-item + subset |
| cases (SIEM, UUID) | `ListCases` / `SearchCases` (filter) · `GetCase` · `GetCases` | `PatchCase` (+etag/updateMask) · `MergeCases` · BulkClose/Assign/AddTag/ChangePriority/ChangeStage/Reopen | per-item + subset |
| entities / IoCs | `SummarizeEntity` · `ListIoCs` · `FetchAssociatedInvestigations` | none | enrichment: read-only |
SIEM cases and SOAR cases are TWO VIEWS OF THE SAME CASE, not two systems. Google SecOps = Chronicle (SIEM) + Siemplify (SOAR) merged; cases originate in the SOAR case-management engine and are surfaced natively in the SIEM. The proof is in the wire: a case carries both ids, and `legacyBatchGetCases` returns `soarPlatformInfo.caseId`, the bridge between them.
| | SOAR case (Siemplify) | SIEM case (Chronicle/SecOps) |
|---|---|---|
| id | integer (e.g. `234`) | UUID (resource name) |
| api · auth | `/api/external/v1/cases` · AppKey | v1beta `cases` + v1alpha `legacy:legacyListCases` · ADC |
| role | the engine: alerts→case, playbooks, tasks, queue, stages, close | the first-class SecOps cases surface (get/list/patch/merge/bulk), the convergence layer |
| CLI | `soar case list/get` (read) · `soar case <verb>` (act) | `cases …` |
| today | mature, reliable | newer, flaky (v1beta 500 / v1alpha 404 observed) |

They stay separate command trees (different id, api, reliability), bridged by `soarPlatformInfo.caseId` only when correlating. The SIEM-UUID api is the forward-looking unified one; the SOAR-integer api is what works today.
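Correlating the two views only needs the bridge field. The sketch below extracts it. It assumes a response shaped like `{"cases":[{"name":…,"soarPlatformInfo":{"caseId":…}}]}`. The field name comes from the wire observation above; the surrounding shape and the case names are placeholders. `flexID` is a hypothetical helper that accepts `caseId` as either a JSON number or a string, because proto3 JSON conventionally encodes 64-bit integers as strings.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// flexID accepts caseId as a JSON number (234) or string ("234"); proto3 JSON
// usually encodes 64-bit integers as strings, and the sketch does not assume
// which form the API uses.
type flexID string

func (f *flexID) UnmarshalJSON(b []byte) error {
	if string(b) == "null" {
		return nil
	}
	*f = flexID(strings.Trim(string(b), `"`))
	return nil
}

// siemCase keeps only the fields needed for correlation (assumed shape).
type siemCase struct {
	Name             string `json:"name"`
	SoarPlatformInfo *struct {
		CaseID flexID `json:"caseId"`
	} `json:"soarPlatformInfo"`
}

// soarIDs maps each SIEM case name to its SOAR integer case id, skipping
// cases that carry no SOAR side.
func soarIDs(body []byte) (map[string]string, error) {
	var resp struct {
		Cases []siemCase `json:"cases"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode cases: %w", err)
	}
	out := make(map[string]string, len(resp.Cases))
	for _, c := range resp.Cases {
		if c.SoarPlatformInfo != nil && c.SoarPlatformInfo.CaseID != "" {
			out[c.Name] = string(c.SoarPlatformInfo.CaseID)
		}
	}
	return out, nil
}

func main() {
	body := []byte(`{"cases":[
		{"name":"cases/uuid-1","soarPlatformInfo":{"caseId":"234"}},
		{"name":"cases/uuid-2"}]}`)
	ids, err := soarIDs(body)
	fmt.Println(ids, err)
}
```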
The query model
Every list/search command shares: a filter, a time window, a limit, pagination, and an output format.
```sh
secopsctl query udm '<udm filter>' [--hours N | --from TS --to TS] [--limit N] [--json]   # events (exists)
secopsctl search nl '<question>' [--hours N] [--limit N] [--json]                        # NL → UDM → search
secopsctl stats '<query>' [--hours N]                                                     # aggregations
secopsctl alerts list [--filter EXPR] [--hours N] [--state OPEN|CLOSED] [--limit N] [--json]
secopsctl cases list [--filter EXPR] [--status …] [--priority …] [--limit N] [--json]
secopsctl entity summarize <ip|domain|hash|user> <value> [--hours N]
secopsctl iocs list [--prioritized] [--hours N] [--limit N] [--json]
```
- Default output is a compact table (id, key fields, status/time) for humans; `--json` emits the raw objects for scripting and piping into an act command.
- `--limit` is mandatory-with-a-default (e.g. 100), so a query never pulls the whole tenant by accident; large pulls require an explicit large `--limit`.
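The sketch below shows how a list command could honour a defaulted `--limit` while paginating. The `page` type and the `fetch` callback are hypothetical stand-ins for an SDK list call that takes a page token. The fake server caps pages at 30 items purely for the demonstration. The second return value tells the caller that more results exist, so the table view can say so instead of silently truncating.

```go
package main

import (
	"fmt"
	"strconv"
)

const defaultLimit = 100 // the example default from the text

// page is a hypothetical list response: some items and a next-page token.
type page struct {
	Items     []string
	NextToken string
}

// collect follows page tokens but stops once limit items are gathered, so a
// query cannot walk the whole tenant by accident. The bool reports whether
// more results existed beyond the limit.
func collect(limit int, fetch func(token string, pageSize int) (page, error)) ([]string, bool, error) {
	if limit <= 0 {
		limit = defaultLimit
	}
	var out []string
	token := ""
	for {
		p, err := fetch(token, limit-len(out))
		if err != nil {
			return out, false, err
		}
		out = append(out, p.Items...)
		if len(out) >= limit {
			return out[:limit], len(out) > limit || p.NextToken != "", nil
		}
		if p.NextToken == "" {
			return out, false, nil
		}
		token = p.NextToken
	}
}

// fakeFetch simulates a server holding total items, with a made-up maximum
// page size of 30.
func fakeFetch(total int) func(string, int) (page, error) {
	return func(token string, pageSize int) (page, error) {
		start, _ := strconv.Atoi(token) // "" (first page) parses as 0
		if pageSize > 30 {
			pageSize = 30
		}
		end := start + pageSize
		if end > total {
			end = total
		}
		var p page
		for i := start; i < end; i++ {
			p.Items = append(p.Items, "id-"+strconv.Itoa(i))
		}
		if end < total {
			p.NextToken = strconv.Itoa(end)
		}
		return p, nil
	}
}

func main() {
	ids, more, err := collect(0, fakeFetch(250)) // no --limit given
	fmt.Println(len(ids), more, err)
}
```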
The act model — single + subset, safe by construction
Two ways to act, mirroring how SOC consoles work (open one, or select rows →
bulk action). Both are guarded exactly like push: LIVE banner, dry-run by
default, real apply needs --yes.
1. Per-item: unambiguous, low blast radius.
   ```sh
   secopsctl alerts update <id> --verdict FALSE_POSITIVE --priority LOW [--comment "…"]
   secopsctl cases comment <id> "triaged: benign"
   secopsctl cases assign <id> --user <analyst>
   secopsctl cases close <id> --reason NOT_MALICIOUS --root-cause "…"
   ```
2. Subset (bulk): the dangerous one, with two selection paths, safest first:
   - Reviewed-ids (preferred). Query → eyeball → act on the explicit set:
     ```sh
     secopsctl alerts list --filter '…' --json | jq -r '.[].id' > ids.txt   # review the set
     secopsctl alerts bulk close --ids @ids.txt --reason FALSE_POSITIVE --yes
     ```
     The operator reviewed exactly what they’re acting on. `--ids` accepts `1,2,3` or `@file`.
   - Filter-in-one-shot (convenient, gated harder). `--filter` on a bulk verb is dry-run-first, always. It prints the match count + a sample and refuses to mutate until re-run with `--yes`. A `--limit` caps the blast radius: the command refuses if the match set exceeds it, unless `--limit` is raised explicitly.
     ```sh
     secopsctl cases bulk close --filter 'rule="<noisy>" AND priority=LOW' --reason FALSE_POSITIVE --dry-run
     # → "MATCHES 412 cases (cap 100). Sample: …. Re-run with --yes --limit 500 to apply."
     ```
Guard summary (one rule): no operational mutation runs without an explicit
--yes; any --filter-driven bulk shows the count + sample first and is
--limit-capped. A live-data mutation is treated as a production deploy, same
as a config push.
Command tree
Designed shape. Built today: `query udm`, `cases list/get/search`. Everything
else here (incl. `alerts …`, `cases <verb>`, `cases bulk`) is the planned model, not
yet wired; authoritative per-command status is in CATALOG.md.
```sh
secopsctl query udm | search nl | stats | iocs list | entity summarize   # read
secopsctl alerts list | get | update | bulk <close|verdict|priority|comment>
secopsctl cases list | get | search | comment | assign | tag | priority | stage | close | reopen | merge
        + cases bulk <close|assign|tag|priority|stage|reopen>           # subset (--ids/--filter, guarded)
```
Cross-cutting
- etag / optimistic concurrency on `PatchCase` and alert updates. Round-trip the stored etag; on mismatch, surface a clean conflict (a teammate edited it) and never silently overwrite. Same rule as config.
- Idempotent reads, audited writes. Reads are free; every mutation prints what it touched (ids + the change) so the action is reviewable after the fact.
- Output for pipelines. `--json` is the contract between query and act: `list --json | jq | bulk --ids @-`. Tables are for humans only.
- Reliability. On a v1alpha 500, fail the command cleanly with the request id; do not retry a mutation (risk of double-apply). Reads may retry idempotently.
- No reconcile for live data. Events/alerts/cases are not snapshotted to files for `git diff`; that would imply a desired state they don’t have. (A read-only export of a query result to JSON is fine: it’s a report, not a mirror.)
First implementation wave — SIEM cases (operational)
Two case paths, one case. The SIEM-native cases collection below (v1beta) is new and returns intermittent 5xx. The reliable, complete path for case operations is the SOAR AppKey API: `soar case <verb>` plus the `soar case list/get` reads (`ListCaseCards`/`GetCaseFullDetails`; the latter also returns the case’s alerts). The SOAR reads are wired (`soar case list/get`), completing the reliable query → review → act loop. The SIEM-native `cases`/`alerts` commands here are the unified view for when that collection stabilizes.
Decided: the subset-act model is both paths (reviewed `--ids` preferred,
`--filter` gated dry-run-first + `--limit`-capped), and the first wave is case
management, the full triage lifecycle.
```sh
# query
secopsctl cases list [--filter EXPR] [--status …] [--priority …] [--limit 100] [--json]   # ListCases / ListCasesOpts
secopsctl cases search '<expr>' [--hours N] [--json]                                      # SearchCases
secopsctl cases get <uuid> [--expand alerts|events]                                       # GetCase

# per-item act (guarded: dry-run default, --yes to apply)
secopsctl cases comment <uuid> "<text>"
secopsctl cases assign <uuid> --user <analyst>
secopsctl cases tag <uuid> --tag <t>
secopsctl cases priority <uuid> --priority <…>
secopsctl cases stage <uuid> --stage <…>
secopsctl cases close <uuid> --reason <…> --root-cause "<…>"
secopsctl cases reopen <uuid> --comment "<…>"
secopsctl cases merge --into <uuid> --ids <a,b,c>

# subset act (guarded; --ids reviewed-set OR --filter dry-run-first + --limit cap)
secopsctl cases bulk <close|assign|tag|priority|stage|reopen> [--ids 1,2|@file | --filter EXPR] … [--yes] [--limit N]
```
Wiring: single-field edits go through `PatchCase` (`etag` + `updateMask`; round-trip
the etag, surface conflicts); merge → `MergeCases`; every bulk verb → the
matching `Bulk*` SDK method. Reuse the `liveBanner` + dry-run/`--yes` guard from
push. Every `cases <verb>` shares the `casesOps` plumbing; `cases bulk` adds the
`--ids`/`--filter` + `--limit` selection on top.
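The sketch below turns per-item flags into a `PatchCase` request that touches only what the operator set. The body field names (`priority`, `stage`, `assignee`) are guesses, not the SecOps case schema. The comma-separated path list matches how a `google.protobuf.FieldMask` is rendered as an `updateMask` query parameter. An empty edit is rejected rather than sent as a no-op patch.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// caseEdit holds per-item flags; nil means the flag was not given.
type caseEdit struct {
	Priority *string
	Stage    *string
	Assignee *string
}

// patchBody builds a hypothetical PatchCase body plus its updateMask from only
// the fields the operator set, so untouched fields are never overwritten. The
// etag from the earlier read rides along for the server's concurrency check.
func patchBody(e caseEdit, etag string) (map[string]string, string, error) {
	body := map[string]string{"etag": etag}
	var paths []string
	for _, f := range []struct {
		path string
		val  *string
	}{{"priority", e.Priority}, {"stage", e.Stage}, {"assignee", e.Assignee}} {
		if f.val != nil {
			body[f.path] = *f.val
			paths = append(paths, f.path)
		}
	}
	if len(paths) == 0 {
		return nil, "", errors.New("nothing to change: pass at least one field flag")
	}
	sort.Strings(paths)
	return body, strings.Join(paths, ","), nil
}

func main() {
	stage := "Triage"
	body, mask, err := patchBody(caseEdit{Stage: &stage}, "etag-from-get")
	fmt.Println(body, "updateMask="+mask, err)
}
```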
Build discipline (same as SOAR, and the gate is real here): SIEM needs a token
(`SECOPS_ACCESS_TOKEN`; ADC is restricted) and the v1alpha surface returns
intermittent 500s. So the query/read layer + `--dry-run` previews are built and
validated first (safe), and no `--yes` bulk close/assign is trusted until a live
smoke closes → reopens a single throwaway-safe case (or runs against a non-prod
instance). Until a token is available, this wave ships read + dry-run only.
Non-goals
- No mutation of events (immutable telemetry) and no bulk delete of live data.
- No mixing planes: config stays reconcile (files/git), operational stays query/act (live). The dual SIEM-UUID / SOAR-int case worlds stay separate.
- No `--yes`-by-default anywhere; `--filter` bulk is always dry-run-first.