Update the node in a controlled rollout

  1. Verify the environment is currently healthy before changing anything.
  2. Record the active runtime mode, whether build flags and runtime flags are aligned, and a readiness snapshot.
  3. Change one software or image layer at a time.
  4. Revalidate readiness and progression before broader rollout.
  5. Roll back to the previous known-good baseline if the node no longer serves cleanly.

1. Freeze the current baseline

Before the update:
  1. record the intended runtime mode,
  2. confirm APP_CONFIG and deployment metadata ownership,
  3. snapshot the current readiness outputs,
  4. confirm whether SGX build flags and runtime flags are aligned.
Do not start an update from an already-ambiguous environment.
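
The baseline record can be kept as a small structured object so that ambiguity is detected mechanically rather than by memory. The following is a minimal sketch, not the node's own tooling: the Baseline class, its field names, and the ambiguities() helper are hypothetical, and the readiness payload is whatever your readiness checks currently return.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Baseline:
    """Hypothetical pre-update record; field names are illustrative only."""
    runtime_mode: str            # intended runtime mode, as recorded by the operator
    sgx_build_enabled: bool      # were SGX flags set at build time?
    sgx_runtime_enabled: bool    # are SGX flags set at runtime?
    readiness: dict = field(default_factory=dict)  # raw readiness outputs

    def ambiguities(self) -> list[str]:
        """Return the reasons this baseline is not safe to update from."""
        problems = []
        if not self.runtime_mode:
            problems.append("runtime mode not recorded")
        if self.sgx_build_enabled != self.sgx_runtime_enabled:
            problems.append("SGX build flags and runtime flags disagree")
        if not self.readiness:
            problems.append("no readiness snapshot captured")
        return problems
```

An empty list from ambiguities() means the baseline is recorded and consistent; anything else is a reason to stop before the update begins.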

2. Check prerequisites that commonly turn updates into incidents

Confirm before rollout:
  • chain and deployment-service reachability
  • Postgres and Redis health
  • PCCS and AESM reachability when enclave-capable runtime is in use
  • adequate disk and database headroom for the update window
If these are already degraded, fix them first instead of attributing later failures to the new version.
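
A rough pre-flight pass over these dependencies might look like the sketch below. It only checks that a TCP connection can be opened and that a path has free space, which is weaker than a real health check for Postgres, Redis, PCCS, or AESM. The function names are hypothetical, and the host and port values are supplied by the caller because this guide does not define them.

```python
from __future__ import annotations

import shutil
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable(deps: dict[str, tuple[str, int]]) -> list[str]:
    """Return the names of dependencies that did not accept a connection."""
    return [name for name, (host, port) in deps.items()
            if not tcp_reachable(host, port)]


def has_headroom(path: str, min_free_bytes: int) -> bool:
    """True if the filesystem holding `path` has at least min_free_bytes free."""
    return shutil.disk_usage(path).free >= min_free_bytes
```

Run this before the update and keep the result with the baseline, so that a dependency that was already down is not blamed on the new version afterwards.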

3. Change one layer at a time

For a safe public workflow:
  1. update one node software or image layer,
  2. restart only the components required for that layer,
  3. avoid bundling unrelated configuration and software changes in the same rollout.
This keeps rollback decisions clear.
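
One way to enforce the rule is to describe each planned rollout as a list of changes and reject any rollout that touches more than one layer or mixes change kinds. This is a sketch under assumptions: the "layer" and "kind" keys and their values are hypothetical labels, not part of any deployment format described here.

```python
from __future__ import annotations


def validate_change_set(changes: list[dict]) -> list[str]:
    """Return reasons the change set breaks the one-layer-at-a-time rule."""
    problems = []
    if not changes:
        return ["change set is empty"]
    layers = {c["layer"] for c in changes}   # e.g. "node-image", "node-software"
    kinds = {c["kind"] for c in changes}     # e.g. "software", "config"
    if len(layers) > 1:
        problems.append("more than one layer: " + ", ".join(sorted(layers)))
    if len(kinds) > 1:
        problems.append("mixed change kinds: " + ", ".join(sorted(kinds)))
    return problems
```

For example, validate_change_set([{"layer": "node-image", "kind": "software"}]) returns an empty list, while adding a second entry with kind "config" produces a "mixed change kinds" finding.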

4. Re-run readiness before declaring success

After the update, verify:
  • operator /v2/status
  • KYC /v1/status
  • operator encryption-key and mark-price probes
  • progression signals such as lastRequest.requestIndex and lastTx
If the node reports healthy but its progression signals are not advancing, treat that as an update failure until explained.
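
Progression can be checked by taking two status snapshots some time apart and comparing them. The sketch below assumes the status payload is JSON-like, with lastRequest.requestIndex as a number that increases and lastTx as an opaque value that changes when a new transaction is seen; both assumptions should be checked against the actual status output. The function name is hypothetical.

```python
from __future__ import annotations


def progression_report(before: dict, after: dict) -> dict[str, bool]:
    """Compare two status snapshots and report which signals moved.

    Assumes (not confirmed by this guide) that requestIndex is numeric
    and that lastTx is comparable for equality.
    """
    return {
        "requestIndex": (after["lastRequest"]["requestIndex"]
                         > before["lastRequest"]["requestIndex"]),
        "lastTx": after["lastTx"] != before["lastTx"],
    }
```

A report where every value is False on a node whose status endpoints are green is the "healthy but not progressing" case, and should block the rollout until it is explained.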

5. Decide whether to continue or roll back

Continue only when:
  1. readiness is green,
  2. progression signals are moving,
  3. dependency checks remain healthy,
  4. no new attestation or registration fault has appeared.
If those conditions are not met, roll back to the last known-good software baseline rather than layering more changes on top.
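
The four conditions reduce to a single all-or-nothing gate. A minimal sketch, with hypothetical names, that makes the default outcome a rollback unless every condition holds:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutChecks:
    """Hypothetical container for the four continue conditions."""
    readiness_green: bool
    progression_moving: bool
    dependencies_healthy: bool
    new_attestation_or_registration_fault: bool


def decide(checks: RolloutChecks) -> str:
    """Return "continue" only when every condition holds, else "rollback"."""
    ok = (checks.readiness_green
          and checks.progression_moving
          and checks.dependencies_healthy
          and not checks.new_attestation_or_registration_fault)
    return "continue" if ok else "rollback"
```

Keeping the gate this strict avoids partial-credit decisions such as continuing because three of the four checks passed.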

6. Re-check monitoring after rollout

After a successful update:
  1. confirm the alert rules still evaluate against the expected fields,
  2. verify that the node is not silently degraded under the new version,
  3. watch storage and progression signals through the first post-update period.
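
For the post-update watch, periodic samples can be scanned for stalls and shrinking storage. The sketch below assumes each sample is a dict with hypothetical keys "requestIndex" and "freeBytes" collected by your own monitoring; the thresholds are placeholders to be tuned per deployment.

```python
from __future__ import annotations


def longest_stall(indices: list[int]) -> int:
    """Longest run of consecutive sample pairs with no increase in the index."""
    longest = current = 0
    for prev, cur in zip(indices, indices[1:]):
        if cur > prev:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


def watch_findings(samples: list[dict], max_stall: int = 3,
                   min_free_bytes: int = 0) -> list[str]:
    """Flag a progression stall or a storage floor breach in the samples."""
    findings = []
    if longest_stall([s["requestIndex"] for s in samples]) >= max_stall:
        findings.append("progression stalled")
    if any(s["freeBytes"] < min_free_bytes for s in samples):
        findings.append("storage below floor")
    return findings
```

An empty findings list over the first post-update period is one piece of evidence that the node is not silently degraded; it does not replace checking the alert rules themselves.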

Boundary rule

This guide covers rollout discipline, not the canonical fixed contracts or any private operator-specific incident process.

Last modified on April 12, 2026