Update the node in a controlled rollout
- Verify the environment is currently healthy before changing anything.
- Record the active mode, build alignment, and readiness snapshot.
- Change one software or image layer at a time.
- Revalidate readiness and progression before broader rollout.
- Roll back to the previous known-good baseline if the node no longer serves cleanly.
1. Freeze the current baseline
Before the update:
- record the intended runtime mode,
- confirm APP_CONFIG and deployment metadata ownership,
- snapshot the current readiness outputs,
- confirm whether SGX build flags and runtime flags are aligned.
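
The baseline can be captured as a small structured record so it can be compared against later or used as the rollback target. The sketch below is illustrative only: the `Baseline` class, its field names, and the example mode value are assumptions, not part of the node software.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class Baseline:
    """Pre-update record. All field names are hypothetical."""
    runtime_mode: str            # intended runtime mode
    config_owner: str            # owner of APP_CONFIG and deployment metadata
    sgx_build_enabled: bool      # was the build produced with SGX flags?
    sgx_runtime_enabled: bool    # is the runtime started with SGX flags?
    readiness: dict = field(default_factory=dict)  # raw readiness outputs
    taken_at: str = ""           # filled in by freeze_baseline()

    def sgx_aligned(self) -> bool:
        """Build and runtime flags are aligned when both agree."""
        return self.sgx_build_enabled == self.sgx_runtime_enabled


def freeze_baseline(baseline: Baseline) -> str:
    """Stamp the baseline with a UTC timestamp and serialize it to JSON."""
    baseline.taken_at = datetime.now(timezone.utc).isoformat()
    record = asdict(baseline)
    record["sgx_aligned"] = baseline.sgx_aligned()
    return json.dumps(record, indent=2, sort_keys=True)
```

Writing the returned JSON to a file outside the node's own storage keeps the baseline available even if the update damages local state.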
2. Check prerequisites that commonly turn updates into incidents
Confirm before rollout:
- chain and deployment-service reachability
- Postgres and Redis health
- PCCS and AESM reachability when enclave-capable runtime is in use
- adequate disk and database headroom for the update window
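
A minimal sketch of how these prerequisite checks might be scripted, using only the Python standard library. The helper names are hypothetical, and a TCP connect only shows that a port accepts connections, not that the service behind it is healthy, so treat it as a first-pass filter.

```python
import shutil
import socket
from typing import Callable


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def disk_headroom_ok(path: str, min_free_fraction: float) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


def failed_prerequisites(checks: dict[str, Callable[[], bool]]) -> list[str]:
    """Run every named check and return the names of those that failed."""
    return [name for name, check in checks.items() if not check()]


# Example wiring (hosts, paths, and the 20% threshold are placeholders):
# failures = failed_prerequisites({
#     "postgres": lambda: tcp_reachable("db.internal", 5432),
#     "redis": lambda: tcp_reachable("cache.internal", 6379),
#     "disk": lambda: disk_headroom_ok("/var/lib/node", 0.20),
# })
```

An empty result means every check passed. Any non-empty result is a reason to postpone the rollout rather than discover the dependency problem mid-update.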
3. Change one layer at a time
For a safe public workflow:
- update one node software or image layer,
- restart only the components required for that layer,
- avoid bundling unrelated configuration and software changes in the same rollout.
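
The one-layer rule can be enforced mechanically before a rollout starts. The sketch below assumes a hypothetical change-set format in which each entry names the `layer` it touches and whether its `kind` is `software` or `config`. Neither the format nor the function comes from the node tooling.

```python
def validate_change_set(changes: list[dict]) -> list[str]:
    """Return reasons the change set breaks the one-layer rule (empty if OK)."""
    problems = []
    layers = {change["layer"] for change in changes}
    kinds = {change["kind"] for change in changes}
    if len(layers) > 1:
        problems.append(f"touches multiple layers: {sorted(layers)}")
    if {"config", "software"} <= kinds:
        problems.append("bundles configuration and software changes")
    return problems
```

Rejecting a mixed change set up front keeps any later regression attributable to a single change.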
4. Re-run readiness before declaring success
After the update, verify:
- operator /v2/status
- KYC /v1/status
- operator encryption-key and mark-price probes
- progression signals such as lastRequest.requestIndex and lastTx
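
One way to check progression is to fetch the status payload twice, some interval apart, and compare the two snapshots. The sketch below assumes the payload is JSON with a nested `lastRequest.requestIndex` that increases over time and a top-level `lastTx` that changes when a new transaction lands. The real response shape may differ, so adjust the lookups to match what the endpoint returns.

```python
import json
from urllib.request import urlopen


def fetch_status(base_url: str, path: str, timeout: float = 5.0) -> dict:
    """GET base_url + path (for example "/v2/status") and parse the JSON body."""
    with urlopen(base_url.rstrip("/") + path, timeout=timeout) as resp:
        return json.load(resp)


def progression_signals(status: dict) -> dict:
    """Extract the two progression fields (assumed payload shape)."""
    last_request = status.get("lastRequest") or {}
    return {
        "requestIndex": last_request.get("requestIndex"),
        "lastTx": status.get("lastTx"),
    }


def moved_signals(before: dict, after: dict) -> dict[str, bool]:
    """Report which signals advanced between two status snapshots."""
    b, a = progression_signals(before), progression_signals(after)
    index_moved = (
        b["requestIndex"] is not None
        and a["requestIndex"] is not None
        and a["requestIndex"] > b["requestIndex"]
    )
    tx_moved = (
        b["lastTx"] is not None
        and a["lastTx"] is not None
        and a["lastTx"] != b["lastTx"]
    )
    return {"requestIndex": index_moved, "lastTx": tx_moved}
```

A missing field is reported as not moved, so a payload whose shape changed under the new version fails the check instead of passing silently.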
5. Decide whether to continue or roll back
Continue only when:
- readiness is green,
- progression signals are moving,
- dependency checks remain healthy,
- no new attestation or registration fault has appeared.
If any of these conditions fails, roll back to the previous known-good baseline.
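
The continue-or-roll-back decision can be made explicit so it is not left to judgment during an incident. This is a sketch with hypothetical names. The four inputs map one-to-one onto the conditions listed above, and how each is computed is left to the operator.

```python
from dataclasses import dataclass


@dataclass
class PostUpdateChecks:
    """Outcome of the post-update checks. Field names are illustrative."""
    readiness_green: bool
    progression_moving: bool
    dependencies_healthy: bool
    new_attestation_or_registration_fault: bool


def decide(checks: PostUpdateChecks) -> tuple[str, list[str]]:
    """Return ("continue" or "rollback", reasons for rolling back)."""
    reasons = []
    if not checks.readiness_green:
        reasons.append("readiness is not green")
    if not checks.progression_moving:
        reasons.append("progression signals are stalled")
    if not checks.dependencies_healthy:
        reasons.append("a dependency check is failing")
    if checks.new_attestation_or_registration_fault:
        reasons.append("new attestation or registration fault")
    return ("rollback" if reasons else "continue", reasons)
```

Returning the reasons alongside the verdict gives the rollback record a concrete cause.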
6. Re-check monitoring after rollout
After a successful update:
- confirm the alert set is still firing on the expected fields,
- verify that the node is not silently degraded under the new version,
- watch storage and progression signals through the first post-update period.
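
An update can break alerting without any visible error if it renames or drops a field that an alert rule reads. A minimal sketch of a check for that, assuming alert rules reference status fields by dotted path (such as `lastRequest.requestIndex`). The function names are hypothetical.

```python
def has_path(payload: dict, dotted: str) -> bool:
    """Return True if every key along the dotted path exists in the payload."""
    node = payload
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def missing_alert_fields(payload: dict, alert_fields: list[str]) -> list[str]:
    """List the alert-rule fields that the new status payload no longer exposes."""
    return [name for name in alert_fields if not has_path(payload, name)]
```

Running this against a status payload from the updated node, with the field list taken from the current alert rules, shows which alerts can no longer fire.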