Handle node downtime and recovery
Use this guide when an operator-backed environment is unhealthy and you need to decide whether to re-verify dependencies, restore the deployment baseline, or escalate to a private operator process.

Prerequisites
- You have already classified the issue as node-operations work.
- You know which localnet mode the environment is supposed to be using.
- You can inspect operator, KYC, chain, and deployment-service readiness.
1. Classify the outage before changing anything
Put the failure into one of these buckets first:

| Failure class | Typical signal |
|---|---|
| mode mismatch | chain and deployment-service URLs do not match the same localnet mode |
| dependency failure | Postgres, Redis, PCCS, AESM, oracle, or KYC-side prerequisites are unavailable |
| readiness failure | operator or KYC status endpoints do not report healthy state |
| baseline drift | the environment has accumulated state and no longer reflects the deployment baseline |
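The triage table above can be scripted as a first-pass helper that maps an observed signal to a failure class. This is a sketch: `classify_outage` and its keyword list are hypothetical, not part of any shipped tooling, and real signals will need a richer taxonomy.

```shell
#!/bin/sh
# Hypothetical triage helper: map an observed signal string to one of the
# four failure classes from the table above. Patterns are checked in order,
# so a signal mentioning "mode" classifies as a mode mismatch even if it
# also names a dependency.
classify_outage() {
  case "$1" in
    *mode*|*URL*)                              echo "mode mismatch" ;;
    *postgres*|*redis*|*pccs*|*aesm*|*oracle*) echo "dependency failure" ;;
    *status*|*readiness*)                      echo "readiness failure" ;;
    *drift*|*state*)                           echo "baseline drift" ;;
    *)                                         echo "unclassified" ;;
  esac
}

classify_outage "chain and deployment-service URLs point at different modes"
classify_outage "postgres connection refused"
```

Recording the chosen class before touching anything gives later steps (and any incident report) a stable starting point.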
2. Re-verify the mode and ownership contracts
Before restarting services, confirm:

- `ETH_RPC_URL` and `DEPLOYMENT_SERVICE_URL` match the same localnet mode
- `APP_CONFIG=/opt/dexlabs` is still in effect
- `deployment-service` is still the only owner of `addresses.json`
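One way to automate the mode cross-check, assuming the localnet mode appears as a hostname prefix in both URLs (for example `devnet-chain`); that encoding is an assumption for this sketch, so adapt `mode_of` to however your URLs actually carry the mode.

```shell
#!/bin/sh
# Extract the assumed mode prefix from a URL: strip the scheme, then keep
# everything before the first "-" in the host.
mode_of() {
  echo "$1" | sed -e 's|^[a-z]*://||' -e 's|-.*$||'
}

# Compare the mode implied by ETH_RPC_URL against the one implied by
# DEPLOYMENT_SERVICE_URL and report agreement or mismatch.
check_mode_match() {
  chain_mode=$(mode_of "$1")
  deploy_mode=$(mode_of "$2")
  if [ "$chain_mode" = "$deploy_mode" ]; then
    echo "ok: both sides use mode '$chain_mode'"
  else
    echo "mode mismatch: chain=$chain_mode deployment-service=$deploy_mode"
  fi
}

check_mode_match "$ETH_RPC_URL" "$DEPLOYMENT_SERVICE_URL"
```

A mismatch here means stop: restarting services against disagreeing modes only deepens the drift.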
3. Recheck the shared dependencies
Confirm the required dependencies are available again:

- chain and deployment-service
- Postgres
- Redis
- PCCS and AESM when enclave-capable runtime is in use
- oracle and KYC-side upstream dependencies
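A minimal reachability sweep over that dependency list might look like the following; the host:port pairs are placeholders for your real endpoints, and `check_tcp` is a generic probe rather than a project-provided tool.

```shell
#!/bin/sh
# Probe a TCP endpoint with a 2-second cap so a dead service fails fast
# instead of hanging the whole check. Uses bash's /dev/tcp redirection
# inside `timeout`, so the outer shell can be plain sh.
check_tcp() {
  host=$1; port=$2
  if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "up   $host:$port"
  else
    echo "DOWN $host:$port"
  fi
}

# Placeholder dependency list: Postgres and Redis on their default ports.
for dep in "localhost:5432" "localhost:6379"; do
  check_tcp "${dep%:*}" "${dep#*:}"
done
```

Run the sweep before and after each restart attempt so you can tell whether a restart actually changed anything.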
4. Recheck operator and KYC readiness
Treat the status endpoints as the first readiness contract:

| Surface | Healthy posture |
|---|---|
| operator | /v2/status reports a healthy running state and the localnet readiness checks succeed |
| KYC | /v1/status reports isHealthy=true |
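The KYC half of that contract can be checked mechanically against the `/v1/status` body. `kyc_is_healthy` below is a hypothetical helper that only looks for the `isHealthy` flag named in the table; in real use you would feed it the output of something like `curl -fsS "$KYC_URL/v1/status"`, where `KYC_URL` is a placeholder for your endpoint.

```shell
#!/bin/sh
# Decide health from a /v1/status JSON body without jq: match the
# isHealthy flag literally, tolerating an optional space after the colon.
kyc_is_healthy() {
  case "$1" in
    *'"isHealthy":true'* | *'"isHealthy": true'*) echo "healthy" ;;
    *)                                            echo "unhealthy" ;;
  esac
}

kyc_is_healthy '{"isHealthy":true,"uptime":123}'
```

The operator side follows the same pattern against `/v2/status`, substituting whatever healthy-state fields that endpoint reports.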
5. Restore the canonical baseline when the environment has drifted
If the environment is unhealthy because it has drifted from the known-good localnet state:

- restore the deployment baseline through the owning deployment-service flow
- keep release and registration effects outside the baseline snapshot
- rerun the registration and readiness checks after the baseline is clean again
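Rerunning the checks after a restore is usually a poll-until-healthy loop rather than a single probe. `wait_for` below is a generic sketch, not an existing command: give it an attempt budget and the check command to retry.

```shell
#!/bin/sh
# Retry a command once per second up to a fixed number of attempts,
# reporting how long readiness took (or that the budget ran out).
wait_for() {
  attempts=$1; shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $((i + 1)) attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready after $attempts attempt(s)"
  return 1
}

# Example: poll a registration check (stubbed here with `true`).
wait_for 3 true
```

A bounded budget matters: if the environment is still not ready when the loop exits, that is a signal to reclassify the outage, not to keep restarting.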
6. Revalidate registration-dependent health
After baseline recovery, recheck:

- release measurements and registration-report availability
- registration sequencing for operator and KYC paths
- operator and KYC readiness after registration completes
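Because these rechecks are ordered (measurements before registration, registration before readiness), it can help to run them as gates that stop at the first failure, so the output names which link in the chain broke. `run_gates` and the stage commands below are illustrative stand-ins for your real probes.

```shell
#!/bin/sh
# Run name=command gates in order; print "pass <name>" for each success
# and stop with "FAIL <name>" at the first failing stage.
run_gates() {
  for gate in "$@"; do
    name=${gate%%=*}
    cmd=${gate#*=}
    if sh -c "$cmd" >/dev/null 2>&1; then
      echo "pass $name"
    else
      echo "FAIL $name"
      return 1
    fi
  done
}

# Stage commands stubbed with true/false for illustration.
run_gates "measurements=true" "registration=true" "readiness=true"
```

Stopping at the first failure keeps the diagnosis honest: a readiness failure reported while registration is still broken would send you down the wrong path.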
7. Decide the next route
| If you need… | Use this next |
|---|---|
| exact readiness, mode, and invariant lookup | Node Operations Reference |
| SGX-specific attestation diagnosis | How to Troubleshoot SGX Attestation Issues |
| monitoring signals before the next restart | How to Set Up Node Monitoring and Alerting |
| a private runbook or incident path | Support Channels |