
Handle node downtime and recovery

Use this guide when an operator-backed environment is unhealthy and you need to decide whether to re-verify dependencies, restore the deployment baseline, or escalate to a private operator process.

Prerequisites

  • You have already classified the issue as node-operations work.
  • You know which localnet mode the environment is supposed to be using.
  • You can inspect operator, KYC, chain, and deployment-service readiness.

1. Classify the outage before changing anything

Put the failure into one of these buckets first:
Failure class and typical signal:
  • mode mismatch: chain and deployment-service URLs do not match the same localnet mode
  • dependency failure: Postgres, Redis, PCCS, AESM, oracle, or KYC-side prerequisites are unavailable
  • readiness failure: operator or KYC status endpoints do not report a healthy state
  • baseline drift: the environment has accumulated state and no longer reflects the deployment baseline
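The buckets above can be sketched as a crude triage helper. The signal strings matched here are illustrative assumptions, not the output of any real tool; the value of the sketch is only that every incident lands in exactly one bucket before anyone restarts anything.

```shell
# Hypothetical triage helper: map a free-form failure signal to one of the
# four failure classes from this guide. The patterns are examples only.
classify_outage() {
  case "$1" in
    *mode*|*url-mismatch*)                       echo "mode mismatch" ;;
    *postgres*|*redis*|*pccs*|*aesm*|*oracle*)   echo "dependency failure" ;;
    *status*|*readiness*)                        echo "readiness failure" ;;
    *drift*|*accumulated*)                       echo "baseline drift" ;;
    *)                                           echo "unclassified" ;;
  esac
}
```

An "unclassified" result is itself a signal: gather more evidence before acting.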

2. Re-verify the mode and ownership contracts

Before restarting services, confirm:
  • ETH_RPC_URL and DEPLOYMENT_SERVICE_URL match the same localnet mode
  • APP_CONFIG=/opt/dexlabs
  • deployment-service is still the only owner of addresses.json
If those assumptions are wrong, correct them before you continue.
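A minimal sketch of these checks, assuming that in a given localnet mode the chain and the deployment-service run on the same host; if your modes are distinguished differently (by port, by hostname suffix), adapt the comparison accordingly. The addresses.json ownership check stays a process-level check and is not automated here.

```shell
# Hedged sketch: verify the mode and config assumptions before restarting
# anything. The same-host heuristic is an assumption, not part of this guide.
check_mode_contract() {
  local eth="${ETH_RPC_URL:?ETH_RPC_URL is unset}"
  local dep="${DEPLOYMENT_SERVICE_URL:?DEPLOYMENT_SERVICE_URL is unset}"
  local eth_host dep_host
  # Extract the host component of each URL.
  eth_host=$(printf '%s' "$eth" | sed -E 's#^[a-z]+://([^:/]+).*#\1#')
  dep_host=$(printf '%s' "$dep" | sed -E 's#^[a-z]+://([^:/]+).*#\1#')
  if [ "$eth_host" != "$dep_host" ]; then
    echo "mode mismatch: $eth_host vs $dep_host" >&2
    return 1
  fi
  if [ "${APP_CONFIG:-}" != "/opt/dexlabs" ]; then
    echo "APP_CONFIG is '${APP_CONFIG:-unset}', expected /opt/dexlabs" >&2
    return 1
  fi
  echo "mode contract OK"
}
```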

3. Recheck the shared dependencies

Confirm the required dependencies are available again:
  • chain and deployment-service
  • Postgres
  • Redis
  • PCCS and AESM when enclave-capable runtime is in use
  • oracle and KYC-side upstream dependencies
If a shared dependency is unavailable, recover that dependency first instead of trying ad-hoc node restarts.
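The dependency check can be reduced to reachability probes before any deeper diagnosis. This sketch uses a generic TCP probe; the hosts and ports are common defaults (5432 for Postgres, 6379 for Redis), not values taken from this guide, so substitute your environment's actual endpoints.

```shell
# Hedged sketch: generic TCP reachability probes for the shared dependencies.
# Hosts and ports below are illustrative defaults; adjust to your deployment.
probe_tcp() {                     # usage: probe_tcp <name> <host> <port>
  local name=$1 host=$2 port=$3
  if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "$name: reachable"
  else
    echo "$name: UNREACHABLE"
    return 1
  fi
}

check_shared_deps() {
  local failed=0
  probe_tcp postgres 127.0.0.1 5432 || failed=1
  probe_tcp redis    127.0.0.1 6379 || failed=1
  # PCCS/AESM probes apply only when an enclave-capable runtime is in use.
  return $failed
}
```

If any probe fails, recover that dependency first, exactly as the step above says; node restarts cannot fix an unreachable upstream.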

4. Recheck operator and KYC readiness

Treat the status endpoints as the first readiness contract:
Surface and healthy posture:
  • operator: /v2/status reports a healthy running state and the localnet readiness checks succeed
  • KYC: /v1/status reports isHealthy=true
If those readiness signals are still failing, do not assume that request sequencing or trading flows are the root problem.
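The two readiness probes above can be scripted as follows. The status paths (/v2/status, /v1/status) and the isHealthy field come from this guide; the base URLs are assumed placeholders you must set for your environment.

```shell
# Hedged sketch: check both readiness surfaces before looking anywhere else.
# OPERATOR_URL and KYC_URL are assumptions; export your real endpoints.
OPERATOR_URL=${OPERATOR_URL:-http://localhost:8080}
KYC_URL=${KYC_URL:-http://localhost:8090}

kyc_is_healthy() {   # reads a /v1/status JSON body on stdin
  grep -q '"isHealthy"[[:space:]]*:[[:space:]]*true'
}

check_readiness() {
  curl -fsS "$OPERATOR_URL/v2/status" >/dev/null \
    || { echo "operator: not ready" >&2; return 1; }
  curl -fsS "$KYC_URL/v1/status" | kyc_is_healthy \
    || { echo "kyc: not healthy" >&2; return 1; }
  echo "readiness OK"
}
```

A jq-based check (`jq -e .isHealthy`) is sturdier than grep if jq is available; the grep form is used here only to stay dependency-free.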

5. Restore the canonical baseline when the environment has drifted

If the environment is unhealthy because it has drifted from the known-good localnet state:
  1. restore the deployment baseline through the owning deployment-service flow
  2. keep release and registration effects outside the baseline snapshot
  3. rerun the registration and readiness checks after the baseline is clean again
Use the deploy-baseline model, not an ad-hoc checkpoint model.
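The key property of step 1 in this flow is that the owning deployment-service performs the restore, never an ad-hoc copy of state files. The sketch below illustrates that shape only: the restore endpoint path is hypothetical, and the CURL variable exists purely as a test seam.

```shell
# Hedged sketch: drive baseline restore through the owning service.
# The /v1/baseline/restore path is hypothetical; use whatever restore flow
# your deployment-service actually exposes.
restore_baseline_flow() {
  local dep="${DEPLOYMENT_SERVICE_URL:?DEPLOYMENT_SERVICE_URL is unset}"
  # CURL override lets dry runs substitute the transport command.
  "${CURL:-curl}" -fsS -X POST "$dep/v1/baseline/restore" || return 1
  echo "baseline restored"
}
```

After a successful restore, rerun the registration and readiness checks, since release and registration effects are intentionally outside the baseline snapshot.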

6. Revalidate registration-dependent health

After baseline recovery, recheck:
  • release measurements and registration-report availability
  • registration sequencing for operator and KYC paths
  • operator and KYC readiness after registration completes
If readiness still fails after those checks, escalate with the exact failing stage rather than a generic downtime summary.
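Registration-dependent health rarely flips to green on the first poll after recovery, so bound the rechecks with an explicit retry budget rather than looping indefinitely. A generic retry wrapper, with attempt counts chosen arbitrarily for illustration:

```shell
# Hedged sketch: retry any readiness check a bounded number of times.
wait_until() {   # usage: wait_until <attempts> <delay_seconds> <command...>
  local attempts=$1 delay=$2 i; shift 2
  for i in $(seq 1 "$attempts"); do
    if "$@"; then return 0; fi
    sleep "$delay"
  done
  echo "still failing after $attempts attempts: $*" >&2
  return 1
}
```

When the budget is exhausted, the wrapper's final message names the exact failing check, which is precisely what the escalation above asks for instead of a generic downtime summary.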

7. Decide the next route

If you need any of the following, use the matching route:
  • exact readiness, mode, and invariant lookup: Node Operations Reference
  • SGX-specific attestation diagnosis: How to Troubleshoot SGX Attestation Issues
  • monitoring signals before the next restart: How to Set Up Node Monitoring and Alerting
  • a private runbook or incident path: Support Channels
Last modified on April 12, 2026