# Set up node monitoring and alerting
Use this guide when you already operate an operator-backed environment and need a practical monitoring baseline for health, request flow, and storage pressure.

## Prerequisites

- You already know the environment is on a node-operations path.
- You know which operator and KYC endpoints you are monitoring.
- You can observe the deployment-service, chain, operator, and KYC dependencies that belong to that environment.
## 1. Start with the readiness endpoints

Treat these as the first health signals:

| Surface | Signal to watch |
|---|---|
| operator | /v2/status plus the running-state and health fields used by the localnet readiness check |
| KYC | /v1/status with isHealthy=true |
| deployment-service | health and deployment responsiveness for the owned baseline/reset path |
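The readiness checks above can be sketched as small predicates over the status payloads. This is a hedged sketch: the KYC `isHealthy` field comes from the table, but the exact operator field names and values (`runningState`, `health`, and the strings they carry) are assumptions you should adjust to your actual status responses.

```python
import json
from urllib.request import urlopen


def kyc_ready(status: dict) -> bool:
    """KYC surface is ready when /v1/status reports isHealthy=true."""
    return status.get("isHealthy") is True


def operator_ready(status: dict) -> bool:
    """Operator is ready when the running-state and health fields look serving.
    The field names and values here are assumptions; match them to your payload."""
    return status.get("runningState") == "RUNNING" and status.get("health") == "healthy"


def poll(url: str) -> dict:
    """Fetch one readiness payload; callers decide retry and alert policy."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

A poller would call `poll("https://<operator-host>/v2/status")` on an interval and feed the result into `operator_ready`.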
## 2. Add operator progress signals

Track the fields that show whether the operator is actually moving:

| Signal | Why to watch it |
|---|---|
| runningState | confirms whether the node is in a serving role |
| health | distinguishes serving state from degraded or blocked startup |
| lastRequest.requestIndex | shows whether requests are advancing |
| lastTx | shows whether transaction progression is stalled |
| raftMetrics.nodeId and raftMetrics.voterIds | help explain cluster-role and voter-set issues in status output |
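A progression-stall check compares lastRequest.requestIndex between successive polls and only flags a stall while the node claims a serving role. A minimal sketch, assuming the same hypothetical field names and values as your status payload uses:

```python
class ProgressionWatch:
    """Tracks lastRequest.requestIndex between polls and flags a stall
    when the index stops advancing while the node is supposed to serve."""

    def __init__(self):
        self.last_index = None

    def observe(self, status: dict) -> bool:
        """Return True when this sample indicates a progression stall."""
        index = status.get("lastRequest", {}).get("requestIndex")
        serving = status.get("runningState") == "RUNNING"  # assumed value
        stalled = serving and index is not None and index == self.last_index
        self.last_index = index
        return stalled
```

In practice you would require the index to stay flat across several polls before alerting, to avoid firing on a single quiet interval.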
## 3. Monitor the shared dependencies

Add explicit checks for:

- chain reachability
- deployment-service reachability
- Postgres availability
- Redis availability
- PCCS and AESM when enclave-capable runtime is in use
- oracle-source availability
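For dependencies that do not expose a status endpoint, a cheap TCP connect probe is often enough to detect unavailability. A sketch using the standard library; the hostnames and ports in `DEPENDENCIES` are placeholders for your environment:

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Liveness probe: can we open a TCP connection to the dependency?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Placeholder endpoints; substitute the real hosts and ports for your environment.
DEPENDENCIES = {
    "postgres": ("localhost", 5432),
    "redis": ("localhost", 6379),
}


def check_dependencies() -> dict:
    """Map each dependency name to a reachability boolean."""
    return {name: tcp_reachable(h, p) for name, (h, p) in DEPENDENCIES.items()}
```

A connect probe only proves the port answers; pair it with the service's own readiness signal where one exists.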
## 4. Monitor request, transaction, and system pressure

The monitoring scripts in the current repo point to these practical categories:

| Category | Example focus |
|---|---|
| request KPIs | busiest request paths and aggregate request volume |
| transaction logs | whether transaction handling is moving or backing up |
| system KPIs | CPU and memory pressure |
| disk and database storage | filesystem pressure and database growth |
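For the storage category, filesystem pressure is straightforward to measure with the standard library. A minimal sketch; the 85% threshold is an illustrative default, not a value from this guide:

```python
import shutil


def disk_used_pct(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0


def storage_alert(path: str = "/", threshold_pct: float = 85.0) -> bool:
    """True when filesystem pressure crosses the alert threshold."""
    return disk_used_pct(path) >= threshold_pct
```

Database growth needs a separate check against the database itself (for example, querying table or database sizes), since it can exhaust its own quota before the filesystem looks full.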
## 5. Turn health signals into alert classes

Use simple alert classes:

| Alert class | Trigger shape |
|---|---|
| readiness | operator or KYC readiness endpoint stays unhealthy |
| dependency | PCCS, AESM, Postgres, Redis, chain, or deployment-service becomes unavailable |
| progression stall | request index or transaction progression stops moving while the node is supposed to serve |
| storage pressure | disk or database growth threatens the environment’s stability |
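The four alert classes above can be expressed as a small classifier over the collected signals. This is a sketch; the input keys (`readiness_ok`, `deps_down`, and so on) are hypothetical names for whatever your collector produces:

```python
def classify(signal: dict) -> list[str]:
    """Map observed conditions onto the four alert classes.
    Input keys are assumptions for this sketch, not a fixed schema."""
    alerts = []
    if not signal.get("readiness_ok", True):
        alerts.append("readiness")
    if signal.get("deps_down"):           # list of unavailable dependencies
        alerts.append("dependency")
    if signal.get("progression_stalled"):
        alerts.append("progression stall")
    if signal.get("disk_used_pct", 0) >= signal.get("disk_threshold_pct", 85):
        alerts.append("storage pressure")
    return alerts
```

Keeping the classes this coarse makes routing simple: each class maps directly to one of the follow-up documents in the next step.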
## 6. Route follow-up work correctly

After an alert fires:

- use How to Handle Node Downtime and Recovery for the recovery flow
- use How to Troubleshoot SGX Attestation Issues when the underlying failure is attestation-related
- use Node Operations Reference for the fixed readiness and invariant facts behind the alert