Monitoring Your Validator
GenLayer validators expose comprehensive metrics that are ready for consumption by Prometheus and other monitoring tools. This allows you to monitor your validator's performance, health, and resource usage.
Accessing Metrics
The metrics endpoint is exposed on the operations port (default: 9153) configured in your config.yaml:
node:
  ops:
    port: 9153 # Metrics port
    endpoints:
      metrics: true # Enable metrics endpoint

Once your node is running, you can access the metrics at:

http://localhost:9153/metrics

Available Metrics
The validator exposes various metric collectors that can be individually configured:
- Node Metrics: Core validator performance metrics including block processing, transaction handling, and consensus participation
- GenVM Metrics: Virtual machine performance metrics, including execution times and resource usage
- WebDriver Metrics: Metrics related to web access and external data fetching
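To see which metric families your node actually exposes, you can list the unique metric names from the endpoint. This is a quick sketch using standard shell tools, assuming the node's metrics share the genlayer_ prefix used in the example query later on this page:

# List unique metric names exposed by the node (sketch; genlayer_ prefix assumed)
curl -s http://localhost:9153/metrics | grep -E '^genlayer_' | cut -d'{' -f1 | cut -d' ' -f1 | sort -u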
Configuring Metrics Collection
You can customize metrics collection in your config.yaml:
metrics:
  interval: "15s" # Default collection interval
  collectors:
    node:
      enabled: true
      interval: "30s" # Override interval for specific collector
    genvm:
      enabled: true
      interval: "20s"
    webdriver:
      enabled: true
      interval: "60s"

Example Metrics Query
To check if metrics are working correctly:
# Get all available metrics
curl http://localhost:9153/metrics
# Check specific metric (example)
curl -s http://localhost:9153/metrics | grep genlayer_node_

The same port also serves /health and /balance endpoints for additional monitoring capabilities.
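These additional endpoints can be checked the same way. A quick sketch (the exact response payloads depend on your node version):

# Node health check (response format may vary by version)
curl -s http://localhost:9153/health

# Validator balance endpoint
curl -s http://localhost:9153/balance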
Monitoring Best Practices
- Set up alerts for critical metrics like node synchronization status and missed blocks
- Monitor resource usage to ensure your validator has sufficient CPU, memory, and disk space
- Track GenVM performance to optimize LLM provider selection and configuration
- Use visualization tools like Grafana to create dashboards for easy monitoring
For production validators, we recommend setting up a complete monitoring stack with Prometheus and Grafana. This enables real-time visibility into your validator's performance and helps identify issues before they impact your validator's operation.
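As a starting point, a local stack can be as simple as a Prometheus container scraping the node's metrics port plus a Grafana container for dashboards. The sketch below assumes the metrics endpoint is reachable on localhost:9153 and uses host networking for simplicity; adapt image versions, networking, and persistence to your environment:

# Minimal local monitoring stack (sketch; assumes metrics on localhost:9153)
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: genlayer-node
    static_configs:
      - targets: ["localhost:9153"]
EOF

# Prometheus scraping the validator (UI on http://localhost:9090)
docker run -d --name prometheus --network host \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml:ro" \
  prom/prometheus

# Grafana for dashboards (http://localhost:3000, default login admin/admin);
# add Prometheus at http://localhost:9090 as a data source and build dashboards
# around the genlayer_ metrics
docker run -d --name grafana --network host grafana/grafana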
Logs and Metrics Forwarding
You can forward your logs and metrics to external systems for centralized monitoring and alerting by using the alloy service provided in the docker-compose.yaml from the extracted tarball.
Centralized Push to GenLayer Foundation Grafana Cloud (using Alloy)
To contribute your node's metrics and logs to the centralized GenLayer Foundation Grafana Cloud dashboard (improving aggregate network visibility, alerts, and community monitoring), use the built-in Alloy service.
Why contribute?
- Helps the Foundation and community track overall testnet health (validator participation, latency, resource usage).
- May positively influence testnet points/rewards (visible healthy nodes are prioritized).
- Setup takes 10–15 minutes once credentials are provided.
Prerequisites
- Metrics enabled in config.yaml (endpoints.metrics: true, the default in recent versions).
- Ops port 9153 exposed in docker-compose (ports: - "9153:9153").
- Credentials from the Foundation team (ask in #testnet-asimov):
  - CENTRAL_MONITORING_URL: Prometheus remote write base URL (e.g., https://prometheus-prod-XX.grafana.net)
  - CENTRAL_LOKI_URL: Loki push base URL (e.g., https://logs-prod-XX.grafana.net)
  - MONITORING_USERNAME: Instance ID (a number)
  - MONITORING_PASSWORD: Grafana Cloud API key (with write permissions for metrics and logs)
Steps
- Create or update .env (next to your docker-compose.yaml):
# Grafana Cloud credentials (request from Foundation team in Discord)
CENTRAL_MONITORING_URL=https://prometheus-prod-...grafana.net
CENTRAL_LOKI_URL=https://logs-prod-...grafana.net
MONITORING_USERNAME=1234567890 # your instance ID
MONITORING_PASSWORD=glc_xxxxxxxxxxxxxxxxxxxxxxxxxxxx # API key
# Your node labels (customize for easy filtering in dashboards)
NODE_ID=0xYourValidatorAddressOrCustomID
VALIDATOR_NAME=validatorname
# Usually defaults are fine
NODE_METRICS_ENDPOINT=localhost:9153
LOG_FILE_PATTERN=/var/log/genlayer/node*.log
METRICS_SCRAPE_INTERVAL=15s

- Add or verify the Alloy service in docker-compose.yaml (copy it in if missing):
alloy:
  image: grafana/alloy:latest
  container_name: genlayer-node-alloy
  command:
    - run
    - /etc/alloy/config.river
    - --server.http.listen-addr=0.0.0.0:12345
    - --storage.path=/var/lib/alloy/data
  volumes:
    - ./alloy-config.river:/etc/alloy/config.river:ro
    - ${NODE_LOGS_PATH:-./data/node/logs}:/var/log/genlayer:ro
    - alloy_data:/var/lib/alloy
  environment:
    - CENTRAL_LOKI_URL=${CENTRAL_LOKI_URL}
    - CENTRAL_MONITORING_URL=${CENTRAL_MONITORING_URL}
    - MONITORING_USERNAME=${MONITORING_USERNAME}
    - MONITORING_PASSWORD=${MONITORING_PASSWORD}
    - NODE_ID=${NODE_ID}
    - VALIDATOR_NAME=${VALIDATOR_NAME}
    - NODE_METRICS_ENDPOINT=${NODE_METRICS_ENDPOINT}
    - SCRAPE_TARGETS_JSON=${SCRAPE_TARGETS_JSON:-}
    - METRICS_SCRAPE_INTERVAL=${METRICS_SCRAPE_INTERVAL:-15s}
    - METRICS_SCRAPE_TIMEOUT=${METRICS_SCRAPE_TIMEOUT:-10s}
    - ALLOY_SELF_MONITORING_INTERVAL=${ALLOY_SELF_MONITORING_INTERVAL:-60s}
    - LOG_FILE_PATTERN=${LOG_FILE_PATTERN:-/var/log/genlayer/node*.log}
    - LOKI_BATCH_SIZE=${LOKI_BATCH_SIZE:-1MiB}
    - LOKI_BATCH_WAIT=${LOKI_BATCH_WAIT:-1s}
  ports:
    - "12345:12345" # Alloy UI for debugging
  restart: unless-stopped
  profiles:
    - monitoring

volumes:
  alloy_data:

- Create or update ./alloy-config.river (use the provided version; it handles both logs and metrics forwarding):
// Grafana Alloy Configuration for GenLayer Node Telemetry
// Handles both log collection and metrics forwarding
// ==========================================
// Log Collection and Forwarding
// ==========================================
local.file_match "genlayer_logs" {
  path_targets = [{
    __path__ = coalesce(env("LOG_FILE_PATTERN"), "/var/log/genlayer/node*.log"),
  }]
}
discovery.relabel "add_labels" {
  targets = local.file_match.genlayer_logs.targets
  rule {
    target_label = "instance"
    replacement  = env("NODE_ID")
  }
  rule {
    target_label = "validator_name"
    replacement  = env("VALIDATOR_NAME")
  }
  rule {
    target_label = "component"
    replacement  = "alloy"
  }
  rule {
    target_label = "job"
    replacement  = "genlayer-node"
  }
}
loki.source.file "genlayer" {
  targets       = discovery.relabel.add_labels.output
  forward_to    = [loki.write.central.receiver]
  tail_from_end = true
}
loki.write "central" {
  endpoint {
    url = env("CENTRAL_LOKI_URL") + "/loki/api/v1/push"
    basic_auth {
      username = env("MONITORING_USERNAME")
      password = env("MONITORING_PASSWORD")
    }
    batch_size = coalesce(env("LOKI_BATCH_SIZE"), "1MiB")
    batch_wait = coalesce(env("LOKI_BATCH_WAIT"), "1s")
  }
}
// ==========================================
// Prometheus Metrics Collection and Forwarding
// ==========================================
prometheus.scrape "genlayer_node" {
  targets         = json_decode(coalesce(env("SCRAPE_TARGETS_JSON"), format("[{\"__address__\":\"%s\",\"instance\":\"%s\",\"validator_name\":\"%s\"}]", coalesce(env("NODE_METRICS_ENDPOINT"), "localhost:9153"), coalesce(env("NODE_ID"), "local"), coalesce(env("VALIDATOR_NAME"), "default"))))
  forward_to      = [prometheus.relabel.metrics.receiver]
  scrape_interval = coalesce(env("METRICS_SCRAPE_INTERVAL"), "15s")
  scrape_timeout  = coalesce(env("METRICS_SCRAPE_TIMEOUT"), "10s")
}
prometheus.relabel "metrics" {
  forward_to = [prometheus.remote_write.central.receiver]
  // Optional: filter only GenLayer metrics to save bandwidth
  // rule {
  //   source_labels = ["__name__"]
  //   regex         = "genlayer_.*"
  //   action        = "keep"
  // }
}
prometheus.remote_write "central" {
  endpoint {
    url = env("CENTRAL_MONITORING_URL") + "/api/v1/write"
    basic_auth {
      username = env("MONITORING_USERNAME")
      password = env("MONITORING_PASSWORD")
    }
    queue_config {
      capacity             = 10000
      max_shards           = 5
      max_samples_per_send = 500
      batch_send_deadline  = "15s"
    }
  }
}
// ==========================================
// Alloy Self-Monitoring
// ==========================================
prometheus.exporter.self "alloy" {}

prometheus.scrape "alloy" {
  targets         = prometheus.exporter.self.alloy.targets
  forward_to      = []
  scrape_interval = coalesce(env("ALLOY_SELF_MONITORING_INTERVAL"), "60s")
}

- Start Alloy:
docker compose --profile monitoring up -d

- Verify it works:
- Open the Alloy UI at http://localhost:12345/targets; the "genlayer_node" scrape target should show status UP.
- Check logs for successful sends:
docker logs genlayer-node-alloy | grep "sent batch"
docker logs genlayer-node-alloy | grep "remote_write"

Look for messages indicating successful batch sending (no error codes like 401, 403, or 500).
- In the Foundation Grafana Cloud, search for metrics with the labels instance="${NODE_ID}" or validator_name="${VALIDATOR_NAME}" (example: genlayer_node_uptime_seconds{instance="0xYourID"}).
Troubleshooting
- No local metrics: curl http://localhost:9153/metrics should return Prometheus-formatted data.
- Authentication errors (401/403): double-check MONITORING_USERNAME and MONITORING_PASSWORD in .env.
- No data pushed: ensure the URLs in .env have no trailing slash.
- Help: share your Alloy logs (docker logs genlayer-node-alloy).
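If you suspect a credentials problem, you can probe the push endpoints directly before involving Alloy. This is a rough sketch (it assumes your .env is plain KEY=VALUE and so can be sourced by the shell); Grafana Cloud may answer an empty POST with a 400 or 405, but a 401 or 403 specifically points at bad credentials:

# Load values from .env and probe both endpoints (sketch)
source .env
curl -s -o /dev/null -w "Loki push:    %{http_code}\n" \
  -u "$MONITORING_USERNAME:$MONITORING_PASSWORD" \
  -X POST "$CENTRAL_LOKI_URL/loki/api/v1/push"
curl -s -o /dev/null -w "Remote write: %{http_code}\n" \
  -u "$MONITORING_USERNAME:$MONITORING_PASSWORD" \
  -X POST "$CENTRAL_MONITORING_URL/api/v1/write"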