
API Monitoring in Production: Metrics and Alerts
There are two ways to find out your API has problems in production: your alerts tell you, or your customers tell you. The first is just an incident. The second is an incident and a trust crisis.
API monitoring in production isn't optional -- it's the quality infrastructure that separates a professional service from a prototype in production. And with the right tools, setting up basic observability takes less than a day of work.
Essential Metrics: Latency, Error Rate, and Throughput
The three fundamental metrics for any API are known as the "Golden Signals" of SRE (Site Reliability Engineering):
Latency -- how long your API takes to respond. The classic mistake is monitoring only the average. Averages hide outliers: if 95% of requests take 50ms and 5% take 10 seconds, the average comes out near 550ms -- which looks reasonable but hides a terrible experience for 1 in every 20 users (the sketch after the list below works through this exact split).
The correct percentiles to monitor:
- p50 (median): The typical user's experience. A good performance baseline.
- p95: 95% of requests are below this value. What most users experience.
- p99: The "long tail." Where the real performance problems live.
- p99.9: For critical SLAs. An API with p99.9 of 2 seconds has 0.1% of requests slower than that -- still hundreds of requests at high volume.
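To see why the average misleads, here is a minimal, illustrative sketch -- the traffic split matches the example above, and the percentile helper is hypothetical, not part of any monitoring library:
// Illustrative only: why the average hides the tail
function percentile(sorted: number[], p: number): number {
  const idx = Math.max(0, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[idx];
}
// 95% of requests at 50ms, 5% at 10 seconds
const latencies = [...Array(950).fill(50), ...Array(50).fill(10_000)]
  .sort((a, b) => a - b);
const avg = latencies.reduce((sum, v) => sum + v, 0) / latencies.length;
console.log(avg);                       // ~547.5 -- looks acceptable
console.log(percentile(latencies, 50)); // 50 -- the median user is fine
console.log(percentile(latencies, 95)); // 50 -- most users are fine
console.log(percentile(latencies, 99)); // 10000 -- the long tail is not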
Error Rate -- percentage of requests returning 5xx. A sudden spike from 0% to 5% error rate is more significant than a constant 2% error rate. Alert on relative variations, not just absolute thresholds.
Throughput -- requests per second. Useful for detecting both abnormal traffic drops (something broke upstream) and spikes that may indicate abuse or unusual traffic.
// Manual instrumentation with custom metrics in Next.js
import { metrics } from "@opentelemetry/api";
import type { NextApiHandler } from "next";

const meter = metrics.getMeter("api-metrics");

const httpRequestDuration = meter.createHistogram("http_request_duration_ms", {
  description: "HTTP request duration in milliseconds",
  unit: "ms",
});

const httpRequestsTotal = meter.createCounter("http_requests_total", {
  description: "Total HTTP requests",
});

// Instrumentation middleware
export function instrumentRequest(handler: NextApiHandler): NextApiHandler {
  return async (req, res) => {
    const startTime = Date.now();
    const labels = {
      method: req.method ?? "unknown",
      // note: raw URLs can explode label cardinality -- prefer route templates over paths with IDs
      route: req.url?.split("?")[0] ?? "unknown",
    };
    try {
      await handler(req, res);
      const status = String(res.statusCode);
      httpRequestDuration.record(Date.now() - startTime, { ...labels, status });
      httpRequestsTotal.add(1, { ...labels, status });
    } catch (error) {
      // count unhandled exceptions as 500s before re-throwing
      httpRequestDuration.record(Date.now() - startTime, { ...labels, status: "500" });
      httpRequestsTotal.add(1, { ...labels, status: "500" });
      throw error;
    }
  };
}
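Usage is a one-line wrap around any route handler; a minimal sketch, where the route and the import path are hypothetical:
// pages/api/health.ts -- wrapping a route with the middleware above
import { instrumentRequest } from "@/lib/metrics"; // hypothetical module path

export default instrumentRequest(async (_req, res) => {
  res.status(200).json({ ok: true });
});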
OpenTelemetry: Distributed Tracing in Practice
Metrics say "something is wrong." Traces say "exactly where it's wrong and why." OpenTelemetry (OTel) is the open standard for observability instrumentation -- it works with any backend (Jaeger, Tempo, Datadog, New Relic, Honeycomb).
A trace is composed of spans: work units with start timestamp, duration, and attributes. An HTTP request creates a root span; database calls, external services, and queues create child spans. The result is a waterfall graph showing exactly where time was spent.
// OpenTelemetry configuration in Next.js (instrumentation.ts)
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { OTLPMetricExporter } from "@opentelemetry/exporter-metrics-otlp-http";
import { PeriodicExportingMetricReader } from "@opentelemetry/sdk-metrics";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";

export function register() {
  // instrumentation.ts also runs in the Edge runtime, where NodeSDK is unavailable
  if (process.env.NEXT_RUNTIME !== "nodejs") return;

  const sdk = new NodeSDK({
    serviceName: "orders-api",
    traceExporter: new OTLPTraceExporter({
      // signal-specific env vars carry the full URL, including the /v1/... path
      url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT ?? "http://localhost:4318/v1/traces",
    }),
    metricReader: new PeriodicExportingMetricReader({
      exporter: new OTLPMetricExporter({
        url: process.env.OTEL_EXPORTER_OTLP_METRICS_ENDPOINT ?? "http://localhost:4318/v1/metrics",
      }),
    }),
    instrumentations: [
      getNodeAutoInstrumentations({
        "@opentelemetry/instrumentation-fs": { enabled: false }, // too verbose
      }),
    ],
  });
  sdk.start();
}
// Creating custom spans for critical operations
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("order-service");

async function processPayment(orderId: string, amount: number) {
  const span = tracer.startSpan("process_payment", {
    attributes: {
      "order.id": orderId,
      "payment.amount": amount,
    },
  });
  try {
    const result = await paymentGateway.charge({ orderId, amount });
    span.setAttribute("payment.transaction_id", result.transactionId);
    span.setStatus({ code: SpanStatusCode.OK });
    return result;
  } catch (error) {
    span.recordException(error as Error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: String(error) });
    throw error;
  } finally {
    span.end(); // always end the span, even on failure
  }
}
With auto-instrumentations, you get traces for free: HTTP requests, Prisma/TypeORM queries, Redis commands, fetch/axios calls, and queue messages. The span tree immediately reveals whether slowness is in the database, an external service, or the application code.
Grafana + Prometheus: API Dashboard in 30 Minutes
Prometheus collects metrics by scraping an HTTP endpoint that your service exposes in a plain-text format. Grafana visualizes those metrics in dashboards. The combination is the most common observability standard for self-hosted systems.
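The OTLP exporter shown earlier pushes metrics over HTTP; Prometheus instead pulls from a scrape target. One way to bridge the two, sketched minimally here, is the @opentelemetry/exporter-prometheus package, which serves a scrape endpoint directly from the app (the port below is the package's default):
// Exposing a Prometheus scrape endpoint from the app (sketch)
import { metrics } from "@opentelemetry/api";
import { MeterProvider } from "@opentelemetry/sdk-metrics";
import { PrometheusExporter } from "@opentelemetry/exporter-prometheus";

// PrometheusExporter is a MetricReader that serves
// http://localhost:9464/metrics for Prometheus to scrape
const meterProvider = new MeterProvider({
  readers: [new PrometheusExporter({ port: 9464 })],
});
metrics.setGlobalMeterProvider(meterProvider);
If you go this route, drop the OTLP metricReader from the NodeSDK config above so metrics flow through a single provider.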
# docker-compose.yml -- local observability stack
version: "3.8"
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana
  tempo:
    image: grafana/tempo:latest
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
      - "3200:3200" # Tempo API
volumes:
  grafana-data:
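The compose file mounts a prometheus.yml that isn't shown; a minimal sketch might look like this (the target assumes the scrape endpoint from the earlier sketch, and host.docker.internal assumes Docker Desktop -- adjust for your network):
# prometheus.yml -- minimal scrape config (sketch)
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "orders-api"
    static_configs:
      # the app's metrics endpoint from the sketch above (assumption)
      - targets: ["host.docker.internal:9464"]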
Essential Prometheus queries for API dashboards:
# p99 latency by route (last 5 minutes)
histogram_quantile(0.99,
  sum by (le, route) (rate(http_request_duration_ms_bucket[5m]))
)
# Error rate by route
sum(rate(http_requests_total{status=~"5.."}[5m])) by (route)
/
sum(rate(http_requests_total[5m])) by (route)
# Throughput (req/s)
sum(rate(http_requests_total[1m])) by (route)
# Apdex score: satisfied (<=500ms) at full weight, tolerating (<=2s) at half
(
  sum(rate(http_request_duration_ms_bucket{le="500"}[5m]))
  +
  sum(rate(http_request_duration_ms_bucket{le="2000"}[5m]))
) / 2
/
sum(rate(http_request_duration_ms_count[5m]))
Grafana has ready-made dashboards at grafana.com/grafana/dashboards for Node.js, Next.js, and generic APIs. Import, adjust the queries for your label names, and you have professional observability in minutes.
Smart Alerts: Avoiding Alert Fatigue
Alert fatigue sets in when a team receives so many non-urgent alerts that it stops responding to them: the relevant ones drown in the noise. It's one of the most serious problems in software operations.
The principles of smart alerting:
Alert on symptoms, not causes. "Error rate > 5%" is a symptom: something is wrong for users. "CPU > 80%" is a cause: it may or may not be affecting users. Prefer symptom alerts, use causes as context in the runbook.
Burn rate, not absolute thresholds. Instead of alerting when the error rate exceeds 1%, alert when you're "burning" your SLO faster than is sustainable. If your SLO is 99.9% availability per month and you're running at a 10% error rate, you'll exhaust the month's entire error budget in about 7 hours; the sketch below works through the arithmetic.
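A minimal sketch of that arithmetic (the figures are the example's own assumptions, not output from any SLO tool):
// Error-budget burn for a 99.9% monthly SLO (illustrative numbers)
const slo = 0.999;
const errorBudget = 1 - slo;                      // 0.1% of requests may fail
const observedErrorRate = 0.1;                    // currently failing 10% of requests
const burnRate = observedErrorRate / errorBudget; // 100x the sustainable rate
const hoursInMonth = 30 * 24;                     // 720
console.log(hoursInMonth / burnRate);             // ~7.2 hours to exhaust the budget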
# Prometheus alerting rules (routed via Alertmanager)
groups:
  - name: api-alerts
    rules:
      # High urgency: burning the SLO rapidly
      - alert: HighErrorBurnRate
        expr: |
          (
            sum(rate(http_requests_total{status=~"5.."}[1h]))
            / sum(rate(http_requests_total[1h]))
          ) > 0.05
        for: 5m
        labels:
          severity: critical
          team: backend
        annotations:
          summary: "Error rate above 5% for 5 minutes"
          runbook: "https://wiki.company.com/runbooks/high-error-rate"
      # Latency degrading
      - alert: HighP99Latency
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(http_request_duration_ms_bucket[5m]))
          ) > 2000
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency above 2 seconds"
      # Absence of traffic (possible upstream failure)
      - alert: NoTraffic
        expr: sum(rate(http_requests_total[5m])) < 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "API without traffic for 10 minutes -- check if it's reachable"
Three severity levels:
- Critical: wake someone up now. SLO at immediate risk, users impacted.
- Warning: investigate during business hours. Concerning trend.
- Info: log for audit. Don't interrupt anyone.
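In Alertmanager, those levels map directly onto routing; a minimal sketch, with hypothetical receiver names:
# alertmanager.yml -- routing by severity (sketch; receiver names are hypothetical)
route:
  receiver: slack-info            # default: log it, interrupt no one
  routes:
    - match:
        severity: critical
      receiver: pagerduty-oncall  # wake someone up now
    - match:
        severity: warning
      receiver: slack-backend     # investigate during business hours
receivers:
  - name: pagerduty-oncall
  - name: slack-backend
  - name: slack-info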
Conclusion
API observability isn't a future project -- it's a production requirement. The good news is that the cost of entry has dropped dramatically: OpenTelemetry is free and open source, Grafana + Prometheus can be self-hosted at zero cost, and basic configuration takes less than a day of work.
The return is immediate: next time a customer complains about slowness, you'll have precise data about which endpoint, at which latency percentile, from which moment -- not guesswork. Next time an incident happens, you'll know before the customer does.
At SystemForge, monitoring and SLOs are part of the deploy phase -- not added later as an afterthought. This means every delivered system comes with functional dashboards, configured alerts, and basic runbooks. Visit systemforgesoftware.com to learn more.
