OpenTelemetry Distributed Tracing

February 13, 2026 | Observability OpenTelemetry

Instrumentation, Collector, and backends.

In a microservices architecture, a single user request can span dozens of services. When something goes wrong, finding the root cause without distributed tracing is like debugging in the dark. OpenTelemetry provides vendor-neutral instrumentation for traces, metrics, and logs.

OpenTelemetry Architecture

  • SDK — Instrumentation libraries for your application language
  • Collector — Agent that receives, processes, and exports telemetry data
  • Backend — Storage and visualization (Jaeger, Tempo, Zipkin, Datadog)

Auto-Instrumentation

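OpenTelemetry provides auto-instrumentation for common frameworks. Node.js and Python can be instrumented without code changes; for .NET, the packages below are instrumentation libraries that you register in code: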

# Node.js
npm install @opentelemetry/api @opentelemetry/auto-instrumentations-node
node --require @opentelemetry/auto-instrumentations-node/register app.js

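# Python
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install  # installs instrumentation for the libraries already in your environment
opentelemetry-instrument python app.py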

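# .NET
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Instrumentation.SqlClient  # may need --prerelease, depending on the published version
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol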

.NET Configuration Example

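// Program.cs: register tracing and metrics with the OpenTelemetry SDK.
var collectorEndpoint = new Uri("http://otel-collector:4317"); // OTLP over gRPC

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()   // inbound HTTP requests
        .AddHttpClientInstrumentation()   // outbound HttpClient calls
        .AddSqlClientInstrumentation()    // SQL client queries
        .AddOtlpExporter(opts => opts.Endpoint = collectorEndpoint))
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        // Unless OTEL_EXPORTER_OTLP_ENDPOINT is set, the exporter defaults to
        // http://localhost:4317, so point metrics at the Collector explicitly.
        .AddOtlpExporter(opts => opts.Endpoint = collectorEndpoint));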

Collector Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

    processors:
      batch:
        timeout: 5s
        send_batch_size: 1000
      memory_limiter:
        check_interval: 1s  # required; how often memory usage is checked
        limit_mib: 512

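    exporters:
      # Jaeger ingests OTLP natively; the dedicated jaeger exporter has been
      # removed from current Collector releases, so export over OTLP instead.
      otlp/jaeger:
        endpoint: jaeger:4317
        tls:
          insecure: true
      prometheus:
        endpoint: 0.0.0.0:8889

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp/jaeger]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [prometheus]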

Trace Propagation

Ensure trace context is propagated across service boundaries:

  • W3C TraceContext — Standard propagation format (traceparent header)
  • B3 — Zipkin-compatible format
  • Auto-instrumentation propagates context over HTTP without code changes
  • For message queues, propagate context via message attributes (SQS) or record headers (Kafka)
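
To make the traceparent format and the message-queue case concrete, here is a minimal sketch in plain Python. It is not the OpenTelemetry SDK's propagator API: the helper names (new_traceparent, parse_traceparent, inject, extract) are hypothetical, the message is modeled as a simple dict of attributes, and only the four-field layout used by version 00 of the header is handled.

```python
import re
import secrets

# W3C Trace Context header layout: version-traceid-parentid-flags (lowercase hex).
_TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled=True):
    """Build a traceparent value for a brand-new trace."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value):
    """Return the header fields as a dict, or None if the value is malformed."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def inject(message_attributes, traceparent):
    """Producer side: copy the trace context into the message attributes."""
    attributes = dict(message_attributes)
    attributes["traceparent"] = traceparent
    return attributes


def extract(message_attributes):
    """Consumer side: recover the parent context, or None if absent or invalid."""
    return parse_traceparent(message_attributes.get("traceparent", ""))


# Example: the producer attaches the context, the consumer continues the same trace.
header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
message = inject({"order_id": "123"}, header)
context = extract(message)
print(context["trace_id"])  # 4bf92f3577b34da6a3ce929d0e0e4736
```

In a real service the SDK's propagators do this work; the sketch only shows what travels with the message and why a consumer that ignores the attribute starts a disconnected trace.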

Sampling Strategies

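In high-traffic systems, storing every trace is expensive, so sample instead. The probabilistic sampler keeps a fixed percentage of traces regardless of outcome. Tail sampling decides after a trace has completed, so it can always keep errors and slow requests, but it requires all spans of a trace to reach the same Collector instance. The two are normally alternatives: a probabilistic sampler placed ahead of tail sampling would discard most error traces before the tail policies see them.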

processors:
  probabilistic_sampler:
    sampling_percentage: 10  # Sample 10% of traces

  tail_sampling:
    policies:
    - name: error-policy
      type: status_code
      status_code: {status_codes: [ERROR]}  # Always sample errors
    - name: slow-policy
      type: latency
      latency: {threshold_ms: 2000}  # Always sample slow requests
    - name: default
      type: probabilistic
      probabilistic: {sampling_percentage: 5}
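
The three tail-sampling policies above amount to a simple decision rule, sketched here in plain Python. This is an illustration of the rule only, not the Collector's implementation (which also buffers spans until the trace is considered complete). TraceSummary, sampling_bucket, and should_keep are hypothetical names, the hash-based bucketing is an assumption chosen to make the decision repeatable, and treating "slow" as strictly greater than the threshold is also an assumption.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class TraceSummary:
    """What the sampler knows once a trace has finished (hypothetical shape)."""
    trace_id: str
    has_error: bool
    duration_ms: float


def sampling_bucket(trace_id):
    """Map a trace ID to a stable value in [0, 100)."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") % 10000) / 100.0


def should_keep(trace, threshold_ms=2000, sampling_percentage=5):
    """Keep a trace if any policy matches: error, slow, or the probabilistic baseline."""
    if trace.has_error:
        return True  # error-policy: always keep failed traces
    if trace.duration_ms > threshold_ms:
        return True  # slow-policy: always keep slow traces
    # default policy: keep a fixed share of everything else
    return sampling_bucket(trace.trace_id) < sampling_percentage


# Example: one failed trace, one slow trace, and a batch of ordinary traffic.
print(should_keep(TraceSummary("t-error", True, 12.0)))    # True
print(should_keep(TraceSummary("t-slow", False, 3500.0)))  # True
normal = [TraceSummary(f"trace-{i}", False, 50.0) for i in range(10000)]
kept = sum(should_keep(t) for t in normal)
print(f"kept {kept} of {len(normal)} normal traces")
```

Because the bucket is derived from the trace ID, the same trace always gets the same decision, which is what makes a percentage-based policy consistent across retries or replicas in this sketch.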

What to Trace

  • HTTP requests — Inbound and outbound (auto-instrumented)
  • Database queries — SQL queries with timing (auto-instrumented)
  • Cache operations — Redis/Memcached hits and misses
  • Message queue operations — Publish and consume with context propagation
  • External API calls — Third-party service latency and errors

Eazy SaaS Tip: We implement OpenTelemetry with tail sampling as standard for our microservices clients. This captures 100% of errors and slow requests while sampling 5% of normal traffic, so failures stay fully debuggable while normal traffic is stored at roughly 1/20th of its full cost.
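
How much of the total trace volume this policy actually stores depends on how often requests fail or run slow. The following back-of-the-envelope sketch makes that explicit. The function name kept_fraction is hypothetical, the 1% error rate and 2% slow rate are illustrative inputs rather than measurements, and errors and slow requests are assumed not to overlap.

```python
def kept_fraction(error_rate, slow_rate, baseline=0.05):
    """Fraction of all traces stored when errors and slow traces are always kept
    and the remaining traffic is sampled at `baseline`."""
    always_kept = error_rate + slow_rate
    if not 0.0 <= always_kept <= 1.0:
        raise ValueError("error_rate + slow_rate must be between 0 and 1")
    return always_kept + (1.0 - always_kept) * baseline


# Illustrative inputs: 1% of traces contain an error, 2% exceed the latency threshold.
fraction = kept_fraction(error_rate=0.01, slow_rate=0.02)
print(f"{fraction:.2%} of traces stored")  # 7.85% of traces stored
```

With those assumed rates the stored volume is closer to 1/13th than 1/20th of the total; the 1/20th figure applies to the normal traffic itself, and the overall saving shrinks as the share of errors and slow requests grows.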