bsb_otel package

Submodules

bsb_otel.tracer module

Tracer and lifecycle helpers for the BSB OpenTelemetry integration.

This module is outside the entry-point DMZ — it imports bsb.services at module top, which fires MPI_Init at import time. Import its members via deep imports (from bsb_otel.tracer import ...) so the heavy bsb/MPI dependency only loads when user code actually needs the tracer. Do not register anything from this module as a Python entry point, and do not import it from bsb_otel/__init__.py at module top level.

class bsb_otel.tracer.BsbTracer(name: str, version: str, otel_tracer)

Per-package BSB tracer. Wraps an OpenTelemetry tracer and adds MPI-aware span creation. Obtain an instance via get_bsb_tracer().

trace(name, attributes=None)

Start a new telemetry span. Use as a context manager.

When there is no active parent span and MPI is in use, the root span is automatically broadcast to all ranks so their child spans share the same trace. When called within an existing span, a regular child span is created.

Parameters:
  • name (str) – name of the span

  • attributes (dict) – OpenTelemetry attributes

Returns:

OpenTelemetry span context manager.

exception bsb_otel.tracer.TerminationError

Raised by the SIGTERM handler installed by ensure_spans_on_exit().

Subclasses SystemExit so that Python unwinds the call stack normally (calling __exit__ on any active span context managers and ending spans), and then runs atexit handlers before the process terminates.

bsb_otel.tracer.ensure_spans_on_exit()

Install a SIGTERM handler that raises TerminationError.

When SIGTERM is received, the running call stack is unwound cleanly: any active with tracer.trace(...) blocks have their __exit__ called, spans are ended and exported, and atexit handlers fire before the process exits.

Call this once at process startup (e.g. in your __main__ entry point or CLI bootstrap) to ensure telemetry is not lost when the process is terminated by an orchestrator or job scheduler.
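
The unwinding behaviour described above can be illustrated with the standard library alone. This is a minimal sketch, not the bsb_otel implementation: TerminationSketch, fake_span and demo are hypothetical names, a plain list stands in for exported spans, and the handler is assumed to do nothing more than raise a SystemExit subclass.

```python
import signal
from contextlib import contextmanager

events = []  # stands in for exported spans


class TerminationSketch(SystemExit):
    """Hypothetical stand-in for TerminationError."""


def _raise_on_sigterm(signum, frame):
    # Raising from the handler unwinds whatever the main thread is running.
    raise TerminationSketch(128 + signum)


@contextmanager
def fake_span(name):
    events.append(f"start {name}")
    try:
        yield
    finally:
        # Runs during unwinding, like a span's __exit__.
        events.append(f"end {name}")


def demo():
    # signal.signal must be called from the main thread.
    previous = signal.signal(signal.SIGTERM, _raise_on_sigterm)
    try:
        with fake_span("work"):
            signal.raise_signal(signal.SIGTERM)  # simulate the scheduler
    except TerminationSketch as exc:
        # Caught only so the demo can inspect the result; in a real
        # process the exception would propagate and end the interpreter.
        return exc.code
    finally:
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)
```

Calling demo() leaves both a start and an end entry in events, which is the property that keeps an open span from being lost on termination.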

bsb_otel.tracer.get_bsb_tracer(package_name: str, version: str = None) -> BsbTracer

Return the BsbTracer for package_name, creating and registering it on first call.

Parameters:
  • package_name (str) – package name (used as the OTel instrumentation scope)

  • version (str) – override the version; defaults to the installed package version

Returns:

BsbTracer

Return type:

BsbTracer

bsb_otel.tracer.local_tracing()

Disable cross-rank broadcast for spans created inside this block.

Shorthand for use_communicator(mpi4py.MPI.COMM_SELF). Use this around rank-divergent code paths (where different ranks make different sequences of trace() calls) to avoid the collective-broadcast deadlock that would otherwise occur.

Inside the block each rank only synchronises with the chosen communicator (COMM_SELF — i.e. itself), so no new cross-rank broadcast root is created. A cross-rank parent established before the block is preserved: spans created inside still inherit it as their parent, so their trace_id stays correlated across ranks.

Falls back to a no-op if mpi4py is not importable, since the broadcast machinery is then already inactive.

bsb_otel.tracer.use_communicator(comm)

Override the MPI communicator that BsbTracer uses for span broadcasts within this block.

  • Pass mpi4py.MPI.COMM_SELF (size 1 from this rank’s view) to disable cross-rank correlation — each rank traces independently. Use local_tracing() for that case.

  • Pass any sub-communicator to broadcast within that group only.

  • The default is the global bsb.services.MPI communicator.

Note

This only affects the bsb-otel broadcast logic. mpi.rank and mpi.size span attributes still report the global rank/size from bsb.services.MPI. Other BSB code (locks, gather, etc.) is unaffected.

Implemented as a contextvars.ContextVar, so the override propagates across asyncio tasks and through contextvars.copy_context().run(...) (which the BSB job pool uses), but does not leak into threads spawned with a bare threading.Thread.
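
The propagation rules in the last paragraph follow from how contextvars behaves in general. The sketch below uses only the standard library; use_communicator_sketch, the _comm variable and the string placeholders for communicators are hypothetical and stand in for whatever bsb_otel stores internally.

```python
import asyncio
import contextvars
import threading
from contextlib import contextmanager

# Strings stand in for real MPI communicator objects.
_comm = contextvars.ContextVar("comm", default="COMM_WORLD")


@contextmanager
def use_communicator_sketch(comm):
    token = _comm.set(comm)
    try:
        yield
    finally:
        _comm.reset(token)  # restore the previous value on exit


async def _read_in_task():
    return _comm.get()


def observe():
    seen = {}
    with use_communicator_sketch("COMM_SELF"):
        seen["inline"] = _comm.get()
        # A copied context carries the override with it.
        seen["copied_context"] = contextvars.copy_context().run(_comm.get)
        # asyncio tasks copy the current context when they are created.
        seen["asyncio_task"] = asyncio.run(_read_in_task())

        def worker():
            seen["bare_thread"] = _comm.get()

        # On a standard CPython build a new thread starts with an empty
        # context, so it sees the default instead of the override.
        thread = threading.Thread(target=worker)
        thread.start()
        thread.join()
    seen["after_block"] = _comm.get()
    return seen
```

Code that hands work to raw threads inside such a block would therefore need to pass the context explicitly, for example by running the thread target through contextvars.copy_context().run.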

bsb_otel.exporters module

class bsb_otel.exporters.JSONLinesSpanExporter

OpenTelemetry span exporter that writes spans as JSON lines to a file.

The output path is read from the OTEL_EXPORTER_JSONLINES_PATH environment variable (default: traces_*.jsonlines). A * in the path is replaced with a random 8-character alphanumeric string for unique filenames.
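
The path resolution described above might look roughly like the following. This is a sketch under assumptions, not the exporter's code: resolve_output_path is a hypothetical helper, "alphanumeric" is taken to mean ASCII letters and digits, and every * is replaced with the same token.

```python
import os
import secrets
import string

_ALPHANUM = string.ascii_letters + string.digits


def resolve_output_path(environ=None):
    """Expand the JSON Lines output path (hypothetical helper)."""
    env = os.environ if environ is None else environ
    template = env.get("OTEL_EXPORTER_JSONLINES_PATH", "traces_*.jsonlines")
    # Random 8-character token so concurrent processes get distinct files.
    token = "".join(secrets.choice(_ALPHANUM) for _ in range(8))
    # Assumption: all occurrences of "*" receive the same token.
    return template.replace("*", token)
```

A path without * is returned unchanged, so several processes pointed at the same literal path would write to the same file.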

Register as a traces exporter with opentelemetry-instrument:

OTEL_EXPORTER_JSONLINES_PATH=./logs.jsonlines \
    opentelemetry-instrument --traces_exporter jsonlines bsb compile

export(spans: Sequence[ReadableSpan])

Exports a batch of telemetry data.

Parameters:

spans (Sequence[ReadableSpan]) – the spans to be exported

Returns:

The result of the export

force_flush(timeout_millis=30000)

Hint that any spans the exporter received before this call should be exported as soon as possible, preferably before this method returns.

shutdown()

Shuts down the exporter.

Called when the SDK is shut down.

bsb_otel.replay module

Replay JSON Lines trace files into the active OpenTelemetry pipeline.

Reads spans written by JSONLinesSpanExporter, reconstructs them as ReadableSpan objects and feeds them to the active tracer provider’s span processor. Whatever exporter opentelemetry-instrument has configured (OTLP to a collector, console, …) then receives the replayed traces with their original ids and parent links.

By default the timestamps are shifted forward so the trace ends at the moment of replay, keeping every span’s relative offset and duration intact. This sidesteps collector search windows (e.g. Jaeger’s lookback) that would otherwise hide a trace recorded long ago. Pass shift_to_now=False to keep the original times.
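
The shift amounts to adding one common offset to every timestamp. The sketch below shows the arithmetic on plain dicts; shift_spans_to_now and the start_time / end_time keys (nanoseconds since the epoch) are assumptions made for illustration and may not match the actual span schema.

```python
import time


def shift_spans_to_now(spans, now_ns=None):
    """Return copies of the spans moved so the latest one ends at now_ns."""
    if not spans:
        return []
    now_ns = time.time_ns() if now_ns is None else now_ns
    # One offset for the whole trace keeps relative offsets and durations.
    offset = now_ns - max(span["end_time"] for span in spans)
    return [
        {
            **span,
            "start_time": span["start_time"] + offset,
            "end_time": span["end_time"] + offset,
        }
        for span in spans
    ]
```

Because the same offset is applied everywhere, parent and child spans keep their nesting and only the absolute position of the trace changes.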

This module imports the OpenTelemetry SDK at top level, so it is outside the entry-point DMZ (see bsb_otel): import it lazily, only from the replay-otel command handler.

bsb_otel.replay.load_spans(paths) -> list

Load raw span dicts from one or more JSON Lines files.

Each path may be a literal file or a glob pattern (e.g. traces_*.jsonlines); patterns that match nothing fall through as a literal path so a clear FileNotFoundError is raised.

Parameters:

paths – iterable of file paths or glob patterns

Returns:

list of raw span dicts, in file-and-line order

Return type:

list
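
The glob-with-fallback behaviour can be sketched with the standard library. load_spans_sketch is a hypothetical stand-in; sorting the matches and skipping blank lines are assumptions, not documented behaviour of load_spans().

```python
import glob
import json


def load_spans_sketch(paths):
    """Read span dicts from JSON Lines files named by paths or patterns."""
    spans = []
    for pattern in paths:
        # A pattern with no matches is kept as a literal path, so open()
        # below reports the missing file instead of silently skipping it.
        matches = sorted(glob.glob(pattern)) or [pattern]
        for path in matches:
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    if line.strip():
                        spans.append(json.loads(line))
    return spans
```

Falling back to the literal path means a typo in a pattern surfaces as an error at load time rather than as an empty replay.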

bsb_otel.replay.replay_files(paths, shift_to_now=True) -> int

Load JSON Lines files and replay them into the active tracer provider.

Return type:

int

bsb_otel.replay.replay_spans(spans, provider=None, shift_to_now=True) -> int

Feed reconstructed spans into a tracer provider’s active span processor.

The spans keep their original ids, so the configured exporter forwards them as-is. They adopt provider’s resource, so the service name shown by the collector follows the --service_name / OTEL_RESOURCE_ATTRIBUTES of the replaying process rather than the recording one.

Parameters:
  • spans – raw span dicts, e.g. from load_spans()

  • provider – tracer provider to replay into; defaults to the active one

  • shift_to_now – shift all timestamps forward so the latest span ends at replay time, keeping relative offsets and durations; set False to keep the original timestamps

Returns:

number of spans replayed

Return type:

int

bsb_otel.testing module

class bsb_otel.testing.OTelFixture

Context manager that overrides the global tracer provider with a custom one that exports to a temporary file, allowing tests to assert on recorded spans.

The global tracer provider is restored on exit.

Usage:

with OTelFixture() as results:
    handle_command(["--version"])

spans = results()
assert spans[0]["name"] == "cli"

bsb_otel.testing.wrap_tests_with_traces(suite)

Wrap every unittest.TestCase in suite so that each test run is recorded as an OpenTelemetry trace span.

Also wraps setUpClass/tearDownClass (as standalone broadcast spans) and setUp/tearDown (as sub-spans within the test run span) when the class defines them.

Intended to be called from a load_tests hook:

from bsb_otel.testing import wrap_tests_with_traces


def load_tests(loader, tests, pattern):
    suite = loader.discover("tests")
    wrap_tests_with_traces(suite)
    return suite
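
To give an idea of what wrapping a suite involves, the following standard-library sketch walks a nested unittest suite and records a start and end event around each test run. It is not how wrap_tests_with_traces is implemented: wrap_suite, iter_tests and record_span are hypothetical names, a list replaces real spans, and the setUp/tearDown and class-level spans are left out.

```python
import unittest
from contextlib import contextmanager

recorded = []  # stands in for exported spans


@contextmanager
def record_span(name):
    recorded.append(("start", name))
    try:
        yield
    finally:
        recorded.append(("end", name))


def iter_tests(suite):
    """Yield every TestCase in a possibly nested TestSuite."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def wrap_suite(suite):
    for test in iter_tests(suite):
        original_run = test.run

        # Default arguments bind the current test's values to this closure.
        def traced_run(result=None, _run=original_run, _name=test.id()):
            with record_span(_name):
                return _run(result)

        # TestCase.__call__ delegates to self.run, so an instance
        # attribute is enough to intercept the test run.
        test.run = traced_run
    return suite
```

Patching the instance rather than the class keeps the wrapping local to the suite that was passed in.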

Module contents

BSB OpenTelemetry integration package.

Entry-point DMZ. This module is loaded eagerly by opentelemetry-instrument whenever it discovers any of bsb_otel’s entry points (env vars, exporters, distro). Keeping this file empty — no module-level imports beyond the standard library — guarantees the DMZ rule: nothing here can drag in bsb, which would fire MPI_Init prematurely.

Under opentelemetry-instrument’s two-phase startup (entry-point discovery before execl, then sitecustomize after), a too-early MPI_Init runs twice and exhausts the SLURM PMI slot — the second init fails with PMI2 error 14.

The DMZ rule: no top-level bsb* import in this file or any module reachable from a registered entry point (bsb_otel._otel_env, bsb_otel.exporters, bsb_otel._distro). Transitive imports through bsb are unpredictable — bsb.services may get pulled in by something seemingly innocent — so the rule forbids bsb entirely, not just bsb.services.
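
A rule like this lends itself to a mechanical check. The sketch below is a hypothetical lint helper, not part of bsb_otel: top_level_bsb_imports reports absolute imports of bsb or bsb.* that appear directly in a module body. It assumes sibling packages such as bsb_otel itself are allowed, and it does not look inside top-level if or try blocks.

```python
import ast


def top_level_bsb_imports(source):
    """Return the bsb modules imported at module top level in source."""
    offenders = []
    for node in ast.parse(source).body:  # module-level statements only
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            names = [node.module or ""]
        else:
            continue  # imports nested in functions are deferred, so fine
        offenders += [n for n in names if n == "bsb" or n.startswith("bsb.")]
    return offenders
```

Run over the source of each module reachable from a registered entry point, an empty result indicates that no direct top-level bsb import is present; transitive imports through third-party modules would still need a separate check.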

Public API: import directly from the submodule that owns each symbol:

from bsb_otel.tracer import BsbTracer, get_bsb_tracer
from bsb_otel.tracer import local_tracing, use_communicator
from bsb_otel.tracer import TerminationError, ensure_spans_on_exit
from bsb_otel.exporters import JSONLinesSpanExporter
from bsb_otel.replay import replay_files
from bsb_otel.testing import OTelFixture