bsb_otel package¶
Submodules¶
bsb_otel.tracer module¶
Tracer and lifecycle helpers for the BSB OpenTelemetry integration.
This module is outside the entry-point DMZ — it imports bsb.services at
module top, which fires MPI_Init at import time. Import its members
via deep imports (from bsb_otel.tracer import ...) so the heavy
bsb/MPI dependency only loads when user code actually needs the tracer.
Do not register anything from this module as a Python entry point, and
do not import it from bsb_otel/__init__.py at module top level.
- class bsb_otel.tracer.BsbTracer(name: str, version: str, otel_tracer)¶
Per-package BSB tracer. Wraps an OpenTelemetry tracer and adds MPI-aware span creation. Obtain an instance via get_bsb_tracer().
- trace(name, attributes=None)¶
Start a new telemetry span. Use as a context manager.
When there is no active parent span and MPI is in use, the root span is automatically broadcast to all ranks so their child spans share the same trace. When called within an existing span, a regular child span is created.
- exception bsb_otel.tracer.TerminationError¶
Raised by the SIGTERM handler installed by ensure_spans_on_exit().
Subclasses SystemExit so that Python unwinds the call stack normally (calling __exit__ on any active span context managers and ending spans), and then runs atexit handlers before the process terminates.
- bsb_otel.tracer.ensure_spans_on_exit()¶
Install a SIGTERM handler that raises TerminationError.
When SIGTERM is received, the running call stack is unwound cleanly: any active with tracer.trace(...) blocks have their __exit__ called, spans are ended and exported, and atexit handlers fire before the process exits.
Call this once at process startup (e.g. in your __main__ entry point or CLI bootstrap) to ensure telemetry is not lost when the process is terminated by an orchestrator or job scheduler.
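The unwinding behaviour described above can be illustrated with the standard library alone. The following is a minimal sketch, not the bsb_otel implementation: DemoTermination, demo_span and run_demo are hypothetical stand-ins for TerminationError, tracer.trace(...) and user code, and the example assumes a POSIX system where it runs in the main thread.

```python
import os
import signal
import time
from contextlib import contextmanager

events = []  # records what happened, in order


class DemoTermination(SystemExit):
    """Hypothetical stand-in for TerminationError."""


def _raise_on_sigterm(signum, frame):
    # Raising from the handler surfaces the exception in the main thread,
    # at whatever statement was executing when the signal was handled.
    raise DemoTermination(128 + signum)


@contextmanager
def demo_span(name):
    """Hypothetical stand-in for `with tracer.trace(name):`."""
    events.append(f"start {name}")
    try:
        yield
    finally:
        # Runs while the stack unwinds; a real span would be ended here.
        events.append(f"end {name}")


def run_demo():
    previous = signal.signal(signal.SIGTERM, _raise_on_sigterm)
    try:
        with demo_span("work"):
            os.kill(os.getpid(), signal.SIGTERM)  # simulate the scheduler
            time.sleep(0.1)  # give the handler a chance to run
            events.append("unreachable")
    except DemoTermination as exc:
        # Caught only so the demo can report; normally the process exits here.
        events.append(f"exit code {exc.code}")
    finally:
        signal.signal(signal.SIGTERM, previous)
    return events
```

Because the exception derives from SystemExit, a process that does not catch it exits through the normal interpreter shutdown path, which is what lets atexit handlers run.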
- bsb_otel.tracer.get_bsb_tracer(package_name: str, version: str = None) BsbTracer¶
Return the BsbTracer for package_name, creating and registering it on first call.
- bsb_otel.tracer.local_tracing()¶
Disable cross-rank broadcast for spans created inside this block.
Shorthand for use_communicator(mpi4py.MPI.COMM_SELF). Use this around rank-divergent code paths (where different ranks make different sequences of trace() calls) to avoid the collective-broadcast deadlock that would otherwise occur.
Inside the block each rank only synchronises with the chosen communicator (COMM_SELF, i.e. itself), so no new cross-rank broadcast root is created. A cross-rank parent established before the block is preserved: spans created inside still inherit it as their parent, so their trace_id stays correlated across ranks.
Falls back to a no-op if mpi4py is not importable, since the broadcast machinery is then already inactive.
- bsb_otel.tracer.use_communicator(comm)¶
Override the MPI communicator that BsbTracer uses for span broadcasts within this block.
Pass mpi4py.MPI.COMM_SELF (size 1 from this rank’s view) to disable cross-rank correlation: each rank traces independently. Use local_tracing() for that case.
Pass any sub-communicator to broadcast within that group only.
The default is the global bsb.services.MPI communicator.
Note
This only affects the bsb-otel broadcast logic. mpi.rank and mpi.size span attributes still report the global rank/size from bsb.services.MPI. Other BSB code (locks, gather, etc.) is unaffected.
Implemented as a contextvars.ContextVar, so it propagates across asyncio tasks and through contextvars.copy_context().run(...) (which the BSB job pool uses), but does not leak across threads spawned with a bare threading.Thread.
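The propagation rules in the note follow from how contextvars works. Below is a minimal sketch of a ContextVar-backed override, assuming a hypothetical variable _active_comm and helper use_comm_sketch; it uses a plain string in place of an MPI communicator and is not the bsb_otel code.

```python
import contextvars
import threading
from contextlib import contextmanager

# Hypothetical variable; None stands for "use the global communicator".
_active_comm = contextvars.ContextVar("active_comm", default=None)


@contextmanager
def use_comm_sketch(comm):
    """Set the override for the duration of the block, then restore it."""
    token = _active_comm.set(comm)
    try:
        yield
    finally:
        _active_comm.reset(token)


def current_comm():
    return _active_comm.get()


def demo():
    seen = {}
    with use_comm_sketch("sub-communicator"):
        seen["inside"] = current_comm()
        # A copied context carries the override along with it.
        seen["copied"] = contextvars.copy_context().run(current_comm)
        # On default CPython builds a bare thread starts from an empty
        # context, so it does not observe the override set above.
        worker = threading.Thread(
            target=lambda: seen.update(thread=current_comm())
        )
        worker.start()
        worker.join()
    seen["after"] = current_comm()
    return seen
```

Code that hands work to threads and still needs the override can capture contextvars.copy_context() in the parent and call its run(...) method inside the worker.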
bsb_otel.exporters module¶
- class bsb_otel.exporters.JSONLinesSpanExporter¶
OpenTelemetry span exporter that writes spans as JSON lines to a file.
The output path is read from the OTEL_EXPORTER_JSONLINES_PATH environment variable (default: traces_*.jsonlines). A * in the path is replaced with a random 8-character alphanumeric string for unique filenames.
Register as a traces exporter with opentelemetry-instrument:
    OTEL_EXPORTER_JSONLINES_PATH=./logs.jsonlines \
      opentelemetry-instrument --traces_exporter jsonlines bsb compile
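The path rule can be sketched as follows. This is an illustration only: resolve_output_path is a hypothetical helper, and substituting only the first * is an assumption, since the description does not say how several wildcards are treated.

```python
import os
import secrets
import string

ENV_VAR = "OTEL_EXPORTER_JSONLINES_PATH"
DEFAULT_PATTERN = "traces_*.jsonlines"
_ALPHABET = string.ascii_letters + string.digits


def resolve_output_path(environ=None):
    """Return the output path, expanding '*' into a random 8-character token."""
    environ = os.environ if environ is None else environ
    pattern = environ.get(ENV_VAR, DEFAULT_PATTERN)
    if "*" not in pattern:
        return pattern
    token = "".join(secrets.choice(_ALPHABET) for _ in range(8))
    # Assumption: only the first "*" is substituted.
    return pattern.replace("*", token, 1)
```

A random token per process keeps ranks of an MPI job from writing to the same file when they share a working directory.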
- export(spans: Sequence[ReadableSpan])¶
Exports a batch of telemetry data.
- Args:
spans: The list of opentelemetry.trace.Span objects to be exported
- Returns:
The result of the export
- Parameters:
spans (Sequence[ReadableSpan])
- force_flush(timeout_millis=30000)¶
Hint to ensure that the export of any spans the exporter has received prior to the call to ForceFlush SHOULD be completed as soon as possible, preferably before returning from this method.
- shutdown()¶
Shuts down the exporter.
Called when the SDK is shut down.
bsb_otel.replay module¶
Replay JSON Lines trace files into the active OpenTelemetry pipeline.
Reads spans written by JSONLinesSpanExporter,
reconstructs them as ReadableSpan objects and
feeds them to the active tracer provider’s span processor. Whatever exporter
opentelemetry-instrument has configured (OTLP to a collector, console, …)
then receives the replayed traces with their original ids and parent links.
By default the timestamps are shifted forward so the trace ends at the moment of
replay, keeping every span’s relative offset and duration intact. This sidesteps
collector search windows (e.g. Jaeger’s lookback) that would otherwise hide a
trace recorded long ago. Pass shift_to_now=False to keep the original times.
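The shift amounts to adding one constant offset to every timestamp. A minimal sketch of that arithmetic, assuming each span dict carries integer nanosecond start_time and end_time keys (the key names and the helper shift_spans_to are hypothetical, not the recorded file format):

```python
import time


def shift_spans_to(spans, now_ns=None):
    """Return copies of spans moved so that the latest one ends at now_ns."""
    if not spans:
        return []
    now_ns = time.time_ns() if now_ns is None else now_ns
    # One shared offset keeps every relative offset and duration intact.
    offset = now_ns - max(span["end_time"] for span in spans)
    return [
        {
            **span,
            "start_time": span["start_time"] + offset,
            "end_time": span["end_time"] + offset,
        }
        for span in spans
    ]
```

For example, spans covering 100 to 500 and 200 to 300 become 9600 to 10000 and 9700 to 9800 when shifted to now_ns=10000.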
This module imports the OpenTelemetry SDK at top level, so it is outside the
entry-point DMZ (see bsb_otel): import it lazily, only from the
replay-otel command handler.
- bsb_otel.replay.load_spans(paths) list¶
Load raw span dicts from one or more JSON Lines files.
Each path may be a literal file or a glob pattern (e.g. traces_*.jsonlines); patterns that match nothing fall through as a literal path so a clear FileNotFoundError is raised.
- Parameters:
paths – iterable of file paths or glob patterns
- Returns:
list of raw span dicts, in file-and-line order
- Return type:
list
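The fall-through behaviour for unmatched patterns might look like the sketch below. expand_paths and load_span_dicts are hypothetical names, and skipping blank lines and sorting the glob matches are assumptions rather than documented behaviour.

```python
import glob
import json


def expand_paths(paths):
    """Expand glob patterns, keeping unmatched entries as literal paths."""
    expanded = []
    for path in paths:
        matches = sorted(glob.glob(str(path)))
        # No match: keep the literal so open() raises FileNotFoundError.
        expanded.extend(matches or [str(path)])
    return expanded


def load_span_dicts(paths):
    """Read one JSON object per line, in file-and-line order."""
    spans = []
    for filename in expand_paths(paths):
        with open(filename, encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # assumption: blank lines are ignored
                    spans.append(json.loads(line))
    return spans
```

Keeping the unmatched pattern as a literal means a typo in a file name fails loudly instead of silently replaying nothing.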
- bsb_otel.replay.replay_files(paths, shift_to_now=True) int¶
Load JSON Lines files and replay them into the active tracer provider.
- Return type:
int
- bsb_otel.replay.replay_spans(spans, provider=None, shift_to_now=True) int¶
Feed reconstructed spans into a tracer provider’s active span processor.
The spans keep their original ids, so the configured exporter forwards them as-is. They adopt provider’s resource, so the service name shown by the collector follows the --service_name/OTEL_RESOURCE_ATTRIBUTES of the replaying process rather than the recording one.
- Parameters:
spans – raw span dicts, e.g. from load_spans()
provider – tracer provider to replay into; defaults to the active one
shift_to_now – shift all timestamps forward so the latest span ends at replay time, keeping relative offsets and durations; set False to keep the original timestamps
- Returns:
number of spans replayed
- Return type:
int
bsb_otel.testing module¶
- class bsb_otel.testing.OTelFixture¶
Context manager that overrides the global tracer provider with a custom one that exports to a temporary file, allowing tests to assert on recorded spans.
The global tracer provider is restored on exit.
Usage:
    with OTelFixture() as results:
        handle_command(["--version"])
        spans = results()
        assert spans[0]["name"] == "cli"
- bsb_otel.testing.wrap_tests_with_traces(suite)¶
Wrap every unittest.TestCase in suite so that each test run is recorded as an OpenTelemetry trace span.
Also wraps setUpClass/tearDownClass (as standalone broadcast spans) and setUp/tearDown (as sub-spans within the test run span) when the class defines them.
Intended to be called from a load_tests hook:
    from bsb_otel.testing import wrap_tests_with_traces

    def load_tests(loader, tests, pattern):
        suite = loader.discover("tests")
        wrap_tests_with_traces(suite)
        return suite
Module contents¶
BSB OpenTelemetry integration package.
Entry-point DMZ. This module is loaded eagerly by opentelemetry-instrument
whenever it discovers any of bsb_otel’s entry points (env vars,
exporters, distro). Keeping this file empty — no module-level imports beyond
the standard library — guarantees the DMZ rule: nothing here can drag in
bsb, which would fire MPI_Init prematurely.
Under opentelemetry-instrument’s two-phase startup (entry-point
discovery before execl, then sitecustomize after), a too-early
MPI_Init runs twice and exhausts the SLURM PMI slot — the second
init fails with PMI2 error 14.
The DMZ rule: no top-level bsb* import in this file or any module
reachable from a registered entry point (bsb_otel._otel_env,
bsb_otel.exporters, bsb_otel._distro). Transitive imports through
bsb are unpredictable — bsb.services may get pulled in by something
seemingly innocent — so the rule forbids bsb entirely, not just
bsb.services.
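One way to guard the rule in a test suite is a small static check over a module's source. The sketch below is hypothetical and not part of bsb_otel: it inspects only statements directly in the module body, treats any root package whose name starts with bsb (other than bsb_otel itself) as forbidden, which is an assumption about how "bsb*" should be read, and does not follow transitive imports.

```python
import ast


def top_level_bsb_imports(source, own_package="bsb_otel"):
    """List bsb* modules imported by statements directly in the module body."""
    offenders = []
    for node in ast.parse(source).body:  # nested function bodies are not visited
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # relative imports and non-import statements are skipped
        for name in names:
            root = name.split(".")[0]
            if root.startswith("bsb") and root != own_package:
                offenders.append(name)
    return offenders
```

An import placed inside a function body passes the check, which matches the lazy-import pattern the rule asks for; imports nested in top-level if or try blocks also run at import time and would need a deeper walk to catch.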
Public API: import directly from the submodule that owns each symbol:
from bsb_otel.tracer import BsbTracer, get_bsb_tracer
from bsb_otel.tracer import local_tracing, use_communicator
from bsb_otel.tracer import TerminationError, ensure_spans_on_exit
from bsb_otel.exporters import JSONLinesSpanExporter
from bsb_otel.replay import replay_files
from bsb_otel.testing import OTelFixture