Tracing & Observability¶

This recipe shows how to add structured observability to restgdf requests using the resilience extra and Python’s standard logging / structlog.

Prerequisites¶

pip install "restgdf[resilience]"

Enable resilience + logging¶

import logging

from restgdf import Config, ResilienceConfig

logging.basicConfig(level=logging.DEBUG)

cfg = Config(
    resilience=ResilienceConfig(
        enabled=True,
        rate_per_service_root_per_second=10.0,
    ),
)

The restgdf.resilience module logs every retry attempt, 429 cooldown, and error mapping through the restgdf.retry logger at DEBUG level. Set the logger to DEBUG to see each attempt:

logging.getLogger("restgdf.retry").setLevel(logging.DEBUG)

Tracing individual requests¶

The ResilientSession wrapper is transparent — you use it exactly like a plain aiohttp.ClientSession, but retries and rate limits happen automatically:

import asyncio

from aiohttp import ClientSession
from restgdf.resilience import ResilientSession

async def traced_query():
    async with ClientSession() as raw:
        session = ResilientSession(raw, cfg.resilience)
        url = "https://maps1.vcgov.org/arcgis/rest/services/Beaches/MapServer/6/query"
        params = {"where": "1=1", "f": "json", "resultRecordCount": "1"}
        async with session.get(url, params=params) as resp:
            data = await resp.json()
            print(data)

asyncio.run(traced_query())

Inspecting error attributes¶

When the retry wrapper raises, the exception carries structured attributes populated automatically:

Exception	Attributes
`RestgdfResponseError`	`url`, `status_code`, `request_id`, `model_name`
`TransportError`	`url`, `status_code`
`RestgdfTimeoutError`	`url`, `timeout_kind` (`"connect"`, `"read"`, `"total"`)
`RateLimitError`	`url`, `status_code`, `retry_after`

Example:

from restgdf.errors import RateLimitError, RestgdfTimeoutError

try:
    await session.get(url).__aenter__()
except RateLimitError as exc:
    print(f"429 at {exc.url}, retry after {exc.retry_after}s")
except RestgdfTimeoutError as exc:
    print(f"Timeout ({exc.timeout_kind}) for {exc.url}")

Integrating with OpenTelemetry¶

For full distributed tracing with automatic span creation, install the restgdf[telemetry] extra — see the :doc:/recipes/observability recipe. The example below shows how to add manual spans around resilience-wrapped calls using the structured error attributes:

from opentelemetry import trace

tracer = trace.get_tracer("restgdf-app")

async def otel_query(session, url, params):
    with tracer.start_as_current_span("arcgis.query", attributes={"url": url}) as span:
        try:
            async with session.get(url, params=params) as resp:
                span.set_attribute("http.status_code", resp.status)
                return await resp.json()
        except Exception as exc:
            span.set_status(trace.StatusCode.ERROR, str(exc))
            for attr in ("url", "status_code", "timeout_kind", "retry_after"):
                val = getattr(exc, attr, None)
                if val is not None:
                    span.set_attribute(f"restgdf.{attr}", val)
            raise

Monitoring 429 cooldowns¶

The CooldownRegistry and LimiterRegistry are per-ResilientSession (not process-global), so each session has independent rate-limit state.

When a 429 response is received, the retry wrapper:

Parses the Retry-After header (integer seconds or RFC 7231 HTTP-date).
Clamps the value to ResilienceConfig.respect_retry_after_max_s (default 60 s).
Sets a cooldown deadline for the service root — subsequent requests to the same FeatureServer/MapServer wait until the deadline passes.
Stamina’s exponential back-off then retries the request.

The cooldown is separate from the token-bucket limiter — a 429 does not drain AsyncLimiter tokens.