Streaming features¶
restgdf 3.0 exposes four streaming shapes on every
FeatureLayer. Three of them (stream_features,
stream_feature_batches, stream_rows) are built on top of the same
low-level iter_pages primitive and share its knobs. The fourth
(stream_gdf_chunks) is the legacy GeoDataFrame-per-page shape
backed by chunk_generator; it yields in completion order, does not
accept on_truncation / order / max_concurrent_pages, and does
not emit the R-61 feature_layer.stream parent span. All four are
safe to use on a base install unless explicitly noted.
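| Method | Yields | Install |
|---|---|---|
| stream_features | one raw ArcGIS feature dict | base |
| stream_feature_batches | one page of features (a batch) | base |
| stream_rows | one row-shaped dict (attrs+geom) | base |
| stream_gdf_chunks | one GeoDataFrame per page | restgdf[geo] |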
iter_pages is the raw generator underneath — yields full response
envelopes (features, objectIdFieldName, exceededTransferLimit, …).
Reach for it when you need the wire shape.
The four streaming shapes¶
stream_features — one feature at a time¶
import asyncio

from aiohttp import ClientSession
from restgdf import FeatureLayer

URL = "https://maps1.vcgov.org/arcgis/rest/services/Beaches/MapServer/6"


async def main():
    async with ClientSession() as session:
        layer = await FeatureLayer.from_url(URL, session=session)
        n = 0
        async for feature in layer.stream_features():
            n += 1
        return n


asyncio.run(main())
stream_features and iter_features are deliberate aliases — use
stream_features in new code; iter_features remains the lower-level
iterator primitive for introspection.
stream_feature_batches — page boundaries preserved¶
async def main():
    async with ClientSession() as session:
        layer = await FeatureLayer.from_url(URL, session=session)
        async for batch in layer.stream_feature_batches():
            # `batch` is exactly one page's features; use for backpressure,
            # per-page normalization, or bulk DB inserts.
            print(f"page of {len(batch)}")
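The comment above names bulk DB inserts as one use for page-sized batches. A minimal sketch of that pattern follows. It assumes a stand-in async generator, `fake_batches` (hypothetical, in place of `layer.stream_feature_batches()`), and an in-memory SQLite table whose schema and field names are purely illustrative.

```python
import asyncio
import json
import sqlite3


async def fake_batches():
    """Hypothetical stand-in for layer.stream_feature_batches()."""
    yield [
        {"attributes": {"OBJECTID": 1, "NAME": "a"}, "geometry": {"x": 0, "y": 0}},
        {"attributes": {"OBJECTID": 2, "NAME": "b"}, "geometry": {"x": 1, "y": 1}},
    ]
    yield [{"attributes": {"OBJECTID": 3, "NAME": "c"}, "geometry": None}]


async def load_batches(batches, conn):
    """Insert each page with one executemany call and one commit."""
    pages = 0
    async for batch in batches:
        rows = [
            (
                f["attributes"]["OBJECTID"],
                f["attributes"]["NAME"],
                json.dumps(f.get("geometry")),  # store geometry as JSON text
            )
            for f in batch
        ]
        conn.executemany("INSERT INTO features VALUES (?, ?, ?)", rows)
        conn.commit()
        pages += 1
    return pages


def run_demo():
    """Load the fake batches and return (pages seen, rows stored)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE features (oid INTEGER PRIMARY KEY, name TEXT, geometry TEXT)"
    )
    pages = asyncio.run(load_batches(fake_batches(), conn))
    count = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
    return pages, count
```

Committing once per page keeps each transaction bounded by the page size. The `sqlite3` calls here block the event loop, which is acceptable for a sketch but may warrant a worker thread in a real pipeline.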
stream_gdf_chunks — one GeoDataFrame per page (requires restgdf[geo])¶
async def main():
    async with ClientSession() as session:
        layer = await FeatureLayer.from_url(URL, session=session)
        async for gdf_chunk in layer.stream_gdf_chunks():
            # `gdf_chunk.attrs["spatial_reference"]` is populated from
            # layer metadata (R-65); same on every chunk.
            print(gdf_chunk.shape, gdf_chunk.attrs.get("spatial_reference"))
Install the extra first:
pip install "restgdf[geo]"
Note
stream_gdf_chunks is backed by the legacy chunk_generator
pipeline, not iter_pages. It yields chunks in completion
order and does not accept on_truncation, order, or
max_concurrent_pages, and it does not emit the
feature_layer.stream parent span. If you need those knobs on geo
output, compose stream_rows or stream_features with your own
geometry assembly, or call get_gdf / get_gdf_list for a single-
shot batch.
stream_rows is the row-shaped sibling of stream_features: each item
is {**feature["attributes"], "geometry": feature.get("geometry")}. Use
it to feed the pandas/geopandas adapters without loading an entire layer
into memory.
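The row shape can be illustrated without touching the network. The sketch below applies the expression quoted above to a hand-written feature dict. The helper name `to_row` and the sample values are hypothetical and not part of restgdf.

```python
def to_row(feature):
    """Flatten one ArcGIS feature dict into the row shape described above."""
    return {**feature["attributes"], "geometry": feature.get("geometry")}


feature = {
    "attributes": {"OBJECTID": 7, "NAME": "Pier"},
    "geometry": {"x": -81.0, "y": 29.2},
}
row = to_row(feature)

# A feature without a "geometry" key (e.g. from a table) gets geometry=None.
table_only = to_row({"attributes": {"OBJECTID": 8}})
```

Because the `"geometry"` key is written last, an attribute that happens to be named `geometry` would be overwritten in a row built this way.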
Truncation handling: on_truncation¶
ArcGIS responses can set exceededTransferLimit=true when the server
could not pack every matching feature into a single page. The three
iter_pages-based shapes (stream_features, stream_feature_batches,
stream_rows) and iter_pages itself accept on_truncation:
# Default: raise. Safest for correctness-sensitive pipelines.
async for feat in layer.stream_features(on_truncation="raise"):
    ...
# Raises RestgdfResponseError(context='exceededTransferLimit') if any
# page is truncated.

# Log + continue: best when you know the server always truncates and
# your downstream pipeline is OK with partial pages.
async for feat in layer.stream_features(on_truncation="ignore"):
    ...
# Emits a structured warning on the `restgdf.pagination` logger.

# Bisect + recurse: fetch the missing records by splitting the OID list.
async for feat in layer.stream_features(on_truncation="split"):
    ...
# Up to 32 levels of recursion; irreducible partitions raise.
"split" requires an OID field on the layer and one additional
get_object_ids round-trip per split. It is the right choice when you
need completeness and cannot pre-compute page sizes.
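The bisect-and-recurse idea behind "split" can be sketched in isolation. This is not restgdf's implementation, only an illustration of the general technique. `fake_fetch` is a hypothetical server that returns at most three records per request, and the `RuntimeError` stands in for whatever exception the library actually raises.

```python
MAX_DEPTH = 32  # recursion limit quoted in the text above


def fake_fetch(oids, server_limit=3):
    """Hypothetical server: returns at most `server_limit` records and a truncation flag."""
    truncated = len(oids) > server_limit
    return list(oids[:server_limit]), truncated


def fetch_all(oids, fetch=fake_fetch, depth=0):
    """Fetch every OID, bisecting the list whenever a response is truncated."""
    records, truncated = fetch(oids)
    if not truncated:
        return records
    # A single OID that still truncates cannot be split any further.
    if depth >= MAX_DEPTH or len(oids) <= 1:
        raise RuntimeError(f"irreducible partition of {len(oids)} OID(s)")
    mid = len(oids) // 2
    left = fetch_all(oids[:mid], fetch, depth + 1)
    right = fetch_all(oids[mid:], fetch, depth + 1)
    return left + right


all_records = fetch_all(list(range(1, 11)))
```

Each truncated request costs two further requests, which is why this strategy trades extra round-trips for completeness.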
Ordering: order="request" vs order="completion"¶
# Default: yield in submit order. Deterministic, easy to reason about.
async for batch in layer.stream_feature_batches(order="request"):
    ...

# Yield as fetches complete; may interleave. Throughput > ordering.
async for batch in layer.stream_feature_batches(order="completion"):
    ...
Caveat: order="completion" does not preserve pagination order. If
your downstream logic assumes ascending objectIds or append-only
semantics (e.g. writing to a sorted file, computing running aggregates),
stick with "request".
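The difference between the two orderings can be reproduced with plain asyncio. The sketch below says nothing about how restgdf schedules its fetches; it only contrasts submit-order results (`asyncio.gather`) with finish-order results (`asyncio.as_completed`). `fetch_page` is a hypothetical fetch that sleeps for a fixed, made-up latency.

```python
import asyncio

DELAYS = [0.09, 0.03, 0.06]  # simulated latency for pages 0, 1, 2


async def fetch_page(index):
    """Hypothetical page fetch: sleep for the page's latency, then return its index."""
    await asyncio.sleep(DELAYS[index])
    return index


async def in_request_order():
    # gather() returns results in the order the awaitables were submitted.
    return await asyncio.gather(*(fetch_page(i) for i in range(len(DELAYS))))


async def in_completion_order():
    # as_completed() hands back results as each fetch finishes.
    tasks = [asyncio.create_task(fetch_page(i)) for i in range(len(DELAYS))]
    return [await finished for finished in asyncio.as_completed(tasks)]


request_order = asyncio.run(in_request_order())
completion_order = asyncio.run(in_completion_order())
```

With these delays the fastest page (index 1) finishes first, so the completion-order list starts with it while the request-order list stays ascending.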
Throughput: max_concurrent_pages¶
# Unbounded (default). Easy to overwhelm slow services.
async for feat in layer.stream_features():
    ...

# Cap concurrent in-flight page fetches.
async for feat in layer.stream_features(max_concurrent_pages=4):
    ...
max_concurrent_pages applies in addition to
ConcurrencyConfig.max_concurrent_requests (which caps fan-out at
the top-level orchestration layer); the more restrictive of the two
wins.
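Why the more restrictive limit wins can be seen with two nested semaphores. The sketch below is a generic asyncio illustration, not restgdf's internals. `run_pages` and its fake fetch are hypothetical; only the two parameter names are borrowed from the text above.

```python
import asyncio


async def run_pages(n_pages, max_concurrent_pages, max_concurrent_requests):
    """Fetch n_pages fake pages under two nested limits; return the peak in-flight count."""
    page_gate = asyncio.Semaphore(max_concurrent_pages)
    request_gate = asyncio.Semaphore(max_concurrent_requests)
    in_flight = 0
    peak = 0

    async def fetch_page(index):
        nonlocal in_flight, peak
        # A fetch must hold both permits, so the smaller limit is the effective cap.
        async with page_gate, request_gate:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # simulated network latency
            in_flight -= 1
        return index

    await asyncio.gather(*(fetch_page(i) for i in range(n_pages)))
    return peak


peak_a = asyncio.run(run_pages(12, max_concurrent_pages=4, max_concurrent_requests=8))
peak_b = asyncio.run(run_pages(12, max_concurrent_pages=8, max_concurrent_requests=2))
```

In the first run the page limit (4) is the binding constraint; in the second the request limit (2) is.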
What about iter_pages?¶
iter_pages is the low-level generator that stream_features,
stream_feature_batches, and stream_rows are built on. Reach for it
when you need the full response envelope (exceededTransferLimit,
pagination tokens, or server-side warnings):
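# The wrapper name is illustrative; `yield` needs an enclosing async generator.
async def features_with_custom_truncation(layer):
    async for page in layer.iter_pages(on_truncation="ignore"):
        if page.get("exceededTransferLimit"):
            # Custom handling here instead of the built-in split/raise.
            ...
        for feature in page.get("features", []):
            yield feature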
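How per-feature and per-batch streams can be layered on a page iterator is easy to sketch with async generators. This is an illustration of the composition, not restgdf's source. `fake_pages` is a hypothetical stand-in for `layer.iter_pages()`, and its envelopes carry only the keys named in this document.

```python
import asyncio


async def fake_pages():
    """Hypothetical stand-in for layer.iter_pages(): yields response envelopes."""
    yield {
        "objectIdFieldName": "OBJECTID",
        "features": [{"attributes": {"OBJECTID": 1}}, {"attributes": {"OBJECTID": 2}}],
    }
    yield {
        "objectIdFieldName": "OBJECTID",
        "exceededTransferLimit": True,
        "features": [{"attributes": {"OBJECTID": 3}}],
    }


async def batches_from(pages):
    """One list of features per page, preserving page boundaries."""
    async for page in pages:
        yield page.get("features", [])


async def features_from(pages):
    """One feature at a time, flattening page boundaries away."""
    async for batch in batches_from(pages):
        for feature in batch:
            yield feature


async def collect(agen):
    """Drain an async generator into a list."""
    return [item async for item in agen]


batch_sizes = [len(b) for b in asyncio.run(collect(batches_from(fake_pages())))]
oids = [
    f["attributes"]["OBJECTID"]
    for f in asyncio.run(collect(features_from(fake_pages())))
]
```

The envelope-level fields (such as `exceededTransferLimit` on the second page) are visible only at the page layer, which is the reason to drop down to it.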
Deprecated: FeatureLayer.row_dict_generator¶
row_dict_generator still works but now emits a DeprecationWarning.
Migrate to stream_rows:
# Before
async for row in layer.row_dict_generator():
    ...

# After
async for row in layer.stream_rows():
    ...
Behavior is equivalent; stream_rows gains the streaming knobs
(order, max_concurrent_pages, on_truncation) and the R-61 parent
span.
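A deprecation shim of this kind is commonly written as a thin wrapper that warns and then delegates. The sketch below shows that general pattern on a hypothetical `DemoLayer` class. It is not restgdf's code, and the warning message text is made up.

```python
import asyncio
import warnings


class DemoLayer:
    """Hypothetical stand-in for FeatureLayer with just the two row methods."""

    async def stream_rows(self):
        for oid in (1, 2, 3):
            yield {"OBJECTID": oid, "geometry": None}

    async def row_dict_generator(self):
        # Warn when iteration starts, then delegate to the new method.
        warnings.warn(
            "row_dict_generator is deprecated; use stream_rows",
            DeprecationWarning,
        )
        async for row in self.stream_rows():
            yield row


async def collect(agen):
    """Drain an async generator into a list."""
    return [row async for row in agen]


# Record warnings instead of printing them so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_rows = asyncio.run(collect(DemoLayer().row_dict_generator()))

new_rows = asyncio.run(collect(DemoLayer().stream_rows()))
```

To find remaining callers during a migration, running the test suite with `python -W error::DeprecationWarning` turns each such warning into an exception.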
Observability¶
When telemetry is enabled (restgdf[telemetry] +
RESTGDF_TELEMETRY_ENABLED=1), every iter_pages call emits exactly
one INTERNAL parent span named feature_layer.stream wrapping the
per-page loop. No per-page child spans are emitted. See
Tracing & Observability for the full span wiring, logger names, and
OpenTelemetry integration.
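Independently of tracing, the truncation warnings that on_truncation="ignore" sends to the restgdf.pagination logger can be captured with the standard logging module. The sketch below attaches an in-memory handler to that logger name. The warning it logs is simulated for illustration, and both the message text and the `page_index` field are hypothetical rather than the library's actual record shape.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so a pipeline can inspect them later."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.records = []

    def emit(self, record):
        self.records.append(record)


pagination_log = logging.getLogger("restgdf.pagination")
handler = ListHandler()
pagination_log.addHandler(handler)

# Simulated warning for illustration only; in real use the library emits these.
pagination_log.warning("page truncated", extra={"page_index": 3})

truncation_warnings = [r for r in handler.records if r.levelno >= logging.WARNING]
```

Counting the captured records after a run gives a cheap check on how many pages were truncated, without enabling the telemetry extra.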