Utilities¶
Low-level helpers used by FeatureLayer and
Directory. Most users will never call these directly, but
they are public API and safe to use.
restgdf.utils.crawl, getinfo, token, and utils work with the
base install (pip install restgdf). restgdf.utils.getgdf and other
GeoDataFrame/pandas-backed helpers require pip install "restgdf[geo]".
restgdf.utils.crawl¶
Recursive ArcGIS Server directory crawling and service discovery.
Provides fetch_all_data() for raw-dict output and safe_crawl()
for structured CrawlReport output with per-stage error
capture.
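The per-stage error capture described above can be pictured with a small standard-library sketch. It does not use restgdf: SketchCrawlError, SketchCrawlReport, fake_service_metadata, and sketch_safe_crawl are hypothetical names, and the example.com URLs are placeholders. It only illustrates the general pattern of collecting failures next to successes instead of stopping at the first exception.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SketchCrawlError:
    """Hypothetical stand-in for a typed crawl error entry."""
    stage: str
    url: str
    message: str


@dataclass
class SketchCrawlReport:
    """Hypothetical stand-in for a structured crawl report."""
    services: list = field(default_factory=list)
    errors: list = field(default_factory=list)


async def fake_service_metadata(url: str) -> dict:
    """Pretend per-service call; fails for one URL to show error capture."""
    if url.endswith("/Broken/MapServer"):
        raise RuntimeError("simulated service failure")
    return {"url": url, "layers": []}


async def sketch_safe_crawl(service_urls: list) -> SketchCrawlReport:
    report = SketchCrawlReport()
    # return_exceptions=True keeps one failure from aborting the other calls.
    results = await asyncio.gather(
        *(fake_service_metadata(u) for u in service_urls),
        return_exceptions=True,
    )
    for url, result in zip(service_urls, results):
        if isinstance(result, Exception):
            report.errors.append(
                SketchCrawlError("service_metadata", url, str(result))
            )
        else:
            report.services.append(result)
    return report


urls = [
    "https://example.com/arcgis/rest/services/Roads/MapServer",
    "https://example.com/arcgis/rest/services/Broken/MapServer",
]
report = asyncio.run(sketch_safe_crawl(urls))
```

In this sketch the healthy service still lands in report.services even though its sibling failed, which is the property the structured report is meant to provide.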
- async restgdf.utils.crawl.fetch_all_data(session, base_url, token=None, return_feature_count=False)[source]¶
Fetch all services and their layers in a highly concurrent manner.
- async restgdf.utils.crawl.safe_crawl(session, base_url, token=None, return_feature_count=False)[source]¶
Crawl an ArcGIS REST directory and aggregate results + errors.
Unlike fetch_all_data(), this function never short-circuits on the first failure. Every recoverable error is captured as a typed CrawlError entry in CrawlReport.errors, and successful services are always present in CrawlReport.services.
The three failure stages are "base_metadata" (root get_metadata call), "folder_metadata" (per-folder get_metadata call), and "service_metadata" (per-service service_metadata call). When a folder’s metadata fails, services discovered in earlier folders (and the base) are still returned.
restgdf.utils.getgdf¶
Get a GeoDataFrame from an ArcGIS FeatureLayer.
- restgdf.utils.getgdf.read_file(*args, **kwargs)[source]¶
Load a vector payload with geopandas only when geo support is needed.
- restgdf.utils.getgdf.get_sub_features(*args, **kwargs)[source]¶
Compatibility wrapper for the raw feature query helper.
- restgdf.utils.getgdf.combine_where_clauses(base_where, extra_where)[source]¶
Combine where clauses without changing the default all-records predicate.
- restgdf.utils.getgdf.chunk_values(values, chunk_size)[source]¶
Split values into evenly-sized chunks.
- async restgdf.utils.getgdf.get_query_data_batches(url, session, **kwargs)[source]¶
Build query payloads for each request needed to read a layer.
When the layer metadata advertises an explicit advancedQueryCapabilities.maxRecordCountFactor (R-72), the value is forwarded to build_pagination_plan as advertised_factor= so pagination batch sizes honor the server-published upper bound. Layers that do not advertise the field keep today’s byte-for-byte batching: no advertised_factor kwarg is supplied and the planner falls back to its _DEFAULT_FACTOR (1.0).
Pages observed at stream time that return zero features while setting exceededTransferLimit=true are flagged with PaginationInconsistencyWarning (R-73) from the internal page resolver; see that helper for details.
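The inconsistent-page condition can be illustrated without the library. The following is a minimal sketch, assuming a page is a plain dict shaped like an ArcGIS query response; SketchPaginationWarning and check_page are hypothetical names and are not the internal page resolver.

```python
import warnings


class SketchPaginationWarning(UserWarning):
    """Hypothetical stand-in for PaginationInconsistencyWarning."""


def check_page(page: dict) -> list:
    """Return the page's features, warning on the inconsistent case."""
    features = page.get("features") or []
    # Zero features plus exceededTransferLimit=true is contradictory:
    # the server claims more rows exist but returned none.
    if not features and page.get("exceededTransferLimit"):
        warnings.warn(
            "page returned zero features but exceededTransferLimit is true",
            SketchPaginationWarning,
            stacklevel=2,
        )
    return features


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = check_page({"features": [{"attributes": {"OBJECTID": 1}}]})
    empty = check_page({"features": [], "exceededTransferLimit": True})
```

Only the second call records a warning; a normal page passes through untouched.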
- async restgdf.utils.getgdf.chunk_generator(url, session, **kwargs)[source]¶
Asynchronously yield GeoDataFrames from a FeatureLayer in chunks. Chunks are fetched by offset range, and each GeoDataFrame is yielded as soon as it is retrieved. Each yielded chunk has gdf.attrs["spatial_reference"] populated from the layer’s metadata (R-65) when the layer reports a spatial reference.
- async restgdf.utils.getgdf.row_dict_generator(url, session, **kwargs)[source]¶
Yield row-shaped dicts from an ArcGIS FeatureLayer.
Deprecated since version 2.0: Module-level row_dict_generator is retained for backwards compatibility. Prefer restgdf.FeatureLayer.stream_rows() or restgdf.adapters.stream.iter_rows in new code.
restgdf.utils.getinfo¶
A package for getting GeoDataFrames from ArcGIS FeatureLayers.
Phase 2 split: this module is now a compatibility shim that re-exports public
names from the private submodules _http, _metadata, _query, and
_stats. The orchestration helpers get_offset_range and
service_metadata remain DEFINED here so that patches against
restgdf.utils.getinfo.<helper> continue to intercept their sibling calls
(see tests/test_getinfo_seams.py).
The from aiohttp import ClientSession line is PUBLIC API: tests patch
restgdf.utils.getinfo.ClientSession.post / .get. Do not remove it.
- class restgdf.utils.getinfo.ClientSession(base_url=None, *, connector=None, loop=None, cookies=None, headers=None, proxy=None, proxy_auth=None, skip_auto_headers=None, auth=None, json_serialize=<function dumps>, request_class=<class 'aiohttp.client_reqrep.ClientRequest'>, response_class=<class 'aiohttp.client_reqrep.ClientResponse'>, ws_response_class=<class 'aiohttp.client_ws.ClientWebSocketResponse'>, version=(1, 1), cookie_jar=None, connector_owner=True, raise_for_status=False, read_timeout=_SENTINEL.sentinel, conn_timeout=None, timeout=_SENTINEL.sentinel, auto_decompress=True, trust_env=False, requote_redirect_url=True, trace_configs=None, read_bufsize=65536, max_line_size=8190, max_field_size=8190, max_headers=128, fallback_charset_resolver=<function ClientSession.<lambda>>, middlewares=(), ssl_shutdown_timeout=_SENTINEL.sentinel)[source]¶
Bases: object
First-class interface for making HTTP requests.
- ATTRS = frozenset({'_auto_decompress', '_base_url', '_base_url_origin', '_connector', '_connector_owner', '_cookie_jar', '_default_auth', '_default_headers', '_default_proxy', '_default_proxy_auth', '_json_serialize', '_loop', '_max_field_size', '_max_headers', '_max_line_size', '_middlewares', '_raise_for_status', '_read_bufsize', '_request_class', '_requote_redirect_url', '_resolve_charset', '_response_class', '_retry_connection', '_skip_auto_headers', '_source_traceback', '_timeout', '_trace_configs', '_trust_env', '_version', '_ws_response_class', 'requote_redirect_url'})¶
- ws_connect(url, *, method='GET', protocols=(), timeout=_SENTINEL.sentinel, receive_timeout=None, autoclose=True, autoping=True, heartbeat=None, auth=None, origin=None, params=None, headers=None, proxy=None, proxy_auth=None, ssl=True, verify_ssl=None, fingerprint=None, ssl_context=None, server_hostname=None, proxy_headers=None, compress=0, max_msg_size=4194304)[source]¶
Initiate websocket connection.
- property cookie_jar: AbstractCookieJar¶
The session cookies.
- property loop: AbstractEventLoop¶
Session’s loop.
- property timeout: ClientTimeout¶
Timeout for the session.
- property raise_for_status: bool | Callable[[ClientResponse], Awaitable[None]]¶
Should ClientResponse.raise_for_status() be called for each response.
- class restgdf.utils.getinfo.PaginationPlan(total_records, max_record_count, max_record_count_factor, effective_page_size, batches)[source]¶
Bases: object
Frozen result of build_pagination_plan().
- Variables:
total_records (int) – Layer-wide feature count the plan paginates over.
max_record_count (int) – Server-advertised per-page cap.
max_record_count_factor (float) – Effective factor after clamping against the advertised upper bound. Equals the caller-supplied factor when no clamp was applied.
effective_page_size (int) – max(1, int(max_record_count * max_record_count_factor)), the actual page size used to compute batches.
batches (tuple) – Tuple of (resultOffset, resultRecordCount) pairs. Empty when total_records == 0. Last pair’s count may be less than effective_page_size (partial tail page).
- restgdf.utils.getinfo.build_pagination_plan(total_records, max_record_count, *, factor=1.0, advertised_factor=None)[source]¶
Compute a PaginationPlan for total_records rows.
- Parameters:
total_records (int) – Non-negative total row count (typically the result of get_feature_count).
max_record_count (int) – Positive server-advertised per-page cap.
factor (float, optional) – Caller-supplied multiplier on max_record_count. Defaults to 1.0 (pure max_record_count pagination).
advertised_factor (float or None, optional) – Server-advertised advancedQueryCapabilities.maxRecordCountFactor upper bound. When provided and factor > advertised_factor, the factor is clamped down and a single warning is logged under restgdf.pagination.
- Raises:
ValueError – If total_records < 0, max_record_count <= 0, or factor <= 0.
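The arithmetic documented for PaginationPlan and build_pagination_plan can be worked through in a short standalone sketch. SketchPlan and sketch_build_plan are hypothetical re-implementations written from the field descriptions above, not the library's code, and the logger name is illustrative.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("sketch.pagination")  # illustrative logger name


@dataclass(frozen=True)
class SketchPlan:
    total_records: int
    max_record_count: int
    max_record_count_factor: float
    effective_page_size: int
    batches: tuple


def sketch_build_plan(
    total_records: int,
    max_record_count: int,
    *,
    factor: float = 1.0,
    advertised_factor: Optional[float] = None,
) -> SketchPlan:
    if total_records < 0 or max_record_count <= 0 or factor <= 0:
        raise ValueError("invalid pagination inputs")
    # Clamp the caller's factor to the server-advertised upper bound.
    if advertised_factor is not None and factor > advertised_factor:
        logger.warning("clamping factor %s to %s", factor, advertised_factor)
        factor = advertised_factor
    page_size = max(1, int(max_record_count * factor))
    # Each batch is (resultOffset, resultRecordCount); the tail may be partial.
    batches = tuple(
        (offset, min(page_size, total_records - offset))
        for offset in range(0, total_records, page_size)
    )
    return SketchPlan(total_records, max_record_count, factor, page_size, batches)


plan = sketch_build_plan(2500, 1000)
clamped = sketch_build_plan(5000, 1000, factor=5.0, advertised_factor=2.0)
empty_plan = sketch_build_plan(0, 1000)
```

With 2500 rows and a cap of 1000, the sketch yields two full pages and a 500-row tail. The second call shows a requested factor of 5.0 clamped to the advertised 2.0, giving an effective page size of 2000.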
- restgdf.utils.getinfo.default_data(data=None, default_dict=None)[source]¶
Return a dict with default values for ArcGIS REST API requests.
- restgdf.utils.getinfo.default_headers(headers=None)[source]¶
Return request headers merged with ArcGIS-compatible defaults.
- async restgdf.utils.getinfo.get_feature_count(url, session, **kwargs)[source]¶
Get the feature count for a layer.
The JSON body is validated against CountResponse (strict tier). A missing/ill-typed count key raises RestgdfResponseError with the original payload and request URL attached for operator triage.
- restgdf.utils.getinfo.get_fields_frame(layer_metadata)[source]¶
Get the fields of a layer as a DataFrame.
- restgdf.utils.getinfo.get_max_record_count(metadata)[source]¶
Get the maximum record count for a layer.
- async restgdf.utils.getinfo.get_metadata(url, session, token=None)[source]¶
Get the parsed metadata model for a layer.
The JSON body is validated against LayerMetadata (permissive tier). Vendor-variance extras are preserved via extra="allow"; missing fields default to None rather than raise. Drift is logged through restgdf._models._drift rather than returned to the caller.
- restgdf.utils.getinfo.get_object_id_field(metadata)[source]¶
Get the object id field name for a layer.
- async restgdf.utils.getinfo.get_object_ids(url, session, **kwargs)[source]¶
Get the object id field name and matching object ids for a layer query.
The JSON body is validated against ObjectIdsResponse (strict tier) so missing field names or non-list id payloads raise RestgdfResponseError before the caller can misuse them. ArcGIS returns objectIds: null for zero-row matches; the model coerces that to [].
- async restgdf.utils.getinfo.get_offset_range(url, session, **kwargs)[source]¶
Get the offset range for a layer.
Orchestrator: resolves get_feature_count, get_metadata, and get_max_record_count through this module’s namespace so that unittest.mock.patch("restgdf.utils.getinfo.<helper>") intercepts.
- async restgdf.utils.getinfo.get_unique_values(url, fields, session, sortby=None, **kwargs)[source]¶
Get the unique values for a field.
- async restgdf.utils.getinfo.get_value_counts(url, field, session, **kwargs)[source]¶
Get the value counts for a field.
- restgdf.utils.getinfo.getfields(layer_metadata, types=False)¶
Get the fields of a layer.
- restgdf.utils.getinfo.getfields_df(layer_metadata)¶
Get the fields of a layer as a DataFrame.
- async restgdf.utils.getinfo.getuniquevalues(url, fields, session, sortby=None, **kwargs)¶
Get the unique values for a field.
- async restgdf.utils.getinfo.getvaluecounts(url, field, session, **kwargs)¶
Get the value counts for a field.
- async restgdf.utils.getinfo.nested_count(url, fields, session, **kwargs)[source]¶
Get the nested value counts for a field.
- async restgdf.utils.getinfo.nestedcount(url, fields, session, **kwargs)¶
Get the nested value counts for a field.
- async restgdf.utils.getinfo.service_metadata(session, service_url, token=None, return_feature_count=False, _sem=None)[source]¶
Asynchronously retrieve layers for a single service.
Orchestrator: resolves get_metadata and get_feature_count through this module’s namespace so that unittest.mock.patch("restgdf.utils.getinfo.<helper>") intercepts. The aggregated payload is validated against LayerMetadata via the drift adapter before being returned, so vendor-variance extras are logged (not raised) and callers get a typed envelope.
BL-01: _sem is a private kwarg allowing a caller (e.g. the fetch_all_data / safe_crawl orchestrators) to share ONE BoundedSemaphore across nested fan-outs so the cap is global per top-level request. When None, a fresh semaphore is created and the direct-call semantics are preserved.
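The effect of sharing one semaphore across nested fan-outs can be sketched with asyncio alone. Everything here is hypothetical (fake_request, sketch_service, sketch_crawl, and the cap values); it only demonstrates why passing a single BoundedSemaphore down makes the concurrency cap global instead of per service.

```python
import asyncio
from typing import Optional

# Tracks how many simulated requests run at once, and the highest value seen.
stats = {"in_flight": 0, "peak": 0}


async def fake_request(sem: asyncio.BoundedSemaphore) -> None:
    async with sem:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # stands in for network I/O
        stats["in_flight"] -= 1


async def sketch_service(
    n_layers: int, _sem: Optional[asyncio.BoundedSemaphore] = None
) -> None:
    # A direct call without _sem gets its own private cap (assumed value 2).
    sem = _sem if _sem is not None else asyncio.BoundedSemaphore(2)
    await asyncio.gather(*(fake_request(sem) for _ in range(n_layers)))


async def sketch_crawl(n_services: int, n_layers: int, cap: int) -> None:
    shared = asyncio.BoundedSemaphore(cap)  # one semaphore for the whole crawl
    await asyncio.gather(
        *(sketch_service(n_layers, _sem=shared) for _ in range(n_services))
    )


asyncio.run(sketch_crawl(n_services=4, n_layers=5, cap=3))
```

Twenty simulated requests are scheduled, but no more than three run at the same time because every service draws from the same semaphore. If each service created its own, the effective cap would scale with the number of services.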
restgdf.utils.utils¶
General-purpose string helpers for URL and query construction.
restgdf.utils.token¶
Token-session helpers for ArcGIS Online / Enterprise.
The AGOLUserPass and TokenSessionConfig models live in
restgdf._models.credentials. They are re-exported here for
backward compatibility with from restgdf.utils.token import
AGOLUserPass and with the public from restgdf import AGOLUserPass
surface documented in the README. The legacy frozen dataclass
AGOLUserPass was migrated to a pydantic StrictModel in v2.0.0;
the import path is unchanged but the constructor is keyword-only.
- class restgdf.utils.token.AGOLUserPass(*, username, password, referer=None, expiration=60)[source]¶
Bases: StrictModel
ArcGIS Online / Enterprise credentials used to mint tokens.
password is stored as pydantic.SecretStr. Call creds.password.get_secret_value() only at the HTTP-POST boundary; never store or log the unwrapped value.
- password: SecretStr¶
- model_config = {'extra': 'ignore', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class restgdf.utils.token.ArcGISTokenSession(session, credentials=None, token_url='https://www.arcgis.com/sharing/rest/generateToken', token_refresh_threshold=60, token=None, expires=None, verify_ssl=True, config=None)[source]¶
Bases: object
Wrap an aiohttp session with ArcGIS token refresh behavior.
Construction knobs (token_url, token_refresh_threshold, credentials) are validated via TokenSessionConfig in __post_init__() so a bogus scheme or zero-length username fails fast with RestgdfResponseError rather than surfacing as a 401 or an aiohttp error deep in the request path.
- session: ClientSession¶
- credentials: AGOLUserPass | None = None¶
- config: TokenSessionConfig | None = None¶
- property expires_at: datetime | None¶
Return the token expiry as a tz-aware UTC datetime.
ArcGIS returns expires in either seconds or milliseconds since the Unix epoch; values above 1e11 are treated as milliseconds and divided by 1000. Returns None when no expiry is set.
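The seconds-versus-milliseconds rule can be shown in a few lines. This is a sketch written from the description above, with a hypothetical function name (sketch_expires_at); it is not the property's actual source.

```python
from datetime import datetime, timezone
from typing import Optional


def sketch_expires_at(expires: Optional[float]) -> Optional[datetime]:
    """Convert an ArcGIS-style expiry to a tz-aware UTC datetime."""
    if expires is None:
        return None
    # Values above 1e11 are treated as milliseconds since the Unix epoch.
    seconds = expires / 1000 if expires > 1e11 else expires
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


from_seconds = sketch_expires_at(1_700_000_000)
from_millis = sketch_expires_at(1_700_000_000_000)
```

Both calls resolve to the same instant, since the second value is the first one expressed in milliseconds.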
- update_dict(input_dict=None)[source]¶
Return a request payload/query dict merged with the active token.
- async update_token()[source]¶
Update the token by making a request to the token URL.
The /generateToken payload is validated against TokenResponse (strict tier) so malformed/error envelopes raise RestgdfResponseError instead of KeyError deep in caller code paths.
Retries up to _MAX_TOKEN_RETRIES times with exponential backoff (base _BASE_BACKOFF_S) on transient network errors. Deterministic errors (bad credentials, content-type mismatches, validation failures) are re-raised immediately. After exhausting retries, raises TokenRefreshFailedError.
Emits structured log events:
* auth.refresh.start: before the POST
* auth.refresh.success: after successful token update
* auth.refresh.failure: on any exception
- async update_token_if_needed()[source]¶
Ensure the token is valid and refresh if necessary.
BL-03: concurrent callers racing on an expired token collapse onto a single /generateToken POST via a lazily-initialized per-instance asyncio.Lock with a double-checked token_needs_update() inside the lock (plan.md §3c R-18, kickoff phase-1a §10.4). The lock is created here, not in __post_init__, so instances constructed outside a running event loop (e.g. at import time or inside a sync test) never trigger DeprecationWarning: There is no current event loop.
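The double-checked, lazily created lock can be demonstrated with a toy class. SketchTokenSession is hypothetical and only borrows the method names from the entry above; the sleep stands in for the token POST.

```python
import asyncio
from typing import Optional


class SketchTokenSession:
    def __init__(self) -> None:
        self.token: Optional[str] = None
        self.refresh_calls = 0
        self._lock: Optional[asyncio.Lock] = None  # created lazily, not here

    def token_needs_update(self) -> bool:
        return self.token is None

    async def update_token(self) -> None:
        self.refresh_calls += 1
        await asyncio.sleep(0.01)  # stands in for the token POST
        self.token = "fresh-token"

    async def update_token_if_needed(self) -> None:
        if not self.token_needs_update():
            return  # fast path: no lock needed
        if self._lock is None:
            # First use happens inside a running event loop.
            self._lock = asyncio.Lock()
        async with self._lock:
            # Re-check: another caller may have refreshed while we waited.
            if self.token_needs_update():
                await self.update_token()


async def main() -> SketchTokenSession:
    session = SketchTokenSession()
    await asyncio.gather(*(session.update_token_if_needed() for _ in range(10)))
    return session


session = asyncio.run(main())
```

Ten concurrent callers result in a single refresh: the first one through the lock refreshes, and the rest see a valid token on the second check and return.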
- async get(url, params=None, headers=None, **kwargs)[source]¶
Make a GET request to the specified URL with the token.
- async post(url, data=None, headers=None, **kwargs)[source]¶
Make a POST request to the specified URL with the token.
- property closed: bool¶
Return True when the underlying aiohttp.ClientSession is closed.
Delegating lets ArcGISTokenSession satisfy the internal AsyncHTTPSession transport Protocol uniformly with aiohttp.ClientSession (R-71).
- async close()[source]¶
Close the underlying aiohttp.ClientSession.
Mirrors aiohttp.ClientSession.close() so token sessions and raw aiohttp sessions are interchangeable through the internal AsyncHTTPSession transport Protocol. Idempotent: closing an already-closed session is a no-op.
- class restgdf.utils.token.TokenSessionConfig(*, token_url, credentials, transport='header', header_name='X-Esri-Authorization', referer=None, token=None, refresh_leeway_seconds=120, clock_skew_seconds=30, verify_ssl=True)[source]¶
Bases: StrictModel
Validated configuration for ArcGISTokenSession.
token_url is intentionally a plain str with a custom validator rather than pydantic.AnyHttpUrl. ArcGIS Enterprise deployments commonly run plain HTTP on internal networks, and AnyHttpUrl normalizes/rejects real-world URLs (for example it appends trailing slashes and may reject edge cases). Accepting any http:// or https:// string matches the behavior ArcGIS clients need.
- Refresh semantics (BL-04 / R-36, R-37):
refresh_leeway_seconds (default 120): how far in advance of the token’s expiry the session eagerly refreshes.
clock_skew_seconds (default 30, capped at 30 when derived from the legacy alias): extra padding for client / server clock drift.
refresh_threshold_seconds is retained as a deprecation-warning alias. Reads return refresh_leeway_seconds + clock_skew_seconds; writes via the constructor kwarg split the supplied total into clock_skew_seconds = min(30, total) and refresh_leeway_seconds = total - clock_skew_seconds.
- credentials: AGOLUserPass¶
- transport: Literal['header', 'body', 'query']¶
- property refresh_threshold_seconds: int¶
Return the legacy threshold sum (leeway + skew) and emit a DeprecationWarning.
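The split and sum rules for the legacy alias are plain arithmetic and can be checked directly. The helper names below (split_threshold, legacy_threshold) are hypothetical; the formulas are the ones stated in the refresh semantics for this class.

```python
def split_threshold(total: int) -> tuple:
    """Split a legacy total into (refresh_leeway_seconds, clock_skew_seconds)."""
    clock_skew = min(30, total)          # skew is capped at 30 seconds
    refresh_leeway = total - clock_skew  # the remainder becomes leeway
    return refresh_leeway, clock_skew


def legacy_threshold(refresh_leeway: int, clock_skew: int) -> int:
    """Value a read of the legacy alias would report: leeway + skew."""
    return refresh_leeway + clock_skew


default_pair = split_threshold(150)
small_pair = split_threshold(20)
round_trip = legacy_threshold(*default_pair)
```

A legacy total of 150 splits into the documented defaults of 120 seconds of leeway and 30 seconds of skew, and summing them recovers 150. A total below 30 goes entirely to skew, leaving zero leeway.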
- model_config = {'extra': 'ignore', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- restgdf.utils.token.get_token(username, password)[source]¶
Synchronously request an ArcGIS Online token.
Deprecated since version 3.0: Use ArcGISTokenSession instead for async token lifecycle.