Utilities

Low-level helpers used by FeatureLayer and Directory. Most users will never call these directly, but they are public API and safe to use.

restgdf.utils.crawl, getinfo, token, and utils work with the base install (pip install restgdf). restgdf.utils.getgdf and other GeoDataFrame/pandas-backed helpers require pip install "restgdf[geo]".

restgdf.utils.crawl

Recursive ArcGIS Server directory crawling and service discovery.

Provides fetch_all_data() for raw-dict output and safe_crawl() for structured CrawlReport output with per-stage error capture.

async restgdf.utils.crawl.fetch_all_data(session, base_url, token=None, return_feature_count=False)[source]

Fetch all services and their layers in a highly concurrent manner.

async restgdf.utils.crawl.safe_crawl(session, base_url, token=None, return_feature_count=False)[source]

Crawl an ArcGIS REST directory and aggregate results + errors.

Unlike fetch_all_data(), this function never short-circuits on the first failure. Every recoverable error is captured as a typed CrawlError entry in CrawlReport.errors and successful services are always present in CrawlReport.services.

The three failure stages are "base_metadata" (root get_metadata call), "folder_metadata" (per-folder get_metadata call), and "service_metadata" (per-service service_metadata call). When a folder’s metadata fails, services discovered in earlier folders (and the base) are still returned.
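
The aggregate-instead-of-abort behavior described above can be sketched in a few lines. This is a simplified, synchronous illustration and not restgdf's implementation: ErrorSketch, ReportSketch, crawl_sketch, and fake_loader are hypothetical stand-ins, and the real CrawlError and CrawlReport may carry different fields.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorSketch:
    """Hypothetical stand-in for a typed crawl error entry."""
    stage: str
    url: str
    message: str


@dataclass
class ReportSketch:
    """Hypothetical stand-in for a report holding successes and errors."""
    services: list = field(default_factory=list)
    errors: list = field(default_factory=list)


def crawl_sketch(service_urls, load_service):
    """Load every service; record failures instead of raising on the first one."""
    report = ReportSketch()
    for url in service_urls:
        try:
            report.services.append(load_service(url))
        except Exception as exc:
            # The stage name mirrors the documented "service_metadata" stage.
            report.errors.append(ErrorSketch("service_metadata", url, str(exc)))
    return report


def fake_loader(url):
    """Simulated loader (assumption): fails for URLs containing 'broken'."""
    if "broken" in url:
        raise RuntimeError("simulated failure")
    return {"url": url}


report = crawl_sketch(["a/ok", "b/broken", "c/ok"], fake_loader)
```

Here the two healthy services land in report.services and the failing one becomes a single entry in report.errors, so one bad service does not hide the others.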

restgdf.utils.getgdf

Get a GeoDataFrame from an ArcGIS FeatureLayer.

restgdf.utils.getgdf.read_file(*args, **kwargs)[source]

Load a vector payload with geopandas only when geo support is needed.

restgdf.utils.getgdf.get_sub_features(*args, **kwargs)[source]

Compatibility wrapper for the raw feature query helper.

restgdf.utils.getgdf.combine_where_clauses(base_where, extra_where)[source]

Combine where clauses without changing the default all-records predicate.
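
One plausible reading of this contract is sketched below. It assumes the "all-records predicate" is the conventional ArcGIS 1=1 clause and that clauses are joined with AND; both are assumptions, and combine_sketch is a hypothetical name rather than the library's code.

```python
ALL_RECORDS = "1=1"  # assumed default predicate that matches every row


def combine_sketch(base_where, extra_where):
    """Join the non-trivial clauses with AND; fall back to the default predicate."""
    clauses = [
        clause
        for clause in (base_where, extra_where)
        if clause and clause.strip() != ALL_RECORDS
    ]
    if not clauses:
        return ALL_RECORDS
    # Parenthesize each clause so an OR inside one cannot leak into the other.
    return " AND ".join(f"({clause})" for clause in clauses)
```

Under these assumptions, combining the default predicate with nothing leaves it unchanged, while two real clauses are parenthesized and joined.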

restgdf.utils.getgdf.chunk_values(values, chunk_size)[source]

Split values into evenly-sized chunks.
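
A minimal sketch of fixed-size chunking follows. It assumes the final chunk may be shorter than chunk_size when the input does not divide evenly; chunk_sketch is a hypothetical name, and the library's helper may differ in return type or edge-case handling.

```python
def chunk_sketch(values, chunk_size):
    """Split values into consecutive lists of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    values = list(values)
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
```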

async restgdf.utils.getgdf.get_query_data_batches(url, session, **kwargs)[source]

Build query payloads for each request needed to read a layer.

When the layer metadata advertises an explicit advancedQueryCapabilities.maxRecordCountFactor (R-72), the value is forwarded to build_pagination_plan as advertised_factor= so pagination batch sizes honor the server-published upper bound. For layers that do not advertise the field, batching is unchanged byte for byte: no advertised_factor kwarg is supplied and the planner falls back to its _DEFAULT_FACTOR (1.0).

Pages observed at stream time that return zero features while setting exceededTransferLimit=true are flagged with PaginationInconsistencyWarning (R-73) from the internal page resolver; see that helper for details.
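
The inconsistency check described above can be illustrated as follows. This sketch only shows the condition (zero features together with exceededTransferLimit=true); PageInconsistencySketch and check_page are hypothetical names, and the real page resolver is internal and may behave differently.

```python
import warnings


class PageInconsistencySketch(UserWarning):
    """Hypothetical stand-in for PaginationInconsistencyWarning."""


def check_page(page):
    """Return False and warn when a page is empty yet claims more data exists."""
    features = page.get("features") or []
    if not features and page.get("exceededTransferLimit") is True:
        warnings.warn(
            "page returned zero features but set exceededTransferLimit=true",
            PageInconsistencySketch,
        )
        return False
    return True
```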

async restgdf.utils.getgdf.get_sub_gdf(url, session, query_data, **kwargs)[source]
async restgdf.utils.getgdf.get_gdf_list(url, session, **kwargs)[source]
async restgdf.utils.getgdf.chunk_generator(url, session, **kwargs)[source]

Asynchronously yield GeoDataFrames from a FeatureLayer in chunks. Chunks follow the layer’s offset range, and each GeoDataFrame is yielded as soon as it is retrieved. Each yielded chunk has gdf.attrs["spatial_reference"] populated from the layer’s metadata (R-65) when the layer reports a spatial reference.

async restgdf.utils.getgdf.row_dict_generator(url, session, **kwargs)[source]

Yield row-shaped dicts from an ArcGIS FeatureLayer.

Deprecated since version 2.0: Module-level row_dict_generator is retained for backwards compatibility. Prefer restgdf.FeatureLayer.stream_rows() or restgdf.adapters.stream.iter_rows in new code.

async restgdf.utils.getgdf.concat_gdfs(gdfs)[source]
async restgdf.utils.getgdf.gdf_by_concat(url, session, **kwargs)[source]
async restgdf.utils.getgdf.get_gdf(url, session=None, where=None, token=None, **kwargs)[source]

restgdf.utils.getinfo

A package for getting GeoDataFrames from ArcGIS FeatureLayers.

Phase 2 split: this module is now a compatibility shim that re-exports public names from the private submodules _http, _metadata, _query, and _stats. The orchestration helpers get_offset_range and service_metadata remain DEFINED here so that patches against restgdf.utils.getinfo.<helper> continue to intercept their sibling calls (see tests/test_getinfo_seams.py).

The from aiohttp import ClientSession line is PUBLIC API: tests patch restgdf.utils.getinfo.ClientSession.post / .get. Do not remove it.

class restgdf.utils.getinfo.ClientSession(base_url=None, *, connector=None, loop=None, cookies=None, headers=None, proxy=None, proxy_auth=None, skip_auto_headers=None, auth=None, json_serialize=<function dumps>, request_class=<class 'aiohttp.client_reqrep.ClientRequest'>, response_class=<class 'aiohttp.client_reqrep.ClientResponse'>, ws_response_class=<class 'aiohttp.client_ws.ClientWebSocketResponse'>, version=(1, 1), cookie_jar=None, connector_owner=True, raise_for_status=False, read_timeout=_SENTINEL.sentinel, conn_timeout=None, timeout=_SENTINEL.sentinel, auto_decompress=True, trust_env=False, requote_redirect_url=True, trace_configs=None, read_bufsize=65536, max_line_size=8190, max_field_size=8190, max_headers=128, fallback_charset_resolver=<function ClientSession.<lambda>>, middlewares=(), ssl_shutdown_timeout=_SENTINEL.sentinel)[source]

Bases: object

First-class interface for making HTTP requests.

ATTRS = frozenset({'_auto_decompress', '_base_url', '_base_url_origin', '_connector', '_connector_owner', '_cookie_jar', '_default_auth', '_default_headers', '_default_proxy', '_default_proxy_auth', '_json_serialize', '_loop', '_max_field_size', '_max_headers', '_max_line_size', '_middlewares', '_raise_for_status', '_read_bufsize', '_request_class', '_requote_redirect_url', '_resolve_charset', '_response_class', '_retry_connection', '_skip_auto_headers', '_source_traceback', '_timeout', '_trace_configs', '_trust_env', '_version', '_ws_response_class', 'requote_redirect_url'})
request(method, url, **kwargs)[source]

Perform HTTP request.

ws_connect(url, *, method='GET', protocols=(), timeout=_SENTINEL.sentinel, receive_timeout=None, autoclose=True, autoping=True, heartbeat=None, auth=None, origin=None, params=None, headers=None, proxy=None, proxy_auth=None, ssl=True, verify_ssl=None, fingerprint=None, ssl_context=None, server_hostname=None, proxy_headers=None, compress=0, max_msg_size=4194304)[source]

Initiate websocket connection.

get(url, *, allow_redirects=True, **kwargs)[source]

Perform HTTP GET request.

options(url, *, allow_redirects=True, **kwargs)[source]

Perform HTTP OPTIONS request.

head(url, *, allow_redirects=False, **kwargs)[source]

Perform HTTP HEAD request.

post(url, *, data=None, **kwargs)[source]

Perform HTTP POST request.

put(url, *, data=None, **kwargs)[source]

Perform HTTP PUT request.

patch(url, *, data=None, **kwargs)[source]

Perform HTTP PATCH request.

delete(url, **kwargs)[source]

Perform HTTP DELETE request.

async close()[source]

Close underlying connector.

Release all acquired resources.

property closed: bool

Is client session closed.

A readonly property.

property connector: BaseConnector | None

Connector instance used for the session.

property cookie_jar: AbstractCookieJar

The session cookies.

property version: Tuple[int, int]

The session HTTP protocol version.

property requote_redirect_url: bool

Do URL requoting on redirection handling.

property loop: AbstractEventLoop

Session’s loop.

property timeout: ClientTimeout

Timeout for the session.

property headers: CIMultiDict[str]

The default headers of the client session.

property skip_auto_headers: FrozenSet[istr]

Headers for which autogeneration should be skipped

property auth: BasicAuth | None

An object that represents HTTP Basic Authorization

property json_serialize: Callable[[Any], str]

Json serializer callable

property connector_owner: bool

Should connector be closed on session closing

property raise_for_status: bool | Callable[[ClientResponse], Awaitable[None]]

Should ClientResponse.raise_for_status() be called for each response.

property auto_decompress: bool

Should the body response be automatically decompressed.

property trust_env: bool

Should proxies information from environment or netrc be trusted.

Information is from HTTP_PROXY / HTTPS_PROXY environment variables or ~/.netrc file if present.

property trace_configs: List[TraceConfig]

A list of TraceConfig instances used for client tracing

detach()[source]

Detach connector from session without closing the former.

Session is switched to closed state anyway.

class restgdf.utils.getinfo.PaginationPlan(total_records, max_record_count, max_record_count_factor, effective_page_size, batches)[source]

Bases: object

Frozen result of build_pagination_plan().

Variables:
  • total_records (int) – Layer-wide feature count the plan paginates over.

  • max_record_count (int) – Server-advertised per-page cap.

  • max_record_count_factor (float) – Effective factor after clamping against the advertised upper bound. Equals the caller-supplied factor when no clamp was applied.

  • effective_page_size (int) – max(1, int(max_record_count * max_record_count_factor)) — the actual page size used to compute batches.

  • batches (tuple) – Tuple of (resultOffset, resultRecordCount) pairs. Empty when total_records == 0. Last pair’s count may be less than effective_page_size (partial tail page).

total_records: int
max_record_count: int
max_record_count_factor: float
effective_page_size: int
batches: tuple[tuple[int, int], ...]
restgdf.utils.getinfo.build_pagination_plan(total_records, max_record_count, *, factor=1.0, advertised_factor=None)[source]

Compute a PaginationPlan for total_records rows.

Parameters:
  • total_records (int) – Non-negative total row count (typically the result of get_feature_count).

  • max_record_count (int) – Positive server-advertised per-page cap.

  • factor (float, optional) – Caller-supplied multiplier on max_record_count. Defaults to 1.0 (pure max_record_count pagination).

  • advertised_factor (float or None, optional) – Server-advertised advancedQueryCapabilities.maxRecordCountFactor upper bound. When provided and factor > advertised_factor, the factor is clamped down and a single warning is logged under restgdf.pagination.

Raises:

ValueError – If total_records < 0, max_record_count <= 0, or factor <= 0.
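
The rules documented for PaginationPlan and build_pagination_plan can be re-derived in plain Python. The sketch below follows only what is stated above (validation, clamping to advertised_factor, the effective_page_size formula, and the partial tail page); PlanSketch and plan_sketch are hypothetical names, and the warning logged on clamping is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanSketch:
    """Hypothetical stand-in mirroring the documented PaginationPlan fields."""
    total_records: int
    max_record_count: int
    max_record_count_factor: float
    effective_page_size: int
    batches: tuple


def plan_sketch(total_records, max_record_count, *, factor=1.0, advertised_factor=None):
    """Illustrative re-derivation of the documented rules; not restgdf's code."""
    if total_records < 0 or max_record_count <= 0 or factor <= 0:
        raise ValueError("invalid pagination inputs")
    # Clamp the caller's factor down to the server-advertised upper bound.
    if advertised_factor is not None and factor > advertised_factor:
        factor = advertised_factor
    page_size = max(1, int(max_record_count * factor))
    # Each batch is (resultOffset, resultRecordCount); the last may be partial.
    batches = tuple(
        (offset, min(page_size, total_records - offset))
        for offset in range(0, total_records, page_size)
    )
    return PlanSketch(total_records, max_record_count, factor, page_size, batches)


plan = plan_sketch(2500, 1000)
print(plan.batches)  # ((0, 1000), (1000, 1000), (2000, 500))
```

With 2500 records and a cap of 1000, the sketch produces two full pages and a 500-row tail; with total_records == 0 it produces an empty tuple, matching the description of batches.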

restgdf.utils.getinfo.default_data(data=None, default_dict=None)[source]

Return a dict with default values for ArcGIS REST API requests.

restgdf.utils.getinfo.default_headers(headers=None)[source]

Return request headers merged with ArcGIS-compatible defaults.

async restgdf.utils.getinfo.get_feature_count(url, session, **kwargs)[source]

Get the feature count for a layer.

The JSON body is validated against CountResponse (strict tier). A missing/ill-typed count key raises RestgdfResponseError with the original payload and request URL attached for operator triage.

restgdf.utils.getinfo.get_fields(layer_metadata, types=False)[source]

Get the fields of a layer.

restgdf.utils.getinfo.get_fields_frame(layer_metadata)[source]

Get the fields of a layer as a DataFrame.

restgdf.utils.getinfo.get_max_record_count(metadata)[source]

Get the maximum record count for a layer.

async restgdf.utils.getinfo.get_metadata(url, session, token=None)[source]

Get the parsed metadata model for a layer.

The JSON body is validated against LayerMetadata (permissive tier). Vendor-variance extras are preserved via extra="allow"; missing fields default to None rather than raise. Drift is logged through restgdf._models._drift rather than returned to the caller.

restgdf.utils.getinfo.get_name(metadata)[source]

Get the name of a layer.

restgdf.utils.getinfo.get_object_id_field(metadata)[source]

Get the object id field name for a layer.

async restgdf.utils.getinfo.get_object_ids(url, session, **kwargs)[source]

Get the object id field name and matching object ids for a layer query.

The JSON body is validated against ObjectIdsResponse (strict tier) so missing field names or non-list id payloads raise RestgdfResponseError before the caller can misuse them. ArcGIS returns objectIds: null for zero-row matches; the model coerces that to [].

async restgdf.utils.getinfo.get_offset_range(url, session, **kwargs)[source]

Get the offset range for a layer.

Orchestrator: resolves get_feature_count, get_metadata, and get_max_record_count through this module’s namespace so that unittest.mock.patch("restgdf.utils.getinfo.<helper>") intercepts.

async restgdf.utils.getinfo.get_unique_values(url, fields, session, sortby=None, **kwargs)[source]

Get the unique values for a field.

async restgdf.utils.getinfo.get_value_counts(url, field, session, **kwargs)[source]

Get the value counts for a field.

restgdf.utils.getinfo.getfields(layer_metadata, types=False)

Get the fields of a layer.

restgdf.utils.getinfo.getfields_df(layer_metadata)

Get the fields of a layer as a DataFrame.

async restgdf.utils.getinfo.getuniquevalues(url, fields, session, sortby=None, **kwargs)

Get the unique values for a field.

async restgdf.utils.getinfo.getvaluecounts(url, field, session, **kwargs)

Get the value counts for a field.

async restgdf.utils.getinfo.nested_count(url, fields, session, **kwargs)[source]

Get the nested value counts for a field.

async restgdf.utils.getinfo.nestedcount(url, fields, session, **kwargs)

Get the nested value counts for a field.

async restgdf.utils.getinfo.service_metadata(session, service_url, token=None, return_feature_count=False, _sem=None)[source]

Asynchronously retrieve layers for a single service.

Orchestrator: resolves get_metadata and get_feature_count through this module’s namespace so that unittest.mock.patch("restgdf.utils.getinfo.<helper>") intercepts. The aggregated payload is validated against LayerMetadata via the drift adapter before being returned, so vendor-variance extras are logged (not raised) and callers get a typed envelope.

BL-01: _sem is a private kwarg allowing a caller (e.g. the fetch_all_data / safe_crawl orchestrators) to share ONE BoundedSemaphore across nested fan-outs so the cap is global per top-level request. When None, a fresh semaphore is created and the direct-call semantics are preserved.
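
The shared-semaphore idea can be shown with plain asyncio. In this sketch, fetch_layer, fetch_service, and crawl are hypothetical stand-ins, the sleep replaces a real HTTP request, and the fallback cap of 4 is an arbitrary assumption; only the pattern of passing one BoundedSemaphore through nested fan-outs reflects the text above.

```python
import asyncio


async def fetch_layer(sem, tracker):
    """Simulate one request while recording peak concurrency."""
    async with sem:
        tracker["active"] += 1
        tracker["peak"] = max(tracker["peak"], tracker["active"])
        await asyncio.sleep(0.001)  # stands in for an HTTP request
        tracker["active"] -= 1


async def fetch_service(n_layers, tracker, sem=None):
    """Inner fan-out: reuse the caller's semaphore, or create a fresh one."""
    if sem is None:
        sem = asyncio.BoundedSemaphore(4)  # assumed per-call default
    await asyncio.gather(*(fetch_layer(sem, tracker) for _ in range(n_layers)))


async def crawl(n_services, n_layers, cap):
    """Outer fan-out: ONE semaphore shared by every nested request."""
    tracker = {"active": 0, "peak": 0}
    shared = asyncio.BoundedSemaphore(cap)
    await asyncio.gather(
        *(fetch_service(n_layers, tracker, shared) for _ in range(n_services))
    )
    return tracker["peak"]
```

Running asyncio.run(crawl(5, 5, 3)) launches 25 simulated requests, yet the recorded peak never exceeds the cap of 3, because every service draws from the same semaphore rather than its own.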

restgdf.utils.getinfo.supports_pagination(metadata)[source]

Return whether the layer supports resultOffset/resultRecordCount pagination.

restgdf.utils.utils

General-purpose string helpers for URL and query construction.

restgdf.utils.utils.ends_with_num(url)[source]

Return True if the given URL ends with a number.

restgdf.utils.utils.where_var_in_list(var, vals)[source]

Return a where clause for a variable in a list of values.
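
A where clause of this kind is typically an SQL IN expression. The sketch below assumes string values that are single-quoted, with embedded quotes doubled; where_in_sketch is a hypothetical name, and the exact quoting and formatting used by where_var_in_list may differ.

```python
def where_in_sketch(var, vals):
    """Build "<var> IN ('a', 'b', ...)" with single quotes escaped by doubling."""
    quoted = ", ".join("'" + str(v).replace("'", "''") + "'" for v in vals)
    return f"{var} IN ({quoted})"
```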

restgdf.utils.token

Token-session helpers for ArcGIS Online / Enterprise.

The AGOLUserPass and TokenSessionConfig models live in restgdf._models.credentials. They are re-exported here for backward compatibility with from restgdf.utils.token import AGOLUserPass and with the public from restgdf import AGOLUserPass surface documented in the README. The legacy frozen dataclass AGOLUserPass was migrated to a pydantic StrictModel in v2.0.0; the import path is unchanged but the constructor is keyword-only.

class restgdf.utils.token.AGOLUserPass(*, username, password, referer=None, expiration=60)[source]

Bases: StrictModel

ArcGIS Online / Enterprise credentials used to mint tokens.

password is stored as pydantic.SecretStr. Call creds.password.get_secret_value() only at the HTTP-POST boundary; never store or log the unwrapped value.

username: str
password: SecretStr
referer: str | None
expiration: int
model_config = {'extra': 'ignore', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class restgdf.utils.token.ArcGISTokenSession(session, credentials=None, token_url='https://www.arcgis.com/sharing/rest/generateToken', token_refresh_threshold=60, token=None, expires=None, verify_ssl=True, config=None)[source]

Bases: object

Wrap an aiohttp session with ArcGIS token refresh behavior.

Construction knobs (token_url, token_refresh_threshold, credentials) are validated via TokenSessionConfig in __post_init__() so a bogus scheme or zero-length username fails fast with RestgdfResponseError rather than surfacing as a 401 or an aiohttp error deep in the request path.

session: ClientSession
credentials: AGOLUserPass | None = None
token_url: str = 'https://www.arcgis.com/sharing/rest/generateToken'
token_refresh_threshold: int = 60
token: str | None = None
expires: int | float | None = None
verify_ssl: bool = True
config: TokenSessionConfig | None = None
property token_request_payload: dict

Return the payload for the token request.

property expires_at: datetime | None

Return the token expiry as a tz-aware UTC datetime.

ArcGIS returns expires in either seconds or milliseconds since the Unix epoch — values above 1e11 are treated as milliseconds and divided by 1000. Returns None when no expiry is set.
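
The seconds-versus-milliseconds rule above translates directly into code. This is a sketch of the stated heuristic only; expires_at_sketch is a hypothetical standalone function, not the property itself.

```python
from datetime import datetime, timezone


def expires_at_sketch(expires):
    """Normalize an epoch value in seconds or milliseconds to a UTC datetime."""
    if expires is None:
        return None
    # Values above 1e11 are treated as milliseconds, per the rule above.
    seconds = expires / 1000 if expires > 1e11 else expires
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Under this rule, 1_700_000_000 (seconds) and 1_700_000_000_000 (milliseconds) resolve to the same instant.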

property auth_headers: dict[str, str]

Return authentication headers with the token if available.

update_headers(headers=None)[source]

Return headers merged with the active token.

update_dict(input_dict=None)[source]

Return a request payload/query dict merged with the active token.

async update_token()[source]

Update the token by making a request to the token URL.

The /generateToken payload is validated against TokenResponse (strict tier) so malformed/error envelopes raise RestgdfResponseError instead of KeyError deep in caller code paths.

Retries up to _MAX_TOKEN_RETRIES times with exponential backoff (base _BASE_BACKOFF_S) on transient network errors. Deterministic errors (bad credentials, content-type mismatches, validation failures) are re-raised immediately. After exhausting retries, raises TokenRefreshFailedError.

Emits structured log events:
  • auth.refresh.start – before the POST

  • auth.refresh.success – after successful token update

  • auth.refresh.failure – on any exception
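
The retry policy described for update_token() can be sketched with plain asyncio. Everything named here is a hypothetical stand-in: MAX_ATTEMPTS and BASE_BACKOFF_S take the place of the private _MAX_TOKEN_RETRIES and _BASE_BACKOFF_S constants (whose real values are not given here), TransientError represents a transient network error, and RefreshFailed represents TokenRefreshFailedError. The sketch also assumes the limit counts total attempts and that the delay doubles on each retry.

```python
import asyncio

MAX_ATTEMPTS = 3       # hypothetical stand-in for _MAX_TOKEN_RETRIES
BASE_BACKOFF_S = 0.01  # hypothetical stand-in for _BASE_BACKOFF_S


class TransientError(Exception):
    """Stand-in for a transient network error."""


class RefreshFailed(Exception):
    """Stand-in for TokenRefreshFailedError."""


async def retry_sketch(operation):
    """Retry transient failures with exponential backoff; re-raise anything else."""
    last_error = None
    for attempt in range(MAX_ATTEMPTS):
        try:
            return await operation()
        except TransientError as exc:
            # Only transient errors are caught; deterministic ones propagate at once.
            last_error = exc
            if attempt < MAX_ATTEMPTS - 1:
                await asyncio.sleep(BASE_BACKOFF_S * 2 ** attempt)
    raise RefreshFailed(f"gave up after {MAX_ATTEMPTS} attempts") from last_error


def make_flaky(failures):
    """Build an operation that fails `failures` times, then returns a token."""
    state = {"calls": 0}

    async def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("simulated network error")
        return "token"

    return operation
```

An operation that fails twice succeeds on the third attempt, while one that keeps failing ends in RefreshFailed once the attempts are exhausted.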

token_needs_update()[source]

Check if the token needs to be updated.

async update_token_if_needed()[source]

Ensure the token is valid and refresh if necessary.

BL-03: concurrent callers racing on an expired token collapse onto a single /generateToken POST via a lazily-initialized per-instance asyncio.Lock with a double-checked token_needs_update() inside the lock (plan.md §3c R-18, kickoff phase-1a §10.4). The lock is created here — not in __post_init__ — so instances constructed outside a running event loop (e.g. at import time or inside a sync test) never trigger DeprecationWarning: There is no current event loop.
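
The locking pattern described above (a lazily created asyncio.Lock plus a second check inside the lock) can be sketched as follows. RefreshOnceSketch and demo are hypothetical names, and the sleep stands in for the token POST; only the pattern itself is taken from the text.

```python
import asyncio


class RefreshOnceSketch:
    """Hypothetical stand-in; only the locking pattern is the point."""

    def __init__(self):
        self._lock = None  # created lazily, once a caller is inside a running loop
        self.token = None
        self.refresh_calls = 0

    def needs_update(self):
        return self.token is None

    async def _refresh(self):
        self.refresh_calls += 1
        await asyncio.sleep(0.01)  # stands in for the token POST
        self.token = "fake-token"

    async def update_if_needed(self):
        if not self.needs_update():
            return
        if self._lock is None:
            # No await between the check and the assignment, so this is race-free.
            self._lock = asyncio.Lock()
        async with self._lock:
            # Second check: another caller may have refreshed while we waited.
            if self.needs_update():
                await self._refresh()


async def demo():
    """Race five callers on an expired token and count the refreshes."""
    session = RefreshOnceSketch()
    await asyncio.gather(*(session.update_if_needed() for _ in range(5)))
    return session.refresh_calls
```

In this sketch, five concurrent callers trigger exactly one refresh: the first holds the lock while refreshing, and the other four find a valid token on the second check.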

async get(url, params=None, headers=None, **kwargs)[source]

Make a GET request to the specified URL with the token.

async post(url, data=None, headers=None, **kwargs)[source]

Make a POST request to the specified URL with the token.

property closed: bool

Return True when the underlying aiohttp.ClientSession is closed.

Delegating lets ArcGISTokenSession satisfy the internal AsyncHTTPSession transport Protocol uniformly with aiohttp.ClientSession (R-71).

async close()[source]

Close the underlying aiohttp.ClientSession.

Mirrors aiohttp.ClientSession.close() so token sessions and raw aiohttp sessions are interchangeable through the internal AsyncHTTPSession transport Protocol. Idempotent: closing an already-closed session is a no-op.

class restgdf.utils.token.TokenSessionConfig(*, token_url, credentials, transport='header', header_name='X-Esri-Authorization', referer=None, token=None, refresh_leeway_seconds=120, clock_skew_seconds=30, verify_ssl=True)[source]

Bases: StrictModel

Validated configuration for ArcGISTokenSession.

token_url is intentionally a plain str with a custom validator rather than pydantic.AnyHttpUrl. ArcGIS Enterprise deployments commonly run plain HTTP on internal networks, and AnyHttpUrl normalizes/rejects real-world URLs (for example it appends trailing slashes and may reject edge cases). Accepting any http:// or https:// string matches the behavior ArcGIS clients need.

Refresh semantics (BL-04 / R-36, R-37):
  • refresh_leeway_seconds (default 120) — how far in advance of the token’s expiry the session eagerly refreshes.

  • clock_skew_seconds (default 30, capped at 30 when derived from the legacy alias) — extra padding for client / server clock drift.

refresh_threshold_seconds is retained as a deprecation-warning alias. Reads return refresh_leeway_seconds + clock_skew_seconds; writes via the constructor kwarg split the supplied total into clock_skew_seconds = min(30, total) and refresh_leeway_seconds = total - clock_skew_seconds.
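
The split applied to a legacy refresh_threshold_seconds value is simple arithmetic. The sketch below restates the two formulas above; split_threshold_sketch is a hypothetical helper name.

```python
def split_threshold_sketch(total):
    """Split a legacy threshold into (refresh_leeway_seconds, clock_skew_seconds)."""
    clock_skew_seconds = min(30, total)
    refresh_leeway_seconds = total - clock_skew_seconds
    return refresh_leeway_seconds, clock_skew_seconds
```

For example, a legacy total of 60 splits into 30 seconds of leeway and 30 of skew, while a total of 20 goes entirely to skew and leaves a leeway of 0. The documented defaults (120 and 30) correspond to a legacy total of 150.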

token_url: str
credentials: AGOLUserPass
transport: Literal['header', 'body', 'query']
header_name: str
referer: str | None
token: SecretStr | None
refresh_leeway_seconds: int
clock_skew_seconds: int
verify_ssl: bool
property refresh_threshold_seconds: int

Return the legacy threshold sum (leeway + skew) and emit a DeprecationWarning.

model_config = {'extra': 'ignore', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

restgdf.utils.token.get_token(username, password)[source]

Synchronously request an ArcGIS Online token.

Deprecated since version 3.0: Use ArcGISTokenSession instead for async token lifecycle.