Utilities

Low-level helpers used by FeatureLayer and Directory. Most users will never call these directly, but they are public API and safe to use.

restgdf.utils.crawl

async restgdf.utils.crawl.fetch_all_data(session, base_url, token=None, return_feature_count=False)[source]

Fetch all services and their layers concurrently.

async restgdf.utils.crawl.safe_crawl(session, base_url, token=None, return_feature_count=False)[source]

Crawl an ArcGIS REST directory and aggregate results + errors.

Unlike fetch_all_data(), this function never short-circuits on the first failure. Every recoverable error is captured as a typed CrawlError entry in CrawlReport.errors, and every service that was fetched successfully is still present in CrawlReport.services.

The three failure stages are "base_metadata" (root get_metadata call), "folder_metadata" (per-folder get_metadata call), and "service_metadata" (per-service service_metadata call). When a folder’s metadata fails, services discovered in earlier folders (and the base) are still returned.
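
The aggregate-instead-of-raise behavior described above can be illustrated with a minimal sketch. It does not use restgdf itself: CrawlErrorSketch, CrawlReportSketch, crawl_services, and fake_fetch are hypothetical names invented for this example, and the real CrawlError and CrawlReport models may carry different fields. Only the "service_metadata" stage is shown.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class CrawlErrorSketch:
    # Hypothetical shape; the real CrawlError may differ.
    stage: str  # "base_metadata", "folder_metadata", or "service_metadata"
    url: str
    message: str


@dataclass
class CrawlReportSketch:
    # Hypothetical shape mirroring CrawlReport.services / CrawlReport.errors.
    services: list = field(default_factory=list)
    errors: list = field(default_factory=list)


async def crawl_services(service_urls, fetch):
    """Fetch every service; record failures instead of raising."""
    report = CrawlReportSketch()
    # return_exceptions=True keeps one failure from cancelling the others.
    results = await asyncio.gather(
        *(fetch(url) for url in service_urls), return_exceptions=True
    )
    for url, result in zip(service_urls, results):
        if isinstance(result, Exception):
            report.errors.append(
                CrawlErrorSketch("service_metadata", url, str(result))
            )
        else:
            report.services.append(result)
    return report


async def fake_fetch(url):
    # Stand-in for a real metadata request; one URL fails on purpose.
    if url.endswith("broken"):
        raise RuntimeError("metadata request failed")
    return {"url": url}


report = asyncio.run(
    crawl_services(
        ["https://example.com/a", "https://example.com/broken"], fake_fetch
    )
)
```

In this sketch the successful service and the failure both end up in the same report object, so a caller can inspect report.errors without losing report.services.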

restgdf.utils.getgdf

Get a GeoDataFrame from an ArcGIS FeatureLayer.

restgdf.utils.getgdf.combine_where_clauses(base_where, extra_where)[source]

Combine where clauses without changing the default all-records predicate.
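
One plausible reading of this rule, as a self-contained sketch: combine_where_sketch is a hypothetical stand-in, and both the "1=1" default predicate and the parenthesized AND form are assumptions, not the library's confirmed behavior.

```python
def combine_where_sketch(base_where, extra_where, default="1=1"):
    """Join two where clauses; leave the all-records default untouched."""
    base = (base_where or default).strip()
    if not extra_where or not extra_where.strip():
        # Nothing to add: return the base (or the default) unchanged.
        return base
    extra = extra_where.strip()
    if base == default:
        # "1=1 AND x" is equivalent to "x", so skip the redundant default.
        return extra
    return f"({base}) AND ({extra})"
```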

restgdf.utils.getgdf.chunk_values(values, chunk_size)[source]

Split values into evenly-sized chunks.
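
A minimal sketch of this kind of chunking, assuming the final chunk is allowed to be shorter when the input length is not a multiple of chunk_size. chunk_values_sketch is a hypothetical stand-in, not the library function.

```python
def chunk_values_sketch(values, chunk_size):
    """Split values into consecutive lists of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    values = list(values)
    return [
        values[start:start + chunk_size]
        for start in range(0, len(values), chunk_size)
    ]
```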

async restgdf.utils.getgdf.get_query_data_batches(url, session, **kwargs)[source]

Build query payloads for each request needed to read a layer.

async restgdf.utils.getgdf.get_sub_gdf(url, session, query_data, **kwargs)[source]
async restgdf.utils.getgdf.get_gdf_list(url, session, **kwargs)[source]
async restgdf.utils.getgdf.chunk_generator(url, session, **kwargs)[source]

Asynchronously yield GeoDataFrames from a FeatureLayer in chunks. Chunks are requested according to the layer’s offset range, and each GeoDataFrame is yielded as soon as it is retrieved.
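
The yield-as-retrieved pattern can be sketched with plain lists standing in for GeoDataFrames. Everything here is hypothetical (fake_fetch_chunk, chunk_generator_sketch, collect), and whether the library schedules requests with asyncio.as_completed is an assumption.

```python
import asyncio


async def fake_fetch_chunk(offset, size):
    # Stand-in for one paginated query; returns a list instead of a GeoDataFrame.
    await asyncio.sleep(0)
    return list(range(offset, offset + size))


async def chunk_generator_sketch(total, size):
    """Start one request per offset and yield each chunk as it finishes."""
    tasks = [
        asyncio.create_task(fake_fetch_chunk(offset, min(size, total - offset)))
        for offset in range(0, total, size)
    ]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect(total, size):
    return [chunk async for chunk in chunk_generator_sketch(total, size)]


chunks = asyncio.run(collect(5, 2))
```

Because chunks arrive in completion order, a consumer that needs a stable row order would have to sort or re-index after collecting them.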

async restgdf.utils.getgdf.row_dict_generator(url, session, **kwargs)[source]
async restgdf.utils.getgdf.concat_gdfs(gdfs)[source]
async restgdf.utils.getgdf.gdf_by_concat(url, session, **kwargs)[source]
async restgdf.utils.getgdf.get_gdf(url, session=None, where=None, token=None, **kwargs)[source]

restgdf.utils.getinfo

A package for getting GeoDataFrames from ArcGIS FeatureLayers.

Since the Phase 2 split, this module is a compatibility shim that re-exports public names from the private submodules _http, _metadata, _query, and _stats. The orchestration helpers get_offset_range and service_metadata are still defined here so that patches against restgdf.utils.getinfo.<helper> continue to intercept their sibling calls (see tests/test_getinfo_seams.py).

The import line from aiohttp import ClientSession is public API: tests patch restgdf.utils.getinfo.ClientSession.post and restgdf.utils.getinfo.ClientSession.get. Do not remove it.

class restgdf.utils.getinfo.ClientSession(base_url=None, *, connector=None, loop=None, cookies=None, headers=None, proxy=None, proxy_auth=None, skip_auto_headers=None, auth=None, json_serialize=<function dumps>, request_class=<class 'aiohttp.client_reqrep.ClientRequest'>, response_class=<class 'aiohttp.client_reqrep.ClientResponse'>, ws_response_class=<class 'aiohttp.client_ws.ClientWebSocketResponse'>, version=(1, 1), cookie_jar=None, connector_owner=True, raise_for_status=False, read_timeout=_SENTINEL.sentinel, conn_timeout=None, timeout=_SENTINEL.sentinel, auto_decompress=True, trust_env=False, requote_redirect_url=True, trace_configs=None, read_bufsize=65536, max_line_size=8190, max_field_size=8190, max_headers=128, fallback_charset_resolver=<function ClientSession.<lambda>>, middlewares=(), ssl_shutdown_timeout=_SENTINEL.sentinel)[source]

Bases: object

First-class interface for making HTTP requests.

ATTRS = frozenset({'_auto_decompress', '_base_url', '_base_url_origin', '_connector', '_connector_owner', '_cookie_jar', '_default_auth', '_default_headers', '_default_proxy', '_default_proxy_auth', '_json_serialize', '_loop', '_max_field_size', '_max_headers', '_max_line_size', '_middlewares', '_raise_for_status', '_read_bufsize', '_request_class', '_requote_redirect_url', '_resolve_charset', '_response_class', '_retry_connection', '_skip_auto_headers', '_source_traceback', '_timeout', '_trace_configs', '_trust_env', '_version', '_ws_response_class', 'requote_redirect_url'})
request(method, url, **kwargs)[source]

Perform HTTP request.

ws_connect(url, *, method='GET', protocols=(), timeout=_SENTINEL.sentinel, receive_timeout=None, autoclose=True, autoping=True, heartbeat=None, auth=None, origin=None, params=None, headers=None, proxy=None, proxy_auth=None, ssl=True, verify_ssl=None, fingerprint=None, ssl_context=None, server_hostname=None, proxy_headers=None, compress=0, max_msg_size=4194304)[source]

Initiate websocket connection.

get(url, *, allow_redirects=True, **kwargs)[source]

Perform HTTP GET request.

options(url, *, allow_redirects=True, **kwargs)[source]

Perform HTTP OPTIONS request.

head(url, *, allow_redirects=False, **kwargs)[source]

Perform HTTP HEAD request.

post(url, *, data=None, **kwargs)[source]

Perform HTTP POST request.

put(url, *, data=None, **kwargs)[source]

Perform HTTP PUT request.

patch(url, *, data=None, **kwargs)[source]

Perform HTTP PATCH request.

delete(url, **kwargs)[source]

Perform HTTP DELETE request.

async close()[source]

Close underlying connector.

Release all acquired resources.

property closed: bool

Is client session closed.

A readonly property.

property connector: BaseConnector | None

Connector instance used for the session.

property cookie_jar: AbstractCookieJar

The session cookies.

property version: Tuple[int, int]

The session HTTP protocol version.

property requote_redirect_url: bool

Do URL requoting on redirection handling.

property loop: AbstractEventLoop

Session’s loop.

property timeout: ClientTimeout

Timeout for the session.

property headers: CIMultiDict[str]

The default headers of the client session.

property skip_auto_headers: FrozenSet[istr]

Headers for which autogeneration should be skipped.

property auth: BasicAuth | None

An object that represents HTTP Basic Authorization.

property json_serialize: Callable[[Any], str]

JSON serializer callable.

property connector_owner: bool

Should the connector be closed when the session closes.

property raise_for_status: bool | Callable[[ClientResponse], Awaitable[None]]

Should ClientResponse.raise_for_status() be called for each response.

property auto_decompress: bool

Should the body response be automatically decompressed.

property trust_env: bool

Should proxies information from environment or netrc be trusted.

Information is from HTTP_PROXY / HTTPS_PROXY environment variables or ~/.netrc file if present.

property trace_configs: List[TraceConfig]

A list of TraceConfig instances used for client tracing.

detach()[source]

Detach connector from session without closing the former.

Session is switched to closed state anyway.

restgdf.utils.getinfo.default_data(data=None, default_dict=None)[source]

Return a dict with default values for ArcGIS REST API requests.
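
A rough sketch of default merging for query payloads. The default values shown (where, outFields, returnGeometry, f) are common ArcGIS REST query parameters chosen for illustration; the library's actual defaults are not stated here, and default_data_sketch and DEFAULTS_SKETCH are hypothetical names.

```python
# Hypothetical defaults; the library's real default dict may differ.
DEFAULTS_SKETCH = {
    "where": "1=1",
    "outFields": "*",
    "returnGeometry": True,
    "f": "json",
}


def default_data_sketch(data=None, default_dict=None):
    """Return defaults overlaid with caller-supplied values."""
    merged = dict(DEFAULTS_SKETCH if default_dict is None else default_dict)
    # Caller-supplied keys win over defaults; inputs are not mutated.
    merged.update(data or {})
    return merged
```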

restgdf.utils.getinfo.default_headers(headers=None)[source]

Return request headers merged with ArcGIS-compatible defaults.

async restgdf.utils.getinfo.get_feature_count(url, session, **kwargs)[source]

Get the feature count for a layer.

The JSON body is validated against CountResponse (strict tier). A missing/ill-typed count key raises RestgdfResponseError with the original payload and request URL attached for operator triage.
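
The strict validation described above can be approximated without pydantic. In this sketch CountResponseErrorSketch is a hypothetical stand-in for RestgdfResponseError, and the attribute names payload and url are assumptions.

```python
class CountResponseErrorSketch(ValueError):
    """Hypothetical stand-in for RestgdfResponseError."""

    def __init__(self, message, payload, url):
        super().__init__(message)
        self.payload = payload  # original response body, kept for triage
        self.url = url          # request URL, kept for triage


def parse_count_sketch(payload, url):
    """Return the integer count or raise with the payload attached."""
    count = payload.get("count") if isinstance(payload, dict) else None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(count, bool) or not isinstance(count, int):
        raise CountResponseErrorSketch(
            "missing or ill-typed 'count'", payload, url
        )
    return count
```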

restgdf.utils.getinfo.get_fields(layer_metadata, types=False)[source]

Get the fields of a layer.

restgdf.utils.getinfo.get_fields_frame(layer_metadata)[source]

Get the fields of a layer as a DataFrame.

restgdf.utils.getinfo.get_max_record_count(metadata)[source]

Get the maximum record count for a layer.

async restgdf.utils.getinfo.get_metadata(url, session, token=None)[source]

Get the parsed metadata model for a layer.

The JSON body is validated against LayerMetadata (permissive tier). Vendor-variance extras are preserved via extra="allow"; missing fields default to None rather than raise. Drift is logged through restgdf._models._drift rather than returned to the caller.

restgdf.utils.getinfo.get_name(metadata)[source]

Get the name of a layer.

restgdf.utils.getinfo.get_object_id_field(metadata)[source]

Get the object id field name for a layer.

async restgdf.utils.getinfo.get_object_ids(url, session, **kwargs)[source]

Get the object id field name and matching object ids for a layer query.

The JSON body is validated against ObjectIdsResponse (strict tier) so missing field names or non-list id payloads raise RestgdfResponseError before the caller can misuse them. ArcGIS returns objectIds: null for zero-row matches; the model coerces that to [].
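
A minimal sketch of the null-to-empty-list coercion, using a plain function rather than the ObjectIdsResponse model. parse_object_ids_sketch is a hypothetical name, and raising ValueError here stands in for RestgdfResponseError.

```python
def parse_object_ids_sketch(payload):
    """Return (field_name, ids), treating objectIds: null as no matches."""
    name = payload.get("objectIdFieldName")
    ids = payload.get("objectIds")
    if not isinstance(name, str) or not name:
        raise ValueError("missing objectIdFieldName")
    if ids is None:
        # A zero-row match comes back as null; normalize it to an empty list.
        ids = []
    if not isinstance(ids, list):
        raise ValueError("objectIds must be a list")
    return name, ids
```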

async restgdf.utils.getinfo.get_offset_range(url, session, **kwargs)[source]

Get the offset range for a layer.

Orchestrator: resolves get_feature_count, get_metadata, and get_max_record_count through this module’s namespace so that unittest.mock.patch("restgdf.utils.getinfo.<helper>") intercepts.
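
As an illustration of what an offset range could look like, the sketch below derives page offsets from a feature count and a maximum record count. offset_range_sketch is hypothetical, and returning a plain range is an assumption about the shape of the result.

```python
def offset_range_sketch(feature_count, max_record_count):
    """Return the resultOffset values needed to page through a layer."""
    if max_record_count < 1:
        raise ValueError("max_record_count must be at least 1")
    # One offset per page: 0, max_record_count, 2 * max_record_count, ...
    return range(0, feature_count, max_record_count)
```

Under these assumptions, a layer with 2500 features and a maximum record count of 1000 needs three requests, at offsets 0, 1000, and 2000.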

async restgdf.utils.getinfo.get_unique_values(url, fields, session, sortby=None, **kwargs)[source]

Get the unique values for a field.

async restgdf.utils.getinfo.get_value_counts(url, field, session, **kwargs)[source]

Get the value counts for a field.
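
Value counts on an ArcGIS layer are typically obtained with a statistics query. The sketch below builds such a payload from the standard outStatistics and groupByFieldsForStatistics query parameters; whether the library builds exactly this payload is an assumption, and value_counts_payload_sketch and the "<field>_count" output name are hypothetical.

```python
import json


def value_counts_payload_sketch(field, where="1=1"):
    """Build a query payload that counts rows per distinct value of field."""
    stats = [
        {
            "statisticType": "count",
            "onStatisticField": field,
            "outStatisticFieldName": f"{field}_count",
        }
    ]
    return {
        "where": where,
        "groupByFieldsForStatistics": field,
        # The REST API expects the statistics definition as a JSON string.
        "outStatistics": json.dumps(stats),
        "f": "json",
    }
```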

restgdf.utils.getinfo.getfields(layer_metadata, types=False)

Get the fields of a layer.

restgdf.utils.getinfo.getfields_df(layer_metadata)

Get the fields of a layer as a DataFrame.

async restgdf.utils.getinfo.getuniquevalues(url, fields, session, sortby=None, **kwargs)

Get the unique values for a field.

async restgdf.utils.getinfo.getvaluecounts(url, field, session, **kwargs)

Get the value counts for a field.

async restgdf.utils.getinfo.nested_count(url, fields, session, **kwargs)[source]

Get the nested value counts for a field.

async restgdf.utils.getinfo.nestedcount(url, fields, session, **kwargs)

Get the nested value counts for a field.

async restgdf.utils.getinfo.service_metadata(session, service_url, token=None, return_feature_count=False)[source]

Asynchronously retrieve layers for a single service.

Orchestrator: resolves get_metadata and get_feature_count through this module’s namespace so that unittest.mock.patch("restgdf.utils.getinfo.<helper>") intercepts. The aggregated payload is validated against LayerMetadata via the drift adapter before being returned, so vendor-variance extras are logged (not raised) and callers get a typed envelope.

restgdf.utils.getinfo.supports_pagination(metadata)[source]

Return whether the layer supports resultOffset/resultRecordCount pagination.
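
ArcGIS layer metadata usually advertises pagination under advancedQueryCapabilities.supportsPagination. The sketch below reads that flag from a plain dict; the library works with a parsed metadata model, so supports_pagination_sketch is only a hypothetical approximation.

```python
def supports_pagination_sketch(metadata):
    """Return True when the metadata dict advertises pagination support."""
    capabilities = metadata.get("advancedQueryCapabilities") or {}
    # Treat a missing flag as "not supported".
    return bool(capabilities.get("supportsPagination", False))
```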

restgdf.utils.utils

restgdf.utils.utils.ends_with_num(url)[source]

Return True if the given URL ends with a number.
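
A minimal sketch of such a check, useful for telling a layer URL (ending in a layer index) from a service URL. ends_with_num_sketch is hypothetical; in particular, this version treats a URL with a trailing slash as not ending with a number, which may differ from the library.

```python
import re


def ends_with_num_sketch(url):
    """Return True if the final characters of url are digits."""
    return re.search(r"\d+$", url) is not None
```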

restgdf.utils.utils.where_var_in_list(var, vals)[source]

Return a where clause for a variable in a list of values.
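
A sketch of building an IN clause from a list of values. where_in_sketch is a hypothetical stand-in; quoting every value as a string and doubling embedded single quotes are assumptions made for this example, not confirmed library behavior.

```python
def where_in_sketch(var, vals):
    """Build "var IN ('a', 'b', ...)" with single quotes escaped."""
    quoted = ", ".join(
        "'" + str(value).replace("'", "''") + "'" for value in vals
    )
    return f"{var} IN ({quoted})"
```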

restgdf.utils.token

Token-session helpers for ArcGIS Online / Enterprise.

The AGOLUserPass and TokenSessionConfig models live in restgdf._models.credentials. They are re-exported here for backward compatibility with from restgdf.utils.token import AGOLUserPass and with the public from restgdf import AGOLUserPass surface documented in the README. The legacy frozen dataclass AGOLUserPass was migrated to a pydantic StrictModel in v2.0.0; the import path is unchanged but the constructor is keyword-only.

class restgdf.utils.token.AGOLUserPass(*, username, password, referer=None, expiration=60)[source]

Bases: StrictModel

ArcGIS Online / Enterprise credentials used to mint tokens.

password is stored as pydantic.SecretStr. Call creds.password.get_secret_value() only at the HTTP-POST boundary; never store or log the unwrapped value.

username: str
password: SecretStr
referer: str | None
expiration: int
model_config = {'extra': 'ignore', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class restgdf.utils.token.ArcGISTokenSession(session, credentials=None, token_url='https://www.arcgis.com/sharing/rest/generateToken', token_refresh_threshold=60, token=None, expires=None, verify_ssl=True, config=None)[source]

Bases: object

Wrap an aiohttp session with ArcGIS token refresh behavior.

Construction knobs (token_url, token_refresh_threshold, credentials) are validated via TokenSessionConfig in __post_init__() so a bogus scheme or zero-length username fails fast with RestgdfResponseError rather than surfacing as a 401 or an aiohttp error deep in the request path.

session: ClientSession
credentials: AGOLUserPass | None = None
token_url: str = 'https://www.arcgis.com/sharing/rest/generateToken'
token_refresh_threshold: int = 60
token: str | None = None
expires: int | float | None = None
verify_ssl: bool = True
config: TokenSessionConfig | None = None
property token_request_payload: dict

Return the payload for the token request.

property auth_headers: dict[str, str]

Return authentication headers with the token if available.

update_headers(headers=None)[source]

Return headers merged with the active token.

update_dict(input_dict=None)[source]

Return a request payload/query dict merged with the active token.
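
A minimal sketch of merging a token into a request payload, assuming the token travels in the standard ArcGIS token parameter. update_dict_sketch is a hypothetical free function; the real method reads the token from the session instance.

```python
def update_dict_sketch(input_dict, token):
    """Return a copy of input_dict with the token added when one is set."""
    merged = dict(input_dict or {})
    if token:
        merged["token"] = token
    return merged
```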

async update_token()[source]

Update the token by making a request to the token URL.

The /generateToken payload is validated against TokenResponse (strict tier) so malformed/error envelopes raise RestgdfResponseError instead of KeyError deep in caller code paths.

token_needs_update()[source]

Check if the token needs to be updated.
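
The refresh decision can be sketched as a comparison between the remaining token lifetime and token_refresh_threshold. This version assumes expires and the threshold are both in seconds; the ArcGIS generateToken endpoint reports expiry in milliseconds since the epoch, so a conversion may be needed somewhere. token_needs_update_sketch and its now parameter are hypothetical.

```python
import time


def token_needs_update_sketch(token, expires, threshold=60, now=None):
    """Return True when there is no token or it expires within threshold."""
    if token is None or expires is None:
        return True
    current = time.time() if now is None else now
    # Refresh early so a request never goes out with a nearly expired token.
    return expires - current <= threshold
```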

async update_token_if_needed()[source]

Ensure the token is valid and refresh if necessary.

async get(url, params=None, headers=None, **kwargs)[source]

Make a GET request to the specified URL with the token.

async post(url, data=None, headers=None, **kwargs)[source]

Make a POST request to the specified URL with the token.

class restgdf.utils.token.TokenSessionConfig(*, token_url, credentials, refresh_threshold_seconds=60, verify_ssl=True)[source]

Bases: StrictModel

Validated configuration for ArcGISTokenSession.

token_url is intentionally a plain str with a custom validator rather than pydantic.AnyHttpUrl. ArcGIS Enterprise deployments commonly run plain HTTP on internal networks, and AnyHttpUrl normalizes/rejects real-world URLs (for example it appends trailing slashes and may reject edge cases). Accepting any http:// or https:// string matches the behavior ArcGIS clients need.
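
A sketch of the kind of permissive check described above: accept any http:// or https:// string and return it unchanged. validate_token_url_sketch is a hypothetical stand-in for the custom validator, and requiring a host part is an extra assumption.

```python
from urllib.parse import urlsplit


def validate_token_url_sketch(value):
    """Accept http/https URLs as-is; reject everything else."""
    if not isinstance(value, str):
        raise TypeError("token_url must be a string")
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"token_url must be an http(s) URL, got {value!r}")
    # Returned unchanged: no trailing slash is added and nothing is normalized.
    return value
```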

token_url: str
credentials: AGOLUserPass
refresh_threshold_seconds: int
verify_ssl: bool
model_config = {'extra': 'ignore', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

restgdf.utils.token.get_token(username, password)[source]

Synchronously request an ArcGIS Online token.