A County in Thirty Minutes
For a long time, adding a new county to TALON meant clearing a week.
Find the county’s parcel data — usually buried in a GIS portal, sometimes behind a form, occasionally requiring an email to a real person. Download it. Figure out the schema. Every county names its columns differently: PROP_ADDR, SITUS_ADDRESS, SITE_ADD_1 — all meaning the same thing, none of them matching what TALON expects. Write the translation layer. Deal with coordinate system differences. Download hundreds of gigabytes of LiDAR tiles from USGS. Process them into canopy height models. Run zonal stats across every parcel. Load everything into the database. Verify that nothing broke.
That’s the glamorous version. It leaves out the hours spent figuring out why a particular county’s GIS export uses a non-standard projection, or why 3,000 parcel geometries have self-intersections, or why the LiDAR tiles for one township are from a different survey year than the rest of the county and the canopy heights don’t match.
Each county was its own archaeology project.
The Ceiling
TALON’s coverage area has been small because of this. Not because the data doesn’t exist — USGS has mapped most of the rural eastern US with LiDAR, and county parcel records are public in every state. The data is there. Adding it has just been expensive in a specific way: expensive in developer time, expensive in storage, and completely unscalable as a path to regional coverage.
I kept bumping into the ceiling. Someone would ask whether we covered a particular county in western NC. We didn’t. When would we? I couldn’t honestly say. The answer was “when I have time to add it,” and that answer was going to stay true indefinitely unless something changed about how the adding worked.
The obvious question — one I’d been circling for a while — was whether the manual steps could be automated. Not the interesting parts. The boring, repetitive parts that are roughly the same every time: finding the data source, downloading the parcels, normalizing the schema, computing the terrain stats.
What Automation Actually Means
The hard part about automating county onboarding isn’t the downloading. It’s the translation.
Every county publishes parcel data, but no two counties publish it in the same format. A loader that works for Albemarle County, Virginia, fails immediately on Buncombe County, North Carolina, because the column names are different, the coordinate system is different, and the assessment data is structured differently. I described this problem in “County Lines” — each county is its own case.
The solution I landed on has two parts.
The first is a generic ArcGIS loader — a script that can query any ArcGIS FeatureServer endpoint, paginate through results, and load the geometry into TALON’s database. The county-specific knowledge lives in a schema mapping: a dictionary that says “this source calls it parno, TALON calls it gpin; this source calls it gisacres, TALON calls it acreage.” Write the mapping once for a source and every county that uses that source gets it for free.
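The mapping-plus-loader split can be sketched in a few lines of Python. The field names and the `translate_record` helper below are illustrative stand-ins, not TALON's actual code:

```python
# Hypothetical mapping for one ArcGIS source: source field -> TALON field.
# Write this once per source; every county served by that source reuses it.
NC_ONEMAP_MAPPING = {
    "parno": "gpin",
    "gisacres": "acreage",
    "siteaddress": "situs_address",
}

def translate_record(source_attrs: dict, mapping: dict) -> dict:
    """Rename a source record's attributes into TALON's schema.

    Fields the mapping doesn't know about are dropped here; in practice
    they'd be logged so the onboarding report can surface them.
    """
    return {
        talon_field: source_attrs[src_field]
        for src_field, talon_field in mapping.items()
        if src_field in source_attrs
    }
```

The generic loader's job is then just fetching: the ArcGIS REST API's FeatureServer query endpoint supports paging via its `resultOffset` and `resultRecordCount` parameters, so the same fetch loop works against any endpoint, and only the mapping dictionary changes per source.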
North Carolina runs a statewide parcel database through NC OneMap — one endpoint, consistent schema, covering all 100 counties. Write the mapping once and the same loader that works for Buncombe works for Haywood works for any other NC county. That’s the leverage.
The second part is the terrain pipeline. I’d been downloading LiDAR tiles locally because that was the only option. It isn’t anymore. USGS publishes both its elevation models and its LiDAR point clouds as cloud-optimized files designed to be streamed rather than downloaded — you request just the pixels or points you need, over HTTP, without staging hundreds of gigabytes locally. Slope, aspect, and elevation can be computed for an entire county’s worth of parcels by streaming the relevant portions of remote files. No local storage. No waiting for downloads to finish before the compute can start.
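Once the relevant pixels are in memory, the terrain math itself is small. A minimal numpy sketch, assuming a height grid with square cells and rows ordered south-to-north (real rasters usually put row 0 at the north edge, which flips the sign of the y-gradient), with aspect following the common convention of degrees clockwise from north in the downslope direction:

```python
import numpy as np

def slope_aspect(dem: np.ndarray, cell_size: float):
    """Per-cell slope (degrees) and aspect (degrees clockwise from north).

    Assumes square cells and rows increasing northward; swap the sign of
    dz_dy for north-up rasters where row 0 is the northern edge.
    """
    dz_dy, dz_dx = np.gradient(dem, cell_size)  # rise over run along y, x
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Downslope vector is (-dz_dx, -dz_dy); bearing measured from north.
    aspect = (np.degrees(np.arctan2(-dz_dx, -dz_dy)) + 360) % 360
    return slope, aspect
```

With cloud-optimized rasters, the only extra step is asking the reader for the window of pixels covering a parcel's bounding box before handing the array to a function like this.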
Haywood County
I decided to test this properly rather than assume it worked. Haywood County, North Carolina — a county TALON had never touched, in the mountains west of Asheville.
I grew up in the Green Mountains of southern Vermont — a different part of the same ancient Appalachian range. When my mom moved to western NC about five years ago, the new terrain felt both foreign and familiar. Steeper, more varied, but connected to the same geology I’d wandered as a kid. That overlap is part of what sparked this whole project. Haywood has the kind of terrain where aspect matters — a north-facing slope and a south-facing slope fifty feet apart can have completely different microclimates, different timber species, different value. Good test case.
Running the parcel loader against NC OneMap with a filter for Haywood’s FIPS code: 47,862 parcels loaded in about two and a half minutes. That part worked immediately.
The terrain pipeline took longer to get right. There were bugs — the first attempt to discover cloud-hosted LiDAR tiles for the county returned geometry that was clearly wrong, tiles with footprints hundreds of degrees wide. The source data includes three-dimensional bounding boxes with elevation ranges, and the code that extracted two-dimensional footprints was reading the elevation minimum as a latitude value. Everything intersected everything. Once that was fixed, the STAC catalog query returned 55 sensible tiles covering the northern portion of the county.
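The fix reduces to honest bbox handling. Per the STAC specification (which follows GeoJSON's bbox ordering), a three-dimensional bounding box is [xmin, ymin, zmin, xmax, ymax, zmax], so naive positional unpacking mixes an elevation into the horizontal coordinates. A small sketch (the helper name is mine):

```python
def footprint_2d(bbox):
    """Return (xmin, ymin, xmax, ymax) from a STAC bbox, 2D or 3D.

    A 3D STAC bbox is [xmin, ymin, zmin, xmax, ymax, zmax]; unpacking it
    as if it were 2D treats an elevation value as a horizontal coordinate,
    producing footprints hundreds of degrees wide that intersect everything.
    """
    if len(bbox) == 6:
        xmin, ymin, _zmin, xmax, ymax, _zmax = bbox
    elif len(bbox) == 4:
        xmin, ymin, xmax, ymax = bbox
    else:
        raise ValueError(f"unexpected bbox length: {len(bbox)}")
    return (xmin, ymin, xmax, ymax)
```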
Processing terrain stats — elevation, slope, aspect, for all 47,862 parcels — ran in about twelve minutes across four workers, streaming DEM tiles from S3 as needed. No local storage. 100% parcel coverage. Average parcel elevations mostly in the 2,400-3,000 foot range, average slopes ranging from nearly flat to around 25 degrees — with individual points well beyond both ends. Right for the terrain.
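Stripped of the streaming and the worker pool, a zonal stat is just a group-by over pixels. A minimal numpy sketch, assuming the parcel IDs have been rasterized onto the same grid as the DEM (zone 0 as background):

```python
import numpy as np

def zonal_mean(values: np.ndarray, zones: np.ndarray) -> dict:
    """Mean of `values` per zone ID; zone 0 is treated as background."""
    v = values.ravel()
    z = zones.ravel()
    sums = np.bincount(z, weights=v)   # per-zone sum of pixel values
    counts = np.bincount(z)            # per-zone pixel count
    return {
        zone: sums[zone] / counts[zone]
        for zone in range(1, len(counts))
        if counts[zone] > 0
    }
```

The same grouping works for elevation, slope, or aspect arrays; splitting the county's parcels across workers is then embarrassingly parallel.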
Total time from zero to a fully terrain-queryable county: under thirty minutes.
What the Data Said
The point of onboarding a county isn’t to have it in the system. It’s to be able to ask questions about it.
Haywood County has a higher rate of out-of-state ownership than I expected — about 22% of parcels. That number alone isn’t surprising for a mountain county with vacation properties. What’s more interesting is the breakdown by proximity. The binary absentee/non-absentee classification that TALON has used for other counties felt wrong here. A landowner from Canton, the next town over, isn’t absentee in any meaningful sense. Neither is someone from Asheville or Sylva. Calling them the same thing as a Florida LLC that bought 200 acres a decade ago and hasn’t visited since distorts the analysis.
I replaced the binary flag with a four-category classification: local (same county), regional (adjacent county), in-state (non-adjacent NC resident), and out-of-state. The logic uses ZIP code to county FIPS lookup and Census Bureau county adjacency data to determine which category each owner falls into. The binary is_absentee field still exists — it’s derived from the classification, with local and regional owners counting as present and in-state and out-of-state owners as absentee — but the underlying categories are preserved for analysis.
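A sketch of the classification logic, with tiny illustrative stand-ins for the ZIP-to-FIPS and county adjacency tables (the real lookups cover the whole country and come from Census data):

```python
# Illustrative fragments only: a few ZIPs mapped to county FIPS codes,
# and a partial adjacency entry for Haywood County (FIPS 37087).
ZIP_TO_FIPS = {
    "28716": "37087",  # Canton, NC (Haywood)
    "28801": "37021",  # Asheville, NC (Buncombe)
    "27601": "37183",  # Raleigh, NC (Wake)
    "33101": "12086",  # Miami, FL (Miami-Dade)
}
ADJACENT = {"37087": {"37021"}}  # partial, for illustration

def classify_owner(owner_zip: str, parcel_fips: str) -> str:
    owner_fips = ZIP_TO_FIPS.get(owner_zip)
    if owner_fips is None:
        return "unknown"
    if owner_fips == parcel_fips:
        return "local"
    if owner_fips in ADJACENT.get(parcel_fips, set()):
        return "regional"
    if owner_fips[:2] == parcel_fips[:2]:  # same state FIPS prefix
        return "in_state"
    return "out_of_state"

def is_absentee(category: str) -> bool:
    # local and regional owners count as present; the rest as absentee
    return category in {"in_state", "out_of_state"}
```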
The distribution that came back: 69% local, 22% out-of-state, 5% in-state, 4% regional. Strip the large federal land parcel that was skewing the regional acreage numbers and the story clarifies. Out-of-state owners hold the largest typical parcels — median around 0.9 acres versus 0.6 for local owners. That’s the vacation lot signature: people buying slightly larger lots when they’re buying a mountain escape rather than a primary residence.
The combination of out-of-state ownership, above-median acreage, and eventual canopy cover data is exactly the filter that surfaces undermanaged timber holdings with absentee owners — the leads that make this useful for timber investors and conservation organizations. The analysis ran in minutes against a county that didn’t exist in the system that morning.
The System That Learns
The part I’m most interested in isn’t the loader or the terrain pipeline. It’s what happens to the knowledge that accumulates as more counties get onboarded.
Right now the schema mappings — which source field maps to which TALON field, for each data source — live in a configuration file. That’s fine for the current handful of counties. It doesn’t scale.
The design I’m moving toward is a database-backed registry. Every successful county onboarding writes its mapping to a table: source URL, field translations, confidence scores, how many times this mapping has been validated. When a new county comes in from the same state portal, the system checks the registry first. If NC OneMap has already been mapped and validated across three counties, county number four doesn’t need a developer — it matches the registry entry, runs the loader, and reports what it captured versus what it left on the table.
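A minimal sketch of what such a registry could look like, here backed by sqlite with illustrative table and column names (the real design would also carry confidence scores and field-level validation detail):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE source_mappings (
        source_url     TEXT PRIMARY KEY,
        field_map      TEXT NOT NULL,            -- JSON: source field -> TALON field
        validated_runs INTEGER NOT NULL DEFAULT 0 -- successful onboardings using it
    )
""")

def register(url: str, field_map: dict) -> None:
    """Record a mapping; re-registering a known source counts as validation."""
    conn.execute(
        "INSERT INTO source_mappings (source_url, field_map) VALUES (?, ?) "
        "ON CONFLICT(source_url) DO UPDATE SET validated_runs = validated_runs + 1",
        (url, json.dumps(field_map)),
    )

def lookup(url: str):
    """Return (field_map, validated_runs) for a source, or None if unseen."""
    row = conn.execute(
        "SELECT field_map, validated_runs FROM source_mappings WHERE source_url = ?",
        (url,),
    ).fetchone()
    return (json.loads(row[0]), row[1]) if row else None
```

The onboarding flow checks `lookup` first; only a miss falls back to a human writing a new mapping.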
The report is the other piece. Each county onboarding generates a structured output: fields that were successfully mapped, source fields that had no TALON equivalent, TALON columns that stayed empty because the source didn’t have the data. That report surfaces what we got and what we missed. It’s also the input for deciding what additional sources to check — county-level GIS portals often have richer building and sales data than the state-level aggregation, and the report makes the gap visible.
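The report itself falls out of set differences between the source's fields, the mapping used, and TALON's schema. A sketch with hypothetical field names:

```python
def onboarding_report(source_fields: set, mapping: dict, talon_schema: set) -> dict:
    """Summarize what a mapping captured and what it left on the table.

    source_fields: fields present in the source data
    mapping:       source field -> TALON field translations applied
    talon_schema:  all columns TALON can populate
    """
    mapped = {s: t for s, t in mapping.items() if s in source_fields}
    return {
        "mapped": mapped,
        "unmapped_source_fields": sorted(set(source_fields) - set(mapped)),
        "empty_talon_columns": sorted(set(talon_schema) - set(mapped.values())),
    }
```

The `empty_talon_columns` list is the prompt to go looking at county-level portals for the richer data the state aggregation lacks.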
Every county onboarded makes the next one faster and more complete. That’s the property worth building toward.
The Ceiling Is Gone
Adding Haywood County took an afternoon — most of which went to finding and fixing bugs that will never affect another county again. The actual loading and processing, once the infrastructure worked, took under thirty minutes.
That’s not a ten-times improvement on the old process. It’s a different category of operation. Counties that used to require a week of developer attention now require configuration and a run command. The ceiling on coverage is no longer about how much developer time exists. It’s about how many counties have public data and LiDAR coverage — which is most of them.
The questions that Haywood County can answer now — which parcels face south, which ones have absentee owners, what the slope distribution looks like across 47,000 properties — those questions were always answerable. The data was always public. It was just locked inside a process that didn’t scale.
The land doesn’t get harder to analyze because there’s more of it.