May 2026 · 8 min read · Agentic development

No Meter Running

I moved face recognition in my family photo archive to local models because the cloud path was starting to make every experiment feel expensive.

Private family photo archive organized by year.

46k media items
1.12s per item
$80+ on one dev path
Local inference

Local models are leverage when you can run them again and again without watching the meter.

The family photo archive started as a migration project: get years of photos and videos out of a hosted gallery and into a self-managed system without losing albums, originals, captions, dates, and the small bits of context that make old photos feel alive. Once that worked, the next obvious feature was people.

Not public face search. Not a social network tagging system. Just the family version of the question everyone eventually asks of a photo library: show me the beach photos with both daughters; find the Christmas album where everyone was around the table; help me build a book without manually opening fifteen years of folders.

The first version used AWS Rekognition because it was already close at hand and it proved the idea quickly. There is nothing wrong with that path. The problem was the meter. During development I spent more than $80 in token and API usage on one model path while I was still tuning. That was not catastrophic, but it changed how I worked. Every rerun felt like a small decision.

Local models changed the mood of the project. I could run the same album again, adjust a threshold, inspect what changed, and run it again. That loop is where the feature got good.

The Point

A lot of people try local models for agentic development, compare them to frontier cloud models, and come away disappointed. I get it. If the job is open-ended reasoning, writing, planning, coding, and tool use, the gap can be obvious.

But that is not the only way to use local models. In this project, the model was not the whole product. It had a simple job: find faces, turn each face into a numerical fingerprint, compare that fingerprint against known people, and return a structured answer.

The rest of the system did the careful work around it: measuring results, tuning thresholds, tracking where every decision came from, and deciding when a match was strong enough to publish automatically. That is the lesson I want to keep. Local models can be underwhelming as general replacements and very useful as specialized workers.

The Models

I tried a few local face models before picking the production path. The table below is intentionally in plain English: who made each model and how I used it.

Local model	Maker / link	Settings I used
InsightFace `buffalo_l`	DeepInsight / InsightFace	Two detector passes: 640px at 0.5 confidence, then 960px at 0.4 confidence. The recognition model takes a 112x112 face crop and outputs a 512-number face fingerprint.
OpenCV YuNet	OpenCV Zoo, based on YuNet/libfacedetection work	`face_detection_yunet_2023mar.onnx`, 640x640 input, score threshold 0.6, NMS threshold 0.3.
OpenCV SFace	OpenCV Zoo	`face_recognition_sface_2021dec.onnx`, 112x112 face crop input, 128-number face fingerprint output.
MediaPipe BlazeFace short range	Google AI Edge / MediaPipe	`blaze_face_short_range.tflite`, detection only, default-style confidence filtering.

The Simple Architecture

The working system is easier to explain without the implementation names:

1. Load (photo or video frame): Use the smaller web image for speed unless the album needs extra detail.

2. Find (face boxes): Run InsightFace twice: once precise, once with higher recall for small or harder faces.

3. Fingerprint (face embedding): Turn each face into a list of numbers that can be compared.

4. Compare (known people): Compare the new fingerprint against approved examples for each person.

5. Decide (approve or hold back): Strong matches publish automatically. Weak matches stay out of search.

6. Show (people pages): The family sees normal photo pages, not model output.
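Step 4 is the easiest one to picture in code. The sketch below is a minimal illustration, not the archive's actual implementation: it assumes cosine similarity as the comparison and takes each person's best-matching approved example as their score. The names `cosine_similarity`, `rank_people`, and the `gallery` dictionary are hypothetical, and the three-number vectors stand in for real 512-number fingerprints.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_people(face, gallery):
    """Score one face against every person's approved examples.

    gallery maps a person's name to a list of approved fingerprints.
    A person's score is the best similarity over their examples
    (an assumption made for this sketch).
    Returns (best_name, best_score, margin), where margin is the gap
    between the best and second-best person.
    """
    scores = sorted(
        (
            (max(cosine_similarity(face, ex) for ex in examples), name)
            for name, examples in gallery.items()
            if examples
        ),
        reverse=True,
    )
    if not scores:
        return None, 0.0, 0.0
    best_score, best_name = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    return best_name, best_score, best_score - runner_up


# Toy data: made-up vectors, not real embeddings.
gallery = {
    "alice": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "bob": [[0.0, 1.0, 0.0]],
}
name, score, margin = rank_people([0.95, 0.05, 0.0], gallery)
print(name, round(score, 3), round(margin, 3))
```

Returning the margin alongside the score matters later: a high score is less convincing when a second person scores almost as high.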

Why The Harness Mattered

The best decision was not picking a model. It was keeping the model behind a contract. A model can find faces and suggest matches, but the gallery decides what counts as a real person tag.

That made the work reversible. I could swap providers, rerun an album, compare results, and keep the public gallery stable. It also made automatic matching much safer, because the model never got to publish a guess just because it had a score.
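One way to picture that contract is a small interface that every provider has to satisfy, with the gallery owning the final decision. This is a sketch under assumptions, not the project's real code: `Suggestion`, `FaceProvider`, `Gallery`, and the single `min_score` rule are all hypothetical names and simplifications.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Suggestion:
    """What a provider may return: a guess, never a published tag."""
    box: tuple  # (x, y, width, height) in pixels
    person: Optional[str]  # None when no known person is close
    score: float  # provider-specific match score
    provider: str  # which model produced this guess


class FaceProvider(Protocol):
    """Any model, local or hosted, that can suggest matches for an image."""

    def suggest(self, image_path: str) -> list:
        ...


class Gallery:
    """Owns the published tags; providers only feed it suggestions."""

    def __init__(self, min_score: float):
        self.min_score = min_score
        self.tags = {}

    def ingest(self, image_path: str, provider: FaceProvider) -> None:
        approved = {
            s.person
            for s in provider.suggest(image_path)
            if s.person is not None and s.score >= self.min_score
        }
        # Replace rather than append, so a rerun or a provider swap
        # overwrites the earlier result instead of piling on top of it.
        self.tags[image_path] = approved


class FakeProvider:
    """Stand-in provider used only to exercise the contract."""

    def suggest(self, image_path):
        return [
            Suggestion((0, 0, 10, 10), "alice", 0.9, "fake"),
            Suggestion((5, 5, 10, 10), "bob", 0.2, "fake"),
            Suggestion((1, 1, 4, 4), None, 0.95, "fake"),
        ]


gallery_store = Gallery(min_score=0.5)
gallery_store.ingest("2009/beach/img_001.jpg", FakeProvider())
print(gallery_store.tags)
```

Because the gallery only sees `Suggestion` objects, swapping a hosted provider for a local one changes nothing downstream.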

Making It Automatic

At first I assumed a human would need to approve most matches. That would have worked for a tiny album, but not for tens of thousands of photos. The better question was: which matches are strong enough that I should not have to look at them?

To answer that, I used albums that already had reviewed labels. The local pipeline matched those albums without looking at the answers, then compared its guesses against the known labels. I swept the match score, the gap between the best and second-best person, the face detection confidence, and the minimum face size.
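A sweep like that can be sketched as a grid search over the four knobs. The code below is illustrative only: the field names, the grid values, and the four toy records are made up, and the selection rule (keep only settings with zero false approvals, then prefer the one that approves the most) is an assumption about how a strict profile might be chosen.

```python
from itertools import product

# The four swept quantities; the field names are hypothetical.
FIELDS = ("score", "margin", "det_conf", "face_px")


def evaluate(records, thresholds):
    """Return (approved, correct) counts for one threshold setting."""
    approved = [
        r for r in records if all(r[f] >= thresholds[f] for f in FIELDS)
    ]
    # An unknown person has truth=None, so any approval of one counts as false.
    correct = sum(1 for r in approved if r["predicted"] == r["truth"])
    return len(approved), correct


def sweep(records, grids):
    """Return (thresholds, approved) for the best zero-false setting, or None."""
    best = None
    for values in product(*(grids[f] for f in FIELDS)):
        thresholds = dict(zip(FIELDS, values))
        approved, correct = evaluate(records, thresholds)
        if approved == 0 or correct != approved:
            continue  # nothing approved, or at least one false approval
        if best is None or approved > best[1]:
            best = (thresholds, approved)
    return best


# Toy reviewed data: made-up numbers, not results from the archive.
records = [
    {"score": 0.82, "margin": 0.30, "det_conf": 0.9, "face_px": 80,
     "predicted": "alice", "truth": "alice"},
    {"score": 0.78, "margin": 0.25, "det_conf": 0.8, "face_px": 60,
     "predicted": "bob", "truth": "bob"},
    {"score": 0.55, "margin": 0.05, "det_conf": 0.7, "face_px": 40,
     "predicted": "alice", "truth": "bob"},
    {"score": 0.60, "margin": 0.20, "det_conf": 0.6, "face_px": 30,
     "predicted": "bob", "truth": None},
]
grids = {
    "score": [0.5, 0.7],
    "margin": [0.0, 0.1, 0.2],
    "det_conf": [0.5],
    "face_px": [20, 50],
}
best = sweep(records, grids)
print(best)
```

On the toy data, the loosest setting approves all four faces, two of them wrongly, so the sweep rejects it and keeps a setting that approves only the two correct ones.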

The strict profile, selected using the prepared calibration assets, hit 100% auto-approval precision on the validation set: 117 correct automatic approvals, 0 false automatic approvals, and 0 unknown-person automatic approvals. A looser audit profile found more matches: 131 automatic approvals, of which 128 were correct and 3 were false (about 97.7% precision), still with 0 unknown-person automatic approvals.

That gave me two lanes. Strong matches go straight into the archive. Audit matches can be useful right away, but they stay easy to review. Weak matches stay out.
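The lanes reduce to a small decision function. In this sketch the `Profile` class and the threshold numbers are placeholders chosen for illustration; they are not the calibrated values, and the real profiles also consider detection confidence and face size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """A named set of minimums a match has to clear."""
    min_score: float
    min_margin: float

    def accepts(self, score: float, margin: float) -> bool:
        return score >= self.min_score and margin >= self.min_margin


# Placeholder thresholds for illustration only.
STRICT = Profile(min_score=0.70, min_margin=0.20)
AUDIT = Profile(min_score=0.55, min_margin=0.10)


def lane(score: float, margin: float) -> str:
    """Route a match: 'auto' publishes, 'audit' publishes but stays
    flagged for review, 'hold' stays out of search."""
    if STRICT.accepts(score, margin):
        return "auto"
    if AUDIT.accepts(score, margin):
        return "audit"
    return "hold"


print(lane(0.80, 0.30), lane(0.60, 0.15), lane(0.80, 0.05))
```

The strict profile is checked first, so the audit lane only ever holds matches that the strict profile already turned down.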

Tuning The Detector

The first local runs missed too many small faces. The obvious fix was to lower the detection threshold and hope for the best. That found more faces, but it also created more junk.

The better fix was a two-stage pass. First, run a normal detector at det_size=640 and det_thresh=0.5. Then run a higher-recall detector at det_size=960 and det_thresh=0.4. Keep the normal results. Rescue extra faces from the second pass only if they clear confidence, face-size, and overlap checks.

That made the model more sensitive without making it sloppy. If an odd result shows up later, the sidecar written for that item says whether the face came from the normal pass or the rescue pass.
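The merge step of the two-stage pass can be sketched without any model at all. The code below assumes each detection is a dictionary with a `box` in `(x1, y1, x2, y2)` pixel form and a `conf` score; the function names and the `min_conf`, `min_side`, and `max_iou` defaults are hypothetical placeholders, not the tuned values.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_passes(normal, rescue, min_conf=0.45, min_side=24, max_iou=0.3):
    """Keep every normal-pass face; rescue extra faces that clear all checks.

    Each kept face gets a 'source' field so its origin can be recorded.
    """
    kept = [dict(face, source="normal") for face in normal]
    # Consider the most confident rescue candidates first.
    for face in sorted(rescue, key=lambda f: f["conf"], reverse=True):
        x1, y1, x2, y2 = face["box"]
        confident = face["conf"] >= min_conf
        big_enough = min(x2 - x1, y2 - y1) >= min_side
        # Reject anything that overlaps a face we already kept.
        is_new = all(iou(face["box"], k["box"]) <= max_iou for k in kept)
        if confident and big_enough and is_new:
            kept.append(dict(face, source="rescue"))
    return kept


# Toy detections: made-up boxes and scores.
normal = [{"box": (10, 10, 110, 110), "conf": 0.90}]
rescue = [
    {"box": (12, 12, 112, 112), "conf": 0.80},    # same face as the normal hit
    {"box": (200, 200, 240, 240), "conf": 0.60},  # new face, large enough
    {"box": (300, 300, 310, 310), "conf": 0.70},  # too small
    {"box": (400, 400, 460, 460), "conf": 0.41},  # too uncertain
]
merged = merge_passes(normal, rescue)
print([(f["box"], f["source"]) for f in merged])
```

Only the second rescue candidate survives, and it carries `source="rescue"`, which is the kind of provenance a sidecar can store.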

How Fast It Was

My production logs currently measure full runs rather than every internal sub-step. So the honest number is end-to-end: download or read the media, decode it, find faces, make fingerprints, compare them, write sidecars, refresh search, and verify the publish.

Measurement	What it includes	Average
Production, per media item	Full local pipeline on year-scale runs	1.12 seconds per photo/video entry
Production, per detected face	Same full pipeline, divided by detected faces	0.84 seconds per face
Typical year-scale run	Average of 2007, 2008, 2009, and 2013 slices	23.6 minutes for about 1,263 entries and 1,679 faces
Early bakeoff: InsightFace	80-entry sample, one local candidate run	58.1 seconds total, about 0.73 seconds per entry
Early bakeoff: YuNet + SFace	80-entry sample, one local candidate run	32.0 seconds total, about 0.40 seconds per entry
Early bakeoff: MediaPipe detector	80-entry sample, detection only	1.9 seconds total, about 0.02 seconds per entry

The exact seconds matter less than the shape of the loop. A year-sized run taking roughly twenty-four minutes means I can tune, rerun, and inspect without a cloud bill shaping every decision.
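The per-item figures follow directly from the totals in the table, and the arithmetic is short enough to check. The last line is a back-of-envelope extrapolation, not a measured result: it assumes the 1.12-second average would hold across all 46k media items.

```python
def seconds_per_unit(total_seconds, units):
    """Average seconds spent per entry or per face."""
    return total_seconds / units


# Typical year-scale run: 23.6 minutes, about 1,263 entries and 1,679 faces.
run_seconds = 23.6 * 60
per_entry = seconds_per_unit(run_seconds, 1263)
per_face = seconds_per_unit(run_seconds, 1679)

# Early bakeoff totals, each on the same 80-entry sample.
bakeoff_totals = {
    "InsightFace": 58.1,
    "YuNet + SFace": 32.0,
    "MediaPipe detector": 1.9,
}
bakeoff = {
    name: seconds_per_unit(total, 80)
    for name, total in bakeoff_totals.items()
}

# Extrapolation (assumption): the whole archive at the production average.
archive_hours = 46_000 * 1.12 / 3600

print(round(per_entry, 2), round(per_face, 2))
print({name: round(value, 2) for name, value in bakeoff.items()})
print(round(archive_hours, 1))
```

At that rate a full pass over the archive would be an overnight job, with no per-call charge attached to it.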

What This Ran On

These numbers came from my local MacBook Pro: Apple M3 Max, 16 CPU cores with 12 performance cores and 4 efficiency cores, a 40-core GPU, and 64 GB of memory.

I did not capture CPU or GPU utilization in these run logs. The production InsightFace path was configured through ONNX Runtime's CPUExecutionProvider, so the timings above should be read as CPU-path timings on this Mac, not as a GPU benchmark.
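For reference, pinning InsightFace to the CPU path looks roughly like the sketch below. `FaceAnalysis`, its `providers` argument, and `prepare(ctx_id, det_size, det_thresh)` come from the `insightface` Python package; everything else is an assumption for illustration, including the `PASSES` table, the `build_detector` helper, the square `(640, 640)` and `(960, 960)` shapes, and the idea of building one detector per pass. The import is deferred so the settings can be read without the package installed.

```python
# Detector settings for the two passes described in this post.
# Square input shapes are an assumption; the post only gives 640 and 960.
PASSES = {
    "normal": {"det_size": (640, 640), "det_thresh": 0.5},
    "rescue": {"det_size": (960, 960), "det_thresh": 0.4},
}


def build_detector(pass_name):
    """Build a CPU-only InsightFace pipeline for one pass (hypothetical helper)."""
    # Third-party dependency, imported lazily: pip install insightface onnxruntime
    from insightface.app import FaceAnalysis

    settings = PASSES[pass_name]
    app = FaceAnalysis(name="buffalo_l", providers=["CPUExecutionProvider"])
    # ctx_id=-1 selects the CPU rather than a GPU device.
    app.prepare(
        ctx_id=-1,
        det_size=settings["det_size"],
        det_thresh=settings["det_thresh"],
    )
    return app


print(PASSES)
```

Calling `build_detector("normal")` downloads the `buffalo_l` model pack on first use, so the timings in this post would not include that one-time cost.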

What Changed

The local path did not win because it was more glamorous. It won because it made the development loop cheap and repeatable. The model could be tested. The thresholds could be tuned. The automatic approvals could be checked against known answers. The public gallery could stay simple.

That is the bigger lesson for me. Local models do not need to replace frontier models at everything. They need a job they can do, a harness that measures them, and a product that knows when to trust them. When those pieces line up, local stops feeling like a compromise and starts feeling like leverage.