// Research / Computer Vision
Object detection that holds up on your data.
Public datasets won't cut it for industry-specific objects, unusual viewpoints, occlusion, or whatever else your environment throws at it. We build custom detectors and segmenters - labeled with foundation-model loops, trained for your latency budget, and shipped to the GPU or edge device that actually runs in production.
// What we see
Generic models pass the demo. Production has a different bar.
01
The long tail eats the accuracy
Pre-trained YOLO works on the 80% of scenes that look like COCO. Production traffic is mostly the other 20% - weird angles, occlusion, your specific lighting, your specific objects. The headline mAP looks fine; customer complaints don't.
02
Labels are 80% of the project
Most teams pick the detector first and the labeling pipeline last. That's backwards. The architecture you can swap in a week. The 50,000 labeled images you cannot. Teams that under-budget annotation discover this at week six, not week one.
03
The model fits the laptop, not the device
It hits 0.91 mAP in your notebook on an A100. Then it has to run on a Jetson Orin at 30 FPS, or on a CPU-only on-prem server that processes 200 camera feeds. The deployment target is where most CV projects actually stall.
// Case Study
Text-search across 200 live city camera feeds
Municipal operators type a description and the system surfaces matching events from across the city's live CCTV network. We built it for Neural; the City of Oława's municipal police (Straż Miejska) run it on-prem. 200 cameras per server; review time on a typical incident dropped from ~8 hours of manual scrubbing to under 1 hour - an ~88% reduction.
200
live cameras per on-prem server
~88%
less time per incident review
~33K
residents covered (Oława)

// What we do
Three things that decide whether the detector ships.
Most production CV problems aren't solved by a better architecture. They're solved by the labeling loop, the right hardware target picked early, and an eval that sees the slices mAP hides.
Foundation-model labeling loops
Grounding DINO, SAM 2, and CLIP-based weak supervision turn a few hundred annotated images into tens of thousands of high-quality pseudo labels. Humans verify rather than label from scratch - 3-5x faster, higher inter-annotator agreement, and the loop tightens as the model improves.
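A minimal sketch of that loop, assuming a wrapper around whichever open-vocabulary detector you run. The `detect` callable stands in for a Grounding DINO call, SAM 2 mask refinement is omitted, and the 0.6 auto-accept threshold is illustrative, not our pipeline:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable

Box = tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels

@dataclass
class PseudoLabel:
    image: Path
    box: Box
    label: str
    score: float
    needs_review: bool  # True -> goes to the human verification queue

def pseudo_label(
    images: Iterable[Path],
    prompts: list[str],
    detect: Callable[[Path, list[str]], list[tuple[Box, str, float]]],
    auto_accept: float = 0.6,
) -> list[PseudoLabel]:
    """Run an open-vocabulary detector over unlabeled images and queue
    low-confidence hits for human review instead of blind acceptance."""
    labels: list[PseudoLabel] = []
    for image in images:
        for box, label, score in detect(image, prompts):
            labels.append(PseudoLabel(
                image=image, box=box, label=label, score=score,
                needs_review=score < auto_accept,
            ))
    return labels
```

Humans work the `needs_review` queue; everything above the threshold gets spot-checked, and the threshold tightens as the model improves.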
Architecture chosen for your hardware
We profile the candidate models on your target device before final selection - YOLOv11 for 60 FPS edge, RT-DETR for accuracy on big GPUs, EfficientDet when memory is tight. The detector is picked after the deployment constraints are known, not before.
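A minimal latency probe along those lines, assuming the candidates are already exported to ONNX; the file names and the 640x640 input are illustrative, and the point is that it runs on the target device, not the dev box:

```python
import time
import numpy as np
import onnxruntime as ort

def measure_latency(model_path: str, runs: int = 200, warmup: int = 20) -> float:
    """Average single-image inference time in milliseconds on this machine."""
    # Swap the provider for CUDA / TensorRT when profiling a GPU or Jetson target.
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    name = session.get_inputs()[0].name
    dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
    for _ in range(warmup):
        session.run(None, {name: dummy})
    start = time.perf_counter()
    for _ in range(runs):
        session.run(None, {name: dummy})
    return (time.perf_counter() - start) / runs * 1000.0

for candidate in ["yolo11n.onnx", "rtdetr_r18.onnx", "efficientdet_d0.onnx"]:
    print(candidate, f"{measure_latency(candidate):.1f} ms/frame")
```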
Eval beyond a single mAP number
A single mAP hides every failure that matters. We slice by class, scene type, occlusion, small-object size, low-light - and add a hard-negatives suite from your real production failures. The metric on your dashboard reflects what breaks, not what averages out.
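A sketch of the slicing idea, assuming ground-truth boxes carry slice tags such as "occluded" or "low_light"; the tag names and the per-image input format are assumptions, not a fixed schema:

```python
from collections import defaultdict

Box = tuple[float, float, float, float]  # x1, y1, x2, y2

def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def recall_by_slice(per_image, iou_thr: float = 0.5) -> dict[str, float]:
    """per_image: iterable of (ground_truths, predictions) pairs, one per image.
    Each ground truth: {"box": Box, "label": str, "slices": set of tags}.
    Each prediction:   {"box": Box, "label": str}."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for ground_truths, predictions in per_image:
        for gt in ground_truths:
            matched = any(
                p["label"] == gt["label"] and iou(p["box"], gt["box"]) >= iou_thr
                for p in predictions
            )
            for tag in gt["slices"]:  # e.g. {"night", "occluded", "small_object"}
                totals[tag] += 1
                hits[tag] += int(matched)
    return {tag: hits[tag] / totals[tag] for tag in totals}
```

A detector at 0.9 overall recall that sits at 0.4 on the "occluded" slice is the kind of gap this surfaces and a single mAP hides.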
// Method fit
Custom detection isn't the right tool for every CV problem.
skip it if
A cloud API covers your objects
If your detection target is common objects from common viewpoints (people, cars, faces, brand logos), AWS Rekognition / GCP Vertex Vision / Azure CV will hit your accuracy bar with no training. Custom is for the long tail those APIs miss.
Your real problem is OCR or document layout
Reading text, extracting tables, parsing forms - those are document-AI problems with their own toolchain (LayoutLM, Donut, OCR engines). A general object detector is the wrong instrument.
You only need a label, not a box
If 'is there a forklift in this frame?' is enough and you don't need to know where, a fine-tuned classifier needs 5-10x less labeling than a detector. Skip the boxes until you actually need spatial information.
use it if
Custom object detection fits when you have proprietary visual data, unusual viewpoints or environments, latency or hardware constraints that rule out cloud APIs, or accuracy requirements on industry-specific classes that public datasets don't cover.
// How we work
Hardware audit first. Iterate in the open. Hand off the retraining loop.
Every CV engagement starts with the constraints that are expensive to undo - target hardware, latency budget, label availability. The model gets picked after those are known, not before.
01
Data and hardware audit (week one)
We sit with your team, profile sample data, benchmark candidate detectors on your actual target device, and write down the labeling budget. The output is a design your engineers approve before any training run.
02
Iterate in a shared workspace
Every training run lands in a Weights & Biases or MLflow workspace your engineers can see. You watch the loss curves, the per-class precision, and the failure-case montages - in real time, not in a Friday demo.
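What that looks like on the W&B side, as a sketch; the metric keys and the shape of the `metrics` dict coming out of your eval step are placeholders, not a fixed schema:

```python
import wandb

def log_epoch(run, epoch: int, metrics: dict) -> None:
    """Push one eval pass into the shared workspace.
    `metrics` is whatever your eval step produces; keys here are illustrative."""
    run.log({
        "loss/train": metrics["loss"],
        # One precision series per class, so regressions on rare classes stay visible.
        **{f"precision/{cls}": p for cls, p in metrics["per_class_precision"].items()},
        # Failure cases as images, so reviewers see them without pulling a checkpoint.
        "failures": [wandb.Image(img, caption=reason)
                     for img, reason in metrics["failure_cases"]],
    }, step=epoch)

# Usage (project and run names are placeholders):
# run = wandb.init(project="custom-detector", name="run-042")
# log_epoch(run, epoch=7, metrics=eval_metrics)
# run.finish()
```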
03
Hand off the retraining loop
We hand off the model export (TensorRT, ONNX, OpenVINO), the eval suite, the labeling pipeline, and a runbook for retraining when the data drifts. Slack access stays open for 30 days after delivery for the questions that come up after we leave.
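A minimal sketch of the ONNX leg of that export, assuming a PyTorch detector and a 640x640 input; the model, file name, and opset are placeholders, and the TensorRT or OpenVINO engines are then built from this file on the target box:

```python
import torch

def export_onnx(model: torch.nn.Module, path: str = "detector.onnx") -> None:
    """Export a trained detector to ONNX as the hand-off artifact.
    Downstream TensorRT / OpenVINO / Triton builds all start from this file."""
    model.eval()
    dummy = torch.randn(1, 3, 640, 640)        # illustrative input size
    torch.onnx.export(
        model, dummy, path,
        input_names=["images"],
        output_names=["detections"],
        dynamic_axes={"images": {0: "batch"}},  # allow batched inference downstream
        opset_version=17,
    )
```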

// Expert insight
“Most teams pick the detector first and the labeling pipeline last. That's backwards. The architecture you can swap in a week. The 50,000 labeled images you can't. Foundation models changed what's possible for a labeling budget - the team that adopts that loop early ships in half the time.”
Norbert Ropiak
Co-founder @ bards.ai
// Why bards.ai
Why us, instead of two senior CV engineers you'd hire.
You could hire the team. It would take a year and they'd learn your hardware constraints on you. We've already learned them - on production engagements at Comcast, Oława's municipal CCTV network, and the rest.
Production CV deployments at scale
Comcast's UI element detector runs daily on millions of VOD screenshots. Neural's video-search system runs on-prem in Oława across 200 live CCTV feeds. Both shipped, both still running.
Edge and on-prem deployment, not just GPU
TensorRT, ONNX Runtime, OpenVINO, Triton, Jetson Orin, and CPU-only on-prem boxes for environments where data can't leave. We benchmark on your hardware before we commit to a frame rate.
Senior engineers only, no juniors
Every person on your engagement has shipped CV models to paying customers. No ramp-up tax, no learning the labeling-loop story on your dollar.
// FAQ
Common questions about custom object detection
How many labeled images do we need?
With foundation-model bootstrapping (Grounding DINO + SAM 2 pseudo labels, human verification), useful detectors start at 200-500 verified samples. Production-grade accuracy on most tasks lands in the 2-5K range. Without bootstrapping, multiply by 5-10x. The first thing we measure is your label budget; the architecture comes after.
Can it run on an edge device or a CPU-only server?
Yes. We profile the candidate models on the target device in week one - before final architecture selection - so we don't over-train a model that won't fit. Jetson Orin AGX runs YOLOv11s at 30-60 FPS with INT8 quantization. CPU-only servers run smaller variants or batched inference for offline pipelines. We benchmark before we commit.
Why not just use a cloud vision API?
Cloud APIs are great when your objects look like the COCO/ImageNet distribution. They fall over on industry-specific classes, unusual viewpoints, regulated data that can't leave the customer perimeter, and edge deployment without internet. Custom models also let you tune the precision/recall tradeoff per class and own the retraining loop.
What does a project cost?
Engagements start at $40K. Most custom detection projects land between $40K and $120K depending on labeling scope, target hardware complexity, segmentation requirements, and whether the eval suite is greenfield. Fixed-fee proposal after the first scoping call - no time-and-materials surprise.
// Let's ship it
Send us a folder of images. We'll send back a plan.
Tell us about the objects, the environment, the target hardware, and the failure mode you can't fix with a cloud API. We'll come back with a labeling plan, an architecture pick, and a benchmark on your device - usually within a business day. Engagements from $40K, typically 4-8 weeks.

Norbert Ropiak
Co-founder @ bards.ai