HideView.
How we think

The way you think about AI is the architecture of your product.

Most AI products fail in the design decisions, not the model. These are the principles that drive every HideView engagement — patterns I've found to hold regardless of stack, model provider, or vertical. The technologies named in each are proof points, not constraints.

Sharp enough to disagree with. Specific enough to test against your own roadmap. Updated as the work teaches me new ones.

01

The schema is the product.

Model the domain before you write a component.

The default

Ship UI fast, figure out the data shape later, refactor as you learn.

The cost

Every feature for the next two years fights the wrong shape. The 'small refactors' compound into a rewrite that nobody schedules.

Better

Spend the first week drawing tables, foreign keys, and history columns. The schema you commit to defines the products you can ship and the products you can't. Treat it like the most important code you write — because it is.
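
What that week can produce is small but load-bearing. A sketch, using a hypothetical orders domain; the table names are illustrative, the pattern of explicit foreign keys and append-only history is the point:

```python
# Illustrative week-one schema for a hypothetical orders domain (Postgres).
# Nothing here is a prescription; the pattern is explicit foreign keys and
# history tables that append instead of overwriting.
SCHEMA = """
CREATE TABLE customers (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE orders (
    id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers (id),
    status      TEXT NOT NULL DEFAULT 'draft',
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Status changes are appended, never overwritten, so 'what happened
-- and when' is still answerable two years from now.
CREATE TABLE order_status_history (
    order_id   BIGINT NOT NULL REFERENCES orders (id),
    status     TEXT NOT NULL,
    changed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""
```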

How this shows up: If your team is shipping UI fast and saying 'we'll figure out the data later,' you're paying this tax already. The features you can ship in year two depend on the schema you commit to in week one.
02

Multi-tenancy is week-one work.

RLS belongs in the schema, not the backlog.

The default

Prototype single-tenant, add tenant scoping later when there's a paying customer.

The cost

The retrofit is a complete rewrite by another name. Every window where tenancy is not enforced is a window where customer data can leak across boundaries.

Better

Add org_id and a row-level security policy to every table the day it's created. Audit the runtime context of every query path before launch. There is no 'temporary' single-tenant phase that survives contact with a real customer.
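
What day one looks like, assuming Postgres and a hypothetical documents table (an orgs table is assumed to exist). The policy reads the tenant from a session variable; how you set that variable per request depends on your stack:

```python
# Hypothetical Postgres DDL: tenancy enforced at the row level from the
# day the table exists. 'orgs' and 'documents' are illustrative names.
TENANCY = """
CREATE TABLE documents (
    id     BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    org_id BIGINT NOT NULL REFERENCES orgs (id),
    body   TEXT NOT NULL
);

ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;  -- applies to the table owner too

-- Every query path must set app.current_org_id on its connection.
-- Rows outside that org are invisible, not merely filtered in app code.
CREATE POLICY org_isolation ON documents
    USING (org_id = current_setting('app.current_org_id')::BIGINT);
"""
```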

How this shows up: If you're a B2B product on a single shared database, this is the largest unmanaged risk on your roadmap. Every week without enforced tenancy is a week where one customer's data path can cross another's.
03

Voice AI is prompts and tool-calling, not protocol.

The WebSocket is the easy part.

The default

Spend the engineering budget on the bidirectional audio plumbing because it's what's new and unfamiliar.

The cost

Audio works perfectly while the model says the wrong thing. The user says 'inside mount,' the system records 'outside mount,' and they never trust it again.

Better

Spend 80% of the build on system prompts, tool schemas with strict types, refusal-on-uncertainty patterns, and domain grounding. The protocol is one library import; the judgment is the product.
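
One way to make 'strict types' concrete: a provider-agnostic JSON Schema for a hypothetical window-measurement tool. The enum forces the model to commit to a legal value or flag uncertainty instead of free-texting something plausible:

```python
# Hypothetical tool schema for a voice ordering flow. Strict enums and
# required fields mean a misheard value fails validation instead of
# silently becoming the wrong order.
RECORD_MEASUREMENT = {
    "name": "record_measurement",
    "description": "Record one window measurement from the caller.",
    "parameters": {
        "type": "object",
        "properties": {
            "mount": {"type": "string", "enum": ["inside", "outside"]},
            "width_inches": {"type": "number", "minimum": 6, "maximum": 144},
            "height_inches": {"type": "number", "minimum": 6, "maximum": 144},
            "heard": {"type": "string", "enum": ["clearly", "unsure"]},
        },
        "required": ["mount", "width_inches", "height_inches", "heard"],
        "additionalProperties": False,
    },
}
# Prompt-side rule to pair with it: if heard is 'unsure', read the value
# back and confirm; never proceed on a guess.
```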

How this shows up: If your team is debating audio protocols before they've drafted the system prompt, you're solving the easy problem first. The model's failures will be in the words it hears, not the bytes.
04

Verification gates over editing bad output.

The agent should self-assess before the human reads anything.

The default

Agent generates, human edits. Treat AI output as a fast first draft.

The cost

The human becomes a slow ranker for a fast generator. The job becomes drudgery, and the AI gets credit for work the human is doing.

Better

Require every agent output to come with a self-assessment — confidence, citations, structured rationale. Score it against thresholds. Below threshold, send it back with a critique instead of forwarding to the operator. The human's job becomes adjudicating disputed cases, not editing every line.
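
The loop can be small. A sketch with hypothetical callables; the scoring logic is domain-specific, the shape is not: generate, self-assess, gate, retry with a critique, and only then escalate to a human:

```python
from dataclasses import dataclass

@dataclass
class Assessed:
    text: str
    confidence: float      # the model's own estimate, 0..1
    citations: list[str]   # sources it claims to have used
    rationale: str

CONFIDENCE_FLOOR = 0.8     # illustrative threshold; tune against your evals

def run_with_gate(task: str, generate, critique, max_rounds: int = 2):
    """generate(task, feedback) -> Assessed, critique(Assessed) -> str.
    Both are hypothetical wrappers around your model of choice."""
    feedback = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        if draft.confidence >= CONFIDENCE_FLOOR and draft.citations:
            return draft               # passes the gate; the human sees this one
        feedback = critique(draft)     # below threshold: send back, don't forward
    return None                        # disputed case: human adjudicates, not edits
```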

How this shows up: If a human edits every AI output before it ships, you've built the wrong loop. The agent should self-assess. The gate should catch the weak runs. The human should adjudicate disputed cases, not edit every line.
05

Multi-provider routing is risk management.

Single-provider is single-point-of-failure.

The default

Pick one frontier model and build everything against its SDK. The provider's SLA is the answer.

The cost

When the provider is down, your product is down. When a model rev quietly regresses on the task you depend on, your quality drops and you don't know why.

Better

Route per task by capability, latency budget, and cost ceiling. Log which model produced which output. Watch for quality regressions. Have a fallback chain. Treat the LLM layer the way you'd treat any other critical dependency — with redundancy and observability.
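
A sketch of what that can look like, with hypothetical model names and client wrappers. The essential parts are the explicit task-to-chain table and logging which model actually answered:

```python
import logging

log = logging.getLogger("llm")

# Fallback chains ordered by preference per task, not one global default.
# Model names are placeholders, not recommendations.
ROUTES = {
    "extraction": ["cheap-fast-model", "frontier-model"],
    "drafting":   ["frontier-model", "second-frontier-model"],
}

def complete(task: str, prompt: str, clients: dict) -> str:
    """clients maps model name -> object with a hypothetical call(prompt) -> str."""
    last_err = None
    for model in ROUTES[task]:
        try:
            out = clients[model].call(prompt)
            log.info("task=%s model=%s", task, model)   # know who produced what
            return out
        except Exception as err:        # outage, rate limit, timeout
            last_err = err
            log.warning("task=%s model=%s failed: %s", task, model, err)
    raise RuntimeError(f"all providers failed for task {task!r}") from last_err
```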

How this shows up: If your product is built against one provider's SDK, your product is down whenever they are. Routing is cheap to add up front and painful to retrofit after the first outage.
06

Persist every prompt, response, and tool call.

Today's logs are tomorrow's training data.

The default

Log LLM calls the way you'd log API calls — duration, status, error rate.

The cost

When something goes sideways in production, there's no replay. No eval set. No way to tell whether quality regressed because of the model, the prompt, or the data. No fine-tuning signal when you eventually want one.

Better

Persist the full prompt, the full response, every tool call, and the orchestration state alongside the conventional metrics. Treat AI traffic as a corpus that's worth more than the requests it served.
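
The wrapper can be one function. A sketch, assuming a relational store and hypothetical client and table names; the point is that the full exchange becomes queryable rows, not log lines that rotate away:

```python
import json
import time
import uuid

def traced_call(model_client, conn, prompt: str, tools=None):
    """model_client, response.tool_calls, and the ai_traces table are
    assumptions for illustration, not a real API."""
    trace_id = str(uuid.uuid4())
    started = time.time()
    response = model_client.call(prompt, tools=tools)
    conn.execute(
        "INSERT INTO ai_traces (id, prompt, response, tool_calls, latency_ms)"
        " VALUES (?, ?, ?, ?, ?)",
        (
            trace_id,
            prompt,                                   # the full prompt, not a hash
            response.text,                            # the full response
            json.dumps([c.as_dict() for c in response.tool_calls]),
            int((time.time() - started) * 1000),      # conventional metrics too
        ),
    )
    return response
```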

How this shows up: If something breaks in production at 3 AM and you can't replay the exact prompt and response that caused it, you're guessing. The fix takes hours instead of minutes — and quality regressions slide by unnoticed.
07

Boring infrastructure, novel surface.

Spend the novelty budget where it shows.

The default

Pick a new database, a new framework, a new deploy target — innovation everywhere.

The cost

Reliability suffers, shipping speed suffers, and the user never sees any of the cleverness because it's all in plumbing.

Better

Pick mature, well-understood infrastructure for the parts that aren't your differentiator — the database, the framework, the deploy target. Reserve the novelty budget for the AI surface — the part the user actually touches and remembers. Innovation is a finite resource; spend it where the customer feels it.

How this shows up: If your team is debugging a new database, a new framework, and the AI layer at the same time, the AI layer is getting your worst attention — and that's the part the customer remembers.
08

Embedding similarity is a starting point, not an answer.

Re-rank with hard constraints or you'll surface the wrong thing.

The default

Cosine similarity on embeddings, sort, ship.

The cost

You confidently surface 'semantically similar' results that fail hard constraints — wrong location, wrong work authorization, wrong required skill. Users learn to distrust the ranking.

Better

Treat the embedding score as a candidate generator. Re-rank with structured filters and operator-set must-haves. The product is the re-rank layer, not the embedding.
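
The two-stage shape, sketched with hypothetical candidate fields. Embeddings propose, hard constraints dispose, and only survivors get ranked:

```python
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def match(query_vec, candidates, must: dict, top_k: int = 10):
    """candidates: objects with .vec, .location, .authorized, .skills,
    all hypothetical fields for a matching product."""
    # Stage 1: embedding similarity as a candidate generator (recall, not truth).
    pool = sorted(candidates, key=lambda c: cosine(query_vec, c.vec), reverse=True)[:200]

    # Stage 2: hard constraints are filters, never soft terms in a score.
    survivors = [
        c for c in pool
        if c.location == must["location"]
        and c.authorized
        and must["required_skill"] in c.skills
    ]

    # Stage 3: rank what's left. This layer, not the embedding, is the product.
    return survivors[:top_k]
```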

How this shows up: If your matching product surfaces results that fail real-world constraints — wrong location, wrong authorization, wrong required skill — the embedding rank isn't the problem. The missing re-rank layer is.
09

Senior end-to-end beats junior teams with handoffs.

AI projects are bottlenecked on judgment, not capacity.

The default

Scale the team to scale the output. More engineers, more progress. Hand the work off through layers of coordination.

The cost

AI development moves too fast for handoff overhead. The scarce resource is the speed of converting a clear thought into shipped code — and that speed dies in coordination, in offshoring, in handoffs from senior partners to juniors.

Better

Senior practitioners holding the whole problem, writing the code across stacks, deciding what to cut. The leverage isn't in capacity. It's in the absence of handoffs and the presence of judgment at every step.

How this shows up: If most of your AI program's time is spent in coordination meetings rather than shipping, the coordination is the bottleneck — and the answer isn't more headcount.
10

Failure modes are part of the product.

Design the bad-day path with the same care as the happy one.

The default

Build the success state, treat the rest as error handling, ship.

The cost

AI fails differently from conventional software — quietly, plausibly, often confidently. The first time the system surfaces a confidently wrong answer with no graceful path, the user's trust collapses and doesn't recover.

Better

Design the refusal, the degraded mode, the timeout, and the fallback as first-class user experiences. What the system does when it doesn't know, can't answer, or runs out of time is part of what you're shipping. Treat it that way.
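
One way to make those paths first-class is to give each a type, so the UI is forced to design a screen for every branch. A sketch with hypothetical states:

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    sources: list[str]

@dataclass
class Refusal:          # 'I don't know', with a reason the UI can show
    reason: str

@dataclass
class Degraded:         # partial answer from a fallback path
    text: str
    caveat: str

@dataclass
class TimedOut:         # budget exhausted; offer retry or a human handoff
    after_ms: int

def render(outcome) -> str:
    # Exhaustive handling: every failure mode gets a designed experience,
    # not a generic error toast.
    match outcome:
        case Answer(text, sources):
            return f"{text}\n(sources: {', '.join(sources)})"
        case Refusal(reason):
            return f"I can't answer that reliably: {reason}"
        case Degraded(text, caveat):
            return f"{text}\nHeads up: {caveat}"
        case TimedOut(after_ms):
            return f"Still working after {after_ms} ms. Retry, or hand off to a person?"
```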

How this shows up: If your team can't describe what your AI does when it doesn't know the answer, you haven't designed the product yet — you've designed half of it. The missing half is what users will remember.
11

Build the eval before the feature.

Measure quality before you write the prompt. Otherwise you're guessing.

The default

Iterate on the prompt until it looks good to the developer running it. Ship.

The cost

Quality regressions become invisible. Model revs surprise you. When a stakeholder asks 'is this actually working?' you have a vibe, not data. You can't tell whether your last prompt change made things better or worse.

Better

Define the evaluation set and the scoring criteria before writing the prompt. Run every candidate against it — current vs. next prompt, current vs. next model. Quality becomes something you measure and improve, not something you assert and defend.
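
An eval can start as a handful of golden cases and a scoring function, committed before the first prompt. A minimal sketch with made-up cases:

```python
# Golden cases written before the prompt exists: (input, check), where
# check is a predicate on the model's output. These are illustrative.
CASES = [
    ("Summarize: the meeting moved to Tuesday.", lambda out: "tuesday" in out.lower()),
    ("Summarize: budget approved at $40k.",      lambda out: "40" in out),
]

def score(run) -> float:
    """run(text) -> str wraps one candidate prompt+model pair."""
    passed = sum(1 for case, check in CASES if check(run(case)))
    return passed / len(CASES)

# Usage: compare candidates with a number instead of a vibe, e.g.
# score(current_prompt) vs score(next_prompt), or current vs next model.
```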

How this shows up: If a stakeholder asks 'is this working better than last week?' and you have a vibe instead of a number, you can't tell whether your last prompt change made things better or worse. Ship the eval first.
12

Cost is a design constraint.

Architecture that's cheap to build is not the same as architecture that's cheap to run.

The default

Prototype on the most capable model, ship to production unchanged, watch the bill arrive.

The cost

AI costs scale with usage in ways traditional software doesn't. A feature that costs nothing in development can be unprofitable at customer scale — and refactoring for cost after launch is a rebuild by another name.

Better

Decide at design time which tasks deserve the frontier model and which can run on a smaller one. Cache aggressively where the input space allows it. Route by capability, latency budget, and cost ceiling. Treat token budgets the way you'd treat any other engineering constraint — with seriousness.
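
A sketch of the design-time version: a written-down model tier per task, plus a cache keyed on the exact input. Names are placeholders:

```python
import hashlib

# Which tasks get the frontier model and which don't is decided here,
# at design time, not discovered on the first invoice.
TIER = {
    "classify_ticket": "small-model",
    "draft_contract":  "frontier-model",
}

_cache: dict[str, str] = {}

def complete(task: str, prompt: str, call) -> str:
    """call(model, prompt) -> str is a hypothetical provider wrapper."""
    model = TIER[task]
    key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    if key not in _cache:       # cache wherever the input space repeats
        _cache[key] = call(model, prompt)
    return _cache[key]
```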

How this shows up: If your AI feature is cheap in dev because traffic is low and expensive in production because users showed up, the architecture is the bug. Cost decisions are product decisions in AI.
13

Calibration is a feature.

A model that knows when it doesn't know is worth more than one that's right slightly more often.

The default

Optimize for accuracy on the eval set; ignore whether the model's confidence is well-calibrated.

The cost

The system confidently lies. The few high-stakes mistakes outweigh the many ordinary successes. Users — once burned — distrust everything the model says, including the things it gets right.

Better

Use prompts and tooling to make the model express uncertainty honestly. Surface that uncertainty in the UI. An 'I'm not sure, ask a human' response is a product feature, not a failure mode. Calibration is what makes a probabilistic system worth trusting at all.
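
Calibration is also measurable. A sketch that buckets the model's self-reported confidence against actual correctness on labeled cases; in a calibrated system, the 0.9 bucket is right about 90% of the time:

```python
from collections import defaultdict

def calibration_table(results):
    """results: (confidence, was_correct) pairs from labeled eval runs."""
    buckets = defaultdict(list)
    for conf, ok in results:
        buckets[round(conf, 1)].append(ok)
    return {
        b: (sum(oks) / len(oks), len(oks))   # (observed accuracy, sample size)
        for b, oks in sorted(buckets.items())
    }

# If the 0.9 bucket is right 70% of the time, the model is overconfident:
# raise the refusal threshold or rework the prompt before shipping.
```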

How this shows up: If your model is confidently wrong even five percent of the time, users will distrust it ninety-five percent of the time. Calibrated uncertainty is what makes a probabilistic system worth shipping.
14

Don't ship what you can't operate.

If you can't keep it running at 2 AM, you don't have a product.

The default

Ship the feature, figure out monitoring, alerts, and on-call later.

The cost

The first incident reveals there are no logs, no rollback, no playbook, and no clear way to diagnose. The clock keeps running while the team flails. Confidence — internal and external — takes the hit.

Better

Operability is part of done. Logs, dashboards, runbooks, and a tested rollback path before launch — not after. If you can't articulate how you'd debug a production incident at 2 AM without calling a meeting, you're not done shipping.

How this shows up: If your team's answer to 'what happens at 2 AM when this breaks?' is 'we'll figure it out,' you're not done shipping. Operability isn't a phase — it's part of done.

Every engagement runs against these principles.

Built to deliver AI products that ship — and stay shipped — for teams serious about their AI initiatives. Tell me what you're trying to build.