
How to evaluate a UK AI consultant: five tests that filter the builders from the prompt-slingers

By Dean Griffiths

In short

Most UK AI consultants are one of three things: a prompt-slinger using ChatGPT with a logo, an agency that subcontracts the actual engineering, or a SaaS reseller rebadging someone else's product. Genuine bespoke builders are rare. Five tests filter the genuine from the rest: (1) Can they show you production code they wrote? (2) Can they walk through a previous build end-to-end including failures? (3) Can they answer specific architecture questions about where your data flows? (4) Can a previous client confirm the consultant did the engineering? (5) Did they diagnose before they sold?

The market is messy. Here are the five tests that cut through it.

"AI consultant" now describes someone who graduated from a six-week prompt-engineering bootcamp last year, a Big Four practice billing partner-day rates for slide decks, an agency that subcontracts the code to a dev shop in another timezone, and a former data scientist quietly building bespoke systems in production. They all use the same job title.

Five tests, applied in sequence, filter most of the noise.

Test 1 — Can they show you production code they wrote themselves?

Not a screenshot of a ChatGPT conversation. Not a demo dashboard. Actual code, in a repository, that someone else is paying to run in production. A consultant who writes their own code can show you commits with their name on them. A consultant who subcontracts will sidestep — "we have a development partner who handles that side" — which is fine if you wanted to hire an agency, but you should know what you're buying.

How to apply it: ask in the first conversation. "Show me a piece of production code you wrote for a previous client — even anonymised." The answer tells you what role you're actually hiring for.
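If they do share an anonymised repository, authorship is checkable in seconds. A minimal sketch, assuming a local clone and git on your PATH; it just lists commit counts per author, so the consultant's name should appear near the top:

```python
# Lists commit counts per author in a local git repository.
# Assumes git is installed and repo_path points at a local clone.
import subprocess

def authors_by_commit_count(repo_path: str) -> list[tuple[int, str]]:
    """Return (commit_count, author_name) pairs, most prolific first."""
    out = subprocess.run(
        ["git", "-C", repo_path, "shortlog", "-sn", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    pairs = []
    for line in out.splitlines():
        count, name = line.strip().split("\t", 1)
        pairs.append((int(count), name))
    return pairs

if __name__ == "__main__":
    for count, name in authors_by_commit_count("."):
        print(f"{count:5d}  {name}")
```

Commit history can be rewritten, so treat this as a conversation starter rather than proof; the reference call in Test 4 closes the loop.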

Test 2 — Can they walk through a previous build end-to-end, including the parts that broke?

Real builds have failure modes. Integrations don't work the first time. A particular edge case takes longer than expected. The original scope misses a requirement that surfaces on contact with the business. A consultant who's actually shipped will describe these without prompting — "the EPC register API rate-limited us at v1, so we re-engineered the matching to batch and cache" — because the failures are how the build evolves.
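That fix is a pattern worth recognising when a consultant describes it. A minimal sketch of batch-and-cache, where the endpoint, fetch_epc_batch, and the limits are illustrative assumptions rather than the real EPC register API: postcodes already in the cache are skipped, and the rest go out in small, spaced batches.

```python
# A batch-and-cache sketch. CACHE, BATCH_SIZE, PAUSE_SECONDS and the
# endpoint are illustrative assumptions, not the real EPC register API.
import time
import requests

CACHE: dict[str, dict] = {}      # postcode -> cached API response
BATCH_SIZE = 25                  # requests per batch, sized to the rate limit
PAUSE_SECONDS = 1.0              # gap between batches

def fetch_epc_batch(postcodes: list[str]) -> dict[str, dict]:
    """Fetch records for any uncached postcodes, in rate-limited batches."""
    missing = [p for p in postcodes if p not in CACHE]
    for i in range(0, len(missing), BATCH_SIZE):
        for postcode in missing[i : i + BATCH_SIZE]:
            resp = requests.get(
                "https://api.example.com/epc", params={"postcode": postcode}
            )
            resp.raise_for_status()
            CACHE[postcode] = resp.json()  # cache so a retry never re-fetches
        time.sleep(PAUSE_SECONDS)          # space batches under the limit
    return {p: CACHE[p] for p in postcodes}
```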

A consultant who can only describe successes either hasn't shipped or hasn't reflected on what they shipped. Both are bad signals.

Test 3 — Can they answer specific architecture questions?

Where does your data live? Where do the prompts go? Who hosts the LLM calls? What happens when the model vendor changes their API? Where does the audit log live? What's the disaster-recovery plan? Who has access in production?

Real builders have answers. They might not be the right answers for your situation — that's a discovery conversation. But they have answers. A consultant who deflects to "we use industry-standard practices" is reading from a slide they didn't write.
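For the vendor-API question specifically, one credible answer is an adapter layer: every LLM call routed through a single interface, so a vendor change touches one class instead of the whole codebase. A minimal sketch, with illustrative names rather than any real SDK:

```python
# An adapter-layer sketch for the "what happens when the model vendor
# changes their API" question. LLMClient, EchoClient and summarise are
# illustrative names, not a real SDK.
from abc import ABC, abstractmethod

class LLMClient(ABC):
    """The one interface the rest of the codebase is allowed to call."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class EchoClient(LLMClient):
    """Stand-in for tests; a real client would wrap a vendor SDK here."""
    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt[:60]}"

def summarise(document: str, llm: LLMClient) -> str:
    # Business logic depends only on the interface, never on a vendor SDK,
    # so a vendor API change is contained to one adapter class.
    return llm.complete(f"Summarise this document:\n{document}")

if __name__ == "__main__":
    print(summarise("Five tests for evaluating an AI consultant.", EchoClient()))
```

The design choice being tested is containment: the blast radius of a vendor change should be one file, not the whole system.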

Test 4 — Can a previous client confirm the consultant did the engineering?

Reference calls. Specifically: ask the previous client who actually wrote the code, who they spoke to during the build, and what they'd do differently. The cleanest signal is when the previous client describes the same consultant doing the engineering work, not a separate development team.

For solo operators (one-person consultancies), the test is simpler — the consultant is the engineer. For agencies, the test is whether the person you'll be hiring is the person who'll be writing the code, or whether you're paying a relationship layer above the actual builders.

Test 5 — Did they diagnose before they sold?

A consultant who proposes scope in the first 15 minutes hasn't diagnosed your operation; they've pattern-matched against their existing offer. That can be fine if your operation fits a common pattern, but mid-market businesses with genuinely specific operations rarely do.

A discovery-first consultant asks where your time leaks before they propose what to build. The discovery call is the diagnostic — the build proposal comes after, costed, scoped, and with a defensible reason for each component. (More on why this matters: see the discovery methodology guide.)

Bonus red flags

  • Generic case studies. If the same outcomes appear across clients with different operations ("we saved them 40% on admin"), it usually means the consultant has a template and you're the next customer to receive it.
  • No code shown. Marketing assets, demo videos, and dashboards are not engineering evidence. Code is.
  • "Trust us" on data handling. A serious build has a clear answer for where your data flows, where it's stored, who can access it, and how it's deleted. Wave-hand answers mean the consultant either doesn't know or doesn't want to commit.
  • SaaS rebadged as bespoke. If the "bespoke" build is a thin wrapper around an existing SaaS product, you're paying bespoke prices for SaaS economics. Ask what the consultant would do differently if the SaaS vendor disappeared tomorrow.
  • Pressure to sign in the first call. A consultant who needs you to commit before you've understood the scope hasn't earned the commitment. Real builders are happy for you to think about it.

What "good" looks like (a positive signature)

The pattern that consistently produces working builds:

  • One person (or a small team) who can describe both the business problem and the technical architecture in the same conversation.
  • References to specific previous builds they can talk about in detail, including the failure modes.
  • A diagnostic-first sales conversation — they ask before they propose.
  • A clear answer to "where does my data live and who has access."
  • A commercial shape that fits the work — fixed fee for genuinely one-off builds, retainer for systems that need to keep moving with the business.
  • Willingness to say "no, this doesn't fit a bespoke build — buy the SaaS." A consultant who would rather walk away from a bad-fit deal is usually a consultant worth hiring for a good-fit one.

Next step

If you're about to hire an AI consultant, run them through the five tests before the contract. If you're considering AIMindShift specifically, the discovery call is the test you can run on us — 45–60 minutes, technical, diagnostic-first, costed bottleneck map at the end. You'll know inside the call whether the engineering depth is there.


Want to apply this to your operation?

A 45–60 minute discovery call. Map the bottlenecks. Get a costed bottleneck map — whether we build or not.

Book a Discovery Call