By Dean Griffiths
Most UK AI consultants are one of three things: a prompt-slinger using ChatGPT with a logo, an agency that subcontracts the actual engineering, or a SaaS reseller rebadging someone else's product. Genuine bespoke builders are rare. Five tests filter the genuine from the rest: (1) Can they show you production code they wrote? (2) Can they walk through a previous build end-to-end including failures? (3) Can they answer specific architecture questions about where your data flows? (4) Can a previous client confirm the consultant did the engineering? (5) Did they diagnose before they sold?
"AI consultant" now describes someone who graduated from a six-week prompt-engineering bootcamp last year, a Big Four practice billing partner-day rates for slide decks, an agency that subcontracts the code to a dev shop in another timezone, and a former data scientist quietly building bespoke systems in production. They all use the same job title.
Five tests, applied in sequence, filter most of the noise.
The first test is production code they wrote. Not a screenshot of a ChatGPT conversation. Not a demo dashboard. Actual code, in a repository, that someone else is paying to run in production. A consultant who writes their own code can show you commits with their name on them. A consultant who subcontracts will sidestep — "we have a development partner who handles that side" — which is fine if you wanted to hire an agency, but you should know what you're buying.
How to apply it: ask in the first conversation. "Show me a piece of production code you wrote for a previous client — even anonymised." The answer tells you what role you're actually hiring for.
The second test is the end-to-end walkthrough, failures included. Real builds have failure modes. Integrations don't work the first time. A particular edge case takes longer than expected. The original scope misses a requirement that surfaces on contact with the business. A consultant who has actually shipped will describe these without prompting — "the EPC register API rate-limited us at v1, so we re-engineered the matching to batch and cache" — because the failures are how the build evolves.
A consultant who can only describe successes either hasn't shipped or hasn't reflected on what they shipped. Both are bad signals.
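The batch-and-cache fix in that EPC anecdote is a recognisable pattern, and it's worth knowing what it looks like so you can tell whether a consultant is describing real engineering or reciting buzzwords. This is a minimal sketch, not anyone's actual client code — `fetch_batch` stands in for whatever upstream API is rate-limiting you:

```python
import time

class BatchedClient:
    """Sketch of the batch-and-cache pattern: instead of one upstream call
    per record (which trips the rate limit), group lookups into batches,
    space the calls out, and cache results so repeats never hit the API."""

    def __init__(self, fetch_batch, batch_size=50, min_interval=1.0):
        self.fetch_batch = fetch_batch    # callable: list[str] -> dict[str, dict]
        self.batch_size = batch_size      # records per upstream request
        self.min_interval = min_interval  # seconds between upstream requests
        self.cache = {}
        self._last_call = 0.0

    def lookup(self, keys):
        # Only fetch keys we haven't already cached.
        missing = [k for k in set(keys) if k not in self.cache]
        for i in range(0, len(missing), self.batch_size):
            batch = missing[i:i + self.batch_size]
            # Simple client-side rate limiting: wait out the interval.
            wait = self.min_interval - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)
            self.cache.update(self.fetch_batch(batch))
            self._last_call = time.monotonic()
        return {k: self.cache[k] for k in keys if k in self.cache}
```

A consultant who has lived through a rate-limit failure can talk you through trade-offs like these — batch size versus latency, cache staleness versus API load — without reaching for a slide.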
The third test is specific architecture questions. Where does your data live? Where do the prompts go? Who hosts the LLM calls? What happens when the model vendor changes their API? Where does the audit log live? What's the disaster-recovery plan? Who has access in production?
Real builders have answers. They might not be the right answers for your situation — that's a discovery conversation. But they have answers. A consultant who deflects to "we use industry-standard practices" is reading from a slide they didn't write.
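A real answer to several of those questions often lives in one small piece of code: a gateway that sits between the application and the model vendor. The sketch below is hypothetical — `call_vendor` and `make_llm_gateway` are illustrative names, not a real library — but it shows the shape of an answer to "where does the audit log live?" and "what happens when the vendor changes their API?":

```python
import json
import time
from typing import Callable

def make_llm_gateway(call_vendor: Callable[[str], str], audit_path: str):
    """Sketch of a vendor seam: every model call goes through one function
    that (a) appends an audit record to a file you control, and (b) isolates
    vendor-specific details, so an API change is a one-file fix."""

    def complete(prompt: str, user: str) -> str:
        # Log metadata, not prompt content — the data-residency question
        # ("where do the prompts go?") is answered by what you DON'T log.
        record = {"ts": time.time(), "user": user, "prompt_chars": len(prompt)}
        try:
            reply = call_vendor(prompt)  # only this line knows the vendor API
            record["status"] = "ok"
            return reply
        except Exception as exc:
            record["status"] = f"error: {exc}"
            raise
        finally:
            # Append-only JSON Lines audit log, hosted wherever you decide.
            with open(audit_path, "a") as f:
                f.write(json.dumps(record) + "\n")

    return complete
```

The point isn't this specific design — it's that a builder can immediately point to the equivalent seam in their own architecture and tell you what it logs, where, and why.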
The fourth test is reference calls. Specifically, ask the previous client who actually wrote the code, who they spoke to during the build, and what they'd do differently. The cleanest signal is when the previous client describes the same consultant doing the engineering work, not a separate development team.
For solo operators (one-person consultancies), the test is simpler — the consultant is the engineer. For agencies, the test is whether the person you'll be hiring is the person who'll be writing the code, or whether you're paying a relationship layer above the actual builders.
The fifth test is diagnosis before the sale. A consultant who proposes scope in the first 15 minutes hasn't diagnosed your operation. They've pattern-matched against their existing offer. That can be fine if your operation genuinely fits a common pattern — but mid-market businesses with genuinely specific operations rarely do.
A discovery-first consultant asks where your time leaks before they propose what to build. The discovery call is the diagnostic — the build proposal comes after, costed, scoped, and with a defensible reason for each component. (More on why this matters: see the discovery methodology guide.)
The pattern that consistently produces working builds: diagnosis first, a costed and scoped proposal second, and the engineering done by the same person you vetted.
If you're about to hire an AI consultant, run them through the five tests before the contract. If you're considering AIMindShift specifically, the discovery call is the test you can run on us — 45–60 minutes, technical, diagnostic-first, costed bottleneck map at the end. You'll know inside the call whether the engineering depth is there.
A 45–60 minute discovery call. Map the bottlenecks. Get a costed bottleneck map — whether we build or not.
Book a Discovery Call