How to vet an AI engineer in 2026. The questions that filter the senior from the senior-titled.

Anyone who has hired an AI engineer in the last twelve months knows the shape of the problem. You post the role on LinkedIn. You get 200 CVs in 48 hours. Three quarters of them mention LangChain, vector databases, RAG, agents, fine-tuning, the full glossary. Twenty of them get past the first call. Two of them have actually shipped a production AI system that survived real customer load. The rest have built notebooks.

We run a vetting screen on every engineer that goes into the Dissel Talent network, and this is the version we wish every CTO and engineering lead had on their desk before opening a role. It is built around one principle: ask questions whose answers cannot be faked by reading the docs the day before the interview.

Filter 1: what have you shipped that real users hit?

Open with one question: name an AI feature you have built that was used by real, paying users for at least three months. Walk me through the moment a user first hit it in production.

Strong engineers go straight into specifics. The latency was bad in week one, here is how we cut it. The model hallucinated on a class of inputs, here is the eval we built. Cost spiked when traffic doubled, here is the routing change. Weak engineers describe the demo, the architecture diagram, the model they picked. They cannot describe what happened after launch, because they were not there.

Filter 2: how do you handle hallucinations?

The cheap answer is "we use retrieval augmented generation." Every CV says that. Push the question one level: when retrieval returns the wrong chunk, what does your system do? When the model confidently answers a question outside the context, how do you catch it? When a user reports a wrong answer, how do you reproduce it and fix it without breaking the other 99 percent?

Senior engineers have a stack of answers. Confidence scoring, abstention prompts, citation-required outputs, eval suites that grow with every reported bug, rerankers, query rewriting. They have the scars and they describe them by name. Junior engineers say "we tune the prompt." That is the line.
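To make "citation-required outputs" and "abstention" concrete, here is a minimal sketch of the kind of guard a candidate might describe. It assumes the model has been prompted to cite retrieved chunks in square brackets such as [doc-1]. The names guard_answer, Chunk and ABSTAIN are hypothetical and chosen for illustration only.

```python
import re
from dataclasses import dataclass

# Hypothetical fallback message; the wording is an assumption.
ABSTAIN = "I don't have enough information to answer that."


@dataclass
class Chunk:
    chunk_id: str
    text: str


def guard_answer(answer: str, retrieved: list[Chunk]) -> str:
    """Return the answer only if it cites at least one chunk and
    every citation points at a chunk that was actually retrieved."""
    # Assumed citation format: an id in square brackets, e.g. [doc-1].
    cited = set(re.findall(r"\[([\w-]+)\]", answer))
    known = {chunk.chunk_id for chunk in retrieved}
    if not cited or not cited.issubset(known):
        return ABSTAIN
    return answer


chunks = [Chunk("doc-1", "Refunds are processed within 5 business days.")]
print(guard_answer("Refunds take 5 business days [doc-1].", chunks))
print(guard_answer("Refunds take 2 hours.", chunks))  # no citation, so it abstains
```

This only checks that citations exist and resolve to retrieved chunks. It does not check that the cited chunk supports the claim, which is where rerankers and a growing eval suite come in.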

Filter 3: show me your evaluation setup

Every production AI engineer has an opinion on evals. They will tell you about their golden set, how they version it, who writes the cases, how they run it on every prompt change, the dashboards they look at on Monday morning. They will mention LLM-as-judge and immediately tell you the three ways it fooled them.

Engineers who have not run a system in production talk about evals in the abstract. They name-drop Ragas or DeepEval. They have not actually wired either into CI. That gap is the difference between someone who can ship and someone who has read about shipping.
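For reference, "a golden set wired into CI" can be as small as the sketch below. It is one possible shape, not a prescribed setup: the cases, the substring check, the 0.9 threshold and the names run_golden_set, ci_exit_code and fake_predict are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical golden cases; in practice these live in a versioned file
# and grow with every reported bug.
GOLDEN_SET = [
    {"id": "refund-001", "input": "How long do refunds take?", "must_contain": "5 business days"},
    {"id": "cancel-002", "input": "Can I cancel anytime?", "must_contain": "yes"},
]


def run_golden_set(predict: Callable[[str], str], cases: list[dict]) -> tuple[float, list[str]]:
    """Run every case through predict and return (pass rate, ids of failing cases)."""
    failures = [
        case["id"]
        for case in cases
        if case["must_contain"].lower() not in predict(case["input"]).lower()
    ]
    pass_rate = 1 - len(failures) / len(cases)
    return pass_rate, failures


def ci_exit_code(pass_rate: float, threshold: float = 0.9) -> int:
    """Exit code for the CI job: 0 passes the build, 1 fails it."""
    return 0 if pass_rate >= threshold else 1


def fake_predict(question: str) -> str:
    # Stand-in for the real system under test.
    return "Refunds arrive within 5 business days."


pass_rate, failures = run_golden_set(fake_predict, GOLDEN_SET)
print(pass_rate, failures, ci_exit_code(pass_rate))
```

A candidate who has done this for real will usually volunteer why a substring check is too crude for their use case and what they replaced it with.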

Filter 4: what is your cost story?

Ask: what is the per-request cost of the last AI feature you shipped, and how did it change over the first three months in production? A senior answer involves model routing (cheap model first, expensive model on uncertain cases), caching, prompt compression, switching providers, batching, and a real number per call. A weaker answer is "we used GPT-4o and it was fine."
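The routing idea in that answer (cheap model first, expensive model on uncertain cases) can be sketched in a few lines. Everything here is illustrative: the prices are made-up numbers rather than any provider's real rates, and ModelResult, request_cost and route are hypothetical names. How the confidence score is produced is left open.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelResult:
    answer: str
    confidence: float  # assumed to be a score in [0, 1] from the caller's own method
    cost_usd: float


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one call in USD, given prices per million tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


def route(query: str,
          cheap: Callable[[str], ModelResult],
          expensive: Callable[[str], ModelResult],
          min_confidence: float = 0.8) -> ModelResult:
    """Try the cheap model first; escalate only when it is not confident enough."""
    first = cheap(query)
    if first.confidence >= min_confidence:
        return first
    second = expensive(query)
    # An escalated request pays for both calls.
    return ModelResult(second.answer, second.confidence, first.cost_usd + second.cost_usd)


# Made-up prices: 0.5 USD per million input tokens, 1.5 USD per million output tokens.
print(request_cost(1000, 200, 0.5, 1.5))
```

The detail worth listening for is the last line of route: an escalated request costs more than going straight to the expensive model, so the routing only pays off if most traffic stays on the cheap path.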

Cost discipline is the cleanest proxy for production seniority in agentic AI right now. Anyone can prototype with the most expensive model. Only people who have lived through a real bill make the trade-offs that get the unit economics to work.

Filter 5: where does the data live, and who can break it?

Agentic engineers are data engineers in disguise. The agent is only as good as the layer it reads from. Ask: how is your data layer structured? What is the schema discipline? How do you handle PII? What happens when a source system changes a field?

An engineer who has shipped real agents has firm opinions on this. They have a contract layer between the agent and the upstream systems. They have a watcher that flags schema drift. They have written the migration the team needed when a CRM admin added a custom field. Engineers who have not lived this give vague answers about "connecting to Notion" or "calling the Salesforce API".
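A "watcher that flags schema drift" does not need to be elaborate. The sketch below compares an incoming record against an expected contract and reports what changed. The contract fields and the name detect_drift are hypothetical, and a real watcher would also need to handle nested fields and optional values.

```python
# Hypothetical contract between the agent and an upstream CRM: field name -> expected type.
CONTRACT = {"id": str, "email": str, "plan": str}


def detect_drift(record: dict, contract: dict) -> dict:
    """Report fields that are missing, unexpected, or of the wrong type."""
    missing = sorted(contract.keys() - record.keys())
    unexpected = sorted(record.keys() - contract.keys())
    wrong_type = sorted(
        key for key in contract.keys() & record.keys()
        if not isinstance(record[key], contract[key])
    )
    return {"missing": missing, "unexpected": unexpected, "wrong_type": wrong_type}


# A CRM admin added a custom field and the id started arriving as an integer.
record = {"id": 42, "email": "a@example.com", "plan": "pro", "custom_region": "EU"}
print(detect_drift(record, CONTRACT))
```

Run on a sample of records per sync, a check like this turns "the agent started giving odd answers" into a specific alert about a specific field.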

Filter 6: a small practical exercise

Skip the algorithmic puzzles. Run a 45-minute paired exercise on a real problem. Give them a messy dataset of 50 inbound emails. Ask them to classify each into one of five categories, with a confidence score, and to draft a prompt that an agent could use. Watch how they think.

Strong candidates start by reading the data. They notice three of the categories are ambiguous and ask which one wins. They sample model outputs by hand before they design the prompt. They build the eval set as they go. Weak candidates open ChatGPT, paste the first email, and write a long prompt. The work patterns reveal more than any whiteboard could.

Filter 7: how do you keep current without becoming a hype follower?

The AI field moves fast enough that yesterday's best practice is today's footgun. Ask how the candidate stays current. The best engineers name two or three sources they actually read (Latent Space, the Anthropic and OpenAI engineering posts, a small Discord), and three things they tried recently that did not work. They are skeptical of new frameworks. They have opinions on which abstractions are worth adopting and which add weight without value.

Engineers who quote Twitter threads and call every new framework a game changer are usually a step behind the people who built the thing that the thread is hyping. Hype-following is a strong negative signal at the senior level.

“The market is full of engineers who can build a demo. We are looking for the smaller group who has run a production system long enough to know what breaks at month three.”
- Derk Disselhoff

What this filter eliminates

Notebook engineers. Engineers whose only production experience is a Jupyter notebook handed to someone else to deploy. They know the models. They have not lived with the consequences of a model in production.

Title inflation. Engineers who held a senior title at a company where AI never reached production. The title is real. The shipping experience is not.

Framework tourists. Engineers who can name every framework and have shipped nothing meaningful with any of them. Pattern recognition beats framework recognition every time.

What it leaves

A small pool of engineers who have built, shipped, and supported AI systems that real users depended on. From a pool of 200 CVs, this filter typically leaves 5 to 10 candidates worth a final interview. That is the entire point. The cost of a wrong AI hire in 2026 (three months of salary plus a stalled roadmap plus a frustrated team) is high enough that taking longer at the top of the funnel pays back inside a quarter.

If you are hiring

Use the seven filters above, in order, on every candidate. If you do not have the bandwidth to run them, that is what Dissel Talent exists for. Every engineer in our network has been through this screen, and every placement comes with the option of an implementation review from our delivery team in week one. The CV is the start. The screen is what determines who actually ships.