Vetting standard

Vetted for production AI, not resume keywords.

Devlyn screens engineers for shipped AI work, evaluation discipline, failure-mode reasoning, AI-native tooling, communication, and ownership before they reach your shortlist.


Direct answer

What makes AI vetting different?

A generic coding screen can miss the skills that decide whether AI products hold up in production: evals, grounding, tool permissions, model cost, latency, failure modes, user feedback, and release judgment.

Why generic tests fail

AI production risk is role-specific.

A candidate can pass a standard code screen and still be weak at retrieval evals, model routing, agent traces, prompt injection risk, or cost-aware release decisions. Devlyn screens for the production surface the role will own.

Screening modules

The AI-native vetting standard.

No invented acceptance rate. The standard is visible in the checks themselves.

Shipped-AI evidence

Proof of LLM features shipped to real users, not tutorials or side projects.

Weak signal: Only demos, hackathons, or course projects.

Strong signal: Production features with users, traffic, and lessons from what broke.

Role specialism review

Depth in the specific role — retrieval, agents, platform, security, or decision science — instead of a broad AI label.

Weak signal: Generalist answers that apply to any AI job.

Strong signal: Specific tradeoffs and failure modes from their specialism.

Live technical deep-dive

A working session in their specialism with senior engineers, not a recruiter quiz.

Weak signal: Memorized definitions with no hands-on reasoning.

Strong signal: Thinks aloud, debugs, and defends choices under questioning.

Eval and failure-mode reasoning

How they measure quality and catch regressions before users do.

Weak signal: Judges output by feel with no baseline.

Strong signal: Builds eval sets and reasons about edge cases and regressions.

Cost and latency reasoning

Whether they treat token cost and latency as first-class production constraints.

Weak signal: Ignores cost and speed until the bill arrives or users complain.

Strong signal: Reasons about cost, latency, and quality as deliberate tradeoffs.

AI-native workflow review

Whether AI tooling is part of how they actually work, not a novelty.

Weak signal: Talks about AI tools but does not use them in practice.

Strong signal: Uses agentic tooling daily and knows where it helps and hurts.

Communication and ownership

Clear writing, honest status updates, and the instinct to own outcomes, not tickets.

Weak signal: Waits for instructions and reports activity, not results.

Strong signal: Writes clearly, flags risk early, and owns the outcome end to end.

Security judgment

Awareness of the AI-specific attack surface: injection, leakage, and tool misuse.

Weak signal: Assumes normal appsec covers AI risk.

Strong signal: Anticipates prompt injection, data exposure, and unsafe tool authority.

What we reject

Most candidates do not pass — on purpose.

Demo-only AI claims with no users or production history.
No eval thinking: quality judged by feel, not against a baseline.
Vague, prompt-only experience with no specialism depth.
Cannot explain failure modes for their own role.
Cannot communicate tradeoffs in writing or in the deep-dive.
Ignores cost and latency until something breaks.
Ignores security boundaries, permissions, and tool authority.

Role-specific checks

Every role has different failure modes.

The shortlist is filtered for the role, not a broad AI label.

FDE

Forward-Deployed AI Engineer

Workflow discovery judgment
Production code ownership
Stakeholder clarity

APP

AI Application Engineer

AI product UX
Structured outputs
Streaming and retries

LLM

LLM Engineer

Eval design
Model comparison discipline
Cost and latency judgment

RAG

RAG & Context Engineer

Retrieval evals
Chunking and metadata judgment
Permission-aware context

AGT

Agentic Workflow Engineer

Agent graph design
Tool permission boundaries
Human approval gates

PLT

AI Platform Engineer

Platform API design
Provider routing
Observability

SEC

AI Security Engineer

AI threat modeling
Prompt injection reasoning
Tool permission audit

DSC

Data Scientist

Metric design
Data quality skepticism
Experiment judgment

What buyers see

The shortlist should explain itself.

Role fit

Why the engineer fits the role's exact ownership model.

Relevant proof

AI systems, workflows, evals, or delivery evidence tied to the role.

Interview focus

Questions that test the role’s highest-risk decisions.

Trial proof

What the first two weeks should produce.

Trial connection

Vetting continues in your codebase.

Pull requests

The trial should create inspectable code, not a private demo.

Eval report

Quality, retrieval, routing, or workflow behavior should be measured against a baseline.

Architecture decision record

Tradeoffs should be written clearly enough for your team to maintain.

Workflow map

The engineer should make scope, actors, systems, and failure paths visible.

FAQ

Vetting questions.

How does Devlyn vet AI engineers?

Devlyn reviews shipped-AI evidence, role specialism, live technical judgment, eval thinking, cost/latency reasoning, security judgment, and communication.

Do you use generic coding tests?

Not on their own. AI roles are screened around the failure modes of their production area: retrieval quality, agents, evals, model routing, platform, security, or decision science.

Do you claim top 1% or top 3%?

No. Devlyn avoids unverifiable percentile claims and explains the vetting standard instead.

What does a buyer see in the shortlist?

A short profile set with role fit, relevant proof, availability, and matching rationale.

What does Devlyn reject?

Vague AI claims, demo-only experience, no eval discipline, weak production judgment, unclear communication, and inability to explain failure modes.

How does vetting connect to the trial?

The trial tests the same criteria inside the buyer’s real workflow.

Can we run our own interview?

Yes. Devlyn encourages role-specific interviews focused on practical judgment.

Is security reviewed for every role?

Security judgment is part of the screen, with deeper review for AI Security, RAG, Agentic, and Platform roles.