AI Engineer (Evaluation Systems)

Skills

Anthropic AWS Docker OpenAI PostgreSQL Python

Design a structured, configurable evaluation engine combining deterministic checks with LLM-as-judge verdicts. Build calibration workflows using expert-labeled examples, measure precision and recall accurately, handle delayed outcomes and low-confidence review flows, and store structured verdicts to power dashboards and analytics.

Key Responsibilities
  • Design a configurable evaluation engine
  • Combine deterministic checks with LLM-as-judge verdicts
  • Build calibration workflows using expert-labeled examples
  • Measure precision and recall accurately
  • Handle delayed outcomes and low-confidence review flows
  • Store structured verdicts to power dashboards and analytics
Required Skills & Qualifications
  • 4+ years backend / ML engineering experience
  • 2+ years building production AI/LLM systems
  • Python, Docker, and PostgreSQL experience
  • Knowledge of AWS and of LLM APIs such as OpenAI and Anthropic
  • Proven experience building LLM-based production systems
  • Experience developing evaluation, QA, or scoring pipelines
  • Remote role with a LATAM focus
  • Engagement as an independent contractor through a payroll platform
  • Remote placement with a client
  • Human-in-the-loop workflow design (Plus)
  • OpenTelemetry familiarity (Plus)

Job Type: Remote

Salary: Not Disclosed

Experience: Entry

Duration: 12 Months

Similar Jobs

AI Automation Engineer

New

Lead AI automation across internal operations

Prototype MVPs using LLM APIs and agents

Anthropic Gemini Javascript Python

Strategic Partnerships & AI Ecosystem Lead

Posted 5 days ago

Define and lead ISV ecosystem strategy

Expand AWS co-sell partnerships with key players

AI Anthropic AWS Google

Staff AI Engineer

Posted 10 days ago

Set architectural direction for agentic systems

Identify and fix reliability and scalability gaps

Anthropic Gemini OpenAI Python

Risk Engineering Software Engineer

Posted 29 days ago

Automate risk workflows using Go and AI tools

Prototype experiments and turn them into production systems

Anthropic Copilot Go Kubernetes

AI Product Engineer Role

Posted 304 days ago

Rapid prototyping and deployment of AI features

Building scalable applications using modern cloud stacks

Anthropic Next.js OpenAI Python