Model Evaluation Engineer


Skills

Cloud infrastructure, Data pipelines, Large language models (LLMs), Machine learning, Python, SQL, Statistical analysis

The Research Engineer, Evaluations role focuses on end-to-end and integration-level model evaluation, emphasizing accuracy, latency, and feature-specific metrics. This position involves building and maintaining competitive benchmarking pipelines, designing systematic experiments, and translating qualitative customer feedback into quantifiable evaluation criteria.

Key Responsibilities
  • Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics.
  • Build and maintain competitive benchmarking pipelines.
  • Design and run systematic experiments to measure the impact of model changes.
  • Onboard, curate, and maintain evaluation datasets.
  • Create evaluation subsets to stress-test specific capabilities and edge cases.
  • Define evaluation metrics that reflect real-world performance.
  • Translate qualitative customer feedback into quantifiable evaluation criteria.
  • Work with customer-facing teams to understand pain points and convert them into research priorities.
  • Maintain clean evaluation pipelines and clear documentation.
  • Identify evaluation gaps proactively and propose solutions.

Required Skills & Qualifications
  • Strong understanding of ML fundamentals.
  • Strong Python skills, including writing clean evaluation scripts.
  • Proficiency with data pipelines, SQL, and cloud infrastructure.
  • Intuition for designing good evaluation metrics, backed by statistical rigor.
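  • Familiarity with voice agent stack components such as voice activity detection (VAD), automatic speech recognition (ASR), large language models (LLMs), and text-to-speech (TTS) systems.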
  • Tinkerer mentality with a preference for shipping and iterating quickly.
  • Excellent communication skills to explain technical results and summarize findings.
  • Ownership mindset to proactively fill evaluation gaps.
  • Ability to work at least 3-4 hours overlapping with the US Eastern Time Zone.
  • Experience in maintaining clear documentation.

Job Type: Remote

Salary: Not Disclosed

Experience: Entry

Duration: 12 Months
