Role Overview

Evaluate and improve frontier AI coding models by completing structured technical assessments that mirror realistic machine learning engineering workflows, model training and inference systems, MLOps, and LLM application scenarios.

Key Responsibilities

Use frontier AI coding agents to complete and evaluate complex ML and AI engineering tasks.
Review model-generated implementations across model training, inference systems, deployment infrastructure, and LLM applications.
Identify bugs, edge cases, performance regressions, and failure modes in model outputs and implementations.
Compare outputs from multiple frontier models and assess their relative strengths, weaknesses, and tradeoffs.
Apply professional engineering judgment to realistic ML engineering scenarios, documenting findings and recommendations.

Qualifications

At least 2 years of professional machine learning engineering experience.
Experience building production ML systems, model deployment infrastructure, LLM applications, or AI-powered products.
Regular use of AI coding agents such as Cursor, Claude Code, Codex, Windsurf, Gemini CLI, or similar tools.
Ability to evaluate model-generated machine learning implementations and reason about technical tradeoffs.
Experience deploying ML systems to production is preferred.

Work Terms

Location: Remote.
Employment type: Hourly.
Sprint-based engagement, with work organized into 12-24 hour stretches based on client requirements.
Spots are limited and are filled on a first-come, first-served basis.

Compensation

$400 per accepted task.
Typical tasks take approximately 2-3 hours after ramp-up.
Compensation is tied to accepted work.
Listed hourly rate: $85 per hour.

Eligibility

This role is intended for engineers with 2+ years of ML engineering experience who regularly use AI coding agents and can assess model-generated ML solutions. Preference is given to candidates with experience deploying ML systems to production. Compensation is contingent on accepted deliverables.

Machine Learning Engineer for AI Model Evaluation
