Role Overview

As a Machine Learning Engineer (MLE Bench), you will contribute to benchmark-driven evaluation projects focused on real-world machine learning systems. The role involves hands-on work with production-grade ML codebases, model training and evaluation pipelines, and deployment-oriented workflows to assess and improve the capabilities of advanced AI systems. The ideal candidate bridges research and engineering, working closely with models, data, and infrastructure in realistic ML environments.

Key Responsibilities

Work with real-world ML codebases to support MLE Bench-style evaluation tasks.
Build, run, and modify model training, evaluation, and inference pipelines.
Prepare datasets, features, and metrics for ML benchmarking and validation.
Debug, refactor, and improve production-like ML systems for correctness and performance.
Evaluate model behavior, failure modes, and edge cases relevant to benchmark tasks.
Write clean, reproducible, and well-documented Python code for ML workflows.
Participate in code reviews to ensure high standards of engineering quality.
Collaborate with researchers and engineers to design challenging, real-world ML engineering tasks for AI system evaluation.

Qualifications

At least 3 years of overall experience as a Machine Learning Engineer or Software Engineer (ML-focused).
Strong proficiency in Python for machine learning and data workflows.
Hands-on experience with model training, evaluation, and inference pipelines.
Solid understanding of machine learning fundamentals (supervised/unsupervised learning, evaluation metrics, optimization).
Experience working with ML frameworks (e.g., PyTorch, TensorFlow, JAX, or similar).
Ability to understand, navigate, and modify complex, real-world ML codebases.
Experience writing readable, reusable, and maintainable production-quality code.
Strong problem-solving and debugging skills.
Excellent spoken and written English communication skills.

Work Terms

Commitment Required: At least 4 hours per day and a minimum of 20 hours per week, with a 4-hour overlap with PST.
Engagement Type: Contractor assignment (no medical/paid leave).
Duration of Contract: 3 months (adjustable based on engagement).
Location: India, Pakistan, Nigeria, Kenya, Egypt, Ghana, Bangladesh, Turkey, Brazil, Mexico.

Compensation

Compensation details will be discussed during the interview process.

Eligibility

This position is open only to candidates located in the countries listed under Location.

Perks of Freelancing

Work in a fully remote environment.
Opportunity to work on cutting-edge AI projects with leading LLM companies.

Machine Learning Engineer for Benchmark Evaluation
