Role Overview

This position offers an opportunity for experienced Software Engineers specializing in Data Engineering and Data Science to work on benchmark-driven evaluation projects. You will work with production-like datasets, data pipelines, and data science tasks aimed at evaluating and improving the performance of advanced AI systems. The ideal candidate has a solid foundation in both data engineering and data science and can handle data preparation, analysis, and model-related workflows in real-world codebases.

Key Responsibilities

Work with structured and unstructured datasets to support SWE-bench-style evaluation tasks.
Design, build, and validate data pipelines used in benchmarking and evaluation workflows.
Perform data processing, analysis, feature preparation, and validation for data science use cases.
Write, run, and modify Python code to process data and support experiments locally.
Evaluate data quality, transformations, and outputs for correctness and reproducibility.
Create clean, well-documented, and reusable data workflows suitable for benchmarking.
Participate in code reviews to ensure high standards of code quality and maintainability.
Collaborate with researchers and engineers to design challenging, real-world data engineering and data science tasks for AI systems.

Qualifications

At least 3 years of overall experience as a Data Engineer, Data Scientist, or data-focused Software Engineer.
Strong proficiency in Python for data engineering and data science workflows.
Demonstrable experience with data processing, analysis, and model-related workflows.
Solid understanding of machine learning and data science fundamentals.
Experience working with structured and unstructured data.
Ability to understand, navigate, and modify complex, real-world codebases.
Experience writing readable, reusable, maintainable, and well-documented code.
Strong problem-solving skills, including experience with algorithmic or data-intensive problems.
Excellent spoken and written English communication skills.

Work Terms

Commitment Required: At least 4 hours per day and a minimum of 20 hours per week, with 4 hours of overlap with PST.
Engagement Type: Contractor assignment (no medical or paid leave).
Duration of Contract: 3 months (adjustable based on engagement).

Compensation

Compensation details will be discussed during the interview process.

Eligibility

This is a fully remote position.
You will have the opportunity to work on cutting-edge AI projects with leading LLM companies.

Data Engineer/Data Scientist for AI Benchmark Evaluation
