Role Overview: This position offers an exciting opportunity to engage in the development of large language model (LLM) evaluation and training datasets, specifically designed to tackle realistic software engineering challenges. You will play a pivotal role in creating verifiable software engineering tasks based on public repository histories, utilizing a synthetic approach that incorporates human feedback, while also expanding dataset coverage across various programming languages and difficulty levels.

Key Responsibilities:

Analyze and triage GitHub issues across trending open-source libraries.
Set up and configure code repositories, including Dockerization and environment setup.
Evaluate unit test coverage and quality.
Modify and run codebases locally to assess LLM performance in bug-fixing scenarios.
Collaborate with researchers to design and identify repositories and issues that are challenging for LLMs.
Lead a team of junior engineers on collaborative projects.

Qualifications:

At least 3 years of overall experience.
Strong experience with Ruby.
Proficiency with Git, Docker, and basic software pipeline setup.
Ability to understand and navigate complex codebases.
Comfortable running, modifying, and testing real-world projects locally.
Experience contributing to or evaluating open-source projects is a plus.

Nice to Have:

Previous participation in LLM research or evaluation projects.
Experience building or testing developer tools or automation agents.

Work Terms:

Commitment Required: At least 4 hours per day and 20 hours per week, including 4 hours of overlap with PST. Time commitment options are 20, 30, or 40 hours per week.
Employment Type: Contractor assignment (no medical/paid leave).
Location: Candidates must be located in India, Pakistan, Nigeria, Kenya, Egypt, Ghana, Bangladesh, Turkey, or Mexico.

Compensation: Competitive compensation based on experience.

Eligibility:

Must have the legal right to work in the specified locations.

Evaluation Process:

Two rounds of interviews (60 min technical + 30 min technical & cultural discussion).

Why Join Us? Join a rapidly growing AI company at the forefront of evaluating how LLMs interact with real code, influencing the future of AI-assisted software development. This role uniquely combines practical software engineering with AI research, offering a chance to work on cutting-edge projects with leading LLM companies in a fully remote environment.

Senior Software Engineer, Ruby for LLM Evaluation
