Senior Software Engineer, Ruby for LLM Evaluation
from $40/hour
About this role
Role Overview: In this role, you will help build LLM evaluation and training datasets that target realistic software engineering challenges. You will create verifiable software engineering tasks from public repository histories, using a synthetic approach with a human in the loop, and help expand dataset coverage across programming languages and difficulty levels.
Key Responsibilities:
- Analyze and triage GitHub issues across trending open-source libraries.
- Set up and configure code repositories, including Dockerization and environment setup.
- Evaluate unit test coverage and quality.
- Modify and run codebases locally to assess LLM performance in bug-fixing scenarios.
- Collaborate with researchers to identify challenging repositories and issues for LLMs.
- Lead a team of junior engineers on collaborative projects.
Qualifications:
- Minimum of 3 years of overall experience.
- Strong experience with Ruby.
- Proficiency in Git, Docker, and basic software pipeline setup.
- Ability to understand and navigate complex codebases.
- Comfortable running, modifying, and testing real-world projects locally.
- Experience contributing to or evaluating open-source projects is a plus.
Preferred Qualifications:
- Previous participation in LLM research or evaluation projects.
- Experience building or testing developer tools or automation agents.
Work Terms:
- Commitment of at least 4 hours per day and 20 hours per week, including 4 hours of overlap with PST. Time commitment options are 20, 30, or 40 hours per week.
- Contractor assignment (no medical benefits or paid leave).
- Location: Candidates must be based in India, Pakistan, Nigeria, Kenya, Egypt, Ghana, Bangladesh, Turkey, or Mexico.
Compensation: Competitive compensation commensurate with experience.
Eligibility: Open to candidates who meet the required qualifications and are based in one of the countries listed above.
Evaluation Process: The process consists of two interview rounds: a 60-minute technical interview, followed by a 30-minute technical and cultural discussion.