Senior Software Engineer, Python for LLM Evaluation
from $40/hour
About this role
Role Overview: You will help develop LLM evaluation and training datasets that target realistic software engineering challenges. The work centers on building verifiable software engineering tasks from public repository histories, using a synthetic approach with a human in the loop, and on expanding dataset coverage across programming languages and difficulty levels.
Key Responsibilities:
- Analyze and triage GitHub issues across trending open-source libraries.
- Set up and configure code repositories, including Dockerization and environment setup.
- Evaluate unit test coverage and quality.
- Modify and run codebases locally to assess LLM performance in bug-fixing scenarios.
- Collaborate with researchers to identify repositories and issues that are challenging for LLMs.
- Lead a team of junior engineers in collaborative project efforts.
Qualifications:
- Minimum of 3 years of overall experience in software engineering.
- Strong proficiency in Python or a similar programming language.
- Experience with Git, Docker, and basic software pipeline setup.
- Ability to understand and navigate complex codebases effectively.
- Comfortable running, modifying, and testing real-world projects locally.
- Experience contributing to or evaluating open-source projects is a plus.
Nice to Have:
- Previous involvement in LLM research or evaluation projects.
- Experience in building or testing developer tools or automation agents.
Work Terms:
- Commitment of at least 4 hours per day and 20 hours per week, including 4 hours of overlap with PST.
- Contractor assignment (no medical/paid leave).
- Contract duration is 3 months, with an expected start date next week.
- Open to candidates located in India, Pakistan, Nigeria, Kenya, Egypt, Ghana, Bangladesh, Turkey, and Mexico.
Compensation: Competitive compensation based on experience.
Eligibility: Candidates must be located in the specified countries and meet the outlined qualifications.
Perks of Freelancing:
- Fully remote work environment.
- Opportunity to engage in cutting-edge AI projects with leading LLM companies.
Evaluation Process: Two rounds of interviews: a 60-minute technical interview followed by a 30-minute technical and cultural discussion.