Role Overview: You will contribute to LLM evaluation and training datasets that target realistic software engineering challenges. The work centers on building verifiable software engineering tasks from public repository histories, using a synthetic approach with humans in the loop, and on expanding dataset coverage across programming languages and difficulty levels.

Key Responsibilities:

Analyze and triage GitHub issues across trending open-source libraries.
Set up and configure code repositories, including Dockerization and environment setup.
Evaluate unit test coverage and quality.
Modify and run codebases locally to assess LLM performance in bug-fixing scenarios.
Collaborate with researchers to identify repositories and issues that challenge LLMs and to design tasks around them.
Lead a team of junior engineers in collaborative project efforts.

Qualifications:

Minimum of 3 years of overall experience in software engineering.
Strong proficiency in Python or a similar programming language.
Experience with Git, Docker, and basic software pipeline setup.
Ability to understand and navigate complex codebases effectively.
Comfortable running, modifying, and testing real-world projects locally.
Experience contributing to or evaluating open-source projects is a plus.

Nice to Have:

Previous involvement in LLM research or evaluation projects.
Experience in building or testing developer tools or automation agents.

Work Terms:

Commitment of at least 4 hours per day and a minimum of 20 hours per week, including 4 hours of overlap with PST.
Contractor assignment (no medical/paid leave).
Contract duration is 3 months, with an expected start date next week.
Open to candidates located in India, Pakistan, Nigeria, Kenya, Egypt, Ghana, Bangladesh, Turkey, and Mexico.

Compensation: Competitive compensation based on experience.

Eligibility: Candidates must be located in the specified countries and meet the outlined qualifications.

Perks of Freelancing:

Fully remote work environment.
Opportunity to engage in cutting-edge AI projects with leading LLM companies.

Evaluation Process: The process consists of two interview rounds: a 60-minute technical interview followed by a 30-minute technical and cultural discussion.

Senior Software Engineer, Python for LLM Evaluation