About this role
This role offers a unique opportunity to contribute to a large-scale benchmark project aimed at assessing the capabilities of advanced AI systems in solving complex scientific and engineering problems. As a task designer, you will create original, graduate-level computational challenges that evaluate AI's ability to utilize real scientific software for research-level tasks, including running simulations, interpreting results, designing experiments, and extracting hidden insights from data.
Key Responsibilities
- Design problems that require proficient use of specialized scientific software, ensuring they test AI's ability to perform complex, multi-step workflows.
- Create challenges where AI must strategically plan queries or experiments to uncover non-visible information, necessitating careful measurement and analysis.
- Engage in a testing loop with state-of-the-art AI models, refining problems to achieve the appropriate level of difficulty.
We are particularly interested in candidates with extensive, hands-on experience in:
- Structural & Mechanical Engineering: familiarity with scikit-fem or similar finite element libraries for beam analysis, elasticity problems, and computational mechanics. Knowledge of Timoshenko beam theory, mesh convergence studies, or variational formulations is advantageous.
- Experience with other specialized software in this domain will also be considered.
The ideal candidate possesses graduate-level expertise (MS or PhD preferred) in the relevant domain, with practical experience using the specified tools. You have a proven track record of writing code with these libraries to solve real research problems, understanding their limitations and the nuances that make a problem genuinely challenging.
Moreover, you approach problem design like a puzzle creator, focusing on challenges that require intelligent reasoning rather than mere computation. You craft problems where multiple plausible solutions exist, but only thorough analysis leads to the correct answer.
Requirements
- Graduate-level training in a relevant STEM field (MS, PhD, or equivalent research experience).
- Demonstrated proficiency with at least one of the specified scientific software libraries, evidenced by research publications, open-source contributions, or professional experience.
- Strong Python programming skills for writing problem setups, oracle functions, and solution validators.
- Ability to work independently and iteratively refine problem designs based on feedback.
- Comfortable operating in a Linux/terminal environment with remote compute sandboxes.
- Availability for at least 15 to 20 hours per week.
Preferred Qualifications
- Experience across multiple listed domains or tools.
- Familiarity with benchmark or evaluation design.
- Background in scientific teaching or exam/problem-set design.
- Experience with computational reproducibility and containerized environments.
Please note that this application includes a coding assessment as part of the evaluation process.