About this role
Role Overview
Join a dynamic team focused on enhancing the safety of AI systems through rigorous testing and evaluation. This role is essential in identifying vulnerabilities in AI models by simulating adversarial attacks, ultimately contributing to the development of safer AI technologies.
Key Responsibilities
- Red team conversational AI models and agents, including jailbreaks, prompt injections, misuse cases, bias exploitation, and multi-turn manipulation.
- Generate high-quality human data by annotating failures, classifying vulnerabilities, and flagging systemic risks.
- Apply structured methodologies by following taxonomies, benchmarks, and playbooks to ensure consistent testing.
- Document findings reproducibly, producing reports, datasets, and attack cases that customers can act on.
Qualifications
- Prior red-teaming experience, whether in AI adversarial work, cybersecurity, or socio-technical probing.
- Curiosity and an adversarial mindset, with a tendency to push systems to their limits.
- A structured approach to testing that relies on frameworks or benchmarks rather than ad hoc hacks.
- Strong communication skills to clearly explain risks to both technical and non-technical stakeholders.
- Adaptability to thrive in a fast-paced environment with diverse projects and customers.
Preferred Experience
- Adversarial machine learning, such as jailbreak datasets, prompt injection, RLHF/DPO attacks, and model extraction.
- Background in cybersecurity, including penetration testing, exploit development, and reverse engineering.
- Knowledge of socio-technical risks, including harassment/disinformation probing, abuse analysis, and conversational AI testing.
- A creative background in psychology, acting, or writing that fosters unconventional adversarial thinking.
What Success Looks Like
- Identifying vulnerabilities that automated tests may overlook.
- Delivering reproducible artifacts that make customer AI systems more robust.
- Expanding evaluation coverage to more scenarios, reducing surprises in production.
- Building customer trust in the safety of their AI systems through thorough adversarial probing.
Gain valuable experience in human-data-driven AI red teaming at the forefront of AI safety, while playing a direct role in making AI systems more robust, safe, and trustworthy.