About this role
Role Overview
This position focuses on creating credible, large-scale benchmarks for blue-team activities such as detection engineering, threat hunting, incident triage, malware analysis, and incident response. The role is designed for a practitioner with firsthand blue-team experience who can translate that expertise into effective evaluation design.
Key Responsibilities
- Design and build benchmark tasks based on real Security Operations Center (SOC) and detection engineering workflows.
- Construct realistic evaluation environments, including multi-host networks, Active Directory, and cloud control planes, that go beyond simplistic scenarios.
- Define success criteria for blue-team AI reasoning and develop infrastructure to measure it reproducibly at scale.
Qualifications
- Hands-on blue-team experience in at least one of the following areas: detection engineering, threat hunting, incident response, or malware analysis.
- Ability to recognize effective analyst judgment and create evaluations that accurately test it.
- Strong scripting skills and experience in cloud and enterprise environments.
- Clear opinions on what is important in blue-team evaluation and why.
This is a remote, hourly position.
Compensation
Hourly rate ranges from $85 to $140.
Eligibility
Applicants must have relevant experience and skills as outlined in the qualifications.