About this role
As an AI Quality Analyst, you will play a crucial role in evaluating a new personalization feature for Gemini. Your primary responsibility will be to assess how effectively the model utilizes information from your past Gemini conversations, Gmail, Google Search, and YouTube activity to enhance the relevance and helpfulness of its responses. This position requires a unique combination of creativity and analytical skills, as you will design prompts based on your personal experiences and rigorously analyze the quality of the model's personalized responses across various dimensions.
Key Responsibilities:
- Design and execute multi-turn conversational prompts (typically 1-5 turns) that leverage your personal information and experiences.
- Analyze responses for grounding issues, ensuring that claims made by the model are supported by evidence rather than by flawed inferences or hallucinations.
- Evaluate and stack-rank two model responses side-by-side (SxS) to determine which is more helpful, user-friendly, and enjoyable.
- Write clear, defensible rationales for your comparisons, explicitly referencing where issues or positive aspects occurred in the conversation.
- Extract and verify "Debug Info" from the model to ensure that chat summaries and data sources were utilized correctly.
- Maintain strict data hygiene by deleting evaluation conversations to prevent them from affecting your future chat history.
Qualifications:
- Chinese Proficiency: Ability to read and write in Chinese with a high degree of competence, as Chinese is the focus language for this project.
- Creative Prompt Engineering: Experience in designing creative, multi-turn starting prompts based on personal context to thoroughly test the model's capabilities.
- Strong Evaluation Acumen: Understanding of personalization concepts, including the ability to identify incorrect personalization, poor inferences, and forced connections.
- Meticulous Attention to Detail: Ability to review side-by-side (SxS) model responses and identify subtle differences in naturalness and over-narration.
- Excellent Written Communication: Superior ability to write clear, concise, and structured rationales for model rankings, explicitly referencing specific turn numbers.
- Feedback: Ability to provide constructive feedback and detailed annotations.
- Commitment: Required to work at least 4 hours per day and up to 40 hours per week, with 4 hours of overlap with PST.
- Engagement Type: Contractor
- Engagement Length: 3 months
The offered rate for this project is $15 per hour.
Eligibility:
- BS/BA degree or equivalent experience in a relevant field (e.g., Policy, Law, Ethics, Linguistics, Journalism, Computer Science, or a related analytical field).
- Experience in data annotation, AI quality evaluation, content moderation, or a related role is strongly preferred.