Generalist – English & French (LLM Evaluator)
Job description
Why This Role Exists
Mercor partners with leading AI teams to improve the quality, usefulness, and reliability of general-purpose conversational AI systems.
This project focuses on evaluating and improving general chat behavior in large language models (LLMs). You will assess AI-generated responses across diverse topics and provide structured human feedback to ensure outputs are accurate, well-reasoned, and aligned with real-world expectations.
What You’ll Do
- Evaluate LLM-generated responses for clarity, correctness, and completeness.
- Conduct fact-checking using trusted public sources and verification tools.
- Annotate strengths, weaknesses, and factual inaccuracies.
- Assess reasoning quality, tone, and conversational alignment.
- Ensure outputs comply with system guidelines and expected behavior.
- Apply consistent annotations using structured taxonomies and evaluation rubrics.
Who You Are
- Bachelor’s degree holder.
- Native French speaker, or French proficiency at ILR 5 / CEFR C2.
- Fluent in English.
- Experienced user of large language models (LLMs).
- Strong writing skills, with the ability to articulate nuanced feedback.
- Highly detail-oriented and analytical.
- Comfortable working across diverse domains and topics.
- Strong college-level mathematics skills.
Nice-to-Have
- Experience with RLHF, model evaluation, or annotation workflows.
- Experience comparing multiple outputs and making fine-grained qualitative judgments.
- Familiarity with evaluation rubrics and benchmarking systems.
- Background in research, analytics, linguistics, or engineering.
What Success Looks Like
- You consistently identify factual inaccuracies and reasoning gaps.
- Your evaluation artifacts are clear, consistent, and reproducible.
- Your feedback leads to measurable improvements in AI response quality.
- Your evaluations help AI systems improve before public deployment.
Contract & Payment
- Independent contractor engagement.
- Fully remote with flexible schedule.
- Weekly payments via Stripe or Wise.
- Open to contributors based in Europe, Canada (Quebec), and the USA.
- $36.16 per hour.
About Mercor
Mercor partners with leading AI labs and enterprises to train frontier models using human expertise. Contributors collaborate with researchers to improve advanced AI systems used globally.