LLM Evaluation, Benchmarking

Canada

1 week ago


Job summary

We're looking for an LLM Evaluation, Benchmarking & Experimentation Engineer to rigorously test our proprietary LLM API and build the infrastructure for systematic model improvement.

Similar jobs

  • Remote job $50 - $150 (USD) per hour

    I'm seeking a technical mentor to help deepen my understanding of LLM evaluation and benchmarking, with particular attention to high-stakes applications (e.g., mental health), while developing a generalizable framework for reasoning about model performance across domains. · ...

  • Remote job $12 - $55 (USD) per hour

    We are seeking a skilled Technical Creative Writing Benchmark Developer to help us benchmark large language models (LLMs) for 30 hours per week. · Mandatory skills: · Creative Writing · Content Writing · Search Engine Optimization · Writing ...

  • AI/ML Engineer

    2 weeks ago

    Remote job $85 - $125 (USD) per hour

    We're looking for an ML engineer to help us evaluate and benchmark language models using proprietary datasets. · Assess how existing models perform against our specialized datasets · ...

  • Montreal

    Write and refine prompts to guide model behavior in physics contexts. · Evaluate LLM-generated responses to physics-related queries for conceptual accuracy, mathematical correctness, and reasoning quality. · Conduct fact-checking using authoritative public sources and domain kn ...

  • Montreal

    Evaluate LLM-generated responses for effectiveness in answering user queries. Conduct fact-checking using trusted public sources and external tools. Generate high-quality human evaluation data by annotating response strengths, areas for improvement, and factual inaccuracies. · Eval ...

  • Montreal

    Mercor connects creative and technical talent with AI research labs. Evaluators are sought to assess responses generated by LLM models. · ...

  • Data Annotator

    2 weeks ago

    Montreal

    Evaluate LLM-generated responses on their ability to effectively answer user queries. Conduct fact-checking using trusted public sources and external tools. · ...

  • Data Annotator

    3 weeks ago

    Montreal

    Mercor connects elite creative and technical talent with leading AI research labs. · Bachelor's degree · Significant experience using large language models (LLMs) · ...

  • Montreal

    We are seeking a Research Physicist to join our team of elite creative and technical talent. As a Physics AI Evaluator, you will write and refine prompts to guide model behavior in physics contexts. · ...

  • Data Annotator

    1 month ago

    Montreal

    Evaluate LLM-generated responses for effectiveness in answering user queries. · Evaluate whether model responses align with expected conversational behavior and system guidelines. ...

  • Montreal

    Evaluate LLM-generated responses on their ability to effectively answer user queries. Conduct fact-checking using trusted public sources and external tools. Generate high-quality human evaluation data by annotating response strengths, areas for improvement, and factual inaccuraci ...

  • Content Reviewer

    2 weeks ago

    Montreal

    Mercor connects elite creative and technical talent with leading AI research labs. · Bachelor's degree · Native speaker or ILR 5/primary fluency (C2 on the CEFR scale) in French · ...

  • Montreal

    Mercor connects elite creative and technical talent with leading AI research labs. We are looking for an experienced Conversational AI Evaluator to join our team. · Evaluate LLM-generated responses for effectiveness in answering user queries. Conduct fact-checking using tru ...

  • Montreal

    Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco. · ...

  • Montreal

    Write and refine prompts to guide model behavior in engineering scenarios. · Evaluate LLM-generated responses to engineering-related queries for technical accuracy and applied reasoning. · Conduct fact-checking and verify technical claims using authoritative public sources and do ...

  • Montreal

    Mercor connects elite creative and technical talent with leading AI research labs. · ...

  • Montreal

    Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey. · ...

  • Linguist

    1 month ago

    Montreal

    Evaluate LLM-generated responses for effectiveness in answering user queries. · Conduct fact-checking using trusted public sources and external tools. · Generate high-quality human evaluation data by ...

  • Montreal

    Mercor connects elite creative and technical talent with leading AI research labs. · Evaluate LLM-generated responses for effectiveness in answering user queries. · Conduct fact-checking using trusted public sources and external tools. · Bachelor's degree · Native speaker or ILR ...

  • Montreal

    Job summary · Write and refine prompts to guide model behavior in financial contexts. · Evaluate LLM-generated responses to finance-related user queries for accuracy, reasoning quality, and cl ...