Literary Critic for LLM Benchmarks
1 month ago

Job description
Similar jobs
We're looking for an LLM Evaluation, Benchmarking & Experimentation Engineer to rigorously test our proprietary LLM API and build the infrastructure for systematic model improvement. · ...
1 month ago
Expert On-Device LLM Engineer needed for STT Chunking
Overview: · We are looking for an experienced AI/ML Engineer specializing in On-Device inference (Edge AI) to optimize the AI pipeline of our React Native tablet application. · Currently, our app uses a Speech-to-Text (STT) system that feeds transcriptions into a local LLM (gemma ...
2 weeks ago
We are seeking a skilled professional to create a comprehensive Discovery and Architecture Document for LLM Robot Workers. · This project will include cost and quality benchmarking, · along with guidelines for secure deployment. · The ideal candidate will have experience in LLM t ...
1 month ago
Creative Writer with Statistical Expertise Needed for LLM Evaluation
We are seeking a talented creative writer with a strong statistical background to develop benchmarks for evaluating large language model output. · ...
1 month ago
Data Scientist – II (4+ Years Experience) · Mandatory Requirements (Non-Negotiable): · * Strong background in NLP, LLMs, prompt engineering, and deep learning · * Strong proficiency in Python · * Experience with LangChain, PyTorch, and Pandas · * Hands-on experience with LLMs (an ...
1 week ago
AI/ML Data Scientist — Compliance AI Evaluation
We need a data scientist to prove our AI works — with data, not marketing. You'll design and run evaluations that compare ZeroDrift's compliance detection against raw LLMs (GPT-4, Claude). ...
1 month ago
Bond Studio makes software that lets users capture their space with their phone and see a visualization of how that room could look if it were remodeled. Users can browse products, select them and see them visualized in their space. Part of this experience is search where users c ...
1 month ago
We are seeking an experienced Legal AI Evaluator to join our team at Mercor. The ideal candidate will have a strong background in law and experience working with large language models. · ...
2 months ago
About The Job · Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey. · Position: Legal AI Evaluator · Type: ...
6 days ago
We are building a serious AI product focused on transforming real-world business conversations into structured intelligence insights and automation. · AI pipelines that analyze recorded conversations (speech, text, structured insights) · LLM-based systems for summarization, classifi ...
1 month ago
The developer will be working within a machine learning team/squad. The team is working on developing Artificial Intelligence solutions including ML and Gen AI. The candidate should be familiar with python development and prompt engineering. The candidate should be able to work w ...
1 day ago
We are seeking a highly skilled GenAI / AI Engineer to design, build, and deploy cutting-edge generative AI solutions that address real-world business challenges. · ...
1 month ago
This is a freelance opportunity to build an AI workflow for strategy reports. · The client has existing assessment logic and narrative frameworks in place. · ...
1 month ago
Write and refine prompts to guide model behavior in engineering scenarios. · Evaluate LLM-generated responses to engineering-related queries for technical accuracy and applied reasoning. · Conduct fact-checking and verify technical claims using authoritative public sources and do ...
1 month ago
Evaluate LLM-generated responses for effectiveness in answering user queries. Conduct fact-checking using trusted public sources and external tools. Generate high-quality human evaluation data by annotating response strengths, areas for improvement, and factual inaccuracies. · Eval ...
1 month ago
About The Job · Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey. · Position: AI Model Evaluator · Type: ...
2 weeks ago
Mercor connects elite creative and technical talent with leading AI research labs. · Bachelor's degree · Native speaker or ILR 5/primary fluency (C2 on the CEFR scale) in French · ...
1 month ago
Evaluate LLM-generated responses on their ability to effectively answer user queries. Conduct fact-checking using trusted public sources and external tools. · ...
1 month ago
Mercor connects elite creative and technical talent with leading AI research labs. · Bachelor's degree · Significant experience using large language models (LLMs) · ...
1 month ago