Member of Technical Staff, Evaluation - Montreal - Onixai
Description
You will own the quality bar for every AI agent Onix ships.
Onix is building Personal Intelligence: AI that belongs to you, protects your data, and helps you grow with guidance from real experts. We work with world-class physicians, researchers, and practitioners. Their knowledge gets turned into AI agents that users trust with real decisions about their health and performance. If an agent hallucinates, gives outdated advice, or drifts from what the expert actually believes, that is a serious problem. Your job is to make sure it does not happen.
You will build the evaluation infrastructure that catches failures before users do. You will define what "aligned" means for an agent that represents a specific human expert, not a generic chatbot. This is not traditional ML safety work. This is applied alignment in a domain where accuracy has real consequences and the ground truth is a living, breathing expert with opinions.
The hard part is not writing evals. The hard part is knowing what to eval for. You need to understand the difference between a confident wrong answer and a nuanced right one. Between an expert's actual position and a plausible-sounding summary. Between safe and useful.
What You Will Do
- Design and build evaluation frameworks that measure agent accuracy, faithfulness, safety, and voice fidelity
- Create automated and human-in-the-loop pipelines for continuous agent quality assessment
- Define alignment criteria specific to expert-backed AI agents, not generic LLM benchmarks
- Work with the agent building team to identify failure modes and build systematic defenses against them
- Own the metrics that tell us whether an agent is ready to ship
Who You Are
You have spent real time thinking about how to measure whether an LLM is actually doing what it should. Not in the abstract, not as a research topic you follow on Twitter, but as something you have built systems around. You understand that evaluation is the hardest part of shipping reliable AI, and you are frustrated by how many teams treat it as an afterthought.
You are technical enough to build eval pipelines in code and conceptual enough to define what "good" looks like when the answer is not in a test set. You have opinions about where RLHF falls short, why automated evals need human calibration, and how to measure things like tone and nuance that do not fit neatly into a metric.
You have:
- Deep experience with LLM evaluation: designing benchmarks, building scoring pipelines, analyzing failure modes
- Strong Python skills and familiarity with ML tooling (model APIs, embedding systems, vector stores, evaluation frameworks)
- Understanding of alignment techniques: RLHF, constitutional AI, red teaming, adversarial evaluation
- Experience working with retrieval-augmented generation systems and evaluating grounded outputs
- Comfort reading research papers and translating ideas into practical systems
You are:
- Obsessed with correctness. You lose sleep over an agent giving subtly wrong advice
- A systems thinker. You build frameworks, not one-off tests
- Skeptical of benchmarks that do not measure what matters. You ask "what are we actually testing?"
- Self-directed. You identify the gaps in quality before anyone tells you to look
This Role is NOT For You If
- You want to publish papers, not ship product
- Your evaluation experience is limited to running standard benchmarks on public models
- You need a large research team to be productive
- You are not comfortable making judgment calls about quality in ambiguous domains
- You want a remote job
What Success Looks Like
- Every agent ships with a clear evaluation report and a quantified confidence level
- Failure modes are caught in evaluation, not by users
- The eval framework scales to new experts without starting from scratch each time
- You can explain to a non-technical team member exactly why an agent passed or failed review
- The quality of our agents is measurably better than anything built with prompt engineering alone
How to Apply
Submit your application and answer:
- Describe an evaluation system you built for an LLM-based product. What did you measure, and what did the metrics miss?
- How would you evaluate whether an AI agent faithfully represents a specific human expert's views, not just general domain knowledge?
- What is the most common way you have seen teams fool themselves into thinking their AI is working well when it is not?
