AI SRE - Montréal, QC
1 day ago

Job description
Montréal, Quebec H1A 0A1 Posted February 21st, 2026
Job Type: Full Time
Job Category: IT
AI SRE / AI Ops engineer
Montreal, QC - Hybrid
Skills Required:
- Production experience in SRE / Infrastructure / ops for large-scale systems
- Strong programming/scripting skills (Python, Go, Java, or equivalent)
- Deep experience with containerization (Docker), orchestration (Kubernetes, etc.)
- Infrastructure-as-code (Terraform, Helm, CloudFormation, Ansible, etc.)
- Familiarity with GPU / AI compute clusters, high-performance data storage, and distributed architectures
- Experience with monitoring / observability / logging / alerting tools (Prometheus, Grafana, ELK / EFK, Datadog, etc.)
- Networking & systems engineering knowledge (TCP/IP, DNS, routing, load balancing, distributed storage)
- Solid experience in capacity planning, performance tuning, scaling, and incident response
- Demonstrated ability to lead RCAs, deploy fixes, and drive reliability improvements
- Experience in regulated environments (financial services, compliance, audit, security) is a strong plus
- Excellent communication, documentation, and cross-team collaboration skills
- Proven track record of reducing operational toil via automation
Required Skills
DEVOPS ENGINEER
SENIOR EMAIL SECURITY ENGINEER
Similar jobs
The Senior SRE Engineer is responsible for the design, evolution, and monitoring of the platforms. · Your role is foundational and rests on three major pillars: complete mastery of observability. · ...
1 month ago
The Senior SRE Engineer is responsible for the design, evolution, and monitoring of the platforms. The role is foundational and rests on three major pillars: complete mastery of observability, Kubernetes (OKD) expertise, and a solid understanding of network infrastructure Junipe ...
1 month ago
We are looking for an AI SRE / AI Ops engineer in Montreal, QC. · Production experience in SRE / Infrastructure / ops for large-scale systems · Familiarity with GPU / AI compute clusters, high-performance data storage, and distributed architectures · ...
2 days ago
We're looking for an AI SRE / AI Ops engineer to join our team in Montreal, QC. The ideal candidate will have experience in production environments, strong programming skills, and knowledge of containerization and orchestration tools. · This is a full-time position that requires ...
2 days ago
Mandatory skills include experience working on Google Cloud and using GCP Data stack with Terraform, SQL and Python. · Preferred skills include any experience with automation and leading or working in SRE/Ops teams. · ...
3 weeks ago
Job Title: Azure SRE · Location: Montreal, QC · Duration: 12-month contract · Scheduled Type: Onsite · Job Description: · Strong focus on Python – this is the top required skill · Must have solid understanding of Python fundamentals · Knowledge of data structures and object-orien ...
1 day ago
We are seeking a skilled Site Reliability Engineer (SRE) to enhance the reliability, scalability and performance of our systems and applications. · ...
1 month ago
The Senior SRE Engineer is responsible for the design, evolution, and monitoring of the platforms. Your role is foundational... · ...
1 month ago
The ideal candidate will develop quality software working with public cloud service provider (CSP) infrastructure across different Public Cloud areas and is proficient with various Object-Oriented development tools and techniques. · The position requires attention to detail, coup ...
1 week ago
We are seeking a skilled Site Reliability Engineer (SRE) to enhance the reliability, scalability, and performance of our systems and applications. · Automation & Configuration Management · ...
1 month ago
The ideal candidate will develop quality software working with public cloud service provider (CSP) infrastructure across different Public Cloud areas. · ...
1 week ago
The ideal candidate will develop quality software working with public cloud service provider (CSP) infrastructure across different Public Cloud areas. · Hands-on development and design of Python applications. · Enhance and integrate the CSP automation framework with in-house tool ...
1 week ago
The ideal candidate will develop quality software working with public cloud service provider (CSP) infrastructure across different Public Cloud areas and is proficient with various Object-Oriented development tools and techniques. The individual should be experienced with Python a ...
1 week ago
MaintainX is the world's leading platform for asset management and work intelligence. We are a modern, cloud-based IoT tool for the reliability, safety, and operations of physical equipment and facilities. · Vo ...
1 week ago
The DevOps/SRE works within the CGI squad dedicated to the client's Assurance Numérique project. · Set up, configure, and maintain the CI/CD pipelines for the APIs and services developed. · Support the automation of deployments in the DEV, QA and ...
1 month ago
WorkJam solves common problems faced by global frontline enterprises through scheduling tools, task management, communication, and learning within a single app, leading to recognition as a 2024 World Future Award winner for Innovation in Workforce Management. We're proud of our dedicated t ...
1 week ago
Job summary: To foster agility and DevOps practices within the bank. · ...
1 month ago
Position: Site Reliability Engineer (SRE) · The Site Reliability Engineer (SRE) is a Java developer with a strong DevOps focus (pipelines, monitoring-alerting-tracing as-code) and experience with GitHub Actions and Argo CD · Duties and responsibilities: Ensure availability and rés ...
1 month ago
Join our Azure Platform Squad as a Cloud SRE Specialist in Enterprise Computing to work on public cloud projects with opportunities to work on both Azure and AWS in a global financial organization. · Collaborate with vendors to develop and deploy Cloud services to meet customer e ...
1 month ago
SRE (Site Reliability Engineer) with experience in DevOps (pipelines, monitoring-alerting-tracing), Java, and GitHub Actions/Argo CD. · Service availability and resilience: fault-tolerance mechanisms, load balancing, redundancy. · ...
1 month ago