Operate, monitor, and maintain the infrastructure supporting GenAI applications (training, inference, feature store, data ingestion, model serving)
Design and build automation for core platform capabilities, reducing manual toil
Develop and maintain infrastructure-as-code (IaC) for provisioning and managing compute, storage, network, GPU clusters, Kubernetes / container orchestration, etc.
Establish, monitor, and enforce SLOs/SLIs/SLAs, error budgets, alerting, and dashboards
Lead incident response, root cause analysis (RCA), postmortems, and systemic remediation
Perform capacity planning, scaling strategies, workload scheduling, and resource forecasting
Optimize cost vs. performance tradeoffs in large-scale compute environments
Harden systems for security, compliance, auditability, and data governance
Collaborate across teams (cloud engineers, data engineers, infrastructure, security) to ensure safe deployment, rollout, rollback, and integration of new systems
Define disaster recovery (DR) strategies, backup/restore practices, and fault tolerance mechanisms
Maintain runbooks, operational playbooks, documentation, and training materials
Participate in on-call rotations and respond to production incidents 24/7 as needed
Continuously evaluate and integrate new tools, frameworks, or technologies to enhance platform reliability
Production experience in SRE / Infrastructure / ops for large-scale systems
Strong programming/scripting skills (Python, Go, Java, or equivalent)
Deep experience with containerization (Docker), orchestration (Kubernetes, etc.)
Infrastructure-as-code (Terraform, Helm, CloudFormation, Ansible, etc.)
Familiarity with GPU / AI compute clusters, high-performance data storage, and distributed architectures
Experience with monitoring / observability / logging / alerting tools (Prometheus, Grafana, ELK / EFK, Datadog, etc.)
Networking & systems engineering knowledge (TCP/IP, DNS, routing, load balancing, distributed storage)
Solid experience in capacity planning, performance tuning, scaling, and incident response
Demonstrated ability to lead RCAs, deploy fixes, and drive reliability improvements
Experience in regulated environments (financial services, compliance, audit, security) is a strong plus
Excellent communication, documentation, and cross-team collaboration skills
Proven track record of reducing operational toil via automation
Site Reliability Engineer – GenAI Platform - MONTREAL & MIRABEL - Astra North Infoteck Inc.
Description
Experience: 8+ years as a Site Reliability Engineer or in a similar role, with hands-on experience supporting IaaS platforms and with networking and systems engineering knowledge.
Roles and Responsibilities:
Skills: