Site Reliability Engineer, AI/ML Infrastructure - Toronto
3 days ago

Job summary
We're looking for a Senior Site Reliability Engineer to help us run one of the most exciting GPU clusters around: our Toronto datacenter packed with NVIDIA H100 and A100 GPUs, over 20PB of Ceph storage, terabit networking, and hundreds of servers.
- Manage and optimize HPC cluster operations
- Deploy and maintain infrastructure-as-code solutions
Similar jobs
AI/ML Infrastructure Engineer
1 month ago
The team is dedicated to developing a robust platform for training and serving machine learning models. This platform streamlines the productionization of AI and ML models by mitigating the incidental complexities involved in creating backend services for serving predictions and ...
AI/ML Infrastructure Engineer
1 month ago
The team is dedicated to developing a robust platform for training and serving machine learning models. · This platform streamlines the productionization of AI and ML models by mitigating the incidental complexities involved in creating backend services for serving predictions an ...
Network Engineer, AI/ML Infrastructure
1 month ago
We're seeking an experienced Network Engineer to design, build and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. · We'll work at the cutting edge of network technology—managing InfiniBand and ultra-high-speed Ethernet fabrics th ...
Network Engineer, AI/ML Infrastructure
3 days ago
We're seeking an experienced Network Engineer to design, build, and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. · Configure and maintain InfiniBand and high-speed Ethernet fabrics · Optimize network performance for RDMA, GPU-t ...
Network Engineer, AI/ML Infrastructure
1 month ago
We're seeking an experienced Network Engineer to design, build, and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. You'll work at the cutting edge of network technology, managing InfiniBand and ultra-high-speed Ethernet fabrics that ...
Network Engineer, AI/ML Infrastructure
1 month ago
We're seeking an experienced Network Engineer to design, build and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. · Configure and maintain InfiniBand and high-speed Ethernet fabrics · Optimize network performance for RDMA, and GP ...
Network Engineer, AI/ML Infrastructure
3 days ago
We're seeking an experienced Network Engineer to design, build and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. · ...
Network Engineer, AI/ML Infrastructure
1 month ago
We're seeking an experienced Network Engineer to design, build and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. You'll work at the cutting edge of network technology, managing InfiniBand and ultra-high-speed Ethernet fabrics that ...
We're looking for a Senior Site Reliability Engineer to help us run one of the most exciting GPU clusters around—our Toronto datacenter packed with NVIDIA H100 and A100 GPUs, · over 20PB of Ceph storage, terabit networking, and hundreds of servers. · Manage and optimize HPC clust ...
Site Reliability Engineer, AI/ML Infrastructure
4 weeks ago
We're looking for a Senior Site Reliability Engineer to help us run one of the most exciting GPU clusters around. · We'll be hands-on with the full lifecycle of HPC infrastructure: planning, building, testing, deploying, · and keeping everything running smoothly. · You'll also he ...
Site Reliability Engineer, AI/ML Infrastructure
2 weeks ago
We're looking for a Senior Site Reliability Engineer to help us run one of the most exciting GPU clusters around—our Toronto datacenter packed with NVIDIA H100 and A100 GPUs. · ...
AI/ML Architect
1 month ago
Iris's Fortune 100 direct client is looking for an AI/ML Architect to define AI/ML architecture roadmaps aligned with business objectives. · ...
AI/ML Cloud
4 weeks ago
We are seeking an experienced Cloud Engineer with expertise in designing and building robust infrastructure on AWS. · Work with a team of cloud engineers focused on designing and building robust infrastructure on AWS · Architect scalable low-latency systems, implement capacity p ...
AI/ML Architect
4 weeks ago
Iris's Fortune 100 direct client is looking for an AI/ML Architect in Toronto. · ...
AI/ML Engineer
1 month ago
We are seeking a highly skilled and motivated AI/ML Engineer with expertise in Machine Learning, Statistics, and Generative AI to join our team. · As an AI/ML Engineer you will be responsible for developing cutting-edge AI/ML solutions, · particularly in the field of Anomaly Dete ...
AI/ML Engineer
1 week ago
We are looking for an AI/ML Engineer with hands-on experience in Generative AI, intent classification, and prompt engineering. · ...
AI/ML Engineer
3 weeks ago
We are seeking a talented AI / Machine Learning Engineer to design, build, and deploy scalable machine learning solutions that drive real-world business impact. · This is a long-term contract opportunity with onsite work mode. ...
AI/ML Architect
1 month ago
Our client, one of the leading banks, is looking to hire for the following role. Please share your resume if interested. · ...
AI ML Architect
1 week ago
We are seeking an AI Architect to lead the design of next-generation enterprise AI systems centered on Agentic AI. The architect will drive standards for multi-agent orchestration, tool calling, governance, and design patterns across business and engineering teams. · Define enterprise A ...
AI/ML Engineer
4 weeks ago
+ Develop and implement AI solutions for investment risk analytics · + Extract, transform, and analyze data from SQL and large risk data sets · + Build and operationalize LLM-based and AI-assisted analytics solutions · We are seeking an AI/ML Developer to join a key project immedi ...