Senior SRE: AI/ML HPC Infrastructure - Toronto - Boson AI

    Boson AI
    Boson AI Toronto

    11 hours ago

    Full time
    Description

    A leading technology company in Toronto is seeking a Senior Site Reliability Engineer to manage and optimize their high-performance computing (HPC) cluster.

    The ideal candidate will have over 5 years of experience in SRE or HPC operations, proficiency in Linux, and expertise in Kubernetes and automation.

    This role involves deploying infrastructure-as-code solutions and supporting research teams. A competitive salary ranging from $150,000 to $250,000 per year is offered along with opportunities for continuous learning.

  • Toronto, ON

    We are seeking an experienced DevOps Engineer to join our team and champion the evolution of our HPC infrastructure. · This role is pivotal in transforming our configuration management into a robust, scalable GitOps architecture. ...

  • Toronto, Ontario · Remote job

    We are seeking a Senior GCP DevOps – HPC Engineer to support a large-scale Pharmaceuticals / Life Sciences initiative. · Lead the migration of on-premises SLURM-based HPC clusters to Google Cloud Platform. · Design, implement, and manage scalable and secure HPC infrastructure on ...

  • Toronto

    We're looking for a Senior Site Reliability Engineer to help us run one of the most exciting GPU clusters around. · We'll be hands-on with the full lifecycle of HPC infrastructure: planning, building, testing, deploying, and keeping everything running smoothly. · You'll also he ...

  • Toronto, Ontario

    GEMINI Systems is at the forefront of medical research and innovation. We are seeking an experienced DevOps Engineer to join our team and champion the evolution of our HPC infrastructure. · We operate a 100% Linux environment and are deeply committed to automating our infrastruct ...

  • Toronto

    We're looking for a Senior Site Reliability Engineer to help us run one of the most exciting GPU clusters around: our Toronto datacenter, packed with NVIDIA H100 and A100 GPUs, over 20PB of Ceph storage, terabit networking, and hundreds of servers. · Manage and optimize HPC cluster ope ...

  • Toronto, Ontario

    We are shaping the future of compute-intensive engineering, science, and AI. Our mission is to make HPC more accessible, efficient, and intelligent for users and administrators across the world's leading industries. We are hiring a Product Manager to lead our intuitive portal for ...

  • Toronto, Ontario

    We're hiring a Senior Sales Executive with deep experience selling servers, GPU systems, cloud/compute infrastructure, or data-center hardware into complex accounts. · ...

  • Toronto, Ontario

    Sell into one of the fastest-growing infrastructure markets in the world: GPU compute. Work directly with founders and engineering. Influence GTM and hardware roadmap. · ...

  • Product Manager

    1 month ago

    Toronto, Ontario

    We are seeking a Product Manager to evolve HPCWorks as next-generation workloads integrate AI and quantum computing. HPCWorks enables the world's most demanding compute workloads across industries including semiconductor design, automotive, aerospace, life sciences, and research ...

  • Toronto, Ontario

    This position involves building and operating mission-critical platform infrastructure using DevOps practices. You'll design, scale and automate platforms to improve productivity. · ...

  • Toronto

    We're seeking an experienced Network Engineer to design, build, and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. You'll work at the cutting edge of network technology, managing InfiniBand and ultra-high-speed Ethernet fabrics that ...

  • Toronto

    We're seeking an experienced Network Engineer to design, build, and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. You'll work at the cutting edge of network technology—managing InfiniBand and ultra-high-speed Ethernet fabrics th ...

  • Freelance AI

    1 day ago

    Toronto, Ontario · Remote job

    We are seeking a technical content writer skilled in AI hardware computing infrastructure who can create content translating complex topics into clear narratives for both technical and business audiences, supporting enterprise use cases across finance, private equity, hedge funds, healt ...

  • Toronto, Ontario

    We're seeking an experienced Network Engineer to design, build and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. · We'll work at the cutting edge of network technology—managing InfiniBand and ultra-high-speed Ethernet fabrics th ...

  • Toronto, Ontario

    This is one of the best opportunities out there for a passionate Linux infrastructure enthusiast: working on the newest and best tech around, with a chance to make your mark on a growing organisation. · ...

  • Toronto, Ontario

    We are seeking a Site Reliability Engineer (SRE) with experience spanning cloud and data center environments to drive infrastructure reliability, observability, and scalability. · 3-5 years in a Site Reliability Engineering (SRE) or DevOps role. · Strong software development back ...

  • Toronto, ON

    The Faculty of Arts & Science is the heart of Canada's leading university and one of the most comprehensive and diverse academic divisions in the world. The strength of Arts & Science derives from our combined teaching and research excellence in the humanities, sciences and socia ...

  • Toronto, Ontario

    We're looking for a senior engineer to help build, maintain and evolve the training framework that powers our frontier-scale language models. This role sits at the intersection of large-scale training, distributed systems, and HPC infrastructure. · Build and own the training framewo ...

  • Toronto, Ontario

    We are seeking a strategic and driven AI Solutions Specialist to join our national sales team. · ...

  • Toronto, Ontario

    NVIDIA is seeking a hardworking and motivated Senior Verification Engineer for the Tegra SoC Memory Subsystem IP verification team. · Develop verification infrastructure (testbenches, BFMs, checkers, monitors). · Craft and implement verification test plans. · ...

  • Toronto, Ontario

    We are hiring an Intern for our Software team at Lightmatter. We are a photonic computer company redefining what computers and human beings are capable of by building the engines that will power discoveries and drive progress in a sustainable way. · ...
