Senior Site Reliability Engineer - Toronto - Tubi

    Tubi
    Tubi Toronto

    4 days ago

    $115,000 - $170,000 (CAD) per year *
    Description

    About The Role

    Site Reliability Engineering (SRE) at Tubi is not a traditional operations team. We are a software engineering organization that applies a developer's mindset and toolkit to the challenges of building and running large-scale, distributed systems. Our mission is to engineer resilience from the ground up, enabling our product teams to innovate rapidly while ensuring our users have a stellar experience. We own the availability, latency, performance, and capacity of our platform, and we achieve our goals through a culture of data-driven decision-making, blameless learning, and relentless automation.

    What You'll Do

    • System Architecture & Design: Design, build, and maintain scalable, highly available, and fault-tolerant distributed systems. Partner with development teams as a reliability consultant, reviewing designs and influencing architectural decisions to ensure new services are built with reliability, observability, and performance as core principles, not afterthoughts.
    • Automation & Software Development: Write robust, performant, and maintainable code to automate operational tasks and CI/CD pipelines. Build the internal tools, libraries, and frameworks that enable engineering teams to self-service their observability needs, reducing cognitive load and increasing their velocity.
    • Incident Response & Post-Mortem Analysis: Participate in a 24/7 on‑call rotation, acting as a key technical leader and incident commander during critical service disruptions. Conduct deep, blameless root cause analyses (RCAs) that go beyond immediate fixes to identify and address systemic issues. Drive the implementation of corrective actions to prevent the recurrence of incidents.
    • Performance & Capacity Planning: Proactively monitor, measure, and optimize system performance to ensure low latency and high efficiency. Gather and analyze metrics from operating systems and applications to assist in performance tuning and fault finding. Analyze usage patterns and historical data to forecast capacity needs, ensuring our platform stays ahead of customer demand.

    Your Background

    • Bachelor's degree in Computer Science, a related technical field, or equivalent practical experience.
    • 5+ years of professional experience in a Site Reliability Engineering, DevOps, or Software Engineering role with a focus on infrastructure and operations.
    • Strong programming proficiency in one or more high-level languages such as Rust, Go, Python, or TypeScript. You should be comfortable writing, testing, and deploying production-grade code.
    • Deep knowledge of AWS services (especially networking, IAM, EKS, ALBs/NLBs, Route 53, CloudWatch).
    • Proven experience with Kubernetes in production (EKS preferred), including service exposure, networking, and availability engineering.
    • A solid understanding of Linux/Unix operating systems, networking fundamentals (TCP/IP, DNS, HTTP), and the architecture of modern distributed systems.

    Preferred Qualifications (Nice‑to‑Haves)

    • Experience building and managing large‑scale monitoring and observability systems using tools such as Datadog, Prometheus, and Grafana.
    • Expertise in designing and implementing CI/CD pipelines using tools such as GitHub Actions and ArgoCD.
    • Experience with distributed storage technologies (e.g., Amazon S3) and databases (e.g., PostgreSQL, ScyllaDB, ClickHouse).
    • Contributions to open‑source projects in the SRE, DevOps, or cloud‑native ecosystem.

    The AI Mandate: Building the Future of Observability with AI

    Responsibilities

    As a Senior SRE, you will be at the forefront of applying AI to solve our most critical reliability challenges. This is a hands‑on software development role where the "product" you build is an intelligent, automated reliability platform. Your responsibilities will include:

    • Building AI‑Driven Automation: Building and integrating solutions that leverage our AIOps platform. This involves writing the code that consumes signals from the AI system, correlates disparate data sources, automates responses to AI‑detected anomalies, and builds self‑healing systems triggered by predictive alerts. You will transform AI insights into concrete reliability improvements.
    • Leveraging AI for Code Development: Utilizing AI‑assisted coding tools (e.g., Claude Code, Cursor) as a force multiplier in your daily workflow. You will leverage these assistants to write high‑quality automation scripts, Terraform modules, Kubernetes manifests, and observability dashboards faster and more efficiently, while applying your expertise to validate and refine their output.
    • Enriching our AI Knowledge Base: Developing and enriching our observability platform's internal knowledge base. You will be responsible for creating and documenting high‑quality runbooks and procedural guides that can be ingested and used by AI assistants to provide context‑aware troubleshooting guidance to the on‑call engineer during an incident.
    • Applying Data Science to Reliability: Treating reliability as a data science problem. You will analyze vast sets of telemetry data to identify trends, build predictive models for system capacity, and proactively identify performance bottlenecks and potential failure modes before they can impact our users.

    Pursuant to local pay disclosure requirements, the annual pay range for this role is listed below. The final offer amount depends on education, skills, experience, and location.

    This role is also eligible for an annual discretionary bonus, long‑term incentive plan, and various benefits including medical/dental/vision, insurance, vacation/paid time off and other benefits in accordance with applicable plan documents.

    Toronto, Canada

    $137,200 - $196,000 CAD

    Benefits

    Tubi Media Group is a division of Fox Corporation, and the FOX Employee Benefits summarized here cover the majority of employee benefits. The distinctions below outline the differences between the Tubi and FOX benefits:

    For all salaried employees, in lieu of the FOX Vacation policy, Tubi offers a Flexible Time Off Policy to manage all personal matters.

    For all full‑time, regular employees, in lieu of FOX Paid Parental Leave, Tubi offers a generous Parental Leave Program, which allows parents twelve (12) weeks of paid bonding leave (top up in Canada) within the first year of birth, adoption, surrogacy, or foster placement of a child in addition to applicable government leave program(s) and FOX's short‑term disability policy (if applicable). This time is 100% paid through a combination of any applicable government leaves and wage‑replacement programs in addition to contributions made by Tubi.

    For all full‑time, regular employees, Tubi offers a monthly wellness reimbursement.

    About Tubi

    Boldly built for every fandom, Tubi is a free streaming service that entertains over 100 million monthly active users. Tubi offers the world's largest collection of Hollywood movies and TV shows, thousands of creator‑led stories and hundreds of Tubi Originals made for the most passionate fans. Headquartered in San Francisco and founded in 2014, Tubi is part of Tubi Media Group, a division of Fox Corporation.

    We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, gender identity, disability, protected veteran status, or any other characteristic protected by law. We will consider for employment qualified applicants with criminal histories consistent with applicable law.


    * This salary range is an estimation made by beBee
