Lead Site Reliability Engineer - Toronto - Movable Ink

    Movable Ink
    Movable Ink Toronto

    18 hours ago

    Description

    Movable Ink scales content personalization for marketers through data-activated content generation and AI decisioning. The world's most innovative brands rely on Movable Ink to maximize revenue, simplify workflow and boost marketing agility. Headquartered in New York City with close to 600 employees, Movable Ink serves its global client base with operations throughout North America, Central America, Europe, Australia, and Japan.

    As one of our Lead Site Reliability Engineers, you will combine hands‑on technical expertise with strategic technical leadership across infrastructure and software development. You will own the design and evolution of major systems within our multi‑cloud, multi‑region, active‑active content serving platform, which serves upwards of 25 billion requests daily. Through a combination of architectural vision, cross‑team collaboration and mentorship, you will help drive the reliability initiatives and define the technical strategy that scales our platform to 50 billion requests per day and beyond.

    Responsibilities

    • Define and drive the automation strategy for infrastructure tooling, establishing standards that minimize manual work, increase performance and reduce the frequency and severity of incidents
    • Own the design, reliability and evolution of core platform applications, mentoring team members on best practices and ensuring systems meet long‑term business objectives
    • Architect and lead the logging platform strategy, driving its design and balancing availability, retention and cost optimization
    • Establish capacity planning and performance management frameworks, proactively identifying scaling opportunities and guiding teams through complex troubleshooting scenarios
    • Lead cross‑functional reliability initiatives with SRE and service engineering teams, influencing architectural decisions and championing practices that ensure resilient service delivery
    • Demonstrate a high level of autonomy in anticipating, identifying, and addressing systemic weaknesses and opportunities for platform improvement without direct supervision

    Qualifications

    • Proven track record in Site Reliability or Software Engineering, designing, building, and owning scalable, resilient services with a focus on long‑term reliability strategy
    • Deep expertise in architecting and operating complex distributed systems such as Apache Pulsar, Apache Kafka, Grafana Loki, ScyllaDB/Cassandra, with the ability to guide teams through distributed system challenges
    • Experience designing and owning automation strategies to manage services at scale, with expertise in establishing performance analysis frameworks and mentoring others on diagnostics and resolution
    • Deep, hands‑on experience (6+ years) in Site Reliability or Software Engineering, specifically leading and shaping multi‑cloud architecture and strategy (AWS and GCP)
    • Experience architecting and leading large‑scale observability platforms, including defining observability standards and SLO frameworks. We use Prometheus and Thanos with Grafana Alloy, Loki and Tempo
    • Experience leading on‑call excellence, including driving improvements to monitoring and alerting strategies, automating runbooks and mentoring team members on incident response best practices. Every member of the SRE team does a week‑long on‑call rotation
    • Expert‑level proficiency with infrastructure as code, including defining IaC standards and patterns across teams. We use Terraform and Chef
    • Advanced Kubernetes expertise, including cluster architecture design, multi‑tenancy strategies, and guiding teams on container orchestration best practices. We use EKS and GKE
    • Proficiency in multiple programming languages with the ability to design and review code that meets reliability standards. We use NodeJS, Golang, Ruby, Python and shell scripting
    • Advanced Linux systems expertise, with the ability to diagnose complex system‑level issues and mentor others on performance tuning and troubleshooting

    Studies have shown that women, communities of color, and historically underrepresented people are less likely to apply to jobs unless they meet every single qualification. We are committed to building a diverse and inclusive culture where all Inkers can thrive. If you're excited about the role but don't meet all of the abovementioned qualifications, we encourage you to apply. Our differences bring a breadth of knowledge and perspectives that makes us collectively stronger.

    We welcome and employ people regardless of race, color, gender identity or expression, religion, genetic information, parental or pregnancy status, national origin, sexual orientation, age, citizenship, ethnicity, family or marital status, physical and mental ability, political affiliation, disability, Veteran status, or other protected characteristics. We are proud to be an equal opportunity employer.



