Lead Site Reliability Engineer - Ottawa, Canada - British Council

    Description

    We support peace and prosperity by building connections, understanding and trust between people in the UK and countries worldwide.

    We work directly with individuals to help them gain the skills, confidence and connections to transform their lives and shape a better world in partnership with the UK. We support them to build networks and explore creative ideas, to learn English, to get a high-quality education and to gain internationally recognised qualifications.

    Working with people in over 200 countries and territories, we are on the ground in more than 100 countries. In 2021–22 we reached 650 million people.

    Location: Warsaw
    Department: Engineering
    Contract type: Indefinite (permanent)
    Closing Date: 28th April 2024 at 23:59

    Some of our benefits are:

  • Enjoy a standard workday of 7 hours 45 minutes and a weekly total of 38.75 hours, allowing for a healthy work-life balance.
  • Benefit from additional holidays, including a minimum of 35 days of annual leave per year, which includes public holidays. Our holiday calendar is based on the public holiday lists for Poland and Belgium and adheres to British Council policy.
  • Stay active and healthy with our Corporate Sports Card - MultiSport Light 10, which is partially funded by our Social Fund (ZFŚS).
  • Receive significant discounts on our English courses and exams for both you and your immediate family members.
  • Take advantage of various financial grants and holiday refunds through our Social Fund (ZFŚS), which offers both emergency (hardship) and social assistance.
  • Access our Employee Assistance Programme, providing confidential psychological support services to ensure your overall wellbeing.
  • Explore diverse avenues for professional development and career advancement within our organisational structure.

    This is a new role in Digital & Technology, and the successful applicant will play a pioneering role in advancing our site reliability initiatives.


    You must have the legal right to work in Poland at the time of application. There is no relocation or sponsorship support.

    British Council supports working in new ways such as hybrid working, subject to full approval by line management and conditional upon our ability to provide the appropriate level of service. This may not be appropriate for all roles but can be explored at interview.

    Role purpose

    As a Lead Site Reliability Engineer at the British Council, you will be instrumental in executing and supporting our reliability and operational excellence strategy across our digital platforms and systems. Collaborating with our outsourcing partner, TCS, you'll ensure that our platforms maintain high availability and resilience. By leveraging your advanced knowledge in system architecture, automation, and domain-specific areas such as cloud infrastructure, performance optimisation, or monitoring, you will consistently meet the evolving business needs of the British Council.

    Role Context

    The Digital and Technology (D&T) team collaborates with other parts of the British Council to foster a digitally enabled, customer-centric organisation. As a digitally inclusive team, we strive to ensure our people, and those we work with, feel digitally confident. We do this by empowering our people with digital skills and support to help them rise to important challenges, regardless of role or location. Our people steer our ambition, drive our successes and manage us through periods of great change. Together, we strive to build an inclusive environment that upholds our values and supports us in our work.

    We are:

  • We share the right information, with the right people, at the right time.
  • We define and validate value to empower our experts to make high-impact decisions.
  • We reduce digital inequalities and reduce the impact of digital disadvantage for our people, our stakeholders, and our customers.

    The Engineering team brings together capabilities which include Architecture, Software Engineering, QA and Delivery expertise to ensure excellence in the development of our global products and services. Encompassing reuse, buy and build strategies, it champions communities of practice, defines solutions and core capabilities, and creates amazing products.

    Accountabilities/Responsibilities:

  • Contribute significantly to the British Council's reliability strategy and vision for digital platforms and systems, ensuring alignment with the broader organisational goals.
  • Collaborate to ensure that reliability initiatives align with the strategic aims of the organisation, driving consistent operational excellence.
  • Innovate and implement reliability best practices across a diverse set of digital platforms and systems.
  • Engage with stakeholders to provide insights into the importance of site reliability and its impact on user satisfaction.
  • Pinpoint areas of potential enhancement in digital platforms' reliability and performance based on feedback, system metrics, and detailed analysis.
  • Implement solutions that bolster system resilience, improve performance, and ensure optimal user experience.
  • Uphold and advocate for best practices, methodologies, and industry standards in site reliability engineering.
  • Demonstrate expertise in core areas of site reliability engineering, such as automation, cloud infrastructure, or performance optimisation.
  • Collaborate with product and technology teams to outline system requirements and performance expectations based on user needs and organisational objectives.
  • Establish and nurture relationships with key stakeholders across Digital and Technology departments, and other relevant teams.
  • Maintain transparent communication channels, aligning objectives, expectations, and securing support for reliability initiatives.
  • Serve as a connector between SRE teams and other departments, championing the importance of reliability and ensuring collaborative efforts towards shared goals.
  • Offer guidance and insights to other SREs, promoting a positive and teamwork-oriented environment.
  • Contribute significantly to the reliability engineering processes, ensuring timelines, resource allocations, and quality standards are met.
  • Assist in the professional development of peers, sharing knowledge and best practices.

    Requirements of the role:

  • Proven Experience: Background in roles like Systems Administrator, Cloud Engineer, DevOps Engineer, or a mid-to-senior Site Reliability Engineer position.
  • Leadership: 5+ years in an engineering role, guiding and mentoring junior engineers.
  • Production Systems: Hands-on experience with supporting and scaling high-traffic, production-grade systems.
  • System Design Principles: Solid grasp of system and design principles to create scalable and resilient systems that meet current business needs.
  • Incident Handling: Demonstrated ability to efficiently handle and resolve critical incidents, ensuring minimal downtime and impact.
  • Cross-functional Collaboration: A history of working closely with product, development, QA, and operations teams to bolster system reliability.
  • Continuous Improvement: Consistent effort in system optimisations, refining processes, and pushing for efficiency improvements.
  • Problem Solving: Strong diagnostic and troubleshooting skills, especially in a high-pressure environment.
  • User Experience: Understanding the connection between system reliability, performance, and the end-user experience.
  • Automated Testing: Knowledge of automated testing methodologies and tools to ensure the reliability of infrastructure as code deployments.
  • Written and verbal proficiency (CEF 1) in English is required.
  • Microsoft Certified Azure DevOps Engineer Expert or AWS Certified DevOps Engineer – Professional.

    A connected and trusted UK in a more connected and trusted world.

    Equality, Diversity, and Inclusion (EDI) Statement