- Collaborate with cross-functional teams to design, deploy, and maintain reliable and scalable services
- Implement best practices for monitoring, logging, and alerting to ensure rapid detection and resolution of issues
- Troubleshoot and resolve incidents related to the infrastructure, applications, and network to minimize downtime and improve system reliability
- Participate in capacity planning and performance optimization efforts to handle increasing user demands and traffic growth
- Develop and maintain automation tools for configuration management, deployment, and continuous integration/continuous deployment (CI/CD) pipelines
- Conduct thorough post-incident reviews and work towards preventing similar incidents in the future
- Perform regular security assessments and ensure compliance with industry standards and regulations
- Stay up to date with the latest technologies and industry trends to propose innovative solutions and improvements
- Completion of a degree or diploma program in computer science or a related discipline plus 5 years of related experience, or an equivalent combination of training and experience
- ITIL Foundation v3 or later accreditation preferred
- Sound experience (5+ years) running services in a large-scale enterprise environment
- Experience in one of the leading cloud platforms such as AWS, Azure or Google Cloud
- Experience with distributed monitoring and logging solutions (such as Prometheus, Thanos, Splunk, Elasticsearch, Grafana, Dynatrace, New Relic, Honeycomb)
- Experience with containers and container orchestration (such as Docker, Podman, Kubernetes)
- Experience with a DevOps platform (such as GitLab, GitHub, Azure DevOps, TeamCity, Octopus)
- Knowledge of application performance monitoring (such as Dynatrace, New Relic, AppDynamics)
- Knowledge of Scaling, Capacity Planning and Disaster Recovery
- Knowledge of Chaos Engineering
- Ability to design, author, and release code in any language (Go, Python, Ruby or Java would be a plus)
Senior Site Reliability Engineer - Vancouver, Canada - TEEMA
Description
MUST LIVE IN CANADA NEAR AN AIRPORT
Looking for a technical lead with 10+ years of DevOps/SRE experience
MUST HAVE - 5+ years permanent residence or Citizenship (cannot have lived outside Canada in the last 5 years)
Experience with monitoring and logging services is a must (2 or 3 of those listed), as is orchestration
Close to a city, with the ability to travel to Vancouver up to 4 times a year.
1st - technical interview
Team Size - 2 team members already onboarded plus manager
Work is very meaningful - province wide for public safety - big project roll out.
Pensioned position - municipality pension plan is better. Stable and room to grow.
Our client is seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join their dynamic and innovative team. As an SRE, you will play a critical role in maintaining and enhancing the reliability, availability, and performance of their systems. Your expertise in both software engineering and systems administration will be key in building and automating scalable infrastructure solutions. In this role, you will be responsible for improving the reliability and performance of production applications and infrastructure, with a focus on automation, system design, and improvements to system resilience. We are seeking a technical expert who understands the criticality of our systems and who is able to manage risk and support the development of more resilient and reliable technological capabilities.
What you will be doing:
What you must have: