Site Reliability Analyst - Pickering, Canada - Ontario Power Generation

Posted by: Sophia Lee, beBee Recruiter


Description

Location: Pickering, ON, CA, L1W 4A7
Req ID: 45525
Status: Regular Full time
Working Conditions: Hybrid Work Environment (3 days in office)
Education Level: Four-year university degree in business administration or computer science, or an equivalent level of education
Shift(s): Days
Travel: 10%
Deadline to Apply: February 9, 2024.
Electrify your career and help build a brighter tomorrow.

  • Every generation has a challenge that defines them. At OPG, we are calling on all innovators, disruptors, thought leaders and changemakers. Join us to electrify life in one generation and build a sustainable future powered by our electricity, our ideas, and our people. Join OPG and make history.
  • Whether you work in the skilled trades or are a business professional, a career at OPG is an opportunity to electrify your life on and off the job.

NEW CAMPUS:
This position is moving to OPG Corporate Headquarters.

  • In Summer 2025, OPG will officially welcome employees to our new Corporate Headquarters located at 1908 Colonel Sam Drive, Oshawa, Ontario. This new space will enable teamwork, collaboration and innovation that will help us to achieve our mission to electrify life in one generation.

JOB OVERVIEW

  • Ontario Power Generation (OPG) is looking for a dynamic, strategic, and results-driven professional to join our team in the role of Site Reliability Analyst.

  • This is an exciting opportunity to work in an environment where you will contribute to OPG's public outreach, engagement, and education efforts as part of the company's commitment to growing its social license.

KEY ACCOUNTABILITIES

  • Incident Management: Respond to support service disruptions in a timely manner, ensuring minimal impact on business operations. Coordinate with relevant teams to resolve incidents and restore normal service operation as quickly as possible. Participate in post-incident reviews to identify root causes and preventive measures.
  • Performance Analysis: Regularly analyze system performance data to identify potential issues and areas for improvement. Make recommendations to improve system performance and reliability based on analysis findings.
  • System Maintenance and Improvement: Participate in the design, development, and maintenance of systems to ensure they meet business needs and maintain high levels of reliability and performance. Regularly review and update system documentation to reflect current operating procedures and system configurations.
  • Reporting and KPI Management: Develop and maintain key performance indicators (KPIs) to measure service reliability and performance. Regularly report on these KPIs to provide visibility into service performance and identify areas for improvement. Use these reports to drive continuous improvement initiatives and demonstrate the impact of these initiatives on service performance and reliability.
  • Compliance and Security: Ensure all activities comply with relevant regulations and security standards. This includes participating in audits, maintaining documentation, and implementing necessary security measures.
  • Continuous Learning and Improvement: Keep up to date with the latest industry trends and technologies related to site reliability. Regularly review and update skills and knowledge to improve work effectiveness and stay current with industry standards.
  • Working Relationship with Front Line Operations Team: Ability to work closely with front line operations teams to understand their needs and challenges, and to provide them with the support they need to ensure high service reliability and performance. This includes the ability to communicate effectively with these teams and to build strong working relationships with them.

  • Other duties as required.

EDUCATION

  • Four-year university degree in business administration or computer science, or an equivalent level of education.

QUALIFICATIONS

  • Requires a minimum of 6 and up to and including 8 years' experience as a Site Reliability Engineer or in a similar role, with a focus on system monitoring and reliability.

  • Proficiency in system monitoring tools and technologies, as well as a strong understanding of system architecture and performance metrics. Knowledge of programming languages, such as Python or Java, can also be beneficial. Familiarity with PowerShell, KQL, Power BI, and full-stack monitoring tools like Splunk, Grafana, Datadog, Dynatrace, Azure Monitor, Instana, or Opsview is essential.
  • Successfully implemented and managed a comprehensive instrumentation monitoring system for a large-scale IT environment.
  • Led a cross-functional team in the integration of distributed tracing systems for microservices, enhancing the ability to diagnose and troubleshoot system issues.
  • Proficient in using advanced diagnostic tools and approach to support troubleshooting of critical s
