- Monitoring our cloud and customer on-premises infrastructure: Assess its health to offer 24/7 service to our customers.
- Detecting potential issues: Configure monitoring to intercept them before an outage occurs.
- Participating in system troubleshooting: Recommend improvements to our platform and tools, and take part in regular, systematic code testing and deployment.
- Supporting our public cloud deployments: Research, propose, and participate in the implementation of security best practices for public cloud deployments and data management.
- Prioritizing and escalating: Raise problems to Development, and collaborate with our Operations lead and on-call engineer to investigate operational issues impacting users and identify root causes.
- Driving automation development: Build configuration management tools and scripts to address operational incidents.
- Improving our security posture: Enforce policies for environment security and their application to our DevOps tools.
- 8 years of related experience as a Software Engineer, DevOps Engineer, or Site Reliability Engineer, or in a role in a related field.
- Experience administering cloud or virtualized environments using the UNIX/Linux command line and scripting.
- IT support experience focused on handling and troubleshooting system-wide solutions.
- Demonstrated experience deploying multi-service applications on cloud platforms such as AWS, Google Cloud, or Azure using a modern toolset.
- Experience in developing continuous monitoring and automated alerting systems to ensure the stability and reliability of IT systems.
- Experience with configuration management tools such as Ansible, Salt, Puppet, Chef, or similar.
- Bachelor's degree in a STEM-related discipline.
- A deep understanding of Docker containerization and orchestration, with Kubernetes experience.
- Knowledge of IP networking, VPNs, DNS, load balancing, and firewall management.
- Familiarity with infrastructure management solutions, ideally HashiCorp Terraform and HashiCorp Vault.
- Experience in setting up and maintaining continuous integration and deployment pipelines.
- Ability to write and speak French.
Site Reliability Engineering - Montreal, Canada - Cisco
Description
Who We Are
As a part of Cisco, Accedian is a leader in performance analytics and end user experience solutions for service providers and mid-to-large size enterprises. The Accedian Skylight service assurance platform offers granular end-to-end visibility within "the massive multi" - multi-layer, multi-domain, and multi-vendor networks. Accedian's open and scalable platform removes roadblocks to innovation, enabling cloud-native analytics and empowering customers to launch new assured services based on 5G, SD-WAN, and edge technologies.
Who You Are
You are an expert in deployment and network operations, skilled in using scripts and automation tools to enhance software processes. With a passion for scripting and automation, you contribute to effective software strategies, oversee maintenance, and optimize systems. Proficient with Kubernetes and Docker Swarm, you seek new ways to monitor deployment health and performance. Your proactive nature and dedication to tech excellence make you a valuable team member in operational efficiency and reliability.
Who You'll Work With
Our team prioritizes your growth in technical, business, and soft skills within a culture that values team strength and investment. We adopt a "You build it, you run it" approach, empowering team members to actively manage and improve our software. Committed to continuous learning, we support mastering new technologies and champion a culture of ambition and innovation in cloud computing.
What You'll Do
Our growing team is looking for a dedicated Site Reliability Engineering (SRE) professional to work with a small, innovative team of industry experts and help perfect our platform by improving our automation processes around deployment and operations.
You will take charge of enhancing the product life cycle, manage configuration, assist with deployment and management scripting, and collaborate within a cross-functional team. You will spearhead initiatives and orchestrate the DevOps cycle. Your responsibilities will include:
This role includes periodic participation in an on-call rotation approximately once every six weeks.
Minimum Qualifications:
Preferred Qualifications: