    Principal Site Reliability Engineer - Montreal, Canada - Lightspeed

    Full time
    Description

    Hi there! Thanks for stopping by.

    Are you actively looking for a new opportunity? Or just checking the market? Well... you might just be in the right place.

    We're looking for a Principal Site Reliability Engineer to join our NuORDER by Lightspeed team in North America. NuORDER by Lightspeed builds software solutions that help merchants grow the size and profitability of their business. You'll join a team responsible for supporting the group in cross-cutting concerns, such as cloud infrastructure, reliability and incident management, data warehousing and analytics, cost transparency and efficiency, and much more. You will also support our growing Dev teams with the infrastructure and tools they need to continue scaling. You will build and support multi-region infrastructure and networks, and help run our products in a reliable, efficient and secure manner by implementing, advising on and advocating for well-known DevOps principles.

    What you'll be doing:

  • Work closely with development teams to empower them with the necessary tools and practices for monitoring software health in production, defining and measuring reliability metrics (SLI, SLO), and managing error budgets.
  • Design, build and maintain robust infrastructure built upon GCP, leveraging cloud native technologies such as GKE, Cloud SQL, BigQuery, etc.
  • Develop and manage CI/CD pipelines for efficient deployment and release using a number of technologies (GitLab, GitHub, Helm, Terraform, etc.).
  • Drive incident management process and conduct post-mortem analysis to prevent future outages.
  • Mentor junior SREs and developers, providing guidance on best practices in cloud architecture, data management, and software development.
  • Conduct system performance benchmarks and implement enhancements to improve system reliability and throughput.
  • Collaborate with cross-functional teams to identify, design, and implement internal process improvements in a cost-efficient manner.
  • Design and build robust, scalable, and highly available systems.
  • Build platform solutions and apply software engineering principles to improve the reliability of our software and accelerate software delivery.
  • Manage infrastructure change through infrastructure as code (IaC).
  • Be part of our on-call rotation.
  • Stay current with industry trends and emerging technologies, advocating for the adoption of new technologies and practices that improve product quality and team efficiency.

    What you need to bring:

  • Bachelor's degree in Computer Science or Engineering, or an equivalent level of real-world experience.
  • 9-10+ years of experience across site reliability engineering, systems administration, and/or software engineering.
  • Strong expertise in container orchestration platforms, specifically Kubernetes.
  • Strong understanding of both relational (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra, Redis).
  • Deep understanding of network protocols and IP networking, as well as experience with network troubleshooting.
  • Proficiency in programming languages such as Java, Python, Go, etc.
  • Proven track record of managing large-scale infrastructure in cloud environments, such as Google Cloud, AWS or Azure.
  • Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging solutions (e.g., ELK stack).
  • Strong understanding of security best practices.
  • Exceptional problem-solving skills and the ability to work under pressure to troubleshoot and resolve complex issues.
  • Excellent communication skills to effectively collaborate with cross-functional teams.
  • Strong leadership skills, capable of leading projects and influencing engineering decisions across the organization.

    We know that people are more than what's on their CV. If you're unsure that you have the right profile for the role... hit the 'Apply' button and give it a try.

    What's in it for you?

    Come live the Lightspeed experience...

  • Ability to do your job in a truly flexible environment;
  • Genuine career opportunities in a company that's creating new jobs every day;
  • Work in a team big enough for growth but lean enough to make a real impact.

    ... and enjoy a range of benefits that'll keep you happy, healthy and (not) hungry:

  • Lightspeed share scheme (we are all owners)
  • Lightspeed RSU program (we are all owners)
  • Unlimited paid time off policy
  • Flexible working policy
  • Health insurance
  • Health and wellness benefits
  • Paid leave assistance for new parents
  • LinkedIn Learning
  • Volunteer day


  • Tecsys Inc. Montreal, Canada Full time

    The French version follows below · Having recognized the advantages of remote work, including employee morale, productivity, and the benefits of reduced commuting for employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we inve ...


  • Plexia Montreal, Canada Full time

    Job description · As a Junior Site Reliability Engineer (SRE), you will play a crucial role within the R&D and Innovation department. You will be asked to collaborate with the team responsible for software development and Plexia's core architecture. The highly sensitive nature ...


  • Synechron Montréal, QC, Canada

    Who we are · Synechron is a leading global digital transformation consulting firm, focused on financial services and technology organizations. Our specialties include end-to-end artificial intelligence, consulting, digital, cloud & DevOps, ...


  • LanceSoft, Inc. Montreal, Canada

    Job Title: Production Reliability & Support Expert (SRE) · Location: Montreal (Office attendance from Day 1 – Hybrid mode 3x per week) · Years of experience: 3 to 5 years · • Ensure Production Management is closely aligned/embedded in the Agile software development process and ...


  • LanceSoft, Inc. Montreal, Canada

    Job Description: · We are growing our team globally. It's a unique opportunity to work on leading edge projects leveraging the latest technologies such as Cloud solutions and Analytics. The primary objective of the team is to ensure reliability across the production plant by deve ...


  • LanceSoft, Inc. Montreal, Canada

    Job Title: Production Reliability & Support Expert (SRE) · Location: Montreal (Office attendance from Day 1 – Hybrid mode 3x per week) · Years of experience: 3 to 5 years · • Ensure Production Management is closely aligned/embedded in the Agile software development process and our ...


  • Cisco Montreal, Canada

    Who We Are · As a part of Cisco, Accedian is a leader in performance analytics and end user experience solutions for service providers and mid-to-large size enterprises. The Accedian Skylight service assurance platform offers granular end-to-end visibility within "the massive m ...


  • Plexia Montreal, Canada Full time

    Job Description · As a Junior Site Reliability Engineer (SRE) you will play a crucial role within the R&D and Innovation department. You will be called upon to collaborate with the Plexia product-aligned and core architecture team. The highly sensitive nature of health and medica ...


  • Behavox Montreal, Canada

    About the Role · The Behavox Platform is a scalable, fault-tolerant and highly performant storage and processing system which allows us to manage and analyze massive volumes of data. We have an extensive and flexible set of APIs to develop products that allow our clients to work ...


  • Hunter Bond Montréal, QC, Canada

    Job Title: Application Support Engineer Client: Fintech · My client are looking to expand their Application Support team, and would like someone with prior front office experience to provide technical support and engineering functions in support of their proprietary and third pa ...


  • Behavox Montreal, Canada

    About Behavox · Behavox is shaping the future for how businesses harness their most important raw material - data. Our mission is bold: Organize enterprise data into actionable information that protects and promotes the business growth of multinational companies around the world. ...


  • Hunter Bond Montréal, QC, Canada

    Job Title: Application Support Engineer · Client: Fintech · Salary: Circa $125,000 + Bonuses & Package · Location: Montreal/Hybrid · My client are looking to expand their Application Support team, and would like someone with prior front office experience to provide technical support and engi ...


  • Soho Square Solutions Montréal, QC, Canada

    Bachelor's degree in Computer Science or related field · • Experience with Service Oriented Architecture, Distributed Systems, Business Intelligence Reporting such as Power BI, Scripting such as Python or shell, Front end development (HTML, JavaScript, AngularJS), Cloud Computing ...


  • National Bank Montreal, Canada Permanent

    Attendance Hybrid Job Number 19678 Category Senior Professional Status: Permanent Type of Contract Permanent Schedule: Full-Time Full Time / Part Time? Full-Time Posting date 19-Mar-2024 Location: Montreal, Quebec City Montreal Province/State Quebec Area of Interest: Information ...


  • CGI Montreal, Canada Full time

    Position Description: · CGI is a dynamic and innovative technology firm committed to delivering cutting-edge solutions. We are currently seeking a highly skilled and motivated individual to join our team as a FinOps and Site Reliability Engineer (SRE). This role is pivotal in br ...


  • NBC Montreal, Canada Full time

    Area of Interest: Information technology A career in technology at National Bank means being part of the transformation to have a direct impact on the client. As a Systems Reliability Specialist, you will be expected to help all IT teams put in place the necessary mechanisms ...


  • National Bank Montreal, Canada OTHER

    · Job Posting · Attendance Hybrid Job Number 19678 · Category: Senior Professional · Status: Permanent · Type of Contract: Permanent · Schedule: Full-Time · Full Time / Part Time? Full-Time · Posting date: 19-Mar-2024 · Location: Montreal, Quebec City Montreal · Province/State: ...


  • Stingray Montreal, Canada

    Department IT Location Montreal At Stingray, creativity, collaboration, and cutting-edge technology are the pillars of our DNA. Are you ready to watch your career take off by joining a fast-growing company with a team of tech-savvy music lovers and a stimulating and fun work env ...