Senior SRE Engineer - Montreal, Quebec

Montreal, Quebec, Canada

12 hours ago

Job description

Senior SRE: Observability & Monitoring Engineer

(Prometheus / metrics / SLO owner)

Mission

As a Senior Site Reliability Engineer (SRE) for Observability, you will be responsible for designing and implementing the company's observability strategy.

Your goal is to provide complete, reliable, and actionable visibility into the behavior, performance, and reliability of systems across high-volume distributed platforms.

You will turn monitoring into a true reliability engineering capability through metrics, alerting, and SLO-driven operations.

Key responsibilities

Observability & Monitoring (core of the role)

  • Define and maintain the overall observability strategy
  • Design scalable monitoring architectures
  • Operate and optimize the Prometheus, VictoriaMetrics, and ClickHouse platforms
  • Build advanced dashboards focused on system behavior and performance
  • Implement actionable alerting strategies
  • Define and maintain SLIs and SLOs
  • Ensure the quality and long-term reliability of monitoring data
  • Anticipate metric scalability and cardinality issues

Reliability engineering

  • Establish reliability metrics and operational health indicators
  • Work with teams to adopt SLO-driven development practices
  • Analyze system performance and identify reliability risks
  • Lead post-incident analysis with a data-driven approach
  • Improve incident detection and response times

Platform integration

  • Integrate monitoring into distributed microservice architectures
  • Collaborate with the Kubernetes and development teams
  • Deploy observability components via Helm
  • Ensure monitoring coverage for all production services

Technical leadership

  • Promote observability best practices across teams
  • Define internal monitoring standards and guidelines
  • Train engineers in the use of alerting and monitoring
  • Keep up to date with developments in observability tooling

Technical environment

  • Prometheus
  • VictoriaMetrics
  • ClickHouse
  • Grafana (implied by the dashboard work)
  • Kubernetes (integration level)
  • Distributed systems

Candidate profile

  • Solid hands-on experience with Prometheus in production
  • Good command of metrics, logs, and alerting systems
  • Experience implementing SLOs and SLIs
  • Experience with high-volume monitoring environments
  • Experience in SRE, production engineering, or performance engineering
  • Comfortable analyzing incidents and system behavior

