- Monitor and support internal platforms and microservices running across server-based and containerized environments.
- Investigate production issues by analyzing logs, metrics, and system health signals to identify root causes.
- Troubleshoot application-level failures, performance issues, and connectivity problems across distributed systems.
- View, manage, and troubleshoot containerized workloads, including application deployments and configuration changes.
- Understand service lifecycles, health checks, and when corrective actions (such as restarts or escalations) are required.
- Leverage AI-assisted tools to accelerate troubleshooting, analysis, and documentation while maintaining sound engineering judgment.
- Review and maintain application configuration using centralized configuration management approaches.
- Validate application health endpoints and diagnostic signals to ensure services are operating as expected.
- Support application deployments to servers and platform environments following established processes.
- Investigate issues related to platform dependencies such as caching or in-memory data stores (e.g., Redis).
- Identify common failure modes such as configuration errors, resource exhaustion, or network-related issues that impact application behavior.
- Test, validate, and troubleshoot APIs using industry-standard tools to confirm expected behavior.
- Work with development teams to reproduce issues and verify fixes before and after deployment.
- Partner with developers, platform engineers, and operations teams to resolve incidents and improve platform stability.
- Document troubleshooting steps, findings, and operational runbooks to improve team knowledge and response time.
- 1–4 years of relevant experience in systems engineering, platform engineering, application support, DevOps support, or a related technical role.
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
- Experience troubleshooting applications in enterprise or distributed system environments.
- Familiarity with containerized platforms and microservices architectures.
- Experience using logging, monitoring, and observability tools to diagnose issues.
- Coding or scripting knowledge (e.g., Java, Python, Bash) to assist with troubleshooting and automation.
- Experience testing and validating APIs.
- Working knowledge of networking concepts as they relate to applications and containerized workloads.
- Strong analytical, problem-solving, and communication skills.
- Exposure to cloud-native environments and CI/CD pipelines.
- Experience troubleshooting caching systems or in-memory data stores (e.g., Redis) and using logging tools such as Kibana (Elasticsearch).
- Familiarity with application health checks, diagnostics, and service monitoring patterns.
- Experience working in large-scale enterprise environments with multiple teams and shared platforms.
- Site Reliability Engineering (SRE) experience.
- Work model: Remote / work-from-home in Canada or the U.S., with a preference for candidates based in Ontario (Canada) or in North American time zones.
- Schedule: Occasional non-standard hours or overtime may be required based on business needs; some on-call availability may also be necessary for critical production support.
- Travel: May include local (in-country) and global travel for key meetings or team events.
- Elective Benefits: Our programs are tailored to your country to best accommodate your lifestyle.
- Grow Your Career: Accelerate your path to success (and keep up with the future) with formal programs on leadership and professional development, and many more on-demand courses.
- Elevate Your Personal Well-Being: Boost your financial, physical, and mental well-being through seminars, events, and our global Life Empowerment Assistance Program.
- Diversity, Equity & Inclusion: It's not just a phrase to us; valuing every voice is how we succeed. Join us in celebrating our global diversity through inclusive education, meaningful peer-to-peer conversations, and equitable growth and development opportunities.
- Make the Most of our Global Organization: Network with other new co-workers within your first 30 days through our onboarding program.
- Connect with Your Community: Participate in internal, peer-led inclusive communities and activities, including business resource groups, local volunteering events, and more environmental and social initiatives.
Application Observability Engineer - Mississauga - TD SYNNEX
Description
Actual annual compensation offered will be based on several variables, including geographic location, work experience, education, and skills/achievements, and will be mutually agreed upon at the time of offer. The average compensation for this role is $85,000–$105,000 CAD.
About the Role
As an Application Observability Engineer, you'll operate at the intersection of applications, infrastructure, and reliability. You will support and troubleshoot the internal platforms and microservices that power critical business systems, ensuring services are healthy, observable, and performant in a large-scale enterprise environment. You'll partner closely with application developers, platform engineers, operations teams, and system administrators to investigate production issues, validate deployments, and maintain stable environments. This role is ideal for someone early to mid-career (1–4 years' experience) who enjoys hands-on troubleshooting, learning how distributed systems work in practice, and supporting modern, containerized platforms using tools like Kibana, Grafana, Jaeger, VictoriaMetrics, Redis, and Kubernetes.
What You'll Do
Support and troubleshoot enterprise platforms
Work with containerized and microservices environments
Maintain configuration and application health
Troubleshoot supporting platform services
Test and validate APIs and services
Collaborate across engineering and operations teams
What We're Looking For
Required:
Preferred:
Working Conditions & Flexibility
Key Skills
Application Monitoring, Elasticsearch, Grafana, IT Production Support, Kubernetes, Site Reliability Engineering
At TD SYNNEX, our values guide everything we do: Together, We Own It, We Dare to Go, We Grow and Win, and above all, We Do the Right Thing. These principles shape how we work with each other, our partners, and our communities as we drive innovation and create lasting impact.
What's In It For You?
Don't meet every single requirement? Apply anyway.
At TD SYNNEX, we're proud to be recognized as a great place to work and a leader in the promotion and practice of diversity, equity and inclusion. If you're excited about working for our company and believe you're a good fit for this role, we encourage you to apply. You may be exactly the person we're looking for.