Automation Infrastructure System Admin - Markham, Canada - Advanced Micro Devices, Inc

Posted by: Sophia Lee, beBee Recruiter


Description

Overview:

WHAT YOU DO AT AMD CHANGES EVERYTHING
We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world.

Our mission is to build great products that accelerate next-generation computing experiences - the building blocks for the data center, artificial intelligence, PCs, gaming and embedded.

Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world's most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. This is who we are at our best. One Company. One Team.

AMD together we advance_


Responsibilities:


Automation Infrastructure System Admin

THE ROLE:

Our automation & tools code base runs everywhere from pre-silicon environments and prototype lab systems to the fastest supercomputers in existence. Join a team using modern industry best practices across the full stack. Make a difference by helping us accelerate AMD's pace of innovation.


As part of the Datacenter GPU/AP Infrastructure team, you will be involved in the development of automation tools, content, and infrastructure to validate datacenter GPU/CPU hardware and software.

Your work will enable validation teams to improve their processes, either by developing new automation features or by helping them debug or co-create their automated test content.


THE PERSON:


  • Linux Systems Administrator with a background in modern best practices and an understanding of the stack
  • Strong problem-solving and troubleshooting skills
  • Eagerness to learn, adapt to new technologies, and stay up-to-date with industry trends
  • Customer service mindset for providing support to lab teams
  • Detail-oriented: close attention to the finer details of systems and processes to identify potential issues and areas for improvement
  • Excellent written and verbal communication skills

KEY RESPONSIBILITIES:


  • Support in-house automation and infrastructure solutions that can scale across multiple sites and geographies
  • Respond to and troubleshoot incidents reported by internal users or infrastructure alerts
  • Perform postmortem analysis, and improve processes or add solutions to prevent future outages
  • Help with capacity planning, performance tuning and optimization of solutions

PREFERRED EXPERIENCE:


  • Experience working in a technical support and/or operations role
  • Understanding of networking, the OSI model, and network troubleshooting
  • Strong understanding of Linux, Virtualization, and proficiency in Windows
  • Understanding of incident management, including incident response and postmortem analysis
  • Experience with Python programming and Ansible
  • Awareness of emerging trends and technologies in the reliability and infrastructure space, such as AI/ML-based monitoring solutions
  • Basic understanding of Kubernetes and containers
  • Database knowledge of Postgres / MySQL
  • Experience with tools and techniques for collecting, analyzing, and monitoring log data, such as ELK Stack (Elasticsearch, Logstash, Kibana) or Splunk
  • Azure cloud knowledge for managing infrastructure and services
  • Familiarity with Agile to effectively participate in the team's work process
  • Proficiency in using version control systems like Git and Infrastructure as Code
  • Familiarity with CI/CD tools and processes, like Jenkins, GitLab CI, or Azure DevOps
  • Familiarity with lab environments is an asset
  • Familiarity with SRE best practices, such as the Google SRE handbook and other industry standards
  • Nice to have: certifications in relevant technologies and/or methodologies (e.g., Azure/Cloud, Kubernetes, or Scrum/Agile)

ACADEMIC CREDENTIALS:


  • A background in computer science, engineering, or a related field

LOCATION:

Markham, Ontario


Qualifications:

  • Benefits offered are described in: AMD benefits at a glance.
