Senior Scientific Computing Analyst - Victoria, Canada - University of Victoria

    Description

    About this Opportunity

    We operate OpenStack as our IaaS cloud platform, with multiple deployments including the Arbutus Cloud, Canada's largest research cloud service, operated in partnership with the Digital Research Alliance of Canada. We also operate a growing number of Kubernetes clusters on top of OpenStack, including large deployments serving national and international research projects in physics and astronomy.

    Research Computing Services (RCS) is looking for candidates with experience building scientific and research computing systems on large-scale, self-hosted cloud compute environments, with a particular focus on OpenStack and Kubernetes. This position will also have the opportunity to work with other exciting technologies such as Ceph, Prometheus, Grafana, and Elasticsearch, and to help determine which new technologies join that list.

    We are looking for motivated individuals who want to be part of a collaborative team supporting Research Computing and who have a demonstrated ability to develop strong working relationships between technology and research stakeholders. This position will work with a highly skilled, collaborative team of experts across many knowledge domains, helping to shape the design and creation of leading-edge systems for some of the largest and most consequential research initiatives in the world.

    *This position is eligible for a Hybrid Work Arrangement.*

    The salary range for this position is:

    Recruitment range: $92,863 to $102,403. Starting salary is determined by the PEA Collective Agreement.

    Performance range: from the starting salary to a maximum of $120,822, available through annual performance increases.

    Job Summary

    Mandate:

    Reporting to the Manager and Architect, ARC Infrastructure, the Senior Scientific Computing Analyst works as part of the Research Computing Services Infrastructure team to ensure the operational effectiveness of the university's research servers and storage. Members of this team design, build, and maintain critical research computing systems: web and database servers; large high-performance computing (HPC) systems; and cloud infrastructure and container orchestration. These systems are relied upon by researchers at UVic, at institutions across the country, and in international collaborations. They must be in operation 24 hours a day, 365 days a year, and decisions regarding these systems can affect UVic's obligations to parties beyond the institution.

    This position brings a scientific perspective to ARC systems design and operation, ensuring that the priorities and requirements of researchers are represented in the team's work and its strategic plans. By maintaining specialized knowledge of the science-specific techniques researchers are applying to their interactions with ARC infrastructure, as well as a broad and forward-looking understanding of computing, storage, and networking technologies, the incumbent proposes, advocates for, and delivers system designs which are highly functional for researchers, as well as scalable, reliable, and maintainable by the ARC Infrastructure team.

    Objectives:

    This position has an extremely varied set of objectives: technical, interpersonal, and scientific. The incumbent maintains relationships with key researcher communities, including but not limited to astronomy and high energy physics, and contributes directly to the success of major scientific collaborations like the ATLAS, BELLE-II, and CANFAR projects. They also act as a practice leader among their technical teammates, demonstrating and advocating for best practices in leveraging emerging technologies, and balancing the pursuit of cutting-edge designs with the team's mission to deliver stable and reliable infrastructure. As a respected contributor and practitioner, the incumbent champions the work of the team, representing it in relationships with external partners and funders, contributing to regional and national digital research infrastructure collaborations, and presenting at technology and scientific conferences.

    The incumbent may be required to work outside normal work hours on an emergency or pre-scheduled basis, and may need to travel out of town or out of the country.

    Job Requirements

    This position requires a level of education, training, and experience equivalent to an advanced degree in either a relevant science (physics or astronomy) or computing (computer science or computer engineering) field, plus a minimum of 5 years of relevant experience in an academic or research environment. An equivalent combination of education, training, and experience may be considered.

    Knowledge, skills, and abilities include:

  • Expert knowledge of research computing practices, tools, and patterns in both high-performance computing (HPC) and cloud environments. Specific examples include grid and batch computing systems such as ATLAS Harvester or Condor, and batch schedulers such as Slurm; federated data access services such as dCache, EOS, CVMFS, Rucio, and VOMS; computational accelerator libraries such as CUDA and ROCm; and accounting and performance benchmarking tools such as APEL and HEPSpec.
  • In-depth knowledge of and experience with computing, storage, and network infrastructure systems and services, with a specific focus on Kubernetes, OpenStack, and Ceph, as well as multi-tenant technology patterns such as network encapsulation (VXLAN), distributed filesystems (CephFS, Lustre, GPFS), and classical and container virtualization (KVM/QEMU, Docker/containerd/Podman).
  • Working knowledge of provisioning and configuration management tools such as Ansible, Terraform, Cobbler, and OpenStack Ironic.
  • Substantial experience with scripting languages such as Python and Bash, as well as data serialization formats such as YAML and JSON.
  • A high degree of attention to detail, the ability to understand complex technical concepts, and the ability to maintain broad and in-depth technical knowledge of all aspects of Advanced Research Computing infrastructure.
  • Strong problem-solving abilities; must be able to effectively identify and resolve unusual and highly complex technical problems.
  • Ability to effectively manage multiple tasks and priorities and to work under pressure to meet time-sensitive and mission-critical deadlines in a complex environment.
  • Ability to take initiative and work with limited direction.
  • Ability to mentor and coach technical staff and teams, and act as a resource.
  • Ability to successfully contribute to complex projects: developing project work plans; monitoring and directing the activities of a project team.
  • Excellent written and oral communication skills.
  • Ability to collaborate, build and maintain positive relationships with diverse individuals and work effectively in a team environment.
  • Commitment to valuing diversity and contributing to an inclusive and respectful working and learning environment.

    Assets or Preferences:

  • Knowledge and direct experience of a scientific project or collaboration with a focus on research computing such as ATLAS, BELLE-II, SKA, or CANFAR.
  • Experience leading and advocating for technical or scientific computing practices across multiple groups of stakeholders.
  • Direct experience operating Kubernetes in production environments, and/or deploying scientific or research computing workloads on Kubernetes.

    We acknowledge and respect the Lək̓wəŋən (Songhees and Esquimalt) Peoples on whose territory the university stands, and the Lək̓wəŋən and WSÁNEĆ Peoples whose historical relationships with the land continue to this day.

    Accessibility Statement

    If you anticipate needing accommodations for any part of the application and hiring process, contact:

    Any personal information provided will be maintained in confidence.