High-Performance Cluster Administrator - School of Computer Science - MLD
Carnegie Mellon University is a private, global research university that stands among the world’s most renowned education institutions. With ground-breaking brain science, path-breaking performances, creative start-ups, big data, big ambitions, hands-on learning, and a whole lot of robots, CMU doesn’t imagine the future, we invent it. If you’re passionate about joining a community that challenges the curious to deliver work that matters, your journey starts here!
The Machine Learning Department (MLD) is one of the academic departments at the School of Computer Science. MLD educates the leaders of tomorrow and performs groundbreaking research in all areas of Machine Learning and related fields. We are seeking a High-Performance Cluster Administrator to join our team! This is an excellent opportunity for those who thrive in an exciting and creative work environment. In this role, you will lead all aspects of server management, covering both hardware and software support, within a central group supporting enterprise-wide systems or within a college or division overseeing multi-user systems.
Your core responsibilities will include:
- System administration: Proactively manage, configure, and troubleshoot software; develop and maintain policies and perform monitoring for the stable and fair operation of shared computing resources; use scripts to ensure optimal system performance; provide technical problem solving for Linux-based systems and high-performance computing (HPC) environments
- Computing resource strategist: Manage the existing heterogeneous compute infrastructure within the department and understand the use cases for each resource type: shared GPU resources, individual development servers, and AWS resources
- Course technology administration: Determine best practices for the use of AWS resources as part of classes, distribute documentation, and facilitate the efficient student use of AWS.
- User communications: Communicate with users regarding maintenance plans and upgrades of HPC Clusters and Unix servers
- Training and documentation: Develop training materials that help users understand cluster computing and its common bottlenecks, such as improper use of head nodes and disks
- Department liaison: Serve as the primary point of contact between MLD and existing compute infrastructure groups at CMU, including the Pittsburgh Supercomputing Center and the SCS Computing Facilities Team; ensure smooth integration of MLD shared compute resources with CMU/SCS computing practices.
- Other duties as assigned
Inclusion and cultural sensitivity are valued competencies at CMU. Therefore, we are in search of a team member who can interact effectively with diverse audiences. We are looking for someone who shares our values and who will support the mission of the university through their work.
This is a great opportunity for someone to work in a creative, dedicated, driven team, in a collaborative environment committed to technical innovation, inclusion, and work-life balance.
Qualifications:
- Bachelor’s degree in Computer Science
- 3-5 years’ experience in systems administration, including operating system administration
- Proficiency with UNIX system administration, networking, and network storage, including filesystems such as NFS, ZFS, and Lustre
- Experience with at least one scripting language or automation tool for process automation, such as Ansible, Bash, or Python
- Familiarity with academic workload schedulers, such as Slurm
- Experience with containerization technology, specifically both Docker and Singularity
- Experience maintaining machines with NVIDIA GPUs, including driver updates and debugging
- Experience developing infrastructure solutions and policies for multiuser systems
- Experience with AWS or similar cloud vendors, and the ability to codify and document best practices for their usage
- Customer service skills; ability to communicate effectively, verbally and in writing, in a manner that contributes to a positive experience for all
- A combination of education and relevant experience from which comparable knowledge is demonstrated may be considered
Requirements:
- Successful background check
Are you interested in this exciting opportunity? Apply today!
Location
Pittsburgh, PA
Job Function
Software/Applications Development/Engineering
Position Type
Staff – Regular
Full Time/Part time
Full time
Pay Basis
Salary
More Information:
- Please visit “Why Carnegie Mellon” to learn more about becoming part of an institution inspiring innovations that change the world.
- Click here to view a listing of employee benefits
- Carnegie Mellon University is an Equal Opportunity Employer/Disability/Veteran.
- Statement of Assurance