Site Reliability Engineering Team Lead

Art of Problem Solving

San Diego, CA, USA

Full time

$140-170k (annually)



Nov 14

As the Site Reliability Engineering Team Lead, you’ll play a key role in supporting and scaling the technology that helps us discover, inspire, and train the great problem solvers of the next generation. In this position, you will lead our cloud modernization efforts and maintain existing infrastructure across all of our products and services, supporting a growing user base currently numbering around one million. This position is ideal for a detail-oriented and strategic engineering leader who will set and execute our cloud infrastructure strategy alongside their team of two Site Reliability Engineers.

The Site Reliability Engineering Team Lead:

  • Manages a team of Site Reliability Engineers, including hiring, evaluating, training, and developing their team members as well as building a collaborative and productive team culture.
  • Owns and maintains company cloud infrastructure strategy and SRE team roadmap.
  • Implements and evaluates reliability metrics for our products and services, and advocates for projects to reduce our exposure to or better understand reliability risks.
  • Runs, evaluates, and improves SRE processes and procedures such as task workflow, reviews, and launches, while managing regular team responsibilities and leading the maintenance of team documentation.
  • Provides technical expertise by collaborating with stakeholders to make high-level decisions related to their team, providing technical direction to team members, and being a knowledge base of information for their team.
  • Allocates team resources by mapping team members to tasks and projects, helping estimate time for their team members to complete projects, and advocating for engineering resources as needed.
  • Drives continuous improvement in the SRE space and the broader Engineering Department by proposing and advocating for projects that will improve reliability, security, and/or maintainability, improve development workflow, remove operational bottlenecks, or otherwise improve engineering department bandwidth. 
  • Is accountable for overall risk management and reduction practices, and contributes to risk management practices in other engineering teams.
  • Communicates cross-team by being the main point of contact between the SRE team and other engineering teams, and between their team and company stakeholders. Facilitates connections between their team members and other teams, and regularly works with engineering managers, engineering team leads, and project managers.
  • Performs all the duties of a Site Reliability Engineer.

The ideal candidate has:

  • Expert-level experience planning, designing, implementing, securing, and monitoring scalable infrastructure for web applications in the AWS ecosystem
  • Experience leading technical strategy and execution in projects
  • Experience deploying and managing Infrastructure-as-Code with Terraform
  • Familiarity with Node.js (preferred) and/or PHP
  • Familiarity with MariaDB, PostgreSQL, Redis, Apache, and nginx or similar technologies
  • Prior full-stack or backend software engineering experience is preferred
  • Prior people management experience, especially in an SRE or DevOps role, is preferred

Why Join AoPS:

This is a hybrid full-time position based at our headquarters in San Diego, CA. The full salary range for this position is $140k-$170k with a 6% year-end bonus. Here are some things you can look forward to:

  • Impact: The opportunity to drive the reliability and scalability of our infrastructure, supporting our growing number of customers
  • Culture: Work and collaborate with an organization filled with builders and life-long learners who strive to discover, inspire, and train the great problem solvers of the next generation
  • Flexibility: Casual work environment with a hybrid work week and flexible scheduling
  • Benefits: Multiple options for Medical, Dental and Vision plans   
  • Future Planning: 401K with company match
  • Quality of Life: PTO Plan and supportive leadership that gives you the work-life balance you deserve
  • Ease of Transition: Relocation bonus (if currently located outside of San Diego)

Background Check: 

Please note that employment is contingent on the successful completion of a background check.

Work Authorization:

Please note that in order to be considered for this position you must be legally authorized to work in the US. We are unable to offer sponsorship, including STEM-OPT and H-1B. 

About AoPS:

Art of Problem Solving (AoPS) is on a mission to discover, inspire, and train the great problem solvers of the next generation. Since 2003, we have trained hundreds of thousands of the country’s top students, including nearly all the members of the US International Math Olympiad team, through our online school, in-person academies, textbooks, and online learning systems. While our primary focus has been math for most of our history, through the years we have expanded our unique problem-solving curriculum into more subjects, such as language arts, science, and computer science.
