Senior Engineering Manager - Compute
The Compute organization's mission is to build and deliver common software, frameworks, and constructs for the rest of Reddit. Reddit has a complex production serving environment that spans AWS and GCP and runs several Kubernetes compute clusters across both.
We are looking to expand our engineering management in two specific domains and thus are interested in candidates with experience in either or both of the following areas.
Deployment Platform / Infrastructure:
A software engineering team that is responsible for building the systems Reddit uses to deploy our software across the globe. This team builds APIs and services and manages the underlying infrastructure for these systems. It works closely with our compute platforms to ensure we can reliably deploy and scale Reddit.
Compute Platform / Infrastructure:
Part software engineering (SWE) and part site reliability engineering (SRE), this group’s responsibility spans from the Linux kernel up to our Kubernetes autoscalers and multi-cluster orchestration/federation. Reddit runs on this platform, so it must balance developer experience, reliability, and safety in supporting one of the world’s largest websites.
Our teams are building and maintaining the complex software needed to rapidly grow and sustain Reddit's cloud compute production environment. Having experience managing high-performance software engineering teams is important for success in this role. Previous experience with distributed systems at web scale (thousands of nodes, hundreds of systems) is a plus.
This is a high-impact role where your positive contributions will be amplified through the Reddit technology stack and lead to direct business impact. You will work closely with other infrastructure teams (Developer Experience, Observability, Compute, Transport) and dozens of product teams, such as the ML infra team, that appreciate being able to run large-scale software securely, reliably, efficiently, and scalably.
You will:
Lead: Work with the team to select, scope, and drive high-leverage projects that align with Reddit’s goals to scale our infrastructure to a large multiple of what it is today.
Build: Hire, onboard, and build out your team to execute on a strategy and create more efficient, more reliable Storage systems and Caching infrastructures.
Amplify: Mentor your ICs and be a leader for the team.
Collaborate: Work together with a variety of teams across Reddit Engineering.
Evolve: Learn and improve your own technical and non-technical abilities.
What we’re looking for:
5+ years of experience in people management of high-performing engineering teams.
7+ years of experience with cloud infrastructure or deployment systems.
This experience should include ample time developing and shipping software.
Strong focus on scalability, performance, and quality. You are an undying advocate for the user, and you have a deep intuition for how critical infra systems work at scale.
High empathy, excellent communication skills, and the ability to find compromise working across the entire engineering org.
Experience in Go, Kubernetes, Argo, Flux, and other CNCF landscape projects is a huge plus.
Benefits:
Comprehensive Healthcare Benefits
401k Matching
Workspace benefits for your home office
Personal & Professional development funds
Family Planning Support
Flexible Vacation (please use it!) & Reddit Global Wellness Days
4+ months paid Parental Leave
Paid Volunteer time off