Lead Site Reliability Engineer

hims & hers

Full-time

USA

$150k-$175k per year

engineer

java

python

docker

sql

The job listing has expired. Unfortunately, the hiring company is no longer accepting new applications.

About the Role:

We are seeking a Lead Site Reliability Engineer to help build a reliable web experience for our users. We believe that moving fast is our competitive advantage and that it enables us to better serve our users. We also know that the faster we move, the more likely we are to break things.

You Will:

Design and implement SRE practices ensuring availability, scalability and observability of production systems with a strong focus on excellent customer experience
Actively seek and identify opportunities to improve the availability and performance of the system by applying what is learned from monitoring and observation
Use automation extensively to design, configure, manage, and monitor systems in support of our product development teams
Apply an understanding of infrastructure and infrastructure automation (Infrastructure as Code)
Manage incidents and emergency response, track outages, ensure data integrity and engineer releases to promote safe, efficient and rapid deployments
Handle emergency response either by being on-call or by reacting to symptoms according to monitoring and escalation when needed
Improve the codebase by resolving logic issues, deprecating unused code, etc.
Implement monitoring, logging, alerting, and SLO reporting
Identify Service Level Indicators (SLIs) that will align the team to meet the availability and performance objectives
Run blameless root cause analyses (RCAs) on incidents and outages, aggressively looking for answers that will prevent recurrence
Provide reviews on design documents from internal and external teams
Perform more complex tasks using highly specialized knowledge and advanced business experience
Resolve complex tickets creatively
Develop and lead large, highly complex cross-functional projects or programs
Determine solutions to blockers, identify tasks, and develop solutions as appropriate
Be responsible for at least one major delivery domain and accountable for all aspects of SRE for that domain
Develop standards, tools, and knowledge requirements for skill and career development

You Have:

10+ years as a software engineer, shipping production code
5+ years of experience as a Site Reliability Engineer or Production Support Engineer
Bachelor's degree in Computer Science, Engineering, or related field, or relevant years of work experience
Experience with service-oriented architectures and microservices at scale
Strong proficiency with relational databases (PostgreSQL, MySQL, SQL Server, etc.)
Strong proficiency in SQL scripting
Proficiency developing in one or more languages such as Java, Kotlin, Python, and/or others
Ability to use containers and orchestration frameworks (Kubernetes, Docker, container registries, etc.)
Knowledge of CDNs, TypeScript frameworks, and GQL
Knowledge and good understanding of pub/sub or queue-based messaging systems
Proficiency in Git or other VCS
Experience with configuring, customizing, and extending monitoring tools (Datadog, Prometheus, New Relic, etc.)
Excellent debugging and troubleshooting skills
Strong technical competency, with a data-driven analytical approach towards solving complex challenges
A systematic problem-solving approach, coupled with strong and effective communication skills and a sense of drive
Nice-to-have: Experience with Terraform or other IaC tools such as Chef, Puppet, or Ansible

Our Benefits (there are more but here are some highlights):

Competitive salary & equity compensation for full-time roles
Unlimited PTO, company holidays, and quarterly mental health days
Comprehensive health benefits including medical, dental & vision, and parental leave
Employee Stock Purchase Program (ESPP)
Employee discounts on hims & hers & Apostrophe online products
401k benefits with employer matching contribution
Offsite team retreats

4 Applicants

Posted 4 months ago
