Senior Site Reliability Engineer - AI

Full-time
USA
Posted 1 year ago
The job listing has expired. Unfortunately, the hiring company is no longer accepting new applications.

Role Description: Senior Site Reliability Engineer - AI

Role Purpose:

We are looking for a talented and motivated Senior Site Reliability Engineer (SRE) to join our dynamic and growing team. In this role, you'll collaborate closely with technical teams to deliver and maintain a high-performance, large-scale machine learning platform. Your contributions will be key in ensuring that our systems are reliable, performant, and capable of scaling with increasing customer demands. If you're passionate about engineering solutions that drive exceptional user experiences, we want you to help us build next-gen applications with high availability and rich features.

As our Senior SRE, you will give our end users seamless, reliable experiences while applying your expertise to complex, large-scale problems. If you are ready for your next challenge and want to work on innovative projects, optimize systems, and collaborate cross-functionally to deliver real-world solutions, we encourage you to apply.

*Principals only. No recruiters.

Responsibilities:

  • Design, develop, and maintain the performance and overall health of the production system, ensuring a seamless experience for users.

  • Develop software and systems to enhance infrastructure management, automation, and application deployment.

  • Work with cross-functional teams to streamline processes, improve product reliability, and ensure high-quality, timely releases.

  • Drive automation initiatives to improve system sustainability, reliability, and operational efficiency.

  • Provide engineering support for large-scale distributed software applications, ensuring smooth operations and quick resolution of technical issues.

Requirements:

  • Bachelor’s degree in Computer Science or a related field required.

  • 7+ years of experience and a proven track record of success in technical engineering, with hands-on experience scaling complex systems and solving the problems they present.

  • 3+ years of experience in one or more high-level programming languages (e.g., Python, Go, C/C++, JavaScript) with experience in structured programming and object-oriented design.

  • Hands-on experience with distributed storage technologies (e.g., NFS, HDFS, Amazon S3) and infrastructure and resource management tools (e.g., Terraform, Kubernetes, YARN).

  • Ability to proactively identify system inefficiencies, bottlenecks, and areas for improvement.

  • Comfort with advanced coding beyond simple scripts, with a focus on building robust, maintainable solutions.
