Staff Platform Engineer
About The Team:
We have grown rapidly to where we are today, serving billions of HTTP requests daily. We achieved this scale by writing scale-sensitive components in languages like Rust and Go. This combination of high performance and efficient resource utilization has given us an incredible competitive edge.
We are hiring a Platform Engineer to help us continue to scale by operating and engineering the future of our infrastructure. We are maintaining 99.95% uptime today, and we are investing to ensure we maintain that as the business continues to grow and as the product evolves.
Your primary task will be software engineering with a focus on infrastructure, operations, and automation. You'll be building systems to run our product, improving internal services, and advising product teams on architecture as it relates to the operability of the service.
The systems you'll be responsible for include all of the services which power our product. This ranges from off-the-shelf services like PostgreSQL, Scylla, Redis, Kafka, Kubernetes, etc. to our in-house services such as the Rails web app, various Rust backend services, and our high-performance API layer written in Go.
You'll be working with Kubernetes to automate our data center operations and writing operational services to automate database operations. One of the key challenges in this role is to understand systems well enough not only to operate them by hand but also to write software that automates those operations.
Our blog contains more information about the OneSignal Engineering career ladder, remote-first culture, and our diverse team.
What You'll Do:
Optimize and Elevate Performance: Identify bottlenecks in our systems and unleash your creativity to introduce cutting-edge optimizations. You'll have the chance to improve the performance of our databases and evaluate innovative storage technologies that will elevate our infrastructure to new heights.
Forge Infrastructure as Code: Take the lead in setting up robust infrastructure and configuration as code using Kubernetes and Terraform. You'll be at the forefront of shaping our foundational architecture, ensuring it’s both resilient and scalable.
Drive Observability and Monitoring: Establish and maintain a state-of-the-art observability and monitoring stack. Your insights will enable us to stay ahead of potential issues, ensuring our services remain reliable and performant.
Craft the Golden Path for CI/CD: Define and implement best practices for continuous integration and deployment. Your work will streamline the deployment process for our engineering teams, allowing them to roll out new features swiftly and safely.
Collaborate Across Teams: Work closely with various engineering teams to architect highly scalable and observable services. Your collaboration will be essential in creating a cohesive and efficient development environment.
Be a Key Player in Incident Response: Join the on-call rotation and be a crucial part of maintaining our systems' health. Your expertise will be vital in troubleshooting and resolving issues, ensuring our services always meet the highest standards.
What You'll Bring:
At least 8 years of platform engineering experience
Experience operating reliable production systems at scale
Knowledge of Linux systems internals
Desire and ability to automate tasks
Experience managing PostgreSQL for high-throughput systems, or similar experience with comparable SQL datastores
Operational experience deploying and managing Kubernetes
Experience working with Cloud Providers (AWS/GCP/Azure)
We value a variety of experiences, so these are not required. It would be an added bonus if you have experience in any of the following:
Recently writing Go and/or Rust
Working with Layers 1-3 of the OSI networking model
Redis, Kafka, etcd, ZooKeeper, nginx, haproxy