Member of Technical Staff (Cluster Manager)
As a Member of Technical Staff on Cluster Management, you will:
Be responsible for the reliability, performance, and scalability of our compute infrastructure.
Design, build, and maintain the tools that keep our systems running smoothly.
Monitor system performance, troubleshoot issues, and implement solutions to prevent future problems.
Collaborate with engineering and research teams to ensure our infrastructure meets their needs.
Manage machine and storage resources efficiently, and implement strategies to reduce infrastructure costs.
You may be a good fit if you have:
Experience managing and troubleshooting large-scale distributed systems.
Strong scripting and automation skills (e.g., Python, Bash).
Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
Experience with monitoring and logging tools (e.g., Prometheus, Grafana).
A deep understanding of cloud computing platforms (e.g., AWS, GCP, Azure).
Strongly desired: Experience with HPC/GPU cluster management tools (e.g., Slurm, GPU monitoring tools, distributed file systems).
The ability to build in a fast-paced environment under some uncertainty.
Reka's Mission
Reka's mission is to build useful multimodal artificial intelligence and use it to empower organisations and businesses. We are a globally distributed foundation model startup, headquartered in the San Francisco Bay Area, California. Embracing a remote-first approach, our team brings together top talent from around the world. Our founding team, along with many of our team members, has contributed to many of the breakthroughs in AI over the past decade.
Why Reka?
An Elite Team: Collaborate with top-tier engineers, researchers, and operators from renowned organizations such as Google DeepMind and Facebook AI Research (FAIR), as well as successful startups, driving innovation in AI technology.
Cutting-Edge Infra: Design and manage large-scale clusters with the latest hardware.
Massive Market Opportunity: Be part of a rapidly growing industry poised to transform multiple sectors globally, offering the chance to make a significant impact.
Inclusive and Open Culture: Thrive in an open and inclusive work environment that values diverse perspectives and fosters creativity.
Visa Support: We provide visa assistance, including H-1B and OPT transfers, for US employees to ensure a smooth transition and support your career with us.