Site Reliability Engineer

Anywhere365

Full-time

South Africa

engineer

devops

python

azure

linux

The job listing has expired. Unfortunately, the hiring company is no longer accepting new applications.

Founded in 2010 in the Netherlands, Anywhere365 is a global leader in Enterprise Dialogue Management, with a vision to ensure every employee and customer feels heard, understood, and valued. With around 240 employees working from 22 different countries, we partner with over 2,000 leading enterprises, including Mazda, the UN International Organization for Migration, Adecco Group, and the University of Cape Town, to deliver exceptional customer experiences through the power of Microsoft Teams and AI-driven insights. Our commitment to innovation, customer focus, and accountability drives our success.

We are looking for a highly skilled and driven Site Reliability Engineer (SRE) to join our team with a strong emphasis on communications technologies, cloud operations, and system performance. This role requires expertise in monitoring, alerting, anomaly detection, automation, security, and performance tuning across our critical communications platforms. You will be responsible for the reliability, availability, and performance of services such as SIP, Skype for Business, and Azure Communication Services (ACS). Your role will also focus on optimizing resource utilization, cost management, and ensuring disaster recovery and business continuity (BCP/DR).

Main responsibilities:

Develop and maintain real-time monitoring and alerting systems using tools like Prometheus, Grafana, and the ELK stack to ensure system health and performance.
Identify and resolve anomalies and bottlenecks proactively, reducing downtime through automated detection and alert mechanisms.
Automate infrastructure provisioning, scaling, and patching using tools like Terraform and Azure DevOps across Kubernetes, Windows, and Linux environments.
Build self-healing systems and leverage Kubernetes operators, CI/CD pipelines, and event-driven automation to improve reliability.
Analyze and optimize system performance for latency-sensitive services, including VoIP, video, and messaging.
Implement cloud cost optimization strategies, such as using Reserved Instances, rightsizing virtual machines, and leveraging Azure Cost Management tools.
Strengthen system security by enforcing best practices for hardening, vulnerability patching, and incident management in collaboration with security teams.
Design and execute robust disaster recovery plans, ensuring fault-tolerant architectures and reliable backup and restore strategies.

Why we would like to have a dialogue with you

We pick competencies over skills and experience. Can you convince us that you possess the following competencies:

Communication: The ability to communicate clearly and effectively with individuals across the organization, and to be responsive to their needs and concerns.
Action-oriented: The ability to act quickly and decisively, even in the face of uncertainty, to move projects forward and achieve business goals.
Commitment: The ability to consistently meet or exceed the quality standards expected by stakeholders.
Collaboration: The ability to work effectively with others and to build strong relationships based on trust and mutual respect, recognizing that everyone has something to contribute.
Taking ownership: The ability to take full responsibility and accountability for tasks, projects, or actions, demonstrating a sense of commitment and dedication towards achieving desired outcomes.

Competencies are key, but to be successful in this role you need to bring a few essentials to kickstart the conversation:

Key skills & experience:

5+ years of experience as an SRE, Systems Engineer, or in a similar role with a focus on communications technologies.
Proven experience with cloud platforms, with a strong focus on Azure, including patching Azure resources such as Kubernetes (AKS) clusters and VMs (Windows and Linux).
Experience with Microsoft Teams/Skype for Business and Azure Communication Services.
Strong understanding of SIP, VoIP, and related protocols.
Strong understanding of networking concepts and experience with Cisco networking technologies (e.g., routers, switches, firewalls).
Experience with scripting languages (e.g., Python, PowerShell, Bash) and with infrastructure-as-code and automation tools (e.g., Terraform, Helm, Pulumi).
Experience with network performance monitoring tools (e.g., Wireshark, tcpdump) is a plus.

Some last notes:

We are in the process of establishing a legal entity in South Africa, and you will be employed directly by us once it is finalized. In the interim, we expect you to work as a contractor.

Currently, this role is remote. However, we are also planning to open an office in South Africa, where you will be expected to work on-site four days a week once it is operational.

As this position involves supporting regions such as Europe and the US, we require flexibility in working shifts to cover these time zones. Additionally, occasional weekend work will be expected. Rest assured, you will be compensated for any irregular hours worked.

Anywhere365 is committed to creating a diverse environment and is proud to be an equal-opportunity employer. We accept difference and we thrive on it for the benefit of our employees, our products, and our community.

Please note that we have a background check policy. The background check differs per country and position. If you would like to know more, the Talent Acquisition Specialists are happy to answer any questions!
