Job Description
We’re looking for a Site Reliability Engineer who thrives at the intersection of software engineering and systems operations. This isn’t a passive monitoring or alert-routing role — our SREs own reliability as a core product feature. You’ll be embedded with a team managing a platform that processes real-time transactions across multiple regions with high throughput and strict latency requirements. Expect deep dives into distributed systems, hands-on automation, and the mandate to eliminate toil wherever it hides.
Key Responsibilities
- Design and manage fault-tolerant and self-healing infrastructure across Kubernetes, AWS (EKS, EC2, RDS), and GCP workloads.
- Build and refine automated runbooks, deployment pipelines, and chaos testing frameworks to simulate system degradation and improve recovery.
- Collaborate with application teams to define and enforce Service Level Objectives (SLOs) and error budgets.
- Develop and extend our observability platform using tools like Grafana, Prometheus, OpenTelemetry, and Honeycomb.
- Participate in on-call rotations, conduct post-incident reviews, and lead blameless retrospectives with a bias toward long-term fixes.
- Proactively identify hidden reliability risks in code deployments, API integrations, and third-party services.
Qualifications
- 5+ years of hands-on experience in a production SRE or infrastructure engineering role, with deep expertise in distributed systems.
- Strong coding ability in Go, Python, or Rust, with a focus on writing tools, infrastructure-as-code, or platform services.
- Proven experience with service mesh technologies (e.g., Istio or Linkerd) and zero-downtime deployment strategies.
- Practical experience operating multi-region or multi-cloud architectures at scale.
- Experience deploying or maintaining infrastructure with Terraform, Helm, and CI/CD systems like ArgoCD or GitHub Actions.
- Familiarity with incident management tools (e.g., PagerDuty, FireHydrant) and structured postmortem practices.
- Comfortable challenging assumptions and driving cross-functional alignment in high-pressure situations.
Are you interested in this position?
Apply by clicking on the “Apply Now” button below!