Job Description
About the Role
We’re looking for a Cloud Engineer with deep hands-on experience in designing and automating infrastructure across multi-cloud environments (AWS/GCP preferred). This isn’t a checkbox role: you’ll be tasked with transforming how infrastructure is deployed, monitored, and scaled across our platform. You’ll work closely with platform engineers and security teams to ensure that everything from ephemeral environments to production workloads is reproducible, auditable, and cost-efficient.
Responsibilities
- Architect and manage Kubernetes clusters (EKS/GKE), focusing on scalability, fault tolerance, and developer experience.
- Own and evolve the Terraform modules underpinning our infrastructure-as-code strategy, with a focus on reusability and modularity.
- Design and implement service mesh layers (e.g., Istio or Linkerd) to improve observability, traffic routing, and security across internal services.
- Lead cost optimization initiatives: establish budget monitoring across cloud accounts, build automated alerts, and deliver monthly analysis to Engineering leadership.
- Implement fine-grained IAM policies and federation strategies to align with least-privilege principles across cloud providers.
- Build CI/CD pipelines (using GitHub Actions or ArgoCD) that support canary releases and blue-green deployments with full rollback automation.
- Conduct root cause analysis and postmortems on production incidents, with a focus on preventing recurrence.
- Develop internal tooling (in Python or Go) to automate tasks like image scanning, ephemeral environment setup, and secrets rotation.
Required Qualifications
- 4+ years of experience in cloud infrastructure engineering, including at least 2 years in a production environment at scale.
- Expert-level experience with Terraform (not just using modules — you’ve written your own reusable ones).
- Strong experience with at least one major cloud provider (AWS or GCP) and good working knowledge of the other.
- Solid understanding of Kubernetes internals, from pod scheduling to CRD authoring.
- Familiarity with distributed tracing, metrics aggregation (Prometheus/Grafana or similar), and log management (e.g., Loki, Fluent Bit, or ELK stack).
- Proven experience implementing zero-downtime deployment strategies and rollback mechanisms.
- Experience hardening cloud infrastructure from a security and compliance perspective (e.g., CIS benchmarks, Cloud Custodian).
Nice to Have
- Experience with policy-as-code (OPA/Gatekeeper or HashiCorp Sentinel).
- Contributions to open-source projects in the DevOps/cloud-native space.
- Prior experience in environments with SOC 2, HIPAA, or FedRAMP compliance requirements.
- Knowledge of cloud-native databases (e.g., Amazon Aurora, Cloud Spanner) and how to manage them via infrastructure automation.
You Might Be a Fit If You…
- See IaC as a craft, not just a task.
- Think writing bash scripts to test cloud behavior is fun.
- Believe a 30-minute postmortem doc is more valuable than a week of speculation.
- Have ever caught a $2,000/month misconfiguration just by skimming billing reports.