Staff Site Reliability Engineer
Okta
Compensation
Salary & market context
Salary not listed
Perks & setup
Work setup
- On-site
- Senior level
- Posted 2d ago
Requirements
- Strong hands-on experience architecting and operating cloud-native distributed systems (AWS and GCP).
- Proficiency with Infrastructure as Code tools such as Terraform (multi-provider), Ansible, or CloudFormation.
- Advanced understanding of CI/CD pipelines (ArgoCD, GitLab CI, Spinnaker), Linux systems, networking fundamentals (Direct Connect/Interconnect, DNS, routing, load balancing), and Redis (must have).
- Hands-on experience with observability tools (Prometheus, Grafana, ELK, Loki, OpenTelemetry, Google Cloud Operations) for performance and reliability insights.
- Strong communication and problem-solving skills, with demonstrated success leading cross-team projects and mentoring peers.
- 8+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.
- 3–5 years of experience with Kubernetes (EKS/GKE) and related ecosystem tools (Helm, Karpenter, etc.) in production.
- 3–5 years of experience with AWS and GCP.
- 3–5 years using Terraform to manage multi-cloud infrastructure.
- 5+ years of coding experience in Python, Go, or similar languages.
Responsibilities
What you'll do
- Design, build, and operate highly scalable, reliable, and secure infrastructure powering our production systems across AWS and GCP.
- Lead major reliability and modernization initiatives, including container platform migrations (e.g., ECS to EKS/GKE) and microservice enablement across multi-cloud environments.
- Partner with development teams to architect and enable microservice-based applications, ensuring production readiness, scalability, and observability.
- Lead complex technical projects from conception to completion, managing timelines and technical dependencies across teams.
- Bring deep expertise with Kubernetes (EKS and GKE) to design, provisioning, scaling, and advanced troubleshooting in production.
Role snapshot
About the role
Secure Every Identity, from AI to Human
Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.
This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.