Site Reliability Engineer (Pacific timezone)
PostHog
Listing data may include public employer ATS feeds and Jobs by Adzuna.
Before you apply
The decision-making details job seekers want first
We pulled the strongest signals from the listing so you can quickly judge fit, compensation, and what the company expects before opening the full source post.
Compensation
Salary & market context
Salary not listed
Requirements
Top requirements
- Deep hands-on experience with Kubernetes in production (EKS preferred). You've debugged node pressure, networking issues, and deployment failures at scale (thousands of nodes)
- Strong experience operating production infrastructure on AWS. Not just one account, but understanding organizational boundaries, IAM, and networking between many
- Experience automating infrastructure using Terraform or Terragrunt at scale, including module design and state management
Perks & setup
Benefits candidates care about
- Teams are flexible and easy to change when needed.
- Shipping fast: Why not now? https://posthog.com/handbook/values#why-not-now We want to build a lot of products; we can't do that shipping at a normal pace.
- This isn't about age or experience https://posthog.com/handbook/company/grown-ups, it's about being low-ego, flexible, and respectful.
- Genuine builders.
Start here
Requirements
- Deep hands-on experience with Kubernetes in production (EKS preferred). You've debugged node pressure, networking issues, and deployment failures at scale (thousands of nodes)
- Strong experience operating production infrastructure on AWS. Not just one account, but understanding organizational boundaries, IAM, and networking between many
- Experience automating infrastructure using Terraform or Terragrunt at scale, including module design and state management
- Solid understanding of Linux systems (disk, memory, networking, failure modes)
- Experience supporting stateful systems (databases, queues, storage systems, etc.)
- Ability to debug and reason about performance and reliability issues in production
- You're comfortable owning systems end-to-end, including on-call responsibilities
Responsibilities
What you'll do
- Engineers lead product teams https://posthog.com/handbook/wide-company and make product decisions https://posthog.com/handbook/which-products.
- We've built the company around small teams – autonomous, highly-efficient groups of cracked engineers https://posthog.com/founders/cracked-manifesto who can outship much larger companies because they own their products end-to-end.
- Time for building: Nothing gets shipped in a meeting.
- We need proactive people that can fully own projects and get them done, and know to get help when needed. "Are we there yet?" is the wrong question.
- Optimistic problem solvers.
WHAT YOU'LL BE DOING
You won’t be in a typical “keep the lights on” SRE role. You'll work on the kind of problems that only show up at large scale (petabytes of data, thousands of cores, constant ingestion) across a multi-region, multi-account AWS platform running many services on Kubernetes.
- Operating EKS clusters across several environments with Karpenter autoscaling, Cilium networking, and ArgoCD-driven GitOps deployments
- Managing and evolving a multi AWS account organization, provisioning, networking, access control, and cross-account connectivity
- Maintaining the Terraform/Terragrunt IaC platform: modules, automated plan-on-PR / apply-on-merge pipelines, and safe patterns for shared infrastructure
- Improving operational tooling around deploys, schema changes, backups, restores, and incident response
- Reducing operational load by identifying repeat pain points and eliminating them through code and self-healing automation
- Optimizing cloud spend as you go
- Participating in on-call and incident response, with a strong focus on making incidents rarer over time
You'll have room to design and automate, not just respond to alerts.
Role snapshot
About the role
ABOUT POSTHOG
Product development used to mean manually writing code, running analysis, diagnosing bugs, and rolling out changes using dozens of tools.
PostHog is the only platform that acts like a co-pilot for you (and your AI agents) to do it all – autonomously.
We started with open-source product analytics, launched out of Y Combinator's W20 cohort https://posthog.com/handbook/story. We've since shipped more than a dozen products https://posthog.com/products, including:
More detail
Nice to have
- Experience with GitOps workflows (ArgoCD) and CI/CD pipelines (GitHub Actions)
- Experience with building AI agent-enabled base-level infra services for teams that move fast
- Familiarity with multi-region infrastructure and the consistency/availability tradeoffs that come with it
If this sounds like you, we should talk.
We are committed to ensuring a fair and accessible interview process. If you need any accommodations or adjustments, please let us know.
Source text