Site Reliability Engineer (Pacific timezone)

PostHog

Remote · 2w ago

Listing data may include public employer ATS feeds and Jobs by Adzuna.

Before you apply

The decision-making details job seekers want first

We pulled the strongest signals from the listing so you can quickly judge fit, compensation, and what the company expects before opening the full source post.

Remote · Mid level · Posted 2w ago

Compensation

Salary & market context

Salary not listed

Requirements

Top requirements

Deep hands-on experience with Kubernetes in production (EKS preferred). You've debugged node pressure, networking issues, and deployment failures at scale (thousands of nodes)
Strong experience operating production infrastructure on AWS: not just one account, but an understanding of organizational boundaries, IAM, and networking across many
Experience automating infrastructure using Terraform or Terragrunt at scale, including module design and state management

Perks & setup

Benefits candidates care about

Teams are flexible and easy to change when needed.
Shipping fast: Why not now? https://posthog.com/handbook/values#why-not-now We want to build a lot of products; we can't do that shipping at a normal pace.
Grown-ups: This isn't about age or experience https://posthog.com/handbook/company/grown-ups, it's about being low-ego, flexible, and respectful.

Start here

Requirements

Deep hands-on experience with Kubernetes in production (EKS preferred). You've debugged node pressure, networking issues, and deployment failures at scale (thousands of nodes)
Strong experience operating production infrastructure on AWS: not just one account, but an understanding of organizational boundaries, IAM, and networking across many
Experience automating infrastructure using Terraform or Terragrunt at scale, including module design and state management
Solid understanding of Linux systems (disk, memory, networking, failure modes)
Experience supporting stateful systems (databases, queues, storage systems, etc.)
Ability to debug and reason about performance and reliability issues in production
You're comfortable owning systems end-to-end, including on-call responsibilities

Responsibilities

What you'll do

Engineers lead product teams https://posthog.com/handbook/wide-company and make product decisions https://posthog.com/handbook/which-products.
Teams are flexible and easy to change when needed. Shipping fast: Why not now? https://posthog.com/handbook/values#why-not-now We want to build a lot of products; we can't do that shipping at a normal pace.
We've built the company around small teams: autonomous, highly efficient groups of cracked engineers https://posthog.com/founders/cracked-manifesto who can outship much larger companies because they own their products end-to-end. Time for building: Nothing gets shipped in a meeting.
We need proactive people who can fully own projects, get them done, and know when to get help. "Are we there yet?" is the wrong question.
You won't be in a typical "keep the lights on" SRE role. You'll work on the kind of problems that only show up at large scale (petabytes of data, thousands of cores, constant ingestion) across a multi-region, multi-account AWS platform running many services on Kubernetes.

- Operating EKS clusters across several environments with Karpenter autoscaling, Cilium networking, and ArgoCD-driven GitOps deployments
- Managing and evolving a multi-account AWS organization: provisioning, networking, access control, and cross-account connectivity
- Maintaining the Terraform/Terragrunt IaC platform: modules, automated plan-on-PR / apply-on-merge pipelines, and safe patterns for shared infrastructure
- Improving operational tooling around deploys, schema changes, backups, restores, and incident response
- Reducing operational load by identifying repeat pain points and eliminating them through code and self-healing automation
- Optimizing cloud spend as you go
- Participating in on-call and incident response, with a strong focus on making incidents rarer over time

You'll have room to design and automate, not just respond to alerts.

Role snapshot

About the role

ABOUT POSTHOG

Product development used to mean manually writing code, running analysis, diagnosing bugs, and rolling out changes using dozens of tools.

PostHog is the only platform that acts like a co-pilot for you (and your AI agents) to do it all – autonomously.

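We started with open-source product analytics, launched out of Y Combinator's W20 cohort https://posthog.com/handbook/story. We've since shipped more than a dozen products https://posthog.com/products, including:
- PostHog Code https://posthog.com/code, the only AI devtool that understands your product, not just your codebase.
- A built-in data warehouse https://posthog.com/docs/data-warehouse, so users can query product and customer data together using custom SQL insights.
- PostHog AI https://posthog.com/ai, an AI-powered analyst that answers product questions, helps users find useful session recordings, and writes custom SQL queries.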

More detail

Nice to have

Experience with GitOps workflows (ArgoCD) and CI/CD pipelines (GitHub Actions)
Experience building AI agent-enabled base-level infra services for teams that move fast
Familiarity with multi-region infrastructure and the consistency/availability tradeoffs that come with it
If this sounds like you, we should talk.
We are committed to ensuring a fair and accessible interview process. If you need any accommodations or adjustments, please let us know.

Source text

Full listing preview

ABOUT POSTHOG

Product development used to mean manually writing code, running analysis, diagnosing bugs, and rolling out changes using dozens of tools.

PostHog is the only platform that acts like a co-pilot for you (and your AI agents) to do it all, autonomously.

We started with open-source product analytics, launched out of Y Combinator's W20 cohort https://posthog.com/handbook/story. We've since shipped more than a dozen products https://posthog.com/products, including:

- PostHog Code https://posthog.com/code, the only AI devtool that understands your product, not just your codebase.
- A built-in data warehouse https://posthog.com/docs/data-warehouse, so users can query product and customer data together using custom SQL insights.
- PostHog AI https://posthog.com/ai, an AI-powered analyst that answers product questions, helps users find useful session recordings, and writes custom SQL queries.

We are:

1. Product-led. More than 450,000 organizations have installed PostHog, mostly driven by word-of-mouth. We have intensely strong product-market fit.
2. Default alive https://paulgraham.com/aord.html. Revenue is growing incredibly quickly, and we're very efficient. We raise money to push ambition and grow faster, not to keep the lights on.
3. Well-funded. We've raised more than $180m from some of the world's top investors. We're set up for a long, ambitious journey.

We're focused on building an awesome product for end users, hiring exceptional teammates, shipping fast, and being as weird as possible https://posthog.com/deskhog.

THINGS WE CARE ABOUT

- Transparency: Everyone can read about our roadmap, how we pay (or even let go of) people, our strategy, and how we work, in our public company handbook https://posthog.com/handbook. Internally, we share revenue, notes and slides from board meetings, and fundraising plans, so everyone has the context they need to make good decisions.
- Autonomy: We don't tell anyone what to do. Everyone chooses what to work on next based on what's going to have the biggest impact on our customers, and what they find interesting and motivating to work on. Engineers lead product teams https://posthog.com/handbook/wide-company and make product decisions https://posthog.com/handbook/which-products. Teams are flexible and easy to change when needed.
- Shipping fast: Why not now? https://posthog.com/handbook/values#why-not-now We want to build a lot of products; we can't do that shipping at a normal pace. We've built the company around small teams: autonomous, highly efficient groups of cracked engineers https://posthog.com/founders/cracked-manifesto who can outship much larger companies because they own their products end-to-end.
- Time for building: Nothing gets shipped in a meeting. We're a natively remote company. We default to async communication: PRs > Issues > Slack. Tuesdays and Thursdays are meeting-free days https://posthog.com/handbook/company/culture#were-on-the-makers-schedule, and we prioritize heads-down building time over perfect coordination. This will be the most productive job you've ever had.
- Ambition: We want to solve big problems. We strongly believe that aiming for the best possible upside, and sometimes missing, is better than never trying. We're optimistic about what's possible and our ability to get there.
- Being weird: Weird means redesigning an already world-class website for the 5th time. It means shipping literally every product that relates to customer data. It means building an objectively unnecessary developer toy https://posthog.com/deskhog with dubious shareholder value. Doing weird stuff is a competitive advantage. And it's fun.

WHO WE'RE LOOKING FOR

We're looking for people in the Pacific timezone who like deep ownership of production systems: people who are not afraid of working with stateful infrastructure, who love working with AWS, VMs, and automation, and who enjoy making messy systems reliable.
In general, we seek SREs who are:

- Enthusiastic drivers. We need proactive people who can fully own projects, get them done, and know when to get help. "Are we there yet?" is the wrong question.
- Optimistic problem solvers. Things get hard here sometimes, whether it's scaling, shipping complex products, handling a stream of support requests, or trying to ship something that touches multiple teams. We need people who won't get disheartened, and will collaborate, iterate, and ship their way out of anything.
- Grown-ups. We're an international bunch of weirdos, but one thing unites us: everyone is kind, considerate, and professional towards each other. This isn't about age or experience https://posthog.com/handbook/company/grown-ups, it's about being low-ego, flexible, and respectful.
- Genuine builders. PostHog is full of people who just love building stuff, people who would still be building software even if there wasn't a paycheck at the end.

If this sounds like you, we should talk.

WHAT YOU'LL BE DOING

You won't be in a typical "keep the lights on" SRE role. The work is about turning a fast-growing, stateful system into a predictable, well-automated platform (provisioning, scaling, rebalancing, recovery). That means reducing operational stress, designing safe automation for traffic-heavy workloads, and building the tooling and patterns that let the system scale without scaling human effort.

You'll work on the kind of problems that only show up at large scale (petabytes of data, thousands of cores, constant ingestion) across a multi-region, multi-account AWS platform running many services on Kubernetes.

- Operating EKS clusters across several environments with Karpenter autoscaling, Cilium networking, and ArgoCD-driven GitOps deployments
- Managing and evolving a multi-account AWS organization: provisioning, networking, access control, and cross-account connectivity
- Maintaining the Terraform/Terragrunt IaC platform: modules, automated plan-on-PR / apply-on-merge pipelines, and safe patterns for shared infrastructure
- Improving operational tooling around deploys, schema changes, backups, restores, and incident response
- Reducing operational load by identifying repeat pain points and eliminating them through code and self-healing automation
- Optimizing cloud spend as you go
- Participating in on-call and incident response, with a strong focus on making incidents rarer over time

You'll have room to design and automate, not just respond to alerts. You should join this team if you like deep ownership of production systems and enjoy building the platform layer that everything else runs on.

REQUIREMENTS

- Deep hands-on experience with Kubernetes in production (EKS preferred). You've debugged node pressure, networking issues, and deployment failures at scale (thousands of nodes)
- Strong experience operating production infrastructure on AWS: not just one account, but an understanding of organizational boundaries, IAM, and networking across many
- Experience automating infrastructure using Terraform or Terragrunt at scale, including module design and state management
- Solid understanding of Linux systems (disk, memory, networking, failure modes)
- Experience supporting stateful systems (databases, queues, storage systems, etc.)
- Ability to debug and reason about performance and reliability issues in production
- You're comfortable owning systems end-to-end, including on-call responsibilities

You don't need to be an expert in every system we run on day one. But you do need to enjoy owning complex infrastructure and learning how the pieces fit together.
NICE TO HAVE

- Experience with GitOps workflows (ArgoCD) and CI/CD pipelines (GitHub Actions)
- Experience building AI agent-enabled base-level infra services for teams that move fast
- Familiarity with multi-region infrastructure and the consistency/availability tradeoffs that come with it

If this sounds like you, we should talk.

We are committed to ensuring a fair and accessible interview process. If you need any accommodations or adjustments, please let us know.
