Staff Applied Scientist - Agentic Interfaces
Datadog
Compensation
Salary & market context
270% above the BLS national median
BLS national median: $74,680
- $345,000
Benefits and Growth:
- New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
- Continuous professional development, product training, and career pathing
- An inclusive company culture, giving programs, and the ability to join our Community Guilds (Datadog employee resource groups)
- Competitive global benefits and global Spring Health benefits for employees and dependents age 6+
- Datadog offers a competitive salary and equity package, and may include variable compensation.
Perks & setup
Benefits candidates care about
- Partner with the Bits SRE, Bits Assistant, and Bits Dev Agent teams so first-party agents benefit from the same measurement substrate as third-party integrations, and so learnings move freely in both directions.
- Datadog offers a wide range of best-in-class, comprehensive, and inclusive employee benefits for this role, including healthcare, dental, parental planning, and mental health benefits; a 401(k) plan and match; paid time off; fitness reimbursements; and a discounted employee stock purchase plan.
Start here
Requirements
- You have a BS/MS/PhD in a scientific field, or equivalent experience.
- 10+ years of relevant engineering or applied science experience, including time as a technical lead.
- Proven track record of leading ML or GenAI initiatives in a product-driven environment, from research through production.
- Significant experience with evaluation, experimentation, or measurement of ML systems at scale.
- You bring a strong product mindset and are comfortable driving initiatives across cross-functional teams.
- You thrive in ambiguity and can make sound technical calls when the path isn’t yet defined.
Responsibilities
What you'll do
- Own the evaluation strategy for Datadog's AI agent integrations.
Role snapshot
About the role
Team description
At Datadog, AI agents are becoming first-class consumers of observability, security, and software delivery data — from third-party coding agents like Claude Code, Cursor, and Copilot, to our own Bits SRE, Bits Assistant, and Bits Dev Agent. The Agentic Interfaces team owns the platform that connects these agents to Datadog: the MCP Server, the tools and retrieval surfaces agents call into, and — critically — the evaluation systems that tell us whether an agent's experience on Datadog data is actually getting better over time.
This role is about that last piece. We're hiring a Staff Applied Scientist to define what "good" means for an agentic interface at Datadog and to build the measurement systems that hold us to it. "Good" isn't one number: it spans answer quality, tool-selection accuracy, retrieval relevance, latency, token cost, and end-to-end agent success on real customer workflows. You'll design the evals, build the datasets, define the metrics, and partner with the AI engineers on the team to land the platform that lets every product group at Datadog ship integrations that are demonstrably better release over release.
The space is full of open research questions. How do you evaluate an agent end-to-end when the trajectory is non-deterministic? How do you score tool selection when the tool catalog has hundreds of entries and grows weekly? How do you build a measurement system that catches regressions across first-party and third-party agents at once, without each team writing their own harness? If those are the problems you want to spend your time on, come build this with us.
Source text