As AI features and agents become deeply embedded in products, evaluating AI responses becomes critical to building trustworthy experiences. Evaluations determine whether an AI feature consistently meets human expectations for clarity, usefulness, and tone. Many teams rely on manual human review, but that approach is often slow and difficult to scale.
To address this, teams are increasingly using LLMs to evaluate the outputs of other LLMs. While these approaches are faster and more efficient, are they trustworthy?
To build AI experiences people can trust, human standards must be embedded into evaluation workflows, seamlessly and at scale. But how?
In this talk, Himali will share lessons from hands-on experience designing and evolving AI features in Microsoft 365, shaped by working closely with partner teams in engineering, product design, research, and applied science. Drawing on her work in the Microsoft‑wide Content Design Community, she’ll share practical strategies for designing evaluation workflows that embed human judgment throughout the process using metrics, rubrics, and tests. She’ll also cover ways to address the limitations of human review, such as inefficiency and bias, while preserving what makes human judgment valuable: qualitative insight, nuance, and empathy.
In this session, you’ll learn how to:

Content Designer 2, Microsoft Ireland