Designing AI evaluations that reflect human judgment

Watch party
Zoom breakouts

As AI features and agents become deeply embedded in products, evaluating AI responses becomes critical to building trustworthy experiences. Evaluations determine whether an AI feature consistently meets human expectations for clarity, usefulness, and tone. Many teams rely on manual human review, but that review is often slow and difficult to scale.

To address this, teams are increasingly using LLMs to evaluate the outputs of other LLMs. While these approaches are faster and more efficient, are they trustworthy?

To build AI experiences people can trust, human standards must be embedded into evaluation workflows, seamlessly and at scale. But how?

In this talk, Himali will share lessons from hands-on experience designing and evolving AI features in Microsoft 365, shaped by working closely with partner teams in engineering, product design, research, and applied science. Drawing on her work in the Microsoft‑wide Content Design Community, she’ll share practical strategies for designing evaluation workflows that embed human judgment throughout the process using metrics, rubrics, and tests. She’ll also cover ways to address the limitations of human review, such as inefficiency and bias, while preserving what makes human judgment valuable: qualitative insight, nuance, and empathy.

In this session, you’ll learn how to:

  • Design an evaluation workflow that involves all disciplines.
  • Create and apply a clear evaluation framework.
  • Define metrics with concrete examples so both humans and models interpret them consistently.
  • Design evaluation methods that ease the load of human review and make scoring more consistent, with practical templates included.
  • Compare human and LLM-based evaluations to identify misalignment and iteratively improve evaluation loops.
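
To make the last point concrete, here is a minimal sketch of one way to compare human and LLM-based scores. It is not taken from the talk: the 1-to-3 rubric scale, the response IDs, the scores, and the function names are all hypothetical, and Cohen's kappa is just one common agreement statistic a team might choose.

```python
from collections import Counter


def cohen_kappa(human, llm):
    """Chance-corrected agreement between two raters on the same items."""
    if len(human) != len(llm) or not human:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(human)
    # Share of items where both raters gave the same score.
    observed = sum(h == m for h, m in zip(human, llm)) / n
    # Agreement expected by chance, from each rater's score distribution.
    human_counts, llm_counts = Counter(human), Counter(llm)
    labels = set(human_counts) | set(llm_counts)
    expected = sum(human_counts[k] * llm_counts[k] for k in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical score throughout
    return (observed - expected) / (1 - expected)


def disagreements(ids, human, llm):
    """List the items to revisit: (id, human score, LLM score) where they differ."""
    return [(i, h, m) for i, h, m in zip(ids, human, llm) if h != m]


# Hypothetical data: six responses scored for "clarity" on a 1-3 rubric.
response_ids = ["r1", "r2", "r3", "r4", "r5", "r6"]
human_scores = [3, 2, 3, 1, 2, 3]
llm_scores = [3, 2, 2, 1, 3, 3]

kappa = cohen_kappa(human_scores, llm_scores)
to_review = disagreements(response_ids, human_scores, llm_scores)
print(f"kappa = {kappa:.2f}")
print("review:", to_review)
```

A low kappa, or a cluster of disagreements on one metric, can signal that the rubric wording or its examples need tightening before the LLM evaluator is trusted at scale.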
Himali Kelvekar

Content Designer 2, Microsoft Ireland
