Trust & Mental Models in an AI Shopping Assistant
Twelve consecutive weeks of evaluative research at the concept stage, studying how shoppers built trust with Amazon Rufus, navigated a conversational AI interface, and formed mental models of AI-powered recommendations.
The Challenge
Amazon was building Rufus, a conversational AI shopping assistant, and needed to understand how shoppers would experience, understand, and trust a fundamentally new kind of shopping interface at the earliest concept stage, before significant engineering investment was locked in.
The challenge was not just usability. It was understanding how users form mental models of AI acting on their behalf, where they feel confident delegating decisions to the system, where trust breaks down, and what the experience of using a conversational AI for something as high-stakes as shopping actually feels like in practice.
Core Research Questions
- How do shoppers build (or withhold) trust in AI-powered product recommendations?
- How do users understand the boundaries of what Rufus can and cannot do?
- Where does the conversational interface feel natural, and where does it create friction?
Methodology
The research was designed to match the pace of product development: one round of research per week, with findings feeding directly into the next design iteration. Each round combined evaluative and behavioral methods with targeted trust and explainability probes.
Think-Aloud Protocol
Participants verbalized their thought process while using Rufus, surfacing real-time trust signals, confusion moments, and decision-making patterns as they occurred.
Trust Probing
Structured probes after each interaction to understand why participants trusted or questioned specific recommendations, and what information would increase their confidence.
Mental Model Mapping
Post-session exercises to understand how participants conceptualized what Rufus was, how it worked, and what its limitations were, independent of what they had been told.
Twelve rounds over twelve consecutive weeks, with 7–8 participants per round (~90 total). Each round was designed based on findings from the previous one, creating a cumulative, iterative research program rather than a series of disconnected studies.
Key Learning
Iterative research compounds. Each round builds on the last.
This project reinforced something I now carry into every engagement: velocity and rigor are not opposites. The key is building systems that let both coexist.
We started broad. Early rounds surfaced questions we didn't know to ask, and the research plan kept rewriting itself in productive ways. By the middle rounds we could go narrower and deeper on what actually mattered. Across all twelve weeks I maintained a theme-tracking system to capture recurring patterns and surface emerging ones, so nothing got lost in the churn of weekly sessions.
The project also required onboarding and offboarding researchers mid-stream while keeping the work moving at pace. I stayed on for all twelve weeks as the lead, and built comprehensive onboarding materials and tracking systems that let new team members get up to speed fast without disrupting continuity. The research ultimately shaped the initial launch of Rufus on mobile, and then informed its subsequent evaluation and expansion to desktop.
The lesson I took away: creating good systems for data collection and reporting isn't overhead. It's what makes rapid, rigorous research possible in the first place.
Outcome
Findings fed directly into twelve successive product iterations, with each round's insights shaping what the design team prioritized in the following week's build. The research program accelerated Amazon's ability to make confident product decisions at the concept stage for a product that now serves millions of Amazon shoppers.