Doubling support automation to 48%: redesigning Zalando's conversational AI experience
Zalando is Europe's largest online fashion platform, with 51.8 million active customers across 29 markets and €10.6 billion in annual revenue. Its chatbot was roughly 20× cheaper than agent-assisted chat but resolved only 23% of conversations.
I redesigned the experience from free-text frustration to button-driven confidence, doubling automation to 48% and saving €4.11M annually while increasing user satisfaction by 12 points.
The cheapest support channel was also the worst
Zalando's chatbot cost €0.18 per interaction. Agent-assisted chat cost €3.50. The math was obvious: automate more, save more. But the chatbot had a serious problem.
It only resolved 23% of conversations without human handoff. Of the 1.6 million annual chats where the bot was the first line of contact, fewer than 380,000 ended without escalation. The rest transferred to agents, creating a double cost: the bot interaction plus the agent interaction. Worse, the chatbot had the highest repeat contact rate of any channel and the lowest satisfaction scores.
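To make that double cost concrete, here's a back-of-envelope sketch of the chat-channel economics using the figures above. It only models containment in the chat channel itself; the structure of the calculation is my own assumption, and the €4.11M headline also reflected effects outside this sketch, such as refund automation and fewer repeat contacts.

```python
# Back-of-envelope chat economics (illustrative; figures from the case study,
# the shape of the calculation is an assumption).
ANNUAL_CHATS = 1_600_000
BOT_COST = 0.18    # EUR per bot interaction
AGENT_COST = 3.50  # EUR per agent-assisted chat

def annual_chat_cost(containment_rate: float) -> float:
    """Every chat pays the bot cost; escalated chats pay the agent cost on top."""
    escalated = ANNUAL_CHATS * (1 - containment_rate)
    return ANNUAL_CHATS * BOT_COST + escalated * AGENT_COST

before = annual_chat_cost(0.23)  # 23% resolved by the bot
after = annual_chat_cost(0.48)   # 48% resolved by the bot
print(f"Chat-channel saving: €{before - after:,.0f} per year")
# ≈ €1.4M from containment alone; the reported €4.11M also counted
# other effects such as refund automation and reduced repeat contacts.
```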
Customers had figured out how to game the system. They'd type "I want to chat with a human" or answer "no" to every resolution prompt. The NLP engine recognized both as transfer intents and handed them to agents every time. 4 out of 7 users in our research preferred calling over chatting, specifically because past chatbot experiences had wasted their time with generic answers that didn't address their situation.
The real problem wasn't usability, it was trust
I analyzed more than 100 chat transcripts from the 1.6M annual interactions in depth and co-led one-hour qualitative interviews with customers across Germany and the Netherlands.
Three findings reframed the design direction.
Keyword anxiety
Customers feared "not typing the right keywords" in free-text chatbots. They'd learned from past experiences that saying the wrong thing meant the bot couldn't help, and they'd wasted time for nothing.
Trust deficit, not usability gap
The core issue wasn't interface design, and it wasn't NLP accuracy, which had a hard ceiling given 2022 technology. Customers didn't trust the bot to understand them, so the solution was to eliminate that anxiety entirely. This reframed the entire design direction.
Guided beats free-text
When users were presented with a guided, button-driven interface instead of free text, engagement shifted dramatically. 7 out of 7 users identified the right menu options on first try. 6 out of 7 correctly identified their parcels from the visual display. Buttons eliminated the cognitive load of formulating a question and the anxiety of getting it wrong.
Chatbot adoption isn't a usability problem. It's a trust problem. Customers feared typing the wrong thing, so the solution was removing the need to type entirely.
Designing around NLP limitations
In 2022, chatbot technology ran on rule-based NLP. It worked for narrow inputs but collapsed when users expressed problems naturally. Rather than pour resources into marginally better intent recognition, I designed around the limitation entirely.
About 20% of conversations failed because the NLP couldn't parse broad context. Buttons removed that failure mode entirely: 95% of intents were covered through button selections across first and follow-up questions, and the remaining 5% escalated cleanly to agents.
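As an illustration of what button selections across first and follow-up questions can look like structurally, here is a minimal sketch of a guided flow as a static menu tree with an explicit escalation fallback. The node names and labels are hypothetical, not Zalando's actual intent taxonomy.

```python
# Minimal guided-flow sketch. Each node is a menu of buttons; IDs that don't
# appear as keys are terminal actions handled elsewhere. Names are invented.
MENU_TREE = {
    "root": {
        "prompt": "What can we help you with?",
        "options": {
            "Where is my order?": "delivery_status",
            "Returns & refunds": "refunds_menu",
            "Something else": "escalate_to_agent",  # clean fallback for the ~5%
        },
    },
    "refunds_menu": {
        "prompt": "What happened with your refund?",
        "options": {
            "Refund not received": "check_refund_status",
            "Refund amount looks wrong": "escalate_to_agent",
        },
    },
}

def next_step(node_id: str, selected_label: str) -> str:
    """Resolve a button tap to the next node or action; no free text to parse."""
    return MENU_TREE[node_id]["options"].get(selected_label, "escalate_to_agent")
```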
Previously, 30% of customers dropped off the moment the bot asked for their order number. They had to leave the chat, search emails, find the number, and come back. Most didn't bother. The new bot authenticates on sign-in and auto-fetches recent orders. One tap selects the relevant order. This single change recovered nearly a third of abandoning users.
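A sketch of that pre-fetch idea, assuming a hypothetical internal orders service: the authenticated session supplies the order context, so the customer only confirms with a tap instead of hunting for an order number.

```python
from dataclasses import dataclass

@dataclass
class Order:
    order_id: str
    summary: str  # e.g. "2 items · shipped 3 days ago"

def recent_orders_for(customer_id: str) -> list[Order]:
    """Stand-in for a call to an internal orders service (hypothetical)."""
    return [Order("10437-XXXX", "2 items · shipped 3 days ago")]

def order_picker(customer_id: str) -> list[dict]:
    """Render recent orders as tappable cards instead of asking for an order number."""
    return [
        {"type": "order_card", "id": o.order_id, "label": o.summary}
        for o in recent_orders_for(customer_id)
    ]
```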
During usability testing, users described the bot's instant responses as "unnatural" and "irritating." The bot was too fast, which paradoxically eroded trust. I introduced a 3-5 second simulated thinking delay with a typing indicator before responses. This led to a 60% improvement in user perception. In a channel where users expect human-like interaction, machine-speed responses feel wrong.
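A minimal sketch of that pacing mechanic, assuming an async messaging layer with hypothetical send_typing_indicator and send_message calls: show the typing indicator, wait a few seconds, then reply.

```python
import asyncio
import random

# Hypothetical transport stubs; any chat backend with a typing event would do.
async def send_typing_indicator(chat_id: str) -> None: ...
async def send_message(chat_id: str, text: str) -> None: ...

async def reply_with_thinking_delay(chat_id: str, text: str) -> None:
    """Pace the bot: a 3-5s 'thinking' pause with a typing indicator before the reply."""
    await send_typing_indicator(chat_id)
    await asyncio.sleep(random.uniform(3.0, 5.0))  # the pause users read as 'considering'
    await send_message(chat_id, text)
```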
In 2021, Zalando processed 3.7 million manual refunds through customer care, costing roughly €10 million. For clear-cut cases (missing items, parcels lost by carrier), the bot now performs risk checks and initiates refunds directly in the conversation. Instant resolution that customers never expected from a chatbot.
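A sketch of that decision logic under stated assumptions: a simple eligibility gate that issues the refund in-conversation for clear-cut, low-risk cases and escalates everything else. The thresholds and field names are invented for illustration, not the production risk model.

```python
from dataclasses import dataclass

@dataclass
class RefundClaim:
    reason: str           # e.g. "missing_item", "parcel_lost_by_carrier"
    amount_eur: float
    prior_claims_90d: int

CLEAR_CUT_REASONS = {"missing_item", "parcel_lost_by_carrier"}
AUTO_REFUND_LIMIT_EUR = 100.0  # invented threshold for illustration
MAX_PRIOR_CLAIMS = 2           # invented threshold for illustration

def handle_refund(claim: RefundClaim) -> str:
    """Instant refund for low-risk, clear-cut cases; everything else goes to an agent."""
    low_risk = (
        claim.reason in CLEAR_CUT_REASONS
        and claim.amount_eur <= AUTO_REFUND_LIMIT_EUR
        and claim.prior_claims_90d <= MAX_PRIOR_CLAIMS
    )
    return "refund_issued_in_chat" if low_risk else "escalate_to_agent"
```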
50% of delivery contacts happened before the parcel was actually delivered, and 66% of those came before the delivery promise date had even passed. I designed a closed-loop email system: the bot acknowledges the issue, sets expectations, and proactively follows up when the status changes. Customers don't need to contact support again to check.
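A sketch of the closed-loop idea, assuming the bot can register interest in shipment status events: acknowledge now, then follow up by email when the status changes, so the customer never has to come back and ask. The event source and email sender are hypothetical stand-ins.

```python
# Closed-loop delivery follow-up (sketch). Carrier tracking and the email
# sender are placeholders for whatever systems actually provide them.
PENDING_FOLLOW_UPS: dict[str, str] = {}  # tracking_id -> customer_email

def acknowledge_delivery_concern(tracking_id: str, customer_email: str) -> str:
    """Bot response at contact time: set expectations and register a follow-up."""
    PENDING_FOLLOW_UPS[tracking_id] = customer_email
    return "Your parcel is still on its way. We'll email you as soon as its status changes."

def on_shipment_status_change(tracking_id: str, new_status: str, send_email) -> None:
    """Status-change handler: proactively close the loop instead of waiting for a repeat contact."""
    email = PENDING_FOLLOW_UPS.pop(tracking_id, None)
    if email:
        send_email(to=email, subject="Update on your parcel",
                   body=f"Your parcel status changed to: {new_status}.")
```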
Three principles that guided every decision
Each principle came directly from research findings, not best practice assumptions.
NLP had a hard ceiling for multi-intent queries in 2022. Rather than chase marginal improvements in understanding, I removed the need for customers to describe problems in the bot's limited vocabulary. Structured menus covered 95% of intents through tap-to-select flows.
Hick's Law (reduced decision complexity)
30% of users dropped off at the order number prompt. The bot now fetches recent orders on sign-in and displays them visually. One tap replaces the interrogation loop that was driving customers to game the handover system.
Context first, questions second
Users described instant responses as 'unnatural' and 'irritating.' A brief thinking delay with a typing indicator made the experience feel considered rather than automated. 60% improvement in user perception. In conversational AI, trust beats speed.
Counterintuitive discovery from testing
For Milestone 2 (returns/refunds), I co-led qualitative research with 7 participants across Germany and the Netherlands, testing three hypotheses. 7/7 identified the right menu and submenu options. 6/7 correctly identified parcels.
But 3 out of 4 users found 'Partial Refund' and 'Incorrect Refund' confusing because the labels sounded interchangeable.
That triggered a redesign of the refund issue menu before launch. 6 out of 6 found reassurance messages clear, and post-chat email confirmations were consistently valued.
The experiment proved it at scale
of all customer inquiries flowed through the updated system by Q4 2022.
Expanded to Poland, Italy, Netherlands, Austria, Sweden, Denmark
Launched on native Android and iOS apps
France, Belgium, and Switzerland expansion launched within 3 months.
“Before this user test, I would have opted to call instead of using the chat. Just now, I realize how efficient and workable the chat is. It gives me all the information I need, and it's easy to use.”
“I really like that all the information about my return is readily available through the chatbot. I don't have to manually enter any input, which makes the process much more convenient.”
Building trust through transparency
The biggest shift in this project was understanding that chatbot adoption isn't a usability problem. It's a trust problem.
Every design decision, from guided flows to simulated thinking delays to proactive order fetching, was about rebuilding confidence that the bot could actually help.
Starting with a content strategist from day one would've baked empathy into the foundation, not patched it in later. We spent weeks refining tone after the structure was built. That late-stage work showed a 12% lift in sentiment scores, but imagine the impact if we'd designed conversation flows and emotional framing together from the start.
I'd also push for a broader qualitative sample. Seven participants gave clear direction, but a larger panel across additional markets would've caught issues like the partial/incorrect refund confusion earlier.
Designing around technological constraints, rather than fighting them, often produces better experiences than the unconstrained version would have. Guided buttons weren't a compromise because NLP was limited. They were genuinely better for users who feared typing the wrong thing. Sometimes constraints push you toward the right answer.