Vijay Raina, a leading authority in SaaS and software architecture, has spent years advising enterprise organizations on how to build systems that are both robust and user-centric. As artificial intelligence moves from the experimental fringes to the core of product design, Raina has become a vocal advocate for “Probabilistic Design,” a framework that challenges the traditional, rigid ways we build software. He suggests that the greatest risk in the current AI gold rush is not the technology itself, but our tendency to wrap unpredictable, pattern-matching models in deterministic interfaces that promise a certainty they cannot deliver. In this discussion, we explore how to navigate this shift, moving away from binary “yes/no” systems toward a more nuanced approach that embraces uncertainty, accounts for historical bias, and prioritizes long-term resilience over short-term conversion metrics.
The 2024 Air Canada tribunal case showed us the devastating consequences of a chatbot hallucinating a refund policy. Why do these probabilistic systems often present predictions as absolute truths, and what is the underlying risk to the organization?
The Air Canada incident is a perfect, albeit painful, example of what happens when a company treats a statistical guess as a concrete corporate mandate. In that 2024 ruling, the tribunal essentially told the airline that if their bot says it’s policy, then for the customer, it is policy, regardless of what the “real” rules say. The core of the problem is that we are wrapping probabilistic systems—which are essentially just predicting the next most likely word—in deterministic interfaces that look and feel like a human authority. When a user sees a clean, professional chat window, they don’t see a prediction based on training patterns; they see a commitment from the brand. This creates a massive organizational risk because it eliminates the nuance and the “escape hatches” that a human agent would naturally provide. We are essentially automating the process of making promises we might not be able to keep, and that erodes trust faster than any technical glitch ever could.
You’ve mentioned that humans are naturally wired for deterministic thinking, often struggling to grasp the fluid nature of AI. How does the “coin flip” analogy help designers transition to a more probabilistic mindset?
If you flip a coin 999 times and it lands on heads every single time, our deterministic brains are screaming that the 1000th flip must be heads because the coin is clearly rigged, or perhaps that it’s “due” for tails. We desperately want to believe that past actions or patterns dictate a guaranteed future outcome, but the probabilistic mind understands that, for a fair coin, the 1000th flip is still a 50-50 shot. In design, we often fall into the trap of thinking that because a user did “X” in the past, they will definitely do “Y” in the future, and AI reinforces this by finding those past patterns. If we don’t break out of that mindset, we build fragile experiences that shatter the moment a user deviates from the statistical norm. Designers need to hold onto the uncomfortable truth that even the highest-confidence prediction is still just one possible path among millions, much like Doctor Strange’s vision of the future in the Avengers films.
How should a designer’s approach change when an AI model reports a 60% confidence score versus a 90% confidence score for a specific user action?
The difference between 60% and 90% isn’t just a number; it’s a completely different design problem that requires a unique visual and functional vocabulary. At 60% confidence, the user is likely undecided or lacks information, so the interface needs to step up its persuasive game by offering testimonials, detailed comparisons, and reassuring signals to help them move forward. You can’t just push them through the funnel; you have to earn their trust with transparency. Conversely, when the system hits that 90% confidence mark, the user is already motivated and ready to act, so your primary job is to get out of their way and remove every ounce of friction. If you use the same “one-size-fits-all” layout for both scenarios, you end up annoying the high-intent users and confusing the low-intent ones, which is a recipe for high bounce rates.
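As a rough illustration of treating 60% and 90% as separate design problems, the sketch below maps a confidence score to a layout plan. The `LayoutPlan` fields, the `plan_layout` name, and the fallback below 60% are hypothetical; only the 60% and 90% reference points come from the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutPlan:
    """Hypothetical description of what the interface should emphasize."""
    name: str
    show_testimonials: bool
    show_comparisons: bool
    one_click_action: bool


def plan_layout(confidence: float) -> LayoutPlan:
    """Map a model confidence score (0.0 to 1.0) to a layout plan."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= 0.9:
        # High intent: remove friction and get out of the user's way.
        return LayoutPlan("streamlined", False, False, True)
    if confidence >= 0.6:
        # Undecided: add reassurance before asking for commitment.
        return LayoutPlan("reassuring", True, True, False)
    # Assumption: the interview does not cover scores below 60%.
    return LayoutPlan("exploratory", False, True, False)
```

Treating everything between 60% and 90% as the reassuring case is a simplification; a real product would likely tune these cut-offs against observed behavior.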
Simulations are often touted as a way to “test” designs before they go live, but you’ve warned that they aren’t a replacement for real experimentation. How can teams use structured prompts to evaluate accessibility for groups like neurodivergent users without falling into the trap of false certainty?
I view AI simulations as a powerful conversation starter and a way to highlight blind spots, but they are absolutely not a verdict on whether a design works. When we use a prompt to evaluate a design for neurodivergent users—looking at ADHD or autism spectrum disorder—we are asking the model to perform a SWOT analysis based on patterns in its training data. It might flag that a navigation flow is too sensory-heavy or that the language isn’t intuitive enough, which gives the team a “probability score” for successful use. However, you have to remember that the model is looking backward at historical data, not forward at your specific, unique user base. If you use these simulations to replace actual testing with people, you risk optimizing for a stereotype rather than a reality, which is why I always tell my teams to use the output as a hypothesis to be tested, not a final approval.
The Amazon recruitment tool that was scrapped due to gender bias is a famous cautionary tale in the industry. What does this teach us about the relationship between historical data and the “truth” of an AI’s output?
Amazon’s experiment with that recruitment tool is the ultimate warning that AI doesn’t just find patterns; it inherits our history, flaws and all. The model was trained on roughly 10 years of résumés submitted to the company, and because the tech industry was, and is, predominantly male, the AI learned to treat male candidates as preferable. It started penalizing résumés that mentioned “women’s” clubs or female-centric organizations, not because it was programmed to be sexist, but because the data it was fed was a mirror of past biases. It proves that what an AI presents as the “statistically likely” best candidate isn’t necessarily the truth; it’s just a reflection of what happened before. As designers and architects, we have to ask if the past is a fair map for the future we want to build, and if it isn’t, we have to intervene manually because the model won’t do it for us.
Transparency is often cited as a fix for the “black-box” nature of AI, but how does showing the reasoning behind a recommendation actually change the way a user interacts with the product?
Transparency is the bridge that allows a user to move from blind faith to informed judgment. When a system is a black box, the user has two choices: trust it completely or reject it entirely, and neither of those is a healthy way to interact with technology. By revealing the reasoning, the sources, and the summaries behind a recommendation, we give the user the tools to calibrate their own trust. If a facial recognition tool says “this looks like Pratik, is that right?” instead of just slapping a name on the photo, it acknowledges its own fallibility and invites the user to be a partner in the process. This kind of honesty doesn’t weaken the product; it actually strengthens the relationship because the user feels in control, and they are much more likely to forgive a mistake if the system was upfront about its uncertainty.
You suggest a “predict, test, learn, adjust, repeat” loop for experimentation. How does reducing the steps in an onboarding flow, say from 5 steps down to 3, serve as a test of a behavioral assumption?
In a probabilistic framework, we aren’t just testing a feature to see if it “works”; we are testing a specific hypothesis about human behavior. For example, we might believe that reducing an onboarding flow from 5 steps to 3 will increase completion because we suspect users are suffering from decision fatigue. We set a clear metric (maybe we’re looking for a 15% increase in step-to-step conversion) and we use AI to model the potential outcome before we ever write a line of code. If the 3-step version actually works, we haven’t just won a higher conversion rate; we’ve gained validated learning about our users’ cognitive load. This approach allows us to fail fast on the ideas that don’t hold water and double down on the ones that do, turning every experiment into a building block for a more resilient system.
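One way to make the "predict, test" step concrete is to write the hypothesis down as a check before the experiment runs. The sketch below is a minimal version under stated assumptions: the 15% target is read as a relative lift, a pooled two-proportion z-test stands in for whatever analysis a team actually uses, and the 0.05 significance level and all function names are hypothetical.

```python
import math


def relative_lift(control_rate: float, variant_rate: float) -> float:
    """Relative improvement of the variant over the control."""
    return (variant_rate - control_rate) / control_rate


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # No variation at all, so no evidence of a difference.
    z = (p_b - p_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def hypothesis_supported(conv_a: int, n_a: int, conv_b: int, n_b: int,
                         target_lift: float = 0.15, alpha: float = 0.05) -> bool:
    """True when the variant meets the target lift and the gap is unlikely to be noise."""
    lift = relative_lift(conv_a / n_a, conv_b / n_b)
    return lift >= target_lift and two_proportion_p_value(conv_a, n_a, conv_b, n_b) < alpha


# Made-up counts for illustration only: 5-step control vs. 3-step variant.
print(hypothesis_supported(conv_a=400, n_a=1000, conv_b=480, n_b=1000))
```

Fixing the target and the threshold up front keeps the team from redefining success after seeing the numbers.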
Why is it important to have multiple versions of an experience living side by side rather than searching for one “perfect” design?
The idea of a single “perfect” design is a relic of the deterministic age that simply doesn’t scale in a world of diverse user needs and AI-driven personalization. Different users have different motivations; a high-intent power user might find a “minimal” checkout experience liberating, while a first-time, skeptical user might find it suspicious and prefer a version with more reassurance. By embracing multi-versions, we can serve different segments of our audience simultaneously, using AI to route them to the experience that has the highest likelihood of success for their specific context. It’s about moving away from the “risky bet” of a single large change and instead managing a portfolio of probabilities that can adapt as the market and user behavior shift.
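Routing users across side-by-side versions can be sketched as picking the variant with the highest predicted success while reserving a small share of traffic for exploration. This is an epsilon-greedy simplification chosen for illustration, not a method named in the interview; the variant names, the `route_variant` function, and the 10% exploration rate are all hypothetical.

```python
import random
from typing import Mapping, Optional


def route_variant(predicted_success: Mapping[str, float],
                  explore_rate: float = 0.1,
                  rng: Optional[random.Random] = None) -> str:
    """Choose an experience variant for one user.

    predicted_success maps variant names to the model's estimated
    probability of success for this user's context.
    """
    if not predicted_success:
        raise ValueError("at least one variant is required")
    rng = rng or random.Random()
    if rng.random() < explore_rate:
        # Occasionally try any variant so estimates can adapt as behavior shifts.
        return rng.choice(sorted(predicted_success))
    # Otherwise serve the variant with the highest predicted success.
    return max(predicted_success, key=lambda name: predicted_success[name])


# Hypothetical predictions for a high-intent power user.
print(route_variant({"minimal": 0.82, "reassuring": 0.64}, explore_rate=0.0))
```

The exploration share is what keeps the portfolio from hardening into a single "perfect" design by default.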
When communicating uncertainty, you mentioned using delivery windows like “Friday to Monday” instead of a specific timestamp. How does this kind of “honest variability” impact user trust over time?
A specific timestamp is a deterministic promise that is almost guaranteed to be broken by the messy reality of traffic, weather, and logistics. Every time a “guaranteed” 2:00 PM delivery slips to 2:15 PM, a tiny piece of user trust is chipped away because the system lied. However, when you provide a range like “Friday to Monday,” you are telling the truth about the variability of the situation. You are setting an honest expectation that the system can actually meet, which paradoxically makes the user feel more secure because they aren’t being over-promised. In UX, the goal isn’t to eliminate the feeling of uncertainty—it’s to design for it intelligently so the user knows exactly what kind of variability they should expect.
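The "Friday to Monday" idea can be sketched as turning a point estimate plus an uncertainty margin into a day-level range. How the margin is derived (for example, from historical delays) is left out here, and the `delivery_window` name and message wording are hypothetical.

```python
from datetime import datetime, timedelta

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def delivery_window(expected: datetime, margin: timedelta) -> str:
    """Describe a delivery estimate as a range of days, not a timestamp."""
    if margin < timedelta(0):
        raise ValueError("margin must not be negative")
    earliest = expected - margin
    latest = expected + margin
    first = DAY_NAMES[earliest.weekday()]  # weekday(): Monday is 0
    last = DAY_NAMES[latest.weekday()]
    if earliest.date() == latest.date():
        return f"Arriving {first}"
    return f"Arriving {first} to {last}"


# Expected Saturday 4 May 2024 at 18:00, give or take 36 hours.
print(delivery_window(datetime(2024, 5, 4, 18, 0), timedelta(hours=36)))
# Arriving Friday to Monday
```

A wider margin produces a wider range, so the interface only ever promises what the estimate can support.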
Users often fall into categories like “overtrusting” or “distrustful” when dealing with AI. What specific design goals should we have for a user who tends to follow AI suggestions too blindly?
For the overtrusting user, the design goal has to be about slowing them down and forcing them to see the seams in the machine. These are the users who will take a chatbot’s “hallucination” and run with it, potentially leading to the kind of legal or financial mess we saw with the airline. In these cases, we need to show uncertainty more prominently, perhaps by using confidence indicators or explicit warnings that the content is AI-generated and needs verification. We have to create intentional friction that breaks their “auto-pilot” mode and reminds them that they are still the final decision-maker. It’s about protecting the user from their own tendency to outsource their judgment to a statistical model.
Human-in-the-loop (HITL) is often seen as a safety net, but you’ve called it a “refinement engine.” How do tools like GitHub Copilot or Gmail’s Smart Compose use human feedback to improve their underlying models?
GitHub Copilot and Gmail are masterclasses in subtle, high-quality feedback loops. They don’t force a choice; they offer a suggestion that the user can accept with a tab, edit to fit their needs, or ignore completely. Every one of those actions—the accepts, the edits, and the ignores—is a data point that is far more valuable than passive analytics. When a developer edits a Copilot suggestion, they are providing a direct correction that the model can learn from, refining its future predictions. HITL isn’t just about preventing mistakes in the moment; it’s about creating a continuous stream of human expertise that feeds back into the system, ensuring the AI grows more aligned with human intent over time.
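The accept, edit, and ignore signals described above can be captured as explicit events and summarized. The sketch below is a generic illustration and makes no claim about how Copilot or Smart Compose are built; the `Feedback` enum and `summarize` function are hypothetical names.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Sequence


class Feedback(Enum):
    """What the user did with a suggestion."""
    ACCEPTED = "accepted"
    EDITED = "edited"
    IGNORED = "ignored"


def summarize(events: Sequence[Feedback]) -> Dict[str, float]:
    """Return the share of suggestions that were accepted, edited, or ignored."""
    total = len(events)
    if total == 0:
        return {kind.value: 0.0 for kind in Feedback}
    counts = Counter(events)
    return {kind.value: counts[kind] / total for kind in Feedback}


events = [Feedback.ACCEPTED, Feedback.ACCEPTED, Feedback.EDITED, Feedback.IGNORED]
print(summarize(events))
```

A rising share of edits relative to accepts is one plausible sign that suggestions are close but not quite right, which is where corrections carry the most information.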
In safety-critical domains like healthcare, where the stakes are life and death, how does the role of the “human reviewer” change the design of the AI interface?
In healthcare, the AI is never the pilot; it is always the co-pilot, and the interface must reflect that hierarchy with absolute clarity. The system might flag an anomaly in an X-ray or suggest a potential diagnosis, but it must also provide the “why” behind that suggestion—the specific data points or patterns it detected. The design has to support the clinician’s authority, providing them with the details they need to either validate or override the machine without feeling pressured or rushed. We also have to log every override and capture the context of why the human disagreed, because those moments of disagreement are where the most critical learning happens for the entire medical system.
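Logging overrides with their context might look like the record below, which refuses to store a disagreement without a stated reason. The field names and the `ReviewRecord` class are hypothetical, and a real clinical system would carry far stricter audit and privacy requirements than this sketch shows.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReviewRecord:
    """One clinician review of an AI suggestion."""
    case_id: str
    ai_suggestion: str
    ai_rationale: str          # the "why" shown to the clinician
    clinician_decision: str
    override_reason: str = ""
    reviewed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    @property
    def is_override(self) -> bool:
        """True when the clinician's decision differs from the AI suggestion."""
        return self.clinician_decision != self.ai_suggestion

    def __post_init__(self) -> None:
        # Disagreements are the most valuable records, so require their context.
        if self.is_override and not self.override_reason.strip():
            raise ValueError("an override must record why the clinician disagreed")


record = ReviewRecord(
    case_id="case-001",  # made-up identifier
    ai_suggestion="flag for follow-up",
    ai_rationale="opacity detected in lower left region",
    clinician_decision="no follow-up needed",
    override_reason="artifact from patient movement",
)
print(record.is_override)
```

Making the reason mandatory only on overrides keeps agreement fast while ensuring that every disagreement arrives with the context needed to learn from it.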
You’ve highlighted Duolingo’s “hearts” system and Meta’s pivot to “meaningful social interactions” as examples of optimizing for long-term health over short-term wins. Why is this distinction so vital for AI products?
If you only optimize for the next click or the immediate conversion, you can very easily build a product that is successful in the short term but toxic in the long run. Duolingo’s hearts system is a fascinating choice because it actually adds friction; it stops you from binging lessons if you make too many mistakes, which on paper looks like a conversion killer. But by forcing you to slow down and practice, they are optimizing for the metric that actually matters: long-term retention and actual learning. AI makes it incredibly easy to “hack” human psychology for engagement, but if we don’t look at the second-order effects—like user burnout or the erosion of social trust—we end up with a fragile ecosystem that will eventually collapse under the weight of its own unintended consequences.
What is your forecast for the future of AI-driven UX as we move past this initial phase of deterministic chatbots?
I believe we are heading toward a world where the “static” interface disappears entirely, replaced by a highly adaptive, fluid environment that reshapes itself in real-time based on shifting probabilities. We will stop designing “pages” and start designing “states” and “ranking rules” that allow the system to respond to a user’s intent with surgical precision. However, this only works if we maintain our human judgment and keep asking “what else might be true?” rather than blindly following the most likely statistical path. The future of design isn’t about achieving perfection or total automation; it’s about building resilient, transparent systems that can dance with uncertainty and still deliver value even when the data is noisy and the predictions are wrong.
