Blog · March 4, 2025

Mohamed Elbarry
Architecting Context-Aware Conversation Flows for E-commerce Chatbots
E-commerce chatbots fail when they treat every message in isolation. A user says "I'm looking for running shoes," then "something under $100," then "in blue"—and the bot asks which category they want. At Lumin AI, we built context-aware flows so the assistant remembers the conversation and can recommend products and answer follow-ups without starting over. Here's how we structured it and what we'd keep. Our stack:
  • Next.js + React for the chat UI and real-time updates
  • Nest.js for the backend and orchestration
  • MongoDB for conversation history and user context
  • Redis for caching conversation state and recent context windows
  • Supabase for real-time channels and auth where needed
  • TypeScript end-to-end for type-safe message and context shapes
  • LLM / NLU layer for intent and entity extraction and response generation
Two things made this hard. First, context across turns: we had to keep a compact representation of what the user had already said (product type, budget, preferences) and feed it into each new turn so the model didn't hallucinate or ignore prior answers. Dumping full conversation history blows up token count and latency; reports on context window size show latency growing several-fold once you push past a few thousand tokens. Second, latency itself: if the bot takes more than a couple of seconds to reply, the conversation feels broken. We had to fetch context, call the model, optionally hit the product API, and stream the reply without blowing the 2s budget. Doing that in 15+ languages added another constraint: we couldn't rely on one monolithic prompt; we needed structured context that worked across locales.

We modeled the current "state" of the conversation so the backend and the LLM always had a consistent view of what the user wanted. That made it easy to pass only the relevant slice into the context window instead of dumping full history.
```typescript
// types/conversation.ts
export interface ConversationContext {
  sessionId: string;
  locale: string;
  intent: 'browse' | 'support' | 'recommendation' | 'unknown';
  entities: {
    category?: string;
    priceMax?: number;
    attributes?: Record<string, string>;
  };
  lastProductIds: string[];
  turnCount: number;
  updatedAt: string;
}
```
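Each turn, only this compact context (plus the rolling summary) goes to the model. A minimal sketch of serializing it into a prompt fragment — the helper name, field selection, and wording here are illustrative, not our production prompt:

```typescript
// Hypothetical helper: render the compact context into a prompt fragment.
// Only fields that are actually set get included, keeping tokens predictable.
function contextToPrompt(ctx: {
  locale: string;
  intent: string;
  entities: { category?: string; priceMax?: number; attributes?: Record<string, string> };
  lastProductIds: string[];
}): string {
  const lines = [`Locale: ${ctx.locale}`, `Intent: ${ctx.intent}`];
  if (ctx.entities.category) lines.push(`Category: ${ctx.entities.category}`);
  if (ctx.entities.priceMax !== undefined) lines.push(`Max price: ${ctx.entities.priceMax}`);
  for (const [key, value] of Object.entries(ctx.entities.attributes ?? {})) {
    lines.push(`${key}: ${value}`);
  }
  if (ctx.lastProductIds.length) lines.push(`Recently shown: ${ctx.lastProductIds.join(', ')}`);
  return lines.join('\n');
}
```

A fragment like this goes into the system message, so "in blue" on turn three still lands next to the budget and category from turns one and two.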
We didn't send the entire thread to the model every time. We kept a rolling summary plus the last N messages in Redis keyed by session, and only loaded that for each request. That pattern—durable session state plus a small context window—matches what production LLM apps need for conversation memory and context management. Token usage stayed predictable and response time under control.
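The trimming logic for that window can be sketched like this — in-memory here for clarity; in production the window lives in a Redis list (RPUSH + LTRIM) next to the summary key, and the window size of 8 is illustrative:

```typescript
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

const WINDOW_SIZE = 8; // last N messages that accompany the summary each turn

// Append the newest message and trim to the window size. Older turns are
// assumed to be folded into the rolling summary rather than resent verbatim.
function appendToWindow(window: ChatMessage[], msg: ChatMessage, size = WINDOW_SIZE): ChatMessage[] {
  const next = [...window, msg];
  return next.length > size ? next.slice(next.length - size) : next;
}
```

Keeping the cap small is what makes token usage and latency predictable regardless of how long the conversation runs.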
```typescript
// Simplified: load or build context for this turn
async function getContextForTurn(sessionId: string, newMessage: string): Promise<ConversationContext> {
  const cached = await redis.get(`ctx:${sessionId}`); // stored as a JSON string
  const base: ConversationContext = cached
    ? JSON.parse(cached)
    : { sessionId, locale: 'en', intent: 'unknown', entities: {}, lastProductIds: [], turnCount: 0, updatedAt: new Date().toISOString() };

  const updated = await extractAndMergeEntities(base, newMessage);
  await redis.setex(`ctx:${sessionId}`, 3600, JSON.stringify(updated)); // 1h TTL

  return updated;
}
```
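`extractAndMergeEntities` is where the LLM (or smaller NLU model) comes in. A heuristic stand-in to show the merge semantics — the regexes below are illustrative only; the real extractor is a model call, and the point is that new slots are merged into the existing context without discarding earlier answers:

```typescript
interface ConversationContext {
  sessionId: string;
  locale: string;
  intent: 'browse' | 'support' | 'recommendation' | 'unknown';
  entities: { category?: string; priceMax?: number; attributes?: Record<string, string> };
  lastProductIds: string[];
  turnCount: number;
  updatedAt: string;
}

// Stand-in for the LLM/NLU extraction call: pull new slots out of the
// message and merge them into the existing context, never dropping old ones.
async function extractAndMergeEntities(ctx: ConversationContext, message: string): Promise<ConversationContext> {
  const entities = { ...ctx.entities, attributes: { ...ctx.entities.attributes } };

  // "under $100" -> priceMax (illustrative regex, not the real extractor)
  const price = message.match(/under \$?(\d+)/i);
  if (price) entities.priceMax = Number(price[1]);

  // naive color pickup as a generic attribute
  const color = message.match(/\b(blue|red|black|white)\b/i);
  if (color) entities.attributes.color = color[1].toLowerCase();

  return {
    ...ctx,
    entities,
    intent: entities.category || entities.priceMax !== undefined ? 'recommendation' : ctx.intent,
    turnCount: ctx.turnCount + 1,
    updatedAt: new Date().toISOString(),
  };
}
```

The merge-don't-replace behavior is what lets "something under $100" on turn two and "in blue" on turn three accumulate onto the category from turn one.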
We used the LLM (or a smaller NLU step) to update intent and entities from each message, then branched logic on that instead of on free-form text. For "show me blue ones under $100" we'd set category/color and priceMax, then the recommendation path could call the product API with those filters and inject the results into the reply. That gave us consistent behavior and made it possible to tune prompts per intent and locale.

At Lumin AI this got us average response times under 2 seconds, a 60% reduction in support response time, and 25% higher customer satisfaction. Multilingual support (15+ languages) worked because the context structure was locale-agnostic: we only swapped prompts and copy per language. The analytics we built on top showed which intents and flows drove the most conversions, so we could double down on what worked. Context-aware flows weren't a gimmick; they were what made the chatbot actually useful.

I'm currently looking for new challenges in the AI and Full Stack space. If you're building something interesting, let's chat.