Use Cases

Real-world examples of Memcity in production, including how it powers the search and AI chat on this very site.

How This Site Uses Memcity

You're reading this on memcity.dev — and the search and AI chat features you see on this site are powered by Memcity itself. We eat our own dogfood. Here's exactly how it works.

The Docs Search (Cmd+K)

Press Cmd+K (or Ctrl+K) anywhere on the docs and you'll see a search dialog. This isn't a simple keyword filter — it's powered by Memcity's full RAG pipeline.

How it works:

  1. All 13 documentation pages are ingested into a Memcity knowledge base at build time
  2. Each page is chunked into ~512 token segments with heading metadata preserved
  3. When you type a query like "how does reranking work?", it goes through the full pipeline:
    • Query is embedded via Jina v4
    • Semantic search finds chunks about reranking
    • BM25 keyword search catches exact terms like "Jina Reranker v3"
    • RRF fusion merges both result sets
    • Citations are generated with page and heading breadcrumbs
  4. Results link directly to the relevant section of the relevant doc page

Why this matters: You can search in natural language. "What happens when I go over my limit?" finds the quotas documentation even though the word "limit" doesn't appear in the page title. "How do I make search faster?" finds the performance tuning section of the search pipeline page.
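
Under the hood, that dialog boils down to a single getContext call. Here is a minimal sketch of such an action (the real endpoint on this site is a Convex HTTP action, and the org and knowledge-base IDs below are placeholders):

ts
// A sketch of a docs-search action; org/KB IDs are placeholders for the
// values produced by the ingestion script shown later on this page
import { action } from "./_generated/server";
import { v } from "convex/values";
import { Memory } from "memcity";
import { components } from "./_generated/api";

const memory = new Memory(components.memcity, {
  tier: "pro",
  ai: { gateway: "openrouter", model: "google/gemini-2.0-flash-001" },
});

export const docsSearch = action({
  args: { query: v.string() },
  handler: async (ctx, { query }) => {
    // Runs the pipeline described above: embedding, vector + BM25 search,
    // RRF fusion, and citations with page/heading breadcrumbs
    return await memory.getContext(ctx, {
      orgId: process.env.DOCS_ORG_ID!,          // placeholder env var
      knowledgeBaseId: process.env.DOCS_KB_ID!, // placeholder env var
      query,
    });
  },
});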

The AI Chat Widget

The floating "Ask AI" button at the bottom-right of every docs page opens a chat widget powered by Memcity + Kimi 2.5 (via OpenRouter).

How it works:

  1. You type a question like "How do I set up ACLs?"
  2. The question is sent to a Convex HTTP action at /chat
  3. The HTTP action searches the Memcity knowledge base with your question
  4. The top matching chunks are assembled into a context block
  5. The context + your question are sent to Kimi 2.5 via OpenRouter
  6. The LLM generates an answer grounded in the actual documentation
  7. The response is streamed back in real-time

The key insight: The chatbot doesn't hallucinate because it only answers based on documentation chunks that Memcity retrieved. If the docs don't cover a topic, the chatbot says "I don't have information about that" instead of making something up.

Here's the actual Convex HTTP action that powers the chat:

ts
// convex/http.ts — the /chat endpoint
import { httpRouter } from "convex/server";
import { httpAction } from "./_generated/server";

const http = httpRouter();

http.route({
  path: "/chat",
  method: "POST",
  handler: httpAction(async (_ctx, request) => {
    const apiKey = process.env.OPENROUTER_API_KEY;
    const { messages } = await request.json();
 
    // The system prompt contains all documentation knowledge
    // Memcity's RAG pipeline could also be used here to
    // dynamically retrieve relevant chunks per query
 
    const response = await fetch(
      "https://openrouter.ai/api/v1/chat/completions",
      {
        method: "POST",
        headers: {
          Authorization: `Bearer ${apiKey}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          model: "moonshotai/kimi-k2",
          messages: [
            { role: "system", content: DOCS_SYSTEM_PROMPT },
            ...messages,
          ],
          stream: true,
        }),
      }
    );
 
    // Stream the response back in AI SDK format
    // (streamResponse is a local helper, not shown here, that adapts the
    // OpenRouter stream so it's compatible with useChat from "ai/react")
    return streamResponse(response);
  }),
});

export default http;

The Architecture

text
User's Browser
    ├── Docs Search (Cmd+K)
    │   └── Convex HTTP Action → Memcity getContext()
    │       ├── Jina v4 (embeddings)
    │       ├── Vector + BM25 search
    │       ├── RRF fusion + citations
    │       └── Return ranked results with breadcrumbs
    │
    └── AI Chat Widget
        └── Convex HTTP Action /chat
            ├── Memcity getContext() (find relevant chunks)
            ├── Assemble context prompt
            └── OpenRouter → Kimi 2.5 (generate answer)
                └── Stream response back to browser

Stack:

  • Frontend: Next.js 15, React 19, Tailwind CSS v4
  • Backend: Convex (actions, HTTP routes, storage)
  • Search: Memcity component (Pro tier)
  • Embeddings: Jina v4 (1,024 dimensions)
  • LLM: Kimi 2.5 via OpenRouter
  • UI: shadcn-style command palette (cmdk) for search

The Ingestion Script

All docs are ingested at deploy time. Here's how:

ts
// scripts/ingest-docs.ts
import { Memory } from "memcity";
import { components } from "../convex/_generated/api";
import fs from "fs";
import path from "path";
 
const memory = new Memory(components.memcity, {
  tier: "pro",
  ai: { gateway: "openrouter", model: "google/gemini-2.0-flash-001" },
});
 
async function ingestDocs(ctx: any) {
  const orgId = await memory.createOrg(ctx, { name: "Memcity" });
  const kbId = await memory.createKnowledgeBase(ctx, {
    orgId,
    name: "Documentation",
    description: "Memcity documentation pages",
  });
 
  const docsDir = path.join(process.cwd(), "content/docs");
  const files = fs.readdirSync(docsDir).filter(f => f.endsWith(".mdx"));
 
  for (const file of files) {
    const content = fs.readFileSync(path.join(docsDir, file), "utf-8");
    // Strip frontmatter
    const body = content.replace(/^---[\s\S]*?---/, "").trim();
 
    await memory.ingestText(ctx, {
      orgId,
      knowledgeBaseId: kbId,
      text: body,
      source: file.replace(".mdx", ""),
    });
 
    console.log(`Ingested: ${file}`);
  }
 
  return { orgId, kbId, docsIngested: files.length };
}

Use Case: Customer Support Bot

The Problem

Your support team answers the same questions hundreds of times:

  • "How do I reset my password?"
  • "What's your refund policy?"
  • "My order hasn't arrived, what do I do?"

Each answer exists somewhere in your help docs, FAQ, or past tickets — but customers can't find it, so they email support.

The Solution

Ingest all your support content into Memcity and build a chatbot that searches it.

ts
// convex/support.ts
import { action } from "./_generated/server";
import { v } from "convex/values";
import { Memory } from "memcity";
import { components } from "./_generated/api";
 
const memory = new Memory(components.memcity, {
  tier: "pro",
  ai: { gateway: "openrouter", model: "google/gemini-2.0-flash-001" },
  search: {
    enableQueryRouting: true,
    reranking: true,
    enableHyde: true,
  },
});
 
// Ingest your help center articles
export const ingestHelpDocs = action({
  args: { orgId: v.string(), kbId: v.string() },
  handler: async (ctx, { orgId, kbId }) => {
    const articles = [
      {
        text: `# Password Reset\n\nTo reset your password:\n1. Go to account settings\n2. Click "Security"\n3. Click "Reset Password"\n4. Check your email for a reset link\n\nLinks expire after 1 hour.`,
        source: "password-reset.md",
      },
      {
        text: `# Refund Policy\n\nWe offer full refunds within 30 days of purchase.\n- Digital products: refund processed within 24 hours\n- Physical products: return the item first, refund within 5 business days\n- Subscriptions: prorated refund for unused time`,
        source: "refund-policy.md",
      },
      // ... more articles
    ];
 
    await memory.batchIngest(ctx, {
      orgId,
      knowledgeBaseId: kbId,
      documents: articles,
    });
  },
});
 
// The chatbot search endpoint
export const supportSearch = action({
  args: {
    orgId: v.string(),
    kbId: v.string(),
    query: v.string(),
    userId: v.optional(v.string()),
  },
  handler: async (ctx, args) => {
    const results = await memory.getContext(ctx, {
      orgId: args.orgId,
      knowledgeBaseId: args.kbId,
      query: args.query,
      userId: args.userId, // Include customer history if available
    });
 
    return results;
  },
});

How episodic memory helps: If a customer contacts support about the same issue twice, the bot remembers the first interaction:

ts
// After resolving a ticket
await memory.addMemory(ctx, {
  orgId,
  userId: customerId,
  content: "Had an issue with order #4521 not arriving. Was reshipped on 2024-03-10.",
  type: "fact",
});
 
// Next time: "My order still hasn't arrived"
// → Bot retrieves the memory → knows about order #4521 → can reference the reshipment
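
When the customer writes in again, the same supportSearch action above surfaces that memory. A quick sketch of the follow-up call (illustrative only):

ts
// Second contact: "My order still hasn't arrived"
const context = await memory.getContext(ctx, {
  orgId,
  knowledgeBaseId: kbId,
  query: "My order still hasn't arrived",
  userId: customerId, // pulls in this customer's episodic memories
});
// The returned context now includes the stored fact about order #4521
// alongside the help-center chunks, so the bot can reference the reshipment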

Use Case: Internal Knowledge Base for Teams

The Problem

Your company's knowledge is scattered everywhere — Notion pages, Slack messages, Google Docs, PDF policies, spreadsheets. When someone needs to find the code review process, they search 4 different tools and ask 3 colleagues.

The Solution

Ingest everything into Memcity and give your team a single search interface.

ts
// Ingest from multiple sources
// PDF company handbook
const uploadUrl = await memory.getUploadUrl(ctx);
// ... upload handbook.pdf ...
await memory.processUploadedFile(ctx, {
  orgId,
  knowledgeBaseId: kbId,
  storageId: handbookStorageId,
  fileName: "employee-handbook.pdf",
});
 
// Markdown engineering docs
await memory.ingestText(ctx, {
  orgId,
  knowledgeBaseId: kbId,
  text: engineeringDocsMarkdown,
  source: "engineering-standards.md",
  principals: ["group:engineering"], // Only engineers can see this
});
 
// HR policies (restricted access)
await memory.ingestText(ctx, {
  orgId,
  knowledgeBaseId: kbId,
  text: salaryBands,
  source: "salary-bands.md",
  principals: ["group:hr", "role:admin"], // HR and admins only
});
 
// Spreadsheet with project timelines
await memory.processUploadedFile(ctx, {
  orgId,
  knowledgeBaseId: kbId,
  storageId: timelineStorageId,
  fileName: "project-timelines.xlsx",
});

How ACLs help: HR documents are only visible to HR team members. Engineering docs are visible to engineers. Company-wide policies are visible to everyone. The knowledge graph connects people mentioned in the handbook to projects mentioned in the timeline.
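
On the read side, searches should only return chunks the caller is allowed to see. Here is a sketch of what that could look like; note that the principals argument on getContext is an assumption, mirroring the tags used at ingest time:

ts
// Hypothetical sketch: `principals` on getContext is assumed here,
// mirroring the principals tags applied when the documents were ingested
const results = await memory.getContext(ctx, {
  orgId,
  knowledgeBaseId: kbId,
  query: "What is our code review process?",
  principals: ["group:engineering"], // the caller's groups/roles (assumed parameter)
});
// Chunks restricted to ["group:hr", "role:admin"] (e.g. salary-bands.md)
// would be excluded for this caller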

Use Case: AI-Powered SaaS Application

The Problem

You're building a SaaS product (like a project management tool or CRM) and want to add AI-powered search. But each customer's data must be completely isolated, and different pricing tiers should have different capabilities.

The Solution

Use Memcity's multi-org support with per-tenant knowledge bases and quotas.

ts
// When a new customer signs up
export const onboardCustomer = action({
  args: { companyName: v.string(), plan: v.string() },
  handler: async (ctx, { companyName, plan }) => {
    // Create an org for this customer
    const orgId = await memory.createOrg(ctx, {
      name: companyName,
    });
 
    // Create their knowledge base
    const kbId = await memory.createKnowledgeBase(ctx, {
      orgId,
      name: `${companyName} Knowledge Base`,
    });
 
    // Set quotas based on their plan
    const quotas = {
      free:       { searchesPerDay: 100,  maxDocuments: 50   },
      starter:    { searchesPerDay: 1000, maxDocuments: 500  },
      enterprise: { searchesPerDay: Infinity, maxDocuments: Infinity },
    };
 
    await memory.setQuota(ctx, {
      orgId,
      quotas: quotas[plan as keyof typeof quotas],
    });
 
    return { orgId, kbId };
  },
});

How RAPTOR helps: When a user asks "Give me an overview of project Alpha", individual chunks are too granular. RAPTOR's hierarchical summaries provide document-level and section-level overviews that answer big-picture questions.
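
For example, such a big-picture query still goes through the ordinary getContext call; whether hierarchical summaries participate is assumed to depend on your tier and search configuration:

ts
// Big-picture question where individual chunks alone would be too granular
const overview = await memory.getContext(ctx, {
  orgId: customerOrgId,           // the tenant org created at onboarding
  knowledgeBaseId: customerKbId,  // that tenant's knowledge base
  query: "Give me an overview of project Alpha",
});
// With RAPTOR enabled, retrieval can draw on section- and document-level
// summaries rather than only leaf chunks (assumed behavior on this tier)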

Use Case: Educational Platform

The Problem

You're building an online learning platform. Students need help understanding course material, but a generic chatbot gives generic answers. You want a tutor that knows the specific course content and adapts to each student's progress.

The Solution

Ingest course content into knowledge bases and use episodic memory to track each student's progress.

ts
// Ingest course material
await memory.ingestText(ctx, {
  orgId,
  knowledgeBaseId: pythonCourseKbId,
  text: `
    # Module 3: Lists and Arrays
 
    A list in Python is an ordered collection of items.
    Lists are mutable — you can change their contents after creation.
 
    ## Creating Lists
    my_list = [1, 2, 3, 4, 5]
    names = ["Alice", "Bob", "Charlie"]
    mixed = [1, "hello", True, 3.14]
 
    ## Accessing Elements
    Lists use zero-based indexing:
    first = my_list[0]   # 1
    last = my_list[-1]    # 5
 
    ## List Methods
    my_list.append(6)     # Add to end
    my_list.insert(0, 0)  # Insert at position
    my_list.remove(3)     # Remove first occurrence
    my_list.pop()         # Remove and return last item
  `,
  source: "module-3-lists.md",
});
 
// Register a student
const studentId = await memory.createUser(ctx, {
  orgId,
  externalId: "student_alice",
  name: "Alice",
});
 
// After a tutoring session, record what the student learned
await memory.addMemory(ctx, {
  orgId,
  userId: studentId,
  content: "Completed Module 3: Lists. Struggled with nested list indexing — took 3 attempts to understand list[1][2] syntax.",
  type: "fact",
});
 
await memory.addMemory(ctx, {
  orgId,
  userId: studentId,
  content: "Learns best with step-by-step examples. Gets confused by abstract explanations.",
  type: "preference",
});

How memory decay helps: If Alice hasn't practiced lists in 2 weeks, that memory has decayed. When she starts Module 5 (which uses lists), the tutoring system detects the weakened memory and suggests a quick review before moving on.

ts
// The tutoring search includes student context
const results = await memory.getContext(ctx, {
  orgId,
  knowledgeBaseId: pythonCourseKbId,
  query: "How do I iterate over a dictionary?",
  userId: studentId,  // Includes Alice's learning preferences and history
});
 
// Results include:
// 1. Course content about dictionaries
// 2. Alice's memory: "Learns best with step-by-step examples"
// 3. Alice's memory: "Struggled with nested indexing" (relevant because
//    dictionary iteration is conceptually similar)
//
// The LLM can use this to generate a step-by-step explanation
// and proactively mention how dict iteration differs from list indexing

Building Your Own Use Case

Every use case follows the same pattern:

  1. Create an org and knowledge base — Container for your data
  2. Ingest content — Text, files, URLs — whatever you have
  3. Search — Natural language queries through the RAG pipeline
  4. Personalize (optional) — Add episodic memory for per-user context
  5. Secure (optional) — Add ACLs for multi-tenant data isolation
  6. Monitor (optional) — Add audit logging and quotas for production

The RAG pipeline handles the complexity. You focus on your application logic.
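
Condensed into code, the skeleton of most integrations looks roughly like this (a sketch built from the calls shown throughout this page; names and content are illustrative):

ts
// Create an org and knowledge base
const orgId = await memory.createOrg(ctx, { name: "Acme" });
const kbId = await memory.createKnowledgeBase(ctx, { orgId, name: "Acme KB" });

// Ingest content (text here; processUploadedFile handles files)
await memory.ingestText(ctx, {
  orgId,
  knowledgeBaseId: kbId,
  text: "# Webhooks\n\nConfigure webhooks under Settings > Integrations.",
  source: "webhooks.md",
});

// Optional: register a user and record episodic memories
const userId = await memory.createUser(ctx, { orgId, externalId: "user_42" });
await memory.addMemory(ctx, {
  orgId,
  userId,
  content: "Prefers concise answers with code samples.",
  type: "preference",
});

// Search through the RAG pipeline (personalized when userId is passed)
const context = await memory.getContext(ctx, {
  orgId,
  knowledgeBaseId: kbId,
  query: "How do I configure webhooks?",
  userId,
});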

Start with the Getting Started guide, then customize your Configuration for your specific needs.