Configuration
Every configuration option explained with examples, defaults, and tier requirements.
How Configuration Works
When you create a Memory instance, you pass a configuration object. Memcity deep-merges your config with sensible defaults — you only need to specify what you want to change.
const memory = new Memory(components.memcity, {
tier: "pro",
ai: { gateway: "openrouter" },
// Everything else uses defaults
});

Tier enforcement is automatic. If you're on the Community tier and try to enable a Pro feature, Memcity silently overrides it to the default value. No errors, no crashes — it just uses what your tier supports.
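Tier gating of this kind could be sketched as follows (a hypothetical illustration, not Memcity's actual implementation; the flag names and defaults here are assumptions for the sketch):

```typescript
// Hypothetical sketch of tier gating: Pro-only flags are silently
// reset to their defaults on lower tiers, with no error thrown.
type Tier = "community" | "pro" | "team";

interface SearchFlags {
  reranking: boolean;
  enableHyde: boolean;
}

const SEARCH_DEFAULTS: SearchFlags = { reranking: false, enableHyde: false };
const PRO_ONLY_FLAGS: (keyof SearchFlags)[] = ["reranking", "enableHyde"];

function enforceTier(tier: Tier, search: SearchFlags): SearchFlags {
  if (tier !== "community") return search; // Pro and Team pass through
  const gated = { ...search };
  for (const flag of PRO_ONLY_FLAGS) {
    gated[flag] = SEARCH_DEFAULTS[flag]; // silently override
  }
  return gated;
}
```

The sketch mirrors the behavior described above: on Community, a Pro-only flag set to `true` quietly becomes its default `false`.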
// On Community tier, this config:
const memory = new Memory(components.memcity, {
tier: "community",
search: { reranking: true }, // Pro+ feature
});
// Behaves exactly like this:
const memory = new Memory(components.memcity, {
tier: "community",
search: { reranking: false }, // Silently overridden
});

The Full MemoryConfig Interface
Here's the complete TypeScript interface showing every option:
interface MemoryConfig {
tier: "community" | "pro" | "team";
ai: {
gateway: "openrouter" | "vercel";
model: string;
};
search: {
maxResults: number;
minScore: number;
weights: {
semantic: number;
bm25: number;
};
enableQueryRouting: boolean; // Pro+
enableQueryDecomposition: boolean; // Pro+
enableHyde: boolean; // Pro+
reranking: boolean; // Pro+
maxQueryExpansions: number; // Pro+
maxChunkExpansions: number; // Pro+
};
chunking: {
strategy: "recursive" | "fixed";
chunkSize: number;
chunkOverlap: number;
};
graph: { // Pro+
enabled: boolean;
traversalStrategy: "breadth_first" | "best_first" | "hybrid";
maxDepth: number;
maxNodes: number;
};
enterprise: { // Team only
acl: boolean;
auditLog: boolean;
quotas: boolean;
};
}

AI Configuration
Gateway: OpenRouter vs Vercel
The ai.gateway option controls how Memcity accesses language models:
| | OpenRouter | Vercel AI Gateway |
|---|---|---|
| Setup | Set OPENROUTER_API_KEY env var | Uses Vercel's built-in credentials |
| Models | 200+ models from all providers | OpenAI, Anthropic, Google |
| Pricing | Pay-per-token via OpenRouter | Pay-per-token via Vercel |
| Best for | Most users, widest model selection | Vercel-deployed apps wanting simplicity |
| Fallbacks | Automatic model fallbacks | Limited fallback support |
// OpenRouter (recommended for most users)
ai: {
gateway: "openrouter",
model: "google/gemini-2.0-flash-001",
}
// Vercel AI Gateway
ai: {
gateway: "vercel",
model: "gpt-4o-mini",
}

Model Selection
The model is used for reasoning tasks — query routing, entity extraction, HyDE generation, query decomposition. It is not used for embeddings (those always use Jina v4).
| Model | Cost | Quality | Speed | Best For |
|---|---|---|---|---|
| google/gemini-2.0-flash-001 | Low | Good | Fast | Default choice, good balance |
| gpt-4o-mini | Low | Good | Fast | If you prefer OpenAI |
| anthropic/claude-3.5-haiku | Low | Good | Fast | If you prefer Anthropic |
| google/gemini-2.5-pro-preview | High | Excellent | Slow | Maximum quality entity extraction |
| anthropic/claude-sonnet-4 | High | Excellent | Medium | Complex reasoning tasks |
Recommendation: Start with google/gemini-2.0-flash-001. It's fast, cheap, and good enough for most use cases. Only upgrade if you need better entity extraction or query understanding.
Search Configuration
maxResults
How many results to return from a search. Default: 10.
When to change: If you're building a chat interface, 3-5 results is usually enough context. If you're building a search results page, 10-20 gives users more to browse.
search: {
maxResults: 5, // For chat: fewer but more focused results
}

minScore
The minimum relevance score (0-1) a result must have to be included. Default: 0.1.
When to change: If you're getting too many low-quality results, raise this to 0.3 or 0.5. If you're getting too few results, lower it to 0.05.
search: {
minScore: 0.3, // Only return results that are at least 30% relevant
}

weights: semantic vs bm25
These control how much weight to give semantic (meaning-based) search vs BM25 (keyword-based) search. They must sum to 1.0. Default: 0.7 semantic, 0.3 BM25.
What's the difference?
- Semantic search understands meaning. "How do I cancel my subscription?" matches "To terminate your plan, visit account settings" even though the words are different.
- BM25 search matches keywords. "error code 4012" matches documents containing exactly "error code 4012". It's precise but doesn't understand synonyms.
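How the two scores combine can be made concrete as a weighted sum. A minimal sketch (illustrative only; Memcity's internal scoring may normalize or combine results differently):

```typescript
// Illustrative hybrid scoring: the final score is a weighted sum of
// the semantic and BM25 sub-scores (both assumed to be in [0, 1]).
function hybridScore(
  semanticScore: number,
  bm25Score: number,
  weights = { semantic: 0.7, bm25: 0.3 }, // must sum to 1.0
): number {
  return weights.semantic * semanticScore + weights.bm25 * bm25Score;
}

// A result that matches well on meaning (0.9) but poorly on
// exact keywords (0.2), under the default 0.7/0.3 split:
hybridScore(0.9, 0.2); // ≈ 0.69
```

Shifting weight toward `bm25` makes exact-keyword matches dominate; shifting toward `semantic` favors meaning over literal overlap.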
| Use Case | Semantic | BM25 | Why |
|---|---|---|---|
| Natural language Q&A | 0.8 | 0.2 | Users ask in their own words |
| Technical documentation | 0.6 | 0.4 | Function names and codes matter |
| Code search | 0.3 | 0.7 | Exact identifiers are critical |
| Legal/compliance docs | 0.5 | 0.5 | Both exact terms and concepts matter |
search: {
weights: {
semantic: 0.6, // Understanding matters
bm25: 0.4, // But exact terms also matter
},
}

enableQueryRouting (Pro+)
When enabled, Memcity classifies each query as simple, moderate, or complex before processing. This determines which pipeline steps activate:
- Simple queries ("What is X?") skip decomposition and HyDE — they're fast.
- Moderate queries ("How does X compare to Y?") use query expansion but skip decomposition.
- Complex queries ("What are the implications of X on Y and Z?") use the full pipeline.
Default: false. When to enable: When your users ask a mix of simple and complex questions and you want to optimize for both speed and quality.
search: {
enableQueryRouting: true,
}

enableQueryDecomposition (Pro+)
Breaks complex queries into simpler sub-queries that are searched independently, then results are merged.
Before decomposition:
"Compare the vacation policy with the sick leave policy and explain which is more generous"
After decomposition:
- "What is the vacation policy?"
- "What is the sick leave policy?"
- "How do vacation days compare to sick days in terms of quantity?"
Each sub-query gets its own search, and results are merged. This dramatically improves recall for complex questions.
Default: false. When to enable: When users ask multi-part or comparative questions.
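Like the other Pro+ features, it's a single switch:

```typescript
search: {
  enableQueryDecomposition: true,
}
```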
enableHyde (Pro+)
HyDE stands for Hypothetical Document Embeddings. Instead of just embedding the query, Memcity asks the LLM: "If a document existed that perfectly answered this question, what would it say?" Then it embeds that hypothetical answer and searches for real documents similar to it.
Why it works: Queries are short ("refund policy?") but answers are long and detailed. A hypothetical answer is more similar to the actual document than the short query is.
Example:
- Query: "refund policy"
- HyDE generates: "Our refund policy allows customers to return products within 30 days of purchase for a full refund. Items must be unused and in original packaging..."
- This hypothetical text matches the real refund policy document much better than the two-word query would.
Default: false. When to enable: When users ask short questions about topics with detailed documentation.
search: {
enableHyde: true,
}

reranking (Pro+)
After the initial search retrieves candidates, a reranker (Jina Reranker v3) re-scores them using a cross-encoder model that looks at the query and each candidate together.
Why initial ranking isn't enough: The initial search uses separate embeddings for the query and documents. A reranker directly compares each pair, which is more accurate but slower (you can't rerank thousands of results, only the top candidates).
Think of it like a hiring process: the initial search is the resume screening (fast, approximate), and the reranker is the interview (slower, more accurate).
Default: false. When to enable: Almost always. This is the single most impactful quality improvement for most use cases. The latency cost (~100ms) is usually worth it.
search: {
reranking: true,
}

maxQueryExpansions (Pro+)
How many semantic variations of the query to generate. Default: 3.
Example: For the query "Python web frameworks", expansions might be:
- "Django Flask FastAPI web development Python"
- "Building web applications with Python"
- "Python HTTP server frameworks comparison"
More expansions improve recall but increase latency and cost. Range: 1-5.
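For example, to trade latency for recall:

```typescript
search: {
  maxQueryExpansions: 5, // more variations, better recall, higher latency and cost
}
```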
maxChunkExpansions (Pro+)
For top results, how many surrounding chunks to fetch for additional context. Default: 2.
Think of it like reading a book — if a sentence matches your query, you probably want to read the paragraph (or page) around it. Chunk expansion gives you that context.
search: {
maxChunkExpansions: 3, // Fetch 3 chunks before and after each result
}

Chunking Configuration
What is Chunking?
When you ingest a document, Memcity splits it into "chunks" — smaller pieces of text. Each chunk gets its own embedding and can be retrieved independently.
Why not just embed the whole document? Because embeddings work best on focused pieces of text. A 50-page document embedded as one vector loses detail. But a 512-token chunk about "refund policy" creates a precise, searchable vector.
Strategy
| Strategy | Description | Tier |
|---|---|---|
| recursive | Splits on paragraph → sentence → word boundaries, preserving structure | All |
| fixed | Splits at a fixed token count regardless of structure | All |
Use recursive (the default) in almost all cases. It produces more natural chunks that respect paragraph boundaries.
chunking: {
strategy: "recursive",
chunkSize: 512, // Target tokens per chunk
chunkOverlap: 50, // Overlap between consecutive chunks
}

chunkSize and chunkOverlap
- chunkSize (default: 512): How many tokens per chunk. Smaller chunks (256) are more precise but lose context. Larger chunks (1024) have more context but are less focused.
- chunkOverlap (default: 50): How many tokens overlap between consecutive chunks. This prevents information at chunk boundaries from being lost.
| Content Type | Chunk Size | Overlap | Why |
|---|---|---|---|
| FAQ / short answers | 256 | 25 | Each Q&A pair should be one chunk |
| Technical docs | 512 | 50 | Good balance for most content |
| Long-form articles | 1024 | 100 | Preserve more narrative context |
| Legal documents | 512 | 100 | Higher overlap prevents clause splitting |
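To see how overlap works mechanically, here's a minimal fixed-size chunker sketch. It splits on words rather than tokens to keep the idea visible (Memcity's actual splitter counts tokens and, by default, respects structure):

```typescript
// Simplified fixed-size chunking with overlap: each chunk starts
// (chunkSize - chunkOverlap) units after the previous one, so the
// tail of one chunk reappears at the head of the next.
function fixedChunks(text: string, chunkSize: number, chunkOverlap: number): string[] {
  const words = text.split(/\s+/);
  const step = chunkSize - chunkOverlap; // advance by size minus overlap
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // reached the end
  }
  return chunks;
}

// With size 4 and overlap 1, the last word of each chunk
// is repeated as the first word of the next:
fixedChunks("a b c d e f g", 4, 1);
// → ["a b c d", "d e f g"]
```

The overlap is what keeps a sentence that straddles a boundary retrievable from at least one chunk.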
Graph Configuration (Pro+)
The knowledge graph automatically extracts entities and relationships from your documents. See Knowledge Graph for a deep dive.
traversalStrategy
How the graph is traversed when searching for related entities:
- breadth_first — Explore all neighbors at each depth level before going deeper. Like exploring a building floor by floor.
- best_first — Always follow the highest-scoring connection. Like a detective following the hottest lead.
- hybrid (default) — BFS for the first hop, then best-first. Gets the best of both strategies.
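The breadth_first strategy with depth and node caps can be sketched generically (a plain BFS illustration, not Memcity's internals):

```typescript
// Generic BFS over an adjacency map, capped by maxDepth (hops from
// the start node) and maxNodes (total nodes visited).
type Graph = Record<string, string[]>;

function traverseBfs(graph: Graph, start: string, maxDepth: number, maxNodes: number): string[] {
  const visited = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of graph[node] ?? []) {
        if (visited.size >= maxNodes) return [...visited]; // node budget spent
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next; // move one hop outward per level
  }
  return [...visited];
}

const g: Graph = { a: ["b", "c"], b: ["d"], c: ["e"], d: [], e: [] };
traverseBfs(g, "a", 1, 50); // → ["a", "b", "c"]  (one hop only)
```

Raising `maxDepth` to 2 would also pull in `d` and `e`; best_first differs in that it would expand whichever single neighbor scores highest instead of a whole level at a time.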
graph: {
enabled: true,
traversalStrategy: "hybrid",
maxDepth: 3, // How many hops to traverse (default: 3)
maxNodes: 50, // Max nodes to visit (default: 50)
}

Enterprise Configuration (Team)
These features are available on the Team tier only. Each has a dedicated documentation page:
- Access Control (ACLs) — Per-document permissions with principal-based filtering
- Audit Logging — Immutable trail of every operation
- Usage Quotas — Rate limiting and usage tracking per organization
enterprise: {
acl: true, // Enable per-document access control
auditLog: true, // Enable immutable audit logging
quotas: true, // Enable usage quotas and rate limiting
});

Configuration Recipes
"Fast and Cheap" — Minimize Costs
Best for: prototypes, low-traffic apps, simple Q&A.
const memory = new Memory(components.memcity, {
tier: "community",
ai: {
gateway: "openrouter",
model: "google/gemini-2.0-flash-001",
},
search: {
maxResults: 5,
weights: { semantic: 0.7, bm25: 0.3 },
// All advanced features disabled by default on Community
},
chunking: {
strategy: "recursive",
chunkSize: 512,
chunkOverlap: 50,
},
});

"Maximum Quality" — Best Possible Results
Best for: customer-facing search, support bots, enterprise apps.
const memory = new Memory(components.memcity, {
tier: "pro",
ai: {
gateway: "openrouter",
model: "google/gemini-2.5-pro-preview",
},
search: {
maxResults: 10,
minScore: 0.2,
weights: { semantic: 0.7, bm25: 0.3 },
enableQueryRouting: true,
enableQueryDecomposition: true,
enableHyde: true,
reranking: true,
maxQueryExpansions: 5,
maxChunkExpansions: 3,
},
graph: {
enabled: true,
traversalStrategy: "hybrid",
maxDepth: 3,
maxNodes: 50,
},
});

"Enterprise Secure" — Full Compliance
Best for: regulated industries, multi-tenant SaaS, enterprise deployments.
const memory = new Memory(components.memcity, {
tier: "team",
ai: {
gateway: "openrouter",
model: "google/gemini-2.0-flash-001",
},
search: {
maxResults: 10,
enableQueryRouting: true,
reranking: true,
},
enterprise: {
acl: true, // Per-document access control
auditLog: true, // Immutable operation logging
quotas: true, // Usage limits per organization
},
});