Features
Knowledge Graph
How Memcity automatically builds a knowledge graph from your documents and uses it to find connections humans would miss.
What is a Knowledge Graph?
Imagine you have hundreds of documents — employee handbooks, product specs, meeting notes, support tickets. A knowledge graph is like a mind map your application builds automatically by reading all those documents and figuring out what "things" exist and how they're connected.
For example, after ingesting a few documents, Memcity might build a graph like this:
[John Smith] --works_at--> [Acme Corp]
[Acme Corp] --headquartered_in--> [San Francisco]
[Acme Corp] --manufactures--> [Widget Pro]
[Widget Pro] --uses_technology--> [React Native]
[Jane Doe] --manages--> [John Smith]
[Jane Doe] --works_at--> [Acme Corp]

Each bracketed name is an entity (a person, company, product, location, concept, or technology). Each arrow is a relationship (how two entities are connected).
Why Do You Need One Alongside Vector Search?
Vector search is great at finding text that sounds similar to your query. But it misses logical connections between pieces of information that are in different documents.
Example: You have two documents:
- "John Smith is the CEO of Acme Corp" (in the team page)
- "Acme Corp reported $50M in quarterly revenue" (in the finance report)
If a user asks "What is the revenue of John Smith's company?", pure vector search might not connect these — the text in document 1 doesn't mention revenue, and document 2 doesn't mention John Smith.
But the knowledge graph connects them:
[John Smith] --is_ceo_of--> [Acme Corp] --has_revenue--> [$50M]

Memcity traverses this graph at search time, finding document 2 through the entity "Acme Corp" that connects it to John Smith.
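The join can be pictured as a two-hop lookup over stored triples. A toy TypeScript sketch (the triple shape and the `twoHop` helper are illustrative, not Memcity's API):

```typescript
// Two facts extracted from two different documents, stored as triples.
const triples = [
  { subject: "John Smith", predicate: "is_ceo_of", object: "Acme Corp" },
  { subject: "Acme Corp", predicate: "has_revenue", object: "$50M" },
];

// Hypothetical two-hop lookup: start entity -> connected entity -> fact.
function twoHop(start: string, firstPred: string, secondPred: string): string | undefined {
  const hop1 = triples.find((t) => t.subject === start && t.predicate === firstPred);
  if (!hop1) return undefined;
  const hop2 = triples.find((t) => t.subject === hop1.object && t.predicate === secondPred);
  return hop2?.object;
}

// "What is the revenue of John Smith's company?" resolves even though
// neither triple alone mentions both John Smith and revenue.
const revenue = twoHop("John Smith", "is_ceo_of", "has_revenue"); // "$50M"
```

Neither document contains both "John Smith" and "revenue"; the shared entity "Acme Corp" is what bridges them.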
How Memcity Builds the Graph
Entity Extraction
When you ingest a document, Memcity's LLM reads each chunk and identifies entities — the "nouns" of your data. Entities have a name, a type, and optional metadata:
| Type | Examples |
|---|---|
| Person | "John Smith", "Dr. Sarah Chen", "the CTO" |
| Organization | "Acme Corp", "Engineering Team", "Board of Directors" |
| Product | "Widget Pro", "API Gateway", "Mobile App v3" |
| Technology | "React", "PostgreSQL", "Kubernetes", "OAuth 2.0" |
| Concept | "microservices architecture", "agile methodology", "refund policy" |
| Location | "San Francisco", "Building 3", "us-east-1" |
Entity extraction happens automatically during ingestion — you don't need to tag anything manually.
Relationship Tracking
After extracting entities, the LLM identifies relationships between them. Relationships are stored as triples: subject → predicate → object.
Subject        Predicate        Object
─────────      ─────────        ──────
John Smith     is_ceo_of        Acme Corp
Acme Corp      uses_technology  React Native
Widget Pro     depends_on       PostgreSQL
Jane Doe       reports_to       John Smith
Refund Policy  applies_to       Digital Products

Relationships are bidirectional — if "John Smith is_ceo_of Acme Corp" exists, searching for either entity finds the other.
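One way to picture bidirectional lookup is indexing every triple under both its subject and its object, so a search starting from either end reaches the same fact. A minimal sketch (an illustration of the idea, not Memcity's storage layer):

```typescript
type Triple = { subject: string; predicate: string; object: string };

const facts: Triple[] = [
  { subject: "John Smith", predicate: "is_ceo_of", object: "Acme Corp" },
  { subject: "Jane Doe", predicate: "reports_to", object: "John Smith" },
];

// Index each triple under both its subject and its object, so a search
// starting from either entity can find the relationship.
const index = new Map<string, Triple[]>();
for (const t of facts) {
  for (const key of [t.subject, t.object]) {
    const bucket = index.get(key) ?? [];
    bucket.push(t);
    index.set(key, bucket);
  }
}

// Looking up "Acme Corp" finds the is_ceo_of triple even though
// Acme Corp is only the object of that relationship.
const acmeFacts = index.get("Acme Corp") ?? [];
```

With this layout, "John Smith" resolves to both triples (one where he is the subject, one where he is the object), which is what makes graph search direction-agnostic.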
Concrete Example
Let's say you ingest these three documents:
Document 1: Team Directory
Sarah Chen is the VP of Engineering at TechCo. She leads a team
of 50 engineers across three offices. Sarah previously worked
at Google on the Search team.

Document 2: Product Spec
Project Atlas is TechCo's next-generation search platform.
It uses Elasticsearch for indexing and React for the frontend.
The project is scheduled for Q3 2025 launch.

Document 3: Meeting Notes
In today's standup, Sarah approved the use of Kubernetes for
Project Atlas deployment. The team will migrate from bare metal
to GKE clusters by end of month.

After ingestion, the knowledge graph contains:
[Sarah Chen] --role:vp_engineering--> [TechCo]
[Sarah Chen] --leads--> [Engineering Team]
[Sarah Chen] --previously_at--> [Google]
[Sarah Chen] --approved--> [Kubernetes for Atlas]
[Project Atlas] --owned_by--> [TechCo]
[Project Atlas] --uses--> [Elasticsearch]
[Project Atlas] --uses--> [React]
[Project Atlas] --uses--> [Kubernetes]
[Project Atlas] --launches--> [Q3 2025]
[Engineering Team] --size--> [50 engineers]

Now if someone asks "What technologies does Sarah's project use?", the graph connects:
Sarah Chen → leads → Engineering at TechCo → owns → Project Atlas → uses → [Elasticsearch, React, Kubernetes]
Vector search alone would struggle with this because "Sarah" and "Elasticsearch" never appear in the same document.
GraphRAG Traversal Strategies
When you search, Memcity extracts entities from your query, finds them in the graph, and traverses outward to find related information. The traversal strategy determines how it explores:
Breadth-First (breadth_first)
Explores all neighbors at each depth level before going deeper. Like exploring a building floor by floor — check every room on floor 1, then every room on floor 2, etc.
Depth 0: [Sarah Chen]
Depth 1: [TechCo], [Engineering Team], [Google], [Kubernetes]
Depth 2: [Project Atlas], [50 engineers], [Search team], ...
Depth 3: [Elasticsearch], [React], [Q3 2025], ...

Best for: Broad exploration when you want to discover all connections at each level.
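The level-by-level expansion above can be sketched as a plain BFS over an adjacency list. The graph and function names here are made up for illustration; Memcity's internal traversal will differ:

```typescript
type Graph = Map<string, string[]>; // adjacency list: entity -> neighbors

// Breadth-first traversal up to maxDepth, visiting one level at a time.
function breadthFirst(graph: Graph, start: string, maxDepth: number): string[][] {
  const visited = new Set<string>([start]);
  const levels: string[][] = [[start]];
  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const node of levels[depth]) {
      for (const neighbor of graph.get(node) ?? []) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    if (next.length === 0) break; // nothing new discovered at this depth
    levels.push(next);
  }
  return levels;
}

// Toy graph mirroring the example above.
const graph: Graph = new Map([
  ["Sarah Chen", ["TechCo", "Engineering Team", "Google", "Kubernetes"]],
  ["Engineering Team", ["Project Atlas"]],
  ["Project Atlas", ["Elasticsearch", "React"]],
]);
```

Calling `breadthFirst(graph, "Sarah Chen", 3)` yields one array per depth level, matching the Depth 0 through Depth 3 listing above.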
Best-First (best_first)
Always follows the highest-scoring connection first, regardless of depth. Like a detective who always chases the hottest lead.
If the query is about "technology", the traversal might go:
[Sarah Chen] --approved--> [Kubernetes] (high relevance to "technology")
[Kubernetes] --used_by--> [Project Atlas] (high relevance)
[Project Atlas] --uses--> [Elasticsearch] (high relevance)

It dives deep along the most relevant path rather than exploring broadly.
Best for: When you know roughly what you're looking for and want to find the most relevant path quickly.
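Best-first can be sketched as a frontier of scored edges where the highest-scoring edge is always expanded next. The scores below are made-up relevance values and the function name is illustrative:

```typescript
// Scored edge: how relevant this connection is to the query.
type Edge = { to: string; score: number };
type ScoredGraph = Map<string, Edge[]>;

// Best-first traversal: always expand the highest-scoring frontier edge,
// regardless of how deep it sits in the graph.
function bestFirst(graph: ScoredGraph, start: string, maxNodes: number): string[] {
  const visited = new Set<string>([start]);
  const frontier: Edge[] = [...(graph.get(start) ?? [])];
  const order: string[] = [];
  while (frontier.length > 0 && order.length < maxNodes) {
    frontier.sort((a, b) => b.score - a.score); // a real impl would use a heap
    const best = frontier.shift()!;
    if (visited.has(best.to)) continue;
    visited.add(best.to);
    order.push(best.to);
    frontier.push(...(graph.get(best.to) ?? []));
  }
  return order;
}

// Toy graph with hypothetical relevance scores for a "technology" query.
const graph: ScoredGraph = new Map([
  ["Sarah Chen", [{ to: "Kubernetes", score: 0.9 }, { to: "Google", score: 0.3 }]],
  ["Kubernetes", [{ to: "Project Atlas", score: 0.8 }]],
  ["Project Atlas", [{ to: "Elasticsearch", score: 0.7 }]],
]);
```

Starting from "Sarah Chen", the traversal chases Kubernetes, then Project Atlas, then Elasticsearch before it ever looks at the low-scoring Google edge, which is the "hottest lead" behavior described above.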
Hybrid (hybrid) — Recommended
BFS for the first hop (discover all immediate connections), then best-first for deeper exploration (follow the most promising leads). This gets the best of both strategies.
Hop 1 (BFS): [TechCo], [Engineering Team], [Google], [Kubernetes]
Hop 2+ (Best): Follow most relevant → [Project Atlas] → [Elasticsearch]

Best for: Most use cases. This is the default.
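The two-phase idea can be sketched as: take every immediate neighbor on hop 1, then from each of those follow only the single best-scoring edge on deeper hops. Again, the graph and scores are hypothetical:

```typescript
type Edge = { to: string; score: number };
type Graph = Map<string, Edge[]>;

// Hybrid traversal sketch: BFS for hop 1, then greedily follow the single
// most promising edge from each hop-1 node for hops 2 through maxDepth.
function hybrid(graph: Graph, start: string, maxDepth: number): string[] {
  const visited = new Set<string>([start]);
  const found: string[] = [];
  // Hop 1: take every immediate neighbor (breadth-first).
  const hop1 = (graph.get(start) ?? []).map((e) => e.to);
  for (const n of hop1) { visited.add(n); found.push(n); }
  // Hops 2+: from each hop-1 node, follow only the best unvisited edge.
  for (const n of hop1) {
    let current = n;
    for (let depth = 2; depth <= maxDepth; depth++) {
      const edges = (graph.get(current) ?? []).filter((e) => !visited.has(e.to));
      if (edges.length === 0) break;
      const best = edges.reduce((a, b) => (b.score > a.score ? b : a));
      visited.add(best.to);
      found.push(best.to);
      current = best.to;
    }
  }
  return found;
}

// Toy graph with made-up relevance scores.
const graph: Graph = new Map([
  ["Sarah Chen", [{ to: "TechCo", score: 0.4 }, { to: "Engineering Team", score: 0.6 }]],
  ["Engineering Team", [{ to: "Project Atlas", score: 0.9 }]],
  ["Project Atlas", [{ to: "Elasticsearch", score: 0.8 }, { to: "React", score: 0.7 }]],
]);
```

Hop 1 collects both TechCo and Engineering Team; the deeper hops then follow only the strongest edges through Project Atlas to Elasticsearch, combining BFS coverage with best-first focus.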
Configuration
graph: {
enabled: true, // Enable knowledge graph
traversalStrategy: "hybrid", // "breadth_first" | "best_first" | "hybrid"
maxDepth: 3, // How many hops to traverse
maxNodes: 50, // Max nodes to visit per search
}

| Option | Default | Description |
|---|---|---|
| enabled | true | Set to false to disable graph entirely |
| traversalStrategy | "hybrid" | How to explore the graph |
| maxDepth | 3 | Maximum relationship hops. Higher = broader but slower |
| maxNodes | 50 | Safety limit on nodes visited. Prevents runaway traversals |
Tuning tips:
- Increase maxDepth (to 4-5) if your documents have long chains of relationships (e.g., org charts, dependency trees)
- Decrease maxDepth (to 1-2) if you only care about direct connections
- Increase maxNodes (to 100) if you have a densely connected graph and want thorough exploration
- Decrease maxNodes (to 20) if you want faster searches at the cost of coverage
Code Examples
Ingesting Documents That Build the Graph
The graph builds automatically during ingestion — no special code needed:
// These three documents will create graph connections
await memory.ingestText(ctx, {
orgId,
knowledgeBaseId: kbId,
text: "Sarah Chen is the VP of Engineering at TechCo.",
source: "team-directory.md",
});
await memory.ingestText(ctx, {
orgId,
knowledgeBaseId: kbId,
text: "Project Atlas is TechCo's search platform using Elasticsearch.",
source: "product-spec.md",
});
await memory.ingestText(ctx, {
orgId,
knowledgeBaseId: kbId,
text: "Sarah approved Kubernetes for the Atlas deployment.",
source: "meeting-notes.md",
});

Searching with Graph-Enhanced Results
Graph results appear alongside regular vector search results:
const results = await memory.getContext(ctx, {
orgId,
knowledgeBaseId: kbId,
query: "What technologies does Sarah's project use?",
});
// Results include both:
// 1. Direct vector matches (documents mentioning technologies)
// 2. Graph-traversed matches (documents connected through entities)
for (const result of results.results) {
console.log(result.text);
console.log("Score:", result.score);
console.log("Source:", result.citations?.source);
}

How It Integrates with the Search Pipeline
The knowledge graph is Step 11 in the 16-step pipeline. It runs after RRF fusion and deduplication, adding graph-discovered results to the candidate set. These graph results then go through reranking (Step 12) alongside the vector search results, so they're scored on equal footing.
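The merge step can be pictured as deduplicating graph-discovered candidates against the existing set before a single combined rerank. The shapes and function below are hypothetical, not Memcity's pipeline code:

```typescript
type Candidate = { id: string; score: number; source: "vector" | "graph" };

// Hypothetical merge step: graph-discovered chunks join the deduplicated
// candidate set, then everything is ranked together on equal footing
// (a simple score sort stands in for the real reranker here).
function mergeCandidates(vector: Candidate[], graph: Candidate[]): Candidate[] {
  const seen = new Set(vector.map((c) => c.id));
  const merged = [...vector, ...graph.filter((c) => !seen.has(c.id))];
  return merged.sort((a, b) => b.score - a.score);
}
```

The important property is that graph results are not appended as a separate, second-class list: after the merge, one ranking pass scores vector and graph candidates by the same criteria.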
Limitations and Best Practices
Works best with:
- Structured, factual content (team pages, product docs, policies)
- Documents that reference shared entities (people, products, companies)
- Content where relationships between concepts matter
Works less well with:
- Highly abstract or creative content
- Very short documents with no clear entities
- Content in languages the LLM handles poorly
Best practices:
- Use descriptive source names when ingesting — these help with citation generation
- Ingest related documents into the same knowledge base so the graph can connect them
- The quality of entity extraction depends on your AI model — if you need better extraction, consider upgrading to a more capable model
Availability
The knowledge graph is available on Pro and Team tiers. Community tier uses vector search and BM25 only.