Core Concepts
IntelliRag is a code intelligence platform that uses Retrieval-Augmented Generation (RAG) to make large codebases intelligible to developers and AI coding assistants. This page covers the foundational concepts you will encounter when working with the platform.
Data hierarchy
All data in IntelliRag follows a strict hierarchy:
Tenant > Workspace > Repository > Branch
Each level inherits the isolation boundary of its parent. A tenant cannot access another tenant’s data at any level of the hierarchy.
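As a rough illustration of the hierarchy, the sketch below models a branch address as a small value type whose first component is the tenant. The class name `BranchRef`, its field names, and the `same_tenant` helper are hypothetical and are not part of IntelliRag's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchRef:
    """Hypothetical address of one branch inside the data hierarchy."""

    tenant_id: str
    workspace: str
    repository: str
    branch: str

    def path(self) -> str:
        # Render the address in the same order as the hierarchy.
        return " > ".join(
            (self.tenant_id, self.workspace, self.repository, self.branch)
        )


def same_tenant(a: BranchRef, b: BranchRef) -> bool:
    # Every lower level inherits the tenant boundary, so comparing
    # tenant_id is enough to decide whether two references may interact.
    return a.tenant_id == b.tenant_id
```

Because the tenant is part of every address, a check like `same_tenant` can run before any workspace, repository, or branch comparison.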
Tenants
A tenant is the top-level isolation boundary. Each organization is a tenant. All data - symbols, graphs, embeddings, configuration - is isolated by tenant_id across every datastore. There are no cross-tenant operations.
Workspaces
A workspace is a logical grouping of repositories. Use workspaces to organize by team, product, or microservice group. A tenant can have multiple workspaces, and each workspace can contain multiple repositories.
Repositories
A repository represents a git repository connected to IntelliRag. Repositories are identified by their remote URL, which must be unique per tenant. You must create a repository in the dashboard before the indexer can process it: the indexer only looks up existing repositories and never creates them automatically.
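The lookup-only rule can be sketched with an in-memory registry. Everything here is an assumption made for illustration: the class and exception names are hypothetical, and keying entries by `(tenant_id, remote_url)` is one reading of "unique per tenant", not a description of the actual storage schema.

```python
class RepositoryNotRegistered(LookupError):
    """Raised when a lookup targets a repository that was never created."""


class RepositoryRegistry:
    """Hypothetical in-memory stand-in for the repository store."""

    def __init__(self) -> None:
        # Assumed key: (tenant_id, remote_url) -> workspace name.
        self._repos = {}

    def create(self, tenant_id: str, remote_url: str, workspace: str) -> None:
        # Stands in for the explicit creation step done in the dashboard.
        key = (tenant_id, remote_url)
        if key in self._repos:
            raise ValueError(f"{remote_url} already exists for tenant {tenant_id}")
        self._repos[key] = workspace

    def lookup(self, tenant_id: str, remote_url: str) -> str:
        # Lookup only: a miss is reported as an error, never turned
        # into an implicit create.
        try:
            return self._repos[(tenant_id, remote_url)]
        except KeyError:
            raise RepositoryNotRegistered(remote_url) from None
```

Failing loudly on a miss keeps repository creation an explicit step, so a mistyped remote URL surfaces as an error instead of a silently created duplicate.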
Symbols
Symbols are the named code elements extracted by the indexer: functions, classes, methods, interfaces, constants, types, and other declarations. Each symbol carries metadata including its location, visibility, documentation, and relationships to other symbols.
Fully Qualified Name (FQN)
Every symbol has a fully qualified name - a unique identifier within its repository. FQNs follow language-specific conventions:
| Language | Example |
|---|---|
| Java | com.example.UserService.findById |
| Go | pkg/handlers.CreateUser |
| TypeScript | src/auth/middleware.ts#validateToken |
FQNs are used for precise symbol lookup and as stable references in the knowledge graph.
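To make the conventions in the table concrete, the following sketch splits an FQN into its container and symbol name. It covers only the three example languages, and the separator rules are inferred from those examples; `split_fqn` is a hypothetical helper, not an IntelliRag function.

```python
# Assumed separators, inferred from the examples in the table:
# Java and Go end with ".symbol", TypeScript ends with "#symbol".
SEPARATORS = {"java": ".", "go": ".", "typescript": "#"}


def split_fqn(language: str, fqn: str) -> tuple:
    """Split an FQN into (container, symbol_name)."""
    try:
        separator = SEPARATORS[language]
    except KeyError:
        raise ValueError(f"no FQN convention known for {language!r}") from None
    # rpartition splits on the LAST separator, so dots inside a Java
    # package or a TypeScript file name stay in the container part.
    container, found, name = fqn.rpartition(separator)
    if not found or not container or not name:
        raise ValueError(f"not a valid {language} FQN: {fqn!r}")
    return container, name
```

Splitting on the last separator is what keeps `middleware.ts` intact in the TypeScript case, since the `#` rather than the `.` marks the symbol boundary.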
Indexing pipeline
The indexing pipeline is the process of analyzing a codebase and extracting structured intelligence. It runs in the following order:
- Git clone or pull - Fetch the latest code from the repository.
- File filtering - Exclude dependency directories, build artifacts, binary files, lock files, and other non-useful content.
- Tree-sitter AST parsing - Parse source files into abstract syntax trees for structural analysis.
- Language analysis - Extract symbols, references, call edges, and import edges using language-specific analyzers.
- Framework detection - Identify framework patterns (Spring, Express, Rails, etc.) using import edges from the language pass.
- Text analysis - Process non-code files (OpenAPI specs, SQL DDL, Maven POM files, property files) that have no tree-sitter grammar.
- Embedding generation - Generate vector embeddings for semantic search via the embedding service.
- Batch upload - Write all extracted data to the API server in batches.
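The ordering matters because later stages consume what earlier ones produce. The sketch below shows that dependency for two of the stages, with stub implementations and a shared context dictionary. The function names, the context keys, and the import-to-framework mapping are all hypothetical; the real indexer's internals are not described on this page.

```python
from typing import Callable

# Hypothetical mapping from an imported module to a framework name.
FRAMEWORK_BY_IMPORT = {"express": "Express"}


def language_analysis(ctx: dict) -> None:
    # Stub for the language pass: record (source file, imported module) edges.
    ctx["import_edges"] = [("src/server.ts", "express"), ("src/server.ts", "./routes")]


def framework_detection(ctx: dict) -> None:
    # Consumes the import edges from the language pass. A KeyError here
    # means the stages were run out of order.
    imported = {module for _source, module in ctx["import_edges"]}
    ctx["frameworks"] = sorted(
        FRAMEWORK_BY_IMPORT[m] for m in imported if m in FRAMEWORK_BY_IMPORT
    )


def run_pipeline(stages: list, ctx: dict) -> list:
    """Run stages strictly in the given order and return their names."""
    completed = []
    for stage in stages:
        stage(ctx)
        completed.append(stage.__name__)
    return completed
```

Running `run_pipeline([language_analysis, framework_detection], {})` succeeds, while swapping the two stages fails immediately because no import edges exist yet.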
Analyzers
Analyzers are pluggable components that extract intelligence from source files. There are three types:
Language analyzers use tree-sitter grammars to parse ASTs and extract symbols, references, and call edges. Each supported language (Java, Go, TypeScript, Python, C#, Ruby, PHP) has its own analyzer.
Framework analyzers build on language analysis to detect framework-specific patterns - route definitions, dependency injection, event handlers, and more. They run after language analyzers and use the import edges from the language pass for framework detection.
Text analyzers handle files that do not have a tree-sitter grammar, such as OpenAPI specifications, SQL DDL files, Maven POM files, and property files. They extract structured data without AST parsing.
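One way to picture the split between language and text analyzers is a dispatch on file name. The sketch below is an assumption-heavy illustration: the suffix and file-name tables are common conventions rather than IntelliRag's actual matching rules, and `pick_analyzer` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Assumed file suffixes for the languages listed above.
LANGUAGE_BY_SUFFIX = {
    ".java": "Java", ".go": "Go", ".ts": "TypeScript", ".py": "Python",
    ".cs": "C#", ".rb": "Ruby", ".php": "PHP",
}
# Assumed matching rules for files without a tree-sitter grammar.
TEXT_BY_NAME = {"pom.xml": "Maven POM", "openapi.yaml": "OpenAPI"}
TEXT_BY_SUFFIX = {".sql": "SQL DDL", ".properties": "Properties"}


def pick_analyzer(path: str) -> Optional[Tuple[str, str]]:
    """Return (analyzer type, analyzer name), or None if nothing applies."""
    p = PurePosixPath(path)
    suffix = p.suffix.lower()
    if suffix in LANGUAGE_BY_SUFFIX:
        return ("language", LANGUAGE_BY_SUFFIX[suffix])
    if p.name in TEXT_BY_NAME:
        return ("text", TEXT_BY_NAME[p.name])
    if suffix in TEXT_BY_SUFFIX:
        return ("text", TEXT_BY_SUFFIX[suffix])
    return None
```

Framework analyzers are absent from this dispatch on purpose: per the description above, they are selected from the import edges of the language pass rather than from file names.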
Knowledge graph
IntelliRag builds a knowledge graph in Neo4j that captures the structural relationships in your codebase. The graph contains:
- Call edges - Which functions call which other functions.
- Import edges - Which modules import which other modules.
- Data flow - How data moves through the system.
- Entry points - HTTP endpoints, message handlers, and other external interfaces.
The knowledge graph powers navigation queries such as “who calls this function,” “what depends on this module,” and “what is affected if I change this.”
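The "who calls this" and "what is affected" queries both reduce to walking call edges backwards. The sketch below does this over an in-memory edge list rather than Neo4j, purely to show the traversal; the function names and the `(caller, callee)` tuple shape are assumptions.

```python
from collections import defaultdict, deque


def build_callers_index(call_edges):
    """Invert (caller, callee) edges into callee -> set of direct callers."""
    callers = defaultdict(set)
    for caller, callee in call_edges:
        callers[callee].add(caller)
    return callers


def affected_by_change(call_edges, changed):
    """Return every symbol that reaches `changed` through call edges."""
    callers = build_callers_index(call_edges)
    affected = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        # .get avoids creating empty entries in the defaultdict.
        for caller in callers.get(current, ()):
            # The `affected` set doubles as the visited set, which also
            # keeps recursive call cycles from looping forever.
            if caller != changed and caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected
```

Direct callers answer "who calls this function", and the transitive closure computed by `affected_by_change` approximates "what is affected if I change this".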
Vector collections
IntelliRag maintains seven Qdrant vector collections for semantic search:
| Collection | Purpose |
|---|---|
| code_chunks | Source code segments for natural language code search |
| module_summaries | LLM-generated module descriptions |
| pattern_matches | Detected design patterns and architectural patterns |
| git_archaeology_chunks | Git history analysis (churn, ownership, change patterns) |
| debt_vectors | Technical debt items and code quality signals |
| api_contract_chunks | API endpoint definitions and contracts |
| event_catalog_vectors | Event producers, consumers, and message schemas |
All collections use the same embedding model (Voyage AI voyage-code-3, 1024 dimensions) and enforce tenant_id filtering on every query.
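A tenant-scoped search might be assembled as follows. The `body` mirrors the JSON shape of Qdrant's REST search request (`vector`, `limit`, and a `filter` with a `must` clause), built here as a plain dictionary so the example needs no client library. The function name, the validation rules, and the idea that IntelliRag assembles requests this way are assumptions.

```python
COLLECTIONS = frozenset({
    "code_chunks", "module_summaries", "pattern_matches",
    "git_archaeology_chunks", "debt_vectors",
    "api_contract_chunks", "event_catalog_vectors",
})
EMBEDDING_DIM = 1024  # voyage-code-3 dimensionality stated above


def build_search_request(collection, tenant_id, vector, limit=10):
    """Build a search request that always carries a tenant_id filter."""
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection!r}")
    if not tenant_id:
        raise ValueError("tenant_id is required on every query")
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    return {
        "collection": collection,
        "body": {
            "vector": list(vector),
            "limit": limit,
            # Restrict results to points whose payload matches this tenant.
            "filter": {"must": [{"key": "tenant_id", "match": {"value": tenant_id}}]},
        },
    }
```

Making `tenant_id` a required argument of the only request builder means an unfiltered query cannot be constructed by accident.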
Enrichment
After the indexing pipeline completes, an asynchronous enrichment process runs LLM-powered analysis on the indexed data. Enrichment is queued via a job system and processed by a dedicated worker - it never runs in the API server request path.
Enrichment job types include:
- Module summary - Generate natural language descriptions of modules and packages.
- Debt triage - Classify and prioritize detected technical debt.
- Dead code review - Identify potentially unused code paths.
- Schema annotation - Add context to database schema definitions.
- Contract inference - Infer API contracts from code patterns.
- Event description - Describe event producers and consumers.
Enrichment results are stored alongside indexed data and made available through the same MCP tools and API endpoints.
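The separation between queuing and processing can be sketched with the standard library's `queue` and `threading` modules. This is a toy stand-in for the real job system: the class name, the snake_case job type identifiers, and the result format are hypothetical, and the "analysis" is a placeholder for the LLM call.

```python
import queue
import threading

# Hypothetical identifiers for the job types listed above.
JOB_TYPES = frozenset({
    "module_summary", "debt_triage", "dead_code_review",
    "schema_annotation", "contract_inference", "event_description",
})


class EnrichmentQueue:
    def __init__(self) -> None:
        self._jobs = queue.Queue()
        self.results = []

    def enqueue(self, job_type: str, target: str) -> None:
        # Called from the request path: validate and return immediately,
        # without doing any analysis.
        if job_type not in JOB_TYPES:
            raise ValueError(f"unknown enrichment job type: {job_type!r}")
        self._jobs.put((job_type, target))

    def stop(self) -> None:
        # A None sentinel tells the worker to exit after earlier jobs.
        self._jobs.put(None)

    def run_worker(self) -> None:
        # Runs in a dedicated thread (or process, in a real deployment).
        while True:
            job = self._jobs.get()
            if job is None:
                return
            job_type, target = job
            # Placeholder for the LLM-powered analysis step.
            self.results.append((job_type, target))
```

A caller would start `run_worker` in a `threading.Thread`, call `enqueue` from request handlers, and call `stop` at shutdown; the handlers never wait for a job to finish.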