Language Support

The IntelliRag indexer uses tree-sitter grammars to parse source files into abstract syntax trees (ASTs), then extracts structured intelligence from each file. This page covers the supported languages, what gets extracted, and how the indexer handles files that fall outside tree-sitter grammars.

Supported languages

Language	Grammar	Symbols	Calls	Imports	Entry points	Data flow	Patterns
Java	tree-sitter-java	Yes	Yes	Yes	Yes	Yes	Yes
Go	tree-sitter-go	Yes	Yes	Yes	Yes	Yes	Yes
Python	tree-sitter-python	Yes	Yes	Yes	Yes	Yes	Yes
TypeScript	tree-sitter-typescript	Yes	Yes	Yes	Yes	Yes	Yes
C#	tree-sitter-c-sharp	Yes	Yes	Yes	Yes	Yes	Yes
Ruby	tree-sitter-ruby	Yes	Yes	Yes	Yes	Yes	Yes
PHP	tree-sitter-php	Yes	Yes	Yes	Yes	Yes	Yes
Terraform (HCL)	tree-sitter-hcl	Yes	-	Yes	-	Yes	Yes

What gets extracted

Each column in the table above represents a category of intelligence the indexer extracts:

Symbols - Named code elements: functions, methods, classes, interfaces, structs, constants, types, and other declarations. Each symbol includes its fully qualified name (FQN), location, visibility, and documentation.
Calls - Function and method invocations, including method dispatch through interfaces. These edges make up the call graph stored in the knowledge graph.
Imports - Module, package, and file import relationships. Import edges are used by framework analyzers to detect framework usage and by dependency analysis tools.
Entry points - External interfaces into the codebase: HTTP route handlers, message consumers, CLI commands, event listeners, and scheduled tasks.
Data flow - How data moves through function parameters, return values, and assignments. Used to trace the path of a value through the system.
Patterns - Framework usage patterns, design patterns, and anti-patterns detected in the code. Examples include singleton implementations, repository patterns, and dependency injection usage.
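
The following minimal Go sketch makes the categories concrete. It shows how symbol and call records might be modeled and folded into a call graph. The type names (`Location`, `Symbol`, `CallEdge`) and the `CallGraph` helper are hypothetical illustrations, not the indexer's actual schema.

```go
package main

import "fmt"

// Hypothetical record types for two of the extraction categories.
// Field names are illustrative, not the indexer's real schema.

// Location pins a symbol to a file and line range.
type Location struct {
	File      string
	StartLine int
	EndLine   int
}

// Symbol is a named declaration such as a function, class, or constant.
type Symbol struct {
	FQN        string // fully qualified name, e.g. "billing.Invoice.Total"
	Kind       string // "function", "method", "class", ...
	Loc        Location
	Visibility string // "public", "private", ...
	Doc        string
}

// CallEdge records that Caller invokes Callee; both are symbol FQNs.
type CallEdge struct {
	Caller, Callee string
}

// CallGraph maps each caller FQN to the callees it invokes, in edge order.
func CallGraph(edges []CallEdge) map[string][]string {
	g := make(map[string][]string)
	for _, e := range edges {
		g[e.Caller] = append(g[e.Caller], e.Callee)
	}
	return g
}

func main() {
	edges := []CallEdge{
		{Caller: "api.HandleCheckout", Callee: "billing.Invoice.Total"},
		{Caller: "api.HandleCheckout", Callee: "billing.Charge"},
		{Caller: "billing.Charge", Callee: "gateway.Client.Post"},
	}
	g := CallGraph(edges)
	fmt.Println(g["api.HandleCheckout"]) // [billing.Invoice.Total billing.Charge]
}
```

In this sketch, graph nodes are identified by FQN rather than by file. A method defined once and called from many files therefore resolves to a single node, which is one reason each symbol carries its fully qualified name.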

File type detection

The indexer checks each file against the following rules, in order of precedence, to decide how to process it:

Shebang line - A #!/usr/bin/env python3 header overrides the file extension. This handles scripts with no extension or mismatched extensions.
Exact filename match - Recognized filenames like Makefile, Dockerfile, Jenkinsfile, Procfile, and pom.xml are routed to the appropriate analyzer regardless of extension.
Extension match - Standard file extensions (.java, .go, .py, .ts, .cs, .rb, .php, .tf) select the language analyzer.
Path pattern match - Files matching path patterns like .github/workflows/*.yml are classified by their location in the repository.
Content sniffing - YAML files are classified by their top-level keys. SQL files are checked for DDL statements (CREATE TABLE, ALTER TABLE).
go-enry fallback - The go-enry library applies heuristic classification to files that the earlier steps did not identify.
Text analyzer fallback - Files that do not match any tree-sitter grammar are passed to text analyzers for structured extraction.
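
The precedence order above can be sketched as a cascade that returns at the first matching rule. This is a minimal Go sketch under stated assumptions. The lookup tables and the analyzer names it returns are hypothetical, and so is the choice to sniff YAML for a top-level `openapi` or `swagger` key. The go-enry and text-analyzer fallbacks are left out rather than guessed at.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Hypothetical tables; the indexer's real rules and analyzer names may differ.
var exactNames = map[string]string{
	"Makefile": "make", "Dockerfile": "dockerfile", "Jenkinsfile": "jenkins",
	"Procfile": "procfile", "pom.xml": "maven",
}

var extensions = map[string]string{
	".java": "java", ".go": "go", ".py": "python", ".ts": "typescript",
	".cs": "csharp", ".rb": "ruby", ".php": "php", ".tf": "terraform",
}

var shebangs = map[string]string{"python": "python", "python3": "python", "ruby": "ruby", "php": "php"}

// interpreter returns the program named on a "#!" line, looking past
// "/usr/bin/env" when present. It returns "" for non-shebang lines.
func interpreter(firstLine string) string {
	if !strings.HasPrefix(firstLine, "#!") {
		return ""
	}
	fields := strings.Fields(strings.TrimPrefix(firstLine, "#!"))
	if len(fields) == 0 {
		return ""
	}
	prog := path.Base(fields[0])
	if prog == "env" && len(fields) > 1 {
		prog = fields[1]
	}
	return prog
}

// Detect picks an analyzer for a slash-separated repo path, trying each rule in order.
func Detect(relPath, content string) string {
	firstLine, _, _ := strings.Cut(content, "\n")
	// 1. A shebang overrides the extension.
	if lang, ok := shebangs[interpreter(firstLine)]; ok {
		return lang
	}
	// 2. Exact filename match.
	if a, ok := exactNames[path.Base(relPath)]; ok {
		return a
	}
	// 3. Extension match.
	if lang, ok := extensions[path.Ext(relPath)]; ok {
		return lang
	}
	// 4. Path pattern match ("*" does not cross "/").
	if ok, _ := path.Match(".github/workflows/*.yml", relPath); ok {
		return "github-actions"
	}
	// 5. Content sniffing.
	switch path.Ext(relPath) {
	case ".yml", ".yaml":
		for _, line := range strings.Split(content, "\n") {
			// Unindented keys are top-level YAML keys.
			if strings.HasPrefix(line, "openapi:") || strings.HasPrefix(line, "swagger:") {
				return "openapi"
			}
		}
	case ".sql":
		upper := strings.ToUpper(content)
		if strings.Contains(upper, "CREATE TABLE") || strings.Contains(upper, "ALTER TABLE") {
			return "sqlddl"
		}
	}
	// 6-7. go-enry heuristics and the text-analyzer fallback are omitted here.
	return "unknown"
}

func main() {
	fmt.Println(Detect("scripts/deploy", "#!/usr/bin/env python3\nprint('hi')")) // python
	fmt.Println(Detect("services/pom.xml", "<project/>"))                       // maven
	fmt.Println(Detect("db/schema.sql", "create table users (id int);"))        // sqlddl
}
```

Checking the shebang first is what lets an extensionless script such as `scripts/deploy` reach the Python analyzer even though no extension rule could match it.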

Text analyzers

Some files carry valuable intelligence but have no tree-sitter grammar. The indexer includes dedicated text analyzers for these formats:

File type	Analyzer	What it extracts
OpenAPI/Swagger (YAML/JSON)	`openapi`	API endpoints, request/response schemas, authentication requirements
SQL DDL files	`sqlddl`	Table definitions, column types, constraints, indexes, foreign keys
Maven `pom.xml`	`maven`	Project dependencies, build plugins, module structure
Java `.properties` files	`properties`	Configuration keys, values, and which files define or consume them

Text analyzers produce the same `IndexOutput` structure as language analyzers, so their results follow the same batch upload path.
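
Because both kinds of analyzer return the same output type, the upload stage does not need to know which one produced a result. The minimal Go sketch below assumes a hypothetical `Analyzer` interface and a pared-down `IndexOutput`, since the real type's fields are not documented on this page. Its toy `sqlddl`-style analyzer pulls table names from `CREATE TABLE` statements with a regular expression rather than a real SQL parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// IndexOutput is a hypothetical, pared-down stand-in for the shared output type.
type IndexOutput struct {
	File    string
	Symbols []string // names of extracted declarations
}

// Analyzer is a hypothetical interface that both tree-sitter language
// analyzers and text analyzers could satisfy.
type Analyzer interface {
	Name() string
	Analyze(file string, src []byte) (IndexOutput, error)
}

// sqlDDL is a toy text analyzer that records table names from CREATE TABLE
// statements. A regex is used for brevity; it is not a real SQL parser.
type sqlDDL struct{}

var createTable = regexp.MustCompile(`(?i)\bCREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?([A-Za-z_][A-Za-z0-9_.]*)`)

func (sqlDDL) Name() string { return "sqlddl" }

func (sqlDDL) Analyze(file string, src []byte) (IndexOutput, error) {
	out := IndexOutput{File: file}
	for _, m := range createTable.FindAllSubmatch(src, -1) {
		out.Symbols = append(out.Symbols, string(m[1]))
	}
	return out, nil
}

func main() {
	var a Analyzer = sqlDDL{}
	src := []byte("CREATE TABLE users (id INT);\ncreate table if not exists billing.invoices (id INT);")
	out, _ := a.Analyze("db/schema.sql", src)
	fmt.Println(a.Name(), out.Symbols) // sqlddl [users billing.invoices]
}
```

A batching stage written against `[]IndexOutput` in this shape would accept results from either kind of analyzer unchanged.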

Excluded files

Before any analysis, the indexer filters out files that provide no code intelligence value. This keeps indexing fast and avoids noise in search results.

Dependency directories - node_modules/, vendor/, target/, .gradle/, Pods/, and similar package manager output.

Build artifacts - dist/, build/, out/, .next/, bin/, obj/, and compiled output directories.

Binary extensions - .class, .jar, .dll, .so, .wasm, .png, .jpg, .woff2, and other non-text formats.

Lock files - package-lock.json, yarn.lock, go.sum, Cargo.lock, Gemfile.lock, and other dependency lock files.

Secrets - .env, .env.*, .pem, .key, .crt, and other files that may contain credentials.

IDE and OS files - .vscode/, .idea/, .DS_Store, .cache/, coverage/, and editor/OS metadata.

Each exclusion check is an O(1) map lookup keyed by directory name, file extension, or exact filename. For the full list of patterns, see the source in `indexer/internal/pipeline/exclusions.go`.
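
The map-based exclusion approach can be sketched as follows. The tables are a hypothetical subset of the categories listed above. Handling `.env.*` with a prefix check is an assumption, since a wildcard cannot be an exact-filename map key. Consult `exclusions.go` for the real behavior.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Hypothetical subsets of the exclusion tables; see exclusions.go for the real lists.
var (
	excludedDirs = map[string]struct{}{
		"node_modules": {}, "vendor": {}, "target": {}, "dist": {}, "build": {}, ".idea": {}, ".vscode": {},
	}
	excludedExts = map[string]struct{}{
		".class": {}, ".jar": {}, ".dll": {}, ".png": {}, ".pem": {}, ".key": {},
	}
	excludedNames = map[string]struct{}{
		"package-lock.json": {}, "yarn.lock": {}, "go.sum": {}, ".env": {}, ".DS_Store": {},
	}
)

// Excluded reports whether a slash-separated repo path should be skipped.
// Each table check is a constant-time map lookup, so the cost grows with
// path depth rather than with the number of patterns.
func Excluded(relPath string) bool {
	segments := strings.Split(relPath, "/")
	// Directory names: every segment except the last.
	for _, dir := range segments[:len(segments)-1] {
		if _, ok := excludedDirs[dir]; ok {
			return true
		}
	}
	name := segments[len(segments)-1]
	if _, ok := excludedNames[name]; ok {
		return true
	}
	// Assumed handling for ".env.*" variants such as ".env.production".
	if strings.HasPrefix(name, ".env.") {
		return true
	}
	_, ok := excludedExts[path.Ext(name)]
	return ok
}

func main() {
	for _, p := range []string{"web/node_modules/react/index.js", "go.sum", "config/.env.production", "src/main.go"} {
		fmt.Println(p, Excluded(p)) // true, true, true, false
	}
}
```

Because directory lookups are keyed by path segment, this sketch excludes `vendor/` as a directory while leaving a file named `vendor.md` alone.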

Next steps

See framework detection for additional intelligence extracted from framework-specific patterns
Configure the indexer for your repository