Language Support
The IntelliRag indexer uses tree-sitter grammars to parse source files into abstract syntax trees (ASTs), then extracts structured intelligence from each file. This page covers the supported languages, what gets extracted, and how the indexer handles files that fall outside tree-sitter grammars.
Supported languages
| Language | Grammar | Symbols | Calls | Imports | Entry points | Data flow | Patterns |
|---|---|---|---|---|---|---|---|
| Java | tree-sitter-java | Yes | Yes | Yes | Yes | Yes | Yes |
| Go | tree-sitter-go | Yes | Yes | Yes | Yes | Yes | Yes |
| Python | tree-sitter-python | Yes | Yes | Yes | Yes | Yes | Yes |
| TypeScript | tree-sitter-typescript | Yes | Yes | Yes | Yes | Yes | Yes |
| C# | tree-sitter-c-sharp | Yes | Yes | Yes | Yes | Yes | Yes |
| Ruby | tree-sitter-ruby | Yes | Yes | Yes | Yes | Yes | Yes |
| PHP | tree-sitter-php | Yes | Yes | Yes | Yes | Yes | Yes |
| Terraform (HCL) | tree-sitter-hcl | Yes | - | Yes | - | Yes | Yes |
What gets extracted
Each column in the table above represents a category of intelligence the indexer extracts:
- Symbols - Named code elements: functions, methods, classes, interfaces, structs, constants, types, and other declarations. Each symbol includes its fully qualified name (FQN), location, visibility, and documentation.
- Calls - Function and method invocations, including method dispatch through interfaces. These form the call graph in the knowledge graph.
- Imports - Module, package, and file import relationships. Import edges are used by framework analyzers to detect framework usage and by dependency analysis tools.
- Entry points - External interfaces into the codebase: HTTP route handlers, message consumers, CLI commands, event listeners, and scheduled tasks.
- Data flow - How data moves through function parameters, return values, and assignments. Used to trace the path of a value through the system.
- Patterns - Framework usage patterns, design patterns, and anti-patterns detected in the code. Includes things like singleton implementations, repository patterns, and dependency injection usage.
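To make the symbol and call categories concrete, here is a minimal sketch of how a symbol record and call edges might be represented and grouped into a call graph. The `Symbol`, `Call`, and `buildCallGraph` names and their fields are assumptions for illustration only; this page does not describe the indexer's actual types.

```go
package main

import "fmt"

// Symbol mirrors the attributes listed above (FQN, location, visibility,
// documentation). The field names are illustrative, not the indexer's schema.
type Symbol struct {
	FQN        string
	File       string
	Line       int
	Visibility string
	Doc        string
}

// Call is one caller -> callee edge, identified by fully qualified names.
type Call struct {
	Caller string
	Callee string
}

// buildCallGraph groups call edges into an adjacency list keyed by caller FQN.
func buildCallGraph(calls []Call) map[string][]string {
	graph := make(map[string][]string)
	for _, c := range calls {
		graph[c.Caller] = append(graph[c.Caller], c.Callee)
	}
	return graph
}

func main() {
	sym := Symbol{
		FQN:        "com.example.OrderService.place",
		File:       "OrderService.java",
		Line:       42,
		Visibility: "public",
		Doc:        "Places an order.",
	}
	calls := []Call{
		{Caller: sym.FQN, Callee: "com.example.OrderRepository.save"},
		{Caller: sym.FQN, Callee: "com.example.EventBus.publish"},
	}
	graph := buildCallGraph(calls)
	fmt.Println(sym.FQN, "calls", graph[sym.FQN])
}
```

Keying edges by FQN keeps the graph independent of file layout, which is one plausible reason for each symbol to carry a fully qualified name.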
File type detection
The indexer uses a multi-step hierarchy to determine how to process each file:
- Shebang line - A `#!/usr/bin/env python3` header overrides the file extension. This handles scripts with no extension or mismatched extensions.
- Exact filename match - Recognized filenames like `Makefile`, `Dockerfile`, `Jenkinsfile`, `Procfile`, and `pom.xml` are routed to the appropriate analyzer regardless of extension.
- Extension match - Standard file extensions (`.java`, `.go`, `.py`, `.ts`, `.cs`, `.rb`, `.php`, `.tf`) select the language analyzer.
- Path pattern match - Files matching path patterns like `.github/workflows/*.yml` are classified by their location in the repository.
- Content sniffing - YAML files are classified by their top-level keys. SQL files are checked for DDL statements (`CREATE TABLE`, `ALTER TABLE`).
- go-enry fallback - The go-enry library provides a final classification based on heuristics.
- Text analyzer fallback - Files that do not match any tree-sitter grammar are passed to text analyzers for structured extraction.
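The first four steps of the hierarchy above could be sketched roughly as follows. This is an illustration under stated assumptions, not the indexer's implementation: the `classify` and `fromShebang` functions, the lookup tables, and the returned labels (such as `"github-workflow"`) are all hypothetical, and content sniffing and the go-enry fallback are left out.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Hypothetical lookup tables built from the examples in the list above.
var byFilename = map[string]string{
	"Makefile": "makefile", "Dockerfile": "dockerfile",
	"Jenkinsfile": "jenkinsfile", "Procfile": "procfile", "pom.xml": "maven",
}

var byExtension = map[string]string{
	".java": "java", ".go": "go", ".py": "python", ".ts": "typescript",
	".cs": "csharp", ".rb": "ruby", ".php": "php", ".tf": "terraform",
}

var byInterpreter = map[string]string{
	"python": "python", "python3": "python", "ruby": "ruby", "php": "php",
}

// fromShebang returns the language named by a "#!" first line, or "".
func fromShebang(content string) string {
	first, _, _ := strings.Cut(content, "\n")
	if !strings.HasPrefix(first, "#!") {
		return ""
	}
	fields := strings.Fields(strings.TrimPrefix(first, "#!"))
	if len(fields) == 0 {
		return ""
	}
	interp := path.Base(fields[0])
	// "#!/usr/bin/env python3" names the interpreter in the second field.
	if interp == "env" && len(fields) > 1 {
		interp = fields[1]
	}
	return byInterpreter[interp]
}

// classify tries each step in order and returns "" when none match,
// which is where the later fallbacks would take over.
func classify(filePath, content string) string {
	if lang := fromShebang(content); lang != "" {
		return lang
	}
	if kind, ok := byFilename[path.Base(filePath)]; ok {
		return kind
	}
	if lang, ok := byExtension[path.Ext(filePath)]; ok {
		return lang
	}
	if ok, _ := path.Match(".github/workflows/*.yml", filePath); ok {
		return "github-workflow"
	}
	return ""
}

func main() {
	fmt.Println(classify("scripts/deploy", "#!/usr/bin/env python3\nprint('hi')\n"))
	fmt.Println(classify("services/api/pom.xml", "<project/>"))
	fmt.Println(classify(".github/workflows/ci.yml", "on: push\n"))
}
```

Because the shebang check runs first, a file named `tool.rb` that starts with a Python shebang would be classified as Python in this sketch, matching the override behavior described in the first step.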
Text analyzers
Some files carry valuable intelligence but have no tree-sitter grammar. The indexer includes dedicated text analyzers for these formats:
| File type | Analyzer | What it extracts |
|---|---|---|
| OpenAPI/Swagger (YAML/JSON) | `openapi` | API endpoints, request/response schemas, authentication requirements |
| SQL DDL files | `sqlddl` | Table definitions, column types, constraints, indexes, foreign keys |
| Maven `pom.xml` | `maven` | Project dependencies, build plugins, module structure |
| Java `.properties` files | `properties` | Configuration keys, values, and which files define or consume them |
Text analyzers produce the same IndexOutput as language analyzers and follow the same batch upload path.
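As a rough illustration of a text analyzer sharing an output type with language analyzers, the sketch below parses a `.properties` file into configuration keys and values. Everything except the name `IndexOutput` is an assumption: the `Analyzer` interface, the `ConfigKeys` field, and `propertiesAnalyzer` are hypothetical, and the real `IndexOutput` type is not described on this page.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// IndexOutput is a stand-in with a single hypothetical field.
type IndexOutput struct {
	ConfigKeys map[string]string
}

// Analyzer is an assumed interface shared by language and text analyzers.
type Analyzer interface {
	Analyze(filePath, content string) IndexOutput
}

type propertiesAnalyzer struct{}

// Analyze handles simple "key=value" lines only. Comment lines starting
// with '#' or '!' are skipped; ':' separators and line continuations are
// not handled in this sketch.
func (propertiesAnalyzer) Analyze(filePath, content string) IndexOutput {
	out := IndexOutput{ConfigKeys: make(map[string]string)}
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, "!") {
			continue
		}
		key, value, found := strings.Cut(line, "=")
		if !found {
			continue
		}
		out.ConfigKeys[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return out
}

func main() {
	var a Analyzer = propertiesAnalyzer{}
	out := a.Analyze("application.properties", "# server settings\nserver.port = 8080\n")
	fmt.Println(out.ConfigKeys["server.port"])
}
```

Returning one shared type from every analyzer is what would let downstream steps, such as the batch upload path, treat text and language analyzers uniformly.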
Excluded files
Before any analysis, the indexer filters out files that provide no code intelligence value. This keeps indexing fast and avoids noise in search results.
- Dependency directories - `node_modules/`, `vendor/`, `target/`, `.gradle/`, `Pods/`, and similar package manager output.
- Build artifacts - `dist/`, `build/`, `out/`, `.next/`, `bin/`, `obj/`, and compiled output directories.
- Binary extensions - `.class`, `.jar`, `.dll`, `.so`, `.wasm`, `.png`, `.jpg`, `.woff2`, and other non-text formats.
- Lock files - `package-lock.json`, `yarn.lock`, `go.sum`, `Cargo.lock`, `Gemfile.lock`, and other dependency lock files.
- Secrets - `.env`, `.env.*`, `.pem`, `.key`, `.crt`, and other files that may contain credentials.
- IDE and OS files - `.vscode/`, `.idea/`, `.DS_Store`, `.cache/`, `coverage/`, and editor/OS metadata.
All exclusion patterns use O(1) map lookups by directory name, file extension, or exact filename. To see the full list of patterns, refer to the source in indexer/internal/pipeline/exclusions.go.
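A minimal sketch of map-based exclusion checks is shown below, assuming three sets keyed by directory name, file extension, and exact filename. The `isExcluded` function and the set contents are illustrative and cover only a subset of the patterns listed above; they are not copied from `exclusions.go`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// set gives constant-time membership checks without storing values.
type set map[string]struct{}

// Hypothetical subsets of the exclusion patterns listed above.
var excludedDirs = set{
	"node_modules": {}, "vendor": {}, "target": {}, ".gradle": {},
	"dist": {}, "build": {}, ".idea": {}, ".vscode": {},
}

var excludedExts = set{
	".class": {}, ".jar": {}, ".dll": {}, ".so": {}, ".png": {},
	".pem": {}, ".key": {},
}

var excludedNames = set{
	"package-lock.json": {}, "yarn.lock": {}, "go.sum": {},
	".env": {}, ".DS_Store": {},
}

// isExcluded reports whether a slash-separated repository path should be
// skipped. Each check is one map lookup; the only loop is over the path's
// directory segments.
func isExcluded(filePath string) bool {
	segments := strings.Split(filePath, "/")
	name := segments[len(segments)-1]
	for _, dir := range segments[:len(segments)-1] {
		if _, ok := excludedDirs[dir]; ok {
			return true
		}
	}
	if _, ok := excludedNames[name]; ok {
		return true
	}
	_, ok := excludedExts[path.Ext(name)]
	return ok
}

func main() {
	fmt.Println(isExcluded("web/node_modules/react/index.js"))
	fmt.Println(isExcluded("src/main/java/App.java"))
}
```

Using `map[string]struct{}` rather than a slice keeps each lookup independent of how many patterns are registered, so adding patterns does not slow down the per-file check.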
Next steps
- See framework detection for additional intelligence extracted from framework-specific patterns
- Configure the indexer for your repository