Usage Metering

IntelliRag tracks usage in near real time across five dimensions: LLM credits, search queries, index minutes, vector storage, and API calls. The metering pipeline is designed so that every metered operation is recorded exactly once and appears in the dashboard within about a minute.

How metering works

The metering pipeline follows a three-stage architecture:

Event emission - Every metered operation (search query, API call, indexer run, enrichment job, vector write) emits an event to a Redis Stream. Events are tagged with a unique ID, tenant ID, dimension, and quantity.
Event processing - The metering worker consumes events from the stream using consumer groups. Each event is deduplicated by event ID before being written to the metering_events table in Postgres, so an event that is redelivered after a worker restart or replay is still recorded only once.
Stripe reporting - Usage is aggregated and reported to Stripe hourly. Stripe handles invoice generation based on the active plan and any overages.

The dashboard reads directly from the metering_events table, so usage is visible within approximately one minute of the operation occurring.
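
The deduplication step in the processing stage can be illustrated with a minimal sketch. It is not IntelliRag's actual worker: SQLite stands in for Postgres so the example runs on its own, and the column names, the record_event and usage_totals helpers, and the sample dimension string are assumptions made for illustration. The idea is that a primary key on the event ID turns a replayed event into a no-op insert.

```python
import sqlite3

# SQLite stands in for Postgres so the sketch is self-contained.
# Only the table name comes from the docs; the columns are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE metering_events (
        event_id  TEXT PRIMARY KEY,
        tenant_id TEXT NOT NULL,
        dimension TEXT NOT NULL,
        quantity  REAL NOT NULL
    )
    """
)


def record_event(event: dict) -> bool:
    """Insert an event once; return False if its ID was already recorded."""
    cur = conn.execute(
        # Postgres equivalent: INSERT ... ON CONFLICT (event_id) DO NOTHING
        "INSERT OR IGNORE INTO metering_events VALUES (?, ?, ?, ?)",
        (event["event_id"], event["tenant_id"],
         event["dimension"], event["quantity"]),
    )
    conn.commit()
    return cur.rowcount == 1


def usage_totals(tenant_id: str) -> dict:
    """Sum recorded quantities per dimension, as a dashboard query might."""
    rows = conn.execute(
        "SELECT dimension, SUM(quantity) FROM metering_events "
        "WHERE tenant_id = ? GROUP BY dimension",
        (tenant_id,),
    )
    return dict(rows.fetchall())
```

Calling record_event twice with the same event_id leaves a single row, so totals computed from the table are unaffected by replays.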

Usage dashboard

View your current period usage in the dashboard under Settings > Usage. The usage page provides:

Dimension breakdown - Current consumption for each of the five dimensions (LLM credits, search queries, index minutes, vector storage, API calls) with progress bars showing percentage of plan limit consumed.
Historical usage - Graphs showing daily, weekly, and monthly usage trends. Use these to identify patterns and forecast whether you will approach limits before the period resets.
Per-repository breakdown - Drill down to see which repositories consume the most resources across each dimension. This helps identify candidates for optimization (disabling enrichment, switching to incremental indexing, or removing unused repos).
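
To make the forecasting idea concrete, here is a rough sketch of the arithmetic behind a progress bar and a simple end-of-period projection. The function names and the linear-rate assumption are illustrative only; the dashboard may compute its trends differently.

```python
def percent_of_limit(used: float, limit: float) -> float:
    """Share of the plan limit consumed so far, as a percentage."""
    return 100.0 * used / limit


def projected_usage(used: float, days_elapsed: int, days_in_period: int) -> float:
    """Linear projection: assume the average daily rate so far continues."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return used / days_elapsed * days_in_period
```

For example, with hypothetical numbers, 6,000 search queries used against a 10,000 limit is 60% consumed. If that happened in the first 12 days of a 30-day period, the projection is 15,000 queries, or 150% of the limit, which suggests optimizing or upgrading before the period ends.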

Alerts

IntelliRag sends proactive notifications as you approach plan limits:

Email alerts are sent at 80% (warning threshold) and 100% (soft limit) of each dimension's plan limit. Each alert is sent once per threshold per billing period, so you will not receive repeated emails for the same threshold.
Dashboard banners appear when any dimension enters the warning or soft limit state. Banners persist until the dimension returns to a healthy state or the billing period resets.
API response headers include X-RateLimit-Remaining (requests left in the current window) and X-Quota-State (current quota state for the relevant dimension) on every response. Use these headers to build client-side awareness into your tooling.
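
One way to build that client-side awareness is sketched below. The header names come from this page, but the QuotaInfo type, the helper functions, and the state strings in THROTTLE_STATES are assumptions; check the values your responses actually return before relying on them.

```python
from typing import Mapping, NamedTuple, Optional


class QuotaInfo(NamedTuple):
    remaining: Optional[int]  # requests left in the window, None if absent or malformed
    state: Optional[str]      # X-Quota-State value, lowercased, None if absent


def read_quota_headers(headers: Mapping[str, str]) -> QuotaInfo:
    """Extract the two quota headers from a response's header mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw_remaining = lowered.get("x-ratelimit-remaining")
    try:
        remaining = int(raw_remaining) if raw_remaining is not None else None
    except ValueError:
        remaining = None
    raw_state = lowered.get("x-quota-state")
    state = raw_state.strip().lower() if raw_state else None
    return QuotaInfo(remaining, state)


# Assumption: placeholder state strings, not confirmed header values.
THROTTLE_STATES = frozenset({"warning", "soft_limit"})


def should_throttle(info: QuotaInfo, throttle_states=THROTTLE_STATES) -> bool:
    """Slow down when the window is exhausted or the quota state is flagged."""
    if info.remaining is not None and info.remaining <= 0:
        return True
    return info.state in throttle_states
```

A client could call read_quota_headers on each response and delay or batch further requests whenever should_throttle returns True.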

Managing usage

If you are approaching plan limits, consider these strategies:

Reduce LLM credit usage - Disable enrichment for non-critical repositories in the dashboard under the repository settings. Enrichment tasks (module summaries, debt triage, dead code review) are the primary consumers of LLM credits.
Reduce index minutes - Use incremental indexing (indexer index) instead of full reindex (indexer index --full). Incremental runs process only files changed since the last index and typically complete in a fraction of the time.
Reduce vector storage - Remove repositories you no longer need. Deleting a repository immediately frees all associated vector storage across all 7 Qdrant collections.
Reduce API calls - Review integrations that poll the API on a schedule. Increase polling intervals or switch to event-driven patterns where possible.
Upgrade your plan - If optimization is not sufficient, upgrade from the dashboard under Settings > Billing. Plan changes take effect immediately and the new limits apply to the current billing period.
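
For integrations that must keep polling, one common way to cut API calls is to lengthen the interval while nothing changes. The sketch below shows that pattern under assumed values: next_interval, the 60-second base, and the one-hour cap are all hypothetical and not part of IntelliRag.

```python
def next_interval(current: float, changed: bool,
                  base: float = 60.0, cap: float = 3600.0) -> float:
    """Return the next polling delay in seconds.

    Reset to the base delay after a change is observed; otherwise
    double the delay, never exceeding the cap.
    """
    if changed:
        return base
    return min(current * 2, cap)
```

Starting from 60 seconds, an idle resource is polled at 120, 240, 480 seconds and so on until the delay reaches the cap, while any observed change drops the delay back to the base.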