SchemaSense AI
An enterprise database intelligence platform that connects dynamically to PostgreSQL, runs data quality analytics, and auto-generates interactive schema documentation via DeepSeek-V3.
Performance
< 50ms
Cache Hit Latency
Cost Efficiency
95%
Reduction in LLM Tokens
Security Layer
Active
Prompt Injection Gate
Availability
99.9%
Async Pool Health
The Problem
Database schemas evolve rapidly, but documentation almost never keeps up. Engineering teams are forced to decipher undocumented tables, decipher implicit foreign key relationships from legacy source code, and lose countless hours due to tribal knowledge.
SchemaSense AI resolves this by establishing direct metadata introspection. The backend reads the PostgreSQL catalog table definitions, analyzes statistics, runs structural data quality checks, and leverages large language models (DeepSeek-V3 via OpenRouter) to compile interactive, human-readable documentation and automated Entity-Relationship graphs.
Key Features
- ✓ DB Connection Management (PostgreSQL)
- ✓ Automated Schema Introspection
- ✓ Multi-column Data Quality Scoring
- ✓ SQLite-cached AI Table Descriptions
- ✓ Markdown, PDF, and JSON Document Exports
System Architecture
An interactive, vector-sharp trace of how database metadata is ingested, cached, validated, and streamed back to the client interface. The flowchart updates state dynamically to follow the request lifecycle.
Ingestion & Processing Pipeline
Visual representation of DDL compilation, cache checking, prompt sanitization, AI reasoning, and real-time streaming.
Systems Engineering Control Panel
Technical telemetry and interface specifications detailing caching layers, validation checks, and repository files.
# Launching schema ingestion server on port 8000...
[INFO] Initializing database connection pool (asyncpg)
[DB] Connected to PostgreSQL at pg.production.db.local:5432
[DB] Active Pool Size: 10 connections
[INFO] WAL SQLite Cache initialized at /backend/cache_db.sqlite
[REQUEST] POST /api/v1/tables/describe (127.0.0.1)
[CACHE] Table 'users' DDL structure SHA-256 computed: 7e8b9a1c...
[CACHE-HIT] Hash matched local store. Skipping LLM lookup.
[SUCCESS] Response compiled. Cache hit duration: 12.8ms. Saved 8,500 prompt tokens.
[REQUEST] POST /api/v1/tables/describe (127.0.0.1)
[CACHE] Table 'transactions' DDL structure SHA-256 computed: 8b3c9f2a...
[CACHE-MISS] Hash not found. Forwarding query to DeepSeek-V3 via OpenRouter.
[AI-STREAM] SSE streaming connection established. Emitting blocks...
{
"database_hash": "d3f45a7b1c...",
"table_name": "billing_transactions",
"columns": [
{ "name": "id", "type": "UUID", "constraints": ["PRIMARY KEY"] },
{ "name": "user_id", "type": "UUID", "constraints": ["FOREIGN KEY references users(id)"] },
{ "name": "amount", "type": "NUMERIC(12,2)", "constraints": ["NOT NULL"] },
{ "name": "status", "type": "VARCHAR(30)", "constraints": ["DEFAULT 'pending'"] }
],
"cache_check": true,
"bypass_sanitizer": false
}SHA-256 Schema Caching
Analyzing 100+ database tables creates huge API bills. Computing a SHA-256 hash of table columns and checking SQLite saves 95% in token costs, rendering updates in under 50ms.
Injection Defense
Dynamic schema querying poses security issues. The app runs a sanitization gateway that strips SQL commands and rejects non-JSON queries before they reach the LLM.
├── app/ # Next.js Pages
├── components/ # ER Visualizer
├── hooks/ # API Call hooks
└── backend/ # FastAPI Application
├── routes/ # Endpoints
└── utils/ # Core Libs
├── database.py # Pools
├── cache_db.py # SQLite manager
└── deepseek.py # OpenRouter