AI Engineering | 6 min read | May 2026

Building a Q&A Agent on Top of a Business Database

Most businesses have a database full of answers that nobody can access without filing a ticket or bothering the one person who knows SQL. That is a broken system. The data is there. The questions are simple. The bottleneck is the interface.

I built a Q&A agent that lets non-technical team members ask questions in plain English and get answers directly from the database. No SQL knowledge required. No waiting for someone else to write a query.

Why This Matters

Picture a sales manager who wants to know which product category grew the most last quarter. Or an operations lead who needs to see average delivery times by region. These are straightforward questions. But in most companies, they become emails, Slack messages, and eventually a Jira ticket that sits in a backlog for two weeks.

By the time the answer arrives, the decision has already been made without it. That is the real cost.

The Architecture

I kept it simple and robust. Here is the stack:

Database: PostgreSQL with a clean schema and proper foreign keys
Schema Layer: I wrote a detailed schema description that maps table names, column names, and relationships into plain language the LLM can understand
LLM: GPT-4 for query generation, with function calling to structure the output
Validation: Every generated SQL query goes through a validation layer that checks for destructive operations (no DROP, DELETE, UPDATE) and enforces row limits
Execution: Read-only database user with strict permissions. The agent can only run SELECT statements
Response: Results come back as a formatted table plus a natural language summary

The whole pipeline runs in under 3 seconds for most queries.
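The stages above can be sketched as one small orchestration function. This is a minimal illustration, not the actual implementation: the names `Answer`, `answer_question`, and the four callables are hypothetical, and the LLM call and database access are passed in as plain functions so the control flow is visible without any external service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    question: str
    sql: str       # returned with every answer so a technical person can verify it
    rows: list     # raw result rows, to be rendered as a table
    summary: str   # natural language summary of the rows


def answer_question(
    question: str,
    generate_sql: Callable[[str], str],      # assumed: wraps the LLM call
    validate_sql: Callable[[str], None],     # assumed: raises on unsafe SQL
    run_query: Callable[[str], list],        # assumed: uses the read-only user
    summarize: Callable[[str, list], str],   # assumed: second LLM call or template
) -> Answer:
    sql = generate_sql(question)
    validate_sql(sql)       # nothing reaches the database unless this passes
    rows = run_query(sql)
    return Answer(question, sql, rows, summarize(question, rows))


if __name__ == "__main__":
    # Stub stages stand in for the real LLM and database.
    demo = answer_question(
        "How many orders are there?",
        generate_sql=lambda q: "SELECT count(*) FROM orders",
        validate_sql=lambda sql: None,
        run_query=lambda sql: [(42,)],
        summarize=lambda q, rows: f"The query returned {len(rows)} row(s).",
    )
    print(demo.sql)
    print(demo.summary)
```

Keeping each stage behind its own function makes it easy to test validation on its own and to swap the model without touching the rest.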

The Schema Layer

This is the part most people get wrong. You cannot just throw a database schema at an LLM and expect good results. The model needs context. What does each table represent in business terms? Which columns are commonly queried together? What are the typical units and formats?

I spent about 60% of the total project time on the schema description. It is a single YAML file that describes every table, every column, and every relationship in plain English. It also includes example questions and the queries they map to.

This file is the real product. The LLM is just the engine.
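To make that concrete, here is a rough sketch of what such a description might contain and how it could be turned into prompt text. Everything in it is an assumption for illustration: the `orders` and `customers` tables, their columns, the example question, and the `render_schema_prompt` helper are all hypothetical. The description is written as a Python dict that mirrors what the YAML file could hold, so the example runs without a YAML parser.

```python
# Hypothetical schema description; in practice this would be loaded from YAML.
SCHEMA = {
    "tables": {
        "orders": {
            "description": "One row per customer order.",
            "columns": {
                "id": "Primary key.",
                "customer_id": "References customers.id.",
                "total_cents": "Order total in cents (integer), not dollars.",
                "created_at": "When the order was placed, stored in UTC.",
            },
        },
        "customers": {
            "description": "One row per customer account.",
            "columns": {
                "id": "Primary key.",
                "region": "Sales region name, for example 'EMEA'.",
            },
        },
    },
    "examples": [
        {
            "question": "How many orders were placed per region?",
            "sql": (
                "SELECT c.region, count(*) FROM orders o "
                "JOIN customers c ON c.id = o.customer_id GROUP BY c.region"
            ),
        },
    ],
}


def render_schema_prompt(schema: dict) -> str:
    """Flatten the description into plain text for the LLM prompt."""
    lines = []
    for table, info in schema["tables"].items():
        lines.append(f"Table {table}: {info['description']}")
        for column, meaning in info["columns"].items():
            lines.append(f"  - {column}: {meaning}")
    lines.append("Example questions and queries:")
    for example in schema["examples"]:
        lines.append(f"Q: {example['question']}")
        lines.append(f"SQL: {example['sql']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_schema_prompt(SCHEMA))
```

Notes on units and formats, such as the cents remark above, are the kind of business context a raw schema dump does not carry.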

Safety Guardrails

Letting an LLM write SQL against your database sounds risky. It can be, if you do not build proper guardrails. Here is what I put in place:

Read-only database user. The agent literally cannot modify data
Query validation based on an allowlist: anything that is not a SELECT statement is rejected
Row limits enforced at the connection level (max 1000 rows returned)
Query timeout of 10 seconds to prevent runaway queries
Full logging of every question asked and every query generated

The worst-case scenario is that the agent returns wrong data, which is why every response includes the actual SQL query so a technical person can verify it if needed.
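A validation step along these lines could look roughly like the sketch below. It is an assumption-heavy illustration rather than the validator described above: the `validate_select` and `with_row_limit` helpers and the keyword list are hypothetical, and the row cap is applied here by wrapping the query in an outer LIMIT, which is a different technique from enforcing it at the connection level. A keyword scan is not a SQL parser, so it can reject legitimate queries (for example, one with the word "update" inside a string literal), and the read-only database user remains the real backstop.

```python
import re

MAX_ROWS = 1000        # row cap from the guardrail list
TIMEOUT_MS = 10_000    # 10 second query timeout, in milliseconds

# Whole-word matches only, so columns like updated_at or created_at still pass.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke|into)\b",
    re.IGNORECASE,
)


class UnsafeQuery(ValueError):
    """Raised when generated SQL fails validation."""


def validate_select(sql: str) -> str:
    """Return the cleaned statement, or raise UnsafeQuery."""
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        raise UnsafeQuery("empty query")
    if ";" in statement:
        raise UnsafeQuery("multiple statements are not allowed")
    if not re.match(r"select\b", statement, re.IGNORECASE):
        raise UnsafeQuery("only SELECT statements are allowed")
    match = FORBIDDEN.search(statement)
    if match:
        raise UnsafeQuery(f"forbidden keyword: {match.group(0)}")
    return statement


def with_row_limit(statement: str, max_rows: int = MAX_ROWS) -> str:
    """Wrap a validated SELECT so at most max_rows rows come back."""
    return f"SELECT * FROM ({statement}) AS limited LIMIT {max_rows}"


def timeout_statement(timeout_ms: int = TIMEOUT_MS) -> str:
    """PostgreSQL session setting that cancels statements running too long."""
    return f"SET statement_timeout = {timeout_ms}"


if __name__ == "__main__":
    safe = validate_select("SELECT region, updated_at FROM orders;")
    print(timeout_statement())
    print(with_row_limit(safe))
```

Rejecting by default and allowing only one narrow statement shape keeps the validator small enough to review by hand.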

Results After 2 Months

The team went from asking 2 to 3 data questions per week to asking 15 to 20. The data team went from spending 40% of their time on ad hoc queries to almost zero. Decision-making got faster because people could get answers in real time.

The biggest win was cultural. Teams stopped treating data as something they had to request and started treating it as something they could explore.

What I Would Do Differently

I would add a feedback loop earlier. Letting users rate whether an answer was correct would help improve the schema descriptions over time. I would also add support for follow-up questions so users could refine their queries conversationally instead of starting from scratch each time.

If you are thinking about building something like this, start with the schema layer. Invest time in describing your data in plain language. The LLM part is straightforward. The real work is making your database understandable to a language model.