Part 1: Guide to building high-accuracy Text-to-SQL BI agents
Struggling to achieve high accuracy with Text-to-SQL? This 3-part guide outlines our proven methodology for developing highly accurate Text-to-SQL agents tailored to your proprietary data. Drawing on our work with Fortune 500 enterprises, we share the strategies that have helped customers achieve 90%+ accuracy in their Text-to-SQL use cases. You'll also learn about common challenges and practical solutions to overcome them.
One of the most critical challenges enterprise data teams face is juggling ad hoc requests from business users while still delivering on their core analytics projects. When that request queue is clogged, your business is starved of the insights it needs to make informed, time-sensitive decisions.
Business users don’t want another static dashboard—they want the flexibility to ask questions and get real-time answers. Unfortunately, the overlap of people in your organization who need answers right now and those with advanced SQL skills is probably pretty small.

Enterprises seeking data-driven excellence are then faced with a dilemma: they can either overload their limited SQL experts—delaying strategic initiatives—or resign themselves to rigid dashboards that leave critical questions unanswered.
This has led to a growing demand for Text-to-SQL Business Intelligence (BI) agents powered by LLMs, enabling business users to access data with a simple prompt. Key benefits of this approach include:
- ✅ Democratizes data access – Empowers business users to self-serve the insights they need without relying on data analysts.
- ✅ Reduces burden on data teams – Frees data teams from repetitive requests, allowing them to focus on high-impact initiatives.
In just a matter of weeks, you can build your own BI agent and increase AI adoption in your business. Let’s get into it.

Text-to-SQL accuracy challenges
Organizations typically run into four main blockers when developing LLM-based Text-to-SQL agents:
- Curating high-quality training data – As the saying goes: garbage in, garbage out. With our customizable agentic data pipelines, you can start with a small number of examples and expand from there.
- Benchmarking – Public benchmarks don’t measure performance on your specific use case. That’s why we built a pipeline that produces a detailed performance report with metrics and error analysis.
- Creating a representative evaluation (gold) test set – Your gold test set should be high-coverage and mirror real-world queries. Easier said than done, right?
- Maintaining accuracy as data drifts – Schema changes, new query patterns, and evolving business logic mean your model needs to keep learning—or risk going stale.
We tackle these challenges with a structured eight-step process:
- Schema review and alignment – Deep-dive into your schema to understand table structures and relationships. Work with stakeholders to map real business questions to the right fields and tables.
- Create a glossary file – Codify your company’s unique terminology and business concepts to help the model understand your domain.
- Build your evaluation set (gold test set) – Define your training objective and create a reliable benchmark to measure model accuracy.
- Develop synthetic training data – Start with 20 good examples, then scale your dataset using synthetic data generation pipelines.
- Validate SQL queries – Generate and validate SQL outputs for syntactic correctness, schema alignment, and accuracy.
- Memory-tune your model – The fun part: fine-tune your model using the training data you've developed so far.
- Evaluate your model – Test your model against the gold set to identify failure patterns.
- Keep iterating – Iteration is key to achieving high accuracy and maintaining it as your data drifts.
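To make step 5 concrete, here's a minimal sketch of SQL validation using SQLite's `EXPLAIN` to catch syntax errors and schema mismatches without actually executing the query. The `orders` table and the sample queries are hypothetical placeholders; a production pipeline would run this against a copy of your real schema.

```python
import sqlite3

def validate_sql(conn: sqlite3.Connection, query: str) -> tuple[bool, str]:
    """Check a generated query for syntax errors and schema mismatches
    without executing it, by asking SQLite to plan it via EXPLAIN."""
    try:
        conn.execute("EXPLAIN " + query)
        return True, "ok"
    except sqlite3.Error as e:
        return False, str(e)

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region_id INTEGER, amount REAL)")

ok, _ = validate_sql(
    conn, "SELECT region_id, SUM(amount) FROM orders GROUP BY region_id"
)
bad, err = validate_sql(conn, "SELECT revenue FROM orders")  # no such column
```

Queries that pass this gate are syntactically valid and reference only real tables and columns; checking semantic accuracy (does the query answer the question?) still requires comparison against gold results, which we cover in Part 2.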
Step 1: Schema review and alignment
Every company’s database schema is unique to its operations, making it essential to resolve ambiguity, noise, and redundancy during schema review. A user-centric approach helps—anticipating the types of questions users will ask and the language they’ll use ensures better alignment between schema structure and real-world queries.
However, natural language is inherently ambiguous, making it impossible to predict every potential question. To mitigate this, we start by defining the business concepts and terminology already established within the organization. For example, a global manufacturing company's definition of “share of market” may differ significantly from that of a domestic consumer goods company. These distinctions often deviate from an LLM’s general understanding of “market share,” necessitating training data that embeds proprietary business knowledge.
Real-world failures often stem from misaligned schema interpretation. For instance, in one real estate dataset, most tables mapped “region ID” to “city and state,” but one table linked it to zip codes. In evaluations with multiple leading LLMs, all models consistently failed region-based queries, defaulting to the wrong table. This illustrates the importance of explicit schema disambiguation, especially where similar terms appear across different join paths. Schema alignment isn’t just structural—it’s semantic, contextual, and critical to SQL accuracy.
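One practical way to disambiguate overloaded identifiers like the region ID above is to attach explicit, per-column annotations to the schema context included in the model's prompt. The sketch below uses hypothetical table and column names to show the idea:

```python
# Hypothetical per-column annotations that disambiguate an overloaded
# "region_id" appearing in multiple tables with different meanings.
SCHEMA_NOTES = {
    "listings.region_id": "FK to regions.region_id (city and state granularity)",
    "regions.region_id": "PK; identifies a city/state region",
    "zip_lookup.region_id": "Maps region to ZIP codes only; do NOT use for city/state queries",
}

def schema_context(tables: list[str]) -> str:
    """Render the annotations for only the tables involved in a question,
    for inclusion in the Text-to-SQL prompt."""
    lines = [
        f"{col}: {note}"
        for col, note in SCHEMA_NOTES.items()
        if col.split(".")[0] in tables
    ]
    return "\n".join(lines)

context = schema_context(["listings", "regions"])
```

Scoping the annotations to the tables relevant to each question keeps the prompt short while still steering the model away from the wrong join path.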
Step 2: Create a glossary file
To address ambiguous terms and prevent inaccurate joins, we recommend creating a glossary file that defines specific business terms or concepts and provides additional context about the database.
The glossary should include mappings from user-friendly terms to table and column names, common synonyms, and join paths. For example, terms like “monthly revenue,” “customer type,” or “active users” may not exist explicitly in the schema but can be translated to relevant fields or computed expressions.
Including metadata such as data types, value examples, and relational hints (e.g., primary/foreign keys) enhances the glossary’s utility. Glossaries should be developed collaboratively with domain experts and kept versioned to stay in sync with schema updates. This step significantly improves both question understanding and SQL generation quality, especially in enterprise or industry-specific datasets.
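As a sketch of what such a glossary might look like in practice, the entries below map user-facing terms to illustrative schema elements. All table names, expressions, and join paths are hypothetical; your glossary's format and fields should follow your own schema and conventions.

```python
# Illustrative glossary entries mapping business terms to schema elements.
GLOSSARY = {
    "monthly revenue": {
        "expression": "SUM(orders.amount)",
        "group_by": "strftime('%Y-%m', orders.created_at)",
        "synonyms": ["monthly sales", "revenue per month"],
    },
    "active users": {
        "expression": "COUNT(DISTINCT sessions.user_id)",
        "filter": "sessions.last_seen >= date('now', '-30 days')",
        "synonyms": ["MAU"],
    },
    "customer type": {
        "column": "customers.segment",
        "values": ["enterprise", "smb", "consumer"],
        "join_path": "orders.customer_id -> customers.id",
    },
}

def lookup(term: str) -> dict:
    """Resolve a user-facing term (or one of its synonyms) to its mapping."""
    t = term.lower()
    for name, entry in GLOSSARY.items():
        if t == name or t in [s.lower() for s in entry.get("synonyms", [])]:
            return entry
    return {}
```

Keeping the glossary in a structured, machine-readable format like this lets the same file serve both prompt construction and training-data generation, and makes version control alongside schema migrations straightforward.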
In Part 2, we’ll walk through how to build your evaluation dataset (gold test set), generate additional high-quality synthetic training data, and validate your SQL queries—all using agentic data pipelines. This is where you’ll invest the most time—but it’s also where you’ll see big leaps in accuracy. Trust us, it’s worth it.
We'd love to chat about your text-to-SQL use case. To get started with a customized demo, contact us here.