Chapter 32 - English Query Best Practices
English Query is a set of tools that database administrators, application developers, and Web professionals can use to develop a natural-language interface to a database. Using English Query applications, users can perform ad hoc database queries using English questions or statements.
English Query provides a robust environment for developing an English Query model. However, because databases tend to be unique and users ask unique questions, creating a model that answers users' questions can be a complex process.
This chapter provides best practices for starting, expanding, and deploying an English Query application using the tools provided with Microsoft® SQL Server™ 2000.
An Overview of English Query
English Query works by receiving a question or statement in English and translating the English into Transact-SQL. For example, a user might submit, "Count the customers." English Query translates the statement into Transact-SQL, such as:
SELECT count(*) as "count" FROM dbo.Customers
Note The exact Transact-SQL generated depends on the English Query model.
In an application that includes English Query, the English Query engine performs the translation. The engine uses a domain file (.eqd), which contains a model of the database. The model contains information specific to the database being queried by the application, including the database schema, a semantic abstraction layer that is built over the schema, and the mapping between them.
In the model, tables and fields are represented by entities and the joins are represented by relationships. The entities and relationships defined in a model allow English Query to translate English into Transact-SQL.
The English Query Project Wizard can automatically define some entities and relationships based on database structure. Developers define other entities and relationships by using the English Query development environment. A complete and well-designed model will allow an English Query application to do a better job of translating questions into queries.
A Simple Example
The following illustration shows a simple database, which is based on the Northwind sample database. The database contains three user-defined tables: customers, orders, and products. Both the customers and products tables are joined to the orders table.
Using an English Query project wizard called the SQL Project Wizard is the easiest way to start an SQL-based project. For this simple database, the SQL Project Wizard initializes the English Query model with the set of entities and relationships shown in the following table.
Note When using the SQL Project Wizard with a complete database, the wizard creates many other entities and relationships.
Notice that there are relationships between each table entity (such as customer) and the columns within that table (such as customer_id). A customer_id is a trait of a customer, so the relationship is phrased as customers have customer_ids. Also note there are relationships between table entities when a join exists between them: orders have customers, orders have products, and customers have products. One of the relationships, customers have products, must be modified to be useful. A more useful phrasing for this relationship is customers buy products.
Here are two sample questions that can be answered by this simple model:
"List the customers."
"Which orders have the product Chai?"
However, the model cannot answer the following variations of these questions:
"List the customers and their IDs."
The word "ID" is not defined in the model.
"Which orders contain Chai?"
The verb "contain" is not associated with any entity or relationship.
Refining entity and relationship definitions, as well as creating additional relationships and phrasings, will allow this model to answer these questions and many more.
Before You Begin
To become familiar with English Query, work through the tutorials that are provided when English Query is installed. The tutorials introduce the English Query development environment and show how to start a project, how to create entity relationships, and how to deploy the solution to the Web.
English Query provides four tutorials:
Creating a Basic SQL Model
Creating a Complex Model
Creating an OLAP Model
Deploying the Application
To access the tutorials
On the Start menu, point to Programs/Microsoft SQL Server/English Query, and then click English Query Tutorials.
Starting a Basic Model
Once you are familiar with the English Query development environment, starting a basic model is easy if you prepare your database and follow a few guidelines. The following best practices will help you successfully start an English Query model.
Prepare Your Database
When you start an English Query project, your most important consideration is to ensure that your database is well designed and ready to be imported into the English Query environment. If your database is designed well, the English Query wizards can do much of the preliminary development work for you. If your database is poorly designed, getting the expected results can be difficult.
Normalize the database.
Applying normalization rules to a database ensures each table represents a single entity, each column defines one unique attribute, and each row represents one instance of the entity. English Query can best translate English into Transact-SQL when a database is normalized, and results generated from a normalized database will be more accurate.
If you do not own the database or cannot normalize the database for other reasons, consider using views to solve some normalization problems. For example, if an orders table exists for each financial quarter, create a view from a UNION of all orders tables, and then define relationships using the entity for the view.
For more information about how the normalization rules apply to English Query and how to use views to solve normalization problems, see "SQL Database Normalization Rules" in SQL Server Books Online.
Define primary and foreign keys.
English Query recognizes joins based on primary and foreign keys, and establishes relationships for these joins with the wizards. You can define joins in the English Query environment, but defining the keys and joins in the database before starting a project helps keep the database and model synchronized.
Use meaningful names.
The Project Wizard creates entity names based on the table and column names. Often these names are abbreviations. To make developing and using the model easier, it is good practice either to use meaningful names in the database or to rename all entities with meaningful names. For example, "EEs work in ETs" is not as meaningful as "Employees work in EmployeeTerritories". You can also provide synonyms for the entity names at the same time.
Start with the SQL Project Wizard
After you have prepared the database, use the SQL Project Wizard to start a SQL project. The wizard makes project setup and initial development easy. You simply follow the wizard's screens, selecting which server, database, and tables to use, and the wizard builds the model based on the database structure.
Edit Entity Properties
Review all entities that are created by the SQL Project Wizard. The wizard will guess each entity's properties where it can, and leave others blank by default. For this reason, you should verify all the entities' properties in your English Query model and, at the same time, further define the entities. For descriptions of the fields on each tab and dialog box, click Help.
Formulate and Test Typical Questions
After you have edited entity properties, the basic model created with the wizard should answer many questions, especially those regarding a single table or simple join. However, this simple model cannot answer all questions.
To discover the questions that the model will answer, generate a list of typical questions. If multiple groups use the database (such as executives, sales managers, production staff, and human resources staff), devise questions that each of those groups might ask.
Once you have a list of questions, you can test individual questions using the Model Test tool, which automatically appears after each project build. Click View Results to see query results. Click the Analysis tab to view the entities and relationships used to answer the question.
Click Save to save the current question and its results to a regression file (.eqr). This regression file can be used to retest the question, and can be used as the source of sample questions in your final English Query application.
Use the Suggestion Wizard
To answer questions that are not answered during testing, you will have to continue to improve your model by adding or modifying relationships. An easy way to build relationships that answer specific questions is to use the Suggestion Wizard provided in the Model Test tool.
When a question is not answered, click Suggestion Wizard on the Model Test toolbar. The Suggestion Wizard will direct you to provide information for the question.
After English Query modifies the model, test the question again to ensure the results are what you expect.
Add Help Text
Each entity and relationship in an English Query model can contain help text. Users can access the text to learn about the model, or to obtain any kind of helper information you want to provide.
To add help text, open the Entity dialog box by double-clicking an entity.
In the illustration, the help text "Five characters that uniquely identify each customer" describes the customer_id entity. If a user submits the question, "What is a customer id?" English Query will return the help text.
Expanding a Model
An English Query application created using the SQL Project Wizard and the Suggestion Wizard can answer many user questions. However, to create a comprehensive solution that answers a broad range of user questions, even those that you do not plan for, you must go beyond what the wizards offer. You must develop a comprehensive set of relationships that fully describes the database model and how the database objects are related.
As with basic models, creating a comprehensive set of relationships requires knowledge of the database and of questions people will ask. Forming entity relationships that answer specific questions or a wide range of questions requires at least two more skills:
Knowledge of English language semantics
If it has been a long time since you diagrammed a sentence, brush up on the basics, including subjects, verbs, adjectives, prepositional phrases, and direct objects.
Knowledge of the English Query development environment
For information about the dialog box options, click Help on each dialog box. SQL Server Books Online contains information that can help you decide when to use each option, and often provides examples. The English Query tutorials also are a good resource for learning how to use the English Query development environment.
Create Good Entity Relationships
An English Query model is only as good as its entities and relationships. To create good relationships for a model, think about how users will ask questions and what results the users expect. Good entity relationships:
Contain words that people will use when asking questions.
Try to define the relationship with words users will enter. If the words in a question are not recognized in the model, the user may receive a response like "Sorry, I didn't understand that."
Defining synonyms for words increase the number of ways a question can be asked. Both the Entities and Relationships dialog boxes support synonyms. You can click a word box to add your own synonyms, or click the ... button to select synonyms from a list.
Ambiguous relationships decrease the chance that users will get consistent results. Ambiguity is often introduced when multiple relationships serve similar purposes. For example, if a customer table contains city, state, and region columns, the following relationships introduce ambiguity:
Customers live in cities
Customers live in states
Customers live in regions
When a user asks, "Where does customer 10101 live?" what will the answer be? If the desired answer is the city, try the following relationships:
Customers live in cities
Cities are in states
States are in regions
Help users obtain the desired result set.
Most databases contain more information than a user wants to see at one time. If a user enters "List the orders" into an English Query application, does the user really want to see all orders or just the orders for today, this week, or this year? Once you define what information should be provided to the user by default, you can add these defaults to the English Query model.
If the desired result set for orders contains only today's orders, you can easily specify a default date for questions about orders. For a relationship such as orders have order_dates, open the Relationship dialog box, select order_date in the When box, and then select Today in the Default Date box.
If users really want to see all the orders, they can ask the question more specifically. The statement "List all the orders" will show all orders, regardless of the default setting.
First Create Broad Relationships, and Then Work on Specific Questions
When you create a model, you probably know how several database objects relate to each other. For example, you might know that customers buy products, suppliers sell products, and some products are discontinued. Creating these types of broad relationships, as well as fully defining entities, will allow the model to translate many questions that can be answered by the database.
After the broad relationships have been defined, test the model to see what questions it does and does not answer. For the questions the model does not answer, work on modifying entities and relationships to answer these questions, creating new entities and relationships if necessary.
When modifying the model to answer specific questions, try simple adjustments first. Create synonyms, define default conditions, or add lookup table data to the model before making several complex changes.
The process of building a model can change the results of question. To find these changes, make use of the regression file in the Model Test tool. When a question works, save it. Both the question and the results are saved to the regression file.
After you have saved questions to the regression file, periodically retest the questions. In the Project Explorer, right-click the .eqr file and click Run Regression. When the regression test is finished, right-click the .eqr file, and click View Differences. Any differences in answers will appear in a Regression Differences window.
For "Free-Form" Text Searches, Enable Full-Text Search
Many databases contain description or comment text fields. These fields may contain just about anything, and querying these fields using Transact-SQL may not generate the desired results. Full-text search provides a solution to this problem. Advantages of full-text search include:
Searches may work much faster than SQL table scans because they can take advantage of full-text indexes.
Searches can retrieve more information because the full-text searches support alternate tenses, singular and plural forms, phrases, and words near each other (proximity searches).
Results are ordered by relevance.
A full-text index is not the same as an SQL index. Each table can have only one full-text index. The index can be administered through the database but is stored in a file system. For more information, see "Full-Text Indexes" in SQL Server Books Online.
To enable full-text search for an English Query application
In SQL Server Enterprise Manager or using Transact-SQL, enable full-text indexing for the database.
In SQL Server Enterprise Manager or using Transact-SQL, enable full-text indexing for the table. During this step, choose the columns to include in the index.
In the English Query development environment, enable full-text searching for each full-text column using the SQL tab.
For more information about using full-text search with English Query, see "Enabling Full-Text Search for an English Query Model" in SQL Server Books Online.
For Data Analysis Questions, Create an OLAP Model
While creating an English Query model, you may find that not all of the questions are answered using the current database. Some questions require data analysis. For example, a question such as "What region sells the most Chai?" may be a data analysis question, not the type of question easily answered by a basic model. The user may really want aggregated data, possibly for multiple regions.
SQL Server Analysis Services (previously called OLAP) is a solution for organizing and aggregating data for quick analysis. Data is organized into multidimensional cubes. For example, a sales cube may provide aggregated data for sales by customer, city, and region, separating data into total sales and sales by product or category. For more information about Analysis Services, see "Analysis Services Overview" in SQL Server Books Online.
An Analysis Services database contains cubes, dimension tables, and fact tables, and is separate from the database that contains the original data. Therefore, in an English Query OLAP model, relationships are between cubes and the cube's dimension and fact tables.
To create an OLAP model
Create and test the Analysis Services database. For more information, see "Administering Analysis Services" in SQL Server Books Online.
Use the OLAP Wizard to create an English Query model for the database. Instead of selecting tables for the model, you select OLAP cubes.
Formulate typical questions and test the OLAP model.
Expand the model by modifying entities and relationships and then creating new relationships.
A mixed English Query model supports access to both the original database and the Analysis Services database. Using mixed models, you can ask questions that can be answered by either model, which can speed up the retrieval for questions that do not require aggregation of data.
For more information about using Analysis Services with English Query, see "Analysis Services in English Query" in SQL Server Books Online.
Deploying an English Query Solution
You can deploy an English Query application in several ways, including within a Microsoft Visual Basic® or Microsoft Visual C++® application and on a Web page running on Microsoft Internet Information Service (IIS). On the Web, the interface to English Query is provided by a set of Microsoft Active Server Pages (ASP).
Note If you have built an English Query solution for earlier versions of SQL Server and you want to use the solution with a SQL Server 2000 database, the English Query domain files must be rebuilt.
Use the Sample Applications
When you install English Query, you have the option of installing several sample applications and client interface examples. These examples are very useful when deploying an application. By borrowing source code from these examples, you can save hours or days of development work. Sample applications for Microsoft C++, Microsoft Visual Basic, and Active Server Pages (ASP) are installed to the \Program Files\Microsoft English Query\Samples directory.
If you are deploying a Web solution, the easiest way to deploy your application is to use the Deploy to Web command on the Project menu. When you deploy your project to the Web, several files are copied to the Web server. By developing an interface using these files, you can customize the English Query application and update the English Query model from the English Query development environment.
Note If you are deploying an English Query solution to the Web, you must install the full version of Microsoft Visual InterDev® before you can deploy the project. Also, the server you deploy the Web files to must be running Microsoft IIS 4.0 or later, and must have Microsoft FrontPage® extensions installed.
Provide Sample Questions for Users
When you deploy your English Query application, give users sample questions. If you simply have a box labeled "Ask a Question", users might not know how to get started. Sample questions will help users learn what they can do with your English Query application.
When you test a model using the Model Test tool, you can save sample questions to the regression file (.eqr). When you deploy the application to the Web, the questions in this file appear on the application's Web page.
Provide Question Builder
The sample applications contain Question Builder, a Microsoft ActiveX® control that exposes the English Query model. When Question Builder is integrated into an application, the interface allows users to explore what information is available to them, and to build questions based on the model.
To use Question Builder, users select a combination of entities and relationships. English Query then displays a list of questions that can be asked about that combination. Users can submit a questions as-is, or can modify the question before submitting it to the English Query engine.
In both Web and Microsoft Windows® applications, the user is presented with a multi-pane interface, as shown in the following illustration.
At the top of the screen, the first pane provides the schema for the English Query model. The second pane is a workspace in which users can drag entities and form questions. The third pane provides a list of questions users can ask about objects selected in the second pane. The lower panes provide the question and results interface.
The sample applications contain Question Builder, so you can see how Question Builder works and use the source code to integrate Question Builder into your applications. For more information, see "Adding Question Builder to an English Query Application" in SQL Server Books Online.
Maintaining and Improving a Model
When a database schema is modified, English Query applications must be updated as well. The English Query application depends greatly on the database schema. If the English Query model and the database schema are not the same, questions may not be answered correctly or may not be answered at all.
After you deploy your English Query application, you may discover it does not answer enough questions. By tracking which questions are not answered, you can improve your model to meet user demands.
Keep the Model Up-To-Date
When you initially create an English Query model, you might not include all tables. Or, you might want to add views or new tables to the model at a later time. In either case, you can add tables and views to your model with the Import Table command on the Model menu.
When you modify a table in the database, such as adding columns, the changes are not automatically reflected in the English Query model. The easiest way to update the table is from the SQL tab. Right-click the modified table, and then click Refresh Table.
After you add a table or view, or after you refresh a table, you can use the wizard to create new entities and relationships. Right-click the table or view and click Create Relationships. The wizard will display entities and relationships for the table or view and its columns. Click OK to add the entities and relationships to the model.
Use Logs to Improve Results
English Query may provide good answers to all the questions you ask. But how do you know if user questions are being answered? To find out, you will need to build some functionality into your application for reporting and finding problems:
For standard applications, provide a way for a user to send feedback, such as an e-mail address or Web form in which users can enter problem reports.
For Web applications, a user-driven solution is built into each application deployed using the English Query Deploy to Web tools. When you deploy the application, a log file named Problems.txt is copied to the Web server. When English Query cannot generate a Transact-SQL query, a "Report as Problem Question" link appears on the Web page. When a user clicks this link, the question is added to Problems.txt.
Another solution that is built into Web applications is the ability to log all questions to a table in the database. The first time a question is asked in the deployed Web solution, a table is created called EQLog. The first question and all subsequent questions asked using the Web-deployed application are logged in this table along with a response type, which indicates if the question was successfully answered or not. Periodically review the data in this table and improve the model to answer the questions that failed.
Note The EQLog table is owned by the run-time Web application user account. The first time a question is asked, the Web user account needs permission to create a table in the database. After the table is created, remove the permission to create a table and give the Web user account INSERT permission for the EQLog table.
During your testing phase or after the application is finished, you may discover that specific questions are not being answered correctly. Because database designs and ad hoc questions are unique, it is impossible to give instructions on how to fix specific problems. Here is one troubleshooting methodology:
On the Analysis tab of the Model Test Window, look at the relationships being used by English Query. A setting in one of those entities or relationships may be causing the problem. Or, you may need to create another relationship.
Look at the Transact-SQL generated by English Query. The Transact-SQL can show you how English Query is generating results. For example, if a question returns the necessary data along with unwanted data, the problem may an outer join. You can disable outer joins for individual joins on the SQL tab, or for the entire model in the Project Properties dialog box.
Look at how English Query has rephrased the question. If the rephrased question is very different from the original question, the phrasing of the question may be the problem. You may need to define synonyms, or there may be ambiguous relationships in the model.
Ask the question another way. If a question is answered when asked one way, but not another, the difference can give you hints about what is going wrong. For example, you may need to add synonyms for key words.