Creating a Database Plan
The first step in creating a database is creating a plan that serves both as a guide to be used when implementing the database and as a functional specification for the database after it has been implemented. The complexity and detail of a database design is dictated by the complexity and size of the database application as well as the user population.
The nature and complexity of a database application, as well as the process of planning it, can vary greatly. A database can be relatively simple and designed for use by a single person, or it can be large and complex and designed, for example, to handle all the banking transactions for hundreds of thousands of clients. In the first case, the database design may be little more than a few notes on some scratch paper. In the latter case, the design may be a formal document with hundreds of pages that contain every possible detail about the database.
In planning the database, regardless of its size and complexity, use these basic steps:
- Gather information.
- Identify the objects.
- Model the objects.
- Identify the types of information for each object.
- Identify the relationships between objects.
Before creating a database, you must have a good understanding of the job the database is expected to perform. If the database is to replace a paper-based or manually performed information system, the existing system will give you most of the information you need. It is important to interview everyone involved in the system to find out what they do and what they need from the database. It is also important to identify what they want the new system to do, as well as to identify the problems, limitations, and bottlenecks of any existing system. Collect copies of customer statements, inventory lists, management reports, and any other documents that are part of the existing system, because these will be useful to you in designing the database and the interfaces.
Identifying the Objects
During the process of gathering information, you must identify the key objects or entities that will be managed by the database. The object can be a tangible thing, such as a person or a product, or it can be a more intangible item, such as a business transaction, a department in a company, or a payroll period. There are usually a few primary objects, and after these are identified, the related items become apparent. Each distinct item in your database should have a corresponding table.
The primary object in the pubs sample database included with Microsoft® SQL Server™ 2000 is a book. The objects related to books within this company's business are the authors who write the books, the publishers who manufacture the books, the stores which sell them, and the sales transactions performed with the stores. Each of these objects is a table in the database.
Modeling the Objects
As the objects in the system are identified, it is important to record them in a way that represents the system visually. You can use your database model as a reference during implementation of the database.
For this purpose, database developers use tools that range in technical complexity from pencils and scratch paper to word processing or spreadsheet programs, and even to software programs specifically dedicated to the job of data modeling for database designs. Whatever tool you decide to use, it is important that you keep it up-to-date.
SQL Server Enterprise Manager includes visual design tools such as the Database Designer that can be used to design and create objects in the database. For more information see, Database Designer.
Identifying the Types of Information for Each Object
After the primary objects in the database have been identified as candidates for tables, the next step is to identify the types of information that must be stored for each object. These are the columns in the object's table. The columns in a database table contain a few common types of information:
- Raw data columns
These columns store tangible pieces of information, such as names, determined by a source external to the database.
- Categorical columns
These columns classify or group the data and store a limited selection of data such as true/false, married/single, VP/Director/Group Manager, and so on.
- Identifier columns
These columns provide a mechanism to identify each item stored in the table. These columns often have id or number in their names (for example, employee_id, invoice_number, and publisher_id). The identifier column is the primary component used by both users and internal database processing for gaining access to a row of data in the table. Sometimes the object has a tangible form of ID used in the table (for example, a social security number), but in most situations you can define the table so that a reliable, artificial ID can be created for the row.
- Relational or referential columns
These columns establish a link between information in one table and related information in another table. For example, a table that tracks sales transactions will commonly have a link to the customers table so that the complete customer information can be associated with the sales transaction.
Identifying the Relationships Between Objects
One of the strengths of a relational database is the ability to relate or associate information about various items in the database. Isolated types of information can be stored separately, but the database engine can combine data when necessary. Identifying the relationships between objects in the design process requires looking at the tables, determining how they are logically related, and adding relational columns that establish a link from one table to another.
For example, the designer of the pubs database has created tables for titles and publishers in the database. The titles table contains information for each book: an identifier column named title_id; raw data columns for the title, the price of the book, and the publishing date; and some columns with sales information for the book. The table contains a categorical column named type, which allows the books to be grouped by the type of content in the book. Each book also has a publisher, but the publisher information is in another table; therefore, the titles table has a pub_id column to store just the ID of the publisher. When a row of data is added for a book, the publisher ID is stored with the rest of the book information.