Database Design

by Jo K. Horan

Part 1 - Terms and Explanations

Designing a database, and doing it right, is easier than you think. It’s the terminology that scares a lot of people away. Back when “relational databases” were new, the mathematicians and scholars who developed the concepts didn’t worry about whether their ideas and definitions made sense to other people. Now, since we all depend on computerized information, we need to know how to handle it. This series of articles will take you step by step through the process of learning what databases are all about, analyzing and organizing your data, and finally creating and using your own database applications.

It doesn’t matter which database product you decide to use, since the principles of good database design are always the same. We’ll make use of the Alpha Four / Alpha Five Sig meetings to provide a workshop environment, where each month we’ll discuss and explain the topics outlined in that month’s magazine article. But the emphasis will be on content, not technique, so you can easily apply whatever you learn to your database product of choice. Please join us at HAL-PC headquarters at 6:30 p.m. on Monday, July 13th, for a hands-on demonstration of the things you’re about to learn from this article. Now, read on!

Let’s start with a definition of the term “database”. I like to call it “an organized collection of related information”. The real world is full of databases, and has been for a long time. We have telephone books (both white and yellow pages), library card catalogs, checkbooks, and so on. We organize this information in different ways, depending on how we expect to use it. Telephone books are alphabetic, by residential areas or by commercial categories. Card catalogs are arranged alphabetically by title, by author, and by subject, using slight variations of the same basic data. And check registers are normally sequenced by check numbers, which generally correspond to chronological dates, but not always. So what we have is a collection of facts and figures, arranged to answer predictable questions. The original data is the input; the information it provides is the output. The way the data is organized depends on the types of questions the data is supposed to answer.

In the world of computers, the term “database” has taken on several different meanings over the years. At one point it described just the data itself, stored in a particular format. Then the term was expanded to include not just the data, but all the peripheral files associated with the data that allowed users to enter, view, print, and modify their information. Now the word usually means a collection of several different but related sets of data, which can be accessed together or separately, along with all their associated peripheral files. This new interpretation of the term “database” meant that we had to come up with another word to describe the data itself, which is now usually referred to by the term “table”. This can be confusing at first, but once you’ve been introduced to the concepts, you’ll be able to understand the terminology and what it means.

Now, for database purposes, let’s define a “table”. Think of a spreadsheet, or any printed chart or layout that consists of rows and columns. A simple name-and-address list is a good example. Each row (or “record”) will include a last name and a first name, a street address, a city, state, and a zip code. Sometimes there will be a title, a middle name or initial, a second address line, and one (or more) phone numbers. Each row identifies one individual; each column (or “field”) has specific information about that individual, and all the entries in the same column contain the same type of data. Every column has a unique heading, so that you can distinguish between first and last names, primary and secondary address lines, and home and business phone numbers. And if any piece of information is missing, you’ll leave that column’s entry blank. You won’t move everything else over to fill in the gap, because doing that would put different kinds of data into the same column, which you don’t want to do.

If a layout like this meets the following requirements, it qualifies as a legitimate database table:

(1) The sequence of columns from left to right doesn’t matter, as long as you know from the unique column headings which field is which.

(2) The order of rows from top to bottom doesn’t matter, because each record is independent of every other record. In our example, if you have two people sharing the same address, you’ll enter complete address information for each of them; then neither record is dependent upon the other one.

(3) There are no duplicate rows; each row is different from every other row. If necessary, you will include an additional unique “key” field (like a social security number, for example) to make sure that this is the case.

(4) Data doesn’t repeat itself within a field. For instance, if you need both day and evening phone numbers, you’ll create two separate fields, instead of trying to stuff both numbers into the same field.

(5) Each field contains only the data it’s designed to hold; no arbitrary mixing and matching is allowed. If a new data item appears on the scene, you’ll create a new column for it; you won’t try to substitute it for something else.

(6) Then, given adequate data to identify the correct row in the table, you can select any one column from that row in order to retrieve any piece of information about that entry that you need.

There are some other practical things to consider. The first one is the type of data that’s to be stored in each field. Text fields are designed to hold short strings of character data, such as names and addresses. These fields can include either letters, or numbers, or both. Some fields may have just numbers, such as 5-digit zip codes, or numbers with other special characters like dashes or parentheses, as in 9-digit zip codes or telephone numbers. But even if a field contains only digits, it should still be defined as a text or character field unless you expect to do arithmetic on the number the field represents. If that’s the case, define the field as numeric, specifying the right number of decimal places (if any), and then you’ll be able to make use of appropriate numeric formatting functions such as dollar signs and commas when you display or print the field.

You may want to include date and/or time fields in your table. Different database products handle dates and times in various ways, but take advantage of whatever options are available to you. It’s easier (and more accurate) than trying to do it yourself. Some products also include logical or Boolean fields. These fields accept only “Yes or No” options, sometimes limited to one character such as Y or N, T or F, 1 or 0, but they can be convenient for making choices. And to store long comments or descriptive information, try memo fields. These fields are designed to be variable in length, and they make more efficient use of computer disk space than long character fields that may frequently be blank. And you can even include pictures with some software! These fields can contain actual photographs, or corporate logos, or various other symbols or diagrams.

With all these options, you need a little bit of control. The most important consideration is probably the order in which you view your records. Different database products use different techniques, but they usually offer you several choices. You can view the records in the order in which they were originally entered, or you can specify one or more fields to be used for sorting or sequencing the records. Sometimes this involves physically rearranging the records, but it can also be done more efficiently by using a method called “indexing”. An index file consists of a copy of the field (or fields) you’ve chosen to sort on, along with a pointer to the actual record from which the copied fields came. Since the index file is smaller than the original table, it can be physically sorted more easily. Then the software reads the sorted index file sequentially, using the file’s pointers to access the corresponding records from the actual table in the order defined by the index file. All this goes on behind the scenes, so it looks the same to the user, but it’s faster than sorting the real data every time you want to view it in a different way.

That’s enough for now. We’ve defined a “database”, which is made up of one or more tables, and which may include other computer files. We’ve described a “table”, which consists of rows and columns of data and conforms to certain rules. We’ve talked about the “records” that make up the table, and the “fields” that are included in each record. We’ve discussed “key” fields, which guarantee that records are unique. And we’ve mentioned “indexes”, which provide a way to sort or sequence a table. Next month we’ll talk about analyzing your data and breaking it out into separate tables, deciding what belongs together and what doesn’t, and determining how you want to use the information your data can provide. In the meantime, plan to come to HAL-PC headquarters for the Alpha Four / Alpha Five Sig meeting at 6:30 p.m. on Monday, July 13th, and we’ll try to answer any questions you may have about these fundamental database design terms and concepts. See you then!

Jo K. Horan is a database developer, consultant, and trainer, specializing in Alpha Four and Alpha Five. She is leader of the Alpha Four / Alpha Five Special Interest Group within HAL-PC, the Houston Area League of PC Users. She may be reached at 713-723-6709 or by mail at DataBasics, 9806 Cliffwood, Houston, TX 77096.

E-mail me at webmaster@hal-pc.org with any comments you have and tell me what you want to see here.

Back to the Magazine Home Page