by Jo K. Horan
Designing a database, and doing it right, is easier than you think. Its the terminology that scares a lot of people away. Back when relational databases were new, the mathematicians and scholars who developed the concepts didnt worry about whether their ideas and definitions made sense to other people. Now, since we all depend on computerized information, we need to know how to handle it. This series of articles will take you step by step through the process of learning what databases are all about, analyzing and organizing your data, and finally creating and using your own database applications.
It doesnt matter which database product you decide to use, since the principles of good database design are always the same. Well make use of the Alpha Four / Alpha Five Sig meetings to provide a workshop environment, where each month well discuss and explain the topics outlined in that months magazine article. But the emphasis will be on content, not technique, so you can easily apply whatever you learn to your database product of choice. Please join us at HAL-PC headquarters at 6:30 p.m. on Monday, July 13th, for a hands-on demonstration of the things youre about to learn from this article. Now, read on!
Lets start with a definition of the term database. I like to call it an organized collection of related information. The real world is full of databases, and has been for a long time. We have telephone books (both white and yellow pages), library card catalogs, checkbooks, and so on. We organize this information in different ways, depending on how we expect to use it. Telephone books are alphabetic, by residential areas or by commercial categories. Card catalogs are arranged alphabetically by title, by author, and by subject, using slight variations of the same basic data. And check registers are normally sequenced by check numbers, which generally correspond to chronological dates, but not always. So what we have is a collection of facts and figures, arranged to answer predictable questions. The original data is the input; the information it provides is the output. The way the data is organized depends on the types of questions the data is supposed to answer.
In the world of computers, the term database has taken on several different meanings over the years. At one point it described just the data itself, stored in a particular format. Then the term was expanded to include not just the data, but all the peripheral files associated with the data that allowed users to enter, view, print, and modify their information. Now the word usually means a collection of several different but related sets of data, which can be accessed together or separately, along with all their associated peripheral files. This new interpretation of the term database meant that we had to come up with another word to describe the data itself, which is now usually referred to by the term table. This can be confusing at first, but once youve been introduced to the concepts, youll be able to understand the terminology and what it means.
Now, for database purposes, lets define a table. Think of a spreadsheet, or any printed chart or layout that consists of rows and columns. A simple name-and-address list is a good example. Each row (or record) will include a last name and a first name, a street address, a city, state, and a zip code. Sometimes there will be a title, a middle name or initial, a second address line, and one (or more) phone numbers. Each row identifies one individual; each column (or field) has specific information about that individual, and all the entries in the same column contain the same type of data. Every column has a unique heading, so that you can distinguish between first and last names, primary and secondary address lines, and home and business phone numbers. And if any piece of information is missing, youll leave that columns entry blank. You wont move everything else over to fill in the gap, because doing that would put different kinds of data into the same column, which you dont want to do.
If a layout like this meets the following requirements, it qualifies as a legitimate database table:
(1) The sequence of columns from left to right doesnt matter, as long as you know from the unique column headings which field is which.
(2) The order of rows from top to bottom doesnt matter, because each record is independent of every other record. In our example, if you have two people sharing the same address, youll enter complete address information for each of them; then neither record is dependent upon the other one.
(3) There are no duplicate rows; each row is different from every other row. If necessary, you will include an additional unique key field (like a social security number, for example) to make sure that this is the case.
(4) Data doesnt repeat itself within a field. For instance, if you need both day and evening phone numbers, youll create two separate fields, instead of trying to stuff both numbers into the same field.
(5) Each field contains only the data its designed to hold; no arbitrary mixing and matching is allowed. If a new data item appears on the scene, youll create a new column for it; you wont try to substitute it for something else.
(6) Then, given adequate data to identify the correct row in the table, you can select any one column from that row in order to retrieve any piece of information about that entry that you need.
There are some other practical things to consider. The first one is the type of data thats to be stored in each field. Text fields are designed to hold short strings of character data, such as names and addresses. These fields can include either letters, or numbers, or both. Some fields may have just numbers, such as 5-digit zip codes, or numbers with other special characters like dashes or parentheses, as in 9-digit zip codes or telephone numbers. But even if a field contains only digits, it should still be defined as a text or character field unless you expect to do arithmetic on the number the field represents. If thats the case, define the field as numeric, specifying the right number of decimal places (if any), and then youll be able to make use of appropriate numeric formatting functions such as dollar signs and commas when you display or print the field.
You may want to include date and/or time fields in your table. Different database products handle dates and times in various ways, but take advantage of whatever options are available to you. Its easier (and more accurate) than trying to do it yourself. Some products also include logical or Boolean fields. These fields accept only Yes or No options, sometimes limited to one character such as Y or N, T or F, 1 or 0, but they can be convenient for making choices. And to store long comments or descriptive information, try memo fields. These fields are designed to be variable in length, and they make more efficient use of computer disk space than long character fields that may frequently be blank. And you can even include pictures with some software! These fields can contain actual photographs, or corporate logos, or various other symbols or diagrams.
With all these options, you need a little bit of control. The most important consideration is probably the order in which you view your records. Different database products use different techniques, but they usually offer you several choices. You can view the records in the order in which they were originally entered, or you can specify one or more fields to be used for sorting or sequencing the records. Sometimes this involves physically rearranging the records, but it can also be done more efficiently by using a method called indexing. An index file consists of a copy of the field (or fields) youve chosen to sort on, along with a pointer to the actual record from which the copied fields came. Since the index file is smaller than the original table, it can be physically sorted more easily. Then the software reads the sorted index file sequentially, using the files pointers to access the corresponding records from the actual table in the order defined by the index file. All this goes on behind the scenes, so it looks the same to the user, but its faster than sorting the real data every time you want to view it in a different way.
Thats enough for now. Weve defined a database, which is made up of one or more tables, and which may include other computer files. Weve described a table, which consists of rows and columns of data and conforms to certain rules. Weve talked about the records that make up the table, and the fields that are included in each record. Weve discussed key fields, which guarantee that records are unique. And weve mentioned indexes, which provide a way to sort or sequence a table. Next month well talk about analyzing your data and breaking it out into separate tables, deciding what belongs together and what doesnt, and determining how you want to use the information your data can provide. In the meantime, plan to come to HAL-PC headquarters for the Alpha Four / Alpha Five Sig meeting at 6:30 p.m. on Monday, July 13th, and well try to answer any questions you may have about these fundamental database design terms and concepts. See you then!
Jo K. Horan is a database developer, consultant, and trainer, specializing in Alpha Four and Alpha Five. She is leader of the Alpha Four / Alpha Five Special Interest Group within HAL-PC, the Houston Area League of PC Users. She may be reached at 713-723-6709 or by mail at DataBasics, 9806 Cliffwood, Houston, TX 77096.
E-mail me at webmaster@hal-pc.org with any comments you have and tell me what you want to see here.