Databases

SAN JOSÉ STATE UNIVERSITY
ECONOMICS DEPARTMENT
Thayer Watkins

A database is an organized store of data. Without organization the data would be useless. Databases are conventionally divided into two forms, hierarchical and relational, but the classification really amounts to relational databases and everything else. It was the genius of E.F. Codd of IBM-San Jose that was responsible for recognizing the desirability of relational databases. Before Codd, people conceived of a database as a system built upon the special characteristics of the application. This meant that the designer of the database had to know the details of the information in all of its idiosyncrasies. Furthermore, the programmer who wrote the programs for using and updating the database also had to know the details of the system thoroughly. Not only the original programmer but all subsequent programmers had to know the gritty detail of the system. As a database grew and changed over time, this became a more and more daunting task. This is the weakness of the hierarchical database.

E.F. Codd recognized the necessity of a uniform and simple structure for databases. He advocated using the concept of relation from mathematics. Strictly speaking, a relation is a set of ordered n-tuples. What this means in practice is that relations are tables, structures with rows and columns. The columns correspond to the variables or attributes of the data. The rows correspond to the records or items in the data.
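
The idea of a relation as a set of ordered n-tuples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute names X1 and X2 and the values are borrowed from the join illustration in this document, and the variable names are hypothetical.

```python
# A relation modeled as a set of ordered n-tuples (here n = 2).
# The attribute names and values are illustrative assumptions.
attributes = ("X1", "X2")

relation = {
    ("a", "p"),
    ("b", "q"),
}

# Because a relation is a set, adding a tuple that is already
# present changes nothing, and the rows have no inherent order.
relation.add(("a", "p"))
row_count = len(relation)

# Each tuple is one row (record); each position is one column (attribute).
x2_column = sorted(row[attributes.index("X2")] for row in relation)

print(row_count)   # number of distinct rows
print(x2_column)   # values found in column X2
```

The choice of a Python set rather than a list is deliberate: it mirrors the mathematical definition, in which duplicate rows and row order carry no meaning.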

By virtue of relations being sets, relations can be combined using set operations like union and intersection to create new relations. This corresponds to combining tables "vertically." The more interesting way to combine tables, however, is to combine them "horizontally." This involves combining tables on the basis of the columns they have in common. This operation is called a "join" of tables. The following illustrates this operation.

X1	X2
a	p
b	q

+

X1	Y1
a	s
b	t

=

X1	X2	Y1
a	p	s
b	q	t

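This is called a "natural" join of the tables, which is one kind of "inner" join. The common column X1 contained in both tables is called the key; in this example it also serves as the primary key of each table, since each of its values identifies exactly one row.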
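
The two ways of combining relations described above can be sketched in Python, again treating a relation as a set of tuples. This is a minimal sketch, not a definitive implementation: the function natural_join, the variable names, and the extra table t1_more are hypothetical and introduced only for illustration.

```python
def natural_join(left_attrs, left_rows, right_attrs, right_rows):
    """Join two relations on every attribute name they have in common."""
    common = [a for a in left_attrs if a in right_attrs]
    extra = [a for a in right_attrs if a not in left_attrs]
    out_attrs = tuple(left_attrs) + tuple(extra)
    out_rows = set()
    for left in left_rows:
        left_map = dict(zip(left_attrs, left))
        for right in right_rows:
            right_map = dict(zip(right_attrs, right))
            # Keep the pair only if it agrees on all common columns.
            if all(left_map[a] == right_map[a] for a in common):
                out_rows.add(left + tuple(right_map[a] for a in extra))
    return out_attrs, out_rows


# "Vertical" combination: ordinary set operations on two relations
# that have the same columns (t1_more is an invented example table).
t1 = {("a", "p"), ("b", "q")}
t1_more = {("b", "q"), ("c", "r")}
union = t1 | t1_more
intersection = t1 & t1_more

# "Horizontal" combination: the join from the illustration above.
t2 = {("a", "s"), ("b", "t")}
joined_attrs, joined_rows = natural_join(("X1", "X2"), t1, ("X1", "Y1"), t2)

print(sorted(union))
print(sorted(intersection))
print(joined_attrs, sorted(joined_rows))
```

The nested loop compares every row of one table with every row of the other, which is the simplest way to express the operation; real database systems use indexes and other techniques to avoid that cost.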

The major advantage of the relational database form is that, because the information is held in a definite structure, a general program can be written for carrying out database operations. Thus the database user does not have to rely upon professional programmers to provide access to the information in the database. Eliminating the professional programmer from the process of using a database saves enormous amounts of time and money and reduces the chance of error from miscommunication between the user and the programmer.

The availability of a standard structure for a database made it possible to have a general query language. The accessing of information from the database reduces to the simple, straightforward process of creating a subtable. There are several languages for carrying out this operation, but the one that appears to be becoming the standard is SQL, Structured Query Language. The characteristics of this language were formulated at IBM-San Jose, but IBM did not envisage commercial use of such a language in the immediate future. It fell to Larry Ellison of Oracle Systems to implement SQL commercially.
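
To make the idea of a query as "creating a subtable" concrete, here is a small sketch using SQLite through Python's standard sqlite3 module. The choice of SQLite, the table names t1 and t2, and the variable names are assumptions made for illustration; the data is taken from the join illustration earlier in this document.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Table and column names are illustrative assumptions.
conn.execute("CREATE TABLE t1 (X1 TEXT PRIMARY KEY, X2 TEXT)")
conn.execute("CREATE TABLE t2 (X1 TEXT PRIMARY KEY, Y1 TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [("a", "p"), ("b", "q")])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [("a", "s"), ("b", "t")])

# A query is a request for a subtable: chosen columns, chosen rows.
subtable = conn.execute(
    "SELECT X1, X2 FROM t1 WHERE X1 = ?", ("a",)
).fetchall()

# The join of the two tables on their common column X1.
joined = conn.execute(
    "SELECT X1, X2, Y1 FROM t1 NATURAL JOIN t2 ORDER BY X1"
).fetchall()

conn.close()

print(subtable)
print(joined)
```

Note that the same two statements, SELECT and JOIN, would work unchanged on tables with entirely different contents, which is the sense in which a general query language removes the need for application-specific programs.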