Center for Research in Water Resources
This module provides information to help you get started on designing your own databases. Data development is often one of the overlooked steps in environmental management applications. However, it is also one of the most important. Without thorough data development, any analysis using the data will be incomplete. This module describes how to build a relational database, how to build a spatial database, and how to connect these databases in a GIS. While the steps describe using Microsoft Access and ArcView, they could also apply to other database programs and GIS interfaces.
A database is simply a collection of information that relates to a particular subject or purpose. The database is stored in one file, but this file can contain multiple tables, all of which relate to the particular subject but contain different data related to that subject. For instance, a database for a retail company could contain a table for customer information, another table for product information, and a final table for shipping orders. All three tables relate to the operations of the company, but they contain different data sets that are easier to manage and understand through the use of separate tables. Tables are the fundamental elements of a database, and all database operations are performed through actions on tables. Tables are essentially storage containers, and database programs such as Microsoft Access can perform various operations on these containers. Data in tables can be managed in the following ways with Access:
In order to combine data from separate tables into one form, query, or report, relationships or links can be set between each table. In the example described above, a common field such as a customer id could exist in both the customer information table and in the shipping orders table. By joining this field between tables, data can be retrieved from each table and combined into another table (a query or report). For example, one could determine all the shipping orders in March for a specific customer. Queries and reports can also perform calculations on data. To extend the above example even further, one could create a report that would calculate the money spent by each customer for orders in March. To accomplish this task, Access (or an equivalent database utility) would determine all the orders and products bought for each customer in March from the shipping orders table and then calculate a sum of $'s spent using prices in the product information table. This figure displays the data management operations in Access:
(1 From Access help menu)
Relational databases can also be used for storing environmental data. For instance, an environmental database could be built for a superfund site. This database could include information such as area identification information, monitoring well data, soil boring data, and chemical analytical results. By creating a query, one could determine where a specific well is located, information about that well, and what groundwater level and chemical measurements have been taken from the well. Databases work in both directions, so one could also find all the locations at which a specific chemical of concern (COC) was measured.
This section describes how to organize data into an easy-to-manage database structure, and it assumes that some site-specific data is already available for the area of interest, such as well logs and monitoring data. The requirements for building a database will vary depending on the amount of available data and the format of the data. For instance, in some cases, the data will have already been compiled within a database program and will just require some reorganization. In other cases, the data will be scattered in boxes of reports, and the user will be required to search through these reports and manually input the data into the database.
The first step is to organize the data you have and determine how each data set relates to another. You should begin developing a format structure. The easiest way to complete this task is to build an outline. There should be one subject to which all the data relates, such as "Environmental Data at a Superfund Site." More specific topics that relate to this subject such as groundwater levels, chemical results, and stratigraphy data will comprise the database tables. Each table can be used in stand-alone work, but it should also have a common field which links to another table in the database. Here is an example figure for an oil refinery database showing the tables and the links between each table:

The Groundwater_Levels, Well, and Sample_Collection tables are all linked to the location table through the common LOC_ID field. Each physical location at the refinery, including wells and soil borings, is described in the Location table. The Groundwater_Levels table contains only information about groundwater elevation measurements at certain locations. Similarly, the Well table contains more specific information about each location that is a well, and the Sample_Collection table lists all the samples collected at each location. The Location table is the focal point through which all other tables are linked (both directly and indirectly), and the structure can be multi-leveled. For example, the Results table describes the chemical concentrations in each sample and is linked to the Sample_Collection table, but it is not directly linked to the Location table.
After arranging the data and devising an outline, the next step is the actual compiling of the database. Set up your tables and then enter the data directly into the table or enter the data with a form. In some cases, the data may be organized in Excel spreadsheets. These spreadsheets can be imported into Access and then linked together. If you are unfamiliar with the database program you are using, this step will take some time. To get a better feel for what a typical database looks like, the Access database described above is provided for you (Microsoft Access '97 is required to view the file once you have downloaded it):
In order to model environmental data in a GIS, it will also be necessary to build a spatial database. This database could include both facility scale and regional scale information. For instance, referring back to the refinery example, it would be useful to know the different land uses surrounding the refinery, the soil characteristics of the region, and the rivers in the nearby area. It would also be useful to know where each building is at the refinery, where the sewers are found, and where the process areas are located. Both types of information can be used during analysis procedures. Andrew Romanek's thesis includes a description using both regional and facility scale data to develop surface water runoff models.
With the expansion of the Internet, a large amount of spatial data can be obtained for free. A regional spatial database could be developed for about any area of the United States or the world. Here are a few web sites to help get you started:
Building a facility level database requires a site-specific approach. Aerial survey companies can be contracted to fly the area of interest and produce orthophotos and/or coverages which describe the site. These orthophotos and coverages can then be utilized by a GIS for analysis and evaluation. Alternatively, a facility-level database can be constructed using CAD drawings or by digitizing drawings. ArcView's CAD Reader extension allows one to import and display CAD files in the View window. Here is an example of a CAD drawing that has been separated into layers and then displayed in ArcView:

When compiling your spatial database, be sure each coverage is in the correct projection. Otherwise, these coverages will not line up properly. For more information on map projections, view the class exercise:
After compiling the relational and spatial databases, it would be useful to connect these two sets of data so that a better understanding of the site characteristics can be developed. A query of concentration values at each location can be created using a database program, but it would be difficult to visualize where each location is and how close one location is to another looking simply at a table. Similarly, a nice map of the site would not reveal very much about large chemical spills in the past. One of the great features about a GIS is its ability to connect spatial data with tabular data. In a GIS, one could see where the highest measured concentration of benzene is on the site and whether this location is close to a storage tank that could have leaked gasoline (benzene is a primary constituent of gasoline).
The SQL Connect feature located under the Project menu allows ArcView to connect with external database programs such as Microsoft Access and FoxPro. For instance, data can be imported into a table within ArcView from a table or query in Access. This table can then be joined to an existing coverage or used to generate a new coverage. All of these tasks are described in the comprehensive exercise entitled Mapping Environmental Data Stored in Microsoft Access. This exercise repeats a lot of the information described above on database structure, but it also includes sections on how to import data into ArcView and then how to model and map this data.
Julie Kim
Center for Research in Water Resources
The University of Texas at Austin
Austin, TX 78712
E-mail: juki@mail.utexas.edu