
CE 394K: GIS in Water Resources Final Project

 

 

Municipal Water Consumption and Health Issues in Texas

Mark Strahota

Final Report

 

3 December 2004

 

 

 

 

 

List of Figures

 

Figure 1: Municipal Water Use by County

Figure 2: Fatality Rates by County

Figure 3: Overall Death Rate vs. Municipal Water Use

Figure 4: Total Population by County

Figure 5: Low Birth Weight by County

Figure 6: Influenza and Pneumonia Deaths by County

Figure 7: Alzheimer’s Disease Deaths by County

Figure 8: Homicides by County

Figure 9: Location of “High-Risk” Counties

Figure 10: Creation of the Colored Maps Using ArcMap

Figure 11: Natural Breaks (Jenks) in Municipal Water Use Data

Figure 12: Municipal Water Use Map With Five Classes

Figure 13: Fatality Rate Map With Five Classes

Figure 14: Choosing the “Equal Interval” Classification Method

Figure 15: Municipal Water Use Data Using the “Equal Interval” Classification Method

Figure 16: Fatality Rates Using the “Equal Interval” Classification Method

Figure 17: Municipal Water Data Using “Equal Interval” and Five Classes (Optimal Display)

Figure 18: Fatality Rates Using “Equal Interval” and Five Classes (Optimal Display)

 

 

 

 

 

List of Tables

 

Table 1: Top Five Counties Per Cause of Death

Table 2: Symbology Classification Methods in ArcMap


Introduction

 

It is widely accepted that drinking plenty of clean, fresh water benefits a person’s health. Conversely, history shows that an improper or insufficient water supply can cause or contribute to unhealthy conditions in the general population. The original goal of this term project was to find a relationship between per capita water consumption and health statistics in Texas on a large scale. Water consumption statistics from the Texas Water Development Board (TWDB) were compared on a geographic basis to occurrences of low birth weight, overall death rate, and death rates attributed to specific causes. All health data were obtained from the Texas Department of Health Services (TDHS), and both the water use data and the health data are from 2002.

 

 

Background

 

The data from TWDB and TDHS were given a spatial association by using the Join and Relate function in ArcMap to join them to a Texas county shapefile provided in CE 394K Exercise 1. Maps were then created using a color spectrum to show municipal water use and health data, classified and ranked across all 254 counties in Texas. Figures 1 and 2 show municipal water use and overall death rates, respectively. As the value of water use or fatality rate increases, the color of the county shifts from green to red, so the counties with the highest values are shown in red. When these maps were completed, there appeared to be some correlation between municipal water use and public health: counties with higher municipal water use tended to have lower fatality rates and fewer occurrences of low birth weight. Note that the “red counties” seem to be roughly opposite each other when comparing Figures 1 and 2, suggesting an inverse relationship between death rates and water use.

 

 

                             

 

Figure 1: Municipal water use by county (gallons per capita per day)

Figure 2: Fatality rates by county (deaths per 1000 people)
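The join step described above can be illustrated outside ArcMap. The following is only a minimal sketch of an attribute join on a shared county key; the field names (`FIPS`, `NAME`, `GPCD`), the helper `join_by_key`, and the numeric values are hypothetical placeholders, not the project's actual data or ArcMap's implementation.

```python
# Hypothetical attribute rows standing in for the county shapefile's table.
counties = [
    {"FIPS": "48201", "NAME": "Harris"},
    {"FIPS": "48113", "NAME": "Dallas"},
    {"FIPS": "48301", "NAME": "Loving"},
]

# Hypothetical tabular records keyed by the same county code.
# The numbers are placeholders, not TWDB values.
water_use = {"48201": 160.0, "48113": 190.0}


def join_by_key(features, table, key, new_field):
    """Return copies of the feature rows with one extra field looked up by key.

    Rows without a match receive None, much as an unmatched join leaves nulls.
    """
    joined = []
    for row in features:
        merged = dict(row)  # copy so the input rows are not modified
        merged[new_field] = table.get(row[key])
        joined.append(merged)
    return joined


joined = join_by_key(counties, water_use, "FIPS", "GPCD")
for row in joined:
    print(row["NAME"], row["GPCD"])
```

Once every county row carries the joined value, any classification scheme can be applied to that single field to assign map colors.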

 

In light of this, a statistical comparison was performed in Microsoft Excel to quantify the correlation. Graphs were created with municipal water use on the abscissa and health statistics on the ordinate. However, when this analysis was completed, no correlation between municipal water use and public health could be found. For example, Figure 3 is a Microsoft Excel chart showing the same data as the maps above; recall that the closer the R² value is to 1.0, the stronger the correlation.

 

Figure 3: Overall Death Rate vs. Municipal Water Use
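For readers who want to reproduce the R² check without Excel, the sketch below computes R² for a straight-line fit as the square of the Pearson correlation coefficient. It assumes the chart used a linear trendline, which the report does not state; the function name `r_squared` and the sample numbers are illustrative only and are not the project data.

```python
def r_squared(xs, ys):
    """R^2 of a simple linear fit, computed as the squared Pearson correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Illustrative values only: a perfectly linear pair and an unrelated pair.
linear = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
unrelated = r_squared([1, 2, 3, 4], [1, -1, -1, 1])
print(linear, unrelated)
```

A value near 1.0 means the points fall close to a straight line, and a value near 0 means a line explains almost none of the variation, which is the situation reported for Figure 3.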

 

Graphs comparing other health data to municipal water use displayed a similar lack of correlation. In light of these results, the focus of the project was shifted for the remainder of the semester. A study was conducted to determine why the ArcMap maps did not accurately reflect the true correlation of the data (or lack thereof), and what might be changed in the display options or statistical analysis in ArcMap to convey the results more effectively. In other words, a secondary objective was added: to determine how to configure a statistical map to avoid misleading the analyst.

 

 

Primary Objective: Municipal Water Use and Health

 

Although no apparent correlation was found between municipal water use and human health, several results were obtained that may be of interest, particularly to Texas residents. The maps and tables below provide more of the output of the statistical analysis. Maps of municipal water use and overall death rate are not repeated here, as they are shown above in Figures 1 and 2. Recall that the values on the maps increase as the color of the county progresses from green to red.

Figure 4: Total Population by County (Most: 1. Harris County - Houston metro area, 2. Dallas County - Dallas metro area, Least: Loving County - Northwest elbow)

 

                         

Figure 5: Low Birth Weight (occurrences per 1000 people)

Figure 6: Influenza and Pneumonia (deaths per 1000 people)

Figure 7: Alzheimer’s Disease (deaths per 1000 people)

Figure 8: Homicides (deaths per 1000 people)

Table 1: Microsoft Excel spreadsheet showing top five counties according to cause of death (highlighted entries appear more than three times on different lists; see map below for locations of these counties)

 

 

Figure 9: Location of “High-Risk” Counties (appear on the lists in Table 1 more than three times)

 

When this phase of the project was completed, several possible sources of error came to light. One is the possibility that the term “municipal” water use was misinterpreted, or that water use in the municipal category was miscalculated. As shown in Figure 1, many of the counties with the highest municipal water consumption (red counties) are in the western, semi-arid regions of the state, where more water would be needed for irrigation and industrial utilities. There is no obvious reason why more water would be needed for personal use in these areas, especially in the relative amounts indicated in the data. This suggests that water used for irrigation or other industrial purposes may have been included under the “municipal” classification of water use.

 

Another possible source of error is that low population numbers can skew the “per capita” data one way or the other. For instance, 65 people resided in Loving County in 2002 and none of them died, which, while unusual in any given year, is plausible for so small a population. However, the resulting fatality rate of zero skews the data for that county to the point of eliminating its applicability to this project altogether. Several other counties, especially in west Texas, have similarly small populations and correspondingly extreme fatality rates resulting from only a few deaths. These low populations can also produce very high water use statistics on a “per capita” basis, which contributes to error in the data correlation.
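The arithmetic behind this small-population effect can be sketched as follows. The population of 65 comes from the text above; the population of 1,000,000 and the helper name `rate_per_1000` are assumptions chosen only for contrast.

```python
def rate_per_1000(deaths, population):
    """Deaths per 1000 residents."""
    return 1000.0 * deaths / population


small_county = 65          # Loving County population cited in the text
large_county = 1_000_000   # hypothetical large county, for contrast only

for deaths in (0, 1, 2):
    small_rate = rate_per_1000(deaths, small_county)
    large_rate = rate_per_1000(deaths, large_county)
    print(f"{deaths} deaths: small county {small_rate:.3f}, large county {large_rate:.3f}")

# Change in the rate caused by a single additional death.
jump_small = rate_per_1000(1, small_county) - rate_per_1000(0, small_county)
jump_large = rate_per_1000(1, large_county) - rate_per_1000(0, large_county)
```

In the 65-person county, one additional death moves the rate by more than 15 deaths per 1000 people, while the same event is barely visible in the large county. The same sensitivity applies to any quantity divided by a small population, including per capita water use.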

 

The third source of error considered involves the color classification of the maps displayed in ArcMap. All of the colored maps shown above were created in ArcMap using a color spectrum, with twenty-five class divisions calculated according to the default method, which is “Natural Breaks (Jenks).” This is set in the Layer Properties dialog, under the Symbology tab, as shown in Figure 10.

 

Figure 10: Creation of the colored maps using ArcMap

 

The “Natural Breaks (Jenks)” classification method is applied within ArcMap as described in the Help file:

 

Classes are based on natural groupings inherent in the data. ArcMap identifies break points by picking the class breaks that best group similar values and maximize the differences between classes. The features are divided into classes whose boundaries are set where there are relatively big jumps in the data values.

 

Figure 11: Natural Breaks (Jenks) in municipal water use data

 

Figures 10 and 11 show five classes for simplicity, but the maps created above actually employed twenty-five classes. In Figure 11, the data values are on the abscissa and the number of data entries at each value is on the ordinate. As the figure shows, the spacing between the breaks in this classification method is not even. This uneven classification was considered to be the cause of the misreading of the maps at the beginning of the project. The Secondary Objective was to determine whether this was indeed the cause and, if so, how it can be avoided in the future.
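The idea of placing breaks at the “big jumps” can be sketched with a small dynamic program that splits the sorted values into contiguous classes so as to minimize the total within-class squared deviation. This is one common formulation of natural-breaks classification and is offered only as an illustration; it is not claimed to be ArcMap's own implementation, and the function name `natural_breaks` and the sample values are hypothetical.

```python
def natural_breaks(values, n_classes):
    """Return the upper bound of each class for an optimal 1-D partition.

    Classes are contiguous runs of the sorted values, chosen to minimize the
    total within-class sum of squared deviations from each class mean.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the squared deviation of any slice in constant time.
    prefix = [0.0]
    prefix_sq = [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations of data[i:j] about its own mean."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: lowest cost of splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk the split table backwards to recover each class's upper bound.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]


# Hypothetical values with two obvious gaps.
sample = [1, 2, 3, 50, 51, 52, 200, 201]
breaks = natural_breaks(sample, 3)
print(breaks)
```

Because the breaks follow gaps in the data rather than fixed steps, class widths are uneven, which is the behavior visible in Figure 11.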

 

 

Secondary Objective: Use of Color in ArcGIS Maps

 

In the first attempt to display statistical maps more accurately in ArcGIS, the number of symbology classes was decreased from twenty-five to five. The “Natural Breaks (Jenks)” classification method was retained, however. The idea was that fewer colors would make the maps easier to read and make it simpler to distinguish the relative amounts between the counties.

 

When the change was made for the municipal water use map (compare to Figure 1), the result was Figure 12. This change made little difference in the resulting map, except that there are fewer red counties and more counties in the intermediate color ranges (oranges and yellows). In Figure 13, the fatality rates are shown (compare to Figure 2) and, once again, the “red counties” appear to be opposite each other when comparing Figures 12 and 13. The statistical analysis showed that this is not the case. The conclusion, therefore, was that decreasing the number of classes did not make the maps show the relative values among different areas any more accurately.

 

                         

Figure 12: Municipal water use map with five classes (compare to Figure 1)

Figure 13: Fatality rate map with five classes (compare to Figure 2)

 

The next idea for improving the display was to change the classification method of the maps. The first effort was to choose the “Equal Interval” classification method, as shown in Figure 14 using five classes. This is done by clicking the “Classify” button on the Symbology tab under Layer Properties (see Figure 10) and selecting the “Equal Interval” option in the Classification Method drop-down list (see Figure 14).

 

Figure 14: Choosing the “Equal Interval” classification method
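What the “Equal Interval” option computes can be sketched in a few lines: the range between the minimum and maximum is divided into classes of equal width. The function names and the sample values below are hypothetical and chosen to mimic a skewed dataset with one large outlier; they are not the project data.

```python
def equal_interval_breaks(values, n_classes):
    """Upper bound of each class when the data range is split into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * k for k in range(1, n_classes + 1)]


def classify(value, breaks):
    """Index of the first class whose upper bound is at least the value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1  # guard against rounding at the top of the range


# Hypothetical skewed values: most are low, one is far higher.
sample = [100, 110, 120, 130, 140, 150, 600]
breaks = equal_interval_breaks(sample, 5)

counts = [0] * len(breaks)
for v in sample:
    counts[classify(v, breaks)] += 1

print(breaks)   # [200.0, 300.0, 400.0, 500.0, 600.0]
print(counts)   # [6, 0, 0, 0, 1]
```

With skewed data, most features fall into the lowest class and only the outlier reaches the highest one, so a map drawn this way is dominated by the low-end color.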

 

The results of this change were encouraging, if only because they differed from the previous maps. The new map displayed significantly more green counties, with only a few in the red class. Figure 15 shows the municipal water use data map again, this time using the “Equal Interval” classification method with twenty-five classes.

 

Figure 15: Municipal water use data using the “Equal Interval” classification method

 

Figure 16 shows fatality rates again, but this time using the “Equal Interval” classification method, and it can be seen that again a significant change takes place (compare to Figure 2). When comparing Figures 15 and 16, there appears to be no correlation at all between the two, which is, in fact, the case. It was concluded, therefore, that the “Equal Interval” classification method is the most accurate portrayal of the data for this project.

 

 

Figure 16: Fatality rates using the “Equal Interval” classification method

 

The other classification methods available in ArcMap were also applied to this project and all except the “Defined Interval” method displayed misleading results similar to the “Natural Breaks (Jenks)” method. The “Defined Interval” method is the same as the “Equal Interval” method, except that the user identifies the size of each interval rather than the number of intervals. As a reference, the different classification methods are provided in Table 2 with short descriptions.

 

After the “Equal Interval” method was applied, it was realized that the map may be more readable with fewer classes. Since decreasing the number of classes did not significantly affect the display of the maps previously, this was applied and the display was again relatively unchanged. Therefore, the optimal map display for these data was determined to be with implementation of the “Equal Interval” classification method and five classes, as shown in Figures 17 and 18. Of course, the number of classes is somewhat a matter of opinion and is open to the interpretation of the analyst.

 

 

 

 

 

 

Figure 17: Municipal water data using “Equal Interval” and five classes (optimal display)

 

 

 

Figure 18: Fatality rates using “Equal Interval” and five classes (optimal display)

 

 

 

 

Table 2: Symbology Classification Methods in ArcMap

Method: Description

Natural Breaks (Jenks): Classes are based on natural groupings inherent in the data. ArcMap identifies break points by picking the class breaks that best group similar values and maximize the differences between classes. The features are divided into classes whose boundaries are set where there are relatively big jumps in the data values.

Quantile: Each class contains an equal number of features. A quantile classification is well suited to linearly distributed data. Because features are grouped by the number in each class, the resulting map can be misleading. Similar features can be placed in adjacent classes, or features with widely different values can be put in the same class. You can minimize this distortion by increasing the number of classes.

Equal Interval: This classification scheme divides the range of attribute values into equal-sized subranges, allowing you to specify the number of intervals while ArcMap determines where the breaks should be. For example, if features have attribute values ranging from 0 to 300 and you have three classes, each class represents a range of 100 with class ranges of 0–100, 101–200, and 201–300. This method emphasizes the amount of an attribute value relative to other values, for example, to show that a store is part of the group of stores that made up the top one-third of all sales. It’s best applied to familiar data ranges such as percentages and temperature.

Defined Interval: This classification scheme allows you to specify an interval by which to equally divide a range of attribute values. Rather than specifying the number of intervals as in the equal interval classification scheme, with this scheme, you specify the interval value. ArcMap automatically determines the number of classes based on the interval.

Standard Deviation: This classification scheme shows you how much a feature’s attribute value varies from the mean. ArcMap calculates the mean value and the standard deviations from the mean. Class breaks are then created using these values. A two-color ramp helps emphasize values above and below the mean.
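The caution in the “Quantile” description can be made concrete with a short sketch. It assigns an equal number of sorted values to each class and then lists the members of each class. The function names and the sample values are hypothetical, and the tie-handling and rounding rules ArcMap applies are not known here, so this is an approximation of the idea only.

```python
def quantile_breaks(values, n_classes):
    """Upper bound of each class when sorted values are split into equal counts."""
    data = sorted(values)
    n = len(data)
    breaks = []
    for k in range(1, n_classes + 1):
        # Position of the last member of class k: ceil(k * n / n_classes) - 1.
        last = (k * n + n_classes - 1) // n_classes - 1
        breaks.append(data[last])
    return breaks


def group_by_class(values, breaks):
    """List the members of each class, using each break as an upper bound."""
    groups = [[] for _ in breaks]
    for v in sorted(values):
        for index, upper in enumerate(breaks):
            if v <= upper:
                groups[index].append(v)
                break
    return groups


# Hypothetical values: nine similar numbers and one extreme outlier.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
breaks = quantile_breaks(sample, 5)
classes = group_by_class(sample, breaks)

print(breaks)    # [2, 4, 6, 8, 100]
print(classes)   # [[1, 2], [3, 4], [5, 6], [7, 8], [9, 100]]
```

Every class holds two values, so the top class pairs 9 with 100 even though they differ widely, while 8 and 9 are separated despite being nearly equal. This is the distortion the description warns about.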

 

 

 

 

Conclusions

 

As stated above, there was no apparent correlation between municipal water use data and overall health. This is contrary to the trend seen in statistical maps prepared according to the “Natural Breaks (Jenks)” classification method (the default method) in ArcMap. Therefore, it was concluded that the “Equal Interval” classification method is better for analyzing statistical data. Also, for this project, 5-10 “Equal Interval” classes proved to be the most effective way to view the maps.

 

Although the “Standard Deviation” classification method displayed results closer to those of the “Equal Interval” method, it still produced a misleading map. The description of the “Quantile” method itself states that “the resulting map can be misleading,” and the “Natural Breaks (Jenks)” method appears to produce a relatively similar classification. As might be inferred from this paper, this user would expect a color-coded map to be classified according to a system such as the “Equal Interval” method. Perhaps future versions of ArcMap could adopt this method as the default to avoid misleading maps.