CE 394K: GIS in Water Resources Final Project
Municipal Water Consumption and Health Issues in
Mark Strahota
Final Report
List of Figures
Figure 1: Municipal Water Use by County
Figure 2: Fatality Rates by County
Figure 3: Overall Death Rate vs. Municipal Water Use
Figure 4: Total Population by County
Figure 5: Low Birth Weight by County
Figure 6: Influenza and Pneumonia Deaths by County
Figure 7: Alzheimer’s Disease Deaths by County
Figure 8: Homicides by County
Figure 9: Location of “High-Risk” Counties
Figure 10: Creation of the Colored Maps Using ArcMap
Figure 11: Natural Breaks (Jenks) in Municipal Water Use Data
Figure 12: Municipal Water Use Map With Five Classes
Figure 13: Fatality Rate Map With Five Classes
Figure 14: Choosing the “Equal Interval” Classification Method
Figure 15: Municipal Water Use Data Using the “Equal Interval” Classification Method
Figure 16: Fatality Rates Using the “Equal Interval” Classification Method
Figure 17: Municipal Water Data Using “Equal Interval” and Five Classes (Optimal Display)
Figure 18: Fatality Rates Using “Equal Interval” and Five Classes (Optimal Display)
List of Tables
Table 1: Top Five Counties Per Cause of Death
Table 2: Symbology Classification Methods in ArcMap
Introduction
Doctors widely agree that drinking plenty of clean, fresh water benefits a person’s overall health. In contrast, history shows that an improper or insufficient water supply can directly cause or contribute to unhealthy conditions in the general population. The original goal of this term project was to find a relationship between “per capita” water consumption and health statistics in
Background
The data from TWDB and TDHS were given a spatial association by using the Join and Relate function in ArcMap to join them to a Texas county shapefile provided in CE 394K Exercise 1. Maps were created using a color spectrum to show municipal water use and health data, classified and ranked according to the data for all the 255 counties in
Figure 1: Municipal water use by county (Gallons per capita per day)
Figure 2: Fatality rates by county (Deaths per 1000 people)
In light of this, a statistical comparison was performed in Microsoft Excel to quantify the correlation. Graphs were created with municipal water use on the abscissa and health statistics on the ordinate to compare the two sets of data. When this analysis was completed, however, no correlation between municipal water use and public health could be found. Figure 3, a Microsoft Excel chart of the same data shown on the maps above, is one example; recall that the closer the R² value is to 1.0, the stronger the correlation.
Figure 3: Overall Death Rate vs. Municipal Water Use
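The R² value read from a chart like Figure 3 can also be computed directly. The following is a minimal sketch, assuming a straight-line (least-squares) trendline, for which R² equals the squared Pearson correlation coefficient. The function name `r_squared` and the sample numbers are illustrative and are not taken from the project data.

```python
def r_squared(x, y):
    """Return R^2 of the least-squares straight line fitted to (x, y).

    For a simple linear fit, R^2 is the squared Pearson correlation.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when x or y is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values, for illustration only.
water_use = [1, 2, 3, 4]        # e.g. gallons per capita per day (scaled)
perfect_fit = [2, 4, 6, 8]      # lies exactly on a line
no_linear_trend = [1, -1, -1, 1]

print(r_squared(water_use, perfect_fit))      # close to 1: strong correlation
print(r_squared(water_use, no_linear_trend))  # close to 0: no linear correlation
```

A value near 0, as in the second call, corresponds to the lack of correlation reported for Figure 3.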
Graphs comparing other health data to municipal water use displayed a similar lack of correlation. In light of these results, the focus of the project was shifted for the remainder of the semester. A study was to be conducted to determine why the ArcMap maps did not accurately reflect the true correlation of the data (or lack thereof), and what might be changed in the display options or statistical analysis in ArcMap to convey the results more effectively. In other words, a secondary objective was added: to determine how to configure a statistical map so that it does not mislead the analyst.
Primary Objective: Municipal Water Use and Health
While no apparent correlation was discovered between
municipal water use and human health, several interesting results were
obtained, particularly to

Figure 4: Total Population by County (Most: 1.
Figure 5: Low Birth Weight (Occurrences per 1000 people)
Figure 6: Influenza and Pneumonia (Deaths per 1000 people)
Figure 7: Alzheimer’s Disease (Deaths per 1000 people)
Figure 8: Homicides (Deaths per 1000 people)
Table 1: Microsoft Excel spreadsheet showing top five counties according to cause of death (highlighted entries appear more than three times on different lists; see map below for locations of these counties)
Figure 9: Location of “High-Risk” Counties (appear on the lists in Table 1 more than three times)
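The “appears more than three times” rule used for Table 1 and Figure 9 amounts to counting how many top-five lists each county is on. A minimal sketch follows; the function name `high_risk_counties`, the cause labels, and the county names are all hypothetical placeholders, not the project’s data.

```python
from collections import Counter


def high_risk_counties(top_lists, min_appearances=4):
    """Return counties that appear on at least `min_appearances` lists.

    `top_lists` maps a cause of death to its list of top-ranked counties.
    "More than three times" corresponds to min_appearances=4.
    """
    counts = Counter()
    for counties in top_lists.values():
        counts.update(set(counties))  # count each county once per list
    return sorted(name for name, n in counts.items() if n >= min_appearances)


# Placeholder lists, for illustration only.
top_five = {
    "cause 1": ["County A", "County B", "County C", "County D", "County E"],
    "cause 2": ["County A", "County F", "County G", "County B", "County H"],
    "cause 3": ["County A", "County B", "County I", "County J", "County K"],
    "cause 4": ["County A", "County L", "County M", "County N", "County O"],
}

print(high_risk_counties(top_five))  # ['County A']
```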
When this phase of the project was completed, several possible sources of error came to light. One is a possible misinterpretation of the term “municipal” water use, or a miscalculation of water use in the municipal realm. As shown in Figure 1, many of the counties with the highest municipal water consumption (red counties) are in the western, semi-arid regions of the state, where more water would be needed for irrigation and industrial uses. There is no obvious reason why more water would be needed for personal use in these areas, especially in the relative amounts indicated in the data. This suggests that water used for irrigation or industrial purposes may have been included under the “municipal” classification of water use.
Another possible source of error is the potential of low population numbers skewing the “per capita” data one way or the other. For instance, there were 65 people residing in
The third source of error considered involves the color classification of the maps displayed in ArcMap. All of the colored maps shown above were created in ArcMap using a color spectrum with twenty-five class divisions calculated according to the default method, “Natural Breaks (Jenks).” This is set in the Layer Properties dialog, under the Symbology tab, as shown in Figure 10.
Figure 10: Creation of the colored maps using ArcMap
The “Natural Breaks (Jenks)” classification method is applied within ArcMap as described in the Help file:
“Classes are based on natural groupings inherent in the data. ArcMap identifies break points by picking the class breaks that best group similar values and maximize the differences between classes. The features are divided into classes whose boundaries are set where there are relatively big jumps in the data values.”
Figure 11: Natural Breaks (Jenks) in municipal water use data
Figures 10 and 11 show five classes for simplicity, but the maps created above actually employed twenty-five classes. Figure 11 plots the data values on the abscissa and the number of data entries at each value on the ordinate; it shows that the breaks produced by this classification method are not evenly spaced. This uneven spacing was suspected to be the cause of the misreading of the maps at the beginning of the project. The secondary objective was to determine whether this was indeed the cause and, if so, how it can be avoided in the future.
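The uneven spacing of natural breaks can be reproduced outside ArcMap. The sketch below assumes the usual Fisher-Jenks formulation, in which breaks are chosen to minimize the total within-class sum of squared deviations; ArcMap’s internal implementation is not described in this report and may differ in detail. The function name `jenks_breaks` and the sample values are illustrative.

```python
def jenks_breaks(values, n_classes):
    """Return the upper limit of each class for a natural-breaks split.

    Uses dynamic programming to minimize the total within-class sum of
    squared deviations (assumed Fisher-Jenks criterion).
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums let us compute the squared deviation of any slice quickly.
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations of data[i:j] from its own mean."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best total cost of splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last class is data[i:j]
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk back through the recorded split points to recover the breaks.
    breaks, j = [], n
    for k in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = split[k][j]
    return breaks[::-1]


# Illustrative clustered values, not project data.
sample = [1, 2, 3, 10, 11, 12, 50, 51, 52]
print(jenks_breaks(sample, 3))  # [3, 12, 52]
```

For this sample the class widths are very different (1 to 3, 10 to 12, 50 to 52), whereas three equal intervals over the same range would break at 18, 35, and 52.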
Secondary Objective: Use of Color in ArcGIS Maps
In the first attempt to display statistical maps more accurately in ArcGIS, the number of symbology classes was decreased from twenty-five to five. The classification method, however, was kept as “Natural Breaks (Jenks).” The idea was that fewer colors would make the maps easier to read and make it simpler to distinguish the relative amounts between the counties.
When the change was made for the municipal water use map (compare to Figure 1), the result was Figure 12. This change made little difference in the resulting map, except that there are fewer red counties and more counties in the intermediate color ranges (oranges and yellows). Figure 13 shows the fatality rates (compare to Figure 2) and, once again, the “red counties” appear to be opposite each other when contrasting Figures 12 and 13. As we know from the statistical analysis, this is not the case. The conclusion, therefore, was that decreasing the number of classes did not effectively show the relative values among different areas.
Figure 12: Municipal water use map with five classes (compare to Figure 1)
Figure 13: Fatality rate map with five classes (compare to Figure 2)
The next idea for improving the display was to change the classification method of the maps. The first effort was to choose the “Equal Interval” classification method, as shown in Figure 14 with five classes. This is done by clicking the “Classify” button on the Symbology tab under Layer Properties (see Figure 10) and selecting the “Equal Interval” option in the Classification Method drop-down list (see Figure 14).
Figure 14: Choosing the “Equal Interval” classification method
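The arithmetic behind an equal-interval classification is simple enough to sketch. The example below is an illustration rather than ArcMap’s own code: it divides the data range into equal-width classes and assigns each value to a class, with each class taken to include its upper limit (an assumption). The function names `equal_interval_breaks` and `class_index` are hypothetical.

```python
from bisect import bisect_left


def equal_interval_breaks(values, n_classes):
    """Return the upper limit of each of `n_classes` equal-width classes."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    # The last break is set to the maximum itself to avoid rounding drift.
    return [lo + width * k for k in range(1, n_classes)] + [hi]


def class_index(value, breaks):
    """Return the 0-based class of `value`; a class includes its upper limit."""
    return bisect_left(breaks, value)


# Attribute values ranging from 0 to 300, split into three classes.
breaks = equal_interval_breaks([0, 45, 120, 300], 3)
print(breaks)                    # [100.0, 200.0, 300]
print(class_index(100, breaks))  # 0 (falls in 0 to 100)
print(class_index(101, breaks))  # 1 (falls in 101 to 200)
```

Because the class width depends only on the minimum and maximum, a few extreme values leave most features in the lowest classes, which is consistent with a map dominated by one end of the color ramp.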
The results of this change were encouraging, if only because they differed from the previous maps. The resulting map displayed significantly more green counties, with only a few in the red class. Figure 15 shows the municipal water use data again, this time using the “Equal Interval” classification method with twenty-five classes.
Figure 15: Municipal water use data using the “Equal Interval” classification method
Figure 16 shows the fatality rates again, this time using the “Equal Interval” classification method, and again a significant change takes place (compare to Figure 2). When Figures 15 and 16 are compared, there appears to be no correlation at all between the two, which is, in fact, the case. It was concluded, therefore, that the “Equal Interval” classification method gives the most accurate portrayal of the data for this project.
Figure 16: Fatality rates using the “Equal Interval” classification method
The other classification methods available in ArcMap were also applied to this project, and all except the “Defined Interval” method displayed misleading results similar to those of the “Natural Breaks (Jenks)” method. The “Defined Interval” method is the same as the “Equal Interval” method, except that the user specifies the size of each interval rather than the number of intervals. For reference, the different classification methods are listed in Table 2 with short descriptions.
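The difference between the two methods can be shown in a few lines. This is a minimal sketch in which the user supplies the interval size and the number of classes follows from it; it assumes the first class starts at the data minimum, which may not match how ArcMap anchors its intervals. The function name `defined_interval_breaks` is hypothetical.

```python
import math


def defined_interval_breaks(values, interval):
    """Return upper class limits spaced `interval` apart.

    Assumption: classes start at the data minimum. The number of classes
    is whatever is needed to reach the data maximum.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    lo, hi = min(values), max(values)
    n_classes = max(1, math.ceil((hi - lo) / interval))
    return [lo + interval * k for k in range(1, n_classes + 1)]


# Illustrative values: an interval of 100 over the range 0 to 250
# yields three classes, the last of which extends past the maximum.
print(defined_interval_breaks([0, 80, 250], 100))  # [100, 200, 300]
```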
After the “Equal Interval” method was applied, it seemed that the map might be more readable with fewer classes. Since decreasing the number of classes had not significantly affected the display of the maps previously, this was tried, and the display was again relatively unchanged. Therefore, the optimal map display for these data was determined to be the “Equal Interval” classification method with five classes, as shown in Figures 17 and 18. Of course, the number of classes is somewhat a matter of opinion and is open to the interpretation of the analyst.
Figure 17: Municipal water data using “Equal Interval” and five classes (optimal display)
Figure 18: Fatality rates using “Equal Interval” and five classes (optimal display)
Table 2: Symbology Classification Methods in ArcMap
| Method | Description |
| Natural Breaks (Jenks) | Classes are based on natural groupings inherent in the data. ArcMap identifies break points by picking the class breaks that best group similar values and maximize the differences between classes. The features are divided into classes whose boundaries are set where there are relatively big jumps in the data values. |
| Quantile | Each class contains an equal number of features. A quantile classification is well suited to linearly distributed data. Because features are grouped by the number in each class, the resulting map can be misleading. Similar features can be placed in adjacent classes, or features with widely different values can be put in the same class. You can minimize this distortion by increasing the number of classes. |
| Equal Interval | This classification scheme divides the range of attribute values into equal-sized subranges, allowing you to specify the number of intervals while ArcMap determines where the breaks should be. For example, if features have attribute values ranging from 0 to 300 and you have three classes, each class represents a range of 100 with class ranges of 0–100, 101–200, and 201–300. This method emphasizes the amount of an attribute value relative to other values, for example, to show that a store is part of the group of stores that made up the top one-third of all sales. It’s best applied to familiar data ranges such as percentages and temperature. |
| Defined Interval | This classification scheme allows you to specify an interval by which to equally divide a range of attribute values. Rather than specifying the number of intervals as in the equal interval classification scheme, with this scheme, you specify the interval value. ArcMap automatically determines the number of classes based on the interval. |
| Standard Deviation | This classification scheme shows you how much a feature’s attribute value varies from the mean. ArcMap calculates the mean value and the standard deviations from the mean. Class breaks are then created using these values. A two-color ramp helps emphasize values above and below the mean. |
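The quantile and standard deviation descriptions in Table 2 can be illustrated with a short sketch. It is written under stated assumptions: the quantile rule puts an equal count of sorted values in each class, and the standard deviation rule places breaks at whole multiples of one population standard deviation on either side of the mean. ArcMap’s exact tie-handling and interval options are not covered in this report. The function names and sample values are illustrative.

```python
import math
import statistics


def quantile_breaks(values, n_classes):
    """Return upper class limits so each class holds about the same count."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    return [data[math.ceil(n * k / n_classes) - 1]
            for k in range(1, n_classes + 1)]


def std_dev_breaks(values, step=1.0):
    """Return breaks at mean + k * step * sd, spanning the data range."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumed)
    if sd == 0:
        raise ValueError("all values are identical")
    k_lo = math.floor((min(values) - mean) / (step * sd))
    k_hi = math.ceil((max(values) - mean) / (step * sd))
    return [mean + k * step * sd for k in range(k_lo, k_hi + 1)]


# Illustrative values with one outlier, not project data.
skewed = [1, 2, 3, 4, 5, 6, 7, 100]
print(quantile_breaks(skewed, 4))  # [2, 4, 6, 100]

# Mean 5 and standard deviation 2, so breaks fall at 1, 3, 5, 7 and 9.
print(std_dev_breaks([2, 4, 4, 4, 5, 5, 7, 9]))
```

In the quantile example the values 7 and 100 share the top class, which is the “widely different values in the same class” distortion that the Quantile description warns about.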
Conclusions
As stated above, there was no apparent correlation between municipal water use data and overall health. This is contrary to the trend suggested by the statistical maps prepared with the “Natural Breaks (Jenks)” classification method (the default method) in ArcMap. It was therefore concluded that the “Equal Interval” classification method is better for analyzing statistical data. For this project, 5-10 “Equal Interval” classes proved to be the most effective way to view the maps.
Although the “Standard Deviation” classification method displayed results that were more similar to those of the “Equal Interval” method, it still produced a misleading map. The description of the “Quantile” method actually states that “the resulting map can be misleading,” and the “Natural Breaks (Jenks)” method appears to produce a relatively similar classification. As might be inferred from this paper, this author would expect a color-coded map to be classified according to a system such as the “Equal Interval” method. Perhaps future versions of ArcMap could use this method as the default to avoid misleading maps.