Stage Gage Network Optimization in the South Florida Water Management District

Sergio I. Martínez, Venkatesh Merwade, and David Maidment, CRWR


Table of Contents

·         Introduction

·         Study Area

·         Network Optimization in Lakes

·         Network Optimization in Streams

·         Conclusions and Future Work

·         Supporting Material

·         Contact Information

 


Introduction

The South Florida Water Management District (SFWMD), one of the five water management districts of the state, is located on the southern tip of the territory of Florida. The SFWMD provides flood protection and water supply to almost six million inhabitants. The district also manages and restores the ecosystems of the region.  The SFWMD operates a surface-water distribution system, consisting of approximately 1800 miles of canals, over 450 managed structures, and over 1000 flow and stage gages (SFWMD, 2004). The rapid expansion of an already large water management system is pressing the SFWMD to optimize the current and future flow and stage gage network. The CRWR is participating in a pilot project that has as its objective the creation of a methodology to optimize the stage gage network. The optimization of the network is performed following two different optimization processes, one for water elevation gages in lakes and other for gages in streams. Given an admissible error, the research question that is addressed is which gages should be left in place in a studied region?

 

Study Area

In order to develop the tools to perform the stage network optimization over the entire SFWMD, a smaller section of the SFWMD was selected.  The area selected is the Kissimmee River Basin; it includes three of Florida's major lakes, Lake Okeechobee, Lake Istokpoga and Lake Kissimmee. The Kissimmee River Basin is located roughly between latitude 27° 00‘ N and 28° 30’ N and longitude 80° 37’ 30” W and 81° 37’ 30” W.

 

Figure 1. Kissimmee River Basin Study Area. Sub-basins, lakes and flow structures.

Network Optimization in Lakes

The goal of the optimization problem is to select a subset of r stations from a set of n water level monitoring stations that provides the biggest advantage accordingly to a specified objective function. This set of stations could be in a lake or along a stream. If the set is in a lake, a method of mathematical interpolation can be used to find the required interpolated values. The root mean square error of interpolated values of water surface elevation in a lake was selected as the objective function of its network optimization. Assume the existence of a set of n water level measuring stations from which is needed to choose a subset of r stations. The root mean square error is defined as

Where Ii is the interpolated value and Mi is the measured value, both for station i. The subset of r stations that minimize the value of RMSE is the optimal subset. If all the stations are taken into account in the determination of the optimal subset, the number of combinations (or subsets) to be tested will increase according with

This process makes the optimization a time and resources consuming process, requiring the application of an optimization method to find the optimal subset. In the optimization problem of the stations in a lake, where the water elevation varies slightly and continuously from point to point, the interpolation of the interpolated values of the objective function is computed using the following equation:

.

This equation is the base of the inverse square distance interpolation method. The variable di,j represents the distance between station i and station j. All stations j are in the subset of r stations (Figure 2). Then, the optimal subset of r stations taken from a set of n stations for a given measurement dataset is found using the genetic algorithm optimization method.

 

Figure 2. Distances between stations

 

Once all the data has been prepared, the method used to solve the problem of the spatial optimization of the stations in a lake involves four main steps (Figure 3). The user does the first two steps before using the created tools.

 

Figure 3. Network Optimization Process Steps

 

Choose stations and period of analysis. The first step requires the selection of a set of stations and the period of analysis. It is important to choose stations that share the same period of record, and whose operating conditions remain the same during the period of analysis. To have good results it is recommended to choose at least two years of data.

 

Choose admissible error. The second step requires choosing an admissible error. For preliminary assessments, the admissible RMSE suggested by SFWMD is between 0.05 and 0.1 ft.

 

Find daily optimal subsets. The third step identifies the optimal sets of stations for each day of the period of analysis. The genetic algorithm method of optimization is used each time a change occurs in the set of stations or in the required number of stations in the subset of minimum RMSE.  Genetic Algorithms are inspired in the way natural populations evolve. Parents give their children a mixed share of their genes (crossover) but sometimes there are mistakes (mutations) that may lead to better-fitted or unfitted individuals. The best-fitted individuals have better chances of passing their genes to the next generation. In our problem, an individual is a given subset of r stations taken of n stations. The parameter of fitness is the corresponding root mean squared error RMSE. In a given population, the best-fitted individual (the best set of r stations) is that which have the least value of RMSE. After many generations, one can find the optimal subset of r stations taken from that set of n stations.

 

Optimization of a set of stations for a given date may be performed by specifying the range of number of stations in the subset or by specifying an admissible RMSE. Figure 4 illustrates the extension of this process for a given period.

 

 

Figure 4 Identification of the daily optimal subsets of the period

 

Find the optimal subset. Finally, the optimal daily subsets of station results obtained in the third step are analyzed to get the optimal subset of stations for the entire period of analysis.  This analysis is a two-step process. The first process identifies day by day the optimal station subset that has simultaneously an RMSE smaller than or equal to the admissible RMSE and the smaller number of stations. The second process computes the relative frequency for the entire time series period where the station subset complies with the constraint of the admissible RMSE. The station subset with the highest relative frequency becomes the optimal subset of stations for the period of analysis.

 

The Network Optimization in Lakes toolbar, was developed to address this problem. This toolbar has four tools (Figure 5).

 

Figure 5. Network Optimization in Lakes Toolbar.

 

The first two tools OptByRange (Optimization by Range) and OptByAdmRMSE (Optimization by Admissible RMSE) identify optimal subsets of stations and the other two tools StatsOfDailyOptSubsets (Statistics of Daily Optimal Subsets) and StatsOfOptSubsetsAllPeriod (Statistics of Optimal Subsets for the entire period) compute statistics of the daily optimal subsets. The user could use either of the optimization tools once he/she has interactively selected the set of stations to optimize. A period of analysis, a range of r —the size of the optimal subset—, and in the case of the OptbyAdmRMSE tool, an admissible RMSE must be provided. The StatsOfDailyOptSubsets tool computes the statistics of the daily optimal subsets of the period using the results of any one of the optimization tools. The StatsOfOptSubsetsAllPeriod tool, the second statistical tool, uses previous results of the optimization tools and the first statistical tool to identify, finally, the optimal subset of stations.

 

The toolset for stations in lakes was developed in VBA and is implemented in an ArcMap document. The tools are invoked from a toolbar. The SFWMD’s DBHydro Web Page was mainly the source of the data. Later the data was loaded into a geodatabase. The geodatabase has several feature classes; the most important is a feature class of points called StageStations whose attributes are those of the water elevation stations of the study area. A time series table was also created using the ArcHydro format; it has water elevation data from October 1, 2001 to September 30, 2004. The application of the tools produces results that are stored in several text files.

 

The applicability of the Network optimization in lakes toolset was tested in the optimization of the stations in Lake Kissimmee. Lake Kissimmee is located in the northern part of the Kissimmee River Basin (Figure 6).

 

Figure 6. Kissimmee River Basin. Lake Kissimmee highlighted

 

Located inside Lake Kissimmee are five water stage-measuring stations: LKISS9, LKISS7, LKISS5B, S65_H, and S65NEW_H (Figure 7). The suffix _H in the name of a station, like S65_H or S65NEW_H, denotes that the station is measuring the water stage at the headwater side of a control structure. The suffix is _T denotes the tailwater side of a control structure.

 

Figure 7. Lake Kissimmee showing water stage stations

 

A stable set of stations –a set in which the number of stations and their working conditions remain unchanged–, may be selected to analyze the data for a period of one or two years in order to draw meaningful conclusions. A stable set of four stations (LKISS9, LKISS7, LKISS5B, and S65_H) was selected (Figure 8) for the period 10/1/2001 to 9/30/2003.

Figure 8. Selected stations in Lake Kissimmee

(LKISS9, LKISS7, LKISS5B, and S65_H)

 

Applying the optimization by range (OptByRange) tool and the statistics of daily optimal subsets (StatsOfDailyOptSubsets) tool with an admissible RMSE of 0.1 ft to this set of stations, the results shown in Figure 9 were found.

 

Figure 9. Summary File with the Daily Optimal Subsets of Lake Kissimmee

 

The selection of an optimal subset of a series of daily optimal subsets cannot be done only with the StatsOfDailyOptSubsets tool because ordinarily there is not a subset that is present in the all the days of the period or even in most of the days of the period. For example, in Figure 9, the most frequent subset is present in 228 out of the 730 days; the second most frequent is present in 192 out of the 730 days, and so on. Nevertheless, what happen in the days they are not optimal?

 

An additional statistics tools must be applied to discriminate among daily optimal subsets to be able to choose only one subset that can be called the (daily) optimal subset for the period. The main idea about this selection is to study how the RMSE behaves in all the daily optimal subsets during the entire period of analysis, not only in the days they are optimal. Therefore, the daily optimal subset that reports, for the entire period, the greatest relative frequency of a RMSE smaller or equal than the admissible RMSE previously chosen should be considered the optimal subset of stations. A check can be done comparing the relative frequency with a theoretical probability computed with a suitable distribution, such as the lognormal distribution and the parameters of the corresponding RMSE population.  The Statistics of Optimal Subsets for the Entire Period tool (StatsOfOptSubsetsAllPeriod tool carries on all this process and in the case of Lake Kissimmee’s stations finds that the optimal subset of stations is LKISS7, LKISS9 and S65_H. In Figure 10, the optimal subset of the stations (LKISS7, LKISS9 and S65_H) of Lake Kissimmee for the period of analysis (10/1/2001 to 9/30/2003) is shown.

 

Figure 10. The optimal subset of stations of Lake Kissimmee is LKISS7, LKISS9 and S65_H.

 

The application of the toolset for optimizing stations in lakes was satisfactory, the optimal subset of stations was identified and a new tool or a modification of the current tools was not needed.

 

Network Optimization in Streams

As was mentioned before, the goal of the optimization problem is to select a subset of r stations from a set of n water level monitoring stations. This set of stations could be in a lake or along a stream. A set of stations along a stream are, in a general case, not well suited for the application of a mathematical interpolation method but to a “hydraulic interpolation method”, that is to the computation of a hydraulic profile. In the optimization problem of the stations in a stream, where the water elevation has important changes from point to point, the interpolation of the interpolated values is computed using HEC-RAS, the River Analysis System Software developed by the U.S. Corps of Engineers’ Hydrologic Engineering Center. Then, the optimal subset of r stations taken from a set of n stations for a given measurement dataset is found using a sequential optimization method.

 

The method used to solve the problem of optimization of the stations in streams involves the following main steps.

 

Choose stations and period of analysis. Select the set of n stations along a stream and the period of analysis. These stations and period of analysis should have the same conditions we have mentioned before for the case of stations in lakes.

 

Choose admissible error. The second step requires choosing an admissible error. Here is important to consider that the square of the flow’s error is proportional to the error of the head at the hydraulic structure. The study of the RMSE of the base subset of stations might be complemented with the study of the errors in individual stations.

 

Model the system in HEC-RAS. In the HEC-RAS model, all the necessary internal known water elevations must be set. In subcritical flow, these elevations are at the headwater side of all the hydraulic structures inside the studied stream. These stations constitute the first subset of r stations taken from a set of n stations. The HEC-RAS modelation process must include the calibration of the computed profiles changing the Manning’s roughness and the roughness factors to obtain computed profiles as similar as possible to the observed profiles. The modelation process can be initially done with a limited number of daily datasets. Once all the data of the period is loaded in the appropriate files, a second round of calibration can be done until the average RMSE does not keep falling.

 

Optimize the station network along streams. This process begins with the initial subset of r stations. In each loop, a new station is added to the subset until an exit condition holds.  In Figure 11 is shown a chart flow that explains the process.

 

Figure 11. Flow Chart of the optimization of stations in streams process.

 

A valid date is a date where all the stations in the base subset have data. The maximum number of stations in the base subset, rmax, must be smaller than n, the number of stations. In this context, RMSEst is the root mean square error of the estimated water elevation in a given station along the period of analysis.  Similarly to the case of the optimization of stations in lakes, another exit condition can be established, instead of considering a range of sizes of subset, one can consider to end the process when an admissible RMSEst is reached.

 

Interpret Results and Find the Optimal Base Subset. According to the chosen admissible measure of error interpret the optimization results and decide if one of the identified base subsets could be the optimal base subset of stations.

 

The Network Optimization in Streams toolbar has two tools (Figure 12).

 

Figure 12. Network Optimization in Streams Toolbar.

 

The NetOptStreams tool (Network Optimization in Streams) implements the sequential optimization process explained before. It computes profiles using HEC-RAS, RMSEs by date, and errors by station, for the range of sizes in the base subset of stations. The user can choose interactively the name of the HEC-RAS project in which the streams have been previously modeled, and the prefix of the several files to be used to store results. In addition, he/she can set the initial base subset of stations and the maximum size of the base subset. Finally, the user can launch the optimization process. Once the optimization process is finished, the user can use the NetOptStreamsGr tool. This tool plots three types of graphs: errors by station, RMSE for all Stations, and RMSEst by size of base subset.

 

The first tool for stations in streams was developed in VBA while the second tool was developed using VB. Both tools are implemented in an ArcMap document. The tools are invoked from a toolbar. The flow and water elevations that come within the arcMap document were provided by the SFWMD or downloaded from DBHydro. Later the data was loaded into two dbf tables (RASTABLE.dbf and CrossSections.dbf) ready to be queried when the NetOptStreams tool is activated. The RASTABLE.dbf file provides the tool with the data to build the HEC-RAS flow file needed to compute profiles. While the CrossSecctions.dbf file stores the measured and computed water elevations that are used to compute errors in each cycle of optimization.  The tables have data from October 1, 2001 to December 30, 2003. The application of this tool produces results that are stored in several text files. From the produced text files, the NetOptStreamsGr tool creates the mentioned graphs of results.

 

The study case used to develop and test the tool for the optimization of stations in streams is the C-38 Canal–Kissimmee River–C-38 Canal. This stream is the main stream of the Pool AE area, from station S65_T to station S65E_H. The Pool AE area is located in the central region of the Kissimmee River Basin (Figure 13). The stream has three reaches. The first reach and the third reaches are channelized while the central reach is naturalized. The first reach has a length of 22 miles; this reach ends 1.5 miles downstream of structure WEIR_1. The second reach is 11 miles long, it ends 1 mile downstream of station PC33. The final reach has a length of 18.5 miles. The period of data that was chosen is from 10/01/2001 to 12/31/2003.

 

Figure 13. Water Stage Stations along the main stream of Pool AE

 

One initial study was done, it followed the next steps: creation of geometry file, calibration with a limited number of datasets, calibration with data of the entire period and estimation of water elevations in selected stations.

 

The creation of the geometry file is needed because HEC-RAS uses this file in the computation of profiles. Figure 14 shows the general aspect of the stream as it is seen inside a HEC-RAS interface. A typical section of the naturalized reach is also shown in Figure 15.

 

Figure 14. Pool AE main stream represented inside HEC-RAS.

 

Figure 15. Pool AE’s mainstream represented inside HEC-RAS.

 

Table 1 shows the 20 stations with water elevations data and their distance from the downstream end of C-38 at Lake Okeechobee.

 

Table 1. Water Elevation Measuring Stations and their distance along the stream. The lowest distance corresponds to the downstream end of the stream.

 

Station

Section

(Distance (ft))

1

S65_T

320105.8

2

KRFNS

314076.8

3

S65A_H

263639.2

4

S65A_T

263634.2

5

WEIR3_H

232169.0

6

WEIR3_T

232164.0

7

WEIR2_H

225189.9

8

WEIR2_T

225184.9

9

WEIR1_H

211725.5

10

WEIR1_T

211720.5

11

KRDRS

188833.2

12

KRBNS

171777.7

13

PC33

149107.5

14

PC11R

139678.6

15

S65C_H

134404.5

16

S65C_T

134399.5

17

C38BAS

110014.5

18

S65D_H

87366.08

19

S65D_T

87361.08

20

S65E_H

47964.26

 

A manual iterative process of calibration was needed to have good estimates of Manning’s n and roughness factors for different values of discharge along reaches. Calibration consists in to change Manning’s and/or roughness factors to obtain computed profiles as similar as observed profiles for given datasets. The reach between stations WEIR1_H and S65C_H was the most difficult to calibrate. Eight Datasets of flows and water stages were used in the initial calibration process. One dataset was generated with the average condition for the entire period. Seven more datasets were chosen to reflect the variation of discharge in the different reaches. The resulting profile for the average conditions is shown in Figure 16.

 

Figure 16. Profile for Average Conditions.

 

The calibration for the average conditions yield a Manning’s n for the channel’s sections equal to 0.015 and the corresponding n for the naturalized sections is 0.065. The input data for HEC-RAS for the average conditions is summarized in Table 2.

 

Table 2. Average Conditions’ Input data for HEC-RAS and computed water elevations at stations of interest.

 

Flows from stated section to the next downstream section

where a change of flow occurs.

Section

Flow (cfs)

320105.8

1676.85

291828.1

 1728.86

247511.5

  553.33

228999.1

  163.58

219317.8

   300.8

210599.5

 2265.61

110374.8

 2236.63

67498.15

 2438.91

 

Water elevations at internal locations and downstream boundary condition.

Section

Water elevation (ft)

263639.2

46.29

232169.0

39.72

225189.9

39.64

211725.5

39.39

134404.5

34.92

87366.08

26.63

47964.26

21.09

 

Water elevation observed and computed values.

Station

Section

Observed

Water Elev. (ft)

Computed

Water Elev. (ft)

S65_T

320105.8

46.36

46.31

KRFNS

314076.8

46.38

46.31

S65A_H

263639.2

46.29

46.29

S65A_T

263634.2

39.89

39.72

WEIR3_H

232169.0

39.72

39.72

WEIR3_T

232164.0

39.63

39.64

WEIR2_H

225189.9

39.64

39.64

WEIR2_T

225184.9

39.62

39.39

WEIR1_H

211725.5

39.39

39.39

WEIR1_T

211720.5

39.31

38.95

KRDRS

188833.2

37.96

38.26

KRBNS

171777.7

37.08

37.24

PC33

149107.5

35.57

35.34

PC11R

139678.6

34.98

34.92

S65C_H

 134404.5

34.92

34.92

Station

134399.5

26.86

26.64

S65_T

110014.5

26.91

26.63

KRFNS

87366.08

26.63

26.63

S65A_H

87361.08

20.6

21.09

S65A_T

47964.26

21.09

21.09

 

The root mean square error for this data is 0.24 ft. As another example, the profile for Oct 1, 2001 is shown in Figure 17.

Figure 17. Profile for Oct 1, 2001. Stations along the stream are indicated.

 

Once this initial calibration was done, the entire period could be used to check the variation of roughness of the three reaches of the stream. From all the datasets of 10/1/2001 to 12/31/2003 were chosen only the datasets that have complete data for all the stations at the head side of a control structure or a weir, and no negative flows in the hydraulic structures. These two constraints left 428 out of 822 datasets.

 

The computation of profiles in an automatic way was accomplished with a tool implemented in ArcMap. When the tool is invoked, it prompts for a HEC-RAS project file, previously prepared. This project file refers to the geometric file and the flow and water elevations input of the studied stream. Once the project file is accepted the tool reads from a table (RASTable.dbf) the flow and water elevation data that defines the problem of the computation of profiles date by date of the period supplied, computing the profile sequentially for each day. Results are stored in other table (CrossSections.dbf). From the second table the root mean square error for each date was computed using another tool. In addition, other tool computed the root mean square error of each station for the entire period.

 

After each run of the HEC-RAS automaton tool, the errors in the stations were computed. The dates with the biggest errors were identified and used to change the roughness factors to recalibrate the computed profiles. Six runs of calibration were needed before having a negligible reduction of errors; that marked the end of the calibration process.

 

When the calibration process was ended, the estimation of water elevations in selected stations for different subsets of stations could be done. The results of the last calibration run are valid for the estimation of water elevations for the case of the subset constituted by the seven stations at the head side of the hydraulic structures of Pool AE. To estimate how the errors evolve if we use more stations in the subset of base stations we chose two more subsets. The results of the three subsets are show in table 3. Figure 18 an Excel graph that shows how the root mean square error diminishes as more base stations are considered.

 

Table 3. RMSE of all period (Oct 1, 01-Dec 30, 03).

 

RMSE of each station for all period. Shaded Stations are the initial base stations.

 

 

 

 

 

Subsets of

 

no.

Station Name

Section

no.days

7 stations

 RMSE(ft)

8 stations

RMSE (ft)

9 stations

 RMSE (ft)

1

S65_T

320105.8

428

0.07

0.07

0.07

2

KRFNS

314076.8

419

0.08

0.08

0.08

3

S65A_H

263639.2

428

 

 

 

4

S65A_T

263634.2

428

0.21

0.21

0.21

5

WEIR3_H

232169

428

 

 

 

6

WEIR3_T

232164

428

0.05

0.05

0.05

7

WEIR2_H

225189.9

428

 

 

 

8

WEIR2_T

225184.9

428

0.22

0.22

0.22

9

WEIR1_H

211725.5

428

 

 

 

10

WEIR1_T

211720.5

428

1.24

 

 

11

KRDRS

188833.2

419

0.90

0.90

0.43

12

KRBNS

171777.7

428

0.65

0.65

 

13

PC33

149107.5

353

0.71

0.71

0.71

14

PC11R

139678.6

428

0.14

0.14

0.14

15

S65C_H

134404.5

428