Stage Gage Network Optimization Tools based on Tabu Search

Sergio I. Martínez and David R. Maidment, CRWR


Table of Contents

·         Introduction

·         Study Area

·         Network Optimization in Lakes

·         Network Optimization in Streams

·         Conclusions and Future Work

·         Supporting Material

·        Primary Contact

 


Introduction

 

The South Florida Water Management District (SFWMD), one of the five water management districts of the state, is located on the southern tip of the territory of Florida. The SFWMD provides flood protection and water supply to almost six million inhabitants. The district also manages and restores the ecosystems of the region.  The SFWMD operates a surface-water distribution system, consisting of approximately 1800 miles of canals, over 450 managed structures, and over 1000 flow and stage gages (SFWMD, 2004). The rapid expansion of an already large water management system is pressing the SFWMD to optimize the current and future flow and stage gage network. The CRWR participated in 2004-2005 in a pilot project that had as its objective the creation of a methodology to optimize the stage gage network. As a result of that project two optimization methodologies and accompanying toolsets were developed, one for networks in lakes and the other for networks in lakes. The underlying optimization methods were a genetic algorithm for the case of lakes and a forward selection algorithm for streams. During this year (2006), new versions of the two methodologies, now based in tabu search, were developed in the CRWR. In this page, the main emphasis is in the presentation of the two optimization toolsets developed to be used with the new methodologies.

 


Study Area

In order to develop the tools to perform the stage network optimization over the entire SFWMD, a smaller section of the SFWMD was selected.  The area selected is the Kissimmee River Basin; it includes three of Florida's major lakes, Lake Okeechobee, Lake Istokpoga and Lake Kissimmee. The Kissimmee River Basin is located roughly between latitude 27° 00‘ N and 28° 30’ N and longitude 80° 37’ 30” W and 81° 37’ 30” W.

 

Figure 1. Kissimmee River Basin Study Area. Sub-basins, lakes and flow structures.

 


Network Optimization in Lakes

The goal of the optimization problem is to select a subset of r stations from a set of n water level monitoring stations that provides the biggest advantage accordingly to a specified objective function. This set of stations could be in a lake or along a stream. If the set is in a lake, a method of mathematical interpolation can be used to find the required interpolated values. The root mean square error of interpolated values of water surface elevation in a lake was selected as the objective function of its network optimization. Assume the existence of a set of n water level measuring stations from which is needed to choose a subset of r stations. The root mean square error RMSEt is defined, for a given day t, as

Where It,i is the interpolated value on day t and Mt,i is the measured value on day t, both for station i , both for station i. The RMSEt can be used to compute the average RMSE for the entire period of analysis

Where, the new variable is T is the number of valid days of the period of analysis. A day is valid day if the subset of r stations has measurements in all of them. The subset of r stations that minimize the value of RMSE is the optimal subset. If all the stations are taken into account in the determination of the optimal subset, the number of combinations (or subsets) to be tested will increase according with

This process makes the optimization a time and resources consuming process, requiring the application of an optimization method to find the optimal subset. In the optimization problem of the stations in a lake, where the water elevation varies slightly and continuously from point to point, the interpolation of the interpolated values of the objective function is computed using the following equation:

.

This equation is the base of the inverse square distance interpolation method. The variable di,j represents the distance between station i and station j. All stations j are in the subset of r stations (Figure 2). Then, the optimal subset of r stations taken from a set of n stations for a given measurement dataset is found using the tabu search optimization method.

 

Figure 2. Distances between stations

 

Once all the data has been prepared, the method used to solve the problem of the spatial optimization of the stations in a lake involves three main steps (Figure 3). The user does the first two steps before using the created tools.

 

Figure 3. Network Optimization Process Steps

 

Choose stations and period of analysis. The first step requires the selection of a set of stations and the period of analysis. It is important to choose stations that share the same period of record, and whose operating conditions remain the same during the period of analysis. To have good results it is recommended to choose at least two years of data.

 

Choose admissible error. The second step requires choosing an admissible average RMSE. This step is previously performed by the analyst to eventually choose the corresponding optimal subset from the optimal subsets found in the next step.

 

Find optimal subset. The third step identifies the optimal subsets of stations of all possible sizes and then the optimal subset that complies with the admissible average RMSE previously adopted is chosen by the analyst. In the most general case, tabu search, the implemented optimization method, finds the optimal subset of one station. Then, using previous results proceeds to find the optimal subset of two stations, and so on, until the optimal subset of n – 1 stations is found. Tabu search is particularly suited for problems with a finite number of solutions, such as this network optimization problem. The main characteristics of tabu search is the prohibition of visiting previously considered solutions, at least for several iterations, and to guide the search of solutions along the best direction of the current neighborhood.

 

The TS Network Optimization in Lakes toolbar was developed to address the problem of finding the optimal subsets of stations in lakes. This toolbar has two tools (Figure 4).

 

Figure 4. Network Optimization in Lakes Toolbar.

 

The first tool, TimeSeries, creates a time series file of the stages measured in the set of selected stations for the period of interest and a file with several parameters that must be used in the second tool. Once the data files created by the first tool, the results file and several parameters are chosen, the optimization is performed with the second tool, TSNetOptLakes

 

The toolset for stations in lakes was developed in VBA and is implemented in an ArcMap document. The tools are invoked from a toolbar. The SFWMD’s DBHydro Web Page was mainly the source of the data. Later the data was loaded into a geodatabase. The geodatabase has several feature classes; the most important is a feature class of points called StageStations whose attributes are those of the water elevation stations of the study area. A time series table was also created using the ArcHydro format; it has water elevation data from October 1, 2001 to September 30, 2004. The application of the tools produces results that are stored in one text file.

 

An example to illustrate the use of the TS Network Optimization in Lakes toolset can be formulated by the optimization of the stations of Lake Okeechobee. Lake Okeechobee is located in the southern part of the Kissimmee River Basin (Figure 5).

Figure 5. Kissimmee River Basin. Lake Okeechobee is highlighted

 

Located inside Lake Okeechobee are fourteen water stage-measuring stations: CULV10A_H, CULV5A_H, L_OKEE, L_OKEE.M_G, L001, L005, L006, LZ40, S127_T, S129_T, S133_T, S135_T, S191_T and S3_T (Figure 6). The suffix _H in the name of a station, like S65_H or S65NEW_H, denotes that the station is measuring the water stage at the headwater side of a control structure. The suffix is _T denotes the tailwater side of a control structure.

 

 

 

 

Figure 6. Lake Kissimmee showing its water stage stations

 

These 14 stations form a stable set of stations –a set in which the number of stations and their working conditions remain unchanged–, and are selected to analyze the data for a period of two years in order to draw meaningful conclusions.  The period selected is from 10/1/2001 to 9/30/2003.

 

Opening ArcMap and loading the map document TSNetOptAvgRMSE.mxd (See Supporting Material), appropriately saved in folder C:\NetOpt\TSNetOptAvgRMSE as the geodatabase TSNetOpt.mdb we obtained the map shown in Figure 7. In the map the fourteen stations of Lake Okeechobee are already selected.

 

Figure 7. TSNetOpAvgRMSE.mxd showing the TSNetOptLakes toolbar and the 14 stations of Lake Okeechobee selected.

 

Clicking in the TimeSeries button of the TSNetOptLakes toolbar the form in Figure 8 appears. The first thing is to check that 14 stations are selected. Then, choose the period of interest (10/1/2001 – 9/30/2003) and the prefix of the files in which the data should be saved, in this example is Okee01-03_.txt (Navigate in the save dialog to the appropriate folder that can be the same folder were the map is stored). This prefix is used to create the files Okee01-03_Par.txt for parameters and Okee01-03_TS.txt for the time series stage data.

 

Figure 8. Time Series Generation Form showing the inputs for the example.

 

Once all the required data are entered, one has to click on the Execute bottom. The advance of the creation of the time series file is shown in an advance bar in the left side of the task bar. When the time series file is created, the message box of Figure 9 appears.

 

Figure 9. Message box of the end of the generation of time series.

 

Click on the OK button and the Quit button of the Time Series Generation form. To begin the optimization of the network of Lake Okeechobee, click on the TSNetOptLakes button of the TSNetOptLakes toolbar. The form in Figure 10 appears.

 

Figure 10. Network Optimization using Tabu Search form. The typed data corresponds to the example.

 

Click on the Prefix for Data Files button. Type the prefix of the data files used in the first tool, in this example is Okee01-03_.txt. In the text boxes of dates appear the initial and final date of the period (10/1/2001 and 9/30/2003). Click on the Results file button. Type a name for the results file; in this case it can be Okee01-03_Res.txt. The minimum size of subset is initialized to one and the maximum to n – 1 = 13. Let both values alone.  The suggested values of the maximum number of iterations frame give an idea of how many iterations may be needed to solve a problem. They may need to be adjusted by trial and error. For the example, override the suggested values of the maximum number of iterations, typing those in Figure 10.  The detailed results check mark must be set if detailed results are needed; in this example, left it unset. To launch the optimization, click on the Execute button. The process stops after finding the optimal subset of each considered size. The advance is numerically shown in the form, the results of the subset in a message box and the map shows the stations of the optimal subset.  Figure 11 presents the results for the optimal subset of size 5. Click the OK of the subset’s results message box to continue the process.

 

Figure 11. Network Optimization using Tabu Search form. The typed data corresponds to the example.

 

After several minutes the process ends with a “Tool Successfully Executed!” message. Click on this message box’s OK button. To dismiss the second tool, click on the Quit button. ArcMap can be closed. The results of the optimization process are saved in the file Okee01-03_Res.txt. In Windows Explorer, double click on Okee01-03_Res.txt to see the results as is shown in Figure 12.

 

Figure 12. Results of the optimal subset with five stations.

 

The results of all considered sizes are sequentially saved from the minimum size to the maximum size. The results comprise the parameters shown in Figure 12. In the figure, one can see that the best objective function is equal to the average RMSE. The number of iterations and the number of function evaluations may differ from run to run because the subset which starts the optimization process is randomly generated. Finally, Figure 13 shows the average RMSE for the optimal subsets of all sizes.

 

Figure 13. Average RMSE of the optimal subsets of Lake Okeechobee.

 

 


Network Optimization in Streams

As was mentioned before, the goal of the optimization problem is to select a subset of r stations from a set of n water level monitoring stations. This set of stations could be in a lake or along a stream. A set of stations along a stream are, in a general case, not well suited for the application of a mathematical interpolation method but to a “hydraulic interpolation method”, that is to the computation of a hydraulic profile. In the optimization problem of the stations in a stream, where the water elevation has important changes from point to point, the interpolation of the interpolated values is computed using HEC-RAS, the River Analysis System Software developed by the U.S. Corps of Engineers’ Hydrologic Engineering Center. The optimal subset of r stations taken from a set of n stations for a given measurement dataset is also found using tabu search.

 

The method used to solve the problem of optimization of the stations in streams involves the following main steps.

 

Choose stations and period of analysis. Select the set of n stations along a stream and the period of analysis. These stations and period of analysis should have the same conditions we have mentioned before for the case of stations in lakes.

 

Choose admissible error. The second step requires choosing an admissible average RMSE. Here is important to consider that the square of the flow’s error is proportional to the error of the head at the hydraulic structure. The study of the RMSE of the base subset of stations might be complemented with the study of the errors in individual stations.

 

Model the system in HEC-RAS. In the HEC-RAS model, all the necessary internal known water elevations must be set. In subcritical flow, these elevations are at the headwater side of all the hydraulic structures inside the studied stream. These stations constitute the first subset of r = rb stations taken from a set of n stations. The HEC-RAS modelation process must include the calibration of the computed profiles changing the Manning’s roughness and the roughness factors to obtain computed profiles as similar as possible to the observed profiles. The modelation process can be initially done with a limited number of daily datasets. Once all the data of the period is loaded in the appropriate files, a second round of calibration can be done until the average RMSE does not keep falling.

 

Optimize the station network along streams. This process begins with the initial subset of rb stations. In general, as in the case of lakes, the optimal subset of rb + 1 stations is identified. Then, the optimal subset of rb  + 2 stations is identified. And so on, until r = n – 1.

 

Interpret Results and Adopt the Optimal Base Subset. According to the chosen admissible measure of error interpret the optimization results and decide if one of the identified base subsets could be the optimal base subset of stations.

 

The TS Network Optimization in Streams toolbar has two tools (Figure 14).

 

Figure 14. Network Optimization in Streams Toolbar.

 

The first tool, the Period tool may create a partial table RASTable.dbf taken from table RASTableAll.dbf, a global table, of the data needed by HEC-RAS to allow the optimization of the network with data of a given partial period. That partial period is interactively chosen by the user. The second tool, the TSNetOptStreams tool, computes profiles using HEC-RAS, from the profiles computes the average RMSE and eventually it finds the optimal subsets of the sizes required. The user can choose interactively the name of the HEC-RAS project in which the streams have been previously modeled, and the prefix of the files to be used to store data and results. In addition, he/she can set several parameters needed in the optimization process. Finally, the user can launch the optimization process.

 

Both tools for stations in streams were developed in VBA and are implemented in an ArcMap document. The tools are invoked from a toolbar. The flow and water elevations that come within the arcMap document were provided by the SFWMD or downloaded from DBHydro. Later the data was loaded into two dbf tables (RASTableAll and CrossSections.dbf) ready to be queried when the Period and TSNetOptStreams tools are activated. The table RASTable.dbf, produced from RASTableAll.dbf by the first tool, provides the second tool with the data to build the HEC-RAS flow file needed to compute profiles. While the CrossSecctions.dbf file stores the measured and computed water elevations that are used to compute errors for each objective function evaluation.  The tables have data from October 1, 2001 to December 30, 2003. The application of this tool produces results that are stored in several text files.  Additionally the TSRMSE.dbf table is created to store RMSE values for specific combinations of subsets and dates. The RMSE values stored in TSRMSE.dbf may be used in future computations without the need of computing them again.

 

The study case used to develop and test the tool for the optimization of stations in streams is the C-38 Canal–Kissimmee River–C-38 Canal. This stream is the main stream of the Pool AE area, from station S65_T to station S65E_H. The Pool AE area is located in the central region of the Kissimmee River Basin (Figure 15). The stream has three reaches. The first reach and the third reaches are channelized while the central reach is naturalized. The first reach has a length of 22 miles; this reach ends 1.5 miles downstream of structure WEIR_1. The second reach is 11 miles long, it ends 1 mile downstream of station PC33. The final reach has a length of 18.5 miles. The period of data that was chosen is from 10/01/2001 to 12/31/2003.

 

Figure 15. Water Stage Stations along the main stream of Pool AE

 

One initial study was done, it followed the next steps: creation of geometry file, calibration with a limited number of datasets, calibration with data of the entire period and estimation of water elevations in selected stations.

 

The creation of the geometry file is needed because HEC-RAS uses this file in the computation of profiles. Figure 16 shows the general aspect of the stream as it is seen inside a HEC-RAS interface. A typical section of the naturalized reach is also shown in Figure 17.

 

Figure 16. Pool AE main stream represented inside HEC-RAS.

 

Figure 17. Pool AE’s mainstream represented inside HEC-RAS.

 

Table 1 shows the 20 stations with water elevations data and their distance from the downstream end of C-38 at Lake Okeechobee.

 

Table 1. Water Elevation Measuring Stations and their distance along the stream. The lowest distance corresponds to the downstream end of the stream.

 

Station

Section

(Distance (ft))

1

S65_T

320105.8

2

KRFNS

314076.8

3

S65A_H

263639.2

4

S65A_T

263634.2

5

WEIR3_H

232169.0

6

WEIR3_T

232164.0

7

WEIR2_H

225189.9

8

WEIR2_T

225184.9

9

WEIR1_H

211725.5

10

WEIR1_T

211720.5

11

KRDRS

188833.2

12

KRBNS

171777.7

13

PC33

149107.5

14

PC11R

139678.6

15

S65C_H

134404.5

16

S65C_T

134399.5

17

C38BAS

110014.5

18

S65D_H

87366.08

19

S65D_T

87361.08

20

S65E_H

47964.26

 

A manual iterative process of calibration was needed to have good estimates of Manning’s n and roughness factors for different values of discharge along reaches. Calibration consists in to change Manning’s and/or roughness factors to obtain computed profiles as similar as observed profiles for given datasets. The reach between stations WEIR1_H and S65C_H was the most difficult to calibrate. Eight Datasets of flows and water stages were used in the initial calibration process. One dataset was generated with the average condition for the entire period. Seven more datasets were chosen to reflect the variation of discharge in the different reaches. The resulting profile for the average conditions is shown in Figure 18.

 

Figure 18. Profile for Average Conditions.

 

The calibration for the average conditions yield a Manning’s n for the channel’s sections equal to 0.015 and the corresponding n for the naturalized sections is 0.065. The input data for HEC-RAS for the average conditions is summarized in Table 2.

 

Table 2. Average Conditions’ Input data for HEC-RAS and computed water elevations at stations of interest.

 

Flows from stated section to the next downstream section

where a change of flow occurs.

Section

Flow (cfs)

320105.8

1676.85

291828.1

 1728.86

247511.5

  553.33

228999.1

  163.58

219317.8

   300.8

210599.5

 2265.61

110374.8

 2236.63

67498.15

 2438.91

 

Water elevations at internal locations and downstream boundary condition.

Section

Water elevation (ft)

263639.2

46.29

232169.0

39.72

225189.9

39.64

211725.5

39.39

134404.5

34.92

87366.08

26.63

47964.26

21.09

 

Water elevation observed and computed values.

Station

Section

Observed

Water Elev. (ft)

Computed

Water Elev. (ft)

S65_T

320105.8

46.36

46.31

KRFNS

314076.8

46.38

46.31

S65A_H

263639.2

46.29

46.29

S65A_T

263634.2

39.89

39.72

WEIR3_H

232169.0

39.72

39.72

WEIR3_T

232164.0

39.63

39.64

WEIR2_H

225189.9

39.64

39.64

WEIR2_T

225184.9

39.62

39.39

WEIR1_H

211725.5

39.39

39.39

WEIR1_T

211720.5

39.31

38.95

KRDRS

188833.2

37.96

38.26

KRBNS

171777.7

37.08

37.24

PC33

149107.5

35.57

35.34

PC11R

139678.6

34.98

34.92

S65C_H

 134404.5

34.92

34.92

Station

134399.5

26.86

26.64

S65_T

110014.5

26.91

26.63

KRFNS

87366.08

26.63

26.63

S65A_H

87361.08

20.6

21.09

S65A_T

47964.26

21.09

21.09

 

The root mean square error for this data is 0.24 ft. As another example, the profile for Oct 1, 2001 is shown in Figure 19.

Figure 19. Profile for Oct 1, 2001. Stations along the stream are indicated.

 

Once this initial calibration was performed, the entire period could be used to check the variation of roughness of the three reaches of the stream. From all the datasets of 10/1/2001 to 12/31/2003 were chosen only the datasets that have complete data for all the stations at the head side of a control structure or a weir, and no negative flows in the hydraulic structures. These two constraints left 428 out of 822 datasets.

 

The computation of profiles and errors in an automatic way was accomplished with two prototype tools implemented in ArcMap, which are the base of the optimization tool. The dates with the biggest errors were identified and used to change the roughness factors to recalibrate the computed profiles. Six runs of calibration were needed before having a negligible reduction of errors; that marked the end of the calibration process. The results of the last calibration run are valid for the estimation of water elevations for the case of the subset constituted by the seven stations at the head side of the hydraulic structures of Pool AE.

 

Once the calibration is completed, the optimization process is feasible. Accordingly to the methodology previously explained, the remaining steps to complete the optimization of the network along the main stream of Pool AE is the identification of the optimal subsets of all sizes, the interpretation of results and the adoption of the optimal subset. The initial base subset is shown in Table 4.

 

Table 4. Initial Base Subset (size, rb = 7).

Station

 

Section

S65A_H

263639.2

WEIR3_H

232169

WEIR2_H

225189.9

WEIR1_H

211725.5

S65C_H

134404.5

S65D_H

87366.08

S65E_H

47964.26

 

All the stations outside the base subset could be considered to go inside it because the optimization tool can identify the dates when all base stations have data (valid dates) and proceed using only data of that dates.

 

Assume that we want, for example, to find the optimal subsets defined with the daily data recorded from 6/16/2002 to 7/15/2002. First, the installation of the HEC-RAS software is required. Second, save all the files contained in the appropriate supporting material in folder C:\NetOpt\TSHRNetOptAvgRMSE. The main files are the map document TSHRNetOptAvgRMSE.mxd, the tables RASTableAll.dbf, RASTable.dbf, CrossSections.dbf and TSRMSE.dbf, the files PoolAE.prj, PooAE.g01, PoolAE f01, and PoolAE.p01, used by HEC-RAS to perform the computation of water surface profiles (in the PoolAE’s stream), and the geodatabase HRNetOpt.mdb. Once everything is ready, run ArcMap. Load the map document TSHRNetOptAvgRMSE.mxd (See Figure 20).

 

Figure 20. Arcmap document TSHRNetOptAvgRMSE.mxd with the TSNetOptStreams toolbar ready to be run.

 

Click on the Period button of the TSNetOptStreams tool. The message box shown in Figure 21 appears.

 

Figure 21.Message box shows the current period in the table RASTable.dbf.

 

Click on the OK button and the small dialog box of Figure 22 appears. Change the date values to those shown in the figure.

 

Figure 22. Period of Analysis dialog shown the period of interest.

 

Click on the Execute button and a message box shown the number of days selected appears. Click on the OK button of that message box and click on the Quit button of the Period of Analysis dialog. The table RASTable.dbf with the data of the period of 30 days between 6/16/2002 and 7/15/2002 has been created and is ready to be used in the optimization process.

 

The user interface of the TSNetOptStreams tool is shown in Figure 23. It is invoked clicking on the TSNetOptStreams button of the TSNetOpTStreams Toolbar. Select the files shown in the figure.

 

Figure 23.TS Network Optimization in Streams User Interface.

 

Click on the Set General Parameters button. A message box that shows the period of interest appears. After dismissing that box, the user interface of Figure 24 appears.

 

 

Figure 24. Initial subset and parameters user interface.

 

The file ExamplePar.txt was previously prepared and stores the parameters in the figure. The maximum number of iterations can be changed, but in this example it is good to left alone the values already chosen. Setting the Detailed Results check mark produces an extensive file of results. The initial subset is the subset defined with the stations in Table 4. To change the status of a given station, select the row of that station and then click on the Change Status of Selected Station button. A station with the subset status is in the initial subset, while a station with the observed status is not in the initial subset. The minimum and maximum sizes of subset depend on the number of stations in the base subset and on the total number of stations, minimum r >= rb + 1 and maximum r <= n – 1.  In the example, rb = 7 and n = 20. The initial size of subset can be any integer between the minimum and the maximum size. The optimal subsets to be identified are those from the initial size to the maximum size. If the average RMSE of the base subset is required one has to check mark that option.

 

Once all the parameters and options are selected, click on the Accept Changes button. The message box of Figure 25 appears.

 

Figure 25. Properties of the base subset message box.

 

The table TSRMSE.dbf was preloaded with the daily values of RMSE for the base subset and the period of the example, it means that there were no water profile computations for this subset. Dismiss this message box. Click on the Execute Optimization button. A snapshot of the computational activity is shown in Figure 26. The user interface shows the advance of the process and the Loading File window makes apparent the behind-the-scenes activity of the HEC-RAS software.

Figure 26. Early advance of the optimization process. The activity of HEC-RAS is apparent when the Loading File advance window is momentarily displayed.

 

After many hours (in our workstations it takes around 50 hours), the process of optimization ends. If for some reason the process is aborted and restarted again, values of RMSE, for given combinations of subset and date, previously computed are not computed again. These values are retrieved from the table TSRMSE.dbf and taken into account in the average RMSE computations. The optimization tool should be closed clicking on the proper buttons. Then, ArcMap may be closed. The results are stored in the text file Example.txt (Several files with results are created, their names self explain their content). The optimal subsets and the properties of the RMSE values associated should be equal to those of Figure 27, but the number of iterations and function evaluations may be different because the subset which begins the optimization process is randomly created.

Figure 27. Results of the example for the optimal subset of size equal to twelve.

 

Finally, Figure 28 shows a graph of the average RMSE values of the optimal subsets.

 

Figure 28. Results of the example for the all possible sizes.

 


 

Conclusions and Future Work