CUAHSI Hydrologic Information System Mission
March 2005
The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) is an organization representing about 100 universities in the United States whose purpose is to develop infrastructure and services to support the advancement of hydrologic science and education. CUAHSI’s core operations are supported by a five-year grant from the Earth Sciences Division of the National Science Foundation (NSF). The CUAHSI Hydrologic Information System (HIS) project, also supported by NSF, has been in operation since April 2004 and will run for two years. This project is being conducted by a group of academic hydrologists collaborating with the San Diego Supercomputer Center as a technology partner, as shown in Figure 1. The HIS project is intended to produce a prototype Hydrologic Information System that performs the most critical functions needed to advance hydrologic science in US academic institutions, and to define the scope and extent of a more complete CUAHSI Hydrologic Information System that could be created with further investment by NSF after the end of this project. CUAHSI anticipates that NSF will hold a competition in 2005 for a major investment in Hydrologic Observatories, for which 24 candidate watershed regions have been proposed by groups representing CUAHSI member universities throughout the United States.

Figure 1. The CUAHSI Hydrologic Information System project team
The purpose of this memorandum is to define the CUAHSI HIS mission as it is seen by the HIS project team. This definition is informed by a review of the HIS project performed by the CUAHSI Executive Committee in November 2004, and by a workshop on Cyberinfrastructure for Environmental Observatories held at NSF in December 2004, in which cyberinfrastructure requirements for four similar current or proposed NSF observatory programs were discussed with a group of computer scientists. These programs were CUAHSI – Consortium of Universities for the Advancement of Hydrologic Science, Inc., CLEANER – Collaborative Large-Scale Engineering Analysis Network for Environmental Research, NEON – National Ecological Observatory Network, and ORION – Ocean Research Interactive Observatory Networks.
Creating a hydrologic information system involves three basic steps: hydrologic observation, hydrologic representation, and hydrologic analysis. Hydrologic observation refers to the process of measuring hydrologic phenomena, either with instrument sensors or by collecting samples and analyzing them in a laboratory; the results are accumulated in a Hydrologic Observation Database. Hydrologic representation means the fusion of hydrologic observations with other current and historical information, such as GIS data, remote sensing, and weather and climate grids, to form a more complete hydrologic database for a watershed, called here a Digital Watershed. Hydrologic analysis refers to the hydrologic process modeling, statistical analysis, visualization, data mining, and knowledge discovery that make use of the Digital Watershed.
A Hydrologic Digital Library is a repository of digital files from all three of these components. It describes the files with metadata so that they can be stored permanently, and identified and retrieved through web-based searches and automated data acquisition systems. Together with observation, representation, and analysis, the digital library makes up the four components of the prototype CUAHSI Hydrologic Information System, which are summarized in Figure 2.

Figure 2. Prototype CUAHSI Hydrologic Information System
The complete system created by these various means enables a hydrologist to create a virtual observatory of a particular watershed. By viewing the information in this observatory in various forms, the hydrologist is better able to understand how water cycles through the atmosphere, surface, and subsurface of a watershed, and how transported constituents and biota interact with that water. Indeed, it is NSF’s intent that selected data from the other NSF environmental observing programs also be accessible to the hydrologist through this virtual observatory.
Hydrologic Observation
Hydrologic observation occurs by a systematic process that has two main paths, on-site and off-site measurement, as illustrated in Figure 3. On-site measurement means that the hydrologic phenomenon is measured by a sensor such as a pressure transducer for water level in a stream; the result is recorded at the site as a stage height, and is transmitted to a central processing location via satellite, wireless, hand-carrying a paper record, or other means; the resulting data is checked, edited, and quality-controlled at the central location, and the processed data is stored in a database, which permits various forms of data querying and retrieval. During this process, recorded information may be transformed into new products, such as the transformation of stage height into stream discharge using a rating curve. On-site measurement can also involve remote sensing devices, such as the use of radar signals to map out water vapor content of the atmosphere or wave distribution over an estuary.
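The transformation of stage height into discharge mentioned above can be illustrated with a short sketch. It assumes a simple power-law rating curve, Q = a (h - h0)^b, which is one common form but not necessarily the one used at any given gauge; the function name and the coefficients a, h0, and b are hypothetical and are not drawn from any real station.

```python
def stage_to_discharge(stage_m, a=5.0, h0=0.2, b=1.8):
    """Convert stage height (m) to discharge (m^3/s).

    Assumes a power-law rating curve Q = a * (h - h0) ** b.
    The default coefficients are hypothetical placeholders; real
    values are fitted to paired stage and discharge measurements.
    """
    depth = stage_m - h0  # effective depth above the stage of zero flow
    if depth <= 0:
        return 0.0  # at or below the zero-flow stage there is no discharge
    return a * depth ** b


# Recorded stage heights become a derived discharge product.
stages_m = [0.15, 0.5, 1.2]
discharges_m3s = [stage_to_discharge(h) for h in stages_m]
```

In a sketch like this, the raw stage record is kept alongside the derived discharge so that the discharge can be recomputed when the rating curve is revised.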

Figure 3. Hydrologic measurement and the development of a hydrologic observations database.
Off-site measurement describes the process of taking samples at a site, transporting them to a laboratory where they are analyzed, and recording the results of the analysis in a database. This is how water quality, biological, and sediment samples are measured. Typically, a water quality laboratory has associated with it a Laboratory Information Management System (LIMS) for storing and retrieving the resulting data. An important challenge in off-site measurement is to ensure that sample acquisition, transport, and laboratory analysis occur according to a proper protocol, and indeed even that the method by which the measurement is done is documented. This is particularly so for individual investigator projects on CUAHSI Observatories, where documenting measurement methods is laborious and is not as much of a priority for the investigators as making the measurements themselves. There is thus a class of observation metadata that needs to be created and stored along with such measurements to document how the measurements were made.
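One way to picture such observation metadata is as a record that travels with each measured value. The sketch below is illustrative only: the class names, field names, and sample values are hypothetical and do not represent a CUAHSI or LIMS schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class MeasurementMethod:
    """How a sample was collected, transported, and analyzed (hypothetical fields)."""
    sampling_protocol: str
    transport: str
    lab_method: str


@dataclass(frozen=True)
class Observation:
    """A single measured value together with the metadata describing it."""
    site_id: str
    variable: str
    value: float
    units: str
    sampled_at: str  # ISO 8601 timestamp of sample collection
    method: MeasurementMethod


# Illustrative values only; none of these come from a real data set.
obs = Observation(
    site_id="site_A",
    variable="nitrate_as_N",
    value=1.3,
    units="mg/L",
    sampled_at="2005-03-01T14:30:00Z",
    method=MeasurementMethod(
        sampling_protocol="grab sample, mid-channel",
        transport="on ice, delivered within 24 hours",
        lab_method="ion chromatography",
    ),
)

record = asdict(obs)  # nested dictionary, ready to store alongside the value
```

Making the method a required field means a value cannot be recorded without at least a minimal statement of how it was obtained.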
Whether the hydrologic measurement is carried out off-site or on-site, the end result is that the resulting data are stored in a hydrologic observation database and can be retrieved in a consistent manner. For example, USGS observation data are obtained from the USGS National Water Information System.
Digital Watershed
CUAHSI hydrologic observations are the core of the information to be obtained from CUAHSI Observatories, but other information is needed to obtain a more complete picture of a hydrologic environment and the water which flows through it. This information includes hydrologic observations made in the same area by federal, state and local agencies; GIS coverages, such as terrain, watersheds, stream hydrography, soils, land cover, aquifers and geology; remote sensing imagery obtained from aircraft and satellites; and weather and climate grids, such as NEXRAD and National Weather Service numerical weather prediction models.
These data are stored in quite different formats and data systems, as shown in Figure 4. Hydrologic observations are typically stored in a relational database, and are extracted as delimited ASCII files; GIS coverages are stored in special georeferenced data formats, such as ESRI shapefiles and grids; NASA satellite remote sensing data are stored in a format called HDF-EOS; weather and climate model data are presented as collections of variables defined on a multidimensional space-time grid, often in the NetCDF format developed by Unidata. Unidata is NSF’s atmospheric science data center located in Boulder, Colorado, which supplies real-time weather information to US universities. The HIS team has established an informal collaboration with Unidata to facilitate access to their information.
Further complicating this situation, the data are available on various websites which are individually complex to navigate, and which collectively are so formidable that one could almost say that it requires an expert in “websitology” to acquire all the required files.
Data fusion is the task of bringing all these data together into a coherent framework in space and time. This requires, first of all, that there be a precise space-time reference frame on which the data from various sources can be registered so that they can be viewed in correct relation to one another in space and time. Then the information has to be converted to common formats or made readable in its native format by the viewing system so that the hydrologic observer can obtain a coherent picture of the information as graphs, maps or 3D images. The HIS team is using the Arc Hydro customization of the ArcGIS geographic information system to accomplish these goals. Arc Hydro includes a time series representation extensive enough to synthesize hydrologic observations through time with GIS data through space. Remote sensing and weather and climate information are being layered over that by leaving the data in their native formats and linking them to ArcGIS for viewing and analysis purposes.
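The idea of registering data from several sources on a common space-time reference frame can be sketched in a few lines. This is a deliberately simplified illustration, not the Arc Hydro approach: space is reduced to a shared site identifier, time is snapped to the UTC hour, and the function names, variable names, and sample records are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_hour(ts):
    """Snap a timezone-aware timestamp to the start of its UTC hour."""
    return ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


def fuse(sources):
    """Merge {variable: [(site_id, timestamp, value), ...]} into one table.

    The result is keyed by (site_id, hour). Values of the same variable
    that fall on the same key are averaged.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for variable, records in sources.items():
        for site_id, ts, value in records:
            grouped[(site_id, to_hour(ts))][variable].append(value)
    return {
        key: {var: sum(vals) / len(vals) for var, vals in by_var.items()}
        for key, by_var in grouped.items()
    }


utc = timezone.utc
sources = {
    # Two stage readings within the same hour (hypothetical values)
    "stage_m": [
        ("site_A", datetime(2005, 3, 1, 12, 15, tzinfo=utc), 1.0),
        ("site_A", datetime(2005, 3, 1, 12, 45, tzinfo=utc), 2.0),
    ],
    # One hourly rainfall total at the same site (hypothetical value)
    "rain_mm": [
        ("site_A", datetime(2005, 3, 1, 12, 0, tzinfo=utc), 4.0),
    ],
}
fused = fuse(sources)
```

After fusion, each (site, hour) key holds every variable observed there, so stage and rainfall can be viewed side by side. A real system would also need true spatial registration, for example a common coordinate system for gridded and point data.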

Figure 4. Fusion of hydrologic observations with other information to form a digital watershed
Data fusion is a complex task and the approach taken by the HIS team is one of several that might be employed. Another approach, for example, would be to use a statistical package such as SAS as a data fusion platform. The intent of the HIS team in building upon the existing ArcGIS and Arc Hydro framework is to benefit from the already extensive tools and data structures that this system affords so as to achieve a practical result in a reasonably short time and to achieve a deeper understanding of the underlying data framework and tools that are needed. It is likely, however, that significant further development will be needed after this project is complete to have a fully functional hydrologic data fusion system.
Hydrologic Analysis
The end result, once all this labor of observation, data acquisition, and data fusion has been completed, is the ability to carry out hydrologic analysis, as illustrated in Figure 5.

Figure 5. Hydrologic analysis system
There are a myriad of techniques, models, tools and approaches to hydrologic analysis, ranging from computation and graphing in Excel to very complex spatially distributed dynamic simulations. There is no need for HIS to be able to do all these things; rather, it should provide a data infrastructure that supports them being done in whatever system the hydrologist desires.
In the Neuse prototype observatory study, CUAHSI has identified “fluxes, flow paths, residence times, and mass balances” as the key elements for testing hydrologic hypotheses. It is thus important that HIS be able to support these analysis functions. A set of exploratory tools is being prepared to accomplish this task, focusing in particular on mass balances in phases of the hydrologic system, such as the atmosphere, watershed, channel system and aquifers, and on connections among them.
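As a minimal illustration of what a mass balance and a residence time calculation involve, the sketch below treats a watershed as a single storage with fluxes expressed as depth of water over its area. It assumes the simple balance dS = P - ET - Q - G and a residence time of storage divided by throughflow; the function names and all numerical values are hypothetical and are not taken from the Neuse study or from the HIS exploratory tools.

```python
def discharge_to_depth_mm(q_m3s, seconds, area_km2):
    """Convert mean discharge (m^3/s) over a period to depth (mm) over a watershed."""
    volume_m3 = q_m3s * seconds
    area_m2 = area_km2 * 1e6
    return volume_m3 / area_m2 * 1000.0  # m of depth to mm


def storage_change_mm(precip_mm, et_mm, runoff_mm, recharge_mm=0.0):
    """Change in watershed storage (mm) over one period: dS = P - ET - Q - G."""
    return precip_mm - et_mm - runoff_mm - recharge_mm


def residence_time_days(storage_mm, flux_mm_per_day):
    """Mean residence time as storage divided by throughflow (steady-state assumption)."""
    if flux_mm_per_day <= 0:
        raise ValueError("flux must be positive")
    return storage_mm / flux_mm_per_day


# Hypothetical one-day example for an 864 km^2 watershed
runoff_mm = discharge_to_depth_mm(q_m3s=10.0, seconds=86400, area_km2=864.0)
d_storage = storage_change_mm(precip_mm=5.0, et_mm=2.0, runoff_mm=runoff_mm)
```

Expressing every flux in the same units over the same area and period is the essential step; once that is done, the balance itself is simple arithmetic, and a nonzero residual points either to a change in storage or to an unmeasured flux.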
Of the three HIS components, this component will be the least developed during the CUAHSI HIS project, in part because it depends on the earlier completion of the first and second components.
What Comes First?
As with every grand vision, it is easy for the HIS team to promise the sun, the moon and the stars, but the practicalities of time and budget limit what can realistically be accomplished. During the NSF Cyberinfrastructure Workshop on Environmental Observatories, Jim Gray from Microsoft Corporation was asked to summarize the view of the computer scientists present as to how the various cyberinfrastructure teams should conduct their work. He recommended that the teams spend the next year trying to answer the question “what’s unique and what’s similar” among the observatory missions, defining the requirements for a joint cyberinfrastructure, and finding commonalities among the observatories so that resources can be pooled. This one-year time frame for working out how CUAHSI’s work fits into the general NSF environmental observatory picture is quite appropriate given the time frame of our project and our present progress with it. Dr. Gray made a further recommendation, as shown in Figure 6, that the most critical function of cyberinfrastructure for environmental observing systems is simply to store, document, preserve, and provide access to the raw environmental observation data.

Figure 6. Recommended minimum requirements for cyberinfrastructure for environmental observing systems (Source: J. Gray, Microsoft Corporation).
Conclusions
This mission statement sets out the goals of the CUAHSI HIS project as best they are understood by the HIS team at this time. No doubt these goals will be refined as additional insight is obtained as the project proceeds. The vision is exciting, tangible progress is being made, and we look forward to seeing HIS take shape!