GIS, GeoObjects, and Monte Carlo Simulation:
A Match Made in Heaven (or )?
December 3rd, 1999
Table of Contents
The Marcus Hook refinery is a crude oil refinery and storage facility on the Delaware river, near Philadelphia, Pennsylvania. Subsurface chemical fate and transport is currently being investigated at this site by researchers at The University of Texas at Austin, including Dr. David Maidment, Julie Kim, Andrew Romanek, and Lesley Hay Wilson. Due to their efforts, a large portion of the facility has been processed for use in a GIS environment.
A GIS grid of groundwater elevations developed by Andrew Romanek will be used (without modification) in this project to estimate groundwater gradients and directions at the Marcus Hook refinery.
As part of my PhD dissertation research, I have programmed a one-dimensional subsurface model for predicting the fate and transport of aged hydrocarbons in soil. The conceptual model includes a contaminated source area (zone 1) and a (initially uncontaminated) downgradient area (zone 2). At the end of zone 2 lies a hypothetical receptor (also called the point of demonstration, or POD) where we wish to predict the chemical concentration as a function of time (i.e. the breakthrough curve). The express purpose of this model is to quantify and evaluate the effects of kinetically-limited release from the soil to the water phase, however this aspect will not be stressed in this term project. A large component of the model, which is relevant to this term project, is that it can be run repeatedly, through the use of Monte Carlo Simulation (MCS).
The groundwater model from my dissertation research will be used, with minor modifications, to calculate chemical fate and transport for this project. Note that this model is a groundwater transport model, not a groundwater flow model. A flow model uses inputted heads and Darcy's Law to calculate groundwater velocity. A transport model uses inputted groundwater velocity and the advection/dispersion/reaction equation to calculate breakthrough curves. Hence, the groundwater velocity is required as an input.
Monte Carlo Simulation
The purpose of using Monte Carlo Simulation is to generate multiple sets of simulated, but realistic data. The creation of numerous sets of data allows for the generation of statistical inferences (e.g. standard deviation and coefficient of variation) related to the output of a model. However, the creation of numerous sets of results necessitates running the model a large number of times, frequently 1,000 times or more. Hence, this is not practically feasible without some form of automation. My original groundwater model is run by executing a Visual Basic for Applications (VBA) program in Excel, which then runs a VB .exe file (which is the guts of the groundwater model). Automation is handled in VB by @Risk and the @Risk Developer's Kit, both of which are MCS software packages developed by Palisade Corp (Newfield, NY).
As an aside for those unfamiliar with VBA, VBA is Visual Basic, but the focus lies not in creating generic stand-alone code, but in driving a software package. Every COM-compliant software package has a suite of commands and functions in an object library that can be used to drive the software. For example, Excel has an object library and if you wanted a program that writes the number 3000 into cell B2 of the active spreadsheet, you might code something like this:
Range("B2").Value = 3000
The Range("B2").Value command is not a part of generic Visual Basic, it is a command in the Excel object library (i.e. VBA for Excel). Similarly, the @Risk Developer's Kit and ArcInfo 8.0 have a suite of commands that may be used to automate their capabilities. Theoretically, anything the user can do by hand may be programmed to run automatically using VBA.
@Risk and the @Risk Developer's Kit are automated by the groundwater model and will be used in this project.
ArcInfo 8.0 (the latest version, currently in beta form as of Dec. 3, 1999) is the first version to include VBA functionality. In this way, ArcInfo 8.0 is theoretically able to communicate with my groundwater model in the same way that Excel does, namely, by writing a text file of input values, shelling out to the groundwater model, and generating a plot of the model output.
For the purposes of this project, it is assumed that there exists a discrete chemical source area and a corresponding point of demonstration (POD) where we are interested in predicting the concentration of chemical as a function of time (i.e. the breakthrough curve). Desired functionality includes:
the user be able to select in ArcMap (an application within ArcInfo 8.0 that is very similar to ArcView 3.1) the locations of the chemical source area and the corresponding POD.
the user be able to run the groundwater model from the ArcMap environment and a graph of the breakthrough curve at that POD be automatically displayed.
the POD be intimately (and permanently) associated with the numerical model results, in the sense that the user be able to plot the graph of the breakthrough curve at any time. This requires a unique linkage between each POD and each text file containing the results. It is easy enough to give a text file a unique name, however, until the advent of ArcInfo 8.0, it has not been possible to create a POD object, associate code with it, and give it a unique identifier.
ArcInfo 8.0 is also the first version of ArcInfo to support fully functional objects, in the parlance of Unified Modeling Language (UML). It is beyond the scope of this document to describe, in any detail, UML and the creation of custom objects. The reader is referred to Tim Whiteaker's term project and Kim Davis's masters thesis for more information. For now, it is sufficient to recognize that UML is an umbrella under which different software packages can use common code in the form of objects. For the purposes of this project, the aspect of UML that is most relevant involves the creation of a custom object with a custom property. The object was created by Kim Davis, Tim Whiteaker, and Daniel Opdyke and, since it was the first such object created at UT, it was called HelloWorld. For this project, the HelloWorld object will be used as a POD. The custom property of the HelloWorld object is its name, although, as will be described later, the name property did not work and a work-around had to be used.
For this project, ArcInfo is used to show a groundwater elevation grid of the site, to drive the groundwater model, and to drive MSGraph in creating a plot of the results. For a hypothetical user of this model, ArcInfo is the only software directly accessed by the user.
Visio Enterprise is used to create the HelloWorld object. The reader is referred to Tim Whiteaker's term project for a description of creating custom objects in Visio Enterprise.
In the model developed for my dissertation, the breakthrough curve (i.e. the model output) is plotted in Excel. However, 1) Excel is not used in this term project for any other purpose, 2) Excel requires a lot of memory, and 3) the only ability desired was the ability to plot a graph. Hence, another software package was desired. An excellent candidate turned out to be MSGraph (called Graph9 and Microsoft Graph 2000 in Office2000). MSGraph is the program used by PowerPoint when the user wishes to generate a plot or graph from scratch, i.e. without using Excel. It functions much like an Excel graph, only it does not require the memory overhead that Excel requires. Most importantly, MSGraph is COM-compliant (Component Object Model), which means that any other COM compliant software (e.g. ArcInfo 8.0, but not previous versions) can run MSGraph automatically. In this project, VBA code in ArcInfo will be used to drive MSGraph to plot the breakthrough curve.
Since ArcInfo 8.0 has not yet been released, and the scope of this project draws heavily on capabilities that were not included in ArcInfo 7, the primary goal of this project was to establish what is possible with the new software. Hence, this project may be considered a demonstration. The selection of the Marcus Hook site was merely a convenience and this project should not be construed as a rigorous modeling analysis of Marcus Hook. Hence, no conclusions should be inferred with regards to groundwater transport at Marcus Hook.
The objectives of this project were as follows:
create a custom object.
associate the custom object with the groundwater model and MSGraph such that these may be run automatically from the ArcMap environment.
plot time series data from within the ArcMap environment.
All told, this project relied on the following software:
ESRI core object library in VBA
Code Generation Wizard (in Visual C++)
Schema Creation Wizard (in ArcCatalog)
MSGraph (part of Microsoft Office)
@Risk and the @Risk Developer's Kit
As such, part of this project is, by default, an investigation of the ease of connection between many different software packages. In the conclusions section, some of these connections will be evaluated.
The work involved in this project can most easily be illustrated in the form of the following figure. You may click on a blue square to go directly to that section of the report or you can continue scrolling down.
Figure 1. Overview of Project
Map of Site
A photo of the The Marcus Hook refinery is shown in Figure 2 (click the figure for larger view). The orange line denotes the site boundary.
Figure 2. Photo of the Marcus Hook Refinery
An ArcView layout of the site is given in Figure 3. In this figure, buildings are shown in green, tanks are shown in red, the Delaware river, Marcus Hook creek, and other water bodies are shown in blue, and the site boundary is in orange.
Figure 3. Layout of the Marcus Hook Refinery
The only map of the Marcus Hook site that will be directly used in this project is a grid of the groundwater elevations. This grid is shown in Figure 4 below.
Creation of the Custom HelloWorld Object
In order to associate a specific POD with a breakthrough curve at the POD's location, we created a custom object called HelloWorld. For this project, the HelloWorld object is equivalent to a POD. The HelloWorld object was created using Visio Enterprise, a software package that creates the framework for an object model. In a nutshell, custom objects are computer representations of real-world objects. For example, a dam (in real life), can be represented by a dam custom object in the computer. This dam object may have methods (a.k.a. behaviors, e.g. the ability to control the flow of a river) and/or properties (a.k.a. attributes, e.g. height or name). Objects in a model diagram inherit methods and properties from objects higher up in the diagram. ArcInfo 8.0 comes bundled with a generic model diagram, with the intention that some users may create specific objects as extensions to the generic model. This is exactly what we did. We created the HelloWorld object immediately below Feature, in the ESRI model diagram. Hence, HelloWorld inherits all properties and methods of Feature, Object, and Row (all of which are above HelloWorld in the model diagram). In Visio Enterprise, we assigned the property "name" to HelloWorld. Also in Visio Enterprise, we created an interface called IHelloWorld.
After creating the HelloWorld object and IHelloWorld interface in Visio Enterprise, we exported them to the Microsoft Repository (using a menu command in Visio). The Microsoft repository is a storage location for custom objects. Then we opened Visual C++ and used the Code Generation Wizard (an application that is part of ArcInfo 8.0) to generate C++ stub code. This code is required for proper functioning of the custom object. Since we did not wish the HelloWorld object to have any inherent methods, we did not have to program any C++ code ourselves. By inherent methods, I mean methods that are intrinsically a part of the object. These must be programmed in C++. Once in ArcInfo, methods can be coded for objects using VBA by way of the object's interface. The IHelloWorld interface is needed because, in VBA, one does not work directly with objects, rather one works with the object's interface. For VBA purposes, the interface can be thought of as a handle, allowing the VBA code to operate on a specific object. This solution is more flexible, but less stable than programming the method(s) in C++ directly.
The last step in the creation of the HelloWorld object is the use of ArcCatalog to create a personal database, a feature dataset, and, using the Schema Creation wizard, a feature class. At this point, an empty HelloWorld class exists and is ready to be added to an ArcMap document (an ArcMap document is similar to an ArcView project). Note that the projection of the feature dataset (specified upon creation of the feature dataset) must be the same as the ArcMap document. In ArcMap, I loaded the gwelevq4 grid (groundwater elevations) and pressed the add data (+) button and added the HelloWorld feature class. Now that HelloWorld is loaded, the ArcMap editor may be used to create (and name) new HelloWorld objects. A screen shot of the groundwater elevation grid of Marcus Hook and a single HelloWorld object is shown in Figure 4.
Figure 4. Groundwater Elevation Grid and HelloWorld Object
The two tools identified in Figure 4 are used to specify the location of the source area (called point a, using tool 1) and the POD (called point b, using tool 2). These tools are used the same way as the identify tool: the user selects the tool by single-clicking on it, selects a location on the grid, and the code associated with the tool runs automatically. The code for tool 1 is similar to that for tool 2: both draw a colored dot and write a text file containing the x, y, and z coordinates of the location where the user clicked. Unfortunately, I had to fudge the z (elevation) value. I had hoped to use the ESRI function GetVal to return the z value, but since this is a beta version and ArcInfo 8.0 is not intended to be 100% compatible with grids anyway (as it is with vector layers), this function did not work. Instead, I hardwired the code to write an elevation of 18 for point a and 2 for point b. A screen shot of point a (the source area) and point b (the POD) is shown in Figure 5. Note that point b is in the same location as the HelloWorld object shown in Figure 4. It would have been nice to eliminate point b entirely and simply use the coordinates of the HelloWorld object. Unfortunately, I could not find a function that returns the coordinates of the HelloWorld object (x, y, and/or z, although z is most definitely a lost cause here because the HelloWorld object does not have an elevation). Hence, point b is used to define the location of the POD, but the numbers describing the breakthrough curve will be associated with the HelloWorld object.
Figure 5. Groundwater Elevation Grid with Point a and Point b
Now that the x, y, and z coordinates of both point a and point b are written to text files, the groundwater model can be run. This is accomplished by selecting the HelloWorld object, right-clicking to bring up the context menu, and selecting "run model and plot graph". The context menu was customized to include two VBA subroutines: "run model and plot graph" and "plot graph only". The subroutine "run model and plot graph" should only be used once for each unique pair of source area and POD. Afterwards, the subroutine "run model only" should be used. In this way, if other locations for a source area and/or a POD are selected, they will not affect the breakthrough curve associated with the original POD. Figure 6 illustrates the context menu.
Figure 6. Groundwater Elevation Grid and Context Menu
The subroutine "run model and plot graph" performs the following steps:
It verifies that a single HelloWorld object is selected and ascertains its OID value. I had hoped that the HelloWorld property "name" could be used to uniquely identify each specific HelloWorld object, however, the code did not work. Instead, I used the Object ID (OID). HelloWorld inherited this property from the ESRI "object" object (which is between row and feature in the ESRI model diagram). Since the OID is unique for each object, it can be used instead of "name" without problem, although it is less intuitive.
It opens the pointa.txt and pointb.txt files and reads in the x, y, and z coordinates of the two points.
It calculates the linear distance and gradient between the two points.
It writes these, and other necessary input parameter values to the tempp.txt file in the order required by the visual basic groundwater model. Note that the other input parameters are hard-wired to values that I believe are representative of Marcus Hook, but I had to make some of them up. Therefore, no conclusions should be drawn with respect to the Marcus Hook site itself. Most importantly, I had to make up most of the probabilitic information. For this demonstration project, I assumed that all inputs were normally distributed. The ultimate goal here would be to write the VBA code to automatically open an Access database and extract the relevant parameter values (and even their uncertainties), based on the locations of points a and b. At the present time, this may be possible, but would likely be extremely time-consuming to program.
It shells out to the the visual basic groundwater model and waits for the model to finish. A flow chart of the groundwater transport model is shown in Figure 7:
Figure 7. Flow Chart of Groundwater Transport Model
It renames the files (which were created by the groundwater model):
It creates an instance of MSGraph, reads in the model results (from the the files renamed in step 6), puts the numbers in the proper columns in MSGraph, and formats an XY plot of the breakthrough curves at point b (the POD).
The subroutine "plot graph only" performs steps 1 & 7 only.
A screen shot of the plot as it appears is given in Figure 8.
Figure 8. Screen Shot of MSGraph
Note that in Figure 8, the lines are very narrow. There exist commands in the MSGraph object library that are supposed to allow formatting of the lines (e.g. thickness, color) using code. However, I could not get any of these commands to work. I posted a question (twice) to a VB newsgroup, but got no replies. This is one of the few persistent problems I had with the Microsoft applications. Needless to say, there were many more persistent problems with ArcInfo 8.
A better formatted view of the breakthrough curves is given in Figure 9.
Figure 9. Breakthrough Curves at the POD
The breakthrough curves are somewhat choppy, since I used a limited number number of time steps in order to minimize the run time. Shown in Figure 9 are the results of 10 Monte Carlo Iterations. Each line represents a breakthrough curve generated from one iteration. Each iteration uses a set of randomly sampled values, one for each uncertain input parameter. These values are randomly selected from the distributions of the input parameters (all of which are normal and are specified by a mean and standard deviation). All curves stop at 500,000 days because that is the time I selected to end the model.
In this term project, a custom object was created using Visio Enterprise, Visual C++, and two wizards that come bundled with ArcInfo 8.0. This object was used to define a point of demonstration (POD) where a breakthrough curve (concentration as a function of time) was calculated and displayed. There are at least two innovative aspects of this project:
The creation of a custom object
The generation of a plot of time series data in a GIS software package.
Both of these are only possible in GIS with the arrival of ArcInfo 8.0. Although the final model did not contain many capabilities that I had hoped for, and much of the code is crude, the most important goal was achieved: a custom object now has a plot of time series data intimately associated with it.
Several persistent problems were encountered along the way (besides the endless beta problems):
For those who do not know C++, the output of the code generation wizard is a black box. When this stub code had errors, we were hard pressed to fix them.
Documentation for the ESRI core object library is sparse and on-line help is non-existent, since no-one has used this software yet. This is in direct contrast to most Microsoft applications which enjoy a large user base, extensive off- and on-line help, and numerous newsgroups.
The VB and Fortran components of the groundwater model are necessary because VBA is slower than molasses in January (and not in Texas either!). This significantly limits the potential for VBA use in a number intensive environment such as GIS. I have no way to compare the speed of Avenue or AML to VBA, so I won't even try.
The "name" property did not work, ever. This begs the question: why? - and leads to what I think is an unavoidable, but significant, problem in the creation of custom objects. Namely, the process of creating a custom object requires several different software packages, wizards, etc. in a multi-step process. It is difficult to determine where the mistake was made in the creation of the name property. The error message we got when trying to use it was, as usual, uninformative. A similar problem occurred in my research when connecting VB to Fortran. Namely, if your code crashes and you can't isolate which software package caused the problem, it is exceptionally difficult to debug. Luckily, all COM compliant languages are easy to debug, hence, a scorecard may look as follows (note that VBA means any VBA, including ArcInfo VBA):
Ease of Connectivity Scorecard
|Visio||to||ArcInfo||***||A multi-step process. Most is automated, but if something doesn't work, it is hard to isolate.|
|VBA||and||VB executable||***||Shell command has problems, but so do other commands.|
|VB or VBA||and||MSGraph||****||Everything in one environment, debugger works.|
|VB or VBA||and||Fortran||*||Fortran fast, but not COM-compliant. Capabilities limited to passing numbers back and forth. Passing difficult to debug.|
|VB or VBA||and||@Risk||****||Everything in one environment, debugger works.|
Ratings: * sad ** poor *** workable **** ideal
In conclusion, ArcInfo has now joined the ranks of modern, connectible, software packages. The new capabilities allow the user to use ArcInfo for what it does best, and easily connect to other software packages to take advantage of their abilities, all within the same VBA subroutine. In the future, this will allow ArcInfo users to significantly decrease repetitive work and hand copying of numerical values, both of which are prone to errors. For example, an MS Access database could be programmed so that, when a user enters a new value in a specific cell, a specific ArcInfo map document is automatically opened, the corresponding value in a specific object is changed, and a text file is modified to log the action. Furthermore, the object in ArcInfo could have a method that performs an action in response to this change, such as changing its color and sending a signal to other objects in the network, thereby activating some of their methods.
The possibilities are endless, as are the potential for headaches...
Many thanks to
UT GeoObject Team: Kim Davis, Tim Whiteaker
UT Marcus Hook Team: Lesley Hay Wilson, Andrew Romanek, Julie Kim
ESRI: Maggie Ruan, Rob Burke, Evan Brinton