SAMSI Notes on Meeting of Jan 30, 2003

Present: Prasad Kasibhatla, Richard Smith, Jim Zidek

LARGE SCALE OZONE MODELS

Prasad described a chemical transport model (CTM) that he and co-investigators have used to predict regional hourly levels of ozone and related species in the US during the summer of 1995. As input, the CTM used predictions from the MM5 meteorological model of the meteorological variables needed to calculate the transport of chemical species. These MM5 outputs were also inputs to a model used to estimate source emissions, such as those from power plants. The emission estimates and the MM5 outputs were then fed into the CTM, in which atmospheric oxidant chemistry was represented. The CTM generated predictions (or "simulations") of the hourly concentrations of ozone and related photochemical species during the summer of 1995 for a grid of 36 km x 36 km cells covering the eastern US.

Model performance was then evaluated in a variety of ways. For example, the actual hourly ozone measurements at the sites within each cell were averaged to obtain an hourly cell average. These were then averaged over the afternoon hours to obtain a daily value for each cell, which was compared with the same aggregate of the model output. For each day, the spatial correlation (r) between these two values was computed across all monitored cells, and the results varied from day to day. In addition, the 90th percentile of the daily values in each cell was calculated (without regard to the day on which it was attained), and r was again computed. That r, as well as those for other percentiles, was found to be quite large.

A number of substantive problems that invite statistical contributions were identified:

1. The emission inputs are inferred from a combination of models (e.g., motor vehicle transportation models) and are therefore uncertain.
One might think of incorporating these input uncertainties by endowing the inputs with a (prior) probability distribution, drawing a sequence of random inputs from that distribution, and running the model on each draw to obtain a sequence of random outputs, thereby propagating the uncertainty. This idea proves unrealistic, since a single run of the large ozone model can take a very long time, even on a supercomputer. HOW CAN INPUT UNCERTAINTIES BE REFLECTED AS UNCERTAINTY IN THE OUTPUT?

2. For each hour, the physical model provides a rich set of prior "quasi data" from which spatial mean and covariance fields can be constructed. How can these best be used in spatial prediction methodologies such as kriging? In particular, given that the available database contains both the complete set of physical model predictions and the hourly ozone measurements at the monitoring sites, how might one compare the predictive performance of (1) the physical model, (2) a purely statistical model, and (3) some synthesis of the two? Also, how might one use the physical model outputs to improve the standard covariance models of spatial statistics?

3. In the future, one would like to develop a dynamic ozone forecasting model. How might this be done? Can a practical Kalman-filter-type approach be developed that uses the actual hourly ozone measurement at, say, 11 am, together with the physical model estimate, to predict the measurement to be made at 12 noon? (The Kalman filter is suggested specifically to address the need to minimize data storage and computational requirements.)

INVERSE (SOURCE APPORTIONMENT) PROBLEMS

Given (monthly mean) measurements, y, of CO at a number of monitoring sites, how might one find x, the emissions from a discrete set of source categories? Uncertain a priori estimates, x_o, are available from bottom-up emission inventories. For the study period considered in Kasibhatla et al. (2002), 419 observations (y's) are available from 38 fixed monitoring sites; in conjunction with the x_o's, these provide a basis for inference about the x's. On the modelling side, a transfer (Jacobian) matrix, K, is available from physical modelling; it gives Kx as the mean of the sampling distribution of the y's. One might (unrealistically) assume that the y's are sampled independently, which yields a diagonal sampling covariance matrix, S_e. Finally, if the y's are assumed to have a jointly Gaussian sampling distribution, one can characterize the likelihood. If, in addition, the prior distribution of the x's is assumed to be jointly Gaussian with mean x_o and covariance S_o (S_o is described in the last two sentences of section 2.1 of the paper), one can estimate the x's by minimizing

J(x) = (y - Kx)^T S_e^{-1} (y - Kx) + (x - x_o)^T S_o^{-1} (x - x_o),

which is twice the negative log posterior density up to an additive constant; the first term comes from the likelihood and the second from the prior density. The result is

\hat{x} = x_o + G (y - K x_o),   with   G = S_o K^T (K S_o K^T + S_e)^{-1},

so that G is determined by K and the two covariance matrices S_e and S_o. For the particular problem considered in Kasibhatla et al. (2002), the top-down source estimates [i.e., the estimated x's] differ significantly from the bottom-up estimates [i.e., the x_o's]. Further refining these source estimates [e.g., deriving estimates for a larger number of source categories] will require far more measurements than the surface monitoring network currently provides.

The relative paucity of monitoring data might be overcome by using satellite observations, y_s. These are plentiful (about 100,000 per day, for a total of about 7 million!). However, they are in fact surrogate measures containing substantial measurement error:

y_s = (1 - A) y_p + A y,

where y_p represents an unmeasured quantity of the atmospheric column through which y is being assessed. This leads to problem 2 below.

The problems to be addressed here are:

1. How can the S's best be specified so as to improve on the ad hoc techniques currently used?
More generally, how might the y's best be modelled in terms of the x's and x_o's, and how might uncertainties associated with the model parameters be incorporated into the final estimates?

2. How can one practically use the vast number (about 7 million) of satellite observations? In particular, how can one factor out the measurement error component?

REFERENCES

1. Sampson, P. D., and P. Guttorp (????). Operational evaluation of air quality models. NRCSE-TRS No. 018, University of Washington. http://www.nrcse.washington.edu/research/reports.html
2. Kasibhatla, P., and W. L. Chameides (2000). Seasonal modeling of regional ozone pollution in the eastern United States. Geophys. Res. Lett., 27, 1415-1418. http://www.env.duke.edu/faculty/prasad/publications/publications.html
3. Kasibhatla, P., A. Arellano, J. A. Logan, P. I. Palmer, and P. Novelli (2002). Top-down estimate of a large source of atmospheric carbon monoxide associated with fuel combustion in Asia. Geophys. Res. Lett., 29(19), doi:10.1029/2002GL015581. http://www.env.duke.edu/faculty/prasad/publications/publications.html
4. Hogrefe, C., S. T. Rao, P. Kasibhatla, G. Kallos, C. Tremback, W. Hao, D. Olerud, A. Xiu, J. McHenry, and K. Alapaty (2001). Evaluating the performance of regional-scale photochemical modeling systems: Part I-Meteorological predictions. Atmos. Environ., 35, 4159-4174. http://www.env.duke.edu/faculty/prasad/publications/publications.html
5. Hogrefe, C., S. T. Rao, P. Kasibhatla, W. Hao, G. Sistla, R. Mathur, and J. McHenry (2001). Evaluating the performance of regional-scale photochemical modeling systems: Part II-Ozone predictions. Atmos. Environ., 35, 4175-4188. http://www.env.duke.edu/faculty/prasad/publications/publications.html
6. Saito, H., and P. Goovaerts (2001). Accounting for source location and transport direction into geostatistical prediction of contaminants. Environmental Science & Technology, 35(24), 4823-4829. http://www-personal.engin.umich.edu/~hirotaka/publication.html
7. Rodgers, C. D. (2000). Inverse Methods for Atmospheric Sounding: Theory and Practice. World Scientific Publishing Co. Pte. Ltd., Singapore.
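The model evaluation described under LARGE SCALE OZONE MODELS (cell-averaged hourly measurements, afternoon daily means, a daily spatial correlation, and a correlation of per-cell 90th percentiles) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the procedure actually used: the afternoon window (hours 12-17), the array layout, the function names `cell_afternoon_means` and `spatial_r`, and all data are hypothetical, and the synthetic numbers say nothing about the 1995 results.

```python
import numpy as np

# Assumption: "afternoon" means hours 12-17; the notes do not say which hours were used.
AFTERNOON = slice(12, 18)


def cell_afternoon_means(obs, site_cell, n_cells):
    """Average hourly site data within each cell, then over the afternoon hours.

    obs has shape (days, 24, sites); site_cell[i] is the cell index of site i.
    Returns shape (days, n_cells), with NaN for cells that contain no monitor.
    """
    days = obs.shape[0]
    cell_hourly = np.full((days, 24, n_cells), np.nan)
    for c in range(n_cells):
        in_cell = site_cell == c
        if in_cell.any():
            cell_hourly[:, :, c] = obs[:, :, in_cell].mean(axis=2)
    return cell_hourly[:, AFTERNOON, :].mean(axis=1)


def spatial_r(a, b):
    """Pearson correlation across cells, skipping cells with missing values."""
    ok = ~np.isnan(a) & ~np.isnan(b)
    return np.corrcoef(a[ok], b[ok])[0, 1]


# Synthetic data for illustration only.
rng = np.random.default_rng(1)
n_days, n_cells, n_sites = 30, 8, 12
site_cell = rng.integers(0, n_cells, size=n_sites)
model = 50 + 20 * rng.random((n_days, 24, n_cells))        # model cell-hourly ozone
obs = model[:, :, site_cell] + rng.normal(0, 5, (n_days, 24, n_sites))

obs_daily = cell_afternoon_means(obs, site_cell, n_cells)
mod_daily = model[:, AFTERNOON, :].mean(axis=1)             # same aggregate of model output
daily_r = [spatial_r(obs_daily[d], mod_daily[d]) for d in range(n_days)]

# 90th percentile of each monitored cell's daily values, regardless of the day attained.
monitored = np.unique(site_cell)
obs_p90 = np.percentile(obs_daily[:, monitored], 90, axis=0)
mod_p90 = np.percentile(mod_daily[:, monitored], 90, axis=0)
r90 = np.corrcoef(obs_p90, mod_p90)[0, 1]
print(np.round(daily_r, 2), round(r90, 2))
```

Other percentiles can be compared by changing the `90` argument to `np.percentile`.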
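The brute-force propagation of input uncertainty described in item 1 under LARGE SCALE OZONE MODELS (draw random inputs from a prior, run the model on each draw, summarize the outputs) can be sketched as follows. Here `toy_ozone_model` is a hypothetical, cheap stand-in for the CTM, and the emission values and 30% prior uncertainty are made up; the sketch only shows the cost structure, namely one full model run per draw, which is what makes the approach impractical for the real model.

```python
import numpy as np


def toy_ozone_model(emissions):
    """Hypothetical stand-in for the CTM: maps 3 emission inputs to ozone in 4 cells.

    The real CTM takes a very long time per run; this toy version is instantaneous.
    """
    response = np.array([[0.8, 0.1, 0.0],
                         [0.3, 0.5, 0.1],
                         [0.0, 0.4, 0.6],
                         [0.2, 0.2, 0.2]])
    # Saturating (nonlinear) response, purely illustrative; output lies in (10, 70).
    return 40.0 + 30.0 * np.tanh(response @ emissions / 50.0)


def propagate(model, mean, cov, n_runs, seed=0):
    """Monte Carlo propagation: one model run per random draw of the inputs."""
    rng = np.random.default_rng(seed)
    # A Gaussian prior is assumed here; it can produce negative emissions,
    # so a lognormal prior may be more realistic in practice.
    draws = rng.multivariate_normal(mean, cov, size=n_runs)
    outputs = np.array([model(x) for x in draws])   # n_runs model calls
    return outputs.mean(axis=0), outputs.std(axis=0, ddof=1), outputs


mean = np.array([60.0, 40.0, 30.0])    # made-up emission inputs
cov = np.diag((0.3 * mean) ** 2)       # assumed 30% prior standard deviation
m, s, outs = propagate(toy_ozone_model, mean, cov, n_runs=500)
print(m.round(1), s.round(1))
```

With 500 draws this requires 500 model runs; for the full CTM, that number of runs is exactly what the notes describe as unrealistic.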
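For item 3 under LARGE SCALE OZONE MODELS, one possible Kalman-filter-type formulation (an assumption, not something proposed in the meeting) treats the difference between the measurement and the physical model estimate at a site as a slowly varying bias. The filter assimilates, say, the 11 am measurement and then forecasts the 12 noon measurement as the noon model value plus the predicted bias. Only two numbers per site (the bias estimate and its variance) are carried from hour to hour, which illustrates the storage argument in the notes. The class name, the random-walk bias model, the noise variances and the ppb units are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class BiasKalmanFilter:
    """Scalar Kalman filter for the bias b_t = obs_t - model_t at one site.

    Assumed (hypothetical) state-space model:
        b_{t+1} = phi * b_t + w_t,   w_t ~ N(0, q)   (bias drifts hour to hour)
        z_t     = b_t + v_t,         v_t ~ N(0, r)   (z_t = observed minus model)
    """
    phi: float = 1.0    # 1.0 gives a random-walk bias
    q: float = 1.0      # process-noise variance, ppb^2 (assumed)
    r: float = 9.0      # measurement-noise variance, ppb^2 (assumed)
    b: float = 0.0      # current bias estimate
    P: float = 100.0    # variance of the bias estimate (vague start)

    def update(self, obs, model):
        """Assimilate this hour's measurement and model value."""
        gain = self.P / (self.P + self.r)
        self.b += gain * ((obs - model) - self.b)
        self.P *= 1.0 - gain

    def predict(self, model_next):
        """Step the bias forward one hour and forecast next hour's measurement."""
        self.b *= self.phi
        self.P = self.phi ** 2 * self.P + self.q
        return model_next + self.b


random.seed(0)
kf = BiasKalmanFilter()
model = [30.0 + 5.0 * h for h in range(12)]              # made-up hourly model ozone
obs = [m + 8.0 + random.gauss(0.0, 3.0) for m in model]  # model biased low by ~8 ppb
for t in range(len(model) - 1):
    kf.update(obs[t], model[t])           # e.g. assimilate the 11 am measurement...
    forecast = kf.predict(model[t + 1])   # ...then forecast the 12 noon measurement
print(round(kf.b, 1), round(forecast, 1))
```

A spatial version would replace the scalar bias with a vector over sites or cells and the variances with covariance matrices, at correspondingly higher storage cost.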
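The Gaussian estimate in the INVERSE (SOURCE APPORTIONMENT) PROBLEMS section, \hat{x} = x_o + G(y - K x_o) with G = S_o K^T (K S_o K^T + S_e)^{-1}, can be computed as sketched below, along with the posterior covariance (I - G K) S_o (the standard linear-Gaussian result; see Rodgers, 2000). The function name `gaussian_inverse`, the transfer matrix, the inventory, the 50% prior uncertainty and the simulated data are all made up for illustration; they are not values from Kasibhatla et al. (2002).

```python
import numpy as np


def gaussian_inverse(y, K, x_o, S_e, S_o):
    """Minimize J(x) for y ~ N(Kx, S_e) and prior x ~ N(x_o, S_o).

    Returns x_hat = x_o + G (y - K x_o), the posterior covariance (I - G K) S_o,
    and the gain G = S_o K^T (K S_o K^T + S_e)^{-1}.
    """
    A = K @ S_o @ K.T + S_e                 # covariance of y - K x_o
    G = np.linalg.solve(A, K @ S_o).T       # = S_o K^T A^{-1}, since A and S_o are symmetric
    x_hat = x_o + G @ (y - K @ x_o)
    S_hat = (np.eye(len(x_o)) - G @ K) @ S_o
    return x_hat, S_hat, G


# Hypothetical toy problem: 3 source categories, 5 monitoring sites.
rng = np.random.default_rng(0)
K = rng.uniform(0.0, 1.0, size=(5, 3))      # made-up transfer (Jacobian) matrix
x_o = np.array([10.0, 20.0, 5.0])           # made-up bottom-up inventory
S_o = np.diag((0.5 * x_o) ** 2)             # assumed 50% prior standard deviation
S_e = np.eye(5)                             # diagonal S_e: independent y's, unit variance
x_true = np.array([12.0, 15.0, 9.0])        # "true" sources used to simulate y
y = K @ x_true + rng.normal(0.0, 1.0, size=5)

x_hat, S_hat, G = gaussian_inverse(y, K, x_o, S_e, S_o)
print(x_hat.round(2), np.sqrt(np.diag(S_hat)).round(2))
```

This gain form solves a system whose size is the number of observations. The equivalent information form, (K^T S_e^{-1} K + S_o^{-1})^{-1} (K^T S_e^{-1} y + S_o^{-1} x_o), works in the (usually much smaller) source dimension instead, which matters when millions of satellite observations are used.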