Abstract
One approach to validating simulation models is to formally compare model outputs with independent data. We consider such model validation from the point of view of frequentist statistics. A range of estimates and goodness-of-fit tests have been advanced for this purpose. We review these approaches and demonstrate that some of the tests are difficult to interpret because they rely on the null hypothesis that the model is similar to the observations. This reliance creates two unpleasant possibilities: a model could be spuriously validated when data are too few, or inappropriately rejected when data are too many. Furthermore, these tests do not allow a principled declaration of what level of difference would be reasonable given the purposes to which the model will be put. We consider equivalence tests and demonstrate that they do not suffer from these shortcomings. We provide two case studies to illustrate the claims of this chapter.