Abstract
Graphical models are a powerful tool for causal model specification. Besides allowing for a hierarchical representation of variable interactions, they do not require any a priori specification of the functional dependence between variables. The construction of such graphs hence often relies on merely testing whether or not model variables are marginally or conditionally independent. The identification of causal relationships then requires only some general assumptions on the relation between stochastic and causal independence, such as the Causal Markov Condition and the Faithfulness Condition (Spirtes et al. 2000; Pearl 2000). Such a procedure does, however, require further assumptions to hold, namely those on which the independence tests themselves are based.
In continuous settings, Spirtes et al. (2000) suggest causal inference based on a very restrictive formulation of independence, namely vanishing partial correlations. Such a measure, however, limits the applicability of causal inference to linear systems. This constitutes a serious drawback, especially for the social sciences, where an a priori specification of the functional form proves difficult or where linearity is implausible. In short: graphical models theoretically reduce specification uncertainty regarding functional dependence, but their implementation in practice deprives them of this virtue.
In this paper we investigate how causal structures in continuous settings can be identified when both the functional forms and the probability distributions of the variables remain unspecified. We focus on tests exploiting the fact that if X and Y are conditionally independent given a set of variables Z, the two conditional densities f(X|Y, Z) and f(X|Z) must coincide. We start by estimating the conditional densities f(X|Y, Z) and f(X|Z) via nonparametric techniques (kernel methods). We then test whether some metric expressing the distance between these conditional densities is sufficiently close to zero. Out of the several metrics available in the literature to express such a distance we choose two: the Euclidean and the Hellinger distance. In a Monte Carlo study we investigate how well tests based on either measure detect statistical independence conditional on a small set of variables.
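For reference, a standard kernel estimator of a conditional density and the two (squared) distance measures take the textbook forms below; the precise weighting and bandwidth choices entering our test statistics are left open here:

\[
\hat f(x \mid z) \;=\; \frac{\sum_{j=1}^{n} K_h(x - X_j)\, K_h(z - Z_j)}{\sum_{j=1}^{n} K_h(z - Z_j)},
\qquad
d_E(f,g) \;=\; \int \bigl(f(x) - g(x)\bigr)^{2}\,dx,
\qquad
d_H^{2}(f,g) \;=\; \frac{1}{2}\int \Bigl(\sqrt{f(x)} - \sqrt{g(x)}\Bigr)^{2}\,dx,
\]

where K_h denotes a kernel with bandwidth h. Under the null hypothesis of conditional independence, both distances between f(X|Y, Z) and f(X|Z) are zero in the population, so that small estimated distances speak for conditional independence.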
One limitation may result from nonparametric density estimation being subject to the curse of dimensionality: as the number of variables increases, the estimated empirical density converges to its population value at a slower rate. To compensate for this drawback we use a local bootstrap procedure, which consists of resampling the data for each test. While the local bootstrap substantially increases the computational time of the test, it succeeds in counterbalancing the curse of dimensionality.
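To illustrate, the following is a minimal Python sketch of one common variant of such a procedure, assuming a Gaussian product kernel, a single bandwidth h shared by all variables, and a simple plug-in Hellinger-type statistic; all function names and tuning choices are illustrative assumptions, not the exact implementation used here. The key feature is that X and Y are resampled independently given Z, so every bootstrap sample satisfies the null hypothesis by construction.

import numpy as np

rng = np.random.default_rng(0)

def gauss(u, h):
    """Product Gaussian kernel K_h evaluated at the rows of u (shape (n, d))."""
    d = u.shape[1]
    return np.exp(-0.5 * np.sum((u / h) ** 2, axis=1)) / (h * np.sqrt(2 * np.pi)) ** d

def cond_density(x0, z0, X, Z, h):
    """Kernel estimate of the conditional density f(x0 | z0) from the sample."""
    kz = gauss(Z - z0, h)
    return np.sum(gauss(X - x0, h) * kz) / max(np.sum(kz), 1e-300)

def hellinger_stat(X, Y, Z, h=0.5):
    """Sample analogue of the squared Hellinger distance between
    fhat(x | y, z) and fhat(x | z), averaged over the observations."""
    YZ = np.hstack([Y, Z])
    diffs = [(np.sqrt(cond_density(X[i], YZ[i], X, YZ, h))
              - np.sqrt(cond_density(X[i], Z[i], X, Z, h))) ** 2
             for i in range(len(X))]
    return float(np.mean(diffs))

def local_bootstrap_sample(X, Y, Z, h):
    """One resample consistent with H0: X independent of Y given Z.
    For each Z_i, X* and Y* are drawn independently from the kernel-smoothed
    conditionals fhat(X | Z_i) and fhat(Y | Z_i), destroying any dependence
    between X and Y beyond what Z induces."""
    n = len(Z)
    Xb, Yb = np.empty_like(X), np.empty_like(Y)
    for i in range(n):
        w = gauss(Z - Z[i], h)
        w /= w.sum()
        jx, jy = rng.choice(n, p=w), rng.choice(n, p=w)       # independent draws
        Xb[i] = X[jx] + h * rng.standard_normal(X.shape[1])   # kernel smoothing
        Yb[i] = Y[jy] + h * rng.standard_normal(Y.shape[1])
    return Xb, Yb

def bootstrap_pvalue(X, Y, Z, h=0.5, B=199):
    """Approximate the null distribution of the statistic by recomputing it
    on B locally resampled data sets; this is the costly but effective step."""
    t_obs = hellinger_stat(X, Y, Z, h)
    t_boot = [hellinger_stat(*local_bootstrap_sample(X, Y, Z, h), Z, h)
              for _ in range(B)]
    return (1 + sum(t >= t_obs for t in t_boot)) / (B + 1)

# Toy example: X and Y are driven by Z only, so X is independent of Y given Z.
n = 150
Z = rng.standard_normal((n, 1))
X = Z + 0.5 * rng.standard_normal((n, 1))
Y = Z ** 2 + 0.5 * rng.standard_normal((n, 1))
print(bootstrap_pvalue(X, Y, Z))  # a large p-value is expected under H0

Since each of the B resamples requires re-estimating the conditional densities, the runtime grows quickly with both the sample size and B, which is precisely the computational cost referred to above.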