The perceptual system estimates distal conditions from proximal sensory input. It typically exploits information from multiple cues across and within modalities: it estimates shape from visual and haptic cues; it estimates depth from convergence, binocular disparity, motion parallax, and other visual cues; and so on. Bayesian models illuminate the computations through which the perceptual system combines sensory cues. I review key aspects of these models. Based on my review, I argue that we should posit co-referring perceptual representations corresponding to distinct sensory cues. For example, the perceptual system represents a distal size using one representation canonically linked with vision and a distinct representation canonically linked with touch. Distinct co-referring perceptual representations have the same denotation, but they represent it under different modes of presentation. Bayesian cue combination models demonstrate that psychological explanation of perception should attend to mode of presentation and not simply to denotation.