Abstract
We propose a novel probabilistic image selection method for the Web image-gathering system we proposed previously. That system employs two-step processing: (1) gather the HTML files of Web pages related to given keywords, analyze them, and fetch only the Web images expected to be highly related to the keywords; (2) select only the relevant images from the gathered images by image-feature-based clustering. In this paper, we propose building a generative model based on the Gaussian mixture model to represent the distribution of the image features of images related to the given keywords, and using this model to select images in place of step (2). We call the new system ``Probabilistic Image Collector''. We demonstrate the effectiveness of the proposed system through experimental results.
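
The idea of selecting images with a Gaussian mixture model can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper, and it rests on several assumptions that the abstract does not state: each image is reduced to a single feature vector, the mixture uses diagonal covariances fitted by plain EM, and an image is kept when its log-likelihood reaches the 5th percentile of the training log-likelihoods. The function names (`fit_gmm`, `log_likelihood`, `select_images`) and the synthetic data are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of every row of X under every diagonal Gaussian, shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]                     # (n, k, d)
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def log_likelihood(X, weights, means, variances):
    """Per-sample log-likelihood under the mixture (log-sum-exp over components)."""
    lp = log_gauss_diag(X, means, variances) + np.log(weights)   # (n, k)
    m = lp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(lp - m).sum(axis=1, keepdims=True))).ravel()

def fit_gmm(X, k=2, n_iter=50, seed=0, eps=1e-6):
    """Fit a k-component diagonal-covariance GMM with EM (assumed setup)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)]              # random initial means
    variances = np.tile(X.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each sample
        lp = log_gauss_diag(X, means, variances) + np.log(weights)
        lp -= lp.max(axis=1, keepdims=True)
        resp = np.exp(lp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        diff2 = (X[:, None, :] - means[None, :, :]) ** 2
        variances = np.einsum('nk,nkd->kd', resp, diff2) / nk[:, None] + eps
    return weights, means, variances

def select_images(features, model, threshold):
    """Return indices of images whose log-likelihood reaches the threshold."""
    return np.flatnonzero(log_likelihood(features, *model) >= threshold)

# Synthetic stand-in for feature vectors of images assumed to be relevant.
rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, size=(200, 2))
model = fit_gmm(train, k=2)

# Assumed acceptance rule: 5th percentile of the training log-likelihoods.
threshold = np.percentile(log_likelihood(train, *model), 5)

# Candidates 0 and 2 lie near the training cluster; 1 and 3 lie far from it.
candidates = np.array([[0.1, -0.2], [10.0, 10.0], [0.5, 0.3], [-9.0, 8.0]])
selected = select_images(candidates, model, threshold)
print(selected)
```

In this sketch the threshold controls the trade-off between precision and recall of the selected set: a higher percentile discards more borderline images. The number of components and the acceptance rule are free choices here, not values taken from the proposed system.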