Abstract
The enormous scale of the information and products available on the Internet has necessitated the development of algorithms that intermediate between options and human users. These algorithms attempt to provide the user with relevant information. In doing so, an algorithm may incur negative consequences stemming from two competing needs: selecting items about which it is uncertain, in order to obtain information about users, and selecting items about which it is certain, in order to secure high ratings. This tension is an instance of the exploration–exploitation trade‐off in the context of recommender systems. Because humans are in this interaction loop, the long‐term trade‐off behavior depends on human variability. Our goal is to characterize the trade‐off behavior as a function of the human variability fundamental to such human–algorithm interaction. To this end, we first introduce a unifying model that smoothly transitions between active learning and recommending relevant information. The unifying model gives us access to a continuum of algorithms along the exploration–exploitation trade‐off. We then present two experiments that measure the trade‐off behavior under two very different levels of human variability. The experimental results inform a thorough simulation study in which we model and vary human variability systematically over a wide range. The main result is that the exploration–exploitation trade‐off grows in severity as human variability increases, but there exists a regime of low variability in which algorithms balanced in exploration and exploitation can largely overcome the trade‐off.