Online Learning to Rank (OL2R) eliminates the need for explicit relevance annotations by directly optimizing rankers from their interactions with users on the fly. However, the exploration it requires drives it away from the successful practices of offline learning to rank, which limits OL2R's empirical performance and practical applicability.
Rather than adding to the numerous but sub-optimal existing OL2R algorithms, we take a unique perspective: converting offline learning to rank algorithms into online ones, which lets us directly leverage the best practices from the past twenty years of development in offline learning to rank to solve the OL2R problem. In this talk, I will discuss our recent effort to learn a neural ranking model from user clicks online. We prove that, under standard assumptions, our neural OL2R solution attains a gap-dependent upper regret bound of O(\log^2(T)), where regret is defined as the total number of mis-ordered pairs over T rounds. Empirically, it also outperforms a rich set of state-of-the-art OL2R baselines on two large benchmark datasets.
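To make the regret notion concrete, the pairwise regret described above can be sketched as follows; the notation (ranker f_t, query q_t, candidate pair (i, j)) is illustrative and not taken from the abstract itself:

```latex
% Cumulative regret as the expected number of mis-ordered pairs over T rounds:
% f_t is the ranker deployed at round t, q_t the served query, and the inner sum
% runs over candidate pairs (i, j) whose true relevance satisfies rel(i) > rel(j).
R_T = \mathbb{E}\Big[ \sum_{t=1}^{T} \sum_{(i,j):\, \mathrm{rel}(i) > \mathrm{rel}(j)} \mathbb{1}\big[ f_t(q_t, i) < f_t(q_t, j) \big] \Big]
```

Under this definition, the stated gap-dependent bound R_T = O(\log^2(T)) means the ranker mis-orders relevant pairs at a rate that vanishes as interactions accumulate.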