Information theory has guided practical communication system design by characterizing the fundamental limits of data communication and compression. This talk will discuss how methodologies originating from information theory can provide similar benefits in learning problems. We show that information-theoretic tools can be used to understand the generalization behavior of learning algorithms, i.e., how a trained machine learning model behaves on unseen data.
We provide an exact characterization of the generalization error for the Gibbs algorithm, which can be viewed as a randomized empirical risk minimization algorithm. We show that the generalization error of the Gibbs algorithm is equal to the symmetrized Kullback-Leibler (KL) information between the input training samples and the output model weights. Such an information-theoretic approach is versatile, as we can also characterize the generalization error of Gibbs-based transfer learning algorithms using the conditional symmetrized KL information. We believe this analysis can guide the choice of transfer learning algorithms and the design of machine learning systems in practice.
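The identity for the Gibbs algorithm can be checked numerically on a small discrete problem. The sketch below is illustrative only and is not taken from the talk. It assumes the common parametrization in which the Gibbs posterior is proportional to prior(w) * exp(-gamma * L_E(w, s)), where gamma is the inverse temperature and L_E is the empirical risk; under that assumption the identity reads gen = I_SKL(W; S) / gamma. The function name `gibbs_check`, the alphabet sizes, and the random loss table are all hypothetical choices made for the example.

```python
import itertools
import numpy as np

def gibbs_check(gamma=2.0, n=2, seed=0):
    """Compare generalization error with symmetrized KL information (toy case)."""
    rng = np.random.default_rng(seed)
    num_z, num_w = 3, 4                       # assumed sizes of data and hypothesis alphabets
    p_z = rng.dirichlet(np.ones(num_z))       # data distribution over single samples
    prior = rng.dirichlet(np.ones(num_w))     # prior over hypotheses
    loss = rng.random((num_w, num_z))         # loss[w, z], arbitrary values in [0, 1)

    # Enumerate every dataset of n i.i.d. samples and its probability.
    datasets = list(itertools.product(range(num_z), repeat=n))
    p_s = np.array([np.prod(p_z[list(s)]) for s in datasets])
    # emp_risk[s, w] is the average loss of hypothesis w on dataset s.
    emp_risk = np.array([loss[:, list(s)].mean(axis=1) for s in datasets])

    # Gibbs posterior P(w | s), normalized over hypotheses for each dataset.
    unnorm = prior * np.exp(-gamma * emp_risk)
    p_w_given_s = unnorm / unnorm.sum(axis=1, keepdims=True)

    joint = p_s[:, None] * p_w_given_s        # P(s, w)
    p_w = joint.sum(axis=0)                   # marginal P(w)
    product = p_s[:, None] * p_w[None, :]     # P(s) P(w)

    # Expected generalization error: population risk minus empirical risk.
    pop_risk = loss @ p_z
    gen = (joint * (pop_risk[None, :] - emp_risk)).sum()

    # Symmetrized KL information = mutual information + lautum information.
    log_ratio = np.log(joint / product)
    mutual_info = (joint * log_ratio).sum()
    lautum_info = -(product * log_ratio).sum()
    return gen, mutual_info + lautum_info

gen, i_skl = gibbs_check(gamma=2.0)
print(gen, i_skl / 2.0)
```

The two printed numbers should agree up to floating-point error. The agreement follows because the prior and normalization terms in log P(w | s) cancel when the joint and product-of-marginals expectations are subtracted, leaving only the empirical-risk term scaled by gamma.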
Yuheng Bu received the B.S. degree (Hons.) in electrical engineering from Tsinghua University in 2014 and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign in 2019. Since September 2019, he has been a Postdoctoral Research Associate with the Institute for Data, Systems, and Society (IDSS) and the Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology. His research interests lie at the intersection of information theory, signal processing, and machine learning.