Department of Chemistry - Seminar - Machine Learning Yield Prediction from NiCOlit, a Small-Size Literature Data Set of Nickel Catalyzed C–O Couplings

02:00pm - 03:30pm

Speaker: Dr. Jules SCHLEINITZ

Institution: École Normale Supérieure PSL, Paris

Hosted By: Professor Haibin SU

Co-Host: Professor Zhenyang LIN

Zoom Link:



Synthetic yield prediction using machine learning is an active area of research. Previous work has focused on two categories of data sets: high-throughput experimentation data, as an ideal case study, and data sets extracted from proprietary databases, which are known to have a strong reporting bias toward high yields. However, predicting yields from published reaction data remains elusive. To fill this gap, we built a data set of nickel-catalyzed cross-couplings extracted from organic reaction publications, including both substrate scope and optimization information. We demonstrate the importance of including optimization data as a source of failed experiments and emphasize how publication constraints shape the synthetic community's exploration of chemical space. While machine learning models still fail at out-of-sample prediction, this work shows that adding chemical knowledge enables reasonable predictions in a low-data regime. Ultimately, we hope that this unique public database will foster further improvements in machine learning methods for reaction yield prediction in a more realistic context.


About the speaker

Jules Schleinitz completed a bachelor's degree in chemistry and physics and a master's degree in theoretical chemistry at the École Normale Supérieure in Paris, where he was then recruited for a three-year PhD under a teaching contract. He will defend his PhD thesis, entitled "Mechanistic Analysis and Machine Learning", in October. In November he will start a postdoc in Computer-Assisted Synthesis in Sarah E. Reisman's group at Caltech.

Recommended For
PG students
Faculty and staff
Department of Chemistry