Despite recent advances, goal-directed generation of molecules remains challenging due to the discrete nature of the data. In practice, expensive heuristic search or reinforcement learning algorithms are often employed. In the first part of the work, we investigate the use of conditional generative models that directly attack this inverse problem by modeling the distribution of molecules given properties of interest. Unfortunately, maximum likelihood training of such models often fails, with samples from the generative model failing to respect the input properties. To address this, we introduce a novel approach that directly optimizes a reinforcement learning objective, maximizing an expected reward. We avoid the high-variance score-function estimators that would otherwise be required by sampling from an approximation to the normalized reward distribution, allowing simple Monte Carlo estimation of model gradients. We test our methodology on generating molecules with a set of user-defined properties and find improvements over maximum likelihood estimation and other baselines. In the second half of the work, we investigate the advantages and disadvantages of latent-variable models versus autoregressive models for the de novo molecule design task, and propose potential ways to combine the best of both worlds.
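Below is a minimal sketch of one plausible reading of the gradient estimator described above: instead of a REINFORCE-style score-function estimator of the expected-reward gradient, candidate molecules are drawn from the current conditional model, their rewards are self-normalized so the samples approximate the normalized reward distribution, and a simple reward-weighted log-likelihood gradient is taken. The `model.sample`, `model.log_prob`, and `reward` interfaces are hypothetical placeholders, not part of any particular library or the authors' released code.

```python
import torch

def expected_reward_grad_step(model, optimizer, props, reward, n_samples=64):
    """One illustrative update step; all interfaces are assumed, not prescribed."""
    # 1) Draw candidate molecules from the current conditional model p_theta(x | props).
    with torch.no_grad():
        samples = model.sample(props, n_samples)
        r = torch.tensor([reward(x, props) for x in samples], dtype=torch.float32)

    # 2) Self-normalize rewards so the drawn samples approximate the
    #    normalized reward distribution q(x) proportional to R(x) p_theta(x | props).
    w = r / r.sum()

    # 3) Simple Monte Carlo gradient: a reward-weighted log-likelihood,
    #    in place of the high-variance score-function (REINFORCE) estimator
    #    grad E[R] = E[R * grad log p_theta(x | props)].
    log_p = model.log_prob(samples, props)
    loss = -(w * log_p).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, samples with higher reward contribute more to the likelihood term, which pushes the conditional model toward molecules that better satisfy the requested properties without differentiating through the sampling step.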
Download the slides for this talk (PDF, 1172.04 MB).