Efficient algorithms are indispensable in large-scale ML applications. In recent years, the ML community has not only been a large consumer of what the optimization literature has to offer, but has also acted as a driving force in the development of new algorithmic tools. The challenges of massive data and efficient implementations have led to many cutting-edge advances in optimization.
The goal of this workshop is to bring practitioners and theoreticians together and to stimulate exchange between experts from industry and academia. For practitioners, the workshop should give an idea of exciting new developments that they can *use* in their work. For theorists, it should provide a forum to discuss the practicality of assumptions and recent work, as well as potentially interesting open questions.
Welcome Remarks
13:30-13:35 January 27
Theory of neural networks training: challenges and recent results
13:35-14:15 January 27 · with Lénaïc Chizat
Theory Vs. Practice – It’s a Data Problem
14:15-14:55 January 27 · with Claudiu Musat
Coffee Break
14:55-15:10 January 27
Bayesian Hyperparameter Optimization for Automated Machine Learning
15:10-15:50 January 27 · with Aaron Klein
Multilingual word alignment
15:50-16:30 January 27 · with Armand Joulin
Lénaïc Chizat: Theory of neural networks training: challenges and recent results
CNRS researcher in Orsay (France)
The current successes achieved by neural networks are mostly driven by experimental exploration of various architectures, pipelines, and hyper-parameters, guided by intuition rather than precise theory. Focusing on the optimization/training aspect, we will see in this talk why pushing theory forward is challenging, but also why it matters and which key insights it may lead to. Along the way, we will present some recent results on the role of over-parameterization, on the phenomenon of "lazy training", and on training neural networks with a single hidden layer.
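As a toy illustration of the "lazy training" phenomenon mentioned in the abstract (not code from the talk), the sketch below trains single-hidden-layer ReLU networks of increasing width on an invented regression task and reports how far the hidden weights move from their initialization; with the common 1/sqrt(m) output scaling, the relative movement shrinks as the width m grows. The data, widths, and step sizes are all assumptions made for the example.

```python
# Toy sketch (assumption, not from the talk): the lazy regime in a
# single-hidden-layer ReLU network. As the width m grows, gradient descent
# fits the data while the hidden weights barely move from initialization.
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 5
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)   # synthetic regression target

def train(m, steps=1000, lr=0.5):
    W0 = rng.standard_normal((m, d))           # hidden weights at initialization
    a = rng.choice([-1.0, 1.0], size=m)        # fixed +-1 output weights
    W = W0.copy()
    for _ in range(steps):
        pre = X @ W.T                          # (n, m) pre-activations
        h = np.maximum(pre, 0.0)               # ReLU features
        out = h @ a / np.sqrt(m)               # 1/sqrt(m) output scaling
        r = out - y                            # residuals
        # gradient of 0.5 * mean squared error with respect to W
        grad_W = ((r[:, None] * (pre > 0) * a[None, :]).T @ X) / (n * np.sqrt(m))
        W -= lr * grad_W
    loss = 0.5 * np.mean((np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m) - y) ** 2)
    move = np.linalg.norm(W - W0) / np.linalg.norm(W0)   # relative weight movement
    return loss, move

for m in (10, 100, 1000, 10000):
    loss, move = train(m)
    print(f"width={m:5d}  train loss={loss:.4f}  relative weight movement={move:.4f}")
```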
Claudiu Musat: Theory Vs. Practice – It’s a Data Problem
Director of Research, Data, Analytics & AI, Swisscom
Focusing on the creation of dialogue systems, the Swisscom ML research team has often been asked to improve systems that are currently in use. To an ML person, this is a dream case, as the starting expectation is that any ML system should beat hand-crafted rules. Moreover, after framing the problem correctly, we find that it is actually well studied and that solutions abound. The task is easy. In theory.
We find that the data available at the start of a project is often missing, in the wrong format, or simply insufficient. In this talk I will discuss possible solutions to low data availability and showcase several problems where traditional approaches fail. In practice.
Aaron Klein: Bayesian Hyperparameter Optimization for Automated Machine Learning
PhD student, Machine Learning Lab (Frank Hutter), University of Freiburg
Machine learning has recently achieved great successes in a wide range of practical applications, but the performance of the most prominent methods depends more strongly than ever on the correct setting of many internal hyperparameters. The best-performing models for many modern applications are getting ever larger and thus more computationally expensive to train, yet both researchers and practitioners want to set as many hyperparameters automatically as possible. Automated machine learning (AutoML) is a new research area that targets the progressive automation of machine learning. One of its success stories is Bayesian hyperparameter optimization, which aims to find the best hyperparameter setting for a given machine learning algorithm.
In this talk, I will show how Bayesian optimization can be efficiently used for hyperparameter optimization. I will also present recent advances that speed up the optimization process by exploiting cheap approximations of the objective function, such as the performance when running on a subset of data or the learning curves of iterative machine learning algorithms.
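To make the basic idea concrete, here is a minimal sketch of a Gaussian-process-based Bayesian optimization loop with an expected-improvement acquisition function, tuning a single hyperparameter; this is a generic textbook-style loop, not the speaker's own tooling, and the objective function, search range, and iteration counts are invented placeholders.

```python
# Minimal Bayesian-optimization sketch for one hyperparameter
# (assumption: generic GP + expected improvement, not the talk's method).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(log10_c):
    # Stand-in for an expensive validation error as a function of a
    # regularization strength C = 10**log10_c (hypothetical shape).
    return (log10_c - 0.5) ** 2 + 0.1 * np.sin(5 * log10_c)

bounds = (-3.0, 3.0)
X_obs = list(np.random.uniform(*bounds, size=3))       # small random initial design
y_obs = [objective(x) for x in X_obs]

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
candidates = np.linspace(*bounds, 500).reshape(-1, 1)   # dense candidate grid

for _ in range(15):                                      # 15 BO iterations
    gp.fit(np.array(X_obs).reshape(-1, 1), np.array(y_obs))
    mu, sigma = gp.predict(candidates, return_std=True)
    best = min(y_obs)
    imp = best - mu                                      # improvement (minimization)
    z = imp / np.maximum(sigma, 1e-9)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)         # expected improvement
    x_next = float(candidates[np.argmax(ei), 0])         # most promising point
    X_obs.append(x_next)
    y_obs.append(objective(x_next))

print("best log10(C):", X_obs[int(np.argmin(y_obs))], "value:", min(y_obs))
```

The cheap approximations mentioned in the abstract (subsets of data, partial learning curves) would slot into such a loop as additional, lower-cost evaluations of the objective.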
Armand Joulin: Multilingual word alignment
Research scientist, Facebook Artificial Intelligence Research
We consider the problem of aligning continuous word representations, learned in multiple languages, to a common space. It was recently shown that, in the case of two languages, it is possible to learn such a mapping. This talk will present several recent approaches for bilingual alignment that work with and without supervision, as well as an extension to the problem of jointly aligning multiple languages to a common space.
In the presence of supervision, we show that this problem can be cast as a retrieval problem with a convex formulation, leading to significant improvements over the state of the art. In the absence of supervision, we propose an approach based on optimal transport, with theoretical guarantees and competitive empirical performance.
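For readers unfamiliar with the supervised bilingual setting, the sketch below shows the classic orthogonal-Procrustes baseline: given a seed dictionary of word pairs, it learns an orthogonal map from source to target embeddings via an SVD. This is a standard baseline, not necessarily the convex retrieval formulation of the talk, and the embedding matrices here are random placeholders.

```python
# Orthogonal Procrustes alignment of two embedding spaces (a classic baseline
# for supervised bilingual alignment; placeholder data, not the talk's method).
import numpy as np

rng = np.random.default_rng(0)
d, n_pairs = 300, 5000
X = rng.standard_normal((n_pairs, d))          # source-language vectors of dictionary pairs
true_Q = np.linalg.qr(rng.standard_normal((d, d)))[0]
Y = X @ true_Q + 0.01 * rng.standard_normal((n_pairs, d))   # noisy target-language vectors

# Solve min_W ||X W - Y||_F subject to W orthogonal, via the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

print("relative alignment error:", np.linalg.norm(X @ W - Y) / np.linalg.norm(Y))
# Translation then proceeds by nearest-neighbour (or CSLS) retrieval of X @ W
# in the target embedding space.
```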
Intermediate level
The workshop will not discuss high-level aspects of ML or data processing, but will rather focus on core components that are essential for actual implementations. Hence, participants should ideally be familiar with the main optimization algorithms used in ML and with the main challenges arising in their implementation.