In decentralized learning, the goal is to efficiently leverage many user devices or compute nodes to speed up the training of machine learning models while preserving data privacy. Each participant keeps control over its own local training data and communicates only with a small number of neighbors, without a central coordinator. This makes it possible to train models in a federated learning setting, directly on users' devices, without revealing any sensitive user data to other nodes. The main bottleneck in decentralized (and other distributed) training methods is communication time: communication between machines is often slow, and the number of trainable parameters is usually huge (modern neural networks often have several billion weights). In this talk, I will discuss decentralized optimization techniques, and in particular Choco-SGD, a recent algorithm that improves the communication efficiency of decentralized methods, which is especially important in low-bandwidth networks. Choco-SGD can drastically reduce communication cost using arbitrary communication compression, while still converging at the same rate as decentralized SGD with full (uncompressed) communication. It is the first decentralized SGD algorithm with provable convergence under arbitrarily high communication compression.
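To make the idea concrete, here is a minimal NumPy sketch of the Choco-SGD update on a toy problem. The setup is my own for illustration: a ring of nodes, local quadratic objectives, and top-k sparsification as one example of a compression operator (the algorithm itself allows an arbitrary compressor). Each node takes a local gradient step, broadcasts only a compressed update of its model, and then performs a gossip (averaging) step on the publicly shared estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k(v, k):
    """Keep only the k largest-magnitude entries of v (one possible
    compression operator; Choco-SGD supports arbitrary compression)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

# Toy problem: node i holds f_i(x) = 0.5 * ||x - b_i||^2, so the global
# optimum of the average objective is the mean of the b_i.
n, d, k = 8, 10, 2                # 8 nodes, 10 parameters, 80% sparsification
b = rng.normal(size=(n, d))
opt = b.mean(axis=0)

# Symmetric, doubly stochastic mixing matrix for a ring topology.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.zeros((n, d))              # local models x_i
x_hat = np.zeros((n, d))          # publicly shared compressed estimates
eta, gamma = 0.01, 0.1            # SGD step size, consensus step size

for _ in range(4000):
    x = x - eta * (x - b)         # local gradient step: grad f_i(x_i) = x_i - b_i
    # Each node compresses the change of its model and broadcasts it;
    # everyone updates the shared estimates from these compressed messages.
    q = np.stack([top_k(x[i] - x_hat[i], k) for i in range(n)])
    x_hat = x_hat + q
    # Gossip on the shared estimates: x_i += gamma * sum_j W_ij (xhat_j - xhat_i).
    x = x + gamma * (W @ x_hat - x_hat)

# The network average of the models converges to the global optimum;
# individual nodes remain in a neighborhood that shrinks with the step size.
print("error of averaged model:", np.linalg.norm(x.mean(axis=0) - opt))
print("max disagreement across nodes:", np.abs(x - x.mean(axis=0)).max())
```

Note that each round transmits only k of the d coordinates per node (here 80% savings), yet the averaged model still reaches the optimum; this is the communication/accuracy trade-off the convergence guarantee is about.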