Close the Gap between Proof-of-Concept and Data Science Product

Workshop / Overview

Find the workshop material on GitHub: https://github.com/versatile-data-kit-amld/workshop/blob/main/README.md

Enterprises today invest heavily in data analytics and data science. A well-recognized problem is that moving a project from proof-of-concept to production is slow and usually requires the involvement of multiple teams across the organization. Versatile Data Kit is an open-source project that brings DevOps best practices to the data world.

Participants will get hands-on experience with a modern data engineering tool that lets data practitioners ingest new data, transform raw data into business-meaningful KPIs, perform feature engineering, and build data science models. The tool also lets them move predictive models from proof-of-concept to production in a self-service manner.

During the workshop, participants will perform feature engineering, build their own predictive models, and move them to production.

We will introduce the main challenges of moving a predictive model to production, as well as DevOps best practices for the data world that any data user can easily adopt. We will start with data acquisition. Next, we will perform feature engineering and run a predictive model. Once the model is tested, we will deploy and schedule it and set up monitoring for our data science product.

Workshop / Outcome

Attendees will learn how to move their data science projects, together with the underlying data processing and feature engineering, to production. They will be introduced to some well-established DataOps practices.

Attendees will gain hands-on experience applying these practices using a new open-source tool called Versatile Data Kit.

Workshop / Difficulty

Beginner level

Workshop / Prerequisites

Each attendee is expected to take an active part in feature engineering and in developing a predictive model. Attendees who prefer to share a laptop may form small teams. A recent laptop is recommended, but no GPU is required.

Attendees should have some knowledge of Python and/or SQL and a basic understanding of the data preparation process.

Workshop repo: https://github.com/versatile-data-kit-amld/workshop/blob/main/README.md

Track / Co-organizers

Dimira Petrova

Supervisor Data Analytics, VMware

Antoni Ivanov

VMware

Dako Dakov

Manager R&D, VMware


AMLD EPFL 2022 Close the Gap between Proof-of-Concept and Data Science Product