We will get hands-on experience with real data provided by VMware's data centers. Yes, it is real data, so we won’t get far without first taking care of it and massaging it a little. We will characterize the workload of multiple virtual machines based on a diverse set of performance measurements, like those generated by your OS's performance manager.
We will demonstrate how you can derive meaningful features that capture diverse aspects of the data. We will use them in several expressive data embedding models and arrive at a meaningful interpretation. The hardcore theory will be left behind; however, by the end of the workshop you will have developed an intuition for the algorithms we use and how they can make our data tell us a story.
As a participant, you will learn how to approach a real enterprise data science project. You will be able to:
- Organize multi time series data for further analysis
- Apply different time series imputation methods
- Properly derive robust descriptive statistics from noisy time series
- Extract more advanced features using time-series-specific methods such as the autocorrelation function (ACF) and spectral density
- Get a high-level overview of unsupervised algorithms such as t-SNE, UMAP, and DBSCAN, and use them to model the data described above
- Profile regions of the embedded data and interpret the results
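To give a feel for how these steps fit together, here is a minimal sketch on synthetic data. It is not the workshop's actual notebook: the VM names, the two workload types, the chosen features, and all parameter values (`perplexity`, `eps`, `min_samples`, the ACF lag) are assumptions made for illustration. For simplicity, DBSCAN is run on the scaled features rather than on the embedding, and t-SNE stands in for the embedding step.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: one usage column per VM,
# sampled every 5 minutes. All names and values are made up.
n_vms, n_steps = 30, 288
index = pd.date_range("2020-01-01", periods=n_steps, freq="5min")
t = np.arange(n_steps)
data = {}
for i in range(n_vms):
    # Two hypothetical workload types: (baseline, amplitude, period).
    base, amp, period = (30, 10, 24) if i < n_vms // 2 else (70, 25, 72)
    series = base + amp * np.sin(2 * np.pi * t / period) + rng.normal(0, 3, n_steps)
    series[rng.random(n_steps) < 0.05] = np.nan  # simulate missing samples
    data[f"vm_{i:02d}"] = series

# Organize: a wide table with a time index and one column per VM.
wide = pd.DataFrame(data, index=index)

# Impute: time-based linear interpolation, then fill any gaps at the edges.
filled = wide.interpolate(method="time").ffill().bfill()


def acf(x, lag):
    """Sample autocorrelation of x at a single positive lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


def dominant_period(x):
    """Period (in samples) of the strongest peak in the raw periodogram."""
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x))
    k = 1 + np.argmax(power[1:])  # skip the zero-frequency bin
    return float(1.0 / freqs[k])


def features(x):
    """Robust statistics plus two time-series-specific features."""
    median = np.median(x)
    mad = np.median(np.abs(x - median))  # median absolute deviation
    return {
        "median": median,
        "mad": mad,
        "acf_lag12": acf(x, 12),
        "period": dominant_period(x),
    }


# One row of features per VM.
feats = pd.DataFrame({vm: features(filled[vm].to_numpy()) for vm in filled}).T

# Scale, embed in 2D for plotting, and cluster.
X = StandardScaler().fit_transform(feats)
embedding = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(X)
labels = DBSCAN(eps=1.0, min_samples=3).fit_predict(X)

# Profile: typical feature values per cluster (label -1 would mean noise).
profile = feats.assign(cluster=labels).groupby("cluster").median()
print(profile)
```

The median and MAD are used instead of the mean and standard deviation because they are less sensitive to spikes, which is one reasonable reading of "robust" here. On real data, `eps` and `perplexity` would need tuning, and the 2D `embedding` would typically be plotted and colored by `labels` to inspect the regions.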
Beginner level
- Basic knowledge of ML and statistics
- Basic knowledge of Python
- R practitioners are also welcome
- Own laptop with a modern browser
- Google account (for using Colab)
- Repository: https://github.com/amld-vmware/AMLD_2020