Kickstart Your ML Journey: Scoping, Structuring, and Exploring Data (Part 1)
We will cover the following topics in this post:
- Understand the business problem
- Set up your working environment and directory layout
- Gather data (use multithreading for a 2x to 4x speedup)
- Pre-process data (use vectorization for a roughly 10x speedup)
- Gain valuable insights through EDA
- Build interactive visualizations (in Part 2 of this series)
- Finally use ML to answer questions (in Part 3 of this series)
- Extras: you will also learn how to modularize the code into independent and reusable components, as well as how to use abstraction.
Note: this post is intended for beginner to mid-level data scientists.
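To give a feel for why multithreading helps with data gathering, here is a minimal sketch using only the Python standard library. The `fetch_month` function and the file names are hypothetical stand-ins for a real download, and the `time.sleep` call only simulates network latency, so the timings it prints are illustrative rather than a benchmark. The actual speedup depends on the network and the data source.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_month(month: int) -> str:
    """Hypothetical stand-in for downloading one month of trip data."""
    time.sleep(0.1)  # simulate network latency (I/O-bound work)
    return f"trips_{month:02d}.csv"


months = range(1, 9)

# Sequential baseline: each download waits for the previous one to finish.
start = time.perf_counter()
sequential = [fetch_month(m) for m in months]
sequential_seconds = time.perf_counter() - start

# Threaded version: up to 4 downloads wait on the network at the same time.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(fetch_month, months))  # map preserves input order
threaded_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f}s, threaded: {threaded_seconds:.2f}s")
```

Threads suit this step because downloading is I/O-bound: most of the time is spent waiting on the network, so several requests can overlap. For CPU-bound work such as pre-processing, threads help far less in Python, which is where vectorization comes in.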
Almost all data science and ML projects start with a business problem. So let’s first define the problem we are trying to solve.
Say, you work for a taxi service company in NYC and your team is trying to…