When I was new to data science, I took part in hackathons to learn new techniques for solving data science problems. Hackathons run for a short, fixed period, and chasing a better score or rank is a way to learn a lot in little time.
Real-world problems are completely different from hackathons. Data science enthusiasts who don’t have real-world experience often think the two are the same, and I was one of them at some point. So I wrote this article to explain the differences between real-world problems and hackathons.
Hackathons are very structured. The time period for solving the problem is fixed, and a leaderboard shows where we stand. The organizers provide the data and the evaluation metric, so participants can focus solely on feature engineering and modelling. Most of the time, we follow these 5 steps during hackathons.
- Preprocess data and create features
- Build a suitable model
- Make submission
- Check leaderboard
- Repeat the above 4 steps.
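The loop above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the dataset is synthetic, the "model" is a hand-written one-feature least-squares fit, and a local validation score stands in for the leaderboard. The names `make_data`, `fit_line`, `rmse` and `feature_sets` are made up for this example.

```python
import random
from statistics import mean


def make_data(n=200, seed=0):
    """Synthetic stand-in for a hackathon dataset: y depends on x squared."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-3, 3)
        rows.append((x, x * x + rng.gauss(0, 0.1)))
    return rows


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b for a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


def rmse(preds, ys):
    """Root mean squared error, used here as the evaluation metric."""
    return mean((p - y) ** 2 for p, y in zip(preds, ys)) ** 0.5


# Candidate feature transformations to try, one per iteration of the loop.
feature_sets = {"raw": lambda x: x, "squared": lambda x: x * x}

data = make_data()
train, valid = data[:150], data[150:]

scores = {}
for name, transform in feature_sets.items():
    # 1. Preprocess data and create features
    x_train = [transform(x) for x, _ in train]
    x_valid = [transform(x) for x, _ in valid]
    # 2. Build a model
    a, b = fit_line(x_train, [y for _, y in train])
    # 3. "Make a submission": predict on held-out rows
    preds = [a * x + b for x in x_valid]
    # 4. "Check the leaderboard": a local validation score stands in for it
    scores[name] = rmse(preds, [y for _, y in valid])
    # 5. Repeat with the next feature idea

best = min(scores, key=scores.get)
print(best, scores)
```

In a real hackathon the scoring step is done by the platform, but keeping a local validation split like this helps decide which submissions are worth making.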
For a data science newbie, hackathons are a great way to build knowledge. Each dataset is different, and the same methodology may not apply to every problem, so hackathons give us a sense of the different types of problems. At the end of each hackathon, we can compare our approach with those of the top rankers on the leaderboard and try to improve next time. Two of the most popular online platforms for hackathons are Kaggle and Analytics Vidhya.
Hackathons are also a great opportunity for learning and networking. We can apply state-of-the-art approaches, earn money and get recognized in the data science community.
Unlike hackathons, real-world problems are very unstructured. First of all, we have to formulate the problem. For many problems, data may not be available. Even if it is available, it is often unstructured and noisy. Sometimes the raw data runs to gigabytes or terabytes, and processing it and creating features can by itself take months. Domain knowledge plays an important role in creating features.
In hackathons, we are given the independent features and the target variable, so we can start modelling right away. In real-world problems, problem-solving plays a key role. Knowing how to use data science tools is not enough to solve them. Arriving at a proper solution requires a lot of discussion and experimentation. In the end, we should be able to prove that the proposed solution will work.
At a high level, a real-world data science/ML pipeline looks like this:
- Understanding the business problem
- Formulating the problem
- Collecting or creating data
- Preprocessing data and creating features
- Modelling
- Evaluating the solution on real scenarios
- Model deployment
Model deployment is the last stage of the pipeline. If we don’t deploy the model, users cannot use our product. The deployment should also be able to handle heavy traffic as the number of users grows. In some companies a separate team handles deployment, and in others data scientists do it themselves. So it’s better to have a basic knowledge of how it works.
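To give a rough idea of what deployment involves, here is a minimal sketch of the core of a prediction service: a function that takes a JSON request and returns a JSON response. Everything in it is an assumption for illustration: the hard-coded `MODEL`, the `features` field and the `handle_request` name are hypothetical, and a real service would load a trained model and sit behind a web framework with logging, monitoring and scaling.

```python
import json

# Hypothetical trained model: a linear model with hard-coded coefficients.
MODEL = {"weights": [0.5, -1.0], "bias": 2.0}


def predict(features):
    """Apply the linear model to one feature vector."""
    return sum(w * x for w, x in zip(MODEL["weights"], features)) + MODEL["bias"]


def handle_request(body):
    """Turn a JSON request body into a JSON response, as a web endpoint would."""
    try:
        payload = json.loads(body)
        features = payload["features"]
        if not isinstance(features, list) or len(features) != len(MODEL["weights"]):
            raise ValueError("expected a list of %d features" % len(MODEL["weights"]))
        features = [float(x) for x in features]
    except (ValueError, KeyError, TypeError) as exc:
        # Bad input should produce an error response, not crash the service.
        return json.dumps({"error": str(exc)})
    return json.dumps({"prediction": predict(features)})


print(handle_request('{"features": [2, 1]}'))
print(handle_request("not json"))
```

Unlike a hackathon submission, a deployed model receives whatever users send, so validating the input matters as much as the prediction itself.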
I hope you got an overview of how real-world data science problems differ from hackathons. As I said, hackathons don’t give you a chance to formulate the problem, deploy the model or test it. If you have any questions, leave them in the comments section below. I would be more than happy to answer them.
Thank you for reading my blog and supporting me. Stay tuned for my next article. If you want to receive email updates, don’t forget to subscribe to my blog. Keep learning and sharing!!
Follow me here:
GitHub: https://github.com/Abhishekmamidi123
LinkedIn: https://www.linkedin.com/in/abhishekmamidi/
Kaggle: https://www.kaggle.com/abhishekmamidi
If you are looking for any specific blog, please do comment in the comment section below.