

Showing posts from November, 2019

Real-world data science vs Hackathons

When I was new to Data Science, I used to participate in hackathons to learn new techniques for solving Data Science problems. Hackathons run for a short period of time, and we can learn a lot quickly while trying to improve our score or rank. Real-world problems, however, are completely different from hackathons. Data science enthusiasts who don’t have real-world experience often think that both are the same. I was one of them at some point. So, I wrote this article to explain the differences between real-world problems and hackathons. Hackathons are very structured. The time period for solving the problem is fixed, and there is a leaderboard to check where we stand. The data and the evaluation metric are provided by the organizers, so the participants can focus only on feature engineering and modelling. Most of the time, we follow 5 steps during hackathons. Preprocess data and create features Build a suitable model Make submissi

DBSCAN - Density-Based Spatial Clustering of Applications with Noise

DBSCAN is one of the most popular clustering algorithms after the K-means clustering algorithm. It is very good at finding arbitrarily shaped clusters such as squares and rectangles. K-means is effective mainly for roughly spherical clusters, which is one of its main disadvantages. In this article, we will see how DBSCAN works. DBSCAN is a density-based, non-parametric clustering algorithm. As the name says, it clusters the data based on density, i.e., points that lie close together in a dense region form a cluster. The algorithm is also good at detecting outliers or noise. The main advantage of DBSCAN is that we need not choose the number of clusters. Before going into the depth of the algorithm, we should be familiar with a few terms that will be used to explain it. In K-means, k is a parameter that should be chosen by the user. Similarly, there are two parameters in DBSCAN, ϵ and MinPts. ϵ is the radius around a point and MinPts is the minimum number of points required within that radius ϵ. In DBSCAN
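To make the roles of ϵ and MinPts concrete, here is a minimal pure-Python sketch of density-based clustering. It is an illustration under stated assumptions, not the article's own implementation: the function names `region_query` and `dbscan`, the toy points, and the convention that a point counts as its own neighbour are all choices made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # points[i] has enough neighbours, so it starts a new cluster
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a dense point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # keep growing through dense points
    return labels


# Hypothetical toy data: two tight squares and one isolated point
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Note that the number of clusters is never passed in: it falls out of the density parameters, and the isolated point is reported as noise.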

PuLP - Linear Programming with Python

Are you wondering whether this would be useful for Data Science or not? The answer is “Yes”. In this article, you will learn how to use PuLP and where it can be used. When solving Data Science problems, we often reach a point where there are a lot of constraints and we are trying to maximize or minimize some quantity, such as a cost. Here, you can use PuLP to find an optimized solution. PuLP lets us write mathematical expressions in Python. There are also many commercial and open-source solvers available for solving the optimization problem. Let’s see how we can approach and solve a problem using PuLP. This involves a few steps: Understanding the problem Converting the problem into a mathematical expression Solving optimization Post-process to extract the solution In linear programming, the decision variables should be real-valued, and the objective and constraints should be linear. We can also do Integer program
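The steps listed above can be sketched on a small example. The problem below is a hypothetical toy production plan invented for illustration (the variable names, coefficients, and constraint labels are assumptions, not taken from the article), and it assumes the `pulp` package with its bundled CBC solver is installed.

```python
import pulp

# Steps 1 and 2: state the (hypothetical) problem as a mathematical expression.
# Maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x <= 3, x >= 0, y >= 0.
prob = pulp.LpProblem("toy_production_plan", pulp.LpMaximize)

# Decision variables are real-valued (continuous) and non-negative
x = pulp.LpVariable("x", lowBound=0)
y = pulp.LpVariable("y", lowBound=0)

# The first expression added to the problem is the objective
prob += 3 * x + 2 * y, "total_profit"

# Linear constraints
prob += x + y <= 4, "labour_hours"
prob += x + 3 * y <= 6, "raw_material"
prob += x <= 3, "demand_cap_x"

# Step 3: solve with the bundled CBC solver, suppressing its log output
prob.solve(pulp.PULP_CBC_CMD(msg=False))

# Step 4: post-process to extract the solution
status = pulp.LpStatus[prob.status]
solution = {v.name: v.varValue for v in prob.variables()}
objective = pulp.value(prob.objective)
print(status, solution, objective)
```

Checking the corner points of the feasible region by hand gives the same answer: the objective is largest at x = 3, y = 1, where it equals 11.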