### PuLP - Linear Programming with Python

Are you wondering whether this is useful for Data Science? The answer is “Yes”. In this article, you will learn how to use PuLP and where it can be applied.

When solving Data Science problems, we often reach a point where there are a lot of constraints and we are trying to maximize or minimize some quantity, such as cost. Here, you can use PuLP to find an optimized solution. PuLP lets us write mathematical optimization expressions in Python. Many commercial and open-source solvers are also available for solving such optimization problems.

Let’s see how we can approach and solve a problem using PuLP. This involves a few steps:
• Understanding the problem
• Converting the problem into a mathematical expression
• Solving the optimization
• Post-process to extract the solution
In linear programming, the decision variables are real-valued, and the objective and constraints must be linear functions of them. We can also do integer programming, in which some or all of the decision variables are restricted to integers.
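As a quick illustration (a minimal sketch, with variable names of my choosing), PuLP lets you declare both kinds of decision variables through the `cat` argument of `LpVariable`:

```
import pulp

# A continuous (real-valued) decision variable; 'Continuous' is the default category
x = pulp.LpVariable('x', lowBound=0)

# An integer decision variable, for integer programming
n = pulp.LpVariable('n', lowBound=0, cat='Integer')

print(x.cat, n.cat)  # Continuous Integer
```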

Now, we will discuss how to convert a problem into a linear program and solve it using PuLP. I will guide you through a simple example, which can be extended to more complicated problems.

For example, suppose you go to a shop that sells ten different chocolates. Your goal is to buy chocolates such that the total number of calories is maximized. The constraints are:
• You can buy only 5 chocolates
• The total cost should be less than or equal to 100
How do we solve this? In general, we take 10 variables, one referring to each chocolate type. After that, we define the above linear constraints and solve the optimization. Let's do the same thing step by step using PuLP.

Let's code it in Python using PuLP. We will start by importing libraries.

```
# Import libraries
import pulp
import pandas as pd
```

First of all, we initialize a PuLP problem object and choose whether we want to minimize or maximize the objective.

```
# Initialize PuLP problem object
optimization_model = pulp.LpProblem('MaximizeCalories', pulp.LpMaximize)
```

I have defined a few constants below. The number of chocolates is 10. I took some arbitrary costs and calorie values for the chocolates, in two separate lists.

```
# Constants
number_of_chocolates = 10
cost_of_each_chocolate = [10, 16, 20, 25, 22, 18, 35, 40, 40, 24]
calories_of_each_chocolate = [100, 85, 200, 165, 78, 45, 80, 105, 65, 120]
```

Now, we create 10 decision variables (c_0, c_1, c_2, c_3, c_4, c_5, c_6, c_7, c_8, c_9) that will be used to define the optimization function and constraints. Each decision variable refers to one chocolate. We store them in a list for easy access. Also, note that we define binary variables that can only be 0 (lowBound) or 1 (upBound), so that each chocolate is either rejected or selected. We can also create variables that hold real numbers; we choose the type based on the problem we are solving.

```
# Create decision variables
decision_variables = []
for i in range(number_of_chocolates):
    variable = pulp.LpVariable('c_' + str(i), lowBound=0, upBound=1, cat='Binary')
    decision_variables.append(variable)
```

After creating the decision variables, we define the optimization (objective) function and add it to the PuLP object (optimization_model). Our aim is to maximize the number of calories, so we multiply each decision variable by its respective calorie value. The optimization function looks like this: (c_0*v_0 + c_1*v_1 + c_2*v_2 + c_3*v_3 + c_4*v_4 + c_5*v_5 + c_6*v_6 + c_7*v_7 + c_8*v_8 + c_9*v_9), where "c" refers to a decision variable and "v" refers to its calorie value. After optimization, some of the decision variables will be one; the sum of their calorie values is the maximum.

```
# Optimization function
optimization_function = 0
for i in range(number_of_chocolates):
    optimization_function += decision_variables[i] * calories_of_each_chocolate[i]

optimization_model += optimization_function
```
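As an aside, the same objective can be built in one line with `pulp.lpSum`, which is the idiomatic way to sum many terms in PuLP (a sketch using the same constants as above, with names of my choosing):

```
import pulp

calories = [100, 85, 200, 165, 78, 45, 80, 105, 65, 120]
model = pulp.LpProblem('MaximizeCalories', pulp.LpMaximize)
chocolates = [pulp.LpVariable('c_' + str(i), cat='Binary') for i in range(10)]

# Objective: total calories of the selected chocolates
model += pulp.lpSum(chocolates[i] * calories[i] for i in range(10))
```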

It's time to add our constraints to the optimization model. The first constraint is "We can select only 5 chocolates". The mathematical expression of this constraint is (c_0 + c_1 + c_2 + c_3 + c_4 + c_5 + c_6 + c_7 + c_8 + c_9 = 5). This constraint makes sure that we select exactly 5 variables (chocolates).

The second constraint is "The cost should be less than or equal to 100". The mathematical form of this constraint is similar to the optimization function: (c_0*cost_0 + c_1*cost_1 + c_2*cost_2 + c_3*cost_3 + c_4*cost_4 + c_5*cost_5 + c_6*cost_6 + c_7*cost_7 + c_8*cost_8 + c_9*cost_9 <= 100), where "c" refers to a decision variable and "cost" refers to the price of that chocolate.

```
# Constraints
formula = 0
for i in range(number_of_chocolates):
    formula += decision_variables[i]
optimization_model += (formula == 5)

formula = 0
for i in range(number_of_chocolates):
    formula += cost_of_each_chocolate[i] * decision_variables[i]
optimization_model += (formula <= 100)
```

We have defined our optimization function and constraints. It's time to solve the optimization. We can do this by calling the solve method.

```
# Solve optimization
optimization_result = optimization_model.solve()
```

The solver status can be optimal, not solved, infeasible, unbounded or undefined. Let's check what we get. We will also print the maximum number of calories we can get.

```
print("Status:", pulp.LpStatus[optimization_model.status])
print("Total number of calories: ", pulp.value(optimization_model.objective))
```
Status: Optimal
Total number of calories: 670.0

The solution is optimal in this case. But when we solve complicated problems with a lot of constraints, we may end up with an "infeasible" status. In that case, we need to change either the optimization function or the constraints to make the problem feasible. It depends on the use case you are trying to solve.
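For example, here is a tiny model (contrived purely for illustration) whose two constraints contradict each other, so the solver reports an infeasible status:

```
import pulp

model = pulp.LpProblem('ImpossibleModel', pulp.LpMaximize)
x = pulp.LpVariable('x')
model += x          # objective
model += (x <= 1)   # constraint 1
model += (x >= 2)   # constraint 2: contradicts constraint 1

model.solve(pulp.PULP_CBC_CMD(msg=0))  # msg=0 silences the solver log
print(pulp.LpStatus[model.status])     # Infeasible
```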

```
# Let's see which decision variables are selected by the optimization
chocolate_is_selected = []
for var in optimization_model.variables():
    if var.varValue == 1:
        print(var, end=" ")
    chocolate_is_selected.append(var.varValue)
print("\n")
print(chocolate_is_selected)
```
c_0 c_1 c_2 c_3 c_9
[1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

The optimization selected the 5 decision variables that maximize the number of calories (i.e. 670 calories) while spending less than 100 rupees. Let's see how much we need to spend on those 5 chocolates.

```
total_cost = (pd.Series(cost_of_each_chocolate) * pd.Series(chocolate_is_selected)).sum()
print(total_cost)
```
95.0
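Since this toy instance has only 10 chocolates, we can also sanity-check the solver with a brute-force search over all 5-chocolate combinations (252 of them), using the same costs and calories as above:

```
from itertools import combinations

cost = [10, 16, 20, 25, 22, 18, 35, 40, 40, 24]
calories = [100, 85, 200, 165, 78, 45, 80, 105, 65, 120]

best_calories, best_combo = 0, None
for combo in combinations(range(10), 5):        # every way to pick 5 of 10
    if sum(cost[i] for i in combo) <= 100:      # budget constraint
        total = sum(calories[i] for i in combo)
        if total > best_calories:
            best_calories, best_combo = total, combo

print(best_calories, best_combo)  # 670 (0, 1, 2, 3, 9)
```

The brute force agrees with PuLP: chocolates 0, 1, 2, 3 and 9 give the maximum of 670 calories within the budget.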

This is the end of this article. I hope you enjoyed reading it and learnt something new. If you have any queries, ask in the comments section below. I would be more than happy to answer them.

Thank you for reading my blog and supporting me. Stay tuned for my next article. If you want to receive email updates, don’t forget to subscribe to my blog. Keep learning and sharing!!

GitHub: https://github.com/Abhishekmamidi123
Kaggle: https://www.kaggle.com/abhishekmamidi
If you are looking for any specific blog, please do comment in the comment section below.
