Posts

Machine Learning Algorithms - Overview

Linear Regression
Type: Supervised Learning
Target Attribute: Continuous variable
Pre-processing
    Remove or replace the NULL and NA values.
    Check for outliers and replace them.
    Divide the data into train and test data sets.
    Check for multicollinearity.
    Convert the categorical variables to numeric variables.
    Use feature selection techniques to select only the important features:
        Forward selection
        Backward selection
        Hybrid feature selection
Build linear regression model (without regularization)
Metrics to consider for evaluation
    R square value: the proportion of the variance in the target variable that is explained by the model
    Adjusted R square: R square adjusted for the number of features
    RMSE (Root Mean Squared Error): the square root of the mean squared difference between the actual and predicted target variable
    Mean Absolute Error
    Mean Squared Error
    AIC and BIC values
Residual An
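The evaluation metrics listed above can be computed by hand, which helps to see what each one measures. The following is a minimal sketch in plain Python, not the original post's code: the function name `regression_metrics`, the sample values and the choice of one feature are assumptions made for illustration.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Return common regression metrics for actual vs. predicted values.

    n_features is the number of predictors used by the model; it is only
    needed for the adjusted R square.
    """
    n = len(y_true)
    residuals = [actual - predicted for actual, predicted in zip(y_true, y_pred)]

    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((actual - mean_y) ** 2 for actual in y_true)  # total sum of squares

    mse = ss_res / n
    r2 = 1 - ss_res / ss_tot
    # Adjusted R square penalises additional features.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": r2,
        "adj_r2": adj_r2,
    }


# Illustrative values only (not taken from any real dataset).
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.5, 7.0, 8.0, 11.5]
metrics = regression_metrics(y_true, y_pred, n_features=1)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In practice a library such as scikit-learn would normally be used for these calculations; the hand-written version is only meant to make the definitions concrete.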

Ubuntu Cheat Sheet

Note: This blog is still under construction.
This blog lists some of the common commands and tasks that you would perform every day on an Ubuntu machine. It is mainly for beginners who have just moved from Windows to Ubuntu.
How to list the installed software in Ubuntu?
    dpkg --list
How to install a package?
    sudo apt-get install <package_name>
How to completely remove a package from an Ubuntu system?
    sudo apt-get purge <package_name>
xkill

Machine Learning Basics

In this blog I give an overview of the Machine Learning project flow. Every Machine Learning project involves the steps below:
Understand the client requirement / problem statement
Data Understanding
    Data Collection (CSV files, logs, sensor data, data from SQL, etc.)
    Data Exploration
    Data Quality Analysis: Analyse the data to confirm that sufficient information is available to prepare the plan for building the ML model.
Data Preparation
    Cleaning the data: Check for NULL and NA values in the dataset and take the necessary action.
        a. Remove them if the dataset is huge and removing samples doesn't affect the quality of the data.
        b. Impute the missing values with the mean, the median or KNN.
    Outliers: Might be due to human error. These can be checked using a boxplot or the summary statistics of the data. Remove or replace accordingly.
    Sample Distribution: Check how the features are distributed using a histogram.
    Divide the data into train and test data sets.
    Feature Selection :  Thi
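The data preparation steps above (imputing missing values, checking for outliers, splitting into train and test sets) can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not the post's own code: the helper names, the `ages` sample, the 1.5 × IQR rule (the same rule a boxplot uses for its whiskers) and the 80/20 split are all choices made here for the example.

```python
import random
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outlier_bounds(values, k=1.5):
    """Return (low, high) bounds; values outside them are flagged as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical feature column with two missing values and one likely typo (120).
ages = [23, 25, None, 31, 29, 27, 120, 26, None, 30]

filled = impute_median(ages)
low, high = iqr_outlier_bounds(filled)
cleaned = [v for v in filled if low <= v <= high]   # here outliers are removed
train, test = train_test_split(cleaned)

print("bounds:", low, high)
print("train:", train)
print("test:", test)
```

Whether to remove an outlier or replace it depends on the dataset; removal is used here only because it keeps the example short.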