Machine Learning Algorithms - Overview
Linear Regression
Type : Supervised Learning
Target Attribute : Continuous variable
Pre-processing
-
Remove or replace the NULL and NA values.
-
Check for outliers and replace them.
-
Split the data into training and test data sets.
-
Check for multicollinearity.
-
Convert the categorical variables to numeric variables
-
Use feature selection techniques to select only the important features.
Forward selection
Backward selection
Hybrid feature selection
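Forward selection can be sketched as a greedy loop. The example below is a minimal illustration, assuming NumPy is available and using adjusted R square from an ordinary least-squares fit as the selection score. The function names, the synthetic data and the min_gain stopping threshold are illustrative assumptions, not a prescribed procedure.

```python
import numpy as np

def adjusted_r2(X, y):
    """Fit ordinary least squares with an intercept and return adjusted R square."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    ss_res = np.sum((y - A @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def forward_selection(X, y, min_gain=1e-3):
    """Greedily add the feature that most improves adjusted R square.

    min_gain is a hypothetical stopping threshold chosen for this sketch.
    """
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining:
        # Score every candidate feature when added to the current set.
        scores = [(adjusted_r2(X[:, selected + [j]], y), j) for j in remaining]
        score, j = max(scores)
        if score - best_score < min_gain:
            break                                  # no candidate helps enough
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected

# Synthetic data (an assumption): only columns 0 and 3 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)
selected = forward_selection(X, y)
print("Selected feature indices:", selected)
```

Backward selection runs the same idea in reverse: start with all features and repeatedly drop the one whose removal hurts the score least.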
Build linear regression model (without regularization)
Metrics to consider for evaluation
-
R square value – This is the proportion of the variance in the target variable explained by the model
-
Adjusted R square – This is R square penalized for the number of features in the model
-
RMSE – Root Mean Squared Error – This is the square root of the mean squared difference between the actual and predicted target values
-
Mean Absolute Error
-
Mean Squared Error
-
AIC and BIC values
-
Residual Analysis – Error terms should be randomly distributed
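The regression metrics above can be computed directly from the actual and predicted values. The following is a small sketch in plain Python; the function name and the toy numbers are made up for illustration, and AIC and BIC are left out because their exact form depends on the likelihood convention used.

```python
import math

def regression_metrics(y_true, y_pred, n_features):
    """Return R square, adjusted R square, MSE, RMSE and MAE."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)     # total sum of squares
    mse = ss_res / n
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {
        "r2": r2,
        "adjusted_r2": adj_r2,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(r) for r in residuals) / n,
    }

# Toy values (an assumption), as if produced by a one-feature model.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.5, 7.0, 9.5, 10.5]
metrics = regression_metrics(y_true, y_pred, n_features=1)
print(metrics)
```

For residual analysis, the same residuals list can be plotted against the predicted values; a visible pattern suggests the model is missing structure.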
If the model is over fitting
-
Normalize the data and rebuild the model with a regularization parameter.
-
Build a new model with only the significant features by performing feature selection.
-
Collect more samples of data (for example, ask the customer to provide them).
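Normalizing and then refitting with a regularization parameter can be illustrated with ridge (L2) regression. This is a minimal sketch assuming NumPy and the closed-form ridge solution; the function names, the alpha values and the synthetic data are assumptions chosen only to show the coefficient shrinkage.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Standardize the features, then solve the ridge normal equations.

    Assumes no feature is constant (standard deviation must be non-zero).
    """
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sigma                       # zero mean, unit variance per feature
    y_mean = y.mean()                          # intercept is handled by centering y
    # Closed form: (Z^T Z + alpha * I) w = Z^T (y - mean(y))
    w = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ (y - y_mean))
    return w, mu, sigma, y_mean

def predict_ridge(model, X):
    """Apply the stored scaling, then the linear model."""
    w, mu, sigma, y_mean = model
    return ((X - mu) / sigma) @ w + y_mean

# Few samples and many features: a setting where an unregularized fit tends to overfit.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 10))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=30)

w_ols = fit_ridge(X, y, alpha=0.0)[0]      # alpha = 0 is plain least squares
w_ridge = fit_ridge(X, y, alpha=10.0)[0]   # alpha > 0 shrinks the coefficients
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

In practice the value of alpha would be chosen on validation data rather than fixed by hand.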
If the model is under fitting
-
Build a new model with polynomial features, or apply feature transformation and feature extraction.
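As an illustration of fixing under fitting with polynomial features, the sketch below fits a straight line and a quadratic to data that is quadratic by construction. It assumes NumPy; the helper name and the data are hypothetical.

```python
import numpy as np

def r2_of_polynomial_fit(x, y, degree):
    """Fit a polynomial of the given degree and return R square on the same data."""
    coeffs = np.polyfit(x, y, degree)        # least-squares polynomial coefficients
    y_hat = np.polyval(coeffs, x)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical data with a purely quadratic relationship.
x = np.linspace(-3, 3, 50)
y = x ** 2 + 1.0

r2_linear = r2_of_polynomial_fit(x, y, degree=1)      # straight line under fits
r2_quadratic = r2_of_polynomial_fit(x, y, degree=2)   # x^2 term captures the curve
print(r2_linear, r2_quadratic)
```

Raising the degree too far swings the problem back to over fitting, so the degree is best chosen on held-out data.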
Logistic Regression
Type : Supervised Learning
Target Attribute : Discrete Variable / Classes
Pre-processing
-
Remove or replace the NULL and NA values.
-
Check for outliers and replace them.
-
Split the data into training and test data sets.
-
Check for multicollinearity.
-
Convert the categorical variables to numeric variables
-
Use feature selection techniques to select only the important features.
Forward selection
Backward selection
Hybrid feature selection
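Several of the pre-processing steps above can be sketched with the Python standard library alone. The example assumes missing values are stored as None, caps outliers at 1.5 times the interquartile range, and one-hot encodes a categorical column; all function names, the column values and the 25% test fraction are assumptions for illustration.

```python
import random
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]

def cap_outliers_iqr(values, k=1.5):
    """Clip values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]

def one_hot(values):
    """Turn a categorical column into 0/1 indicator columns."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

# Hypothetical columns: one missing age and one extreme age.
ages = [22, 25, None, 31, 29, 27, 24, 120]
cities = ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune", "Mumbai", "Delhi"]

ages_clean = cap_outliers_iqr(impute_median(ages))
encoded, categories = one_hot(cities)
train, test = train_test_split(zip(ages_clean, encoded))
print(ages_clean, categories, len(train), len(test))
```

To avoid leaking information from the test set, the median and the outlier bounds would normally be computed on the training rows only and then applied to the test rows.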
Build logistic regression model (without regularization)
Metrics to consider for evaluation
-
Confusion Matrix
-
Classification accuracy = (TP+TN)/(TP+TN+FP+FN)
-
Classification error (misclassification rate) = (FP+FN)/(TP+TN+FP+FN)
-
Recall / Sensitivity / True positive rate = TP/(TP+FN)
-
Specificity = TN/(TN+FP)
-
False positive rate = FP/(TN+FP)
-
Precision = TP/(TP+FP)
-
Residual Analysis – Error terms should be randomly distributed
-
AUC value
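The confusion-matrix formulas above translate directly into code. The sketch below uses plain Python and assumes binary labels with 1 as the positive class; the function names and the toy labels are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Apply the formulas listed above to the four counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "error": (fp + fn) / total,
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (tn + fp),
        "precision": tp / (tp + fp),
    }

# Toy labels (an assumption): 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
metrics = classification_metrics(tp, tn, fp, fn)
print((tp, tn, fp, fn), metrics)
```

Note that specificity and false positive rate always sum to 1, so reporting one of them is enough.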
In logistic regression, the probability threshold used to assign the classes can be selected by plotting the ROC curve and picking the point that meets the metric requirement (for example, a minimum recall or a maximum false positive rate).
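Threshold selection from the ROC curve can be sketched as follows. The example is plain Python and assumes 0/1 labels and predicted probabilities for the positive class; the function names, the scores and the 0.75 recall requirement are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (threshold, FPR, TPR) for each distinct score, highest threshold first."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        # Predict positive whenever the score reaches the threshold.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((thr, fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve, starting from the point (0, 0)."""
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for _, fpr, tpr in points:
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_fpr, prev_tpr = fpr, tpr
    return area

def pick_threshold(points, min_recall):
    """Highest threshold whose recall (TPR) meets the requirement, or None."""
    for thr, fpr, tpr in points:
        if tpr >= min_recall:
            return thr, fpr, tpr
    return None

# Hypothetical labels and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.85, 0.70, 0.65, 0.55, 0.40, 0.30, 0.10]

points = roc_points(y_true, scores)
area = auc(points)
choice = pick_threshold(points, min_recall=0.75)
print(area, choice)
```

Because the points are ordered from the highest threshold down, the first one that satisfies the recall requirement is also the one with the lowest false positive rate among those that qualify.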