Outlier detection and removal: z score, standard deviation | Feature engineering tutorial python # 3 (Machine Learning course by the codebasics channel, video No. 41)
If a dataset follows a normal distribution, then values that lie three or more standard deviations away from the mean can be flagged as outliers. Often these are legitimate values, and whether to remove them really depends on the situation. Removing outliers can, however, significantly increase the statistical power of a machine learning model, so it is recommended that you treat outliers before building a model. The z score indicates how many standard deviations away from the mean a given sample is. We are going to go through all this theory and then write Python code to remove outliers from a heights dataset that I took from Kaggle.
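For reference, the z score of a sample x is its distance from the mean μ measured in units of the standard deviation σ. Thresholding |z| at 3 marks exactly the same points as the "mean ± 3 standard deviations" rule, so the two approaches differ only in how the cutoff is expressed. For normally distributed data, about 99.7% of values fall within μ ± 3σ, which is why 3 is a common default threshold.

```latex
z = \frac{x - \mu}{\sigma}
\qquad\qquad
|z| > 3 \iff x < \mu - 3\sigma \ \text{ or } \ x > \mu + 3\sigma
```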
Link to Kaggle dataset: https://www.kaggle.com/mustafaali96/weight-height
Code & Exercise: https://github.com/codebasics/py/blob/master/ML/FeatureEngineering/2_outliers_z_score/2_outliers_z_score.ipynb
CSV file for exercise: https://github.com/codebasics/py/tree/master/ML/FeatureEngineering/2_outliers_z_score/Exercise
Topics
00:00 Introduction
00:20 Exploratory analysis of a Kaggle dataset
01:14 Plot histogram and bell curve
06:30 Use 3 standard deviations to remove outliers
12:14 Use Z score to remove outliers
17:39 Exercise
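The two removal steps from the topic list (3 standard deviations at 06:30, z score at 12:14) can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the notebook's code: the video works with a pandas DataFrame, while the helper names `remove_outliers_std`, `z_scores` and `remove_outliers_z` and the small `heights` list below are hypothetical. It uses the sample standard deviation (`statistics.stdev`), which matches the default of pandas `Series.std()` (ddof=1).

```python
import statistics


def remove_outliers_std(values, n_std=3.0):
    """Keep only values inside mean +/- n_std * standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample std, like pandas Series.std()
    lower, upper = mean - n_std * std, mean + n_std * std
    return [v for v in values if lower <= v <= upper]


def z_scores(values):
    """Return (x - mean) / std for every value (hypothetical helper)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    if std == 0:
        # All values are identical: there is no spread, so every z score is 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def remove_outliers_z(values, threshold=3.0):
    """Keep only values whose absolute z score is at most `threshold`."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) <= threshold]


# Hypothetical heights in inches; 100 is a deliberately planted outlier.
heights = [64, 65, 66, 67, 68] * 4 + [100]
print(remove_outliers_std(heights))  # the 20 inlier values; 100 is dropped
print(remove_outliers_z(heights))    # same result: only 100 has |z| > 3
```

Both functions keep the same points because |z| ≤ 3 is algebraically the same test as staying within the mean ± 3σ bounds. One caveat for tiny samples: with the sample standard deviation, no point can have |z| larger than (n − 1)/√n, so a single outlier can only cross the threshold of 3 when there are at least 11 values, which is why the example uses 21.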
Do you want to learn technology from me? Check out https://codebasics.io/ for my affordable video courses.
Website: https://codebasics.io/
Facebook: https://www.facebook.com/codebasicshub
Twitter: https://twitter.com/codebasicshub