Impute missing values using KNNImputer or IterativeImputer

Data School

3 years ago

39,641 views

Links and HTML tags are not supported


Comments:

Data School - 17.11.2020 18:13

Thanks for watching! 🙌 Let me know if you have any questions about imputation and I'm happy to answer them! 👇

Reply
Anum Gulzar - 14.10.2023 15:02

Respected Sir,
Can we do multiple imputation in EViews 9 for panel data?

Reply
Akshat Rai Laddha - 01.10.2023 00:08

Does this work for categorical features too?

Reply
Intelligence Junction - 19.09.2023 21:27

Thank you!

Reply
Gisle Berge - 04.08.2023 11:59

No need to standardise the SibSp and Age columns (e.g. between 0 and 1) before the imputation process? Or is that not relevant here?
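One way to handle this is to put a scaler in front of the imputer in a pipeline. Since scikit-learn 0.20, StandardScaler ignores NaNs when fitting, so scaling before KNNImputer is valid and keeps the larger-scale column (Age) from dominating the distance calculation. A minimal sketch with made-up Age/SibSp values:

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import KNNImputer

# Toy frame: Age is on a much larger scale than SibSp (values are illustrative)
df = pd.DataFrame({'Age': [22.0, 38.0, np.nan, 35.0, 54.0],
                   'SibSp': [1, 1, 0, 1, 0]})

# StandardScaler skips NaNs when computing mean/std, so it can run first;
# scaling keeps Age from dominating the nan_euclidean distances
pipe = make_pipeline(StandardScaler(), KNNImputer(n_neighbors=2))
out = pipe.fit_transform(df)
print(int(np.isnan(out).sum()))  # 0 -> no missing values remain
```

Note the imputed values come back on the standardized scale; invert with the scaler if you need the original units.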

Reply
sachin - 01.08.2023 18:56

thank you.
love the clarity in your explanation!

Reply
Noradrenalin - 21.07.2023 18:07

Thank you, this is exactly what I need. Plus you've explained it very well!

Reply
Isfan Tauhid - 28.06.2023 21:01

Can this be applied to categorical data? Or is it for numerical data only?

Reply
Marco Portillo - 26.04.2023 05:59

Fantastic video !! 👏🏼👏🏼👏🏼 … thank you for spreading the knowledge

Reply
Ling talfi - 08.01.2023 23:34

Thanks, that was very interesting.

Reply
Eva Rdn - 28.12.2022 19:52

Hello! Thank you very much for your interesting video! Do you know where I can find a video like this one explaining how to choose the number of neighbors?
Thank you very much
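A common approach is to treat n_neighbors as a hyperparameter and tune it with cross-validation against a downstream model. A minimal sketch with synthetic data (the data and candidate values are illustrative):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # label depends on first two features
X[rng.rand(200, 4) < 0.1] = np.nan        # poke random holes into the features

# Tune the imputer's n_neighbors by how well the whole pipeline predicts
pipe = Pipeline([('imputer', KNNImputer()),
                 ('clf', LogisticRegression())])
grid = GridSearchCV(pipe, {'imputer__n_neighbors': [2, 5, 10]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```

The "best" k is then the one that maximizes cross-validated accuracy of the final model, rather than a rule of thumb.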

Reply
Jong Cheul Kim - 05.09.2022 17:35

Thank you^^

Reply
-o- - 02.09.2022 07:56

Question: If we impute values of a feature based on other features, wouldn't that increase the likelihood of multicollinearity?

Reply
Hardik Vegad - 13.07.2022 17:20

Hey Kevin, quick question... should k in KNN always be odd? If yes, then why, and if no, why not? I was asked this in an interview... Thanks for all your content.

Reply
Rishi Singh - 23.06.2022 07:05

Thanks for sharing this!

Why can't KNNImputer be used for categorical variables? The KNN algorithm works with classification problems.
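KNNImputer and IterativeImputer only accept numeric input. For categorical columns, one common workaround, sketched here on a toy column (the name 'Embarked' and values are illustrative), is SimpleImputer with the most_frequent strategy, which fills with the mode:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy categorical column with one missing value
df = pd.DataFrame({'Embarked': ['S', 'C', np.nan, 'S', 'Q']})

# most_frequent works on strings: the NaN is replaced by the mode ('S' here)
imp = SimpleImputer(strategy='most_frequent')
filled = imp.fit_transform(df)
print(filled.ravel())
```

Another option is to encode the categories as numbers first, impute, then round back to valid codes, but most_frequent is the simplest built-in route.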

Reply
loxa - 11.05.2022 16:02

This imputation returns an array, but the OHE wants a DataFrame. How can we solve this if we want to put both inside a pipeline?

Reply
Whale G - 02.05.2022 11:48

It seems that we should definitely not try it on a large dataset. It takes forever.

Reply
Ri Yaz - 13.04.2022 06:56

why don't you have 2M subscribers man ?

Reply
loxa - 12.04.2022 05:31

What is the effect on the dataset after imputation? Any bias or something? I understand it's a mathematical way to insert a value into a NaN, but I feel there must be some effect from this action. When do we need to remove NaNs, and when do we need to use imputation?

Reply
M F - 09.04.2022 18:33

Awesome video, couldn't be clearer. Thanks

Reply
dizetoot - 28.12.2021 16:02

Thanks for posting this. For features where there are missing values, should I be passing in the whole df to impute the missing values, or should I only include features that are correlated with the dependent variable I'm trying to impute?

Reply
R A - 22.07.2021 22:41

In the example we have only one missing value, so the imputer has an "easy" mission. What if we had more than a few missing values in this column, and were facing randomly missing values across different columns/features? How does the imputer decide what to fill: is one column imputed first, and then, based on that filling, does it advance to the "next best" column, and so on?
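IterativeImputer exposes exactly this choice through its imputation_order parameter: 'ascending' (the default) fills the column with the fewest missing values first, and 'descending', 'roman' (left to right), 'arabic' (right to left), and 'random' are the other options. A toy sketch with missing values scattered across several columns:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# One NaN in each column, in different rows
X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [5.0, 6.0, 9.0],
              [np.nan, 8.0, 12.0]])

# 'ascending' = start with the column that has the fewest missing values;
# each round-robin pass re-fits the per-column regressors, for max_iter rounds
imp = IterativeImputer(imputation_order='ascending', max_iter=10, random_state=0)
out = imp.fit_transform(X)
print(int(np.isnan(out).sum()))  # 0
```

So the process is iterative rather than one-shot: earlier fills are revisited on later passes, which is why the starting order matters less than it might seem.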

Reply
Soumya Banerjee - 20.05.2021 09:08

Can IterativeImputer and KNNImputer work only with numerical values? Or can they also impute string/alphanumeric values?

Reply
Kumi - 09.05.2021 20:24

How to use KNN to interpolate time series data?

Reply
Ashwin Krishnan - 05.05.2021 21:56

What do I use if the values are catagorical

Reply
Prima Ezy - 05.05.2021 07:42

Very nice video. However, I want to ask: can the KNNImputer be used for object (string) data?

Reply
Aniket Kulkarni - 04.05.2021 23:40

Thank you for such an amazing video!
I used <pd.get_dummies(df, columns=['Employment.Type'], drop_first=True)> to encode my categorical data into numerical data and then ran the KNNImputer, but it's giving me an error: TypeError: invalid type promotion.

Any insights what might be going wrong?

Reply
Sean Santiago - 03.05.2021 03:47

You are awesome man!! Saved me a lot of time yet again!!!!

Reply
Levon9 - 05.03.2021 16:39

I really love your videos, they are just right, concise and informative, no unnecessary fluff. Thank you so much for these.

Reply
matrix47 - 03.03.2021 16:58

How to handle missing categorical variables?

Reply
Kek - 28.02.2021 21:39

Thank you!

Reply
Eric Sims - 18.02.2021 02:21

Super helpful, as always. Is IterativeImputer the sklearn version of MICE?
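IterativeImputer is indeed inspired by MICE (the scikit-learn user guide describes it as modeled after R's mice package), but by default it returns a single imputation. To get multiple imputations in the MICE spirit, one sketch, shown here on toy data, is to set sample_posterior=True (which draws each fill from the fitted BayesianRidge posterior) and vary random_state:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[7.0, 2.0, 3.0],
              [4.0, np.nan, 6.0],
              [10.0, 5.0, 9.0]])

# sample_posterior=True samples from the posterior predictive distribution
# instead of taking the mean; different seeds give different completed
# datasets, which is the "multiple" in multiple imputation
draws = [IterativeImputer(sample_posterior=True, random_state=s).fit_transform(X)
         for s in range(3)]
print(len(draws))  # 3 completed datasets
```

In a full MICE workflow you would fit your model on each completed dataset and pool the results, which scikit-learn leaves to you.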

Reply
Shreyas B.S - 13.02.2021 10:59

I have one doubt... which process comes first: missing value imputation or outlier removal?

Reply
Hyukjung Kwon - 15.01.2021 13:20

What if the first column has a missing value?
It is a categorical feature, and it would be better if we used multivariate regression.
It has 0 or 1, but if we use KNNImputer or IterativeImputer, it imputes a float value. I think there's the same question as mine in the comments.

Reply
WheatleyOS - 06.01.2021 04:27

I can't think of a realistic example where KNNImputer is better than IterativeImputer; IterativeImputer seems much more robust.
Am I the only one thinking this?

Reply
Moon Cake - 15.12.2020 22:19

Hi, I tried encoding my categorical variables (a boolean value column) and then running the data through a KNNImputer, but instead of getting 1s and 0s I got values in between, for example 0.4, 0.9, etc. Is there anything I am missing, or is there any way to improve the prediction of this imputer?
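Fractional values are expected here: KNNImputer averages the neighbors' values, so a 0/1 column can come back as 0.4 or 0.9. One common workaround, sketched on toy data, is to round the encoded column back to the nearest valid category after imputing:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Column 0 is an encoded boolean; column 1 is a numeric helper (toy values)
X = np.array([[1.0, 10.0],
              [0.0, 50.0],
              [np.nan, 12.0],
              [1.0, 11.0]])

imputed = KNNImputer(n_neighbors=2).fit_transform(X)
# KNNImputer fills with the mean of the neighbors, which may be fractional;
# snap the encoded column back to 0/1
imputed[:, 0] = np.round(imputed[:, 0])
print(sorted(set(imputed[:, 0])))  # [0.0, 1.0]
```

Rounding is a heuristic; for genuinely categorical targets a mode-based imputer (e.g. SimpleImputer with strategy='most_frequent') avoids the problem entirely.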

Reply
Edward Chong - 28.11.2020 16:42

Kevin, how does it work if let's say B and C are both missing?

Reply
lovejazzbass - 18.11.2020 20:08

Kevin, you just expanded my column transformation vocabulary. Thank you.

Reply
Susmit Vengurlekar - 17.11.2020 20:56

My idea: make a line plot of the columns that have null values against the other continuous columns (and a box plot for the discrete ones), then impute a constant value according to the result of this process. For example, if Pclass is 2, impute the median fare of Pclass 2 wherever Fare is missing and Pclass is 2. Basically similar to the iterative imputer, only manual work; slow, but maybe better results because of human knowledge about the problem statement. What are your thoughts on this idea?
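The manual group-wise version of this idea is straightforward in pandas. A minimal sketch with made-up Titanic-like values, filling each missing Fare with the median Fare of the same Pclass:

```python
import numpy as np
import pandas as pd

# Toy data: one missing Fare in Pclass 2 (values are illustrative)
df = pd.DataFrame({'Pclass': [1, 1, 2, 2, 2, 3],
                   'Fare': [80.0, 90.0, 20.0, np.nan, 24.0, 8.0]})

# transform('median') broadcasts each group's median back to its rows,
# so fillna replaces each NaN with its own group's median
group_median = df.groupby('Pclass')['Fare'].transform('median')
df['Fare'] = df['Fare'].fillna(group_median)
print(df.loc[3, 'Fare'])  # 22.0, the median of the Pclass 2 fares (20, 24)
```

This is essentially a hand-built, single-pass version of what IterativeImputer automates, with the feature choice guided by domain knowledge.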

Reply
Saravanan Senguttuvan - 17.11.2020 18:37

What about the best imputer for categorical variables?

Reply