Dealing With Missing Data - Multiple Imputation

ritvikmath

5 years ago

46,012 views



Comments:

@BelleandBoos
@BelleandBoos - 28.11.2023 15:06

oh my gosh thank you so much!

@newbie8051
@newbie8051 - 01.02.2023 12:09

Great explanation! Thanks a lot.

@SirDerRosen
@SirDerRosen - 26.01.2023 23:26

Thank you very much :)

@jessicalambert4019
@jessicalambert4019 - 12.01.2023 23:26

Thanks, very clear and useful!

@zhaoqian58
@zhaoqian58 - 05.12.2022 12:18

Thank you for producing this high-quality video.

@user-ek5rc5kl7o
@user-ek5rc5kl7o - 29.11.2022 07:37

This is so clearly explained. Thank you very much for this concise and informative video! I have a question. I believe the purpose of step 2 - calculating the standard deviation - is to confirm that the mean is a reliable one. What if the standard deviation is too large? Does it imply that the imputation method is not a reliable one and should not be adopted? Thank you!

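On the pooling question above: the per-imputation estimates are usually combined with Rubin's rules, which average the point estimates and split the variance into a within-imputation part and a between-imputation part. A large between-imputation spread doesn't invalidate the method; it signals that the missing data carry real uncertainty, and it widens the pooled standard error accordingly. A minimal sketch in Python (the numbers are made up for illustration, not taken from the video):

```python
import numpy as np

# Point estimates (e.g., means) and their squared standard errors from
# m = 5 completed datasets -- illustrative numbers only.
estimates  = np.array([2.10, 2.04, 2.15, 2.08, 2.11])
within_var = np.array([0.031, 0.029, 0.033, 0.030, 0.032])

m = len(estimates)
pooled = estimates.mean()              # Rubin's rules: average the estimates
W = within_var.mean()                  # average within-imputation variance
B = estimates.var(ddof=1)              # between-imputation variance
total_var = W + (1 + 1 / m) * B        # total variance of the pooled estimate

print(pooled, np.sqrt(total_var))      # pooled estimate and its standard error
```
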
@jimpauls7765
@jimpauls7765 - 02.11.2022 00:33

Very informative! Thank you, good sir :)

@hamishthecat666
@hamishthecat666 - 06.09.2022 06:59

How does PMM identify nearby candidates when there is a mixture of numeric and categorical variables? Thanks :)

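On the PMM question above: predictive mean matching picks donors by closeness of *predicted* values, not by distance in the raw covariate space, so a mix of numeric and categorical predictors is handled by whatever regression model produces those predictions (categoricals are typically one-hot encoded). A rough sketch with illustrative names; real implementations such as R's mice also perturb the regression coefficients between imputations, which this sketch skips:

```python
import numpy as np
import pandas as pd

def pmm_impute(df, target, predictors, k=5, rng=np.random.default_rng(0)):
    """Illustrative predictive mean matching for one target column."""
    X = pd.get_dummies(df[predictors], drop_first=True).astype(float)  # categoricals -> dummies
    obs = df[target].notna()
    # Linear fit (with intercept) on the rows where the target is observed.
    A_obs = np.column_stack([np.ones(int(obs.sum())), X[obs].to_numpy()])
    beta, *_ = np.linalg.lstsq(A_obs, df.loc[obs, target].to_numpy(), rcond=None)
    A_all = np.column_stack([np.ones(len(df)), X.to_numpy()])
    pred = A_all @ beta                              # predicted mean for every row
    donors_pred = pred[obs.to_numpy()]
    donors_y = df.loc[obs, target].to_numpy()
    imputed = df[target].copy()
    for i in np.where(~obs.to_numpy())[0]:
        nearest = np.argsort(np.abs(donors_pred - pred[i]))[:k]   # k closest predicted means
        imputed.iloc[i] = donors_y[rng.choice(nearest)]           # borrow an observed value
    return imputed
```
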
@sean_gruber
@sean_gruber - 07.08.2022 00:02

VERY clear explanation. Thank you!

@user-ey2np8ff8k
@user-ey2np8ff8k - 17.07.2022 10:50

This is an amazing video. Thank you so much. Do we have to check the assumptions for linear regression for each model for each imputed variable?

@alexslayerking
@alexslayerking - 14.03.2022 23:18

This is an outstanding explanation. Thank you so much for making this.

@judewells1
@judewells1 - 09.03.2022 13:10

It wasn’t apparent to me why this estimator would be less biased than a single imputation. You mentioned that doing multiple regressions and then aggregating ‘washes away the noise’, but each of your individual regressions would also be noisier than a single regression that uses the whole dataset, so how do I know that in the aggregate they are less noisy than a single regression?

@lucavisconti1872
@lucavisconti1872 - 27.02.2022 12:24

Thanks for the practical example. It is not clear to me which value, at the end, we have to use to fill in the missing value with the multiple imputation method. Could you please clarify?

@ayselceferzade8587
@ayselceferzade8587 - 20.02.2022 23:17

clearly explained! thanks a lot!

@MyMy-tv7fd
@MyMy-tv7fd - 30.01.2022 19:57

Very clear and easy to follow, thanks. But won't we get just as good a result by taking one regression sample of 250 data items, as opposed to five sets of fifty and then taking the mean of the means?

@raterake
@raterake - 05.10.2021 17:42

Thanks for the great video! Question: suppose I have 5 different random samples with which I can get 5 regressions, and then \mu_1, ..., \mu_5, to find an aggregate mean \mu_A. Why not just pool those 5 data sets into one large data set and compute the grand mean \mu_B that way? Wouldn't my answer \mu_B be more precise (less variable) than just taking the average of the 5 means to get \mu_A?

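On the two comments above about pooling everything into one big sample: with equal-sized subsets, the mean of the subset means equals the grand mean exactly, so the point estimate loses nothing. The reason multiple imputation keeps the fits separate is that the spread *between* the m estimates is what measures the extra uncertainty introduced by imputing, and that spread feeds into the pooled variance (the between-imputation term in the sketch above); it would be hidden if the five completed datasets, which share all their observed values, were simply stacked. A quick numerical check with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
subsets = [rng.normal(loc=5.0, scale=2.0, size=50) for _ in range(5)]  # five equal-sized samples

mean_of_means = np.mean([s.mean() for s in subsets])
grand_mean = np.concatenate(subsets).mean()

print(np.isclose(mean_of_means, grand_mean))  # True: identical when subsets are equal-sized
```
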
@leoncioblp
@leoncioblp - 12.09.2021 09:51

Wouldn't this be problematic if your objective with the dataset is precisely to demonstrate whether there is any relationship (like a linear relationship) between those 2 variables? Filling in a missing value through a method which assumes the very same linear relationship you are trying to demonstrate would actually be begging the question, wouldn't it?

@phumlanimbabela-thesocialc3285
@phumlanimbabela-thesocialc3285 - 10.08.2021 21:51

Thank you very much.

@rorysamuels2829
@rorysamuels2829 - 02.08.2021 21:28

Thanks for the video! If the subsets are random, all the estimators are unbiased right? The aggregated estimator would just have lower variability.

@sultanmehmood5022
@sultanmehmood5022 - 19.07.2021 20:08

Data for 1.7 & 2.1 mi is not, prima facie, true.

@user-sn1dl7vm6w
@user-sn1dl7vm6w - 14.07.2021 16:14

Sir,
you said we need from 5 to 10 models. How do we calculate the exact number needed?
Thank you.

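On how many imputed datasets are "enough": a standard rule of thumb uses Rubin's relative-efficiency formula RE = (1 + λ/m)^(-1), where m is the number of imputations and λ is the fraction of missing information. For modest missingness, 5-10 imputations already give close to full efficiency, which is where the 5-to-10 advice comes from. A quick check (λ = 0.3 is an assumed value, purely for illustration):

```python
# Relative efficiency of m imputations versus infinitely many (Rubin's rule of thumb).
lam = 0.3  # assumed fraction of missing information, for illustration only
for m in (1, 3, 5, 10, 20):
    print(m, round(1 / (1 + lam / m), 3))
# With lam = 0.3: m=5 is about 94% efficient and m=10 about 97%,
# so 5-10 imputations are usually considered enough.
```
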
@ericazombie793
@ericazombie793 - 29.06.2021 08:52

Very clear!

@davidrussell3433
@davidrussell3433 - 18.04.2021 17:46

Very helpful, thank you!

@amiriqbal1871
@amiriqbal1871 - 18.04.2021 04:59

I was struggling with the concept, but your video made it crystal clear to me, thanks

@robertcsalodi1207
@robertcsalodi1207 - 19.03.2021 12:58

This explanation is awesome! Congratulations!

@ThuHuongHaThi
@ThuHuongHaThi - 16.02.2021 17:53

A thousand thanks, your explanation is very easy to understand, it's really helpful.

@bhushantayade7984
@bhushantayade7984 - 09.12.2020 17:15

Amazing sir. It's really helpful.

@andreibarbulescu7812
@andreibarbulescu7812 - 21.11.2020 20:36

Isn't it actually even more complicated than that? Isn't it that for each regression, instead of imputing the missing fine value with the value predicted by the regression, we actually randomly sample from the distribution of fine values around that predicted value (the distribution of fine conditional on distance)? This adds even more of the uncertainty involved in the guess we are making to our imputation process.

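The comment above describes "proper" multiple imputation correctly: each imputation draws a value from the predictive distribution around the fitted line rather than plugging in the point on the line, and fully Bayesian or bootstrap variants also perturb the coefficients themselves. That added noise is also the answer to the worry a few comments further down that imputing exactly on the line would artificially shrink the standard deviation. A minimal sketch of the draw step, with illustrative names:

```python
import numpy as np

def stochastic_impute(x_obs, y_obs, x_miss, rng=np.random.default_rng(0)):
    """Impute y at x_miss by sampling around the fitted line, not on it."""
    x_obs, y_obs = np.asarray(x_obs, float), np.asarray(y_obs, float)
    slope, intercept = np.polyfit(x_obs, y_obs, deg=1)     # simple linear fit
    resid = y_obs - (intercept + slope * x_obs)
    sigma = resid.std(ddof=2)                               # residual standard deviation
    mean = intercept + slope * np.asarray(x_miss, float)
    return rng.normal(mean, sigma)                          # draw, don't plug in the mean
```
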
@yochisimon2652
@yochisimon2652 - 19.11.2020 18:30

This was so clear and easy to understand! Thank you!

@claymarzobestgoofy
@claymarzobestgoofy - 15.11.2020 01:40

Can you actually compute a standard deviation this way? Won't that just reduce the SD for each regression by adding a bunch of points that perfectly fit the regression?

@brendali5803
@brendali5803 - 14.10.2020 08:51

Great job!

@qwertyuiop-qy6hb
@qwertyuiop-qy6hb - 24.08.2020 16:41

Great explanation, thanks.
I have done many retrospective clinical research projects and I have never dealt with missing data. I always left these blank, knowing that they would automatically be excluded from analysis.
I believe leaving these missing data unfilled is better, to avoid any chance of bias influenced by data from other patients in the study cohort.
What do you think?
Now, looking at your clear video, I am thinking about this approach as well for future projects.
I am not a statistician, and I did all of this while in training.

@AnkushSharma-zv5hv
@AnkushSharma-zv5hv - 08.08.2020 21:38

thank you so much

@chancesofrain6480
@chancesofrain6480 - 26.07.2020 00:12

What do we do with the 5 imputations that have been calculated? Which of them can be considered the final imputed value if we just want to show this as a graph?

@kyliestaraway2492
@kyliestaraway2492 - 20.07.2020 17:56

Can you do regression imputation next? I really loved this vid

@Hari-888
@Hari-888 - 13.07.2020 00:24

I just wish that you were neater instead of writing everything on that one sheet of paper. You keep moving it, and it isn't clear what you are referring to when you point your finger at the paper, since you've written everything in every nook and corner of it.

@alfinpradana
@alfinpradana - 29.05.2020 05:10

Great explanation, and excellent at describing how multiple imputation works! But I have a question: how do I choose the final value for the imputation if there are 5 values? Should I just average the 5 values, or is there a better approach? Thank you.

@joefishq11
@joefishq11 - 24.05.2020 20:45

Great explanation! But it also seems at odds with what I'm reading from other sources, which make it sound like the parameters of the model estimating the outcome are what get randomly drawn for each iteration, not the observations used to make the prediction.
Is what I'm describing an alternative approach to the same thing, or am I misunderstanding the approach?

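On the question above: both flavours exist. Varying which observations each model is fit on (as these comments describe the video doing) is one way to make the m imputations differ; MICE-style "proper" imputation instead draws the regression *parameters* from their sampling/posterior distribution at each iteration and then draws imputed values given those parameters. The goal is the same either way: the imputations should disagree by as much as the data leave uncertain. A rough sketch of the parameter-draw variant (illustrative, not the mice implementation):

```python
import numpy as np

def draw_parameters(x_obs, y_obs, rng=np.random.default_rng(0)):
    """One draw of (intercept, slope, sigma) for a simple linear imputation model."""
    x_obs, y_obs = np.asarray(x_obs, float), np.asarray(y_obs, float)
    X = np.column_stack([np.ones(len(x_obs)), x_obs])
    beta_hat, *_ = np.linalg.lstsq(X, y_obs, rcond=None)
    resid = y_obs - X @ beta_hat
    n, p = X.shape
    sigma2_hat = resid @ resid / (n - p)
    # Draw sigma^2 from its scaled inverse chi-square distribution,
    # then beta from a normal centred at the least-squares estimate.
    sigma2 = sigma2_hat * (n - p) / rng.chisquare(n - p)
    beta = rng.multivariate_normal(beta_hat, sigma2 * np.linalg.inv(X.T @ X))
    return beta, np.sqrt(sigma2)
```

Each of the m imputations would use a fresh (beta, sigma) draw and then sample imputed values around the predicted means, which is what those "other sources" are describing.
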
@tracykakyoalexis2155
@tracykakyoalexis2155 - 01.05.2020 17:15

This was my aaahaaa moment. Thank you!

@ayakhaled5316
@ayakhaled5316 - 25.04.2020 07:30

TTTTHhHHHHhhaaaannnnnkkkkk yooooooooooooooooooooooooou very very very much

@elkalaiibrahim365
@elkalaiibrahim365 - 19.02.2020 10:10

Thanks for the clear explanation. One thing I'm struggling to understand is, when you are running multiple iterations, say 5, how are the different sets of data points generated? In your example, you fit lines among 50 data points. Do you randomly select 50 data points among those that have a non-missing value in the raw dataset?

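On how the different sets of points are generated: one simple recipe, and I believe the spirit of the random 50-point subsets described above (that reading is my assumption, not something stated in the video), is to bootstrap the rows where the target is observed, refit the regression on each resample, and impute from each fit. A sketch with made-up column names:

```python
import numpy as np
import pandas as pd

def bootstrap_fits(df, x_col, y_col, m=5, rng=np.random.default_rng(0)):
    """Return m (slope, intercept) pairs, each fit on a bootstrap resample of observed rows."""
    obs = df[df[y_col].notna()]
    fits = []
    for _ in range(m):
        seed = int(rng.integers(0, 2**31 - 1))
        boot = obs.sample(n=len(obs), replace=True, random_state=seed)
        slope, intercept = np.polyfit(boot[x_col], boot[y_col], deg=1)
        fits.append((slope, intercept))
    return fits
```
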
@keeszethof6272
@keeszethof6272 - 17.02.2020 20:25

Thank you!

@alecvan7143
@alecvan7143 - 01.02.2020 22:42

Awesome!

@emicat7045
@emicat7045 - 15.01.2020 12:19

Thank you very much! Love your videos; they are always clearly explained.

@mehmetkaya4330
@mehmetkaya4330 - 07.12.2019 20:16

Very very clear. Very helpful. Thank you!

@f2gms647
@f2gms647 - 29.10.2019 22:30

I found it confusing!! Especially when you move the paper up and down as you talk!

@diptikalyan
@diptikalyan - 17.09.2019 08:53

It would be great if you could share links to some of the papers or books that you refer to here.

@me3jab1
@me3jab1 - 06.07.2019 04:51

thank you

@PedroRibeiro-zs5go
@PedroRibeiro-zs5go - 25.04.2019 13:30

Thanks! That was a really nice explanation!!

@bevansmith3210
@bevansmith3210 - 20.03.2019 16:39

Thank you very much. Quick question: which imputed values do you end up leaving in the dataset for further analysis? Say now I want to impute values to be used later for a variety of machine learning applications. Surely I can't use multiple imputation every time I want to implement a new machine learning model and measure a metric?

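On the last question: for downstream machine-learning work you generally don't re-run the imputation for every new model. Either keep a single well-specified (ideally stochastic) imputation if you only care about prediction, or create the m completed datasets once, fit each candidate model on every one of them, and average the resulting predictions or metrics. A sketch of the second option, with placeholder inputs (a list of completed datasets already split into train/test, and any estimator exposing fit/predict):

```python
import numpy as np

def pooled_score(imputed_datasets, make_model, score):
    """Fit one model per completed dataset and average the resulting scores.

    imputed_datasets: list of (X_train, y_train, X_test, y_test) tuples, one per imputation
    make_model:       callable returning a fresh estimator with .fit/.predict
    score:            callable (y_true, y_pred) -> float
    """
    scores = []
    for X_tr, y_tr, X_te, y_te in imputed_datasets:
        model = make_model()
        model.fit(X_tr, y_tr)
        scores.append(score(y_te, model.predict(X_te)))
    return float(np.mean(scores))
```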