2 Years of My Research Explained in 13 Minutes

Edan Meyer

3 months ago

57,235 views

Comments:

@ladyravendale1 - 20.07.2024 08:48

Very interesting video, I enjoyed it a ton

Reply
@mac-bs1ng - 20.07.2024 09:55

If you can really encode the world state in as little as 32 values, it sounds more like we have proven that there exists a manifold in the subspace that DOES provide a working solution, and that VQ-VAEs can converge on it but autoencoders can't. So I feel like the question is more like: why CAN'T autoencoders converge on that solution? It's within the network's hypothesis space, after all.

The only thing I can think of is that RL generally tends to deal with deterministic, discrete-time-step datasets where there is literally an integer number of possible world states, the majority of which aren't relevant or can be grouped together. So the actual latent manifold lives in this super low dimension, which causes traditional encoders to struggle. Kinda like how, technically, the convolution layers of a CNN can be described as a regular linear layer, just sparse. Yet CNNs massively outperform plain fully-connected networks in the image domain.

Reply
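
A minimal sketch of the vector-quantization bottleneck this comment refers to may help readers who have not seen a VQ-VAE: the encoder's continuous output is snapped to the nearest entry of a learned codebook, and that index is the discrete code. The sizes and names below are illustrative assumptions, not the implementation used in the video.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a 32-dimensional latent and a codebook of 512 entries.
latent_dim, codebook_size = 32, 512
codebook = rng.normal(size=(codebook_size, latent_dim))  # learned jointly in a real VQ-VAE

def quantize(z):
    """Snap a continuous encoder output z to its nearest codebook vector."""
    distances = np.linalg.norm(codebook - z, axis=1)
    index = int(np.argmin(distances))       # the discrete code
    return index, codebook[index]           # (discrete symbol, quantized vector)

z_continuous = rng.normal(size=latent_dim)  # stand-in for an encoder output
code, z_quantized = quantize(z_continuous)
print(code, np.round(z_quantized[:4], 2))
```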
@richardbloemenkamp8532 - 20.07.2024 14:51

Great work, great approach, great video. It's very intriguing where the difference really originates from. I always like mini-grid approaches.

Reply
@ElijahGalahad - 20.07.2024 18:25

Thanks for sharing such an interesting exploration of a few problems that also intrigued me. I quite like the part where you mentioned "you never know the secret sauce of xxx", where xxx refers to some great/famous research. Thanks for your endeavour in finding out "the secret sauce"!

Reply
@ElijahGalahad - 20.07.2024 18:29

My takeaway: a discrete latent space models the world more accurately with less capacity, and enables faster adaptation of the policy learned on it.
It would be better if you could shed more light on why discrete latent space models have such benefits, or provide some links. Thanks!

Reply
@lachlanperrier2851 - 21.07.2024 02:38

Doesn't taking the mode of a Monte Carlo tree search over the continuous space solve this problem?

Reply
@akkokagari7255 - 21.07.2024 06:09

THANK YOU SO MUCH
I figured it probably existed, but I never knew the concept was called "model-based reinforcement learning".
Cool video

Reply
@richardniu2450 - 21.07.2024 14:49

Is there any chance that this "discreteness" you are focusing on has some relation to "sparsity"?
I am thinking of something like this: can we consider a sparse representation of the world as a collection of multiple discrete representations, where only one discrete representation is active at any time (e.g., 16 discrete 16-bit representations make up a sparse 256-bit representation, where at any time at most 16 bits are active)?
I am thinking this because biological neural networks are famous for their sparsity. Combining this with your conclusion, can we even say that biological brains are insanely multi-modal because they are packed with discrete representations of very small aspects of the world (like an instance of your game with only hundreds of configurations), which add up to a good model of the whole world?

Reply
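
The layout described in this comment is easy to write down concretely: several small discrete variables, each rendered as a one-hot vector, concatenate into one long vector that is sparse by construction. A toy sketch under those assumptions (16 groups of 16 categories; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
num_groups, group_size = 16, 16               # 16 discrete variables, 16 categories each

# One active category per group, as the comment suggests.
active = rng.integers(0, group_size, size=num_groups)

one_hots = np.zeros((num_groups, group_size))
one_hots[np.arange(num_groups), active] = 1.0

sparse_representation = one_hots.reshape(-1)  # 256 dims, exactly 16 of them non-zero
print(sparse_representation.shape, int(sparse_representation.sum()))  # (256,) 16
```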
@strangelaw6384 - 21.07.2024 19:47

So a world model performs better when trained using discrete representations? Is there some kind of mathematical analogy between this and the quantized nature of the real world at a microscopic level?

Reply
@ivandeneriev7500 - 21.07.2024 20:12

This seems like such a simple question:
Continuous models can be richer than discrete ones, but need more data to work.

Reply
@Talec-7 - 22.07.2024 18:49

I don't know much, but I think it's not because of a lack of model capacity; it's because the integer representation fits more neatly into the binary environment. With a continuous representation, small errors compound, but with integers the errors don't compound because they are not off by 0.01, they are exactly 0 or 1. As you add more capacity, the model is able to overfit and ignore this integer benefit.

Reply
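
The compounding-error intuition above can be illustrated with a toy rollout: a model that is slightly wrong at every step drifts when its state is continuous, whereas snapping the state back to {0, 1} after each step absorbs small errors. This is only a toy illustration of the commenter's argument, not an experiment from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
state = np.array([0.0, 1.0, 1.0, 0.0])         # a "binary" world state
steps, noise_scale = 50, 0.02                  # small per-step model error

drifting, snapped = state.copy(), state.copy()
for _ in range(steps):
    noise = rng.normal(scale=noise_scale, size=state.shape)
    drifting = drifting + noise                # continuous rollout: errors accumulate
    snapped = np.round(snapped + noise)        # discrete rollout: small errors are rounded away

print(np.abs(drifting - state).max())  # grows with the number of steps
print(np.abs(snapped - state).max())   # stays 0 unless a single error exceeds 0.5
```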
@KspManiac - 22.07.2024 22:43

Have you tried modeling this in an environment where there are multiple agents and communication is possible? For example, distributed learning systems. I'm curious to see how the discrete/continuous embedding outcomes change. Also, what would happen if the embedding dimensionality is increased and only the input and output layers are trained?

Reply
@coopercoldwell - 23.07.2024 04:10

This reminds me of Chris Olah’s “Toy Models of Superposition” paper, where the claim is made that sparsity increases a model’s propensity to “superimpose” features to represent them more compactly, which makes the model behave like a much larger model.

Reply
@shilohshahan2046 - 23.07.2024 06:28

I mean hey, you got my subscribe.

Reply
@CC-1. - 23.07.2024 13:02

It's similar to using maximum and minimum functions in mathematics for various tasks (if your counter and the button that increases the counter are offset, you can use the maximum value of the counter to get the result). Instead of an offset, you might use clamping effects, where values are restricted to 0 or 1 rather than more precise values. Given that the environment may introduce noise, especially for modal values, it could be easier to obtain values, though it might be coincidental. Additionally, editing fewer values is typically easier than editing many. While continuous adjustments can lead to better learning over time, it takes longer due to the complexity of optimizing many digits.

Here, by noise I mean pseudo-noise, possibly.

Reply
@MaxPicAxe - 23.07.2024 16:25

That was good; it's rare I watch an AI video to the end.

Reply
@judepuddicombe8748 - 23.07.2024 22:42

Maybe snapping to discrete representations means the prediction accumulates less noise?

Reply
@AnalKumar02 - 23.07.2024 22:54

This is an excellent video showcasing your research. I wish more people would make such videos of their papers (I know I am going to once my paper is published).

Reply
@AlanZheng-t4q - 24.07.2024 00:25

GREAT project and COOL results!! I was wondering if you have any thoughts on the differences between the discrete stochastic states Danijar used vs. the standard VQ-VAE? It seems like in DV3 they don't have a codebook?

Reply
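
For context on the question above: as I understand the Dreamer line of work, its discrete latents are vectors of categorical variables whose one-hot samples pass gradients straight through the sampling step, with no learned codebook, whereas a VQ-VAE looks indices up in a codebook. A rough PyTorch sketch of the straight-through categorical sampling (sizes are illustrative, not DreamerV3's):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_latents, num_classes = 4, 8      # DreamerV3 reportedly uses 32x32; kept small here

logits = torch.randn(num_latents, num_classes, requires_grad=True)  # stand-in encoder output
probs = F.softmax(logits, dim=-1)

# Sample a one-hot per latent variable, then apply the straight-through trick:
# the forward pass sees the hard one-hot, the backward pass sees the soft probabilities.
indices = torch.distributions.Categorical(probs=probs).sample()
hard = F.one_hot(indices, num_classes).float()
latent = hard + probs - probs.detach()

print(latent.shape)        # (4, 8): a vector of one-hot categoricals, no codebook lookup
latent.sum().backward()    # gradients still reach the logits despite the discrete sample
print(logits.grad is not None)
```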
@ninjalutador8761 - 24.07.2024 08:25

If you're still doing research, it could potentially be interesting to incorporate the findings from the Mamba paper, since they addressed the complexity and vanishing/exploding gradient problems of RNNs. Maybe either PPO or the world model could perform better if they had long-term memory.

Reply
@hafidhrendyanto2690 - 24.07.2024 15:08

Hi, in your last diagram about policy learning you use timesteps for the x-axis. What do timesteps mean here, and why don't you use epochs?

Reply
@calcs001 - 24.07.2024 20:23

fantastic! 🎉

Reply
@robodoctor - 25.07.2024 04:10

This is pretty interesting. You've earned a subscriber :) Thanks for making this into a video and sharing with the world!

I had a question: could it be possible that policy learning from discrete representations is better because discrete representation learning encodes the environment dynamics faster than continuous representation learning? One way to verify this would be to plot the autoencoder loss for the discrete and continuous representations.

Reply
@username.9421 - 25.07.2024 18:42

I have thought about the same question you posed in the research paper before, and wondered whether the performance of one-hot discrete representations might arise from the smaller distribution of inputs for the actor.

Have you thought about seeing what happens if you were to normalise the continuous autoencoder's output to length 1? This would allow the autoencoder to utilise the continuous space of each dimension, whilst also reducing variation in the actual scalars when two latent vectors are similar.

Reply
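
The normalisation proposed above is cheap to express: project the continuous latent onto the unit sphere so that its direction carries the information while its scale is fixed. A minimal sketch, with illustrative names:

```python
import numpy as np

def normalize_latent(z, eps=1e-8):
    """Rescale a continuous latent vector to unit L2 length."""
    return z / (np.linalg.norm(z) + eps)

z = np.array([3.0, -4.0, 0.0, 12.0])   # stand-in for a continuous autoencoder output
z_unit = normalize_latent(z)
print(np.linalg.norm(z_unit))          # 1.0: scale removed, direction preserved
```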
@yensteel - 26.07.2024 10:06

I never imagined how critical Sudoku is to reinforcement learning!

Reply
@WernerBeroux - 26.07.2024 10:27

Most LLMs also try reducing quantisation whilst keeping the scale, as opposed to reducing the scale but keeping the quantisation. The (simple) extreme of that being binary. It would be interesting to see if binary is ideal for learning speed compared to, say, ternary.

Reply
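
A small sketch of the extreme cases this comment mentions: quantizing each latent dimension to binary {0, 1} or ternary {-1, 0, 1}. In a trained model the rounding would typically be paired with a straight-through gradient; only the forward mapping is shown here, and the threshold is an arbitrary illustrative choice.

```python
import numpy as np

def binarize(z):
    """Map each latent dimension to {0, 1}."""
    return (z > 0.0).astype(np.float32)

def ternarize(z, threshold=0.5):
    """Map each latent dimension to {-1, 0, 1}; values near zero are zeroed out."""
    return np.sign(z) * (np.abs(z) > threshold)

z = np.array([0.9, -0.2, 0.4, -1.3])
print(binarize(z))    # [1. 0. 1. 0.]
print(ternarize(z))   # [ 1. -0.  0. -1.]
```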
@markdatton1348 - 26.07.2024 15:25

In order to represent the world better, could you make a relatively small but very high dimensional tensor network, to represent sort of "tiers" of outcomes? For example, one dimension of the tensor may represent how happy a result should be, ranging from unhappy to happy. Or angry. Etc. In that way, you could modify the interpretation of the world via dimensionality rather than pure scale?

Reply
@R.GrantD - 28.07.2024 01:44

It seems like it would be worth attempting to use a combination of discrete and continuous representations in real-world applications. Also, it may be that the better choice depends on which representation more naturally reflects the world being modeled.

Reply
@noahchristensen3718 - 28.07.2024 17:56

Just watched this, and I love the visuals. I would really like to know more about A.I. and their use in pattern recognition, because I have a lot of data that I find strenuous to analyze. Some patterns I can identify, but unfortunately my small brain doesn't have the recognition skills to understand the interplay between them. I'd like to train or build a model which will coherently describe where they are coming from. If OP reads this, I would love any information that will bring me closer to understanding.

Reply
@henrycook859 - 28.07.2024 23:35

"sorry, thought I'd do a little subliminal messaging" LMAO

Reply
@keshamix_ - 29.07.2024 00:39

Simply wow! An amazing video. I’m not really in the field, not even a uni student yet, just curious about technology, but I feel like I understood most of what was mentioned there! And it only proves the point that you are an amazing narrator, good job!

Reply
@feifeizhang7757 - 31.07.2024 03:43

What a great topic! It is just what I needed to hear ❤

Reply
@rmt3589 - 31.07.2024 08:19

This is amazing, and puts an explanation to a concept I've been pondering. It would be a great start to a self-simulating module for an AI, even if it only works on a projection like a video game state. You gained a new subscriber!

Reply
@SirajFlorida - 06.08.2024 01:07

Great work man. 🙌

Reply
@angelorf - 07.08.2024 11:10

That's not a fair comparison. These VQ methods have a lot more non-linearities than non-VQ nets. I have no idea how to make a fair comparison, but it just might be due to nonlinearities of quantization that these nets have better representation capability.

Reply
@AtifKARABONCUK - 08.08.2024 22:43

Thank you, it's a really great video! Do you have any advice for beginners?

Reply
@gobblegobble2559 - 09.08.2024 05:42

Dude, your 13 minutes of video literally explained more than a whole semester at uni!!! Thanks!!!

Reply
@rrogerx92III - 11.08.2024 14:50

Great video!!

Reply
@01174755 - 12.08.2024 17:02

I was reading through the paper, and there's a particular sentence that I'm finding a bit challenging to fully grasp: 'Demonstrating that the successes of discrete representations are likely attributable to the choice of one-hot encoding rather than the “discreteness” of the representations themselves.'

I was under the impression that one-hot encoding is a form of discreteness. If one-hot encoding is effective, wouldn't that imply that the 'discreteness' aspect also plays a role?

Reply
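
One way to read the quoted sentence is that the same discrete code can be presented to the downstream network in different formats, and the benefit is being attributed to the one-hot format rather than to discreteness itself. A tiny illustration of two encodings of one and the same code (values are arbitrary examples):

```python
import numpy as np

code = np.array([3, 0, 2])              # one discrete code made of three variables
num_classes = 4

# Format A: the integer labels themselves (discrete, but fed in as ordinary scalars).
as_labels = code.astype(np.float32)     # [3., 0., 2.]

# Format B: one-hot expansion of the identical code (discrete and one-hot).
as_one_hot = np.eye(num_classes)[code]  # shape (3, 4), a single 1 per row

print(as_labels)
print(as_one_hot)
```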
@ruiyangxu790 - 12.08.2024 23:26

A great contribution to the “cult”! ❤

Reply
@LatelierdArmand - 13.08.2024 08:44

super interesting!! thanks for making a video about it

Reply
@niceshotapps1233 - 13.08.2024 10:58

c) The discrete representation better represents the world because the world itself (that you run your experiments on) is discrete.

Basically, you introduce outside a priori knowledge about your problem into your solution through its architecture.

A continuous representation has one more thing to learn about your toy world: that it's discrete. Possibly that's why it takes longer.

Reply
@jordynfinity - 13.08.2024 17:52

I just subscribed because I'd love to hear more about your RL research.

Reply
@HaroldSchranz - 14.08.2024 07:18

Cool video on a field which I am curious enough about to want to dabble in. So I am just wondering, based on research on nonlinear models in a totally different field: sometimes less is more (more manageable and faster to converge). Dimensionality and parameter count do not need to be truly high to capture the essence of a model. Of course, the efficiency of the approach used will affect the effort required, kind of similar to how different methods of numerical integration (quadrature or Monte Carlo) can require adaptive and even importance sampling and coarse graining.

Reply
@fredericfc - 15.08.2024 13:09

🔢 Where do all of you come from? People around me have no clue what an AI model is other than typing a question into ChatGPT. For over a year I tried to write a simple .py and got to train one neuron to double a number with 99.9% accuracy. Now it's going to take me 1,000,000 years to understand this video.

Reply
@phillipmorgankinney881 - 16.08.2024 08:00

Really appreciate you making this video; I love autoencoders.

Could the higher rate of learning in the discrete model be because the total space of representations in discrete models is dramatically smaller? If you imagine latent space modeling as some map of all possible representations to explore looking for the best possible representation, then a space where every vector is N x 32 is a much, much smaller world than a space where every vector is N x 42,000,000,000. I imagine local optima are easier to stumble into.

It's like if you were looking at an image, 32px x 32px, on your 1080p monitor. In a flash you see it's an elephant, and you can find its trunk. But if that same picture were 42 billion x 42 billion, displayed at true scale on your 1080p monitor, and someone asked you to find the elephant's trunk... you're just too close to the elephant. You'd be doing a lot of scrolling around, following clues until you find what you're looking for.

Reply
@henlofrens - 16.08.2024 14:55

A continuous variable cannot take on a value of ±inf; though maybe some Python packages let you cheese that, it's not proper math.

Reply
@ΜιχαήλΣάπκας - 20.08.2024 09:06

If you have a discrete state space, a discrete representation would be appropriate, no?

Reply
@ai_outline - 28.08.2024 19:51

Great video! I wish there were more computer science channels like this!! This being free is amazing. Thank you ❤️

Reply