Comments:
Very interesting video, I enjoyed it a ton
If you can really encode the world state in as few as 32 values, it sounds more like we have proven that there exists a manifold in the subspace that DOES provide a working solution, and that VQ-VAEs can converge on it but plain autoencoders can't. So I feel like the question is more: why CAN'T autoencoders converge on that solution? It's within the network's hypothesis space, after all.
The only thing I can think of is that RL environments generally tend to involve deterministic, discrete-time-step datasets where there's literally an integer number of possible world states, the majority of which aren't relevant or can be grouped together. So the actual latent manifold lives in this super low dimension, which causes traditional encoders to struggle. Kind of like how the convolution layers of a CNN can technically be described as a regular linear layer, only sparse, yet CNNs outperform plain NNs massively in the image domain.
Great work, great approach, great video. Very intriguing to wonder where the difference really originates. I always like mini-grid approaches.
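A minimal sketch of the vector-quantization step this comment contrasts with a plain autoencoder, assuming a simple nearest-neighbour codebook lookup (names, codebook, and values are purely illustrative, not from the video or paper):

```python
# Hypothetical minimal VQ step: snap a continuous encoder output to its
# nearest entry in a small codebook, as a VQ-VAE does (illustrative only).

def quantize(z, codebook):
    """Return the codebook entry closest to the continuous latent z."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(codebook, key=lambda c: dist2(z, c))

codebook = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
z = (0.1, 0.9)               # continuous encoder output
z_q = quantize(z, codebook)  # discrete latent the decoder actually sees
```

A plain autoencoder would pass `z` straight to the decoder; the VQ variant restricts it to one of the codebook entries, which is the "snapping" whose effect is in question here.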
Thanks for sharing such an interesting exploration of a few problems that also intrigued me. I quite like the part where you mentioned that "you never know the secret sauce of xxx", where xxx refers to some great/famous research. Thanks for your endeavour in finding out "the secret sauce"!
My takeaway: a discrete latent space models the world more accurately with less capacity, and enables faster adaptation of a policy learned on it.
It would be better if you shed more light on why discrete latent space models have such benefits, or provided some links. Thanks!
Doesn't taking the mode of a Monte Carlo tree search over the continuous space solve this problem?
THANK YOU SO MUCH
I figured it probably existed, but I never knew the concept was called "model-based reinforcement learning"
cool video
Is there any chance that this "discreteness" you are focusing on has some relation to "sparsity"?
I am thinking something like: can we consider a sparse representation of the world as a collection of multiple discrete representations, where only one discrete representation is active at any time (e.g., 16 discrete 16-bit representations make up a sparse 256-bit representation, where at any time at most 16 bits are active)?
I am thinking this because biological neural networks are famous for their sparsity. Combining that with your conclusion, can we even say that biological brains are insanely multimodal because they are packed with discrete representations of very small aspects of the world (an instance of your game with only hundreds of configurations), but those add up to be a good model of the whole world?
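One way to read the 16-of-256 example above, sketched under the assumption that each sub-representation is a one-hot code over 16 values (my reading of the comment, not anything from the video):

```python
# Illustrative sketch: combine 16 one-hot 16-way codes into one sparse
# 256-bit vector where exactly 16 of 256 bits are active.

def sparse_from_codes(codes, n_values=16):
    """codes: integers in [0, n_values) -> concatenated one-hot bit vector."""
    bits = [0] * (len(codes) * n_values)
    for i, c in enumerate(codes):
        bits[i * n_values + c] = 1  # one active bit per sub-representation
    return bits

codes = [3, 0, 15, 7] + [1] * 12      # 16 discrete sub-representations
v = sparse_from_codes(codes)          # 256 bits, 16 of them set
```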
So a world model performs better when trained using discrete representations? Is there some kind of mathematical analogy between this and the quantized nature of the real world at a microscopic level?
This seems like such a simple question:
Continuous models can be richer than discrete ones but need more data to work.
I don't know much, but I think it's not because of a lack of model capacity. Rather, an integer representation fits more neatly into the binary environment: with continuous representations small errors compound, but with integers the errors don't compound, because values aren't off by 0.01, they are exactly 0 or 1. As you add more capacity, continuous models are able to overfit and lose this integer benefit.
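A toy illustration of that claim, assuming a latent rollout that picks up a small fixed error each step (an assumption for illustration, not an experiment from the video):

```python
# Toy rollout: a small per-step error compounds in a continuous latent,
# while rounding to {0, 1} at each step can absorb it entirely.

def rollout(state, steps, noise, snap):
    for _ in range(steps):
        state = [s + noise for s in state]            # small per-step error
        if snap:
            state = [float(round(s)) for s in state]  # snap to 0 or 1
    return state

start = [0.0, 1.0]
cont = rollout(start, steps=20, noise=0.02, snap=False)  # drifts by ~0.4
disc = rollout(start, steps=20, noise=0.02, snap=True)   # stays [0.0, 1.0]
```

Of course, if the per-step error ever exceeds the rounding threshold, the discrete rollout jumps to the wrong state entirely, so this only helps when errors stay small.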
Have you tried modeling this in an environment where there are multiple agents and communication is possible? For example, distributed learning systems. I'm curious to see how the discrete/continuous embedding outcomes change. Also, what would happen if the embedding dimensionality were increased and only the input and output layers were trained?
This reminds me of Chris Olah’s “Toy Models of Superposition” paper, where the claim is made that sparsity increases a model’s propensity to “superimpose” features to represent them more compactly, which makes the model behave like a much larger model.
I mean hey, you got my subscription.
It’s similar to using maximum and minimum functions in mathematics for various tasks (if your counter and the button that increments it are offset, you can use the maximum value of the counter to get the result). Instead of an offset, you might use clamping effects, where values are restricted to 0 or 1 rather than more precise values. Given that the environment may introduce noise, especially for modal values, it could be easier to obtain values, though that might be coincidental. Additionally, editing fewer values is typically easier than editing many. While continuous adjustments can lead to better learning over time, it takes longer due to the complexity of optimizing many digits.
By noise here I mean pseudo-noise, possibly.
That was good, it's rare that I watch an AI video to the end.
Maybe snapping to discrete representations means the prediction accumulates less noise?
This is an excellent video showcasing your research. I wish more people made such videos of their papers (I know I am going to once my paper is published).
GREAT project and COOL results!! I was wondering if you have any thoughts on the differences between the discrete stochastic states Danijar used vs. the standard VQ-VAE? It seems like in DV3 they don't have a codebook?
If you're still doing research, it could be potentially interesting to incorporate the findings from the Mamba paper, since they solved the complexity and vanishing/exploding gradient problems of RNNs/RCNNs. Maybe either PPO or the world model could perform better if they had long-term memory.
Hi, in your last diagram about policy learning you use timesteps for the X axis. What do timesteps mean here, and why don’t you use epochs?
fantastic! 🎉
This is pretty interesting. You've earned a subscriber :) Thanks for making this into a video and sharing it with the world!
I had a question: could it be possible that policy learning from discrete representations is better because discrete representation learning encodes environment dynamics faster than continuous representation learning? One way to verify this is to plot the autoencoder loss for discrete and continuous representations.
I have thought about the same question you posed in the research paper before, and wondered if the performance of one-hot discrete representations might arise from the smaller distribution of inputs for the actor.
Have you thought about seeing what happens if you were to normalise the continuous autoencoder's output to length 1? This would allow the autoencoder to utilise the continuous space of each dimension, whilst also reducing variation in the actual scalars when two latent vectors are similar.
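The suggested tweak could be sketched as a simple L2 normalization of the latent vector (hypothetical, not something from the paper):

```python
# Illustrative sketch: project a continuous latent onto the unit sphere,
# keeping its direction continuous while removing variation in scale.
import math

def normalize(z, eps=1e-8):
    """Scale z to (approximately) unit L2 length; eps guards against z = 0."""
    norm = math.sqrt(sum(x * x for x in z))
    return [x / (norm + eps) for x in z]

z = [3.0, 4.0]          # raw autoencoder latent
z_unit = normalize(z)   # ~[0.6, 0.8], length ~1
```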
Never imagined how critical Sudoku is to reinforcement learning!
Most LLMs also try reducing quantisation whilst keeping the scale, as opposed to reducing the scale but keeping the quants. The (simple) extreme of that is binary. It would be interesting to see if binary is the ideal for learning speed compared to, say, ternary.
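The binary-vs-ternary comparison could start from quantizers like these (thresholds are arbitrary choices, purely for illustration):

```python
# Illustrative quantizers: the same continuous latent mapped to binary
# {0, 1} versus ternary {-1, 0, 1} codes.

def binarize(z, threshold=0.5):
    return [1 if x > threshold else 0 for x in z]

def ternarize(z, t=0.33):
    return [1 if x > t else (-1 if x < -t else 0) for x in z]

z = [0.9, 0.1, -0.7]
b = binarize(z)      # [1, 0, 0]
tern = ternarize(z)  # [1, 0, -1]
```

Ternary keeps a "neutral" value and a sign per dimension at the cost of a larger code alphabet, which is exactly the trade-off a learning-speed comparison would probe.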
In order to represent the world better, could you make a relatively small but very high-dimensional tensor network to represent sort of "tiers" of outcomes? For example, one dimension of the tensor may represent how happy a result should be, ranging from unhappy to happy. Or angry. Etc. That way, you could modify the interpretation of the world via dimensionality rather than pure scale?
It seems like it would be worth attempting to use a combination of discrete and continuous in real-world applications. Also, it may be that the better choice depends on which representation more naturally reflects the world being modeled.
Just watched this, and I love the visuals. I would really like to know more about AI and its use in pattern recognition, because I have a lot of data that I find strenuous to analyze. Some patterns I can identify, but unfortunately my small brain doesn't have the recognition skills to understand the interplay between them. I'd like to train or build a model which will coherently describe where they are coming from. If OP reads this, I would love any information that will bring me closer to understanding.
"sorry, thought I'd do a little subliminal messaging" LMAO
Simply wow! An amazing video. I’m not really in the field, not even a uni student yet, just curious about technology, but I feel like I understood most of what was mentioned! And it only proves the point that you are an amazing narrator. Good job!
What a great topic! It is exactly what I needed to hear ❤
This is amazing, and puts an explanation to a concept I've been pondering. It would be a great start to a self-simulating module for an AI, even if it only works on a projection like a video-game state. You gained a new subscriber!
Great work man. 🙌
That's not a fair comparison. These VQ methods have a lot more non-linearities than non-VQ nets. I have no idea how to make a fair comparison, but it just might be due to the nonlinearities of quantization that these nets have better representational capability.
Thank you, it's a really great video! Do you have any advice for beginners?
Dude, your 13 minutes of video literally explained more than a whole semester at uni!!! Thanks!!!
Great video!!
I was reading through the paper, and there's a particular sentence that I'm finding a bit challenging to fully grasp: 'Demonstrating that the successes of discrete representations are likely attributable to the choice of one-hot encoding rather than the “discreteness” of the representations themselves.'
I was under the impression that one-hot encoding is a form of discreteness. If one-hot encoding is effective, wouldn't that imply that the 'discreteness' aspect also plays a role?
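For what it's worth, the distinction seems to be between different encodings of the same discrete value, e.g. one-hot versus dense binary bits (illustrative sketch, not the paper's code):

```python
# Both encodings below are "discrete", but the inputs a policy network
# sees look very different: one-hot is sparse with one active unit,
# binary packs the same value into a few dense bits.

def one_hot(value, n):
    v = [0] * n
    v[value] = 1
    return v

def binary(value, n_bits):
    return [(value >> i) & 1 for i in range(n_bits)]

oh = one_hot(5, 8)   # [0, 0, 0, 0, 0, 1, 0, 0]
bi = binary(5, 3)    # [1, 0, 1]  (5 = 0b101, least-significant bit first)
```

If one-hot beats other discrete encodings, that would suggest the benefit comes from the encoding choice rather than discreteness per se, which is how I read the sentence.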
A great contribution to the “cult”! ❤
super interesting!! thanks for making a video about it
Ответитьc) discrete representation better represents the world because the world itself (that you run your experiments on) is discrete
basically you introduce outside a priori knowledge about your problem into your solution through its architecture
continuous representation has one more thing to learn about your toy world, that it's discrete ... possibly that's why it takes longer
I just subscribed because I'd love to hear more about your RL research.
Cool video on a field I am curious enough about to want to dabble in. So I am just wondering, based on research on nonlinear models in a totally different field: sometimes less is more (more manageable and faster to converge). Dimensionality and parameter count do not need to be truly high to capture the essence of a model. Of course, the efficiency of the approach used will affect the effort required, kind of similar to how different methods of numerical integration (quadrature or Monte Carlo) can require adaptive and even importance sampling and coarse-graining.
🔢 where do all youse come from? People around me have no clue what an AI model is other than typing a question into ChatGPT. For over a year I tried to write a simple .py and got to train one neuron to double a number with 99.9% accuracy. Now it's going to take me 1,000,000 years to understand this video.
Really appreciate you making this video, I love autoencoders.
Could the higher rate of learning on the discrete model be because the total space of representations for discrete models is dramatically smaller? If you imagine latent-space modeling as a map of all possible representations to explore while looking for the best one, then a space where every vector is N x 32 is a much, much smaller world than one where every vector is N x 42,000,000,000. I imagine local optima are easier to stumble into.
It's like looking at a 32px x 32px image on your 1080p monitor. In a flash you see it's an elephant, and you can find its trunk. But if that same picture were 42 billion x 42 billion, displayed at true scale on your 1080p monitor, and someone asked you to find the elephant's trunk... you're just too close to the elephant. You'd be doing a lot of scrolling around, following clues until you find what you're looking for.
A continuous variable cannot take on a value of ±inf; though maybe some Python packages let you cheese that, it's not proper math.
If you have a discrete state space, a discrete representation would be appropriate, no?
Great video! I wish there were more computer science channels like this!! This being free is amazing. Thank you ❤️