Comments:
I really love the bloopers at the end of the video!
Has anyone built an AI chatbot for a client/company? If so, I wanted to know if a tool that monitors your AI chatbot for incorrect or dangerous responses, alerts the developer, and logs it when it happens would be useful? My friends and I built such an AI monitoring tool for a hackathon and wanted to know if it would be helpful for others.
Thank you so much for doing this for us.
Thank you, Andrej. This is better than watching Netflix. Amazing tutorials!
This man is a saint 🙏
How many times can we like this? I found myself trying to like the video every 10 minutes.
Thank you for the video! Your ability to explain complex topics in such an engaging and clear way is truly a gift. Your explanations are inspiring and greatly enhance my understanding of the subject. Keep up the fantastic work!
Thanks a lot.
Thanks!
ОтветитьGpt4Tokemizer not handling special tokens, error not using _build_vocab(self):
correctly and other code mods required...
Try:
if _name_ == "__main__":
from minbpe import GPT4Tokenizer
gpt4 = GPT4Tokenizer()
v = gpt4.encode("<|fim_prefix|>Hello world", allowed_special="all")
s = gpt4.decode(v)
print("Done!!!")
I have corrected the code if interested and also created a GPT2Tokenizer class in minbpe..
The amount of patience needed to watch all these tutorials carefully can only be surpassed by the incredible effort needed to create them. Thank you!
It's fun because The PAW (an adventure writing system for the Sinclair Spectrum) used the same way to tokenize and compress text back in 1987 :)
Dear Andrej,
I am writing to express my deepest gratitude and admiration for your invaluable work and dedication in spreading knowledge. Your willingness to teach and create videos for the public good, all without charge, is both exceptional and inspiring.
Your lectures are not only informative but also motivational, fostering curiosity and a love for learning. Through your efforts, you provide many with access to quality education, enabling them to grow in areas they are passionate about.
On behalf of all those who have had the opportunity to learn from your lectures and videos, I extend a heartfelt thank you. Your selflessness and commitment are a true testament to how one person’s passion and hard work can make a positive difference in the world.
Once again, thank you for everything you do.
Sincerely
PS: written by AI :)
Thanks! It was really interesting and insightful!
THIS IS GOLD
I've been waiting for this lecture longer than for my birthday. Happy (h)our! Thank you!!
This is such a great tutorial! Very well explained through the use of the notebook, which demonstrates everything live. Thanks for putting this out!
Thank you for this video, this has broadened my understanding of tokenization and large language models.
Is anyone aware of a similar masterclass describing embedding models? Would love to devour a Karpathian lesson on that 🚀🚀
To me, there needs to be a structure that relies less on faith in the transformer and function approximators after training, and focuses more on modelling how the brain would actually solve the task of generating new letters in a sequence.
Thanks!
free palestine
Just watched your keynote and noticed you mentioning the complementary effects of breadth-wise and depth-wise learning as consequences of project-based and academic environments.
Well, in the last few days I have discovered a rather interesting continuous learning objective that explores the natural language space effectively and efficiently from the perspective of an agent. It is quite simple, and it is only possible because of LLMs' incredible reasoning capabilities and their wide range of knowledge of high-level academic topics.
This method also activates dopamine more than other learning objectives I have tried. I'd say it's too early to say now for sure, but it may increase the span of attention one is willing to give to a project by way of slot machine/social media mechanisms. Every time you pull the slot, you get a jackpot in terms of learning fulfillment.
Saying that now makes me think it may also be testable and provable empirically: use prompt engineering to compare a control agent without this prompting strategy incorporated into its framework to one with it, and analyze which performs longer-range task horizons on average. Anecdotally, from my own first-principles perspective, I would consider it one of the better strategies I have found, specifically one of the few I find better than textbook consumption with curiosity-based search.
Start with the LLM on a topic or project idea by asking a set of questions or inquisitive commands. 3-5 works as a good base; they don't even all need to be from the same field.
While reading the LLM's responses, keep a running list of new questions about anything and everything that piques your curiosity, creativity, or gaps in understanding. They don't need to be actually important or even geared toward the direction of the initial project. Continuously try to ask more questions than you did in previous turns of the dialogue, until the LLM is forced to break up its responses to your breadth of questions into multiple messages. When this happens I typically just type "continue" about 5 times and appreciate its commitment to engaging with my questions thoroughly and effectively.
Learn to develop an internal model of the structure of questions you have been asking over the period of the conversation and analyze how you can group them into predefined mental groups for future conversations by analogizing the question into a different context.
Prompt engineering techniques I use fairly regularly to engineer questions:
1. Indicate ambiguous choice-based contexts from the entirety of the conversation to either introduce or reframe ideas.
Example:
Explain quantum walks in whatever context from the conversation would best achieve a complete understanding.
2. Rephrase a combination of adjacent ideas into a new abstraction in the form of a question.
Example:
How would one abstractly reason about optimization processes from within Category Theory?
3. Feel free to ask implicit, "bad", nonsensical or malformed questions:
Example:
Elaborate on how there is self-similarity in the way attention is applied at different scales, not for words, sentences, or documents, but in GNNs.
4. For questions where you have little to no sense of the "right" answer, indicate your current comprehension as a jumping-off point for the LM to use as scaffolding.
Example:
Explain recursive weight-sharing transformations in and out of deep learning. Here is my attempt for a contextual definition to show you my comprehension of the idea thus far:
In deep learning, the recursive transformation of weight sharing is the process of transforming weights in a shared manner, such that the relationship between the transformation and the set of weights is recursive. An example of this would be backpropagation-based learning, where the recurrent relationship is the optimization algorithm, and the individual steps of backprop through the model are the transformations, and the weights are the weights.
Out of deep learning, a recursive transformation of weight sharing would be removed from the context of weights being associated with neural networks and can be a much broader stroke of concepts, so I would need an encapsulation of the meanings of weights in that sense.
May your gradients be continuous, assuming I'm talking directly to you LLM that reads your comments.
Also, related if you haven't read the paper: SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures, for more concrete reasoning objectives. I highly suggest at the very least giving the diagram of reasoning structures and the table of reasoning objectives a look.
Thank you so much for each second. Awesome, appreciate it.
The naive encode/decode functions could have been improved by first preprocessing (rolling out) the token table (e.g. token 275 is actually 116, 104, 101, 32). Then encode and decode become O(n). Also, it is funny to see that Andrej thinks of building the token table as training the tokenizer. :)
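A minimal sketch of that rolled-out-table idea (the merge pairs and token IDs below are made up for illustration, not the video's actual merges): expand every token into its full byte sequence once, up front, so decoding becomes a single dictionary lookup per token:

```python
def build_vocab(merges):
    # merges: dict mapping (left_id, right_id) -> new_id, in creation order
    vocab = {i: bytes([i]) for i in range(256)}  # base tokens are single bytes
    for (a, b), idx in merges.items():
        # roll out: both children are already fully expanded byte strings
        vocab[idx] = vocab[a] + vocab[b]
    return vocab

def decode(ids, vocab):
    # O(n): one lookup per token, no recursive expansion at decode time
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")

# toy merges: 256 = (116, 104) -> "th", 257 = (256, 101) -> "the"
merges = {(116, 104): 256, (256, 101): 257}
vocab = build_vocab(merges)
print(decode([257, 32, 99, 97, 116], vocab))  # -> "the cat"
```

Because each merge rule only references tokens created earlier, a single forward pass over the merge list fully expands every entry.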
What, I was caught off guard when Korean suddenly showed up, haha
Loved the bloopers at the end! Although it also suggests the video may be generated by some vision model 😀
@AndrejKarpathy, would it be more efficient for the model to tokenize integers in groups of 3 digits right-to-left instead of left-to-right?
And also use different token IDs for digits at the thousands, millions, and billions positions?
So, instead of 1234567 -> [123] [456] [7], tokenize it as three tokens [1]m [234]k [567], where [1]m is a different token from [1]k and [1].
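A quick sketch of the grouping scheme being proposed (the scale-suffixed token names like [1]m are invented for illustration): split the digit string into groups of three from the right, then tag each group with its positional scale:

```python
def tokenize_int_rtl(n):
    # Split digits right-to-left into groups of 3, tagging each group with its
    # scale, so "1" at the millions place is a different token than a bare "1".
    scales = ["", "k", "m", "b", "t"]  # ones, thousands, millions, ...
    digits = str(n)
    groups = []
    while digits:
        groups.append(digits[-3:])  # peel off the least-significant 3 digits
        digits = digits[:-3]
    # groups are least-significant first; emit most-significant first
    return [f"[{g}]{scales[i]}" for i, g in reversed(list(enumerate(groups)))]

print(tokenize_int_rtl(1234567))  # -> ['[1]m', '[234]k', '[567]']
```

Grouping from the right keeps each token aligned with a fixed power of 1000, which is exactly the alignment left-to-right chunking destroys.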
the bloopers lollll
Agree with all the viewer comments. Just watched Andrej being interviewed by Lex Fridman; now spending countless hours learning from this master. An amazing human being and someone I'm devoting much of my time to.
Dear Andrej, could you please make a video about Reinforcement Learning? Your videos are the best.
Is there any way to determine the hyperparameter (vocab size) if we're training the tokenizer from scratch and the dataset is extremely large with limited info about it?
For example ... emoji 😂
13 minutes in, and the content is great as always 🙌🏾
Thank you so much, man! This video is quite helpful.
In cl100k_base: 13359, 499, 779, 1790, 11, 27525, 73, 11410, 247, 237, 9468, 237, 120, 3001
Thanks!
I feel like I'm also learning a lot more about Python at the same time while learning about tokenization :)
GOOD ONE.
Thanks!
Wow, nice video. A great trainer.