Comments:
Amazing tutorial!! Thank you so much.
If we divide the image into 3*3=9 small boxes, why do we still need the box coordinate variables bx, by, bh, bw?
Is Non-Max suppression used during training?
What are the values of the "don't care" question marks? Is it up to the labeler or is there a convention?
What if an object spans more than one grid cell?
Loda heran karas tu machine learning bhanai ne
Thanks for the video, it brought me back to light :)
I still have a question, however: the YOLO v1 paper describes the final convolutional output layer as a tensor of dimension 7x7x1024 (Darknet); detection then follows, with grid cells defined on a 7x7 grid. My assumption here is that, since the dimension of the conv output is the same as the grid's, can one say that one grid cell represents one pixel, and hence that detection proceeds one 'pixel' at a time?
How do I get the values c1, c2, c3?
I have read multiple blog posts on YOLO, along with the original paper, but this video provides the intuition at a different level. Amazing!
I am not clear on how it works at inference time. How can I convert the model's output bounding boxes back into the original image coordinates? Could you kindly give the mathematics for computing it?
Thank you!!
Great explanation.
The best AI teacher, thank you.
Let's say I have an object in 3 of the grid cells. Then the outputs of all 3 grid cells should be identical, with the same values of bx, by, bh, bw. Am I correct?
Is this a graduate or undergraduate level course?
Amazing educator.
The same concept is used in YOLO v3, but instead of a softmax activation over all classes, logistic regression is applied to each class (meaning an object can belong to two classes).
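To illustrate the difference this comment describes, here is a minimal NumPy sketch. The logit values are made up for illustration and are not taken from any YOLO model; the point is only that softmax forces the class scores to compete, while per-class sigmoids let two classes both score high.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def sigmoid(logits):
    # Independent logistic score per class; the scores need not sum to 1.
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical class logits for one predicted box (illustrative values only).
logits = np.array([2.0, 1.5, -3.0])

softmax_probs = softmax(logits)   # classes compete for a total probability of 1
sigmoid_probs = sigmoid(logits)   # each class is scored on its own

print("softmax:", softmax_probs)
print("sigmoid:", sigmoid_probs)
```

With the sigmoid version, the first two classes can both exceed 0.5 at once, which is what allows a multi-label prediction.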
How is this grid cell segmentation actually encoded in the neural network? Is it encoded at all?
If I understood correctly, the segmentation is only encoded into the training data, and the network is supposed to "learn" to output the y=3x3x16 that matches the locations of the objects relative to the grid cell on the training data. In other words, the network has no information about any image grid.
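A rough sketch of what "encoded into the training data" could look like, under simplifying assumptions: one box per cell (so each cell holds 8 numbers, not the 16 used with two anchor boxes), box coordinates given relative to the whole image in [0, 1], and the layout [pc, bx, by, bh, bw, c1, c2, c3]. The function name `encode_label` and the input format are hypothetical, chosen only for this illustration.

```python
import numpy as np

def encode_label(objects, grid=3, num_classes=3):
    """Build a grid x grid x (5 + num_classes) target tensor.

    objects: list of (class_id, x_center, y_center, width, height),
             all expressed as fractions of the image size (assumed format).
    """
    y = np.zeros((grid, grid, 5 + num_classes))
    for class_id, xc, yc, w, h in objects:
        # The cell containing the box midpoint is responsible for the object.
        col = min(int(xc * grid), grid - 1)
        row = min(int(yc * grid), grid - 1)
        y[row, col, 0] = 1.0                 # pc: an object is present
        y[row, col, 1] = xc * grid - col     # bx: midpoint offset within the cell
        y[row, col, 2] = yc * grid - row     # by
        y[row, col, 3] = h * grid            # bh: height in units of cell size
        y[row, col, 4] = w * grid            # bw: width in units of cell size
        y[row, col, 5 + class_id] = 1.0      # one-hot class entry
    return y

# One object of class 1, centered at (0.5, 0.8) in image-relative coordinates.
label = encode_label([(1, 0.5, 0.8, 0.3, 0.2)])
print(label.shape)
print(label[2, 1])
```

In this sketch the grid only appears when the labels are built; the network itself would just be trained to reproduce tensors of this shape.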
I read some documents and learned that YOLO uses HSV. Can you explain why?
How do you define the boundaries of the anchor boxes?
Thank you.
OK, so how many objects can one cell of YOLOv1 predict? The article says 'we only predict one set of class probabilities per grid cell regardless of the number of boxes'? It seems that the article skirts around the fact that the model can only predict at most 1 object/cell, but the wording above does not exclude, for example, the case when all B objects belong to the same class. So how many?
Source code?
Can someone tell me: the anchor box terminology we use at training time becomes "bounding box" at prediction time, is that right? At prediction time anchor boxes are not used, only bounding boxes, right?
Is this YOLO or YOLO 9000? According to the YOLO paper, I think the y should be 3x3x((2x5)+3), so y is 3x3x13. Is this right?
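The two counts that come up in this thread (3x3x13 and 3x3x16) can be compared with a few lines of arithmetic. This is only a sketch of the two conventions as I understand them: the YOLO v1 paper shares one set of class probabilities across the B boxes of a cell, while the anchor-box formulation repeats the class entries for every anchor. The function names are made up for this example.

```python
def depth_shared_classes(num_boxes, num_classes):
    # YOLO v1 paper convention: B boxes of 5 values each, plus ONE shared
    # set of class probabilities per cell.
    return num_boxes * 5 + num_classes

def depth_per_anchor_classes(num_anchors, num_classes):
    # Anchor-box convention: every anchor carries its own
    # [pc, bx, by, bh, bw, c1..cC] vector.
    return num_anchors * (5 + num_classes)

shared = depth_shared_classes(2, 3)          # 3x3 grid, 2 boxes, 3 classes
per_anchor = depth_per_anchor_classes(2, 3)  # 3x3 grid, 2 anchors, 3 classes
paper = depth_shared_classes(2, 20)          # paper setup: 7x7 grid, B=2, C=20

print(f"shared classes:     3x3x{shared}")
print(f"per-anchor classes: 3x3x{per_anchor}")
print(f"YOLO v1 paper:      7x7x{paper}")
```

So both numbers are consistent with their own convention; they differ only in whether the class probabilities are shared per cell or repeated per box.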
Clear and good enough. Thank you.
Is YOLO a deep learning algorithm?
Thank you very much for all your YOLO videos. They are just great :)
What happens when the expected training output was close to bounding box 1, but the output of the network was 2, and the coordinates of the box in the expected output were incorrectly marked close to 1 whereas they should have been close to 2?
How do I get the programming exercise?
This algorithm simplified the bounding box regression by having a 3x3 (or some other) grid output, right? What I didn't understand is how anchor boxes are used in this algorithm...
Thank you, Andrew!