comparing GPUs to CPUs isn't fair

Low Level Learning

1 year ago

287,798 views

Comments:

Felix
Felix - 29.10.2023 14:26

Correction*
GPUs don't have thousands of cores.
For something to be considered a core, it needs to be able to fetch, decode, and execute an instruction on its own.
CUDA cores are just a lane of a vector execution unit.

Kai Huso
Kai Huso - 23.10.2023 07:55

Again, someone calling NVidia "nuh-vidia"... Why are people now calling NVidia "nuh-vidia"? It's "en-vidia". It has been since Jensen came up with the name, because of where the name comes from, and it was pronounced "en-vidia" all the way up until recently. What the fuck changed? Why are people now calling it "nuh-vidia"?

Cypher
Cypher - 06.10.2023 10:28

Is "59.7 times slower" a concept taught in schools now?

Fifty times less used to be taught as one-fiftieth, and saying "50x less" would've landed you a lecture from a math instructor.

Paul Stoleriu
Paul Stoleriu - 18.09.2023 15:28

Haha, "very complicated floating point operations" (add and multiply) and "high precision floating point" - well, actually, we use 16-bit wherever we can get away with it, and fixed point would be even better for most uses (operations on normalized vectors).
GPUs are not slower than CPUs, they're just so much stupider than CPUs that they're barely useful for general computation. Though, hardware-wise, they're way more complex, with media engines, compression/decompression, sampling algorithms implemented in hardware, and other fixed-function blocks. But we don't actually know what's hardware, what's microcode, and what's driver software, since the stuff is mostly secret, and the closest to bare-metal programming on the GPU is Vulkan. Well, you can look through the Mesa drivers' source code.

askmiller
askmiller - 05.09.2023 02:14

I don't think it's sufficient to just say the # of warps is the limiting factor of a GPU's speed, because there are still far more warps in a GPU than cores in a CPU. The 4080 has 304 warps, still like 10x more than what a CPU would have. There are 2 big problems really for why we don't just use a GPU for everything. 1. Each warp is significantly slower at individual operations than a CPU core. 2. Most software isn't going to be able to scale to take advantage of all of the warps on your GPU and will likely just use 1 anyway. Suppose you wrote an OS to utilize a GPU like it's a CPU; you'd need to be running hundreds of applications at once for the scalability of the GPU to overcome each warp being slower. I think the analogy I would use is that trains can move more people per hour than cars, but as an individual it will take you longer to get to your destination via train than by driving in most circumstances. It only makes sense to have a train in a situation where you have a lot of people who specifically need to go from one point to another.
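
(For scale, a minimal CUDA sketch of how thread count maps onto warps; the array size, kernel name, and block size are illustrative assumptions, not figures from the video or the comment.)

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each thread handles one element; the hardware groups every 32 threads into a warp.
    __global__ void scale(float *x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= 2.0f;            // trivial per-element work
    }

    int main() {
        const int n = 1 << 20;              // ~1M elements -> ~32768 warps of work to schedule
        float *x;
        cudaMallocManaged(&x, n * sizeof(float));
        for (int i = 0; i < n; ++i) x[i] = 1.0f;

        int block = 256;                    // 256 threads = 8 warps per block
        int grid  = (n + block - 1) / block;
        scale<<<grid, block>>>(x, n);       // only pays off when the grid is this wide
        cudaDeviceSynchronize();

        printf("x[0] = %f\n", x[0]);
        cudaFree(x);
        return 0;
    }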

Wh0isTh3D0ct0r
Wh0isTh3D0ct0r - 04.09.2023 18:35

Your thumbnail is wrong. Nothing can be more than 1x SLOWER than another thing it is being compared to, because 1x = 100%. Look at it this way: if an object is moving at 30 mph and something else is going 1x SLOWER than the original object (1x = 30 mph, meaning 30 mph SLOWER than the original object), then the other object is moving at 0 mph.

Any value greater than "1x SLOWER" would be something going BACKWARDS.

Roberto Bokarev
Roberto Bokarev - 11.07.2023 09:43

Summary:
CPU - Low latency
GPU - Great Parallelising

jlinkels
jlinkels - 06.07.2023 03:58

Stupid tune in the background 👎

yeetboi3000
yeetboi3000 - 26.06.2023 23:31

You talked about CUDA cores, but what about AMD GPUs? Do they follow the same principles?

CT 2
CT 2 - 21.06.2023 08:26

Until halfway through I thought he was chanting the spell for Fireball. I guess I'm not cut out for deeper tech knowledge 😂

Shapeless
Shapeless - 18.06.2023 16:58

I'd put things a bit differently.
CPU cores are designed to be versatile, to be able to perform many kinds of operations, but they generally do one thing at a time very fast, because programs need to execute in a precise order, like add A to B, then divide the result by C.
GPU cores, on the other hand, are designed to do only certain, limited things and are super small, so you can fit thousands of them next to each other, meaning they can do multiple tiny things in parallel, all at once. Then you can just glue together what they produced to compose an image.
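
(A hedged sketch of that "one tiny worker per element" idea in CUDA; the image-brightening kernel and sizes are made-up examples, not anything from the comment.)

    #include <cuda_runtime.h>

    // GPU version: each thread owns exactly one pixel and does one tiny operation.
    __global__ void brighten(unsigned char *img, int n_pixels) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n_pixels)
            img[i] = min(255, img[i] + 40);   // clamp-and-add, done for all pixels at once
    }
    // Launched e.g. as: brighten<<<(n_pixels + 255) / 256, 256>>>(img, n_pixels);

    // CPU version: one core walks the pixels in order, one after another:
    // for (int i = 0; i < n_pixels; ++i) img[i] = std::min(255, img[i] + 40);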

Naturally Interested
Naturally Interested - 16.06.2023 14:29

This kind of parallelism is actually called Single Instruction Multiple Threads, as it is slightly different from Single Instruction Multiple Data. In fact, a warp can be Single Instruction Multiple Threads, as explained in the video, and process multiple pixels at once (following every branch in unison), while every core can be Single Instruction Multiple Data and process a vec4 at a time, not just a float.
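
(To make the distinction concrete, a small CUDA fragment where each SIMT thread works on a whole float4 at once; the kernel name and launch shape are illustrative assumptions.)

    #include <cuda_runtime.h>

    // SIMT: many threads run this in lockstep within their warps.
    // SIMD within each thread: one 128-bit float4 is loaded, scaled, and stored.
    __global__ void scale4(float4 *v, int n4) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n4) {
            float4 p = v[i];                      // vectorized load
            p.x *= 2.f; p.y *= 2.f; p.z *= 2.f; p.w *= 2.f;
            v[i] = p;                             // vectorized store
        }
    }
    // Launched e.g. as: scale4<<<(n4 + 255) / 256, 256>>>(v, n4);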

Leonmitchelli Galette des Rois
Leonmitchelli Galette des Rois - 15.06.2023 05:11

And then Intel released the Xeon Phi, which is a CPU but works like a GPU.

Tree Librarian
Tree Librarian - 08.06.2023 04:30

From your description, one Alder Lake core (with AVX-512 enabled, so Sapphire Rapids) would be equivalent to 32 CUDA cores, since it can initiate 32 floating-point math ops per clock cycle. Except that they can be two sets of 16-way SIMD, and it can simultaneously perform another 4 logical ops and more memory ops at the same time, twisting and winding its way through 2 arbitrary threads at once with far greater flexibility, and at a higher clock rate. So top Xeon CPUs can theoretically manage >7 teraflops, as long as the memory can keep up. Still 7 times slower, but much easier to design diverse code for.
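
(Roughly where that >7 teraflops comes from, assuming two AVX-512 FMA ports per core, a 56-core part, and a ~2 GHz all-core clock, all illustrative figures: 2 ports × 16 fp32 lanes × 2 FLOPs per FMA = 64 FLOPs per cycle per core, and 56 cores × 2 GHz × 64 ≈ 7.2 TFLOPS.)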

IsYitzach
IsYitzach - 07.06.2023 23:24

The FPUs in CUDA cores are optimized for particular floating point operations. They're really good at addition, subtraction, and multiplication. They suck at division and most any other function, like trig and exponents. You're also going to suffer from starting GPU kernels and from data transfers between the GPU RAM and main RAM/CPU cache. And the largest float they can handle is a double; they can't do long doubles. Doubles are usually enough, but double precision is also slower than single precision, though that may be something they have in common with CPUs. I haven't checked.
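
(A hedged CUDA sketch of that cost difference: add/multiply map onto a single fused multiply-add, while division and trig either take a slower exact sequence or the fast approximate intrinsics; the values and kernel name are made up.)

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void demo(float *out, float a, float b) {
        out[0] = a * b + 1.0f;       // cheap: one fused multiply-add
        out[1] = a / b;              // IEEE division: a much longer instruction sequence
        out[2] = __fdividef(a, b);   // fast, approximate divide
        out[3] = __sinf(a);          // fast, approximate sine (vs. the precise sinf)
    }

    int main() {
        float *out;
        cudaMallocManaged(&out, 4 * sizeof(float));  // unified memory; an explicit
                                                     // cudaMemcpy would add the
                                                     // transfer cost mentioned above
        demo<<<1, 1>>>(out, 1.5f, 3.0f);
        cudaDeviceSynchronize();
        printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
        cudaFree(out);
        return 0;
    }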

Kibbles
Kibbles - 05.06.2023 07:26

Nice explanation of warp scheduling and stuff, I used those ideas a lot in my path tracer

York's Gaming Emporium
York's Gaming Emporium - 31.05.2023 07:06

Hey, Low Level Learning, this description doesn't make sense, I think you meant GPU?

I3L4CK SkILLZz
I3L4CK SkILLZz - 31.05.2023 05:47

The 4090 has 80 TFLOPS.

Silly Sally
Silly Sally - 26.05.2023 10:30

You forgot that context switching on the GPU has zero overhead and that CPU RAM is far slower than GPU RAM. On the GPU, all instructions are 32-wide SIMD, BUT the GPU has 256 registers per thread (or 65000-ish per actual core) vs the CPU's 16 per core. Keep in mind that the CPU takes a clock speed penalty if it tries to use SIMD, while the GPU does that by default. Btw the GPU does have actual cores like the CPU, but far fewer of them; mine has 46, for example, and the 4090 has 180 or so.
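
(A hedged sketch of the register/occupancy trade-off being described: capping registers per thread lets more warps stay resident per SM; the kernel and the 256/4 launch bounds are arbitrary example values.)

    #include <cstdio>
    #include <cuda_runtime.h>

    // Ask the compiler to budget registers for 256-thread blocks, at least 4 resident per SM.
    __global__ void __launch_bounds__(256, 4) busy(float *x) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        x[i] = x[i] * 2.0f + 1.0f;
    }

    int main() {
        int blocks = 0;
        // How many 256-thread blocks fit per SM, given this kernel's register and
        // shared-memory footprint?
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks, busy, 256, 0);
        printf("resident blocks per SM: %d\n", blocks);
        return 0;
    }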

Donovan Lavinder
Donovan Lavinder - 25.05.2023 22:21

It's not weird at all. It's also because GPUs are technically in-order processors (GPUs capable of out-of-order execution now exist too; ARM Valhalla and Snapdragon Adreno are examples of out-of-order GPUs in use), whereas CPUs are already superscalar out-of-order monsters. However, how you program either or both processors matters a whole lot too, because GPUs are meant to chew through vector math in parallel (i.e. you want as many ALUs busy as possible).

EDITED: Forgot to add, SIMD vector processors, especially some superscalar varieties (such as AMD GCN shaders), tend to only work on exactly the same item at a time in parallel, unless they explicitly allow dual-issuing of two different SIMD vector math instructions at a time. FPUs are weird in a way.

Shihab Uddin Ahmad
Shihab Uddin Ahmad - 22.05.2023 20:37

What about tensor cores in the GTX 1080 Ti?

Homework
Homework - 18.05.2023 08:07

My take on it is GPUs are Stupid Faster and CPUs are Smart Slower

MissesWitch
MissesWitch - 17.05.2023 04:46

CPUs can't have thousands of cores? But I saw a 9000-core CPU at Playtech!

roax206
roax206 - 13.05.2023 18:42

Note that Nvidia CUDA cores are probably closer to Intel threads. The best Nvidia equivalent of a core is the Streaming Multiprocessor (SM).

For the Ada Lovelace architecture (RTX 40 series), each SM has 128 CUDA cores, and the RTX 4090 has 128 SMs: 128 * 128 = 16384.

CPUs also tend to handle a larger number of data types, with additional instructions (and more hardware) needed both for conversions between them and for all the different operations performed on each data type. The GPU core, on the other hand, can be a lot simpler and thus smaller.
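
(A small CUDA sketch of what the runtime actually reports: SM count and threads per SM rather than "CUDA cores"; the 128-cores-per-SM figure for Ada is an assumption baked into the last line, since the API does not expose it.)

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, 0);
        // The driver reports SMs, not "CUDA cores"; cores per SM depend on the architecture.
        printf("SMs: %d\n", p.multiProcessorCount);
        printf("max threads per SM: %d\n", p.maxThreadsPerMultiProcessor);
        printf("approx CUDA cores (assuming 128 per SM on Ada): %d\n",
               p.multiProcessorCount * 128);
        return 0;
    }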

devansh anil
devansh anil - 13.05.2023 11:08

You should have gone into a bit more depth.

Seth Pentolope
Seth Pentolope - 13.05.2023 09:39

Another way to put it:

A CPU core: a single jack-of-all-trades genius who gets things done ASAP. He even tries to predict what will happen next so he is never waiting on someone else. His desk space takes up an entire floor of the office building. He has multiple assistants who organize things and try to keep whatever he needs within reach of this genius. Trying to manage lots of these people gets… unmanageable.

A GPU core: A factory worker who needs to be told what to do all of the time. Give the managers megaphones and they all get lots of stuff done, as long as the manager can utilize the factory workers efficiently. To do that, the factory workers have to all be following the same instructions.

See? Totally different!

cryptearth
cryptearth - 12.05.2023 00:28

Why do so many techies have such issues correctly pronouncing "en-vidia"?
It's NOT "neh-vidia" but rather "en-vidia"... jeez, it hurts so much every time I hear it.

Skorp
Skorp - 10.05.2023 01:32

We live in a world where Nvidia tells us how to pronounce their name and people still say Nuh Vidia.

StickGuy
StickGuy - 22.03.2023 09:57

How modern hardware and programming languages work is still black magic to me

data
data - 22.03.2023 06:28

Absolutely amazing video, all the other channels massively oversimplify it.

kdcadet
kdcadet - 15.03.2023 19:05

What about CUDA core vs SM? Or for AMD, I think it's stream processor vs execution engine.

Mika Lindqvist
Mika Lindqvist - 27.02.2023 11:34

Because the GPU handles SIMD, it would be more correct to compare it with the SIMD units inside CPUs... My GPU has 14 cores, but each core can handle 1024 threads. Even if I stress my GPU, it is likely to utilize only 10 or 11 cores, as it doesn't have enough memory to utilize all cores with their own kernels.

Ymi_Yugy
Ymi_Yugy - 25.02.2023 17:49

CPUs actually include SIMD execution units and perform tasks in a similar way to GPUs.
What sets GPUs apart is their fixed-function hardware used for special tasks like performing raster algorithms, sampling textures, and, more recently, neural network inference and ray intersection testing.
We have also seen that under the right circumstances compute shaders are competitive with fixed-function rasterization, and some CPUs now include machine learning instructions.
This makes me wonder whether someday we might go back to a unified processor that forgoes the overhead and complicated programming model that come with using a co-processor.

Clayton Macleod
Clayton Macleod - 20.02.2023 01:45

It’s not nuhvidia. It’s envidia.

Rupert Erskin
Rupert Erskin - 19.02.2023 09:17

Right on. Thanks for sharing.

Frank Johnson
Frank Johnson - 19.02.2023 05:17

I thought it was SMT that got Nvidia's GPU to 15k. Shouldn't it physically be half that many stream processors?

Benjamin Batchelor
Benjamin Batchelor - 18.02.2023 14:33

The N in NVIDIA is pronounced “en” not “nuh”

Mob6 CCJP
Mob6 CCJP - 18.02.2023 09:33

I just want multi cpu boards back, I refuse to elaborate

Blaze It Ken
Blaze It Ken - 18.02.2023 00:35

CPUs and GPUs need each other for the optimal epic gamer experience.

takipsiz AD
takipsiz AD - 16.02.2023 14:20

I guess Intel Larrabee is just ignored.

jordan ryan
jordan ryan - 16.02.2023 04:36

Well, first off, it depends on the workload. Applications that are highly threaded are generally much faster on GPUs, but single-threaded or even lightly threaded apps benefit from the CPU.

Yassine
Yassine - 14.02.2023 19:13

I see that if they move to 1 nm they can make thousands of cores and boost the speed to 10 GHz or more... same for all other components.

Will 93
Will 93 - 14.02.2023 07:27

FYI: Nvidia is pronounced "En vidia" not "Na vidia."
Signed,
Some old dude who has been gaming since long before they released their first GPU. 😉

P_V
P_V - 12.02.2023 00:49

For comparison, the best desktop CPU right now, the i9-13900KS, has 24 cores, and the RTX 4090 has 16384 cores :/

Schmelon
Schmelon - 11.02.2023 14:13

warp as in warp and weft

Chris R
Chris R - 09.02.2023 07:25

It's NOT Ne-vidia - it's IN-Vidia!! Geez.

lt3
lt3 - 09.02.2023 06:53

More content like this, I love it. Keep up the great work, man!

John G.
John G. - 09.02.2023 02:49

How is it possible to multiply a positive number by a positive number and get a smaller number?

It may be that thinking of a fraction (i.e., ~1/50th) as a multiplier is equivalent to saying that something is less. Maybe, but it is at best highly illogical. So if we need to indicate that a resulting number is less than or smaller than an initial number, how about we just start talking in terms of a fraction of the initial number, rather than a multiplier.

Sven Isaksson
Sven Isaksson - 08.02.2023 17:14

Does the difference between GPUs and CPUs apply to DSPs as well?
