Comments:
Just ran perf stat on python3 hello world and got 6% branch misses. pypy3 had 3% but way more total branches...
g++ or clang++ with -O0 or -O3 and I still get 2.5% misses (though a lot fewer total branches, 800k vs 30M) using iostream with endl
same branches and misses for printf()
Also 3M instructions for printing hello world?
hmmm... 950k instructions and 220k branches for `int main() { return 0; }` must be overhead from perf or the OS; the assembly is basically zero eax then ret
Why can't compilers do this type of optimization?
That "session is over" was definitely not predicted.
Really good presentation, thank you!
CPU and compiler engineers are mad geniuses lol
I watched this talk sometime back around when it was first uploaded, and I am almost certain I missed that joke at the beginning: "it's a talk on performance; the closer you sit, the better the performance." I am so glad YT recommended it to me again.
You know that you have no real problems in your life if you are focused on the branch prediction pipeline of your CPU. ;-)
Really great talk, thank you!
Why does code like `rand() & 0x1` generate branches that can be missed? Doesn't this piece of code produce a branchless stream of instructions (just a bitwise AND)?
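A minimal sketch of the distinction (function names are mine, not from the talk): the AND itself is branchless, but a branch appears wherever its result is *used* as a condition, and on random data that branch is unpredictable.

```cpp
// The bitwise AND computes 0 or 1 without branching; the branch comes
// from the `if` that consumes it.
int sum_branchy(const int* a, const unsigned* b, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
        if (b[i] & 1u)          // data-dependent branch, ~50% mispredicted on random b
            sum += a[i];
    return sum;
}

int sum_branchless(const int* a, const unsigned* b, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += a[i] * int(b[i] & 1u);  // multiply by 0 or 1: no branch at all
    return sum;
}
```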
you write this kind of code and then some other person will look at your code and be like "was this guy on drugs when he wrote this?". I think all this makes C++ a horrid language
It’s crazy the quality of people Russia has lost
Really great topic and good info from Fedor, but his style takes some getting used to.
Very interesting, thanks
Awkward ending to say the least
“Predictor is good in prediction. We are not” - nice :)
This is an absolutely mind-blowing session
"The Art of Writing Efficient Programs" is a very good introductory book into this performance-focused universe, highly recommended
Thank you CppCon. Are the presentation slides available anywhere?
I suggest you look at `sudo perf top -e branch-misses` - it will tell you exactly where there are too many mispredicts. Hit enter on the top entry and drill down to the function. Build with debug info.
Nice to see a talk like this when there are so many people scoffing at branch-free, because they saw one example where a perfectly predictable branch skipped a few ops and ended up slightly faster.
My low-mid level (second from the bottom) interview was 70% leadership to start with, followed by 30% extremely basic technical questions. I didn't expect leadership questions at such a low level, and I messed up that interview so hard, but fortunately they hired all 4 people who made it to interview, so it's all good!
In PowerPC, the branch conditional instruction has the prediction baked into it. I've seen, for example, MetroWerks compiler output where its static predictions were very primitive (99% of forward branches predicted unlikely, 99% of backward branches predicted likely). I've yet to use the C++20 attributes, but [[likely]] and [[unlikely]] probably give you manual control of the prediction bit for those branches, which is neat.
Debug assertions and nullptr checks were totally the first thing I thought of when I learned about these attributes. Even if the compiler is smart enough to recognize a nullptr check and mark it as unlikely (I'm sure it is), it is nice to be able to self-document it with an attribute.
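A minimal C++20 sketch of that nullptr-check idea (the function name is mine): `[[unlikely]]` marks the error path, so the compiler lays the hot path out as a straight-line instruction stream and statically predicts the check as not-taken.

```cpp
#include <stdexcept>

// [[unlikely]] self-documents that the null case is the cold path.
int deref_checked(const int* p) {
    if (p == nullptr) [[unlikely]] {
        throw std::invalid_argument("null pointer");
    }
    return *p;  // hot path: falls through with no taken branch
}
```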
cool upload CppCon. I smashed that thumbs up on your video. Keep up the excellent work.
Always happy to see Fedor!
1. This video was made in 2021, with supposedly not an ancient CPU in the test system. I've heard so much about modern CPUs having redundant pipelines that keep evaluating both sides of the branch (while still keeping the branch prediction circuitry, in case the pipelines are "shorter" than the branch code paths). If that's true, why doesn't that make the handling of the compiler-generated assembly much more efficient? 10% (a minority of) wrong predictions should still lead to high efficiency.
2. Why isn't the compiler taking advantage of SIMD in the fastest branchless C version? Isn't there some compiler option you could have turned on to get the compiler to emit code that performs closer to the optimal assembly implementation?
... Are compiler implementors and CPU designers lying to us, or are they optimizing for unrepresentative/narrow test cases?
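On point 2: one thing that often helps, sketched below under my own assumptions (this is not the talk's exact benchmark), is writing the selection as straight-line arithmetic with no early exits or function calls. In that shape gcc/clang can usually auto-vectorize at -O2/-O3, and `-march=native` widens the SIMD lanes further.

```cpp
// Mask-select a[i] or b[i] without a jump: the ternary on a simple
// comparison compiles to a compare/negate, and the loop body is pure
// arithmetic, which is what the auto-vectorizer wants to see.
long sum_selected(const long* cond, const long* a, const long* b, int n) {
    long sum = 0;
    for (int i = 0; i < n; ++i) {
        long m = cond[i] ? -1L : 0L;      // all-ones or all-zeros mask
        sum += (a[i] & m) | (b[i] & ~m);  // branch-free select
    }
    return sum;
}
```

Whether the vectorizer actually fires still depends on the compiler version and flags, so it's worth checking the generated assembly.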
The session is over, thank you......
give him another hour
"The closer you sit, the better the performance" - I see what you did there :)
With x86, the biggest effect likely/unlikely has is to rearrange the instructions so that the unlikely branch is moved to a different area of the program. This makes the likely path a serial instruction stream, which is good for the instruction cache. It's also good for branch prediction when there is no entry yet in the branch prediction table, e.g. the first time through.
It can't optimize `c[i] = rand() >= 0` because the function `rand()` is a deterministic random generator: the internal state of the RNG must be advanced even though the returned value is discarded. The best one can get is `rand(); c[i] = 1;`
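A sketch of that best case (the function name is mine): the comparison can be folded, since `rand()` always returns a value in [0, RAND_MAX], but the call itself has a side effect, so the compiler must keep it.

```cpp
#include <cstdlib>

// Hand-written form of what the optimizer is allowed to reduce
// `c[i] = rand() >= 0` to: the call stays, the comparison folds to true.
void fill_ones(bool* c, int n) {
    for (int i = 0; i < n; ++i) {
        rand();       // cannot be removed: advances the hidden PRNG state
        c[i] = true;  // rand() >= 0 always holds, values are in [0, RAND_MAX]
    }
}
```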
Great talk, as always quality content from Fedor! I will definitely buy his book as I'm really interested in these types of optimisations.
Also, can someone explain why the optimisation with function pointers doesn't work when the functions are inlined?
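A hedged sketch of the trick in question (names are mine): the conditional branch is replaced by a data-dependent indirect call through a table, which is handled by the indirect-branch predictor instead. If the compiler can prove which targets are reachable and inlines them, it emits an ordinary conditional branch again, which is why the trick stops working once the functions are inlined.

```cpp
long add(long a, long b) { return a + b; }
long sub(long a, long b) { return a - b; }

using binop = long (*)(long, long);
const binop table[2] = {sub, add};  // index 0 = false, 1 = true

long apply(bool cond, long a, long b) {
    // Indirect call: no conditional branch in the source. A smart compiler
    // that sees through the table may inline both targets and reintroduce
    // the if/else, defeating the point.
    return table[cond](a, b);
}
```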
bool(x)+bool(y) is a bit scary; booleans shouldn't have arithmetic addition defined, they should have only boolean operations defined.
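For what it's worth, here is a minimal sketch of what that expression relies on (the function name is mine): in C++, `bool` promotes to `int` (false → 0, true → 1) in arithmetic, so the sum counts how many operands are truthy while evaluating both unconditionally, whereas `x || y` short-circuits and may branch.

```cpp
// Both operands are always evaluated; the result is 0, 1, or 2,
// and any nonzero result is "truthy" where a logical-or was intended.
int count_truthy(int x, int y) {
    return int(bool(x)) + int(bool(y));
}
```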
How does a C++ developer find a job? I'm not joking; I'm a dotnet developer, and I'm asking a serious question. Companies (as usual) want to see professional C++ developers right after they finish university, which is impossible.
Loved it. Thanks Fedor!
Having looked at the comments before watching the entire talk, I was a little worried the talk would end before the speaker got to close it. So I'll mention here that the cut-off happened during the question section at the end, after the last slide. Still somewhat abrupt!
Interesting and informative talk! I like the hands-on, example-driven approach.
What I don’t like is the constant interruptions (esp. ~ mins 30-40) from the audience questions. These are very hard to follow as a remote viewer and disrupt the flow.
It was a great ride! Thanks Fedor & @CppCon 👍
Serious question: if we repeat all the time "measure before changing", and compilers and processors may do a better job, then why do we think that, after we change the code once, it stays the fastest version? We made the code less readable, removed a branch, and added more work.
What if a new processor comes along, or a new compiler optimization, and the original code turns out to be better?
Session is over, thank you!
"Your session is over" :(
Marvellous. Thanks. Deep thinking...
Do your cats own laptops? LOL.
Why does the function pointer trick not work? Perhaps an expensive memory lookup for the function?
Really well presented!
Excellent talk about the branch predictor and ways to take advantage of it. When I first saw the title though, for some reason I initially thought the topic was about functional programming.
what a way to close the session