People don't stop doing other things when running Geekbench. They're doing it with the computer up and running, so something going on in the background will disturb the scores.

And why doesn't Geekbench hit the power limit? Yes, now we're getting into the complicated parts of performance, and why all those people claiming that thermal throttling is hurting their performance are usually wrong. Not 100% always, but usually.

Consider code that takes the value of register 1, adds it to register 2, and stores the result in register 1. Then you take the value of register 1 and register 3, and store that sum in register 1. You keep doing this for as many registers as there are, and then you start adding up values from memory. This code will do one integer addition per cycle. There is no way to parallelize it, because each addition needs the result of the previous one. After you run out of registers you also have to fetch data from memory, which is a second operation that will run in parallel, but the main compute part will still be doing one integer addition per cycle.

Now consider that you have vectors of floating point numbers, and you want to do an FMA operation - fused multiply-add, where you multiply the values of two vectors and add them to a third. All the data for them needs to be fetched from memory, but you're a clever programmer (or more likely you use a clever compiler), so you fetch this data ahead of time into the registers, so it is available when you need to do the math. For funsies, let's say that the data isn't stored nicely in memory, so you need to use special scatter/gather operations to get it. If you do this right and really optimize your memory accesses, you can process 2 vectors of 8 single-precision floating point numbers per cycle, and you do two operations (one multiply and one add) on each number. That is 2 × 8 × 2 = 32 floating point operations per cycle, every cycle.
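To make the gap between the two workloads concrete, here is a rough back-of-the-envelope sketch in Python. It is only a model of the arithmetic described in the text, not a measurement: the function names are made up for illustration, and the one-cycle add latency is an assumption taken from the "one integer addition per cycle" figure.

```python
def peak_flops_per_cycle(vectors_per_cycle, lanes_per_vector, ops_per_lane):
    """Floating point operations retired per cycle in the ideal FMA case."""
    return vectors_per_cycle * lanes_per_vector * ops_per_lane


def dependent_chain_cycles(num_adds, add_latency_cycles=1):
    """Cycles for a chain of adds where each add waits for the previous result.

    Assumes a 1-cycle integer add latency (hypothetical default), so the
    chain cannot go faster than one addition per cycle.
    """
    return num_adds * add_latency_cycles


# FMA case from the text: 2 vectors per cycle, 8 single-precision lanes each,
# 2 operations (one multiply, one add) per lane.
fma_peak = peak_flops_per_cycle(vectors_per_cycle=2, lanes_per_vector=8, ops_per_lane=2)

# Integer case from the text: a serial chain of additions into one register.
chain_adds = 1000
chain_cycles = dependent_chain_cycles(chain_adds)

print("FMA peak operations per cycle:", fma_peak)
print("Dependent-chain additions per cycle:", chain_adds / chain_cycles)
```

Under these assumptions the same core does 32 times more arithmetic per cycle in the FMA case than in the dependent integer chain, even though it looks equally "busy" in both.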