Intel's vector instructions are pathetic
Posted: May 22nd, 2019, 6:02 am
I am so done with using Intel's vector instructions. I mean what even is the point?
"With AVX, you can do 8 floating-point calculations in one cycle!" -Intel
"That's cute :3" - NVidia
With a GeForce GTX 1080, you get 2560 cores capable of keeping thousands of threads in flight at any one time, with a memory clock of a whopping 10 Gbps that works out to around 320 Gigabytes per second of bandwidth. Not only that, each block of cores has its own shared memory and cache close by for fast fetches. Accessing the same element multiple times non-linearly is nowhere near as painful as with something like AVX. This makes it easy to do things like have each core sample a subsection of a texture or crunch a subset of a data set. Not to mention that all the cores can use synchronisation primitives like atomic integers that let you propagate the results of a calculation from any one core to every other core. There's no equivalent for Intel's vector instructions besides outright looping through each element of the vector and applying the results before moving on to the next step.
I keep trying to optimise algorithms with AVX. I'll spend days figuring out an algorithm's bottlenecks and working around them. Then I throw together an unoptimised version of the same algorithm for the GPU and it still runs 100 times faster than the AVX version. Why should I even bother?
The only reason I can think of for why the CPU's vector instructions would be better is when you have more data than the GPU can store. The GeForce GTX 1080 has 8GB of memory, while a CPU can have 32GB of RAM and more. Plus, with Direct Memory Access, data can stream from disk straight into RAM, adding a whole terabyte of hard drive storage to the pile. This is great for video editing, where you have hundreds of gigabytes of footage to edit, or for Computer Graphics when you're Pixar and you have to render an entire mountain landscape with billions of polygons.
But most of us aren't here to edit videos or work for Pixar. As far as Game Development is concerned, speed and efficiency are much more important than having huge amounts of detail. If it can't fit in a GPU, it ain't worth it. As far as I'm concerned, GPGPU is just superior in every way. I'm done with SSE, AVX and the rest of their ilk.