Intel's vector instructions are pathetic

cyboryxmen
Posts: 167
Joined: November 14th, 2014, 2:03 am

Intel's vector instructions are pathetic

Post by cyboryxmen » May 22nd, 2019, 6:02 am

I am so done with using Intel's vector instructions. I mean what even is the point?

"With AVX, you can do 8 floating-point calculations in one cycle!" -Intel

"That's cute :3" - NVidia

With a GeForce GTX 1080, you get 2560 cores capable of executing thousands of instructions at any one time, with a whopping 10 Gbps memory speed. Not only that, each core also has fast local cache for memory fetches, so accessing the same element multiple times non-linearly is not nearly as painful as with something like AVX. This makes it easy to do things like have each core sample a subsection of a texture or process a subset of a data set. Not to mention that all the cores can use synchronisation primitives like atomic integers that let you propagate the results of calculations from any one core to every other core. Intel's vector instructions have no equivalent besides outright looping through each element of the vector and combining the results before moving on to the next step.

I keep trying to find ways to optimise algorithms with AVX. I'll spend days figuring out the bottlenecks in an algorithm and trying to work around them. Then I write an unoptimised version of the same algorithm for the GPU and it still runs 100 times faster than the AVX version. Why should I even bother?

The only reason I can think of why the CPU's vector instructions would be better is when you have more data than the GPU can store. The GeForce GTX 1080 has 8GB of memory, while a CPU can have 32GB of RAM and more. Plus, with Direct Memory Access, data can be streamed in from disk without tying up the CPU, adding a whole terabyte of hard drive storage to the pile. This is great for video editing, where you have hundreds of gigabytes of footage to edit, or Computer Graphics, where you're Pixar and you have to render an entire mountain landscape with billions of polygons.

But most of us aren't here to edit videos or work for Pixar. As far as Game Development is concerned, speed and efficiency are much more important than having huge amounts of detail. If it can't fit in a GPU, it ain't worth it. As far as I'm concerned, GPGPU is just superior in every way. I'm done with SSE, AVX and the rest of their ilk.
Last edited by cyboryxmen on May 22nd, 2019, 8:51 am, edited 1 time in total.
Zekilk

albinopapa
Posts: 3785
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Intel's vector instructions are pathetic

Post by albinopapa » May 22nd, 2019, 7:40 am

One thing is that SIMD intrinsics are C-compatible, so there's no extra compilation step and no shaders or other API setup. That makes development quicker and more familiar. Also, consider the latency of copying data CPU -> GPU -> CPU, and of manually swapping pages of memory to keep the GPU compute pipeline fed but not overfull.

I'm with you though, GPGPU speeds are insane. On one of the ray tracing projects I compared x64, SIMD and C++ AMP (GPGPU), and the frame rates were something like: x64 = 8, SIMD = 25 and GPU = 160. The GPU I used was the integrated one on an A10-7700K APU, so very low end. The project had one directional light source, shadows and 2-3 objects, and that's it, nothing fancy; I don't even think it had reflections.

I'm considering using GPGPU in my current project, but last I tried, C++ AMP and C++17 features didn't play well together, and I have a few places where I use constexpr-if and std::optional, so I need C++17. If that fails, I'll probably end up using DirectCompute, or just switching to full D3D.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

cyboryxmen
Posts: 167
Joined: November 14th, 2014, 2:03 am

Re: Intel's vector instructions are pathetic

Post by cyboryxmen » May 22nd, 2019, 8:47 am

You might find this more appealing then.

albinopapa
Posts: 3785
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Intel's vector instructions are pathetic

Post by albinopapa » May 22nd, 2019, 9:44 am

OMG, thank you. SYCL has come up a few times, but I haven't followed it.
