Yeah.
I'm working on a few tasks right now.
I'm kind of struggling with some choices though.
My mind is all over the place, I could probably benefit from some medication lol.
One of the things I like about C++ over other languages is operator and function overloading. I love being able to call a single function and have the compiler figure out which one I meant. Wouldn't it be nice to have that same uniform nature with SSE/AVX?
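A minimal sketch of what that uniform interface could look like with plain overloads, assuming SSE2 is available. The name simd_add is made up for illustration and is not part of any library; the check function is only there to show that the call syntax is the same for both vector types.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2: __m128, __m128i and their intrinsics

// Hypothetical wrapper name (simd_add is an assumption, not a real API).
// Overload resolution picks the intrinsic from the argument type.
inline __m128  simd_add( __m128 a, __m128 b )   { return _mm_add_ps( a, b ); }
inline __m128i simd_add( __m128i a, __m128i b ) { return _mm_add_epi32( a, b ); }

// Small self-check: one function name, two vector types.
inline void simd_add_check()
{
    float f[ 4 ];
    _mm_storeu_ps( f, simd_add( _mm_set1_ps( 1.5f ), _mm_set1_ps( 2.0f ) ) );
    assert( f[ 0 ] == 3.5f && f[ 3 ] == 3.5f );

    int i[ 4 ];
    _mm_storeu_si128( reinterpret_cast<__m128i*>( i ),
                      simd_add( _mm_set1_epi32( 40 ), _mm_set1_epi32( 2 ) ) );
    assert( i[ 0 ] == 42 && i[ 3 ] == 42 );
}
```

Since the wrappers are tiny and marked inline, an optimizing compiler will normally inline them, so the by-value parameters usually don't turn into real copies.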
There are some hurdles though. The float versions don't have shift instructions, and the shuffle instructions for the int versions only take one vector parameter. The int versions also don't have a divide instruction, and the SSE2 32-bit multiply ( _mm_mul_epu32 ) only multiplies elements 0 and 2; a multiply that covers all four int32 elements ( _mm_mullo_epi32 ) needs SSE4.1.
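The int32 multiply hurdle can be worked around on plain SSE2 by multiplying the even and odd elements separately and interleaving the low halves of the products. This is only a sketch of that common workaround; mul_epi32_sse2 is a made-up name, and only the low 32 bits of each product are kept.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 only

// Hypothetical helper (the name is an assumption): multiplies all four
// int32 elements and keeps the low 32 bits of each product.
inline __m128i mul_epi32_sse2( __m128i a, __m128i b )
{
    // _mm_mul_epu32 multiplies elements 0 and 2 into two 64-bit products.
    __m128i even = _mm_mul_epu32( a, b );
    // Shift right by 4 bytes so elements 1 and 3 land in slots 0 and 2.
    __m128i odd = _mm_mul_epu32( _mm_srli_si128( a, 4 ), _mm_srli_si128( b, 4 ) );
    // Move the low 32 bits of each product into elements 0 and 1 ...
    even = _mm_shuffle_epi32( even, _MM_SHUFFLE( 0, 0, 2, 0 ) );
    odd  = _mm_shuffle_epi32( odd,  _MM_SHUFFLE( 0, 0, 2, 0 ) );
    // ... and interleave them back into the order p0, p1, p2, p3.
    return _mm_unpacklo_epi32( even, odd );
}

// Small self-check against scalar multiplication.
inline void mul_epi32_sse2_check()
{
    int r[ 4 ];
    _mm_storeu_si128( reinterpret_cast<__m128i*>( r ),
                      mul_epi32_sse2( _mm_setr_epi32( 1, 2, 3, 4 ),
                                      _mm_setr_epi32( 5, 6, 7, -8 ) ) );
    assert( r[ 0 ] == 5 && r[ 1 ] == 12 && r[ 2 ] == 21 && r[ 3 ] == -32 );
}
```

The unsigned multiply is fine for signed inputs here because the low 32 bits of the product are the same either way.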
So dilemma number one is: Seeing as how each one of these operations is going to take parameters by *value, calling each one will create copies before executing the intrinsic used inside each function, so is it even worth the effort to create such a library?
*NOTE: Using __vectorcall, the first 6 vector parameters get passed in XMM registers, instead of __cdecl or __stdcall, which only pass the first 4 in x86/x64 registers. After the max has been reached, the stack is used regardless, if I recall correctly.
I've thought about using macros, but if constexpr doesn't seem to work as intended inside a macro definition.
Code:
#define mm_add( a, b, result )\
using T = decltype( a );\
using U = decltype( b );\
static_assert( std::is_same_v<T, U> && std::is_same_v<T, decltype( result ) );\
if constexpr( std::is_same_v<T, __m128> )\
result = _mm_add_ps( a, b );\
else if constexpr( std::is_same_v<T, __m128i> )\
result = _mm_add_epi32( a, b );
// Paraphrasing the error message
Error: cannot convert __m128 to __m128i
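Sounds like both branches are being compiled instead of just the one that is true, which is the intent of if constexpr. From what I understand, the false branch is only skipped inside a template, where the condition depends on a template parameter; a macro just pastes the code into whatever function it's used in, so both assignments still have to type-check.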
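For comparison, here is a rough sketch of the same idea as a function template instead of a macro, assuming C++17 and SSE2. Inside a template the condition depends on T, so only the matching branch is instantiated. The name mm_add_fn is made up for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <emmintrin.h>  // SSE2

// Hypothetical name (mm_add_fn is an assumption, not an existing API).
template<typename T>
inline T mm_add_fn( T a, T b )
{
    if constexpr( std::is_same_v<T, __m128> )
        return _mm_add_ps( a, b );      // only instantiated for __m128
    else if constexpr( std::is_same_v<T, __m128i> )
        return _mm_add_epi32( a, b );   // only instantiated for __m128i
    else
        // Dependent condition, so this only fires for unsupported types.
        static_assert( !sizeof( T ), "mm_add_fn: unsupported vector type" );
}

// Small self-check for both supported types.
inline void mm_add_fn_check()
{
    float f[ 4 ];
    _mm_storeu_ps( f, mm_add_fn( _mm_set1_ps( 0.5f ), _mm_set1_ps( 0.25f ) ) );
    assert( f[ 0 ] == 0.75f && f[ 3 ] == 0.75f );

    int i[ 4 ];
    _mm_storeu_si128( reinterpret_cast<__m128i*>( i ),
                      mm_add_fn( _mm_set1_epi32( 7 ), _mm_set1_epi32( -2 ) ) );
    assert( i[ 0 ] == 5 && i[ 3 ] == 5 );
}
```

GCC may warn about ignored attributes when __m128 is used as a template argument; as far as I know that is harmless here, and plain overloads avoid it entirely.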
The second part of the project is creating a pseudo shader language using templates and the C++ type system. I was originally planning on using straight intrinsics for this, but then thought "how nice it would be to have a uniform interface for the SIMD instructions" while creating this shader language/framework. So dilemma number two is: do I go back to strictly intrinsics, using if constexpr throughout, or finish the unified SIMD interface and use it?
All this was brought about when I was looking over the shared project Cameron, Luis and I were working on like 5-6 years ago. I've learned a lot about C++ and a little bit about vector math since then, so I wanted to revamp the project, hoping I could make the code a little cleaner as well as extend it by creating special effects and perhaps using some real physics algorithms. Since the game isn't that demanding as is, I really didn't see a point in using the GPU ( yes, I know I could have had D3D set up and working by now, but this is more fun ).
I thought, oh, it would be cool to have a "quick" way of testing this stuff and not have to spend so much time setting up D3D. I really need to come up with a D3D11 framework.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com