Planet Chili

Posted: **September 12th, 2019, 11:48 am**

	template<typename T>
	auto __vectorcall operator<<( T left, int _count )noexcept
	{
		assert( _count >= 0 && "_count must be non-negative." );

		if constexpr( traits::is_floating_point_v<T> )
		{
			auto const false_mask = _mm_setzero_ps();
			auto const true_mask = _mm_cmpeq_ps( _mm_setzero_ps(), _mm_setzero_ps() );
			auto const llOO = _mm_shuffle_ps( true_mask, false_mask, swizzle::mask<0, 0, 0, 0> );
			switch( _count )
			{
				case 3:
				{
					//auto const not_mask = _mm_castsi128_ps( storage::constants::mask_x.load() );
					auto const lOOO = _mm_shuffle_ps( llOO, llOO, swizzle::mask<0, 2, 2, 2> );
					auto const result = _mm_shuffle_ps( left, left, swizzle::mask<3, 0, 1, 2> );
					return _mm_and_ps( lOOO, result );
				}
				case 2:
				{
					//auto const not_mask = _mm_castsi128_ps( storage::constants::mask_xy.load() );
					auto const result = _mm_shuffle_ps( left, left, swizzle::mask<2, 3, 0, 1> );
					return _mm_and_ps( llOO, result );
				}
				case 1:
				{
					//auto const not_mask = _mm_castsi128_ps( mask_xyz.load() );
					auto const lllO = _mm_shuffle_ps( llOO, llOO, swizzle::mask<0, 0, 0, 2> );
					auto const result = _mm_shuffle_ps( left, left, swizzle::mask<1, 2, 3, 0> );
					return _mm_and_ps( lllO, result );
				}
				case 0:
				{
					return left;
				}
				default:
				{
					return _mm_setzero_ps();
				}
			}
		}
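		else
		{
			// _mm_slli_si128 needs a compile-time immediate, so dispatch on _count
			// instead of passing _count * 4 directly.
			// It moves lanes toward element 3 (result[i] = left[i - _count]). If
			// swizzle::mask<> lists lanes 0..3 in order, the float path above moves
			// lanes toward element 0 instead, which _mm_srli_si128 would match.
			switch( _count )
			{
				case 3: return _mm_slli_si128( left, 12 );
				case 2: return _mm_slli_si128( left, 8 );
				case 1: return _mm_slli_si128( left, 4 );
				case 0: return left;
				default: return _mm_setzero_si128();
			}
		}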
	}

I just started using lowercase 'l' and uppercase 'O' to look like the "bits" being set. I think I've been at this programming thing too long.
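Since the float registers have no shift of their own, another option is to reinterpret the bits and reuse the integer byte shifts. A minimal sketch, assuming SSE2: `shift_lanes_down` and `shift_lanes_up` are hypothetical names, and the count is a template parameter because `_mm_srli_si128`/`_mm_slli_si128` need a compile-time immediate. The `_mm_castps_si128`/`_mm_castsi128_ps` casts only change the type and typically compile to nothing.

```cpp
#include <emmintrin.h>  // SSE2: _mm_castps_si128, _mm_castsi128_ps, _mm_srli_si128, _mm_slli_si128
#include <cassert>

// Hypothetical helper: move float lanes toward element 0 by Count lanes,
// filling the vacated upper lanes with 0.0f (result[i] = v[i + Count]).
template<int Count>
__m128 shift_lanes_down( __m128 v ) noexcept
{
    static_assert( Count >= 0 && Count <= 3, "Count must be in [0, 3]" );
    // Reinterpret as integers, shift whole bytes, reinterpret back.
    return _mm_castsi128_ps( _mm_srli_si128( _mm_castps_si128( v ), Count * 4 ) );
}

// Hypothetical mirror image: move lanes toward element 3 (result[i] = v[i - Count]).
template<int Count>
__m128 shift_lanes_up( __m128 v ) noexcept
{
    static_assert( Count >= 0 && Count <= 3, "Count must be in [0, 3]" );
    return _mm_castsi128_ps( _mm_slli_si128( _mm_castps_si128( v ), Count * 4 ) );
}
```

Whether either of these counts as "left" depends on whether you picture the vector as x, y, z, w from element 0 upward, so naming the helpers by element direction sidesteps the `<<`/`>>` ambiguity.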

Posted: **September 14th, 2019, 5:56 am**

Doing some SIMD stuff again, I see.

Posted: **September 14th, 2019, 10:28 pm**

Yeah.

I'm working on a few tasks right now.

I'm kind of struggling with some choices though.

My mind is all over the place; I could probably benefit from some medication, lol.

One of the things I like about C++ over other languages is operator and function overloading. I love being able to call a single function and have the compiler figure out which one I meant. Wouldn't it be nice to have that same uniform interface with SSE/AVX?
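Plain function overloading already gets part of the way there. A minimal sketch, assuming SSE2 and using hypothetical names `add` and `sub`: each name has one overload per register type, and the compiler picks the matching intrinsic from the argument types.

```cpp
#include <emmintrin.h>  // SSE2 integer ops; also brings in the SSE float ops
#include <cassert>

// Hypothetical overload set: one name per operation, one overload per register type.
inline __m128  add( __m128 a,  __m128 b )  noexcept { return _mm_add_ps( a, b ); }     // 4 x float
inline __m128i add( __m128i a, __m128i b ) noexcept { return _mm_add_epi32( a, b ); }  // 4 x int32
inline __m128  sub( __m128 a,  __m128 b )  noexcept { return _mm_sub_ps( a, b ); }     // 4 x float
inline __m128i sub( __m128i a, __m128i b ) noexcept { return _mm_sub_epi32( a, b ); }  // 4 x int32
```

Operations that only exist for one type (float division, integer shifts) would simply have no overload for the other type, so misuse shows up as a compile error rather than a silent reinterpretation.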

There are some hurdles, though. The float versions don't have shift instructions, and the integer shuffle (_mm_shuffle_epi32) only takes one source vector. The integer versions also have no divide instruction, and the SSE2 32-bit integer multiply (_mm_mul_epu32) only multiplies elements 0 and 2, producing two 64-bit results; a full four-lane 32-bit multiply (_mm_mullo_epi32) requires SSE4.1.
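The multiply gap can be worked around on plain SSE2 by running `_mm_mul_epu32` twice, once on elements 0 and 2 and once on elements 1 and 3 after a 4-byte shift, then gathering the low 32 bits of each 64-bit product. A sketch with the hypothetical name `mullo_epi32_sse2`; because the low 32 bits of a product are the same for signed and unsigned inputs, it also gives the right answer for negative values.

```cpp
#include <emmintrin.h>  // SSE2: _mm_mul_epu32, _mm_srli_si128, _mm_shuffle_epi32, _mm_unpacklo_epi32
#include <cassert>

// Hypothetical helper: lane-wise 32-bit multiply keeping the low 32 bits of
// each product (what SSE4.1's _mm_mullo_epi32 does), using only SSE2.
inline __m128i mullo_epi32_sse2( __m128i a, __m128i b ) noexcept
{
    // _mm_mul_epu32 multiplies lanes 0 and 2, giving two 64-bit products.
    __m128i const even = _mm_mul_epu32( a, b );
    // Shift lanes 1 and 3 down into positions 0 and 2, then multiply those.
    __m128i const odd = _mm_mul_epu32( _mm_srli_si128( a, 4 ), _mm_srli_si128( b, 4 ) );
    // Gather the low 32 bits of each product: {p0, p2, ?, ?} and {p1, p3, ?, ?}.
    __m128i const even_lo = _mm_shuffle_epi32( even, _MM_SHUFFLE( 0, 0, 2, 0 ) );
    __m128i const odd_lo  = _mm_shuffle_epi32( odd,  _MM_SHUFFLE( 0, 0, 2, 0 ) );
    // Interleave back into lane order {p0, p1, p2, p3}.
    return _mm_unpacklo_epi32( even_lo, odd_lo );
}
```

On SSE4.1 hardware, `_mm_mullo_epi32` does the same job in one instruction, so a unified interface could pick between the two at compile time.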

So dilemma number one: since each of these operations takes its parameters by value*, calling one will create copies before executing the intrinsic inside the function. Is it even worth the effort to create such a library?

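*NOTE: With __vectorcall, up to six vector arguments are passed in XMM registers (XMM0 through XMM5). On x64, __cdecl and __stdcall are ignored and the default x64 convention applies: only the first four arguments go in registers, and __m128 values are passed by reference to a copy made by the caller. On 32-bit x86, __cdecl and __stdcall pass all arguments on the stack. Once the register slots run out, the remaining arguments go on the stack.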
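A sketch of how `__vectorcall` might be applied without tying the code to MSVC. `SIMD_CALL` and `simd_lerp` are hypothetical names; the macro expands to `__vectorcall` only when `_MSC_VER` is defined. On x86-64 Linux, the System V ABI already passes `__m128` arguments in XMM registers, so by-value parameters there should not be forced through memory.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_add_ps, _mm_sub_ps, _mm_mul_ps
#include <cassert>

// Hypothetical macro: __vectorcall is an MSVC keyword, so expand to nothing elsewhere.
#if defined(_MSC_VER)
#define SIMD_CALL __vectorcall
#else
#define SIMD_CALL
#endif

// Three __m128 arguments taken by value; with __vectorcall they travel in XMM0-XMM2.
inline __m128 SIMD_CALL simd_lerp( __m128 a, __m128 b, __m128 t ) noexcept
{
    return _mm_add_ps( a, _mm_mul_ps( _mm_sub_ps( b, a ), t ) );  // a + (b - a) * t
}
```

When the function is inlined, the calling convention stops mattering entirely; it only affects calls that survive optimization.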

I've thought about using macros, but if constexpr doesn't seem to work as intended inside a function-like macro.


#define mm_add( a, b, result )\
{\
using T = decltype( a );\
using U = decltype( b );\
static_assert( std::is_same_v<T, U> && std::is_same_v<T, decltype( result )> );\
if constexpr( std::is_same_v<T, __m128> )\
   result = _mm_add_ps( a, b );\
else if constexpr( std::is_same_v<T, __m128i> )\
   result = _mm_add_epi32( a, b );\
}

// Paraphrasing the error message
Error: cannot convert __m128 to __m128i
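It sounds like both branches are being compiled instead of just the one whose condition is true, which is the whole point of if constexpr. (That turns out to be the rule rather than a macro quirk: a discarded if constexpr branch is only skipped when it sits inside a template and the condition depends on a template parameter. A macro expands into ordinary non-template code, so every branch still has to compile.)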
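Moving the same dispatch into a function template is one way to get the behavior the macro was after. In this sketch (`simd_add` and `always_false_v` are hypothetical names), the `if constexpr` conditions depend on the template parameter `T`, so for each instantiation the untaken branch is discarded and `_mm_add_epi32` is never handed an `__m128`.

```cpp
#include <emmintrin.h>   // SSE2: _mm_add_ps, _mm_add_epi32
#include <type_traits>   // std::is_same_v
#include <cassert>

// Dependent "false" so the fallback static_assert only fires if that branch is instantiated.
template<typename>
inline constexpr bool always_false_v = false;

// Hypothetical function-template version of the mm_add macro.
template<typename T>
T simd_add( T a, T b ) noexcept
{
    if constexpr( std::is_same_v<T, __m128> )
        return _mm_add_ps( a, b );      // 4 x float
    else if constexpr( std::is_same_v<T, __m128i> )
        return _mm_add_epi32( a, b );   // 4 x int32
    else
        static_assert( always_false_v<T>, "simd_add: unsupported vector type" );
}
```

GCC may print an "ignoring attributes on template argument" warning when `__m128`/`__m128i` are used as template arguments; the code still compiles.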

The second part of the project is creating a pseudo shader language using templates and the C++ type system. I was originally planning on using straight intrinsics for this, but while building this shader language/framework I thought "how nice it would be to have a uniform interface for the SIMD instructions." So dilemma number two: do I go back to strictly intrinsics, using if constexpr throughout, or finish the unified SIMD interface and use it?

All this came about when I was looking over the shared project Cameron, Luis and I were working on about 5-6 years ago. I've learned a lot about C++ and a little bit about vector math since then, so I wanted to revamp the project, hoping to make the code a little cleaner and to extend it with special effects and perhaps some real physics algorithms. Since the game isn't that demanding as is, I really didn't see a point in using the GPU (yes, I know I could have had D3D set up and working by now, but this is more fun).

I thought it would be cool to have a "quick" way of testing this stuff without having to spend so much time setting up D3D. I really need to come up with a D3D11 framework.

Posted: **September 16th, 2019, 4:46 am**

> So dilemma number one is: Seeing as how each one of these operations is going to take parameters by *value, calling each one will create copies before executing the intrinsic used inside each function, so is it even worth the effort to create such a library?

I wouldn't assume that parameters taken by value, or by reference, are actually handled that way once optimization is done. Inlining often folds away the parameters along with the function call itself.

Posted: **September 16th, 2019, 5:30 pm**

Then I wonder what these "too many temporaries" are that I keep hearing about when people say not to use operator overloads for SIMD operations. But yeah, that's kind of what I thought inlining was supposed to do: copy/paste the inline function's body where the function is actually being called.

Posted: **September 26th, 2019, 4:45 pm**

Idk about too many temporaries. I guess the best way is to write up something minimal and examine the output in Godbolt.
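Something like the following could serve as that minimal test: the same multiply-add written once through operator overloads on a thin wrapper and once with raw intrinsics. `float4`, `mul_add_ops` and `mul_add_raw` are hypothetical names. Compiling both with optimization on (for example `-O2` on GCC/Clang or `/O2` on MSVC) and comparing the assembly side by side should show whether the intermediate `a * b` and `c * d` values cost anything beyond the registers they live in.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_add_ps, _mm_mul_ps
#include <cassert>

// Hypothetical thin wrapper so operators can be overloaded portably.
struct float4
{
    __m128 v;
};

inline float4 operator+( float4 a, float4 b ) noexcept { return { _mm_add_ps( a.v, b.v ) }; }
inline float4 operator*( float4 a, float4 b ) noexcept { return { _mm_mul_ps( a.v, b.v ) }; }

// Version 1: operator overloads; a * b and c * d are the "temporaries".
float4 mul_add_ops( float4 a, float4 b, float4 c, float4 d ) noexcept
{
    return a * b + c * d;
}

// Version 2: the same math written directly with intrinsics.
__m128 mul_add_raw( __m128 a, __m128 b, __m128 c, __m128 d ) noexcept
{
    return _mm_add_ps( _mm_mul_ps( a, b ), _mm_mul_ps( c, d ) );
}
```

If the two functions produce the same instructions, the "temporaries" were only names for values already sitting in XMM registers.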

Got tired of naming things
