Re: SIMD library attempt inf
Posted: September 18th, 2020, 2:47 am
I brought this up a long time ago, and I'm at it again; this time I've gotten a lot further than I have in the past. I wanted to use the template system of C++ to make a sort of programming language. Well, I'm not sure if that is an accurate description, but "it is what it is" lol.
All the calling functions are parameterless and return void, and the data buffer and pseudo registers are global, so when compiled in Release mode all function calls are inlined and I'm left with mostly just the intrinsics (loadu, loadu, loadu, sub, mul, add, storeu). There is very little overhead.
Code:
    using lerp_f1 = function<
        float,
        param_list<parameter<0, float>, parameter<1, float>, parameter<2, float>>,
        function_body<subf<1, 0>, mulf<1, 2>, addf<0, 1>>
    >;

    int main( int argc, char* argv[] ) {
        // data here for now is a char array, so just a stack allocated buffer
        auto* dst = reinterpret_cast< float* >( data );

        // filling the buffer with some values
        dst[ 0 ] = 100.f;
        dst[ 1 ] = 100.f;
        dst[ 2 ] = 100.f;
        dst[ 3 ] = 100.f;
        dst[ 4 ] = 200.f;
        dst[ 5 ] = 200.f;
        dst[ 6 ] = 200.f;
        dst[ 7 ] = 200.f;
        dst[ 8 ] = .5f;
        dst[ 9 ] = .5f;
        dst[ 10 ] = .5f;
        dst[ 11 ] = .5f;

        // loading those values into pseudo registers ( actually calls _mm_loadu_si128 or _mm_loadu_ps )
        // the first param is a byte offset into the data buffer,
        // the second param is which 'register' to load the data to
        execute_instructions<loadf<0, 0>, loadf<16, 1>, loadf<32, 2>>();

        // a lerp test pseudo function defined above
        lerp_f1::exe();

        // storing values back into data buffer
        // first param is the register to store
        // second param is offset into data buffer
        execute_instructions<storf<0, 0>>();

        // just to check/verify the result
        float lerped = dst[ 0 ];
        std::cout << lerped << '\n';
        return 0;
    }
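To make the idea concrete, here is a stripped-down sketch of how the pieces above could fit together. It is not the actual library: `vec4` is a plain four-float stand-in for `__m128`, the ops use scalar loops where the real thing would call `_mm_loadu_ps` and friends, and the bodies of `loadf`, `storf`, `subf`, `mulf`, `addf` and `execute_instructions` are assumed minimal versions. `run_lerp_demo` is a made-up helper name. It needs C++17 for the fold expression.

```cpp
#include <cassert>   // only for the check in the usage note below
#include <cstddef>
#include <cstring>

// Portable stand-in for __m128 (assumption): four float lanes.
struct vec4 { float lane[4]; };

// Global state: a byte buffer and a handful of pseudo registers.
alignas(16) unsigned char data[64];
vec4 regs[4];

// loadf<ByteOffset, Reg>: copy 16 bytes from the buffer into regs[Reg].
template<std::size_t ByteOffset, std::size_t Reg>
struct loadf {
    static void exe() { std::memcpy(&regs[Reg], data + ByteOffset, sizeof(vec4)); }
};

// storf<Reg, ByteOffset>: copy regs[Reg] back into the buffer.
template<std::size_t Reg, std::size_t ByteOffset>
struct storf {
    static void exe() { std::memcpy(data + ByteOffset, &regs[Reg], sizeof(vec4)); }
};

// Arithmetic ops: regs[Dst] = regs[Dst] (op) regs[Src], lane by lane.
template<std::size_t Dst, std::size_t Src>
struct subf {
    static void exe() { for (int i = 0; i < 4; ++i) regs[Dst].lane[i] -= regs[Src].lane[i]; }
};
template<std::size_t Dst, std::size_t Src>
struct mulf {
    static void exe() { for (int i = 0; i < 4; ++i) regs[Dst].lane[i] *= regs[Src].lane[i]; }
};
template<std::size_t Dst, std::size_t Src>
struct addf {
    static void exe() { for (int i = 0; i < 4; ++i) regs[Dst].lane[i] += regs[Src].lane[i]; }
};

// Runs each instruction's exe() in order (C++17 fold over the comma operator).
template<typename... Instructions>
void execute_instructions() { (Instructions::exe(), ...); }

// Fills the buffer with a = 100, b = 200, t = 0.5 and runs the lerp sequence.
float run_lerp_demo() {
    const float input[12] = { 100.f, 100.f, 100.f, 100.f,
                              200.f, 200.f, 200.f, 200.f,
                              .5f,   .5f,   .5f,   .5f };
    std::memcpy(data, input, sizeof(input));
    execute_instructions<loadf<0, 0>, loadf<16, 1>, loadf<32, 2>>();
    execute_instructions<subf<1, 0>, mulf<1, 2>, addf<0, 1>>();
    execute_instructions<storf<0, 0>>();
    float result;
    std::memcpy(&result, data, sizeof(result));
    return result;
}
```

With those inputs, `assert(run_lerp_demo() == 150.0f);` holds: 100 + (200 - 100) * 0.5. Since every `exe()` is a tiny static function with no arguments, an optimizing build has everything it needs to inline the whole chain.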
I'm still trying to figure out a few things to make it less cumbersome, but for now, you have to set all the registers and offsets manually. Some of that will stay, but I think once I figure out how to make and pass structures, some of that will go away.
For instance, the lerp function body:
Code:
    function_body<subf<1, 0>, mulf<1, 2>, addf<0, 1>>
Here, I'm saying:
- subtract reg0 from reg1, assign to reg1: param1 = param1 - param0
- multiply reg1 by reg2, assign to reg1: param1 = param1 * param2
- add reg1 to reg0, assign to reg0: param0 = param0 + param1
This function is pretty simple and since each parameter in param_list has the register position stored with it, I could probably figure out a way to just use those and make the pseudo function more flexible.
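One way that could look, purely as a sketch: let each `parameter` expose its register index, and build the body from the parameter types instead of hard-coded numbers. The `reg` member and the `lerp_body` alias are hypothetical names invented for this example, and the ops here are empty tag types (no `exe()`), just enough to check the types at compile time.

```cpp
#include <cassert>   // only for the check in the usage note below
#include <cstddef>
#include <type_traits>

// parameter<Reg, T>: remembers which pseudo register it lives in.
// The `reg` member is an assumption, not something the post shows.
template<std::size_t Reg, typename T>
struct parameter {
    static constexpr std::size_t reg = Reg;
    using type = T;
};

// Empty tag versions of the ops and the body container.
template<std::size_t Dst, std::size_t Src> struct subf {};
template<std::size_t Dst, std::size_t Src> struct mulf {};
template<std::size_t Dst, std::size_t Src> struct addf {};
template<typename... Ops> struct function_body {};

// Hypothetical helper: the lerp body written in terms of its parameters.
// A = start, B = end, T = interpolation factor.
template<typename A, typename B, typename T>
using lerp_body = function_body<
    subf<B::reg, A::reg>,   // B = B - A
    mulf<B::reg, T::reg>,   // B = B * T
    addf<A::reg, B::reg>    // A = A + B
>;

// With registers 0, 1, 2 this reproduces the hand-written body.
static_assert(std::is_same<
    lerp_body<parameter<0, float>, parameter<1, float>, parameter<2, float>>,
    function_body<subf<1, 0>, mulf<1, 2>, addf<0, 1>>>::value,
    "registers 0,1,2 reproduce the hand-written body");
```

The same alias then works for any register assignment, e.g. `lerp_body<parameter<3, float>, parameter<5, float>, parameter<4, float>>` is `function_body<subf<5, 3>, mulf<5, 4>, addf<3, 5>>`.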
I'm getting close to figuring out a way of simulating structures. The part I'm getting hung up on is automatically generating offsets for members.
Code:
    template<std::size_t, typename...Ts> struct member_list;

    template<std::size_t offset_, typename T, typename...Rest>
    struct member_list<offset_, T, Rest...> {
        using member_type = member<offset_, T>;
        using type = list<
            member_type,
            typename member_list<offset_ + T::size, Rest>::member_type...
        >;
    };

    template<typename...Members>
    struct structure
    {
        using member_list = member_list<0, Members...>::type;
    };
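I was hoping this line here:
Code:
    using type = list<
        member_type,
        typename member_list<offset_ + T::size, Rest>::member_type...
    >;
would add the current offset and the current unpacked T::size to create the next member, and so on for each parameter in the pack. However, all I really get is:
    list<member<0, float3>, member<12, float3>, member<12, float3>>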
So I'm kind of confused on where to go from here. I need to recursively create members, and each member needs to sit at the previous member's offset plus the previous member's T::size. All my types are going to be 4-byte aligned, so I won't have to worry about alignment.
Any suggestions from the template pros?
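For reference, one possible direction, offered as a sketch and not a definitive answer. In `member_list<offset_ + T::size, Rest>::member_type...` the `...` expands the whole pattern once per element of `Rest`, and `offset_ + T::size` does not depend on the pack, so every expanded element gets the same offset. To accumulate offsets, the recursion has to pass the whole remaining pack (`Rest...`) to the next level and prepend the current member to that level's result. In the sketch below, `prepend` and the `members` alias are hypothetical helpers (the alias is renamed so it does not clash with the `member_list` template), and `float3` / `float1` are stand-in types with an assumed `size` member.

```cpp
#include <cassert>   // only for run-time checks by the reader
#include <cstddef>
#include <type_traits>

template<typename... Ts> struct list {};
template<std::size_t Offset, typename T> struct member {};

// Stand-in types (assumption): each exposes its size in bytes.
struct float1 { static constexpr std::size_t size = 4; };
struct float3 { static constexpr std::size_t size = 12; };

// prepend<T, list<Ts...>>::type is list<T, Ts...>.
template<typename T, typename List> struct prepend;
template<typename T, typename... Ts>
struct prepend<T, list<Ts...>> { using type = list<T, Ts...>; };

template<std::size_t Offset, typename... Ts> struct member_list;

// Base case: no types left, so no members.
template<std::size_t Offset>
struct member_list<Offset> { using type = list<>; };

// Recursive case: emit member<Offset, T>, then recurse on the rest of the
// pack with the offset advanced by T::size.
template<std::size_t Offset, typename T, typename... Rest>
struct member_list<Offset, T, Rest...> {
    using type = typename prepend<
        member<Offset, T>,
        typename member_list<Offset + T::size, Rest...>::type
    >::type;
};

template<typename... Members>
struct structure {
    using members = typename member_list<0, Members...>::type;
};

static_assert(std::is_same<
    structure<float3, float3, float3>::members,
    list<member<0, float3>, member<12, float3>, member<24, float3>>>::value,
    "offsets should accumulate across members");
```

Mixed sizes accumulate the same way: `structure<float3, float1, float3>::members` comes out as `list<member<0, float3>, member<12, float1>, member<16, float3>>`.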