crickets..

Post by **MrGodin** » January 13th, 2018, 3:54 pm

It's so quiet, I can hear a ctrl-alt-delete being dropped

Post by **albinopapa** » January 13th, 2018, 6:15 pm

I know. Well, so goes the Chili community. Everyone uses Discord and drops comments on the vids, so there's not much left to say here. It kind of sucks for me, though, as I don't want to keep up with the Discord chat server. I like having a record of conversations here on the forum, as well as being able to actually see when someone needs help. I don't have to constantly watch; I can pop in and out and help, participate or whatever. On Discord the chats happen so fast that I don't really feel useful hehe.

Post by **MrGodin** » January 13th, 2018, 11:16 pm

I get that.

Post by **MrGodin** » January 13th, 2018, 11:17 pm

albinopapa, what kind of projects are you working on these days?

Post by **LuisR14** » January 14th, 2018, 6:37 am

I'm just sad that so few people join through my sig ... and they don't stay on for long :/

Post by **albinopapa** » January 14th, 2018, 7:38 am

Well, I've been meaning to post something actually, but I haven't found the words I want to use or the thoughts I want to convey lol. I'm working on a SIMD wrapper and SIMD math library. I've mostly got the float portion worked out; I just haven't decided on a few things. In MSVC, the __m128 SSE type is declared as a union of floats and all the integral types except bool. I haven't decided whether to encapsulate it in a class with all the intrinsics wrapped in member functions, which is useful for chaining operations and for having all the possible operations in one place, or to just do what the DirectXMath library does and alias the data type (XMVECTOR is an alias for __m128). The reason I'm somewhat against encapsulating it in a class is really a non-issue IF I do wrap all the intrinsics.
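
For comparison, here is a minimal sketch of the DirectXMath-style alias option, assuming SSE only (no FMA). The names float4, Load, Add, Mul, MulAdd and ToArray are hypothetical illustrations, not the library's actual API.

```cpp
#include <xmmintrin.h>  // SSE: __m128 and the _mm_*_ps intrinsics
#include <array>
#include <cassert>

// DirectXMath-style: the SIMD type is a plain alias, so there are no
// constructors or conversion operators that could add hidden copies.
using float4 = __m128;

// Thin free-function wrappers over the intrinsics (hypothetical names).
inline float4 Load(const float* p) { return _mm_loadu_ps(p); }
inline float4 Add(float4 a, float4 b) { return _mm_add_ps(a, b); }
inline float4 Mul(float4 a, float4 b) { return _mm_mul_ps(a, b); }

// a * b + c as two separate instructions (no FMA assumed).
inline float4 MulAdd(float4 a, float4 b, float4 c) { return Add(Mul(a, b), c); }

// Copy the four lanes out to ordinary memory, e.g. for testing or printing.
inline std::array<float, 4> ToArray(float4 v) {
    std::array<float, 4> out{};
    _mm_storeu_ps(out.data(), v);
    return out;
}
```

Because float4 is just __m128, a value can be passed straight into any raw intrinsic with no conversion; the trade-off is that there is no class to hang member functions or chained calls on.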

I am using Chili's 3D Fundamentals framework as a testing ground. I originally had an __m128 member encapsulated in a Float4 class, but was only getting around 75-80 fps, while the original code was getting around 120-130 fps, using a 46,000 mesh model. After refactoring, aliasing the type to float4 (just lower-case f), wrapping the intrinsics in global functions and a few other things, I got the framerate up to ~90 fps, at least a 12% improvement. I think one of my original issues causing the slowdown was that I had a conversion operator ( operator __m128 ) and an implicit constructor ( Float4( const __m128& rhs ); ), so there was probably a lot more copying than needed.

Now I think the issue might be that, because SIMD is most useful with arrays rather than structures, I'm doing too many swizzle (shuffle) operations. Where it makes sense, I've replaced _mm_shuffle_ps with _mm_blend_ps and _mm_insert_ps (both SSE4.1 instructions).
_mm_shuffle_ps lets you shuffle the contents of a single register or combine two registers: the lower two elements of the result come from the first register and the upper two from the second. Sometimes it takes two shuffle operations to get the data into the order you need.
_mm_blend_ps lets you cherry-pick which elements to take from each register, but it's one-to-one: if you pick element 0 from register A, it will be element 0 of the result.
_mm_insert_ps lets you pick one element from a register and place it at some other position, like moving element 2 to element 0, but it only moves one element at a time.
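
Here is a small sketch of the three instructions side by side, assuming an x86-64 compiler and a CPU with SSE4.1 (the target attribute is a GCC/Clang detail; MSVC needs nothing extra). The function names are hypothetical.

```cpp
#include <immintrin.h>  // SSE through SSE4.1 intrinsics
#include <array>
#include <cassert>

// _mm_blend_ps and _mm_insert_ps are SSE4.1. On GCC/Clang this attribute
// enables SSE4.1 code generation for the marked functions only.
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_FN __attribute__((target("sse4.1")))
#else
#define SSE41_FN
#endif

using f4 = std::array<float, 4>;

inline __m128 Load(const f4& a) { return _mm_loadu_ps(a.data()); }
inline f4 Store(__m128 v) { f4 out{}; _mm_storeu_ps(out.data(), v); return out; }

// Shuffle: result lanes 0-1 come from a, lanes 2-3 come from b.
// _MM_SHUFFLE(d3, d2, d1, d0) gives the source index for each result lane,
// from lane 3 down to lane 0. Here the result is {a[1], a[0], b[3], b[2]}.
inline f4 ShuffleDemo(const f4& a, const f4& b) {
    return Store(_mm_shuffle_ps(Load(a), Load(b), _MM_SHUFFLE(2, 3, 0, 1)));
}

// Blend: bit i of the mask takes lane i from b (1) or a (0); lanes never move.
// Mask 0xA (binary 1010) gives {a[0], b[1], a[2], b[3]}.
SSE41_FN f4 BlendDemo(const f4& a, const f4& b) {
    return Store(_mm_blend_ps(Load(a), Load(b), 0xA));
}

// Insert: imm bits 7-6 pick the source lane of b, bits 5-4 the destination
// lane in a, bits 3-0 zero lanes. (2 << 6) | (0 << 4) copies b[2] into lane 0.
SSE41_FN f4 InsertDemo(const f4& a, const f4& b) {
    return Store(_mm_insert_ps(Load(a), Load(b), (2 << 6) | (0 << 4)));
}
```

Passing the same register as both operands of _mm_insert_ps moves an element within one register.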

On this mesh, at the default Z depth, a triangle only takes up like 1 or 2 pixels, so I don't think there's much to be gained in the nested raster loops. Normally, this should be where SIMD shines, but I think I need to convert to structure of arrays to get the most benefit...
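
A minimal sketch of what the structure-of-arrays layout buys, assuming SSE only: four 3D vectors are stored as one register per component, so four dot products take three multiplies and two adds with no shuffles. Vec3x4, FromAoS, Dot and ToArray are hypothetical names.

```cpp
#include <xmmintrin.h>
#include <array>
#include <cassert>

// Structure of arrays: lane i of x, y and z together form vector i.
struct Vec3x4 {
    __m128 x, y, z;
};

// Build from four array-of-structures vectors (a scalar gather, fine for a demo).
inline Vec3x4 FromAoS(const std::array<std::array<float, 3>, 4>& v) {
    return { _mm_setr_ps(v[0][0], v[1][0], v[2][0], v[3][0]),
             _mm_setr_ps(v[0][1], v[1][1], v[2][1], v[3][1]),
             _mm_setr_ps(v[0][2], v[1][2], v[2][2], v[3][2]) };
}

// Four dot products at once using only vertical multiplies and adds.
inline __m128 Dot(const Vec3x4& a, const Vec3x4& b) {
    return _mm_add_ps(_mm_add_ps(_mm_mul_ps(a.x, b.x), _mm_mul_ps(a.y, b.y)),
                      _mm_mul_ps(a.z, b.z));
}

inline std::array<float, 4> ToArray(__m128 v) {
    std::array<float, 4> out{};
    _mm_storeu_ps(out.data(), v);
    return out;
}
```

In the array-of-structures layout, the same dot product needs a horizontal sum (shuffles plus adds) for every single vector, which is the overhead described above.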

I added bilinear filtering to the texture by adding a sampler class. Currently, all of that is done in plain x86 code; I haven't done any SIMD integer stuff in a while, but that's where I'm heading next, after settling the Float4 encapsulation vs. float4 alias question.
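
A scalar sketch of bilinear sampling, assuming a single-channel float texture, clamp-to-edge addressing and texel-space coordinates where integer values land on texel centers. Texture and SampleBilinear are hypothetical stand-ins, not the sampler class from the post; a real version would filter each color channel the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel texture stored row-major.
struct Texture {
    int width;
    int height;
    std::vector<float> texels;  // width * height values

    // Clamp-to-edge addressing: out-of-range coordinates reuse the border texel.
    float At(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// Bilinear sample at texel-space (u, v): lerp horizontally on the two
// neighboring rows, then lerp vertically between those two results.
inline float SampleBilinear(const Texture& t, float u, float v) {
    const float fx = std::floor(u);
    const float fy = std::floor(v);
    const int x0 = static_cast<int>(fx);
    const int y0 = static_cast<int>(fy);
    const float tx = u - fx;  // horizontal weight in [0, 1)
    const float ty = v - fy;  // vertical weight in [0, 1)

    const float c00 = t.At(x0, y0), c10 = t.At(x0 + 1, y0);
    const float c01 = t.At(x0, y0 + 1), c11 = t.At(x0 + 1, y0 + 1);
    const float top = c00 + tx * (c10 - c00);
    const float bottom = c01 + tx * (c11 - c01);
    return top + ty * (bottom - top);
}
```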

Post by **albinopapa** » January 14th, 2018, 7:39 am

Wow, Luis, it's been a while. How have you been?

Post by **LuisR14** » January 14th, 2018, 8:24 am

I've been well. Been {dealing with|learning about} spoken language stuff. But, I've been lurking periodically/occasionally. o.o I'm mostly just here tho.

Post by **MrGodin** » January 14th, 2018, 5:23 pm

Cool, albinopapa. I haven't done anything with SIMD. I tend to bounce around from idea to idea lol. I really should stick with one project and complete it.

You might want to have a peek at this. I found it a while back and never used it, but found it interesting. It may give you some ideas.
https://github.com/pelletier/vector3/bl ... /vector3.h

Post by **albinopapa** » January 15th, 2018, 6:10 am

Thanks.

I've read that accessing individual elements inside a SIMD register slows things down, so I don't think I like this approach (regarding the union at the bottom).

Also, the subscript operator[] does exactly that. There are intrinsic functions to get single elements out of a register, like _mm_extract_ps and _mm_store_ss. _mm_store_ss only stores the first element, so you'd have to shuffle first to get the element you want into the first position. Luckily, if I'm reading the Intel Intrinsics Guide correctly, shuffles only take one clock cycle. The longest part would be storing the result of the extract or store into memory. Sorry, my point is that this was designed to mix x86 code with SIMD code...that's not my goal.
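
A sketch of reading one lane without a union, assuming SSE only: shuffle the wanted element into lane 0, then read lane 0 with _mm_cvtss_f32 or _mm_store_ss. One thing worth knowing: _mm_extract_ps (SSE4.1) returns the element's bit pattern as an int, not a float value. GetLane and GetLaneViaStore are hypothetical names.

```cpp
#include <xmmintrin.h>
#include <cassert>

// Read lane I: broadcast it into every lane (lane 0 is what matters), then
// _mm_cvtss_f32 returns lane 0 as a plain float. I must be a compile-time
// constant because the shuffle immediate has to be one.
template <int I>
inline float GetLane(__m128 v) {
    static_assert(I >= 0 && I < 4, "lane index out of range");
    return _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(I, I, I, I)));
}

// Same idea via _mm_store_ss, which writes only lane 0 to memory.
template <int I>
inline float GetLaneViaStore(__m128 v) {
    static_assert(I >= 0 && I < 4, "lane index out of range");
    float out;
    _mm_store_ss(&out, _mm_shuffle_ps(v, v, _MM_SHUFFLE(I, I, I, I)));
    return out;
}
```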

What I've decided to do is make the __m128 member private in the wrapper classes for float4 and others. The intrinsic functions are wrapped in global friended functions so they can access this __m128 member without having to do implicit conversions.
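
A minimal sketch of that design, assuming SSE only. The member set is illustrative, not the actual float4 from the library: the raw-register constructor is private and explicit, so only the friended functions can wrap an __m128, and there is no operator __m128 at all.

```cpp
#include <xmmintrin.h>
#include <array>
#include <cassert>

class float4 {
public:
    float4() : v_(_mm_setzero_ps()) {}
    float4(float x, float y, float z, float w) : v_(_mm_setr_ps(x, y, z, w)) {}

    // In-class compound assignment operators.
    float4& operator+=(const float4& rhs) { v_ = _mm_add_ps(v_, rhs.v_); return *this; }
    float4& operator*=(const float4& rhs) { v_ = _mm_mul_ps(v_, rhs.v_); return *this; }

    // Friended free functions read v_ directly, so no implicit
    // float4 <-> __m128 conversions are involved anywhere.
    friend float4 operator+(const float4& a, const float4& b) { return float4(_mm_add_ps(a.v_, b.v_)); }
    friend float4 operator*(const float4& a, const float4& b) { return float4(_mm_mul_ps(a.v_, b.v_)); }
    friend float4 Sqrt(const float4& a) { return float4(_mm_sqrt_ps(a.v_)); }
    friend std::array<float, 4> ToArray(const float4& a) {
        std::array<float, 4> out{};
        _mm_storeu_ps(out.data(), a.v_);
        return out;
    }

private:
    // Explicit and private: only the friends above can wrap a raw register.
    explicit float4(__m128 v) : v_(v) {}
    __m128 v_;
};
```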

So far, I have written a few classes:
bool4f
- overloads operator bool(), which returns true if all elements are -1 (all bits set), otherwise false
- has seven helper functions:
  - IsAllTrue, IsAllFalse, IsSomeTrue
  - Is_A_True, Is_B_True, Is_C_True, Is_D_True
bool4i - same as bool4f; the reason for separate classes is how A, B, C and D are determined to be true
float4 - in-class overloads of the math assignment operators, plus some others like the comparison operators, which return a bool4f
int4 - same as float4
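
A sketch of how bool4f and bool4i could be built on the movemask intrinsics, assuming lanes 0 to 3 map to A to D. It shows one reason the two classes differ: _mm_movemask_ps gives one bit per float lane, while _mm_movemask_epi8 gives one bit per byte, so an int lane spans four bits. The internals are an assumption, not the code from the post.

```cpp
#include <emmintrin.h>  // SSE2: _mm_movemask_epi8 (SSE's _mm_movemask_ps comes along)
#include <cassert>

// Comparison intrinsics (_mm_cmplt_ps, _mm_cmpeq_epi32, ...) set a lane to all
// ones (-1 as an integer) when true and to all zeros when false.

// Float mask: _mm_movemask_ps packs the sign bit of each lane into bits 0-3.
class bool4f {
public:
    explicit bool4f(__m128 mask) : bits_(_mm_movemask_ps(mask)) {}
    explicit operator bool() const { return IsAllTrue(); }  // true only if every lane is set

    bool IsAllTrue() const { return bits_ == 0xF; }
    bool IsAllFalse() const { return bits_ == 0; }
    bool IsSomeTrue() const { return bits_ != 0; }
    // Assumption: A, B, C, D are lanes 0, 1, 2, 3.
    bool Is_A_True() const { return (bits_ & 0x1) != 0; }
    bool Is_B_True() const { return (bits_ & 0x2) != 0; }
    bool Is_C_True() const { return (bits_ & 0x4) != 0; }
    bool Is_D_True() const { return (bits_ & 0x8) != 0; }

private:
    int bits_;
};

// Integer mask: _mm_movemask_epi8 packs one bit per byte (16 bits total),
// so each 32-bit lane owns a 4-bit nibble instead of a single bit.
class bool4i {
public:
    explicit bool4i(__m128i mask) : bits_(_mm_movemask_epi8(mask)) {}
    explicit operator bool() const { return IsAllTrue(); }

    bool IsAllTrue() const { return bits_ == 0xFFFF; }
    bool IsAllFalse() const { return bits_ == 0; }
    bool IsSomeTrue() const { return bits_ != 0; }
    bool Is_A_True() const { return (bits_ & 0x000F) == 0x000F; }
    bool Is_B_True() const { return (bits_ & 0x00F0) == 0x00F0; }
    bool Is_C_True() const { return (bits_ & 0x0F00) == 0x0F00; }
    bool Is_D_True() const { return (bits_ & 0xF000) == 0xF000; }

private:
    int bits_;
};
```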

// Vector names may change; I don't like the naming scheme
SseVec2 - array-of-structures layout with methods for length, lengthsq, dotproduct, crossproduct and normalize
SseVec3 - same as SseVec2
SseVec4 - same as SseVec2
Vec2p - structure-of-arrays layout with the same methods as the other vector types
Vec3p - same as Vec2p
Vec4p - same as Vec2p
SseMatrix3x3 - bare bones, no math operations defined
SseMatrix4x4
- Static functions: Identity, RotationX/Y/Z and others, Translation and Scaling
- Member functions: Transpose, operator* with SseVec3 and SseMatrix4x4
SseQuaternion - Don't ask, I just found a YT video and did what he did, but with SIMD instead of x86 code.
- Member functions: operator* with an SseVec3 and another SseQuaternion; unary operator- returns the conjugate quaternion.
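
A bare-bones sketch of a few SseMatrix4x4-style operations, assuming row-major storage and the row-vector convention (v * M, with the translation in the last row). Mat4, Mul and ToArray are hypothetical names; _MM_TRANSPOSE4_PS is the transpose macro from xmmintrin.h.

```cpp
#include <xmmintrin.h>
#include <array>
#include <cassert>

// Hypothetical 4x4 matrix stored as four row registers.
struct Mat4 {
    __m128 r[4];

    static Mat4 Identity() {
        return { { _mm_setr_ps(1, 0, 0, 0), _mm_setr_ps(0, 1, 0, 0),
                   _mm_setr_ps(0, 0, 1, 0), _mm_setr_ps(0, 0, 0, 1) } };
    }

    // Row-vector convention: the translation goes in the last row.
    static Mat4 Translation(float x, float y, float z) {
        Mat4 m = Identity();
        m.r[3] = _mm_setr_ps(x, y, z, 1);
        return m;
    }

    Mat4 Transposed() const {
        Mat4 t = *this;
        _MM_TRANSPOSE4_PS(t.r[0], t.r[1], t.r[2], t.r[3]);  // transposes the rows in place
        return t;
    }
};

// Row vector times matrix: v' = x*r0 + y*r1 + z*r2 + w*r3.
// Each component is broadcast to all lanes with a shuffle, then scales a row.
inline __m128 Mul(__m128 v, const Mat4& m) {
    const __m128 x = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
    const __m128 y = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1));
    const __m128 z = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2));
    const __m128 w = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3));
    return _mm_add_ps(_mm_add_ps(_mm_mul_ps(x, m.r[0]), _mm_mul_ps(y, m.r[1])),
                      _mm_add_ps(_mm_mul_ps(z, m.r[2]), _mm_mul_ps(w, m.r[3])));
}

inline std::array<float, 4> ToArray(__m128 v) {
    std::array<float, 4> out{};
    _mm_storeu_ps(out.data(), v);
    return out;
}
```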

I still want to add the other integral class wrappers for 8- and 16-bit operations. I might rename SseVec2 and Vec2p ( p is for packed ) to aosvec2 and soavec2, or maybe vector2aos and vector2soa. The problem is that I also want to add AVX support to this library, so I need a way of telling the two sets of vectors apart, because I know that at some point, me or someone else will put a using namespace statement in a header file and cause ambiguities.
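
One possible way to keep the SSE and AVX names apart even when someone writes using namespace: make each layout a template on the lane count and give each width its own alias. Separate sse and avx namespaces holding the same short names would still collide if both were pulled in with using-directives; distinct alias names do not. This is a hypothetical sketch; plain float arrays stand in for the __m128/__m256 members a real version would wrap.

```cpp
#include <cassert>
#include <cstddef>

namespace simd {

// One template per layout, parameterized on lane count. Plain float arrays
// stand in for the __m128 (4 lanes) or __m256 (8 lanes) a real version would hold.
template <std::size_t Lanes>
struct vec2_soa {
    static constexpr std::size_t lanes = Lanes;
    float x[Lanes];
    float y[Lanes];
};

// Width-specific aliases: distinct names, so "using namespace simd;"
// in a header cannot make an SSE type collide with an AVX type.
using vec2_soa_sse = vec2_soa<4>;
using vec2_soa_avx = vec2_soa<8>;

// Width-agnostic code can be written once against the template.
template <std::size_t N>
void LengthSq(const vec2_soa<N>& v, float (&out)[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = v.x[i] * v.x[i] + v.y[i] * v.y[i];
    }
}

}  // namespace simd
```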

Other things I want to do would be to set up a few algorithms that use SIMD, but I'm afraid that at the moment I don't know of more than alpha blending and linear interpolation, which are practically the same thing, just done differently in integer SIMD vs. float SIMD. I suppose Bezier curves would be a good one also.
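
A sketch of the two flavors mentioned, assuming SSE2 only: a float lerp, a + t * (b - a), and an 8-bit integer alpha blend that widens bytes to 16 bits, weights them by alpha and 256 - alpha, and shifts back down by 8. Lerp and BlendBytes are hypothetical names, and alpha is assumed to be in the range 0 to 256.

```cpp
#include <emmintrin.h>  // SSE2 (also pulls in SSE)
#include <cassert>
#include <cstdint>

// Float SIMD lerp on four lanes: a + t * (b - a).
inline __m128 Lerp(__m128 a, __m128 b, __m128 t) {
    return _mm_add_ps(a, _mm_mul_ps(t, _mm_sub_ps(b, a)));
}

// Integer SIMD alpha blend of 16 unsigned bytes (e.g. four 32-bit pixels):
// out = (src * alpha + dst * (256 - alpha)) >> 8, with alpha in [0, 256].
// Bytes are widened to 16 bits first; the weighted sum is at most
// 255 * 256 = 65280, so it fits in an unsigned 16-bit lane.
inline __m128i BlendBytes(__m128i src, __m128i dst, int alpha) {
    const __m128i zero = _mm_setzero_si128();
    const __m128i wa = _mm_set1_epi16(static_cast<short>(alpha));
    const __m128i wb = _mm_set1_epi16(static_cast<short>(256 - alpha));

    // Low 8 bytes of each input, widened to 8 x 16-bit.
    __m128i lo = _mm_add_epi16(_mm_mullo_epi16(_mm_unpacklo_epi8(src, zero), wa),
                               _mm_mullo_epi16(_mm_unpacklo_epi8(dst, zero), wb));
    // High 8 bytes of each input.
    __m128i hi = _mm_add_epi16(_mm_mullo_epi16(_mm_unpackhi_epi8(src, zero), wa),
                               _mm_mullo_epi16(_mm_unpackhi_epi8(dst, zero), wb));

    // Divide by 256 with a logical shift, then pack back down to bytes.
    lo = _mm_srli_epi16(lo, 8);
    hi = _mm_srli_epi16(hi, 8);
    return _mm_packus_epi16(lo, hi);
}
```

Both compute the same interpolation; the integer version trades a little precision (the >> 8 truncates) for processing 16 channels per call.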

Planet Chili
