Analysis by Chili's framework expert

Posted: October 29th, 2019, 3:44 pm
by AleksiyCODE
I just made my game use Chili's 3DFundamentals Framework instead of the non-3D one (without DirectX ofc). At 1920x1080 resolution, the frame time for the testing room is about 0.1 sec (see pic).
I just want to make sure that this performance is expected, and I don't want to switch to DirectX until I get gud at using this version.
Also, is the DirectX version of the Framework a lot more complicated than the regular one, or are they pretty much the same?
Thanks! :D

Re: Analysis by Chili's framework expert

Posted: October 30th, 2019, 3:13 am
by albinopapa
The Hardware3D version is pretty different. It still has a pipeline feel, but the design is way different. Best thing would be to understand what the HW3D version is doing and then you can disassemble as needed. Chili has been iteratively refactoring his code after a few modifications or videos, so it's nice watching that process as well.

Re: Analysis by Chili's framework expert

Posted: October 30th, 2019, 3:31 am
by albinopapa
To address the frame time, a retail game tries to keep each frame under .033 sec for 30 FPS and .016 sec for 60 FPS. Your .096178 sec is about 10.4 FPS.

I've had my fair share of messing around with software 3D rasterizing and I've found that pixel fill rate is the biggest hurdle to overcome. It's one thing to just fill the screen with a single color or even an entire sprite, but think about how much actual work goes into deciding the color of each pixel, and 1920 x 1080 is 2,073,600 pixels. If you lowered your resolution to 1280x720, it'd be only 921,600 pixels, which is less than half the amount. You might even get around 22-24 FPS if performance scales linearly with pixel count.
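To make the arithmetic above concrete, here is a minimal sketch of the frame time to FPS conversion and the linear-scaling estimate. The function names are made up for illustration, and the estimate assumes cost is proportional to pixel count, which is optimistic because per-vertex work does not shrink with resolution.

```cpp
#include <cassert>

// Frames per second for a given frame time in seconds.
double FpsFromFrameTime(double frameTimeSeconds)
{
    assert(frameTimeSeconds > 0.0);
    return 1.0 / frameTimeSeconds;
}

// Estimated FPS after a resolution change.
// ASSUMPTION: rendering cost scales linearly with the number of pixels.
double EstimateFpsAtResolution(double currentFps,
                               int currentWidth, int currentHeight,
                               int newWidth, int newHeight)
{
    const double currentPixels = double(currentWidth) * currentHeight;
    const double newPixels = double(newWidth) * newHeight;
    assert(newPixels > 0.0);
    return currentFps * currentPixels / newPixels;
}
```

Feeding in the .096178 sec frame time and going from 1920x1080 to 1280x720 (2.25 times fewer pixels) lands inside the 22-24 FPS range mentioned above.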

There are some modifications you can make to the framework to use multi-threading, which could give you a performance boost if done well. For instance, say you have 4 cores without hyperthreading ( Intel ) or SMT ( AMD ), then you could divide the screen up into four quadrants. Each core would be sent the same vertices, but would clip and rasterize against its own quadrant's boundaries instead of the entire screen's boundaries. One issue with this approach is load imbalance: when some quadrants have more work to do, such as more geometry, the frame still takes as long as the slowest quadrant.

I used 4 cores as an example seeing as that's the most common core count for now.
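The quadrant idea could be sketched roughly like this. `TileRect`, `SplitScreen` and `RenderTiles` are hypothetical names, not part of the framework, and `drawTile` stands in for "run the pipeline on all the vertices, clipped to this rectangle".

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical screen-space rectangle; right and bottom are exclusive.
struct TileRect
{
    int left, top, right, bottom;
};

// Split a width x height screen into cols x rows tiles that cover it exactly.
std::vector<TileRect> SplitScreen(int width, int height, int cols, int rows)
{
    assert(cols > 0 && rows > 0);
    std::vector<TileRect> tiles;
    for (int r = 0; r < rows; ++r)
    {
        for (int c = 0; c < cols; ++c)
        {
            tiles.push_back({ width * c / cols, height * r / rows,
                              width * (c + 1) / cols, height * (r + 1) / rows });
        }
    }
    return tiles;
}

// Run drawTile once per tile, each on its own thread, then wait for all.
// Tiles do not overlap, so the threads never write to the same pixel.
void RenderTiles(const std::vector<TileRect>& tiles,
                 const std::function<void(const TileRect&)>& drawTile)
{
    std::vector<std::thread> workers;
    for (const TileRect& tile : tiles)
    {
        // Copy the tile into the lambda so each thread owns its rectangle.
        workers.emplace_back([&drawTile, tile] { drawTile(tile); });
    }
    for (std::thread& worker : workers)
    {
        worker.join();
    }
}
```

Calling `SplitScreen(1920, 1080, 2, 2)` gives the four quadrants; `SplitScreen(1920, 1080, 3, 2)` would give six. Spawning threads every frame has its own cost, so a persistent pool of workers is probably the better design once this works.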

There are also SIMD instructions. SIMD comes in two flavors on most processors: 4 element SSE and 8 element AVX instructions. This means you can do one operation on either 4 or 8 elements at the same time, which in theory gives up to 4x or 8x the throughput ( realistically though expect around 2.5x to 3.5x for SSE and 5x to 7x for AVX, even if done correctly ). The current rasterization method used in the 3D Fundamentals framework isn't geared for SIMD instructions, since some triangles are going to be less than 4/8 pixels wide.
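As a small taste of what "one operation on 4 elements" looks like, here is a sketch using the SSE intrinsics from `<xmmintrin.h>`. `ScaleSse` is a made-up helper, unrelated to the framework, and it only builds on x86/x64.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h> // SSE intrinsics (x86/x64 only)

// Multiply count floats by factor, 4 at a time where possible.
void ScaleSse(float* values, std::size_t count, float factor)
{
    assert(values != nullptr || count == 0);
    const __m128 factor4 = _mm_set1_ps(factor); // factor in all 4 lanes
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4)
    {
        const __m128 v = _mm_loadu_ps(values + i);          // unaligned load of 4 floats
        _mm_storeu_ps(values + i, _mm_mul_ps(v, factor4));  // one multiply, 4 results
    }
    for (; i < count; ++i) // scalar tail for the leftover 0-3 elements
    {
        values[i] *= factor;
    }
}
```

The scalar tail loop is the same "less than 4 elements left" problem that narrow triangles cause for scanline rasterization.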

Another method of rasterization uses barycentric coordinates. Once you find the barycentric coordinates of a pixel, you can determine whether the pixel is inside the triangle. If it is, you can then use the same coordinates to interpolate everything else, from the Z coordinate and normal to texture coordinates. This method can easily use SIMD instructions since you can test 4/8 pixels at a time. I'd be interested to see what kind of performance you'd get switching to barycentric coordinates and using SIMD instructions with multi-threading LOL ( I'm not asking for much, am I? ).
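A minimal scalar sketch of the barycentric test and interpolation, using the common edge-function formulation. All the type and function names here are invented for the example and are not taken from the framework.

```cpp
#include <cassert>

struct Vec2 { float x, y; };
struct Bary { float w0, w1, w2; };

// Twice the signed area of triangle (a, b, c).
float EdgeFunction(const Vec2& a, const Vec2& b, const Vec2& c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Barycentric weights of p with respect to triangle (v0, v1, v2).
// Dividing by the full area makes the result independent of winding order.
Bary Barycentric(const Vec2& v0, const Vec2& v1, const Vec2& v2, const Vec2& p)
{
    const float area = EdgeFunction(v0, v1, v2);
    assert(area != 0.0f); // the caller must reject degenerate triangles
    return { EdgeFunction(v1, v2, p) / area,
             EdgeFunction(v2, v0, p) / area,
             EdgeFunction(v0, v1, p) / area };
}

// p is inside (or on an edge of) the triangle when no weight is negative.
bool IsInside(const Bary& b)
{
    return b.w0 >= 0.0f && b.w1 >= 0.0f && b.w2 >= 0.0f;
}

// Interpolate any per-vertex attribute (Z, a texture coordinate, ...).
float Interpolate(const Bary& b, float a0, float a1, float a2)
{
    return b.w0 * a0 + b.w1 * a1 + b.w2 * a2;
}
```

This version ignores perspective correction; attributes interpolated this way are linear in screen space only.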

EDIT: So, I just modified the framework to use barycentric coordinates using the default scene ( monkey head, the waving flag and the light in the room ) and went from ~18 fps to ~10 or 11 fps. A net loss when using just plain x86 code ( no SIMD ). The biggest problem with the BC method is how you do the tests. My naive approach is to find the bounding box of a triangle using just the X and Y coordinates and iterate through the pixels in the bounding area. Well, a triangle covers at most half of its bounding box, so I'm wasting at least half the time on pixels that are never going to be inside the triangle. You can speed this up by calculating the edges of the triangle and using those as the start and end points, but then you might as well be using chili's scanline rasterization method.
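The naive bounding-box loop described in the EDIT might look something like the sketch below. It is a reconstruction under assumptions, not the actual modified framework code: the names are hypothetical, and it only counts tested versus covered pixel centers to show how much work is wasted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Point { float x, y; };
struct Coverage { int tested; int inside; };

// Twice the signed area of triangle (a, b, c).
float Edge(const Point& a, const Point& b, const Point& c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Visit every pixel in the triangle's bounding box and count how many
// pixel centers were tested and how many actually fall inside the triangle.
Coverage CountBoundingBoxCoverage(const Point& v0, const Point& v1, const Point& v2)
{
    const int minX = int(std::floor(std::min({ v0.x, v1.x, v2.x })));
    const int maxX = int(std::ceil(std::max({ v0.x, v1.x, v2.x })));
    const int minY = int(std::floor(std::min({ v0.y, v1.y, v2.y })));
    const int maxY = int(std::ceil(std::max({ v0.y, v1.y, v2.y })));

    const float area = Edge(v0, v1, v2);
    assert(area != 0.0f); // degenerate triangles are not handled here

    Coverage result{ 0, 0 };
    for (int y = minY; y < maxY; ++y)
    {
        for (int x = minX; x < maxX; ++x)
        {
            const Point p{ x + 0.5f, y + 0.5f }; // sample at the pixel center
            const float w0 = Edge(v1, v2, p) / area;
            const float w1 = Edge(v2, v0, p) / area;
            const float w2 = Edge(v0, v1, p) / area;
            ++result.tested;
            if (w0 >= 0.0f && w1 >= 0.0f && w2 >= 0.0f)
            {
                ++result.inside; // a real rasterizer would shade the pixel here
            }
        }
    }
    return result;
}
```

For the right triangle with corners (0,0), (8,0) and (0,8), this tests 64 pixel centers and only 36 of them land inside, so a bit under half of the tests are wasted even in this favorable case. Long, thin diagonal triangles are much worse.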

Re: Analysis by Chili's framework expert

Posted: October 30th, 2019, 10:37 am
by AleksiyCODE
Hmm, just changing resolution seems to be good enough for now.

I've watched Chili's videos on multithreading and SIMD (well, I've watched all of Chili's videos except the DirectX ones).
SIMD seems a bit too difficult for now, but I really want to try implementing multithreading. I'll divide the screen into 4 (or 6 to match my processor) parts as you suggested. Hope it won't be too difficult. But for now I'm messing with projections etc. I want a plane to stay at a fixed point in worldspace, but to always face the camera. Seems doable.

Thanks for the answer!
P.S. I'll also check out barycentric coordinates