Barycentric or Line Raster

The Partridge Family were neither partridges nor a family. Discuss.
albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Barycentric or Line Raster

Post by albinopapa » December 17th, 2017, 8:01 am

LINE_RASTERIZED
------------------------
Min fps: 157.119720
Max fps: 613.858459
Avg fps: 449.799988

BARYCENTRIC_RASTERIZED
------------------------
Min fps: 170.393875
Max fps: 408.886597
Avg fps: 342.000000

HYBRID_RASTERIZED
------------------------
Min fps: 190.675598
Max fps: 475.329254
Avg fps: 384.399994

HYBRID_RASTERIZED ( no checking if in triangle )
------------------------
Min fps: 118.528389
Max fps: 525.122131
Avg fps: 427.000000

The motivation behind this was simple, there are fewer steps involved in calculating barycentric coordinates and using those coordinates to interpolate between vertices than the line rasterizing technique taught by chili. So I thought barycentric rasterization would be equal or faster than the line rasterizing. Doesn't look like that is the case. The measurements were done using the default scene of the 3D fundamentals, without manipulation of the scene, no user input.

Now, granted, my algorithm iterates over a bounding box of each triangle when figuring out when to start drawing and interpolating pixels and the rest of the vertices. This means that half the time, it's wasting it's time.

I'm going to try splitting the triangles up like chili does to find the edges, then use the barycentric method for interpolating the vertices and see if that is any better, but honestly, line rasterization is probably going to win no matter what.

UPDATE: Added hybrid rasterization ( using line raster to find edges and prestep, then interpolation done using barycentric coordinates. A speed gain to be had, but still not as fast as pure line rasterization.

With line rasterization being 100%
Barycentric rasterization is about 76% as fast
Hybrid rasterization is about 85% as fast.
Hybrid rasterization no in bounds checking is about 95% as fast

UPDATE 2:
Since I'm using the edges for begin and end points, I removed the check to see if the point was in the triangle...duh it should be. This is now pretty close to the scan line speed. The check to see if in triangle requires like 6 comparisons, three variables need to be between 0 and 1 in order for the point to be in triangle.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

User avatar
chili
Site Admin
Posts: 3948
Joined: December 31st, 2011, 4:53 pm
Location: Japan
Contact:

Re: Barycentric or Line Raster

Post by chili » December 17th, 2017, 11:41 am

You can do barycentric with SSE/AVX pretty easily. That would allow it to blow the scanline impl out of the water.
Chili

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Barycentric or Line Raster

Post by albinopapa » December 17th, 2017, 1:08 pm

Yeah, I'm thinking of ways to accomplish this without just doing the bounding box, but anything else seems too complicated hehe.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

User avatar
chili
Site Admin
Posts: 3948
Joined: December 31st, 2011, 4:53 pm
Location: Japan
Contact:

Re: Barycentric or Line Raster

Post by chili » December 18th, 2017, 11:57 am

Yeah, it would be a little complicated. Still would be fun implement, but only so many hours in the day.
Chili

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Barycentric or Line Raster

Post by albinopapa » July 13th, 2018, 8:36 pm

So revisiting this and am getting very poor results.

Instead of using the 3D fundamentals framework, I decided to start from scratch using just the chili framework 2016. I wanted to just do some 2D sprite drawing with rotations and scaling using matrix transforms, so figured wouldn't need all the 3D conversions to 2D.

My first attempt again was to just calculate a bounding box around the min and max values of four vertices making up a sprite rect. Then cycle through each pixel from left to right, top to bottom checking the two triangles for intersection at each pixel. I chose this route because I knew there were only ever going to be two triangles for each sprite, so I could use std::array<Triangle, 2> which is stack allocated.

This also made things easier to use SSE/AVX with as I have two routes to take, either test four/eight pixel locations at a time, or the same pixel location on four/eight triangles at a time. I only really tested SSE, and since I am only dealing with two triangles, I chose the four pixel locations at a time route.

In the x86 case, this meant, for every pixel in the bounding area, I would test for intersection of two triangles each iteration. In the SSE case, I would test intersections of four rays against one triangle then the same four rays against another triangle each iteration.

In the next test scenario, I sorted the vertices and spliced the triangles in the same manner chili does in 3D Fundamentals. Then interpolate down the edges of the flat bottom and flat top triangles just to use barycentric coordinates to interpolate texture coordinates.

The results were kind of shocking to me. With the first test of using a bounding area to iterate through, the x86 code gave me around 80 fps on a sprite that was the size of the 800x600 window then rotated 45 degrees. Using SSE only game me about +12 fps resulting in ~92 fps. The second test case where I only iterate over the triangle almost halved the fps down to 47 using x86 code. I haven't attempted using SSE instructions as of yet, but this is a strange result for me and I'm not sure why iterating over the triangle vs iterating over the bounding area of the triangle is so much worse. Especially, since in the previous post above I claim it to be near equal to plain line rasterization.

I can't compare these results to the 3D fundamental results because I never tested the case where the waving Saurans Butthole ( the test I used previously ) covered the entire screen. I'm merely comparing the deltas between line rasterization and using barycentric coordinates.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Barycentric or Line Raster

Post by albinopapa » July 13th, 2018, 9:57 pm

For anyone wondering what the hell barycentric coordinates are, well, I don't know what they are lol. They can be used to determine a how close a given point is to each of the three vertices of a triangle. If the point is equal to vertex A, the barycentric coordinates would look something like ( 1.0, 0.0, 0.0 ) as you are on vertex A and no where near B or C. In the middle of the triangle, they would probably be ( .33, .33, .33 ).

The way I learned and came to understand how to calculate the BC is to find the area of the triangle, then divide the triangle into three sub trangles using the point you wish to test as the new vertex third vertex of each sub triangle. The ratio of the areas each sub triangle to the area of the original triangle is the barycentric coordinate.

See diagram: Image

Now, the area of a triangle is base x height x half, but in order to use that method, you have to get the lengths of a side and the length of the vector that goes from one vertex to the mid point of that side. That would require 2 sqrts for the whole triangle, then two for each sub triangle. Enter the cross product. The cross product allows you to get the area of any parallelogram without the need for square roots to get the lengths of the edges. To calculate the cross product of a 2D vector ( Vec2 ) just cross multiply the x and y components, the subtract the results: ( v0.y * v1.x ) - ( v0.x * v1.y );

So if that's the area of a parallelogram, and a triangle is half that, multiply by half to get the area of a triangle. To find v0 and v1, you'd subtract two of the points of the triangle, from the third: ( v0 = B - A, v1 = C - A ).
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Barycentric or Line Raster

Post by albinopapa » July 16th, 2018, 7:57 pm

Ok, even more weirdness. I took an image ( 1920 x 1200 ) and scaled it to the 800 x 600 width of the framework window and rotated it 45 degrees. Over 5 seconds it renders 212 times, about 42.4 fps. I then wondered what would happen if it covered the entire screen, since when I rotated it the corners were off screen and the corners of the window were blank. I figured this would be more taxing since I would be filling in more pixels. However, this was not the case. Scaling the image 2x and even 3x raised the frames rendered to 366 and 363 respectively, about 73.2 and 72.6 frames per second.

The only thing that I can think of making such a huge difference was branch prediction and fewer branch misses. This would allow the code path to be cached both data and instructions. This is still using the calculated edges of the triangle instead of the bounding area of the four vertices making up the transformed sprite to iterate over the triangle.

I still need if checks because of the nature of integer pixel locations and the floating point positions of the triangles, especially after transformation, don't exactly line up.

If a vertex's X position component is at 320.2f, I'd start iterating at 320. During the calculations for the barycentric coordinates, I'd use this 320 value which is outside the triangle. Same would go for the opposite edge, fxStart = 320.2f, fxEnd = 340.9f, xStart = 320, xEnd = 341. That's a real width of 20.7, but an iteration of 21 pixels in which the first and last pixel of each row is probably going to report as not being inside the triangle.

I could try rounding up on the start and truncating the end, but I'm wondering if that would produce holes between adjoining triangles. In the case above, one row would start at 321 and end at 340, the adjoining row would then start at 341, so maybe not.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Barycentric or Line Raster

Post by albinopapa » July 16th, 2018, 8:22 pm

Ok, so not the branch prediction as previously thought. I removed the branching completely, relying solely on the fact that the coordinates are guaranteed to be within the triangle and still got:

Code: Select all

Frames rendered: 218, in 5 seconds.
Average framerate: 43.600002
So now I'm thinking that the calculations for edges is the culprit. In smaller triangles, the edges would have to be calculated more often while in larger triangles like the ones that take up the entire screen, only the barycentric coordinates would be calculated for the entire row. There would still be more edges to calculate, but more work is being done between edge calculations.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Barycentric or Line Raster

Post by albinopapa » July 16th, 2018, 9:00 pm

Ok, so I have a new theory or expanding the old one. In each iteration of the outer Y loop, I have to calculate the xStart and xEnd which would change the range of the loop. So the time spent on calculations of the edges probably isn't the bottleneck, but the fact that the range changes possibly causing caching issues as each set of X iterations would be different.

Then again, it could just be as I said earlier, when it takes up the entire screen it only has to calculate the edges once for the entire 800 pixel row, meaning it can unroll the loop a couple times since all X iterations would be the same just incrementing X. Maybe it can vectorize the code easier, this could also be why just iterating over a bounding area is faster, the compiler might be able to vectorize or unroll the loop easier.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Barycentric or Line Raster

Post by albinopapa » July 16th, 2018, 11:20 pm

Ok, so I'm gonna call it.

Here are some results:

Code: Select all

// Scale is 1x window size rotated 45 degrees
Frames rendered: 216, in 5 seconds.
Average framerate: 43.200001

// Scale is 2x window size rotated 45 degrees
Frames rendered: 360, in 5 seconds.
Average framerate: 72.000000

// Scale is 1x window size rotated 0 degrees
Frames rendered: 452, in 5 seconds.
Average framerate: 90.400002
I changed the benchmark code a bit, but here is the repo link.
Sprite rendering using Chili Framework
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

Post Reply