Sprites are kinda slow

SunTroll
Posts: 22
Joined: April 19th, 2019, 8:25 pm

Sprites are kinda slow

Post by SunTroll » March 26th, 2020, 3:03 am

Don't get me wrong, they are great, but I compared the draw speed of a sprite, a noChromaSprite, and a plain y/x PutPixel loop. I was able to draw:
4000, 6500, and 11000 of them respectively in a simple for loop before it started to stutter (in release64).
That's about a 1.6x performance gain for nonChroma, another 1.7x for the loop, and about 2.75x going from the sprite to the loop (to be fair, I'm not sure the compiler didn't optimize the shit out of the loop, but still).
(I was using the sprite from tut I11 so 32 by 48 iirc.)

From what I gathered, there will be a more optimized way to draw in tut I20, but that's too far off (A: I want to make a game and take a break from learning for a while (also learning Blender), and more importantly B: I want to get some practice in).

So I was wondering whether I would get more performance if, instead of storing the pixels as an array of colors, I stored them as an array of "pixels"?

imagine the following image:
+ is rendering pixel
= is transparent pixel

=+=
+++

in this case we store it as:
Colors[6];

if instead of that I stored it as:
class Pixel
Color
int x
int y

Pixel[4]

wouldn't that then draw faster?
Shouldn't it get nearly the same performance as the PutPixel novels?
(Did the PutPixel novels have the same performance as just drawing a square with a for loop?)
In theory they should perform even better, since you draw fewer pixels, but I'm not sure that is the case: a for loop takes pretty much no memory and the novels do, so it may also be a matter of CPU caches (no idea).
If the PutPixel novel has better performance, and assuming most things are roundish (a circle covers about 79% of its bounding square), that's roughly another 1.3x perf gain.

I suppose the answer may just be "test it" but was just wondering if that isn't something someone already did (or is just plain obvious for someone with more experience).
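
The idea above can be sketched roughly as follows. This is only an illustration under assumptions: `Color` is a stand-in for the framework's color type, `kChroma` is an assumed magenta key, and `MakeSparse` / `DrawSparse` are hypothetical names, not framework functions. The sparse list is built once at load time, so the draw loop has no transparency test left in it.

```cpp
#include <cstdint>
#include <vector>

using Color = std::uint32_t;              // assumption: stand-in for the framework's Color
constexpr Color kChroma = 0xFF00FFu;      // assumption: magenta chroma key

struct Pixel { int x; int y; Color c; };  // the "Pixel" from the post

// Build the sparse list once from the dense width*height color array.
std::vector<Pixel> MakeSparse(const std::vector<Color>& dense, int width, int height)
{
	std::vector<Pixel> out;
	for (int y = 0; y < height; ++y)
	{
		for (int x = 0; x < width; ++x)
		{
			const Color c = dense[y * width + x];
			if (c != kChroma)
			{
				out.push_back({ x, y, c });
			}
		}
	}
	return out;
}

// Draw: one write per stored pixel, no chroma test in the loop.
// No clipping: the caller must keep the sprite fully inside the buffer.
void DrawSparse(const std::vector<Pixel>& sprite, int ox, int oy,
                std::vector<Color>& screen, int screenWidth)
{
	for (const Pixel& p : sprite)
	{
		screen[(oy + p.y) * screenWidth + (ox + p.x)] = p.c;
	}
}
```

For the 3x2 example image this stores 4 pixels instead of 6 colors. Note that with these types each stored pixel is typically 12 bytes instead of 4, so whether it wins likely depends on how much of the sprite is transparent; measuring is the only way to be sure.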
Last edited by SunTroll on March 26th, 2020, 6:47 pm, edited 2 times in total.

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Sprites are kinda slow

Post by albinopapa » March 26th, 2020, 9:01 am

Not sure what some of the numbers represent, but honestly you shouldn't have too many issues displaying enough sprites to cover the 800x600 pixel window.
4000, 6500, 11000
What are these numbers?
That's about 1.6 performance up for nonChroma and another 1.6 for loop and about 2.7 for the sprite to loop
What are you trying to say here?

How big are your sprites? ( Width x Height )

Are you skipping sprites that aren't in view?

The Pixel class route might be faster if more of the image was transparent than opaque.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

SunTroll
Posts: 22
Joined: April 19th, 2019, 8:25 pm

Re: Sprites are kinda slow

Post by SunTroll » March 26th, 2020, 6:46 pm

albinopapa wrote:
March 26th, 2020, 9:01 am
4000, 6500, 11000
What are these numbers?
That is the number of sprites I was able to draw before I started to stutter.
albinopapa wrote:
March 26th, 2020, 9:01 am
That's about 1.6 performance up for nonChroma and another 1.6 for loop and about 2.7 for the sprite to loop
What are you trying to say here?
How many times more sprites I was able to draw with the faster code (tested with the sprite from tut I11).

Made an edit to the original post to hopefully make it more clear.

I don't have any sprites yet, but my guess would be about 20x20 to 30x30 for the most numerous ones.

If you look at some bullet hell games, the number of bullets on screen can be pretty insane.
(To clarify, I don't expect to make a game of this quality, but it works as an example):
https://youtu.be/e4Gll_61anw?t=81
https://youtu.be/wvhTT7GtZtA?t=482
That seems to be about 500 to 1000 by eyeballing it (probably closer to 500).
Probably the most extreme case:
https://youtu.be/Inq1LYYxyt0?t=48
This is hard to even eyeball, but my guess would be around 3000.

Now, my sprites would be about half the size of the one I tested with, but I also need to draw the background and enemies, which takes time, and then there is the update. Realistically I would expect to have enough time to draw about 1000 to 2000 bullets, which is probably enough, but it still made me consider doing it in a faster way.

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Sprites are kinda slow

Post by albinopapa » March 26th, 2020, 8:12 pm

Well, you have a few options.

The 2016 chili framework is all software based; there is no GPU acceleration. So to get 2,000 sprites of 30x30 ( 900 pixels ) drawn with chroma key transparency using this framework, you might look into some SSE/AVX instructions and maybe some multi-threading.

Chili does have a hardware 3D tutorial series that uses D3D11 hardware acceleration for graphics, so at least the sprites will be drawn quickly, you'll just have to optimize your updates of all the objects before drawing.

900 pixels per sprite times 2,000 sprites is 1.8 million pixels. Assuming your resolution stays at 800x600, that's another 480,000 for the background. You're looking to push at the very least 2.28 million pixels per frame. With GPU acceleration this would be child's play, but considering all the CPU has to do aside from blasting pixels, this is going to be a bit difficult.
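
The arithmetic above can be checked at compile time. This is just a sketch: `PixelsPerFrame` is a hypothetical helper, and the 60 frames per second figure is an assumption based on a 60 Hz display.

```cpp
#include <cstdint>

// Pixels written per frame: every sprite pixel plus one full-screen background.
constexpr std::int64_t PixelsPerFrame(int spriteW, int spriteH, int spriteCount,
                                      int screenW, int screenH)
{
	return static_cast<std::int64_t>(spriteW) * spriteH * spriteCount
	     + static_cast<std::int64_t>(screenW) * screenH;
}

constexpr std::int64_t kPerFrame  = PixelsPerFrame(30, 30, 2000, 800, 600);
constexpr std::int64_t kPerSecond = kPerFrame * 60;   // assumption: 60 fps

static_assert(kPerFrame == 2280000, "1.8 million sprite pixels + 480,000 background pixels");
static_assert(kPerSecond == 136800000, "about 137 million pixels per second at 60 fps");
```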

That being said, it's not entirely impossible with SSE/AVX instructions and the correct memory layout.
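
As a rough idea of what the SSE route could look like, here is a minimal sketch of a chroma-keyed row copy using SSE2 intrinsics. It assumes an x86/x64 compiler, 32-bit pixels stored contiguously, and a hypothetical function name `BlitChromaSSE2`; it is not code from the framework. It handles four pixels per iteration and falls back to a scalar loop for the remainder.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Copy `count` 32-bit pixels from src to dst, leaving dst untouched
// wherever the source pixel equals `chroma`.
void BlitChromaSSE2(const std::uint32_t* src, std::uint32_t* dst,
                    int count, std::uint32_t chroma)
{
	const __m128i key = _mm_set1_epi32(static_cast<int>(chroma));
	int i = 0;
	for (; i + 4 <= count; i += 4)
	{
		const __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
		const __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst + i));
		// All ones in every lane where the source pixel is the chroma key.
		const __m128i isKey = _mm_cmpeq_epi32(s, key);
		// Keep dst where isKey is set, take src everywhere else.
		const __m128i out = _mm_or_si128(_mm_and_si128(isKey, d),
		                                 _mm_andnot_si128(isKey, s));
		_mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), out);
	}
	// Scalar tail for the last 0 to 3 pixels.
	for (; i < count; ++i)
	{
		if (src[i] != chroma)
		{
			dst[i] = src[i];
		}
	}
}
```

The "correct memory layout" part matters here: this only works because a sprite row and a screen row are each contiguous arrays of pixels.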

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Sprites are kinda slow

Post by albinopapa » March 27th, 2020, 5:05 am

I was able to get around 1,100 sprites ( custom Alpha effect ), text ( Chroma effect ) and a background image ( Copy effect ) in x64 release mode and was able to hover around 60 fps sometimes dipping to around 54 fps.

Details:
The "bullet" sprites were 32x32 pixels, drawn with a custom alpha blend effect
The font was the fixed_sys font that comes with the chili framework, drawn with the chroma effect
The background image was 800x2400, clipped to the window resolution of 800x600 and drawn with the copy effect
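
For readers unfamiliar with alpha blending, a generic per-pixel blend might look like the sketch below. This is not the custom effect used in the test above; the 0xAARRGGBB layout and the names `Channel`, `BlendChannel`, and `BlendPixel` are assumptions made for illustration.

```cpp
#include <cstdint>

// Extract one 8-bit channel from a packed 32-bit pixel.
inline std::uint8_t Channel(std::uint32_t c, int shift)
{
	return static_cast<std::uint8_t>((c >> shift) & 0xFFu);
}

// Blend one channel: alpha = 0 keeps dst, alpha = 255 gives src.
inline std::uint8_t BlendChannel(std::uint8_t src, std::uint8_t dst, std::uint8_t alpha)
{
	return static_cast<std::uint8_t>((src * alpha + dst * (255 - alpha)) / 255);
}

// Blend a source pixel over a destination pixel, using the source's alpha byte.
// Assumed layout: 0xAARRGGBB. The result's alpha byte is left at 0.
inline std::uint32_t BlendPixel(std::uint32_t src, std::uint32_t dst)
{
	const std::uint8_t a = Channel(src, 24);
	const std::uint8_t r = BlendChannel(Channel(src, 16), Channel(dst, 16), a);
	const std::uint8_t g = BlendChannel(Channel(src, 8), Channel(dst, 8), a);
	const std::uint8_t b = BlendChannel(Channel(src, 0), Channel(dst, 0), a);
	return (static_cast<std::uint32_t>(r) << 16)
	     | (static_cast<std::uint32_t>(g) << 8)
	     | static_cast<std::uint32_t>(b);
}
```

Compared with a chroma test, this reads the destination and does several multiplies per pixel, which is one reason alpha-blended sprites tend to cost more than chroma-keyed ones.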

I was able to go over 2,000 but started dipping into the 40s for fps, which was still playable. This was without any of the things I mentioned ( SSE/AVX or multi-threading ). No collision detection though, that would slow it down some.

[Image: screenshot of the test]

With over 1100 32x32 bullet sprites, the screen is almost entirely filled. I guess what I'm trying to say is if you keep your resolution at 800x600 and limit the amount of ammo being fired, you can still get a playable frame rate and have a lot of ammo on screen.

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Sprites are kinda slow

Post by albinopapa » March 27th, 2020, 5:30 am

The forum only allows images up to 700x700, so this isn't the entire window. I could have scaled it down and probably should have.

The 30 spawners text is how many objects are firing the "bullets".

SunTroll
Posts: 22
Joined: April 19th, 2019, 8:25 pm

Re: Sprites are kinda slow

Post by SunTroll » March 27th, 2020, 7:17 pm

Yeah, that seems workable, but is the chili framework locked to 60 fps, or does it run faster when connected to a faster monitor?

I also wonder how hard it would be to separate drawing and updating into their own threads?

I did a super simple multithreading test about a week ago, and setting that up was fairly easy.

Code:

#include <chrono>
#include <iostream>
#include <limits>   // std::numeric_limits, used when flushing std::cin
#include <thread>

static constexpr int testsize0 = 5000000;
static constexpr int testsize1 = 5000000;

void aa(float val, float* const retVal)
{
	for (int i = 0; i < testsize0; i++)
	{
		val -= 3.1f; //0
		val -= 1.1f; //1
		val += 22.5f;//2
		val -= 5.2f; //3
		val -= 2.2f; //4
		val -= 3.2f; //5
		val -= 1.2f; //6
		val += 4.2f; //7
		val -= 4.2f; //8
		val -= 2.2f; //9
	}
	*retVal = val;
}

void bb(float val, float* const retVal)
{
	for (int i = 0; i < testsize1; i++)
	{
		val += 3.7f; //0
		val += 1.7f; //1
		val += 22.6f;//2
		val -= 5.4f; //3
		val += 2.4f; //4
		val += 3.4f; //5
		val -= 1.4f; //6
		val += 4.4f; //7
		val += 4.4f; //8
		val += 2.4f; //9
	}
	*retVal = val;
}

float cc(float val)
{
	for (int i = 0; i < testsize0; i++)
	{
		val += 3.7f; //0
		val += 1.7f; //1
		val += 22.6f;//2
		val -= 5.4f; //3
		val += 2.4f; //4
		val += 3.4f; //5
		val -= 1.4f; //6
		val += 4.4f; //7
		val += 4.4f; //8
		val += 2.4f; //9
	}
	return val;
}

float dd(float val)
{
	for (int i = 0; i < testsize1; i++)
	{
		val += 3.7f; //0
		val += 1.7f; //1
		val += 22.6f;//2
		val -= 5.4f; //3
		val += 2.4f; //4
		val += 3.4f; //5
		val -= 1.4f; //6
		val += 4.4f; //7
		val += 4.4f; //8
		val += 2.4f; //9
	}
	return val;
}



int main()
{
	while (true)
	{
		char c;
		std::cout << "Enter what test you want to do (0-9) or (q)uit: ";
		std::cin >> c;
		switch (c)
		{
		case '0':
		{
			//single
			std::cout << "Started test 0\n";
			using namespace std::chrono;
			steady_clock::time_point begin = steady_clock::now();
			float x = 0;
			float y = 0;
			aa(1.0f, &x);
			bb(-0.1f, &y); // same input as tests 1 and 2
			std::cout << "        ignore this(" << x << y << ")\n";
			steady_clock::time_point end = steady_clock::now();
			float dur = duration<float>(end - begin).count();
			std::cout << "test took:" << dur << " s\n";
			std::cout << "Finished test 0\n";
			break;
		}
		case '1':
		{
			//multi
			std::cout << "Started test 1\n";
			using namespace std::chrono;
			steady_clock::time_point begin = steady_clock::now();
			float x = 0;
			float y = 0;
			std::thread one(aa, 1.0f, &x);
			std::thread two(bb, -0.1f, &y);
			one.join();
			two.join();
			std::cout << "        ignore this(" << x << y << ")\n";
			steady_clock::time_point end = steady_clock::now();
			float dur = duration<float>(end - begin).count();
			std::cout << "test took:" << dur << " s\n";
			std::cout << "Finished test 1\n";
			break;
		}
		case '2':
		{
			//control single
			std::cout << "Started test 2\n";
			using namespace std::chrono;
			steady_clock::time_point begin = steady_clock::now();
			float x = 0;
			float y = 0;
			x = cc(1.0f);
			y = dd(-0.1f);
			std::cout << "        ignore this(" << x << y << ")\n";
			steady_clock::time_point end = steady_clock::now();
			float dur = duration<float>(end - begin).count();
			std::cout << "test took:" << dur << " s\n";
			std::cout << "Finished test 2\n";
			break;
		}
		case '3':
		{
			//control empty
			std::cout << "Started test 3\n";
			using namespace std::chrono;
			steady_clock::time_point begin = steady_clock::now();
			float x = 0;
			float y = 0;
			std::cout << "        ignore this(" << x << y << ")\n";
			steady_clock::time_point end = steady_clock::now();
			float dur = duration<float>(end - begin).count();
			std::cout << "test took:" << dur << " s\n";
			std::cout << "Finished test 3\n";
			break;
		}
		case '4':
			std::cout << "Started test 4\n";
			std::cout << "Finished test 4\n";
			break;
		case '5':
			std::cout << "Started test 5\n";
			std::cout << "Finished test 5\n";
			break;
		case '6':
			std::cout << "Started test 6\n";
			std::cout << "Finished test 6\n";
			break;
		case '7':
			std::cout << "Started test 7\n";
			std::cout << "Finished test 7\n";
			break;
		case '8':
			std::cout << "Started test 8\n";
			std::cout << "Finished test 8\n";
			break;
		case '9':
			std::cout << "Started test 9\n";
			std::cout << "Finished test 9\n";
			break;
		case 'q':
			std::cout << "Exiting\n";
			break;
		default:
			std::cout << "Invalid input!\n";
			break;
		}
		if (c == 'q')
		{
			std::cin.clear();
			std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
			break;
		}
		std::cin.clear();
		std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
	}
	return 0;
}
Kinda feel bad for you doing so much work (didn't expect you would go out and code a test with sprites and all the stuff) so thanks a lot.

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Sprites are kinda slow

Post by albinopapa » March 27th, 2020, 9:26 pm

The chili framework is capped at v-sync, so 60fps for 60hz, 75fps for 75hz. I think the framework has the refresh ratio set incorrectly though, it has 1/60 instead of 60/1, but I haven't noticed it affect anything.

As far as the mt stuff goes, it seems to be fine, though you might look into using std::async and returning values from functions. std::async returns a std::future that stores the result of a function once the function has completed. You can then do other things while the function is running and retrieve the result whenever needed. A thread calling get() on the future waits if the result isn't ready, and simply receives the value if it is. It's like using std::thread and std::condition_variable in one easy interface.
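
A minimal sketch of the std::async approach, assuming a worker shaped like cc() / dd() from the test above (it returns its result instead of writing through a pointer). `Accumulate` and `RunBoth` are hypothetical names used only for this example.

```cpp
#include <future>

// Returns its result directly, like cc()/dd() in the earlier test.
float Accumulate(float val, int iterations)
{
	for (int i = 0; i < iterations; ++i)
	{
		val += 0.5f;
	}
	return val;
}

// Run two calls concurrently and add up their results.
float RunBoth(int iterations)
{
	// std::launch::async asks for each call to run on its own thread.
	std::future<float> a = std::async(std::launch::async, Accumulate, 1.0f, iterations);
	std::future<float> b = std::async(std::launch::async, Accumulate, -1.0f, iterations);

	// ... other work could happen here while both run ...

	// get() blocks until the result is ready, then returns it.
	return a.get() + b.get();
}
```

Calling `RunBoth(4)` runs both accumulations in parallel and returns their sum; no explicit join() or shared output variable is needed.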

Don't feel bad, I know I didn't have to, but I was kind of curious myself and have been wanting to create something a little less intense, but something similar to a "bullet storm/hell" type game. I don't have the creativity it seems because my game is still on the back burner until I can come up with behavior patterns for the enemies. Vivid colors, lots of bloom ( jk ) and fast fluid gameplay are kind of stimulating.

For splitting up the update and rendering threads, there are more than a few ways to do so.

You could have the update running in its own thread, either updating all the time or sleeping for the difference between the time taken to update and the frame time.
std::this_thread::sleep_for( std::chrono::duration<float>( 1.f / 60.f - dt ) ) for instance; this way it only updates once every 16 milliseconds for 60 fps. If the update thread runs without sleeping, you'd have shorter time steps and therefore more accurate collision detection and resolution.
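
The sleeping variant could be sketched like this. `RemainingFrameTime` and `RunOneStep` are hypothetical helpers, and the 1/60 s frame length is an assumption taken from the 60 fps case; the remaining time is clamped so a slow update never produces a negative sleep.

```cpp
#include <chrono>
#include <thread>

using Seconds = std::chrono::duration<float>;

// Time left in a frame of length `frame` after an update that took `dt`.
// Never negative: a slow update simply means no sleep at all.
Seconds RemainingFrameTime(Seconds dt, Seconds frame = Seconds(1.0f / 60.0f))
{
	return dt < frame ? frame - dt : Seconds(0.0f);
}

// Run `update` once, then sleep for whatever is left of the frame.
template <typename Update>
void RunOneStep(Update&& update)
{
	const auto start = std::chrono::steady_clock::now();
	update();
	const Seconds dt = std::chrono::steady_clock::now() - start;
	std::this_thread::sleep_for(RemainingFrameTime(dt));
}
```

Keep in mind that sleep_for only guarantees a minimum: the thread may wake up later than requested, so the real update rate can end up slightly below the target.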

The Render thread would just need to query the objects needing to be rendered to know where to render.

If everything is going to be sprite based, you'd only need to create a struct with position and a pointer to each sprite needing to be drawn.

Option 1: Double buffer ) Have two buffers of this struct, one for reading and one for writing. Update the objects and, at the end of each update cycle, fill in the write buffer. Meanwhile, the render thread reads from the read buffer and renders using that information. When the update thread is done writing and the render thread signals that it is done reading, swap the buffers, so the read buffer becomes the write buffer and the write buffer becomes the read buffer.

Option 2: Task buffer ) Create a single buffer of the position and sprite pointer struct and on each update cycle push a copy of each position and sprite pointer onto the buffer. Once the update cycle is completed, you can send the buffer to the render thread. The render thread will render asynchronously while the update thread prepares the next batch. Of course, updating will be faster, so you will still have to synchronize when the update thread is done or when the render thread is done. If you synchronize when the render thread is done, you can do multiple update cycles while waiting then when signaled fill the buffer and send to render thread.

There are probably more options, but those are the two that popped in my head.
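
Option 1 could be sketched as below. Everything here is illustrative: `DrawItem`, `DoubleBuffer`, and `RunFrame` are hypothetical names, the sprite is represented by an integer id rather than a pointer, and join() stands in for the "render is done" signal. A real implementation would more likely keep a long-lived render thread and signal it, since starting a thread every frame has a cost.

```cpp
#include <thread>
#include <vector>

struct DrawItem { float x; float y; int spriteId; };  // position + sprite handle (assumed)

class DoubleBuffer
{
public:
	// Update thread only.
	std::vector<DrawItem>& WriteBuffer() { return buffers_[write_]; }
	// Render thread only.
	const std::vector<DrawItem>& ReadBuffer() const { return buffers_[1 - write_]; }
	// Call only while neither thread is touching its buffer.
	void Swap() { write_ = 1 - write_; }
private:
	std::vector<DrawItem> buffers_[2];
	int write_ = 0;
};

// One frame: render reads the read buffer while update fills the write buffer.
template <typename UpdateFn, typename RenderFn>
void RunFrame(DoubleBuffer& db, UpdateFn&& update, RenderFn&& render)
{
	std::thread renderThread([&] { render(db.ReadBuffer()); });
	db.WriteBuffer().clear();
	update(db.WriteBuffer());
	renderThread.join();  // in this sketch, join() is the "render is done" signal
	db.Swap();            // safe: both sides are finished
}
```

The two threads never touch the same vector at the same time, so no lock is needed around the buffers themselves; the only synchronization point is the swap.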

SunTroll
Posts: 22
Joined: April 19th, 2019, 8:25 pm

Re: Sprites are kinda slow

Post by SunTroll » March 28th, 2020, 12:19 am

I had something in mind, but I'm not sure how easy it would be.

From what I know, if you have something like:

Code:

bullet
{
	pos;
	vel;
}
and one thread tries to read pos while another thread is writing into it that is BAD.

But it should be fine having something like:

Code:

bullet
{
	pos;
	vel;
	renderPos;
}
and then one thread writes into pos while other is reading from renderPos.
Is that fine?

I mean then I could have something like:

Code:

Thread0
{
	GetDeltaTime();
	start Thread1;
	DrawBullets(renderPos);
	wait for Thread1; //if it by some miracle isn't done yet also stop/pause it
	CopyPosToRenderPos();
}

Thread1
{
	UpdateBullets(deltaTime);
}
I'm not aware how exactly threads work. That little experiment in previous post is all I have done with them, so take the "code" above as a very rough example of what I was hoping I could do.

Could also run the draw in thread 1 and the rest in thread 0, but since draw interacts with different parts of the chili framework I think that may be a bit harder.

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Sprites are kinda slow

Post by albinopapa » March 28th, 2020, 6:53 am

Different threads can use the same functions and read from the same variable just fine. It is, as you say, a bad thing to read and write the same variable from different threads at the same time. What you have should be fine.
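
The pos / renderPos scheme from the previous post could be sketched like this. The types and function names are hypothetical, and `DrawBullets` only records the positions it would draw instead of calling into the framework.

```cpp
#include <thread>
#include <vector>

struct Vec2 { float x; float y; };
struct Bullet { Vec2 pos; Vec2 vel; Vec2 renderPos; };

// Update thread: writes pos only.
void UpdateBullets(std::vector<Bullet>& bullets, float dt)
{
	for (Bullet& b : bullets)
	{
		b.pos.x += b.vel.x * dt;
		b.pos.y += b.vel.y * dt;
	}
}

// Stand-in for drawing: reads renderPos only and records what it "drew".
void DrawBullets(const std::vector<Bullet>& bullets, std::vector<Vec2>& drawn)
{
	drawn.clear();
	for (const Bullet& b : bullets)
	{
		drawn.push_back(b.renderPos);
	}
}

// One frame: update and draw overlap, then positions are copied over.
void Frame(std::vector<Bullet>& bullets, float dt, std::vector<Vec2>& drawn)
{
	std::thread updater([&bullets, dt] { UpdateBullets(bullets, dt); });
	DrawBullets(bullets, drawn);   // touches renderPos while the updater touches pos
	updater.join();                // wait for the update to finish
	for (Bullet& b : bullets)
	{
		b.renderPos = b.pos;       // single-threaded again, so this copy is safe
	}
}
```

Since pos and renderPos are separate members, the two threads never access the same memory location at the same time. They do share cache lines, which can cost some speed (false sharing) but is not a correctness problem.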
