Noob learns to code in 3 months

The Partridge Family were neither partridges nor a family. Discuss.
LuisR14
Posts: 1248
Joined: May 23rd, 2013, 3:52 pm
Location: USA

Re: Noob learns to code in 3 months

Post by LuisR14 » December 25th, 2017, 9:31 am

hahaha xD
always available, always on, about 10 years of c/c++, java[script], win32/directx api, [x]html/css/php/some asp/sql experience (all self-taught).
Knows English, Spanish and Japanese.
irc://irc.freenode.net/#pchili (alt: irc://luisr14.no-ip.org/#pchili) -- join up if you ever want real-time help or just to chat :mrgreen: --

Yumtard
Posts: 575
Joined: January 19th, 2017, 10:28 pm
Location: Idiot from northern Europe

Re: Noob learns to code in 3 months

Post by Yumtard » March 5th, 2018, 2:16 pm

Hey guys! I'm still alive.

Haven't played around in the chili framework in quite some time but felt the itch this morning.
Made a 2D particle system. I've never done anything like it before and just started coding, so I'm sure there's tons to improve. If anyone wants to play around with it or make it better, I put it on GitHub:

https://github.com/Yumtard/2D-particlesystem

Yumtard
Posts: 575
Joined: January 19th, 2017, 10:28 pm
Location: Idiot from northern Europe

Re: Noob learns to code in 3 months

Post by Yumtard » March 5th, 2018, 3:04 pm

Very fun to play around with the variables.

ParticleSystemData psData;
psData.birthColor = Colors::Yellow;
psData.deathColor = Colors::Red;
psData.gravity = Vec2D(0.0f, -1.0f);
psData.lifeTime = 2.0f;
psData.maxVel = Vec2D(20.0f, 5.0f);
psData.minVel = Vec2D(-20.0f, -60.0f);
psData.position = Vec2D(400.0f, 300.0f);
psData.shape = Shapes::SHAPE_CIRCLE;
psData.minSize = 2.0f;
psData.maxSize = 4.0f;
psData.spawnRate = 100;
ps = ParticleSystem2D(psData);

These settings make it look almost like fire :)

chili
Site Admin
Posts: 3948
Joined: December 31st, 2011, 4:53 pm
Location: Japan

Re: Noob learns to code in 3 months

Post by chili » March 11th, 2018, 3:16 pm

Imma check this one out :)

Glad to see you around Yum.
Chili

Yumtard
Posts: 575
Joined: January 19th, 2017, 10:28 pm
Location: Idiot from northern Europe

Re: Noob learns to code in 3 months

Post by Yumtard » March 11th, 2018, 8:06 pm

Thanks :)
If you do check it out I'd be happy to hear how I can improve performance :D

I tried to look into it, and I think what costs the most is drawing the particles. If I increase the size of the particles, the program slows down quite a bit.
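A rough way to see why bigger particles hurt so much: drawing does one GetPixel, one Blend and one PutPixel per covered pixel, and a filled circle covers roughly πr² pixels. The sketch below is a hypothetical helper (CountCirclePixels is not part of the repo) that counts covered pixels with the same squared-distance test the DrawCircle code in this thread uses.

```cpp
#include <cassert>

// Counts the pixels a filled circle of integer radius r covers, using the
// squared-distance test from the circle drawing loop (ix*ix + iy*iy <= r*r).
// Every counted pixel costs one read, one blend and one write when drawn.
int CountCirclePixels(int r)
{
    assert(r >= 0);
    const int rSq = r * r;
    int count = 0;
    for (int iy = -r; iy <= r; ++iy)
    {
        for (int ix = -r; ix <= r; ++ix)
        {
            if (ix * ix + iy * iy <= rSq)
            {
                ++count;
            }
        }
    }
    return count;
}
```

The count grows with r², so going from a radius of 10 to a radius of 50 means about 25 times as many pixels per particle, which is why draw time climbs much faster than particle size.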

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Noob learns to code in 3 months

Post by albinopapa » March 12th, 2018, 9:29 am

Here are the settings I tested with:

Code:

	ParticleSystemData psData;
	psData.birthColor = Colors::Yellow;
	psData.deathColor = Colors::Red;
	psData.gravity = Vec2D(0.0f, -1.0f);
	psData.lifeTime = 2.0f;
	psData.maxVel = Vec2D(20.0f, 5.0f);
	psData.minVel = Vec2D(-20.0f, -60.0f);
	psData.position = Vec2D(400.0f, 300.0f);
	psData.shape = Shapes::SHAPE_CIRCLE;
	psData.minSize = 10.0f;
	psData.maxSize = 20.0f;
	psData.spawnRate = 10;
	psData.frameRate = 60;
For the UpdateModel function:

Code:

Min frame time: 0.000002   // 2 µs
Max frame time: 0.000130  // 130 µs
Avg frame time: 0.000043  // 43 µs
For the ComposeFrame function:

Code:

Min frame time: 0.000041   // 41 µs
Max frame time: 0.002345  // 2.3 ms
Avg frame time: 0.001693  // 1.69 ms
After a little tweaking of the update and draw functions:
For the UpdateModel function:

Code:

Min frame time: 0.000003   // 3 µs
Max frame time: 0.000099  // 99 µs
Avg frame time: 0.000026  // 26 µs
For the ComposeFrame function:

Code:

Min frame time: 0.000027   // 27 µs
Max frame time: 0.001274  // 1.27 ms
Avg frame time: 0.000856  // 856 µs

For the particles' update function, instead of looping from the end to the beginning and removing particles as you go, I advanced them all first and then removed the dead ones with std::remove_if:

Code:

// Particle::Advance
void Particle::Advance(float dt)
{
	m_Data.position += m_Data.velocity * dt;
	m_Data.lifeTime -= dt;

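	m_BlendFactor += dt / m_TotalLifeTime;
	// Clamp to 1 first: converting a float of 256 or more to unsigned char is undefined behavior
	const auto blendFactor = static_cast< unsigned char >( std::min( m_BlendFactor, 1.f ) * 255.f );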

	m_Color = m_Data.birthColor.Blend( m_Data.deathColor, blendFactor );

	m_Dead = 
		m_Data.position.x < 0 || m_Data.position.x > Graphics::ScreenWidth ||
		m_Data.position.y < 0 || m_Data.position.y > Graphics::ScreenHeight ||
		m_Data.lifeTime <= 0.0f;
}

Code:

// ParticleSystem2D::Advance
void ParticleSystem2D::Advance(float dt)
{
	m_SpawnTimer += dt;

	if (m_SpawnTimer >= (1.0f / (float)m_Data.frameRate))
	{
		m_SpawnTimer = 0.0f;
		Spawn();
	}

	for( auto& p : m_Particles )
	{
		p.Advance( dt );
		p.AddVelocity( m_Data.gravity );
	}

	m_Particles.erase(
		std::remove_if( m_Particles.begin(), m_Particles.end(),
			[]( const Particle& p )
	{
		return p.IsDead();
	} ), m_Particles.end() );
}
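The erase-remove pattern used above, shown on its own with a hypothetical bare-bones Particle (just a lifetime), next to a backwards loop that erases as it goes. std::remove_if compacts the survivors in one pass, while erasing inside the loop shifts the rest of the vector on every single removal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal particle: only what the removal step needs.
struct Particle
{
    float lifeTime = 0.0f;
    bool IsDead() const { return lifeTime <= 0.0f; }
};

// Erase-remove idiom: remove_if moves the live particles to the front in one
// pass (keeping their order) and returns the new logical end; erase then drops
// the dead tail in a single call.
void RemoveDead(std::vector<Particle>& particles)
{
    particles.erase(
        std::remove_if(particles.begin(), particles.end(),
                       [](const Particle& p) { return p.IsDead(); }),
        particles.end());
}

// For comparison: walk backwards and erase one element at a time.
// Each erase shifts every later element, so many deaths per frame get expensive.
void RemoveDeadBackwards(std::vector<Particle>& particles)
{
    for (std::size_t i = particles.size(); i-- > 0;)
    {
        if (particles[i].IsDead())
        {
            particles.erase(particles.begin() + static_cast<std::ptrdiff_t>(i));
        }
    }
}
```

Both leave the same survivors in the same order; the difference is only in how much copying happens along the way.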
For the draw functions, I added an early return when nothing would be drawn, calculated the start and end of the drawing loops up front, and converted everything I could to integers.

Code:

void Graphics::DrawSquare(const Vec2D & pos, float size, Color color, float blendFactor)
{
	// ( if right is less_equal to 0 or left is greater_equal to screen right or
	//   if top is less_equal to 0 or bottom is greater_equal to screen bottom )
	// Early out 
	if( pos.x + size <= 0.f || pos.x >= ScreenWidth ||
		pos.y + size <= 0.f || pos.y >= ScreenHeight )
	{
		return;
	}

	// If any part on screen, calculate the visible offsets relative to pos
	const auto xStart = static_cast< int >( std::max( -pos.x, 0.f ) );
	const auto xEnd = static_cast< int >( std::min( ScreenWidth - pos.x, size ) );
	const auto yStart = static_cast< int >( std::max( -pos.y, 0.f ) );
	const auto yEnd = static_cast< int >( std::min( ScreenHeight - pos.y, size ) );

	// Casting once per call instead of per loop iteration
	const auto _x = static_cast< int >( pos.x );
	const auto _y = static_cast< int >( pos.y );
	const auto _blendFactor = static_cast< unsigned char >( blendFactor * 255.f );

	// Loop from position + start offset to position + end offset (both clipped to the screen)
	for( int y = _y + yStart; y < _y + yEnd; ++y )
	{
		for( int x = _x + xStart; x < _x + xEnd; ++x )
		{
			const Color dstPixel = GetPixel( x, y );
			const Color blendedPixel = color.Blend( dstPixel, _blendFactor );
			PutPixel( x, y, blendedPixel );
		}
	}
}

void Graphics::DrawCircle(const Vec2D & pos, float radius, Color color, float blendFactor)
{
	// ( if right is less_equal to 0 or left is greater_equal to screen right or
	//   if top is less_equal to 0 or bottom is greater_equal to screen bottom )
	// Early out 
	if( pos.x + radius <= 0.f || pos.x - radius >= ScreenWidth ||
		pos.y + radius <= 0.f || pos.y - radius >= ScreenHeight )
	{
		return;
	}

	// If any part on screen, calculate offsets
	const auto xStart = static_cast< int >( std::max( -pos.x, -radius ) );
	const auto xEnd = static_cast< int >( std::min( ScreenWidth - pos.x, radius ) );
	const auto yStart = static_cast< int >( std::max( -pos.y, -radius ) );
	const auto yEnd = static_cast< int >( std::min( ScreenHeight - pos.y, radius ) );

	// Casting once per call instead of per loop iteration
	const auto _x = static_cast< int >( pos.x );
	const auto _y = static_cast< int >( pos.y );
	const auto _size = static_cast< int >( radius );
	const auto radiSq = static_cast< int >( radius * radius );
	const auto _blendFactor = static_cast< unsigned char >( blendFactor * 255.f );

	// Loop from offset to size + offset
	for( int iy = yStart; iy < yEnd; ++iy )
	{
		for( int ix = xStart; ix < xEnd; ++ix )
		{
			const auto sqDist = ( ix * ix ) + ( iy * iy );

			if( sqDist <= radiSq )
			{
				const auto x = ix + _x;
				const auto y = iy + _y;

				const Color dstPixel = GetPixel( x, y );
				const Color blendedPixel = color.Blend( dstPixel, _blendFactor );

				PutPixel( x, y, blendedPixel );
			}
		}
	}
}
This means fewer condition checks and float-to-int conversions inside the draw loops.
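The start/end calculation boils down to clipping a half-open range on each axis. A minimal sketch on integer coordinates, with a hypothetical Span/ClipSpan helper (the post works with floats and casts afterwards):

```cpp
#include <algorithm>
#include <cassert>

// Half-open range of offsets [begin, end) relative to the shape's position.
struct Span
{
    int begin;
    int end;
};

// Visible offsets along one axis for a shape starting at `pos` and extending
// `extent` pixels, on a screen `screenSize` pixels wide (or tall).
Span ClipSpan(int pos, int extent, int screenSize)
{
    assert(extent >= 0 && screenSize > 0);
    const int begin = std::max(-pos, 0);                 // skip the part left of 0
    const int end = std::min(screenSize - pos, extent);  // stop at the screen edge
    return { begin, std::max(begin, end) };              // empty if fully off screen
}
```

The draw loop then runs x from pos + begin to pos + end, so no pixel inside the loop needs its own bounds check.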
I added the .Blend function just to clean up the code.
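The Blend function itself isn't shown. Judging from the SSE_ColorBlend lambda later in the thread, it mixes the two colors per channel as (this * (255 - f) + other * f) >> 8. The sketch below assumes exactly that, with a hypothetical PackedColor standing in for the framework's Color (assumed layout: 0xAARRGGBB in one 32-bit dword).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the framework's Color: one 32-bit dword, 0xAARRGGBB.
struct PackedColor
{
    std::uint32_t dword;

    // Moves each channel from *this toward `other` by f/255:
    // (this * (255 - f) + other * f) >> 8. The >> 8 is a cheap stand-in for / 255,
    // so a channel at full weight comes out as 254 instead of 255.
    PackedColor Blend(PackedColor other, unsigned char f) const
    {
        const std::uint32_t inv = 255u - f;
        std::uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8)
        {
            const std::uint32_t a = (dword >> shift) & 0xFFu;
            const std::uint32_t b = (other.dword >> shift) & 0xFFu;
            out |= ((a * inv + b * f) >> 8) << shift;  // result always fits in 8 bits
        }
        return PackedColor{ out };
    }
};
```

With f growing from 0 to 255 over a particle's life, birthColor.Blend(deathColor, f) walks from the birth color to the death color, which matches how Particle::Advance uses it.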

Now, making the particles bigger (minSize = 50, maxSize = 100) and keeping everything else the same:

Code:

// UpdateModel yours
Min frame time: 0.000026   // 26 µs
Max frame time: 0.000246  // 246 µs
Avg frame time: 0.000065  // 65 µs

// ComposeFrame yours
Min frame time: 0.031235   // 31.2 ms
Max frame time: 0.037946  // 37.9 ms
Avg frame time: 0.033004  // 33.0 ms

Code:

// UpdateModel after tweaks
Min frame time: 0.000024   // 24 µs
Max frame time: 0.000153  // 153 µs
Avg frame time: 0.000038  // 38 µs

// ComposeFrame after tweaks
Min frame time: 0.022510   // 22.5 ms
Max frame time: 0.033995  // 33.9 ms
Avg frame time: 0.023709  // 23.7 ms
The Advance functions are already so fast that I don't think even SSE/AVX instructions would gain you much; in this case they might even slow things down. The draw functions, however, could probably benefit if you process four pixels at a time.
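Why four: a 128-bit SSE register holds four 32-bit pixels. A minimal SSE2-only sketch of blending one source color over a row four pixels at a time, assuming the same per-channel formula as the scalar blend ((src * (255 - f) + dst * f) >> 8), a pixel count that is a multiple of 4, and an x86/x86-64 target. BlendRowSse2 is a hypothetical name, not something from the repo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics (x86 / x86-64 only)

// Blends `src` over `count` packed 32-bit pixels, four per iteration.
// Per channel: (src * (255 - f) + dst * f) >> 8.
void BlendRowSse2(std::uint32_t* dst, std::size_t count, std::uint32_t src, std::uint8_t f)
{
    assert(count % 4 == 0);
    const __m128i zero = _mm_setzero_si128();
    const __m128i srcPx = _mm_set1_epi32(static_cast<int>(src));
    // Widen 8-bit channels to 16 bits so the products have room
    const __m128i srcLo = _mm_unpacklo_epi8(srcPx, zero);
    const __m128i srcHi = _mm_unpackhi_epi8(srcPx, zero);
    const __m128i mf = _mm_set1_epi16(static_cast<short>(f));
    const __m128i minv = _mm_set1_epi16(static_cast<short>(255 - f));

    for (std::size_t i = 0; i < count; i += 4)
    {
        __m128i* p = reinterpret_cast<__m128i*>(dst + i);
        const __m128i d = _mm_loadu_si128(p); // unaligned load of 4 pixels
        const __m128i dLo = _mm_unpacklo_epi8(d, zero);
        const __m128i dHi = _mm_unpackhi_epi8(d, zero);
        // Sums stay <= 255 * 255 = 65025, so 16-bit lanes cannot overflow
        const __m128i lo = _mm_srli_epi16(
            _mm_add_epi16(_mm_mullo_epi16(srcLo, minv), _mm_mullo_epi16(dLo, mf)), 8);
        const __m128i hi = _mm_srli_epi16(
            _mm_add_epi16(_mm_mullo_epi16(srcHi, minv), _mm_mullo_epi16(dHi, mf)), 8);
        _mm_storeu_si128(p, _mm_packus_epi16(lo, hi)); // narrow back to 8-bit channels
    }
}
```

Unaligned loads keep the sketch simple; the version later in the thread uses aligned loads, which is why it has to start and end each row on a 4-pixel boundary.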

Just for comparison, I commented out the blend code and just passed on the color parameter:


Non-transparent times
Min frame time: 0.000326 // 326 µs
Max frame time: 0.000695 // 695 µs
Avg frame time: 0.000372 // 372 µs


Transparent times
Min frame time: 0.000027 // 27 µs
Max frame time: 0.001274 // 1.27 ms
Avg frame time: 0.000856 // 856 µs


Doing the blending while drawing makes it about 2.3 times slower (856 µs vs. 372 µs average).

I'm working on an SSE version, and so far the average frame time is about 500 µs, so only about 70% faster. It doesn't display correctly yet, though, so I still have some work to do before I share the code :)
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Noob learns to code in 3 months

Post by albinopapa » March 13th, 2018, 3:07 am

OK, here's my SSE result with minSize = 10, maxSize = 20:
Min frame time: 0.000363  // 363 µs
Max frame time: 0.000819  // 819 µs
Avg frame time: 0.000434  // 434 µs

And here's the result with minSize = 50, maxSize = 100:

Code:

// ComposeFrame yours
Min frame time: 0.031235   // 31.2 ms
Max frame time: 0.037946  // 37.9 ms
Avg frame time: 0.033004  // 33.0 ms

// ComposeFrame after tweaks
Min frame time: 0.022510   // 22.5 ms
Max frame time: 0.033995  // 33.9 ms
Avg frame time: 0.023709  // 23.7 ms

// ComposeFrame with SSE
Min frame time: 0.005853   // 5.8 ms
Max frame time: 0.008483  // 8.4 ms
Avg frame time: 0.006656  // 6.6 ms

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Noob learns to code in 3 months

Post by albinopapa » March 13th, 2018, 3:14 am

The code is all in the DrawCircle function; I didn't modify DrawSquare for SSE. Here it is if you want to take a look. It's kind of messy... OK, really messy.

Code:

#include <intrin.h>
void Graphics::DrawCircle(const Vec2D & pos, float radius, Color color, float blendFactor)
{
	// ( if right is less_equal to 0 or left is greater_equal to screen right or
	//   if top is less_equal to 0 or bottom is greater_equal to screen bottom )
	// Early out 
	if( pos.x + radius <= 0.f || pos.x - radius >= ScreenWidth ||
		pos.y + radius <= 0.f || pos.y - radius >= ScreenHeight )
	{
		return;
	}

	auto CalculateAlignedBoundary = []( int X )
	{
		return X & ( ~3 );
	};
	auto CalculateGraphicsBoundary = 
		[]( const Bounds& Src, const Bounds& Clip)
	{
		return Bounds{
			std::max( -Src.left, Clip.left ),
			std::max( -Src.top, Clip.top ),
			std::min( Clip.right - Src.left, Src.right - Src.left ),
			std::min( Clip.bottom - Src.top, Src.bottom - Src.top )
		};
	};

	struct Unpacked_8_m128i_16
	{
		Unpacked_8_m128i_16( __m128i Value )
			:
			lo( _mm_unpacklo_epi8( Value, _mm_setzero_si128() ) ),
			hi( _mm_unpackhi_epi8( Value, _mm_setzero_si128() ) )
		{}
		Unpacked_8_m128i_16( __m128i lo, __m128i hi )
			:
			lo( lo ), hi( hi )
		{}
		Unpacked_8_m128i_16 operator*( __m128i other )const
		{
			return {
				_mm_mullo_epi16( lo, other ),
				_mm_mullo_epi16( hi, other )
			};
		}
		Unpacked_8_m128i_16 operator+( Unpacked_8_m128i_16 other )const
		{
			return {
				_mm_add_epi16( lo,other.lo ),
				_mm_add_epi16( hi,other.hi )
			};
		}
		Unpacked_8_m128i_16 operator>>( const int Imm8 )const
		{
			return {
				_mm_srli_epi16( lo, Imm8 ),
				_mm_srli_epi16( hi, Imm8 )
			};
		}
		__m128i Pack()const
		{
			return _mm_packus_epi16( lo, hi );
		}

		__m128i lo, hi;
	};
	auto SSE_ColorBlend = []( __m128i SrcColor, __m128i DstColor, 
		__m128i BlendFactor, __m128i InvBlendFactor )
	{
		auto result = ( 
			( Unpacked_8_m128i_16( SrcColor ) * InvBlendFactor ) + 
			( Unpacked_8_m128i_16( DstColor ) * BlendFactor ) ) >> 8;
		return result.Pack();
	};
	auto SSE_IsInCircle = []( const int IX, const int IY, const __m128i RadiusSq )
	{
		// Load index to index + 3 into SSE register
		const auto mX = _mm_setr_epi32( IX, IX + 1, IX + 2, IX + 3 );
		const auto mY = _mm_set1_epi32( IY );
		const auto mxSq = _mm_mullo_epi32( mX, mX );
		const auto mySq = _mm_mullo_epi32( mY, mY );
		const auto mDelta = _mm_add_epi32( mxSq, mySq );

		// Get mask of pixels within circumference
		const auto inRange = _mm_cmplt_epi32( mDelta, RadiusSq );
		return inRange;
	};
	auto SSE_IfElseBlend = []( const __m128i ifTrue, const __m128i ifFalse, const __m128i Mask )
	{
		const auto use_true = _mm_and_si128( Mask, ifTrue );
		const auto use_false = _mm_andnot_si128( Mask, ifFalse );
		const auto blended = _mm_or_si128( use_true, use_false );

		return blended;
	};
	auto SSE_BlendPixels = 
		[this, SSE_ColorBlend, SSE_IsInCircle, SSE_IfElseBlend ](
			const int PosX, const int PosY, const int IY,
			const int xStart, const int xEnd, 
			const __m128i RadiusSq, const __m128i Src, 
			const __m128i BlendFactor, const __m128i InvBlendFactor)
	{
		for( int ix = xStart; ix < xEnd; ix += 4 )
		{
			const auto inRange = SSE_IsInCircle( ix, IY, RadiusSq );

			// If not inside circle, continue
			if( _mm_movemask_epi8( inRange ) == 0 ) continue;

			const auto index = ( PosX + ix ) + ( ( PosY + IY ) * ScreenWidth );
			
			auto* bg = reinterpret_cast< __m128i* >( &pSysBuffer[ index ] );
			const auto dst = _mm_load_si128( bg );


			// Else, do color blending
			auto result = SSE_ColorBlend( Src, dst, BlendFactor, InvBlendFactor );

			// Use inRange mask to determine which pixels will be 
			// background color or blended color
			result = SSE_IfElseBlend( result, dst, inRange );

			_mm_store_si128( bg, result );
		}
	};
	auto x86_BlendPisels = 
		[ this, color ]( const int PosX, const int PosY, const int IY,
			const int xStart, const int xEnd, const int RadiusSq, 
			const int BlendFactor, const int InvBlendFactor )
	{
		for( int ix = xStart; ix < xEnd; ++ix )
		{
			if((ix * ix)+(IY * IY) < RadiusSq )
			{
				const auto dst = GetPixel( PosX + ix, PosY + IY );

				PutPixel( PosX + ix, PosY + IY, 
					color.Blend( dst, static_cast< unsigned char >( BlendFactor ) ) );
			}
		}
	};


	// Casting once per call instead of per loop iteration
	const auto _x = static_cast< int >( pos.x );
	const auto _y = static_cast< int >( pos.y );
	const auto _size = static_cast< int >( radius );
	const auto radSq = static_cast< int >( radius * radius );
	const auto _blendFactor = static_cast< unsigned char >( blendFactor * 255.f );

	// If any part on screen, calculate offsets
	const auto bounds = CalculateGraphicsBoundary(
		{ ( _x - _size ), ( _y - _size ), ( _x + _size ), ( _y + _size ) },
		{ 0,0,ScreenWidth,ScreenHeight } );

	// Preload SSE register with color, radius and _blendFactor
	const __m128i mColor = _mm_set1_epi32( color.dword );
	const __m128i mRadSq = _mm_set1_epi32( radSq );
	const __m128i mBlendFactor = _mm_set1_epi16( _blendFactor );
	const __m128i mInvBlendFactor = _mm_sub_epi16( _mm_set1_epi16( 255 ), mBlendFactor );

	// SSE registers are 16 bytes (4 pixels) wide, so the SSE loop must start and end on a 4-pixel (16-byte) boundary
	// The x86 version will pick up the rest at the end of each row
	const auto sse_xStart = CalculateAlignedBoundary( _x + ( bounds.left - _size) );
	const auto sse_xEnd = CalculateAlignedBoundary( _x + ( bounds.right - _size ) );
	const auto x86_xStart = sse_xEnd;
	const auto x86_xEnd = _x + bounds.right - _size;
	
	for( int iy = bounds.top - _size; iy < bounds.bottom - _size; ++iy )
	{
		SSE_BlendPixels( _x, _y, iy, sse_xStart - _x, sse_xEnd - _x, mRadSq, mColor, mBlendFactor, mInvBlendFactor );
		x86_BlendPisels( _x, _y, iy, x86_xStart - _x, x86_xEnd - _x, radSq, _blendFactor, ( 255 - _blendFactor ) );
	}
}
It's kind of hard to believe that with so many more instructions it runs about 3.5 times faster than the tweaked version for the larger circles (about 5 times faster than the original) and nearly 2 times faster for the smaller ones.

Some of the lambda functions could be moved out to be free functions so they can be reused for other drawing methods.
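For example, the alignment math could become small free functions that any SSE drawing routine reuses. A sketch with hypothetical names (AlignDown4, RowSplit, SplitRow) that mirrors what CalculateAlignedBoundary and the sse_/x86_ start and end values compute: an SSE range made of whole 4-pixel blocks, plus a scalar tail for the leftover pixels.

```cpp
#include <algorithm>
#include <cassert>

// Round a pixel index down to a multiple of 4 (16 bytes of 32-bit pixels).
constexpr int AlignDown4(int x)
{
    return x & ~3;
}

struct RowSplit
{
    int sseBegin, sseEnd;       // processed 4 pixels at a time
    int scalarBegin, scalarEnd; // leftover pixels, one at a time
};

// Splits the on-screen row [begin, end) into an aligned SSE part and a scalar tail.
// Rounding down can make both parts start a few pixels before `begin`; those pixels
// are on the same row but outside the shape, so the per-pixel mask must reject them.
RowSplit SplitRow(int begin, int end)
{
    assert(begin >= 0 && end >= begin);
    const int sseBegin = AlignDown4(begin);
    const int sseEnd = std::max(sseBegin, AlignDown4(end));
    return { sseBegin, sseEnd, sseEnd, end };
}
```

A DrawSquare SSE version could then call the same SplitRow per row instead of repeating the lambda logic.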

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Teehee

Post by albinopapa » March 13th, 2018, 8:40 am

Having a little menacing fun with the particles

[image]

Yumtard
Posts: 575
Joined: January 19th, 2017, 10:28 pm
Location: Idiot from northern Europe

Re: Noob learns to code in 3 months

Post by Yumtard » March 13th, 2018, 8:39 pm

Wow thanks albino! This was a great read. Glad to see you managed to have some fun with the system and hope it gave you a nice little challenge since you were lacking those :)

Do you mind telling me how you're doing the speed tests?
I need to look into this SSE thingy. I'll watch chili's videos on the topic as soon as I've caught up on Intermediate.
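One simple way to collect min/max/average numbers like the ones above (not necessarily how albinopapa measured them) is to time each call with std::chrono::steady_clock and keep running statistics. FrameStats and TimeCall are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Running min / max / average of durations in seconds, fed once per frame.
class FrameStats
{
public:
    void Add(double seconds)
    {
        assert(seconds >= 0.0);
        minTime = std::min(minTime, seconds);
        maxTime = std::max(maxTime, seconds);
        total += seconds;
        ++count;
    }
    double Min() const { return minTime; } // numeric_limits max until the first Add
    double Max() const { return maxTime; }
    double Avg() const { return count ? total / count : 0.0; }

private:
    double minTime = std::numeric_limits<double>::max();
    double maxTime = 0.0;
    double total = 0.0;
    long long count = 0;
};

// Times one call of f with a monotonic clock and records it in stats.
template <typename F>
void TimeCall(FrameStats& stats, F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    stats.Add(elapsed.count());
}
```

In the framework this could hypothetically wrap the calls in Game::Go, e.g. TimeCall(composeStats, [this] { ComposeFrame(); });, with Min(), Max() and Avg() printed on exit.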
