Coding Challenge 5 - Meme Graveyard Keeper

Post by **thesmallcreeper** » November 7th, 2018, 3:03 am

albinopapa wrote: I've been looking (not extensively) for this info on AMD chips. Where did you find it? Intel has the Intrinsics Guide, which covers a lot of this, but I haven't found anything similar for AMD.

There you go, dude:

https://www.agner.org/optimize/instruction_tables.pdf

albinopapa wrote: I personally never got any speed increase from loop unrolling. In most of my trials, the compiler had usually unrolled the loops already, there wasn't enough work to do between the loads and stores, or the loop was probably already memory-bandwidth limited. The prefetcher pulls in 64 bytes of an array at a time (one cache line, i.e. four 128-bit SSE registers' worth), so the next four iterations of a single-register SSE loop are already in cache when you process arrays. That is probably why unrolling usually doesn't do much of anything.

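I use an instruction (VPCMPGTQ) that on Haswell has a latency of 5 clocks and a throughput of 1 per clock, so by "unrolling" the AVX loop I can (in theory) complete 2 independent VPCMPGTQ instructions in 6 clocks: the second one starts one clock after the first and finishes at clock 6, instead of clock 10 if it had to wait for the first to finish.

...thus the speedup.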
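A minimal sketch of that kind of unrolling, assuming GCC or Clang on x86-64 and a made-up task (counting `int64_t` values above a threshold; the thread doesn't show the actual challenge code). The two `_mm256_cmpgt_epi64` calls (VPCMPGTQ) in each iteration don't depend on each other and each feeds its own accumulator, so the CPU is free to overlap them. The function names are hypothetical, and whether this beats the plain loop depends on whether that loop was latency-bound in the first place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__GNUC__) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_AVX2_SKETCH 1
#endif

// Scalar reference and tail handler: counts elements strictly greater than threshold.
inline std::size_t count_greater_scalar(const std::int64_t* data, std::size_t n,
                                        std::int64_t threshold) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        count += data[i] > threshold ? 1 : 0;
    return count;
}

#ifdef HAVE_AVX2_SKETCH
// Unrolled by two: each iteration issues two VPCMPGTQ (_mm256_cmpgt_epi64) that
// do not depend on each other, feeding two separate accumulators.
__attribute__((target("avx2")))
std::size_t count_greater_avx2(const std::int64_t* data, std::size_t n,
                               std::int64_t threshold) {
    const __m256i t = _mm256_set1_epi64x(threshold);
    __m256i acc0 = _mm256_setzero_si256();
    __m256i acc1 = _mm256_setzero_si256();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        const __m256i a = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
        const __m256i b = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i + 4));
        // A lane that compares true is all ones (-1), so subtracting the mask adds 1.
        acc0 = _mm256_sub_epi64(acc0, _mm256_cmpgt_epi64(a, t));
        acc1 = _mm256_sub_epi64(acc1, _mm256_cmpgt_epi64(b, t));
    }
    alignas(32) std::int64_t lanes[4];
    _mm256_store_si256(reinterpret_cast<__m256i*>(lanes), _mm256_add_epi64(acc0, acc1));
    const std::size_t vec_count =
        static_cast<std::size_t>(lanes[0] + lanes[1] + lanes[2] + lanes[3]);
    // The last n % 8 elements go through the scalar loop.
    return vec_count + count_greater_scalar(data + i, n - i, threshold);
}
#endif

// Dispatcher: takes the AVX2 path only when the CPU reports AVX2 support.
inline std::size_t count_greater(const std::int64_t* data, std::size_t n,
                                 std::int64_t threshold) {
    assert(data != nullptr || n == 0);
#ifdef HAVE_AVX2_SKETCH
    if (__builtin_cpu_supports("avx2"))
        return count_greater_avx2(data, n, threshold);
#endif
    return count_greater_scalar(data, n, threshold);
}
```

The scalar function doubles as the tail handler and as the fallback on CPUs without AVX2, so the result doesn't depend on which path runs.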

Post by **albinopapa** » November 7th, 2018, 4:48 am

Good luck getting the compiler to listen to instruction order.

Although I have noticed at times that loading data in a different order changes the speed as well. Interleaving loads and calculations hasn't worked out for me in the past. Doing it the C way (load a bunch, do the calculations, then store) seems to work best in my experience, probably because by the time I get to the calculation portion, the first loads I need are done, which prevents a stall. That's just speculation, of course; I have no idea how sensitive all this shit is. I only know what I've experienced, and even then, things seem to change right under my nose sometimes.

It could also be that my memory fails me, which is definitely possible.
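For reference, a sketch of the "load a bunch, do the calculations, then store" ordering described above, using SSE on a hypothetical `out[i] = in[i] * scale + bias` kernel (`scale_bias` is an illustrative name). It assumes a compiler that defines `__SSE2__` (GCC and Clang do on x86-64) and otherwise falls back to plain scalar code. As the previous post points out, the compiler is free to reorder these instructions, so the source order is only a hint, not a guaranteed schedule.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical kernel: out[i] = in[i] * scale + bias.
void scale_bias(const float* in, float* out, std::size_t n, float scale, float bias) {
    assert((in != nullptr && out != nullptr) || n == 0);
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128 s = _mm_set1_ps(scale);
    const __m128 b = _mm_set1_ps(bias);
    // 16 floats = 64 bytes, the cache-line size on typical x86 CPUs.
    for (; i + 16 <= n; i += 16) {
        // 1) Load a bunch: four independent 128-bit loads.
        __m128 x0 = _mm_loadu_ps(in + i);
        __m128 x1 = _mm_loadu_ps(in + i + 4);
        __m128 x2 = _mm_loadu_ps(in + i + 8);
        __m128 x3 = _mm_loadu_ps(in + i + 12);
        // 2) Do the calculations after all four loads are issued.
        x0 = _mm_add_ps(_mm_mul_ps(x0, s), b);
        x1 = _mm_add_ps(_mm_mul_ps(x1, s), b);
        x2 = _mm_add_ps(_mm_mul_ps(x2, s), b);
        x3 = _mm_add_ps(_mm_mul_ps(x3, s), b);
        // 3) Store the results.
        _mm_storeu_ps(out + i, x0);
        _mm_storeu_ps(out + i + 4, x1);
        _mm_storeu_ps(out + i + 8, x2);
        _mm_storeu_ps(out + i + 12, x3);
    }
#endif
    // Scalar tail (or the whole array when SSE2 isn't available).
    for (; i < n; ++i)
        out[i] = in[i] * scale + bias;
}
```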

Thanks for the info by the way.

Post by **thesmallcreeper** » November 8th, 2018, 10:45 pm

Looks like I had a bad approach to the challenge so far ://
I don't have much time left at my desk, and I want to work on something else, so maybe a white flag from me...

glhf to everyone still pushing their code

Post by **cyboryxmen** » November 14th, 2018, 11:40 am

Wait, was the challenge over already?

That's too bad. I just finished a SIMD sorting algorithm that could complete it in 40 ms single-threaded, too. Guess I'll just show everyone the algorithm once my new parallel computing language is done.

Post by **chili** » November 15th, 2018, 6:31 am

Yeah, a SIMD sort is the only really effective way of optimizing this one past the standard implementation plus an execution policy. No way I was gonna do that, though.
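cyboryxmen's algorithm isn't shown in the thread, but SIMD sorts commonly start by running a small sorting network on several vectors at once, using vector min/max as the compare-and-swap, and then transpose and merge the results. The sketch below shows only that first step for a 4x4 block of `int32_t`, assuming GCC or Clang on x86-64 (SSE4.1 provides `_mm_min_epi32`/`_mm_max_epi32`); all names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

#if defined(__GNUC__) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_SSE41_SKETCH 1
#endif

// A 4x4 block of int32, row-major: element (row r, column c) is m[4 * r + c].
using Block4x4 = std::array<std::int32_t, 16>;

// The 5-comparator sorting network for 4 inputs.
constexpr int kNetwork[5][2] = {{0, 1}, {2, 3}, {0, 2}, {1, 3}, {1, 2}};

// True if every column is non-decreasing from top to bottom.
inline bool columns_sorted(const Block4x4& m) {
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 3; ++r)
            if (m[4 * r + c] > m[4 * (r + 1) + c]) return false;
    return true;
}

// Scalar reference: run the network down every column.
inline void sort_columns_scalar(Block4x4& m) {
    for (const auto& cmp : kNetwork)
        for (int c = 0; c < 4; ++c) {
            std::int32_t& a = m[4 * cmp[0] + c];
            std::int32_t& b = m[4 * cmp[1] + c];
            const std::int32_t lo = std::min(a, b);
            const std::int32_t hi = std::max(a, b);
            a = lo;
            b = hi;
        }
}

#ifdef HAVE_SSE41_SKETCH
// One comparator on whole rows: PMINSD + PMAXSD handle all four columns at once.
__attribute__((target("sse4.1")))
inline void cmpswap_rows(__m128i& a, __m128i& b) {
    const __m128i lo = _mm_min_epi32(a, b);
    const __m128i hi = _mm_max_epi32(a, b);
    a = lo;
    b = hi;
}

__attribute__((target("sse4.1")))
void sort_columns_sse41(Block4x4& m) {
    __m128i r[4];
    for (int i = 0; i < 4; ++i)
        r[i] = _mm_loadu_si128(reinterpret_cast<const __m128i*>(m.data() + 4 * i));
    for (const auto& cmp : kNetwork)
        cmpswap_rows(r[cmp[0]], r[cmp[1]]);
    for (int i = 0; i < 4; ++i)
        _mm_storeu_si128(reinterpret_cast<__m128i*>(m.data() + 4 * i), r[i]);
}
#endif

// Dispatcher: SSE4.1 path when the CPU supports it, scalar otherwise.
inline void sort_columns(Block4x4& m) {
#ifdef HAVE_SSE41_SKETCH
    if (__builtin_cpu_supports("sse4.1"))
        sort_columns_sse41(m);
    else
        sort_columns_scalar(m);
#else
    sort_columns_scalar(m);
#endif
    assert(columns_sorted(m));  // debug-only postcondition check
}
```

After this step each column is sorted top to bottom; a full sort would still need a 4x4 transpose to turn the columns into sorted runs, plus merge passes.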

Post by **jacobaea** » December 5th, 2019, 4:24 pm


#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main() {
	// alpha: running total for each of the 17 slots; beta: how many updates touched each slot.
	vector<int> alpha(17, 0);
	vector<int> beta(17, 0);

	int value, offset, quantity;
	string a;

	while (true) {
		cout << "proceed? ";
		// Stop before reading another update unless the answer is "y".
		if (!(cin >> a) || a != "y") { break; }
		cout << "Input value, starting index, and how many indexes (starting there) to add value to:\n";
		if (!(cin >> value >> offset >> quantity)) { break; }

		// Reject ranges that would run off either end of the vectors.
		if (offset < 0 || quantity < 0 || quantity > static_cast<int>(alpha.size()) - offset) {
			cout << "Range out of bounds.\n";
			continue;
		}

		// Add value to alpha[offset, offset + quantity) and count the update in beta.
		transform(alpha.begin() + offset, alpha.begin() + offset + quantity, alpha.begin() + offset,
			[value](int c) { return c + value; });
		transform(beta.begin() + offset, beta.begin() + offset + quantity, beta.begin() + offset,
			[](int c) { return c + 1; });

		cout << "\n\nCurrent system values:\n";
		for (int v : alpha)
			cout << v << " ";
		cout << "\n\nInput count:\n";
		for (int c : beta)
			cout << c << " ";
		cout << "\n";
	}
	return 0;
}

Post by **chili** » December 7th, 2019, 11:17 am

You should try std::partial_sum for the first transform there.
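One reading of that suggestion is the difference-array trick: instead of adding `value` to every slot in the range on each update, record `+value` at the start of the range and `-value` just past its end, then run a single `std::partial_sum` to recover the totals. A minimal sketch, assuming updates use the same `[offset, offset + quantity)` ranges as the program above; `RangeAdd` and `apply_range_adds` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical update record: add `value` to slots [offset, offset + quantity).
struct RangeAdd {
    int value;
    std::size_t offset;
    std::size_t quantity;
};

// Applies all updates in O(updates + n) instead of O(updates * quantity).
std::vector<int> apply_range_adds(std::size_t n, const std::vector<RangeAdd>& updates) {
    std::vector<int> diff(n + 1, 0);  // extra slot so "just past the end" is always valid
    for (const RangeAdd& u : updates) {
        assert(u.offset + u.quantity <= n);
        diff[u.offset] += u.value;               // range starts here
        diff[u.offset + u.quantity] -= u.value;  // and stops contributing here
    }
    // Running sum turns the start/stop markers into per-slot totals.
    std::partial_sum(diff.begin(), diff.end(), diff.begin());
    diff.pop_back();  // drop the sentinel slot
    return diff;
}
```

The input counts (`beta`) fall out the same way by giving every update a value of 1. The payoff comes when many updates are batched before the totals are read; the interactive loop above prints after every update, which would mean a prefix sum each time.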

Planet Chili