
Re: Coding Challenge 5 - Meme Graveyard Keeper

Posted: November 7th, 2018, 3:03 am
by thesmallcreeper
albinopapa wrote: I've been looking (not extensively) for this info on AMD chips. Where did you find it? Intel has the Intrinsics Guide, which has a lot of this info, but I haven't found anything similar for AMD.
There you go dude :)

https://www.agner.org/optimize/instruction_tables.pdf
albinopapa wrote: I personally never got any speed increase from loop unrolling. In most of my trials, the compiler had usually unrolled the loops already, or there wasn't enough work to be done between the loads/stores, or the code was probably already memory-bandwidth limited. Memory comes in one 64-byte cache line at a time (four 16-byte SSE registers' worth), so when you are processing arrays, the data for four single-iteration loops is already cached. This is probably why unrolling usually doesn't do much of anything.
I use an instruction (VPCMPGTQ) that on Haswell has a latency of 5 cycles and a reciprocal throughput of 1 cycle, so by "unrolling" the AVX loop I can (in theory) complete 2 independent VPCMPGTQ instructions in 6 clocks instead of 10 :D ...thus the speedup.
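The arithmetic behind that: with latency L = 5 and a reciprocal throughput of 1, two dependent compares take about 2L = 10 cycles, while two independent ones can issue on consecutive cycles and finish after about L + 1 = 6. The post's actual AVX code isn't shown, so below is only a scalar sketch of the general technique (splitting one dependency chain into two), using a running maximum as a hypothetical workload because there the compare sits directly on the loop-carried chain. In an AVX2 version each compare would be `_mm256_cmpgt_epi64` (which emits VPCMPGTQ) followed by a blend. Compilers may already vectorize or reassociate loops like this, so any real speedup has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One dependency chain: every compare needs the previous running max,
// so the compare latencies are paid back to back.
std::int64_t max_single_chain(const std::vector<std::int64_t>& v) {
    assert(!v.empty());
    std::int64_t m = v[0];
    for (std::size_t i = 1; i < v.size(); ++i)
        m = (v[i] > m) ? v[i] : m;
    return m;
}

// "Unrolled" by 2 into two independent running maxima (m0, m1).
// The two compares in each iteration don't depend on each other,
// so the CPU can overlap their latencies.
std::int64_t max_two_chains(const std::vector<std::int64_t>& v) {
    assert(!v.empty());
    std::int64_t m0 = v[0], m1 = v[0];
    std::size_t i = 1;
    for (; i + 1 < v.size(); i += 2) {
        m0 = (v[i] > m0) ? v[i] : m0;
        m1 = (v[i + 1] > m1) ? v[i + 1] : m1;
    }
    if (i < v.size())  // one element left over when the remaining count is odd
        m0 = (v[i] > m0) ? v[i] : m0;
    return (m0 > m1) ? m0 : m1;  // combine the two chains at the end
}
```

Both functions return the same value; only the shape of the dependency chain differs.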

Re: Coding Challenge 5 - Meme Graveyard Keeper

Posted: November 7th, 2018, 4:48 am
by albinopapa
Good luck getting the compiler to listen to instruction order.

Although I have noticed at times that loading data in different orders changes the speed as well. Interleaving loads and calculations hasn't worked out for me in the past. Doing it the C way (load a bunch, do the calculations, then store) seems to work best in my experience. Probably because by the time I get to the calculation portion, the first loads I need have already finished, which prevents a stall. That's just speculation, of course; I have no idea how sensitive all this shit is. I only know what I've experienced, and even then, things seem to change right under my nose sometimes.
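The "load a bunch, compute, then store" ordering described above can be sketched at the source level like this, using a hypothetical scale-and-bias kernel and an arbitrary batch size of 4. Keep in mind that both the optimizer and out-of-order hardware are free to reorder these operations, so writing the source in this order is a hint at best, not a guarantee of the machine-level schedule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: out[i] = x[i] * scale + bias,
// written as batched load -> compute -> store phases.
void scale_bias_batched(const std::vector<float>& x, std::vector<float>& out,
                        float scale, float bias) {
    assert(out.size() == x.size());
    constexpr std::size_t kBatch = 4;  // arbitrary batch size for illustration
    std::size_t i = 0;
    for (; i + kBatch <= x.size(); i += kBatch) {
        float t[kBatch];
        for (std::size_t k = 0; k < kBatch; ++k) t[k] = x[i + k];            // 1. load the batch
        for (std::size_t k = 0; k < kBatch; ++k) t[k] = t[k] * scale + bias; // 2. compute on it
        for (std::size_t k = 0; k < kBatch; ++k) out[i + k] = t[k];          // 3. store it
    }
    for (; i < x.size(); ++i)  // leftover elements that don't fill a batch
        out[i] = x[i] * scale + bias;
}
```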

It could also be that my memory is failing me, which is definitely possible.

Thanks for the info by the way.

Re: Coding Challenge 5 - Meme Graveyard Keeper

Posted: November 8th, 2018, 10:45 pm
by thesmallcreeper
Looks like I had a bad approach to the challenge so far ://
I don't have much time at my desk left and I want to do something else, so maybe that's a white flag from me...

glhf to everyone still pushing their code

Re: Coding Challenge 5 - Meme Graveyard Keeper

Posted: November 14th, 2018, 11:40 am
by cyboryxmen
Wait, was the challenge over already?

That's too bad. I just finished a SIMD sorting algorithm that could complete it in 40 ms single-threaded, too. I guess I'll just show everyone the algorithm once my new parallel computing language is done.

Re: Coding Challenge 5 - Meme Graveyard Keeper

Posted: November 15th, 2018, 6:31 am
by chili
Yeah, SIMD sort is the only really effective way of optimizing this one past the standard implementation plus execution policies (exepol). No way I was gonna do that, though.
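cyboryxmen's algorithm was never posted, so this is not it; it is only a generic sketch of the building block most SIMD sorts start from: a sorting network made of branch-free compare-exchange steps. Each step is an element-wise min/max between two rows, and every column is sorted independently, which is the layout that maps onto vector min/max instructions (one lane per column). The `Rows` type and function names are illustrative. A full SIMD sort would also need a merge phase to combine the small sorted columns into one sorted sequence.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Rows = std::array<std::vector<int>, 4>;

// Element-wise compare-exchange: afterwards lo[i] <= hi[i] for every i.
// Branch-free min/max over contiguous arrays is easy for compilers to vectorize.
void compare_exchange(std::vector<int>& lo, std::vector<int>& hi) {
    assert(lo.size() == hi.size());
    for (std::size_t i = 0; i < lo.size(); ++i) {
        const int a = lo[i], b = hi[i];
        lo[i] = std::min(a, b);
        hi[i] = std::max(a, b);
    }
}

// 5-comparator sorting network for 4 inputs. It sorts every column
// (r[0][i], r[1][i], r[2][i], r[3][i]) independently and in parallel.
void sort4_columns(Rows& r) {
    compare_exchange(r[0], r[1]);
    compare_exchange(r[2], r[3]);
    compare_exchange(r[0], r[2]);
    compare_exchange(r[1], r[3]);
    compare_exchange(r[1], r[2]);
}
```

There are no data-dependent branches, so the cost is the same regardless of input order.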

Re: Coding Challenge 5 - Meme Graveyard Keeper

Posted: December 5th, 2019, 4:24 pm
by jacobaea

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main() {
	// alpha: accumulated value at each index; beta: how many updates touched each index
	vector<int> alpha = { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 };
	vector<int> beta = { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 };

	int value, offset, quantity;
	string a;

	while (true) {
		cout << "proceed? (y/n) ";
		// Stop before prompting for more numbers if the answer is anything but "y"
		if (!(cin >> a) || a != "y") { break; }

		cout << "Input value, starting index, and how many indexes to add the value to:\n";
		if (!(cin >> value >> offset >> quantity)) { break; }

		// Reject ranges that would run past the end of the vectors (undefined behavior otherwise)
		if (offset < 0 || quantity < 0 || offset + quantity > static_cast<int>(alpha.size())) {
			cout << "Range out of bounds.\n";
			continue;
		}

		// Add value to every element in [offset, offset + quantity)
		transform(alpha.begin() + offset, alpha.begin() + offset + quantity, alpha.begin() + offset,
			[value](int c) { return c + value; });

		// Record one more update for each element in the same range
		transform(beta.begin() + offset, beta.begin() + offset + quantity, beta.begin() + offset,
			[](int c) { return c + 1; });

		cout << "\n\nCurrent system values:\n";
		for (int v : alpha) cout << v << " ";
		cout << "\n\nInput count:\n";
		for (int v : beta) cout << v << " ";
		cout << "\n";
	}
	return 0;
}

Re: Coding Challenge 5 - Meme Graveyard Keeper

Posted: December 7th, 2019, 11:17 am
by chili
You should try std::partial_sum for the first transform there.
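One way to read that suggestion (an interpretation, since the post doesn't spell it out) is the difference-array trick: instead of touching every index in [offset, offset + quantity) on each update, add `value` at `offset` and subtract it at `offset + quantity`, then recover all the totals with a single `std::partial_sum`. Each update becomes O(1) and the reconstruction is one O(n) pass. The helper names `add_range` and `totals` are hypothetical, and the difference array gets one extra slot so a range ending exactly at the last index is still valid.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Records "add value to indexes [offset, offset + quantity)" in O(1)
// by marking only the two ends of the range. diff.size() == n + 1.
void add_range(std::vector<int>& diff, int value, std::size_t offset, std::size_t quantity) {
    assert(offset + quantity < diff.size());
    diff[offset] += value;
    diff[offset + quantity] -= value;
}

// Element i of the result is diff[0] + ... + diff[i], i.e. the total added at index i.
std::vector<int> totals(const std::vector<int>& diff) {
    assert(!diff.empty());
    std::vector<int> out(diff.size() - 1);
    std::partial_sum(diff.begin(), diff.end() - 1, out.begin());
    return out;
}
```

The input counts in `beta` could be tracked the same way with a second difference array, calling `add_range(..., 1, offset, quantity)` for each update.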