Coding Challenge 5 - Meme Graveyard Keeper

The Partridge Family were neither partridges nor a family. Discuss.
thesmallcreeper
Posts: 14
Joined: September 24th, 2018, 1:20 pm

Re: Coding Challenge 5 - Meme Graveyard Keeper

Post by thesmallcreeper » November 7th, 2018, 3:03 am

albinopapa wrote: I've been looking ( not extensively ) for this info on AMD chips, where did you find this? Intel has the intrinsics guide which has a lot of this info there, but haven't found anything similar for AMD.
There you go dude :)

https://www.agner.org/optimize/instruction_tables.pdf
albinopapa wrote: I personally never got any speed increase from loop unrolling. In most of my trials, the compiler usually unrolled the loops already or there wasn't enough work to be done between the loads/stores, or was probably already memory bandwidth limited. The prefetcher gets 64 bytes ( four SSE lanes worth ) in an array as it is, so four single iteration loops are already cached if you are processing arrays. This is probably the reason why unrolling usually doesn't do much of anything.
I use an instruction (VPCMPGTQ) that on Haswell has a latency of 5 clocks and a throughput of 1 CPI, so by "unrolling" the AVX loop so that two independent compares are in flight, I can complete (in theory) 2 VPCMPGTQ instructions in 6 clocks :D.. thus the speed up

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Coding Challenge 5 - Meme Graveyard Keeper

Post by albinopapa » November 7th, 2018, 4:48 am

Good luck getting the compiler to listen to instruction order.

Although, I have noticed that at times loading data in a different order changes the speed as well. Interleaving loads and calculations hasn't worked out for me in the past. Doing it the C way (load a bunch, do the calculations, then store) seems to work best in my experience. Probably because by the time I get to the calculation portion, the first loads I need are done loading, which prevents a stall. Just speculation of course, I have no idea how sensitive all this shit is. I only know what I've experienced, and even then things seem to change right under my nose sometimes.

It could also be that my memory fails me, which is definitely possible.

Thanks for the info by the way.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

thesmallcreeper
Posts: 14
Joined: September 24th, 2018, 1:20 pm

Re: Coding Challenge 5 - Meme Graveyard Keeper

Post by thesmallcreeper » November 8th, 2018, 10:45 pm

Looks like I had a bad approach to the challenge so far ://
I don't have much time left at my desk and I want to do something else, so maybe it's a white flag from me...

glhf to everyone still pushing their code

cyboryxmen
Posts: 190
Joined: November 14th, 2014, 2:03 am

Re: Coding Challenge 5 - Meme Graveyard Keeper

Post by cyboryxmen » November 14th, 2018, 11:40 am

Wait, was the challenge over already?

That's too bad. I just finished an SIMD sorting algorithm that could complete it in 40ms single threaded too. Guess I'll just show everyone the algorithm once my new parallel computing language is done.
Zekilk

chili
Site Admin
Posts: 3948
Joined: December 31st, 2011, 4:53 pm
Location: Japan

Re: Coding Challenge 5 - Meme Graveyard Keeper

Post by chili » November 15th, 2018, 6:31 am

Yeah, a SIMD sort is the only really effective way of optimizing this one past the standard implementation plus an execution policy. No way I was gonna do that tho.
Chili

jacobaea
Posts: 1
Joined: December 5th, 2019, 4:39 am

Re: Coding Challenge 5 - Meme Graveyard Keeper

Post by jacobaea » December 5th, 2019, 4:24 pm

Code:

#include <iostream>
#include <vector>
#include <algorithm>
#include <string>

using namespace std;

int main() {
	// Two 17-slot accumulators: alpha sums the values, beta counts the inputs per slot.
	vector<int> alpha(17, 0);
	vector<int> beta(17, 0);

	int value, offset, quantity;

	string a;
	while (true) {
		cout << "proceed?";
		cin >> a;
		if (a != "y") { break; }	// stop here instead of asking for one more round
		cout << "Input value, starting index number; and how many indexes thereafter to add value to:\n";

		if (!(cin >> value >> offset >> quantity)) { break; }
		// Reject ranges that would run past either end of the vectors.
		if (offset < 0 || quantity < 0 || offset > static_cast<int>(alpha.size()) - quantity) {
			cout << "Range out of bounds.\n";
			continue;
		}

		// Add value to alpha[offset, offset + quantity).
		transform(alpha.begin() + offset, alpha.begin() + offset + quantity, alpha.begin() + offset, [value](int c) { return c + value; });

		// Bump the input count for the same slots.
		transform(beta.begin() + offset, beta.begin() + offset + quantity, beta.begin() + offset, [](int c) { return c + 1; });

		cout << "\n\nCurrent system values:\n";
		for (int v : alpha)
			cout << v << " ";
		cout << "\n\nInput count:\n";
		for (int v : beta)
			cout << v << " ";
		cout << "\n";
	}
	return 0;
}

chili
Site Admin
Posts: 3948
Joined: December 31st, 2011, 4:53 pm
Location: Japan

Re: Coding Challenge 5 - Meme Graveyard Keeper

Post by chili » December 7th, 2019, 11:17 am

You should try std::partial_sum for the 1st xform there.
Chili
