Coding Challenge 4!

The Partridge Family were neither partridges nor a family. Discuss.
User avatar
chili
Site Admin
Posts: 3948
Joined: December 31st, 2011, 4:53 pm
Location: Japan
Contact:

Re: Coding Challenge 4!

Post by chili » September 26th, 2018, 1:46 am

Yup, yum has it. You need to find the largest v[j] - v under the condition that j > i. This is not limited to j = i + 1 :D
Chili

thesmallcreeper
Posts: 14
Joined: September 24th, 2018, 1:20 pm

Re: Coding Challenge 4!

Post by thesmallcreeper » September 26th, 2018, 7:20 pm

by the way...

Look like NF 1 differs a lot with NF 2.

For example I think that NF 1 has a lot more values that are bigger than 32,767 :thinking:

Any info or creation code of NF files plzz? :)

Slidy
Posts: 80
Joined: September 9th, 2017, 1:19 pm

Re: Coding Challenge 4!

Post by Slidy » September 26th, 2018, 11:02 pm

Not sure about NF files, but I created my files with the following Python script:

Code: Select all

import random

TOTAL_NUMS = 10000

nums = ""
for i in range(TOTAL_NUMS):
	nums += str(random.randint(0,65535)) + " "

nums = nums[:-1] # remove trailing space

with open('randnums.txt', 'w') as file:
	file.write("Slidy's Test ({:,.0f} nums)\n".format(TOTAL_NUMS))
	file.write("0\n") # replace with real solution manually
	file.write(nums)

thesmallcreeper
Posts: 14
Joined: September 24th, 2018, 1:20 pm

Re: Coding Challenge 4!

Post by thesmallcreeper » September 27th, 2018, 1:47 am

Thank you for the script. :D Waiting the python script right now to output 80m numbers :p (I hate creating programs that read-write files :p)

edit: Tried my code on a friends nootbook with Haswell processor ;). Well about 14.5ms/8.2ms at NF1 and 11.3ms/7.79ms at NF2 (singlethreaded/multithreaded). Hope I wont find a bug on it xD

Bottom wow line. Haswell has about 2+ times better singlethreaded performance on my code that Piledriver. -_-.

Checking an instruction table .pdf I found that on one of the instructions that is being used on the hottest part of my code has a stupitibly high latency on Steamroller (successor) thus on Piledriver.

Moreover. On my case GCC (-O3) creates assembly has one less jmp vs MSVC compliler. Worth giving a look on the disassembly of others compiler at https://godbolt.org/ (compiler explorer). Tried to avoid it with "goto"s but with no success, MSVC kept outputing more less the same code.
Last edited by thesmallcreeper on September 27th, 2018, 3:43 am, edited 1 time in total.

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Coding Challenge 4!

Post by albinopapa » September 27th, 2018, 3:04 am

Dammit, if it weren't for the values over 32767, my new SSE routine would have completed the NF test in ~16ms.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

thesmallcreeper
Posts: 14
Joined: September 24th, 2018, 1:20 pm

Re: Coding Challenge 4!

Post by thesmallcreeper » September 27th, 2018, 3:26 am

hihihihi!! I can see what you did there boyyy! ;) <3

When I was looking for SIMD I tilted the shit out of me with Intel because there is no "_mm_cmplt_epu16", just "_mm_cmplt_epi16". I mean why the fack they did not add it to SSE2!!!

Traded some performance because Intel engineers fap with their fingers in their ass.

Also dear Slidy, your script was cute dude but sorry python suchs. Left the script running for 80m random numbers for about 1 hour and 30 mins... End up opening Visual Studio write some code, run it and have a nice 450mb file in about 30 seconds. xD

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Coding Challenge 4!

Post by albinopapa » September 27th, 2018, 4:25 am

I think I figured it out. The changes bumped the time up to ~19ms on my machine, but I'm thinking I traded some performance for ease of use. I overloaded some operators, including the bitwise not (~), just because. If I had not done that one, I still would have gotten around the 16 ms mark.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

AverageWhale
Posts: 57
Joined: August 13th, 2018, 2:33 pm

Re: Coding Challenge 4!

Post by AverageWhale » September 27th, 2018, 8:59 am

can someone try my solution on their pc or it is not allowed to share my solution?

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Coding Challenge 4!

Post by albinopapa » September 27th, 2018, 9:25 am

I can try it if you'd like. Are you just wanting to check the time on someone else's machine?
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Coding Challenge 4!

Post by albinopapa » September 27th, 2018, 9:28 am

Image
Multi-threaded with 8 calls to async ( I'm guessing equivalent to 8 threads? ) and SSE.

I split the input into 8, and sent each async call an eighth of the work, with the remaining 16 being done on main thread after after the other jobs were done.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

Post Reply