Coding Challenge 4!
Re: Coding Challenge 4!
Yup, yum has it. You need to find the largest v[j] - v under the condition that j > i. This is not limited to j = i + 1
Chili
-
- Posts: 14
- Joined: September 24th, 2018, 1:20 pm
Re: Coding Challenge 4!
by the way...
Look like NF 1 differs a lot with NF 2.
For example I think that NF 1 has a lot more values that are bigger than 32,767
Any info or creation code of NF files plzz?
Look like NF 1 differs a lot with NF 2.
For example I think that NF 1 has a lot more values that are bigger than 32,767
Any info or creation code of NF files plzz?
Re: Coding Challenge 4!
Not sure about NF files, but I created my files with the following Python script:
Code: Select all
import random
TOTAL_NUMS = 10000
nums = ""
for i in range(TOTAL_NUMS):
nums += str(random.randint(0,65535)) + " "
nums = nums[:-1] # remove trailing space
with open('randnums.txt', 'w') as file:
file.write("Slidy's Test ({:,.0f} nums)\n".format(TOTAL_NUMS))
file.write("0\n") # replace with real solution manually
file.write(nums)
-
- Posts: 14
- Joined: September 24th, 2018, 1:20 pm
Re: Coding Challenge 4!
Thank you for the script. Waiting the python script right now to output 80m numbers :p (I hate creating programs that read-write files :p)
edit: Tried my code on a friends nootbook with Haswell processor . Well about 14.5ms/8.2ms at NF1 and 11.3ms/7.79ms at NF2 (singlethreaded/multithreaded). Hope I wont find a bug on it xD
Bottom wow line. Haswell has about 2+ times better singlethreaded performance on my code that Piledriver. -_-.
Checking an instruction table .pdf I found that on one of the instructions that is being used on the hottest part of my code has a stupitibly high latency on Steamroller (successor) thus on Piledriver.
Moreover. On my case GCC (-O3) creates assembly has one less jmp vs MSVC compliler. Worth giving a look on the disassembly of others compiler at https://godbolt.org/ (compiler explorer). Tried to avoid it with "goto"s but with no success, MSVC kept outputing more less the same code.
edit: Tried my code on a friends nootbook with Haswell processor . Well about 14.5ms/8.2ms at NF1 and 11.3ms/7.79ms at NF2 (singlethreaded/multithreaded). Hope I wont find a bug on it xD
Bottom wow line. Haswell has about 2+ times better singlethreaded performance on my code that Piledriver. -_-.
Checking an instruction table .pdf I found that on one of the instructions that is being used on the hottest part of my code has a stupitibly high latency on Steamroller (successor) thus on Piledriver.
Moreover. On my case GCC (-O3) creates assembly has one less jmp vs MSVC compliler. Worth giving a look on the disassembly of others compiler at https://godbolt.org/ (compiler explorer). Tried to avoid it with "goto"s but with no success, MSVC kept outputing more less the same code.
Last edited by thesmallcreeper on September 27th, 2018, 3:43 am, edited 1 time in total.
-
- Posts: 4373
- Joined: February 28th, 2013, 3:23 am
- Location: Oklahoma, United States
Re: Coding Challenge 4!
Dammit, if it weren't for the values over 32767, my new SSE routine would have completed the NF test in ~16ms.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com
-
- Posts: 14
- Joined: September 24th, 2018, 1:20 pm
Re: Coding Challenge 4!
hihihihi!! I can see what you did there boyyy! <3
When I was looking for SIMD I tilted the shit out of me with Intel because there is no "_mm_cmplt_epu16", just "_mm_cmplt_epi16". I mean why the fack they did not add it to SSE2!!!
Traded some performance because Intel engineers fap with their fingers in their ass.
Also dear Slidy, your script was cute dude but sorry python suchs. Left the script running for 80m random numbers for about 1 hour and 30 mins... End up opening Visual Studio write some code, run it and have a nice 450mb file in about 30 seconds. xD
When I was looking for SIMD I tilted the shit out of me with Intel because there is no "_mm_cmplt_epu16", just "_mm_cmplt_epi16". I mean why the fack they did not add it to SSE2!!!
Traded some performance because Intel engineers fap with their fingers in their ass.
Also dear Slidy, your script was cute dude but sorry python suchs. Left the script running for 80m random numbers for about 1 hour and 30 mins... End up opening Visual Studio write some code, run it and have a nice 450mb file in about 30 seconds. xD
-
- Posts: 4373
- Joined: February 28th, 2013, 3:23 am
- Location: Oklahoma, United States
Re: Coding Challenge 4!
I think I figured it out. The changes bumped the time up to ~19ms on my machine, but I'm thinking I traded some performance for ease of use. I overloaded some operators, including the bitwise not (~), just because. If I had not done that one, I still would have gotten around the 16 ms mark.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com
-
- Posts: 57
- Joined: August 13th, 2018, 2:33 pm
Re: Coding Challenge 4!
can someone try my solution on their pc or it is not allowed to share my solution?
-
- Posts: 4373
- Joined: February 28th, 2013, 3:23 am
- Location: Oklahoma, United States
Re: Coding Challenge 4!
I can try it if you'd like. Are you just wanting to check the time on someone else's machine?
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com
-
- Posts: 4373
- Joined: February 28th, 2013, 3:23 am
- Location: Oklahoma, United States
Re: Coding Challenge 4!
Multi-threaded with 8 calls to async ( I'm guessing equivalent to 8 threads? ) and SSE.
I split the input into 8, and sent each async call an eighth of the work, with the remaining 16 being done on main thread after after the other jobs were done.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com