C++ Winsock Networking Tutorials [04-24-2017]

cameron
Posts: 794
Joined: June 26th, 2012, 5:38 pm
Location: USA

Re: C++ Winsock Networking Tutorials [2-23-2016]

Post by cameron » February 24th, 2016, 12:49 am

Are you going to be covering the overlapped IOCP model?
Computer too slow? Consider running a VM on your toaster.

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: C++ Winsock Networking Tutorials [2-23-2016]

Post by albinopapa » February 24th, 2016, 1:06 am

cameron wrote:Dynamic memory allocation is thread safe. Reading global variables from multiple threads is safe (as long as nothing is writing), but writing to them from one or more threads is unsafe. Have a look at the Interlocked functions, which provide a fenced, atomic write to a variable.
I was under the impression the purpose of Interlocked functions was to maintain write order; how do they differ from critical sections? I haven't attempted using them yet and haven't found a reason to, as most of the stuff I've been playing with seems to be fast enough just converting to SSE. I have played around with the PPL library's Concurrency::parallel_for, and in some cases I get a speed boost while in others it decreases performance by more than half.

Perhaps an MT tutorial from what you've learned, cameron? lol.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

cameron
Posts: 794
Joined: June 26th, 2012, 5:38 pm
Location: USA

Re: C++ Winsock Networking Tutorials [2-23-2016]

Post by cameron » February 24th, 2016, 1:22 am

I am actually in the opposite scenario: I have never (or seldom) found a good reason to use SSE, at least while working on my networking library. For the most part you never process large chunks of data or do intensive calculations on them, so I haven't seen any cases where I could really benefit from SSE. I have gained much more performance from memory pools and efficient multithreading and scheduling than from SSE. From what I have seen, SSE is only useful for intensive math or for processing large amounts of data. Am I under the right impression here?

Critical sections are more or less software-implemented locks, whereas Interlocked functions are intrinsics. Interlocked functions are meant only for access to a single variable, whereas critical sections/mutexes protect a whole block of code.
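Rough sketch of the difference, from memory, so double-check it against the MSDN docs (the variable and function names here are just made up for illustration):

Code:

#include <windows.h>

// Interlocked: a single atomic operation on a single variable, no lock object needed.
volatile LONG packetCount = 0;

void CountPacket()
{
	InterlockedIncrement( &packetCount );
}

// Critical section: protects a whole block of code, so several related
// variables can be updated together as one unit.
CRITICAL_SECTION cs;	// InitializeCriticalSection( &cs ) once at startup
long counter = 0;
long lastUpdater = 0;

void CountWithLock( long threadId )
{
	EnterCriticalSection( &cs );
	++counter;				// everything between Enter/Leave runs under the lock
	lastUpdater = threadId;	// an interlocked op alone couldn't keep these two consistent
	LeaveCriticalSection( &cs );
}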

I had thought about it at one point, but I have difficulty explaining things for the most part, which is why I haven't attempted making tutorials for anything. :lol:
Computer too slow? Consider running a VM on your toaster.

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: C++ Winsock Networking Tutorials [2-23-2016]

Post by albinopapa » February 24th, 2016, 2:02 am

SSE: Yes, SSE is best used in those circumstances. Since I'm really into the graphics side of coding, SSE has been a fun experience. I believe, though, that if you ever get into encryption/decryption, SSE might benefit you...but I don't know enough to say for sure.

Interlocked functions: Let me ask you this, then. Say I have a pair of nested for loops where both the inner and outer loops iterate over the same array, and in the inner loop a variable is updated from a calculation. You're saying that an interlocked function would be used to update that variable so as not to cause access violations...a data race?

Code:

for(int j = 0; j < numElements; ++j)
{
	for(int i = j + 1; i < numElements; ++i)
	{
		int dx = element[j].px - element[i].px;
		int dy = element[j].py - element[i].py;
		// blah, blah - calculatedAccel gets worked out from dx/dy here
		element[i].xAcceleration += calculatedAccel.x;
		element[i].yAcceleration += calculatedAccel.y;
		element[j].xAcceleration -= calculatedAccel.x;
		element[j].yAcceleration -= calculatedAccel.y;
	}

	// integrate element[j] once all of its pair contributions are accumulated
	element[j].vx += element[j].xAcceleration;
	element[j].vy += element[j].yAcceleration;
	element[j].px += element[j].vx;
	element[j].py += element[j].vy;
	element[j].xAcceleration = 0;
	element[j].yAcceleration = 0;
}
BTW, this is derived from an nbody simulation; I got the idea from chili's nbody project.
So given the cut-down code, would I want to use one of the interlocked functions? I've tried critical sections and wow, that slowed my program down from 30 fps to 6.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

cameron
Posts: 794
Joined: June 26th, 2012, 5:38 pm
Location: USA

Re: C++ Winsock Networking Tutorials [2-23-2016]

Post by cameron » February 24th, 2016, 2:28 am

Interlocked calls are about 3x faster than critical sections with no contention, and up to 100x faster with contention. I'm curious: can I look at the code where you implemented the critical section?

Critical sections are also a major reason why heap allocation can be really slow and degrade performance. One reason I sometimes use memory pools is not only that the lighter-weight code improves performance, but also that it reduces contention on the heap, which can be huge in multithreaded apps.
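A very cut-down version of the idea (not my actual pool code, just the shape of it; note that this one isn't synchronized itself, so it would be per-thread or need its own light guard):

Code:

#include <cstddef>
#include <vector>

// Fixed-size block pool: grab all the memory up front and hand out blocks
// from a free list, so the hot path never touches the heap (or its lock).
class BlockPool
{
public:
	BlockPool( size_t blockSize, size_t blockCount )
		:
		storage( blockSize * blockCount )
	{
		freeList.reserve( blockCount );
		for( size_t i = 0; i < blockCount; ++i )
		{
			freeList.push_back( storage.data() + i * blockSize );
		}
	}
	void* Alloc()
	{
		if( freeList.empty() )
		{
			return nullptr;	// pool exhausted; caller decides what to do
		}
		void* block = freeList.back();
		freeList.pop_back();
		return block;
	}
	void Free( void* block )
	{
		freeList.push_back( static_cast<char*>( block ) );
	}
private:
	std::vector<char> storage;
	std::vector<char*> freeList;
};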

Data races only occur when the same variable is modified from one thread while other threads are also accessing it. Maybe I don't understand your example. Is it multithreaded?

But Interlocked functions and critical sections are only used for synchronization between threads and to provide thread safety.
Computer too slow? Consider running a VM on your toaster.

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: C++ Winsock Networking Tutorials [2-23-2016]

Post by albinopapa » February 24th, 2016, 4:27 am

Yeah, after I posted that I realized I forgot to say which part was multithreaded. The idea is to spread the workload among the threads evenly, like workload = numElements / numThreads. So the outer for loop would be divided among the threads evenly, but the inner loop would still need to iterate over all elements, from the outer loop's index to numElements.
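Roughly the shape of the split I mean (not the real code; names and the lambda body are just placeholders):

Code:

#include <algorithm>
#include <thread>
#include <vector>

void RunPass( int numElements )
{
	// one contiguous range of j values per hardware thread
	const int numThreads = static_cast<int>( std::max( 1u, std::thread::hardware_concurrency() ) );
	const int workload = numElements / numThreads;

	std::vector<std::thread> workers;
	for( int t = 0; t < numThreads; ++t )
	{
		const int start = t * workload;
		const int end = ( t == numThreads - 1 ) ? numElements : start + workload;
		workers.emplace_back( [start, end, numElements]()
		{
			for( int j = start; j < end; ++j )
			{
				for( int i = j + 1; i < numElements; ++i )
				{
					// pair calculation from the code above goes here;
					// the writes to the shared accelerations are the part
					// that needs protecting
				}
			}
		} );
	}
	for( auto& w : workers )
	{
		w.join();
	}
}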

I'll have to look and see if I still have the code; I've made so many changes to it trying to figure things out that I may have scrapped it.

Ok, I didn't scrap the project, but I did scrap the MT code. I'll post the code when I get a chance.

Here is the address to the git repo where the current version lies. I tend to forget to make new branches for different tests.

https://github.com/albinopapa/VectorTest
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

Pindrought
Posts: 432
Joined: September 26th, 2013, 4:57 pm
Location: Kentucky

Re: C++ Winsock Networking Tutorials [2-23-2016]

Post by Pindrought » February 24th, 2016, 5:33 am

cameron wrote:Are you going to be covering the overlapped IOCP model?
No. I have not yet used an overlapped I/O model, so I can't cover it at this time. Sorry.

Edit:

Uploaded solutions for Tutorials 5 & 6. Created the video for Tutorial 5 today and will probably be uploading it tomorrow afternoon.
PM me if you need to contact me. Thanks to all the helpful people on this forum especially to Chili.

adabo
Posts: 154
Joined: October 27th, 2012, 3:28 am
Location: Houston, Texas

Re: C++ Winsock Networking Tutorials [2-23-2016]

Post by adabo » February 24th, 2016, 3:24 pm

Pindrought: I'm almost through with lesson 3. Correct me if I'm wrong, but could I monkey-rig what you've taught into a cheap multiplayer implementation? I was thinking of a multiplayer pong clone as an exercise in transferring data. I don't think I'd need more than the default 256 bytes, since I'd only be transferring x/y + vx/vy for the player and x/y + vx/vy for the ball. That's 8 ints, which is 32 bytes, right? So even if I need a few more bytes for something else, couldn't I just stuff it into the packet?

Pindrought
Posts: 432
Joined: September 26th, 2013, 4:57 pm
Location: Kentucky

Re: C++ Winsock Networking Tutorials [2-23-2016]

Post by Pindrought » February 24th, 2016, 4:50 pm

adabo wrote:Pindrought: I'm almost through with lesson 3. Correct me if I'm wrong, but could I monkey-rig what you've taught into a cheap multiplayer implementation? I was thinking of a multiplayer pong clone as an exercise in transferring data. I don't think I'd need more than the default 256 bytes, since I'd only be transferring x/y + vx/vy for the player and x/y + vx/vy for the ball. That's 8 ints, which is 32 bytes, right? So even if I need a few more bytes for something else, couldn't I just stuff it into the packet?
You technically could, but you should really wait until future tutorials, because you will run into issues.

Here is why.

1. We aren't checking for connection issues until tutorial 5, so until then, if one of the clients disconnects, the server breaks.

2. We aren't verifying that a packet was completely sent or received until tutorial 7. This means you can partially send or receive packets. Just because we call recv with a length of 256 bytes does not mean we will actually receive 256 bytes in one recv call, and the same goes for send. While we will probably receive the full packet 99% of the time on our own computer with small packets like that, one partial packet means the program breaks if we aren't verifying that we received the full packet(s). (There's a rough sketch of such a receive loop after this list.)

3. We aren't managing connections in a reasonable manner. We are using a static array of 100 connections. What if someone connects, disconnects, and reconnects? We will probably address this in tutorial 8.

4. We aren't doing anything to handle synchronization yet.

5. We aren't handling big endian / little endian and serialization. You might send an int from Computer 1 to Computer 2 and the data doesn't arrive the same. Some computers store ints in a form called little endian and some store them as big endian, so we have to convert values to network byte order before sending them and back to host byte order to use them. We will cover this in a later tutorial. We will also need proper serialization for floats/doubles (see IEEE 754). Note: one rigged-up way to get past serialization and the big endian/little endian conversions is to convert all the numbers to strings and send those, but this is slow, so we would want to avoid it. (The sketch after this list shows the byte-order conversion for ints.)

There are probably more reasons I have not listed, but those are a few. You are free to try to rig up a pong game with what is in the existing tutorials, but you will probably end up rewriting the whole thing, and it will have a lot of problems since it doesn't handle everything that can and will go wrong.
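To give a rough idea of points 2 and 5, here is a sketch of a receive-everything loop plus the int byte-order conversion (this is not the tutorial code; the names are made up):

Code:

#include <winsock2.h>	// link with ws2_32.lib

// Point 2: keep calling recv until we actually have all 'length' bytes.
// recv is free to hand back fewer bytes than we asked for.
bool RecvAll( SOCKET s, char* buffer, int length )
{
	int total = 0;
	while( total < length )
	{
		const int received = recv( s, buffer + total, length - total, 0 );
		if( received <= 0 )
		{
			return false;	// 0 = connection closed, SOCKET_ERROR = failure
		}
		total += received;
	}
	return true;
}

// Point 5: convert to network byte order before sending, back to host order
// after receiving. htonl/ntohl cover 32-bit ints; floats/doubles need real
// serialization on top of this.
void SendAndReceiveExample( SOCKET s )
{
	int playerX = 120;
	u_long wire = htonl( static_cast<u_long>( playerX ) );	// host -> network
	send( s, reinterpret_cast<const char*>( &wire ), sizeof( wire ), 0 );	// send can also be partial

	u_long incoming = 0;
	if( RecvAll( s, reinterpret_cast<char*>( &incoming ), sizeof( incoming ) ) )
	{
		playerX = static_cast<int>( ntohl( incoming ) );	// network -> host
	}
}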
PM me if you need to contact me. Thanks to all the helpful people on this forum especially to Chili.

chili
Site Admin
Posts: 3948
Joined: December 31st, 2011, 4:53 pm
Location: Japan

Re: C++ Winsock Networking Tutorials [2-23-2016]

Post by chili » February 25th, 2016, 1:33 am

cameron wrote:I am actually in the opposite scenario: I have never (or seldom) found a good reason to use SSE, at least while working on my networking library. For the most part you never process large chunks of data or do intensive calculations on them, so I haven't seen any cases where I could really benefit from SSE. I have gained much more performance from memory pools and efficient multithreading and scheduling than from SSE. From what I have seen, SSE is only useful for intensive math or for processing large amounts of data. Am I under the right impression here?

Critical sections are more or less software-implemented locks, whereas Interlocked functions are intrinsics. Interlocked functions are meant only for access to a single variable, whereas critical sections/mutexes protect a whole block of code.

I had thought about it at one point, but I have difficulty explaining things for the most part, which is why I haven't attempted making tutorials for anything. :lol:
Yeah, I see no need for SSE in networking code, unless maybe, as papa points out, you need to do encryption/decryption.


Critical sections are usually implemented with atomic operations, so I would say they are hardware-implemented in essence, but there is a robust software layer on top. They prevent multiple threads from executing the same section of code (mutual exclusion). The reason you use them is to protect shared memory from race conditions.

Interlocked functions are basically another way of saying atomic operations. If you can use atomic operations in place of mutexes, you will see a huge performance increase, especially under contention, as cam pointed out. Classes, etc., that are implemented with atomics instead of mutexes are generally referred to as 'lockless' or 'lock-free' (e.g. a lockless queue).

Writing proper lock-free code is often harder than protecting shared memory with mutexes, and not all mutex-based code can be converted to lock-free routines.
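For the simple single-variable case it's straightforward, though. In standard C++ it's just std::atomic; a throwaway illustration, not from any particular project:

Code:

#include <atomic>
#include <thread>
#include <vector>

int main()
{
	// Shared counter: the atomic increment compiles down to an interlocked-style
	// instruction, so no mutex and no blocking between threads.
	std::atomic<int> hits{ 0 };

	std::vector<std::thread> threads;
	for( int t = 0; t < 4; ++t )
	{
		threads.emplace_back( [&hits]()
		{
			for( int i = 0; i < 100000; ++i )
			{
				++hits;	// atomic read-modify-write, no lock
			}
		} );
	}
	for( auto& th : threads )
	{
		th.join();
	}

	// Always 400000; with a plain int this would be a data race and the
	// total would usually come up short.
	return hits.load() == 400000 ? 0 : 1;
}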
Chili
