Code: Select all
struct Body
{
Vector position;
Vector velocity;
};
struct Plane : public Body
{
};
struct Missile : public Body
{
const Body* target;
float speed; // used by Update below; was missing from the original listing
};
Code: Select all
void Missile::Update ( const float time )
{
const auto target_position = target->position;
const auto relative_position_to_target = target_position - position;
const auto direction_to_target = relative_position_to_target.Normalized ( );
velocity = direction_to_target * speed;
position += velocity * time;
}
Code: Select all
for ( auto missile_it = missiles.begin ( ), end = missiles.end ( ); missile_it != end; ++missile_it )
{
missile_it->Update ( delta_time );
}
Code: Select all
void Missile::UpdateVelocity ( )
{
const auto target_position = target->position;
const auto relative_position_to_target = target_position - position;
const auto direction_to_target = relative_position_to_target.Normalized ( );
velocity = direction_to_target * speed;
}
void Missile::UpdatePosition ( const float time )
{
position += velocity * time;
}
Code: Select all
for ( auto missile_it = missiles.begin ( ), end = missiles.end ( ); missile_it != end; ++missile_it )
{
missile_it->UpdateVelocity ( );
}
for ( auto missile_it = missiles.begin ( ), end = missiles.end ( ); missile_it != end; ++missile_it )
{
missile_it->UpdatePosition ( delta_time );
}
I started this experiment to test out what I read online about the CPU's I/O capability. You see, CPUs are getting faster and faster nowadays. They're so fast that reading and writing memory is actually incredibly slow in comparison. Today, optimization often focuses less on making efficient algorithms and more on finding efficient ways to do I/O.
OoO (out-of-order) processors have one such way of optimizing for this. These processors take advantage of the fact that independent reads do not have to wait for each other. You do not need to know the value of target->position to start reading your own position. Since these instructions do not rely on each other finishing, the reads can all be issued at the same time. Assuming that your CPU is capable of having [N] outstanding reads, you can have [N] variables being read at once across the entire loop! Furthermore, the reads can also complete out of order: the read that finishes first (because its data happened to already be cached closer to the CPU) gets processed first instead of waiting for the reads issued before it.
Writes, on the other hand, are different. Unlike reads, a write can depend on a previous write. In this case, the write to position must wait until the write to velocity is done, because the new position is computed from the new velocity. The CPU can't overlap dependent writes the way it overlaps independent reads, and pretty much has to do them in order.
By having two loops, the first for updating velocity and the second for updating position, you prevent the writes from blocking each other. By the time you need to write to position in the second loop, the write to velocity in the first loop has already finished, so the write to position can proceed almost immediately. In comparison, the single-loop version demands that the write to velocity complete right away instead of allowing it to take its time.
Of course, all of this is just theory. For all we know, it could also be specific to my system. That's why I am giving all of you the repository containing the test itself. I added a few macros that allow you to customize the test; it is currently configured to run at maximum efficiency. No matter how you customize it, though, the two-loop solution seems to be the fastest version. I am very interested in the results you get on your own systems compared to mine.