Table based processing

cyboryxmen
Posts: 190
Joined: November 14th, 2014, 2:03 am

Table based processing

Post by cyboryxmen » January 26th, 2018, 5:22 pm

Polymorphism is great. It allows you to make decisions based on the type of the object you're processing. However, virtual calls can get pretty expensive when called millions of times per frame, making them undesirable when optimising for performance. One way of avoiding virtual function calls is to place objects in different tables and process them based on the table they are in. Instead of having one table containing objects of different types, we can have multiple tables, each containing objects of a specific type. When you process the objects in one of these tables, you know that those objects will always be of a certain type, allowing you to call their functions directly instead of having to go through a virtual function.
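To make the idea concrete, here's a minimal sketch of the difference (Circle and Square are just stand-in types, not code from my test project):

Code: Select all

#include <memory>
#include <vector>

struct Base
{
     virtual ~Base() = default;
     virtual void Update() = 0;
};
struct Circle : Base { void Update() override { /* ... */ } };
struct Square : Base { void Update() override { /* ... */ } };

// Polymorphic version: one table, every call goes through the vtable.
void UpdateAll( std::vector<std::unique_ptr<Base>>& objects )
{
     for( auto& object : objects )
     {
          object->Update(); // virtual call per object
     }
}

// Table based version: one table per type; the compiler knows the dynamic
// type of every element, so the calls can be devirtualised and inlined.
void UpdateAll( std::vector<Circle>& circles, std::vector<Square>& squares )
{
     for( auto& circle : circles ) circle.Update();
     for( auto& square : squares ) square.Update();
}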

My concern with this method is that it would force the code to update a metric shit ton of tables, most of which are either empty or contain very few objects. Keep in mind that you're supposed to have a table for literally every type! Different AI types, controller types, AI state types, game state types, everything!

From my first test, looping through all these tables without any objects stored yet, the cost of having to process these additional tables (for 400 unique types) is significant: nearly 1000 times slower! However, when the tables are populated with actual objects to process, the table based processing implementation performs 5 times faster or more despite the huge bottleneck of looping through these additional tables. It seems that the overhead of a virtual function call and the inability to store polymorphic objects contiguously are much more significant than having more tables to process.

One disadvantage of using table based processing over polymorphism is how tedious it is to set up. Instead of one table to process, you have a separate table for each and every type, and you have to write a lot of code to process every single one of them. Not to mention that you must include all the types in the one file where the processing will occur, which would drastically increase compile times. Fortunately, I have an idea for a template implementation that'll help reduce the amount of boilerplate code needed (once I'm crazy enough to actually do this!)
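For what it's worth, the template implementation I have in mind would look something like this (a rough sketch only; Tables, Get and ForEach are names I just made up for illustration):

Code: Select all

#include <tuple>
#include <vector>

// One table per type, all owned by a single tuple; adding a type to the
// parameter list adds its table and its processing loop automatically.
template<typename... Types>
class Tables
{
public:
     template<typename T>
     std::vector<T>& Get()
     {
          return std::get<std::vector<T>>( tables );
     }
     // Run func on every object in every table.
     template<typename Func>
     void ForEach( Func func )
     {
          std::apply( [&]( auto&... table )
          {
               ( ProcessOne( table, func ), ... );
          }, tables );
     }
private:
     template<typename T, typename Func>
     static void ProcessOne( std::vector<T>& table, Func& func )
     {
          for( auto& object : table )
          {
               func( object );
          }
     }
     std::tuple<std::vector<Types>...> tables;
};

With something like this, Tables<A, B, C> gives you one contiguous vector per type, and ForEach( []( auto& object ) { object.Update(); } ) processes every table without a single virtual call.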

I suppose that this design would be better suited for a Component Entity System framework. That way, instead of multiple unique types, you have different components that you can mix and match to create these types in your program. By processing these components instead, you can significantly reduce the number of processing tables while keeping type variety high.
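Roughly what I mean (Position, Velocity and Integrate are made-up names, and this sketch assumes every entity has both components at the same index):

Code: Select all

#include <cstddef>
#include <vector>

struct Position { float x, y; };
struct Velocity { float dx, dy; };

// One table per component type rather than per entity type; an entity is
// just an index into the component tables it participates in.
struct World
{
     std::vector<Position> positions;
     std::vector<Velocity> velocities;
};

// A "system" runs straight through the tables, no dispatch needed.
void Integrate( World& world, float dt )
{
     for( std::size_t i = 0; i < world.positions.size(); ++i )
     {
          world.positions[i].x += world.velocities[i].dx * dt;
          world.positions[i].y += world.velocities[i].dy * dt;
     }
}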

In the meantime, here's the test that I implemented to benchmark table based processing against polymorphism. I hope that you find my results interesting. Just use the POLYMORPH macro to switch between the polymorphic implementation and the table based one.
Zekilk

cyboryxmen
Posts: 190
Joined: November 14th, 2014, 2:03 am

Re: Table based processing

Post by cyboryxmen » January 27th, 2018, 5:05 pm

Table based processing performs a lot better than I originally thought it would. I'm now convinced that I should start implementing it in my codebase. However, that does not mean that I'm going to throw polymorphism out the window. Table based processing can be used for common components like health, ammo and AI states, but there are a bunch of other situations that are better handled polymorphically.

For example, in video games, there are going to be a lot of one-time events that only occur at a certain point in the game: things like spawning a castle in the sky that contains the final boss once the mini boss dies. Video games are filled with these one-time events, and giving each of them a table that gets processed every frame is wasteful since they are only fired off at a specific point and then never used again.

Instead, we can make an event system where events register themselves as "listeners" to certain activities (player enters the boss area, boss HP drops to 0, etc.). When these activities occur, they trigger the events through a virtual function call. Once the events are triggered and processed, they can be deleted from the game and never processed again. In comparison, you cannot delete a table from the game as easily. That amount of dynamism would require some polymorphic mechanism, which ironically is what table based processing is supposed to replace.
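Something like this is what I have in mind (a minimal sketch; all the names are made up):

Code: Select all

#include <memory>
#include <vector>

// A one-time event registers itself as a listener and is destroyed
// after it fires.
struct Listener
{
     virtual ~Listener() = default;
     virtual void OnTrigger() = 0;
};

struct SpawnFinalBossCastle : Listener
{
     void OnTrigger() override { /* spawn the castle in the sky */ }
};

class Activity
{
public:
     void AddListener( std::unique_ptr<Listener> listener )
     {
          listeners.push_back( std::move( listener ) );
     }
     // Trigger every listener through a virtual call, then delete them all;
     // unlike a per-type table, nothing is left over to process each frame.
     void Fire()
     {
          for( auto& listener : listeners )
          {
               listener->OnTrigger();
          }
          listeners.clear();
     }
private:
     std::vector<std::unique_ptr<Listener>> listeners;
};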

It's like what Chili has told us numerous times: don't limit the tools you use in your C++ toolkit and always use the best tool for the situation.
Zekilk

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Table based processing

Post by albinopapa » January 28th, 2018, 7:28 am

I challenge your results a bit. I agree dereferencing a pointer and/or accessing a lookup table may introduce a bit of overhead; however, I believe the bottleneck here isn't that at all, but the fact that you are allocating memory that is completely and forcefully non-contiguous. Because of the nature of polymorphism, one must use references or pointers to the base class, and this seems to lead one to believe one must allocate one element at a time into a vector<unique_ptr<T>>, for instance. This can be dealt with differently, and doing so creates quite a difference in performance compared to the method I just mentioned.

If you were to make vectors of the child types, then add the addresses of those elements to a vector of the base type, you get nearly the same results as with your templated polymorphism.

Contiguous Polymorphic vectors
Lowest time: 6578
Highest time: 246532
Mean: 8286
Standard deviation: 11592
Median: 7367

Templated polymorphism
Lowest time: 5788
Highest time: 519376
Mean: 8297
Standard deviation: 17762
Median: 6578

Non-contiguous polymorphism
Lowest time: 32099
Highest time: 660402
Mean: 38302
Standard deviation: 27419
Median: 34993

In the first test, I created 4 (not 400) different child classes, used rng % 4 to distribute the max objects between all 4 classes, and pushed back the elements to their respective containers. When that loop was complete, I looped through each container, pushing back the address of each element onto a vector<Base*>. This way, each address is contiguous until it gets to an address of a different child.

The second and third tests are from your project. As you can see, there is some overhead in pointer dereferencing and function lookup, but the biggest performance hit comes from accessing singly allocated elements.
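The first test was set up roughly like this (a reconstruction of the idea, not the exact benchmark code, and with two child types instead of four for brevity):

Code: Select all

#include <cstddef>
#include <cstdlib>
#include <vector>

struct Base
{
     virtual ~Base() = default;
     virtual void Update() = 0;
};
struct Child0 : Base { void Update() override {} };
struct Child1 : Base { void Update() override {} };

int main()
{
     constexpr std::size_t max_objects = 1000;
     std::vector<Child0> c0;
     std::vector<Child1> c1;
     // Distribute the objects between the child containers.
     for( std::size_t i = 0; i < max_objects; ++i )
     {
          if( std::rand() % 2 == 0 ) c0.emplace_back();
          else c1.emplace_back();
     }
     // Take the addresses only after the containers are fully built, so the
     // pointed-to elements stay contiguous until the next child type begins.
     std::vector<Base*> bases;
     for( auto& e : c0 ) bases.push_back( &e );
     for( auto& e : c1 ) bases.push_back( &e );
     // Still virtual dispatch, but the data is densely packed.
     for( Base* b : bases ) b->Update();
}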

Now, it could be said that keeping track of the base vector with the other typed vectors would take some craftiness, but the same is true with template programming. I believe just creating a class that the user interfaces with instead of one vector or another would suffice.

Template metaprogramming has its place and usefulness, don't get me wrong, but if there's a comparable and possibly more sane approach to a problem, then I'll take it over TMP any day.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Table based processing

Post by albinopapa » January 28th, 2018, 8:32 am

I found a little bug in the code. It causes there to be 1992 polymorphic elements allocated versus 1000 elements in the templated version. If max_objects is supposed to be 1000, there is something else I haven't found. The bug I found created an instance of std::unique_ptr<Derived<0>> after every std::unique_ptr<Base>.

Templated: 1000 objects
Lowest time: 6051
Highest time: 169441
Mean: 8830
Standard deviation: 9568
Median: 7367

New times based on 997 std::unique_ptr<Base> elements
Lowest time: 17365
Highest time: 339673
Mean: 21866
Standard deviation: 21865
Median: 18418

1000 contiguously allocated children being called through vector of base*
Lowest time: 6840
Highest time: 116294
Mean: 7959
Standard deviation: 6970
Median: 7104
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

cyboryxmen
Posts: 190
Joined: November 14th, 2014, 2:03 am

Re: Table based processing

Post by cyboryxmen » January 28th, 2018, 11:13 am

I kept in mind that the polymorphic objects not being stored contiguously is also a bottleneck in the polymorphic implementation. I actually stated it in the original post but probably should have put more emphasis on it.

With that said, if you have 400 vectors and have to update every single one of them, templates are the sane solution. I've done this manually before with 10 types and it was painful enough. I can't imagine doing it for 390 more types. If you have a better idea for automating this dynamically without templates (and without any additional runtime overhead, of course), I wouldn't mind hearing about it.
Zekilk

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Table based processing

Post by albinopapa » January 28th, 2018, 11:28 pm

Hmm, I haven't done anything professional, so this may be a naive statement: when do you ever need 400 unique tables/vectors/lists, whatever you want to call them? I guess frameworks might have that many different types.

I've seen people do structure of arrays just fine inside their Enemy class. This is somewhat similar. You must maintain an array/vector for each "mission critical" member:

Code: Select all

class Entity
{
public:
     void Update( float dt );

private:
     // Structure of arrays: one container per component rather than
     // one container of whole objects.
     std::vector<float> px, py, pz;            // position
     std::vector<float> prx, pry, prz;         // orientation
     std::vector<float> pax, pay, paz;         // acceleration
     std::vector<float> pvelx, pvely, pvelz;   // velocity
};
Ok, maybe they were all just raw float*, but the concept is the same.
As far as NO runtime overhead goes, I guess you got me on that one. I'd have to make an attempt at writing up such a manager class, though I'm not sure how or where I'd store the concrete vector types, unless I store them in the manager class to begin with. It would not be a library implementation, but a user implementation. In your test, all 400 types do the same thing; you'd have to write 400 specializations to have 400 different behaviors. Anyway, I'll give it a shot and post something here if/when I come up with something lol.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Table based processing

Post by albinopapa » February 13th, 2018, 10:28 pm

Well, in the back of my mind I've been mulling this over for the past two weeks. The only thing I come up with is different ways to handle the interface, but not any ways to handle the data itself. The closest I've come is creating a pool of memory and allocating your objects from the pool. Pass that address as a void*/char* to one of the "polymorphic" types and they can cast it to whatever is needed and initialize it however they need. The next step would be creating an interface class that the derived classes can inherit from, but that does not have any particular state (no data stored). This way, you have contiguous data allocated from the memory pool, and the only "bottleneck" would be the pointer indirection.
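A bare-bones sketch of the pool part (everything here is illustrative; a real pool also needs alignment handling and destructor calls):

Code: Select all

#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Bump allocator: objects of mixed types packed back to back in one buffer.
class Pool
{
public:
     explicit Pool( std::size_t bytes ) : storage( bytes ) {}
     template<typename T, typename... Args>
     T* Create( Args&&... args )
     {
          // NOTE: ignores alignment and never destroys anything; fine for a
          // sketch, not for a real project.
          T* ptr = new( storage.data() + offset ) T( std::forward<Args>( args )... );
          offset += sizeof( T );
          return ptr;
     }
private:
     std::vector<unsigned char> storage;
     std::size_t offset = 0;
};

// Stateless interface: behaviour only, the data lives in the pool.
struct Updater
{
     virtual ~Updater() = default;
     virtual void Update( void* data ) = 0; // cast to the concrete type inside
};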

This isn't 100% type-safe, however, and doing this would only really be to allow for cache coherency. Memory pools and custom allocators are something that I've heard developers talk about using for games especially, but for other software as well, to get the best memory usage and cache coherency they can. I believe they refer to this as data oriented programming as opposed to object oriented programming: make it more about processing the data, and less about the objects.

Anyway, those are my thoughts on the matter. It sounds like it could be done if you hack your way out of the C++ type system and just deal with memory and data. That being said, I like how the strongly typed C++ type system keeps my code safer. Something like the above should only be used after plenty of testing for pass/fail scenarios before using it in a real project, and of course benchmarking.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Table based processing

Post by albinopapa » February 14th, 2018, 6:27 am

One of the things I think I like about your approach is that you are still able to retain a "look up" for a specific type. If you had specializations for the 400 Derived classes and each had its own corresponding type_index, you could just have a member function like:

Code: Select all

template<class T>
std::vector<Derived<T::type_index>>& TypeManager::GetObjects();
template<class T>
const std::vector<Derived<T::type_index>>& TypeManager::GetObjects()const;
That means your AddInstance method could change also.

Code: Select all

template<class T, class...Args>
void TypeManager::AddInstance(Args&&...args)
{
     // If this manager's index matches the type's index, construct the
     // object in place in this manager's table...
     if(num_types == T::type_index)
     {
          _table.emplace_back(std::forward<Args>(args)...);
          return;
     }

     // ...otherwise forward the arguments on to the manager that owns
     // the table for T.
     TypeManager<T::type_index>::AddInstance(std::forward<Args>(args)...);
}
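From the outside it might read something like this (hypothetical usage, assuming Enemy is one of the Derived specializations with its own type_index):

Code: Select all

TypeManager manager;
manager.AddInstance<Enemy>( /* constructor args */ );
for( auto& enemy : manager.GetObjects<Enemy>() )
{
     enemy.Update();
}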

The reason I bring this up is that I want to use this design pattern to have objects interact (collision between objects; AI that needs to know what it's colliding with, how to avoid collisions, or where the player is located, etc.).

My previous suggestion might lose a lot of that capability with types being erased. Anyway, just wanted to put that out there.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com
