Data Caching and Code Optimizations
Posted: January 3rd, 2017, 6:06 am
So lately I've been reading a little about code optimization in hopes of applying it to an RPG that I've been working on. I've come across the word "cache" a lot, and I've done quite a bit of research. For those of you who don't know what the cache is, it's essentially a small, fast memory unit that holds a small amount of data. It sits very close to the CPU and acts as an interface between the CPU and the system's main memory. It keeps copies of recently and frequently used data, so the CPU can read that data much faster than if it had to go all the way out to main memory. A diagram looks a little like this:
CPU -----> Caches -------------------> System memory
If the CPU finds the address it is looking for in the cache, that's a "cache hit". If it doesn't, that's a "cache miss": the data is fetched from system memory as a whole fixed-size block (a "cache line", commonly 64 bytes on current desktop CPUs), stored in the cache, and then handed to the program.
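To make the hit/miss idea concrete, here's a rough sketch of a toy direct-mapped cache in C++. Real CPU caches are set-associative, multi-level, and much more complicated; the class name ToyCache and its parameters (line size, number of lines) are made up purely for illustration. It only shows that a miss loads a whole line, so the nearby addresses that follow become hits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A hypothetical toy direct-mapped cache: each memory block ("line") can only
// live in one slot, chosen by lineNumber % numLines. Not a model of any real CPU.
class ToyCache {
public:
    ToyCache(std::size_t lineSize, std::size_t numLines)
        : lineSize_(lineSize), tags_(numLines), valid_(numLines, false) {
        assert(lineSize > 0 && numLines > 0);
    }

    // Returns true on a hit, false on a miss. On a miss the whole line that
    // contains `address` is (notionally) fetched from memory and stored.
    bool access(std::uint64_t address) {
        const std::uint64_t lineNumber = address / lineSize_;  // which memory block
        const std::size_t slot =
            static_cast<std::size_t>(lineNumber % tags_.size());  // where it must go
        if (valid_[slot] && tags_[slot] == lineNumber) {
            ++hits_;
            return true;
        }
        // Miss: evict whatever was in this slot and load the new line.
        valid_[slot] = true;
        tags_[slot] = lineNumber;
        ++misses_;
        return false;
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    std::size_t lineSize_;
    std::vector<std::uint64_t> tags_;  // which memory line each slot holds
    std::vector<bool> valid_;          // whether each slot holds anything yet
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

For example, with 64-byte lines, reading 256 consecutive bytes four bytes at a time gives 4 misses (one per line) and 60 hits. Two addresses that map to the same slot keep evicting each other, which is why access patterns matter and not just total data size.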
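Just as a clarifying note, there isn't just one cache attached to the CPU: most CPUs have several levels (L1, L2, and often L3), each larger but slower than the one before it. Caches also show up in many other places in hardware and software (including things like web caches, etc.).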
I was wondering if anyone knew about this and would like to have a discussion about it. I'm fairly new to the topic and thought that someone here might have some insight, since the forum seems knowledgeable across a wide range of subjects. Anyway, here's a brief list of things I've found that can help performance in many projects; they all help the cache hold more relevant data and access it faster.
1) Use smaller data types. Smaller values mean more of them fit in the cache at once, which matters because the L1 data cache is typically only a few dozen KB per core (often 32 KB).
2) Don't use linked lists; use arrays that store contiguous data where possible. When data is contiguous, each cache line the CPU loads contains several useful elements, and the hardware can prefetch the next lines before the code needs them. If the addresses are scattered (non-contiguous), as with separately allocated linked-list nodes, each access is more likely to be a cache miss and much of every loaded line goes unused.
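3) Avoid multithreading in simple applications if possible, as cores can fight over the same cache lines: when one core writes to a line that another core has cached, the other core's copy is invalidated, resulting in many more cache misses than normal. This can happen even when threads write to different variables that merely share a cache line ("false sharing").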
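4) Avoid the overhead of heavy OOP, and don't make EVERYTHING an object, even if it helps with organization. Lots of individually allocated objects reached through pointers (and virtual calls) tend to be scattered around memory, which works against the cache.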
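For item 1, here's a small sketch comparing two hypothetical layouts for the same entity data. The struct names, fields, and value ranges are assumptions for illustration, and the 64-byte line size is an assumption too (it's common, but some CPUs use 128).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical "wide" layout: 64-bit types everywhere.
struct WideEntity {
    double x, y;          // position
    std::int64_t health;  // assumed to never exceed a few thousand
    std::int64_t level;   // assumed to stay below 256
};

// Hypothetical "compact" layout: smallest types that cover the assumed ranges.
struct CompactEntity {
    float x, y;           // assumes float precision is enough for positions
    std::int16_t health;  // range -32768..32767
    std::uint8_t level;   // range 0..255
};

// Assumed cache-line size in bytes.
constexpr std::size_t kAssumedLineSize = 64;

// How many whole entities of a given size fit in one assumed cache line.
inline std::size_t entitiesPerLine(std::size_t entitySize) {
    assert(entitySize > 0 && entitySize <= kAssumedLineSize);
    return kAssumedLineSize / entitySize;
}
```

On typical 64-bit compilers, WideEntity is 32 bytes (2 per 64-byte line) and CompactEntity is 12 bytes including padding (5 per line), so a loop over compact entities touches fewer lines for the same work. Just make sure the smaller types really cover your value ranges.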
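For item 2, a minimal sketch of the same summing loop over a contiguous std::vector and a node-based std::list. The function names are made up; the containers are the standard library ones. Both compute the same result, and the difference is only in how memory is walked.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <vector>

// std::vector stores elements back to back, so (assuming 64-byte lines)
// each cache line holds 16 int32_t values and the loop reads memory in order.
std::int64_t sumContiguous(const std::vector<std::int32_t>& values) {
    std::int64_t total = 0;
    for (std::int32_t v : values) total += v;  // sequential access
    return total;
}

// std::list allocates each node separately, so every step follows a pointer
// to wherever that node happens to live in memory.
std::int64_t sumLinked(const std::list<std::int32_t>& values) {
    std::int64_t total = 0;
    for (std::int32_t v : values) total += v;  // pointer chasing, node by node
    return total;
}
```

If you time these with <chrono> on a few million elements, the vector version usually comes out ahead, though the gap depends on the machine and on how scattered the list nodes actually are (nodes allocated one after another can end up fairly close together).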
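For item 3, the usual fix when you do need threads is to keep each thread's frequently written data on its own cache line. Here's a rough sketch assuming a 64-byte line; the names PaddedCounter and countInParallel are hypothetical. (C++17 also offers std::hardware_destructive_interference_size in <new>, but not every standard library provides it, so a plain 64 is hardcoded here.)

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// alignas(64) forces each counter onto its own (assumed 64-byte) cache line,
// so threads updating neighbouring counters don't keep invalidating each
// other's copy of a shared line.
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Starts one thread per counter; each thread only writes its own counter.
// Relaxed atomics keep the compiler from folding the loop into one add,
// and ordering doesn't matter because each counter has a single writer.
std::vector<std::uint64_t> countInParallel(std::size_t numThreads, std::uint64_t iterations) {
    std::vector<PaddedCounter> counters(numThreads);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < numThreads; ++t) {
        workers.emplace_back([&counters, t, iterations] {
            for (std::uint64_t i = 0; i < iterations; ++i) {
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& w : workers) w.join();  // all writes finished after join

    std::vector<std::uint64_t> results;
    for (const PaddedCounter& c : counters) results.push_back(c.value.load());
    return results;
}
```

Removing alignas(64) packs several counters into one line; the results stay the same, but on many multi-core machines the loop gets noticeably slower because the cores keep stealing that line from each other.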
Those are just a few things I've found through my research. To me, it sounds like this pushes programmers away from OOP and toward a more data-oriented design. This is a challenge for someone like me who enjoys the flexibility and power of OOP. I've been looking for places to cut objects and to remove dependencies from complex relationships. If anyone has any knowledge of this subject area, I'd love to hear it.
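For anyone curious what the OOP vs. data-oriented difference looks like in code, here's a hedged sketch of one common data-oriented alternative, a "struct of arrays". All type and function names (Entity, Mover, Movers, updateAll) are hypothetical. The OOP version isn't wrong; its cache cost comes mostly from scattered heap allocations and an indirect call per entity, and SoA isn't automatically faster for every access pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// OOP style: each entity is a separate heap object behind a pointer,
// and update() is a virtual call.
struct Entity {
    virtual ~Entity() = default;
    virtual void update(float dt) = 0;
};

struct Mover : Entity {
    float x = 0, y = 0, vx = 0, vy = 0;
    void update(float dt) override { x += vx * dt; y += vy * dt; }
};

void updateAll(std::vector<std::unique_ptr<Entity>>& entities, float dt) {
    for (auto& e : entities) e->update(dt);  // pointer chase + indirect call each
}

// Data-oriented style (struct of arrays): each field lives in its own
// contiguous array, so the update loop streams straight through memory.
struct Movers {
    std::vector<float> x, y, vx, vy;

    void add(float px, float py, float pvx, float pvy) {
        x.push_back(px); y.push_back(py); vx.push_back(pvx); vy.push_back(pvy);
    }

    void updateAll(float dt) {
        assert(x.size() == vx.size() && y.size() == vy.size());
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }
};
```

You don't have to drop OOP entirely: a common middle ground is keeping objects for high-level game logic while storing the hot, per-frame data (positions, velocities, etc.) in contiguous arrays like Movers.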