Cache aware allocator
- cyboryxmen
- Posts: 190
- Joined: November 14th, 2014, 2:03 am
Cache aware allocator
Modern computers don't fetch memory byte by byte. Instead, they prefer to fetch large multi-byte chunks of memory called words. Specifically, most modern CPUs that you find in personal computers fetch memory as 64 byte words. The reason behind this size is that 64 bytes is the size of a cache line in the CPU's cache. Reading from RAM can be slow so CPUs prefer to read from their cache instead which is much faster. The cache is split into multiple cache lines and when it reads or writes to data that's not in the cache, it fetches it and puts it into a cache line.
With the way modern memory is hard coded, you have read a word from memory from a starting address that's a multiple of that word's size. For example, a 2 byte word has be read from the starting addresses of 0x00, 0x02, 0x04, 0x06, 0x08 etc. Likewise, a 4 byte word can only be read from the starting addresses 0x00, 0x04, 0x08 etc.
The concept of words only able to start at certain addresses is called alignment. A 4 byte word has to be aligned to 4 byte addresses and 8 byte words have to be aligned to 8 byte addresses. Making sure that your data is aligned properly in memory is important to optimising memory reads. Imagine if your 4 byte int is not aligned properly and crosses the boundary between two 4 byte words. The poor CPU has to load in those two words and shift their bits to the right alignment before they can be used! What if you have an array of these ints all misaligned? You'll need to do this extra work for every single one of them!
In particular, you would want your arrays to allocate according to the alignment of the cache line(which again, is 64 bytes for most commodity CPUs). This way, when the CPU fetches your array, it'll fit more nicely in cache and won't cause your cache to be filled with garbage data that you didn't intend on using. The standard unfortunately gives no guarantees to the alignment of data returned by new, delete and by association, std::vector.
Fortunately for us, the std::vector template allows you to input an allocator policy to customise how the vector allocates memory. I've created an allocator class that'll customise the vector to allocate to memory with a specific alignment. That alignment can be set to anything you want it to; whether that be the cache line or something even bigger! You can go check it out here!
With the way modern memory is hard coded, you have read a word from memory from a starting address that's a multiple of that word's size. For example, a 2 byte word has be read from the starting addresses of 0x00, 0x02, 0x04, 0x06, 0x08 etc. Likewise, a 4 byte word can only be read from the starting addresses 0x00, 0x04, 0x08 etc.
The concept of words only able to start at certain addresses is called alignment. A 4 byte word has to be aligned to 4 byte addresses and 8 byte words have to be aligned to 8 byte addresses. Making sure that your data is aligned properly in memory is important to optimising memory reads. Imagine if your 4 byte int is not aligned properly and crosses the boundary between two 4 byte words. The poor CPU has to load in those two words and shift their bits to the right alignment before they can be used! What if you have an array of these ints all misaligned? You'll need to do this extra work for every single one of them!
In particular, you would want your arrays to allocate according to the alignment of the cache line(which again, is 64 bytes for most commodity CPUs). This way, when the CPU fetches your array, it'll fit more nicely in cache and won't cause your cache to be filled with garbage data that you didn't intend on using. The standard unfortunately gives no guarantees to the alignment of data returned by new, delete and by association, std::vector.
Fortunately for us, the std::vector template allows you to input an allocator policy to customise how the vector allocates memory. I've created an allocator class that'll customise the vector to allocate to memory with a specific alignment. That alignment can be set to anything you want it to; whether that be the cache line or something even bigger! You can go check it out here!
Zekilk
-
- Posts: 4373
- Joined: February 28th, 2013, 3:23 am
- Location: Oklahoma, United States
Re: Cache aware allocator
Yep, same concept when dealing with SIMD registers, though on modern CPUs, there's less of a penalty with unaligned access as far as SIMD loads and stores are concerned.
I think cameron did something similar a while back with a custom aligned allocator, actually. He used _aligned_malloc and _aligned_free though to manage memory allocation and deallocation.
I think cameron did something similar a while back with a custom aligned allocator, actually. He used _aligned_malloc and _aligned_free though to manage memory allocation and deallocation.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com
- cyboryxmen
- Posts: 190
- Joined: November 14th, 2014, 2:03 am
Re: Cache aware allocator
Just wanted to update this post saying that Visual Studio 2017 suports C++17's new dynamic allocation for over-aligned types. new(), delete() and std::allocator allocates objects in accordance to their specified alignment. All you have to do is create a struct that's specified to align to the cache line and you're golden.
So I guess I'm never turning the [/std:c++17] switch off ever.
Code: Select all
constexpr auto cache_alignment = std::size_t { 64 };
class alignas ( cache_alignment ) Object
{
};
// std::vector uses std::allocator so it'll allocate to the object's specified alignment.
std::vector<Object> objects;
Last edited by cyboryxmen on June 21st, 2018, 8:14 am, edited 3 times in total.
Zekilk
- cyboryxmen
- Posts: 190
- Joined: November 14th, 2014, 2:03 am
Re: Cache aware allocator
I've updated the repository to test this. See for yourself if it works properly
Zekilk
-
- Posts: 4373
- Joined: February 28th, 2013, 3:23 am
- Location: Oklahoma, United States
Re: Cache aware allocator
shouldn't it be alignas? not alignof?
alignof is like sizeof, it reports what the alignment of a data type is...so wouldn't alignof(cache_alignment ) be the same as alignof( size_t )? which would be either 4 bytes or 8 bytes depending on the architecture you compile for. Plus, class alignof( cache_alignment )Object gives an intellisense error "Expected a definition or tag name".
alignof is like sizeof, it reports what the alignment of a data type is...so wouldn't alignof(cache_alignment ) be the same as alignof( size_t )? which would be either 4 bytes or 8 bytes depending on the architecture you compile for. Plus, class alignof( cache_alignment )Object gives an intellisense error "Expected a definition or tag name".
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com
-
- Posts: 4373
- Joined: February 28th, 2013, 3:23 am
- Location: Oklahoma, United States
Re: Cache aware allocator
That's funny. You had said before you work on projects that aren't C++17 ready, so you couldn't use C++ features...I suppose for your personal projects. I do this for all mine as well. Between the RAII , constexpr, auto and move semantics stuff that came with C++11 ( unique_ptr ), and the extended constexpr stuff in C++17 it's been nice to work in C++. I've even delved into templates a lot further than I ever would have thanks to stuff like if constexpr. Oh, and lambdas, can't forget about lambdas.So I guess I'm never turning the [/std:c++17] switch off ever.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com
- cyboryxmen
- Posts: 190
- Joined: November 14th, 2014, 2:03 am
Re: Cache aware allocator
Thought you were referring to my repository code and I got confused because I was pretty sure I checked to see if it ran before uploading it. Yeah that's just a slight mistake and I've already rectified it. My work has finally allowed me to dive into C++17 and I'm discovering all sorts of crazy features of the new standard. I've created a bunch of tests trying them out ever since!
Zekilk
-
- Posts: 4373
- Joined: February 28th, 2013, 3:23 am
- Location: Oklahoma, United States
Re: Cache aware allocator
I was partially sure you weren't using it in live code lol. Just wanted to make sure someone running across the post didn't get the wrong information and cause frustration.
The next standard C++20 seems like it's going to have some stuff in it that seems kind of complex. Metaclasses, Reflection, Coroutines, Concepts, etc...
The next standard C++20 seems like it's going to have some stuff in it that seems kind of complex. Metaclasses, Reflection, Coroutines, Concepts, etc...
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com