Cache aware allocator

The Partridge Family were neither partridges nor a family. Discuss.
Post Reply
User avatar
cyboryxmen
Posts: 190
Joined: November 14th, 2014, 2:03 am

Cache aware allocator

Post by cyboryxmen » January 22nd, 2018, 9:39 am

Modern computers don't fetch memory byte by byte. Instead, they prefer to fetch large multi-byte chunks of memory called words. Specifically, most modern CPUs that you find in personal computers fetch memory as 64 byte words. The reason behind this size is that 64 bytes is the size of a cache line in the CPU's cache. Reading from RAM can be slow so CPUs prefer to read from their cache instead which is much faster. The cache is split into multiple cache lines and when it reads or writes to data that's not in the cache, it fetches it and puts it into a cache line.

With the way modern memory is hard coded, you have read a word from memory from a starting address that's a multiple of that word's size. For example, a 2 byte word has be read from the starting addresses of 0x00, 0x02, 0x04, 0x06, 0x08 etc. Likewise, a 4 byte word can only be read from the starting addresses 0x00, 0x04, 0x08 etc.

The concept of words only able to start at certain addresses is called alignment. A 4 byte word has to be aligned to 4 byte addresses and 8 byte words have to be aligned to 8 byte addresses. Making sure that your data is aligned properly in memory is important to optimising memory reads. Imagine if your 4 byte int is not aligned properly and crosses the boundary between two 4 byte words. The poor CPU has to load in those two words and shift their bits to the right alignment before they can be used! What if you have an array of these ints all misaligned? You'll need to do this extra work for every single one of them!

In particular, you would want your arrays to allocate according to the alignment of the cache line(which again, is 64 bytes for most commodity CPUs). This way, when the CPU fetches your array, it'll fit more nicely in cache and won't cause your cache to be filled with garbage data that you didn't intend on using. The standard unfortunately gives no guarantees to the alignment of data returned by new, delete and by association, std::vector.

Fortunately for us, the std::vector template allows you to input an allocator policy to customise how the vector allocates memory. I've created an allocator class that'll customise the vector to allocate to memory with a specific alignment. That alignment can be set to anything you want it to; whether that be the cache line or something even bigger! You can go check it out here!
Zekilk

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Cache aware allocator

Post by albinopapa » January 24th, 2018, 12:17 am

Yep, same concept when dealing with SIMD registers, though on modern CPUs, there's less of a penalty with unaligned access as far as SIMD loads and stores are concerned.

I think cameron did something similar a while back with a custom aligned allocator, actually. He used _aligned_malloc and _aligned_free though to manage memory allocation and deallocation.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

User avatar
cyboryxmen
Posts: 190
Joined: November 14th, 2014, 2:03 am

Re: Cache aware allocator

Post by cyboryxmen » June 21st, 2018, 6:48 am

Just wanted to update this post saying that Visual Studio 2017 suports C++17's new dynamic allocation for over-aligned types. new(), delete() and std::allocator allocates objects in accordance to their specified alignment. All you have to do is create a struct that's specified to align to the cache line and you're golden.

Code: Select all

constexpr auto cache_alignment = std::size_t { 64 };
class alignas ( cache_alignment ) Object
{
};

// std::vector uses std::allocator so it'll allocate to the object's specified alignment.
std::vector<Object> objects;
So I guess I'm never turning the [/std:c++17] switch off ever.
Last edited by cyboryxmen on June 21st, 2018, 8:14 am, edited 3 times in total.
Zekilk

User avatar
cyboryxmen
Posts: 190
Joined: November 14th, 2014, 2:03 am

Re: Cache aware allocator

Post by cyboryxmen » June 21st, 2018, 7:13 am

I've updated the repository to test this. See for yourself if it works properly
Zekilk

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Cache aware allocator

Post by albinopapa » June 21st, 2018, 7:57 am

shouldn't it be alignas? not alignof?

alignof is like sizeof, it reports what the alignment of a data type is...so wouldn't alignof(cache_alignment ) be the same as alignof( size_t )? which would be either 4 bytes or 8 bytes depending on the architecture you compile for. Plus, class alignof( cache_alignment )Object gives an intellisense error "Expected a definition or tag name".
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Cache aware allocator

Post by albinopapa » June 21st, 2018, 8:04 am

So I guess I'm never turning the [/std:c++17] switch off ever.
That's funny. You had said before you work on projects that aren't C++17 ready, so you couldn't use C++ features...I suppose for your personal projects. I do this for all mine as well. Between the RAII , constexpr, auto and move semantics stuff that came with C++11 ( unique_ptr ), and the extended constexpr stuff in C++17 it's been nice to work in C++. I've even delved into templates a lot further than I ever would have thanks to stuff like if constexpr. Oh, and lambdas, can't forget about lambdas.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

User avatar
cyboryxmen
Posts: 190
Joined: November 14th, 2014, 2:03 am

Re: Cache aware allocator

Post by cyboryxmen » June 21st, 2018, 8:14 am

Thought you were referring to my repository code and I got confused because I was pretty sure I checked to see if it ran before uploading it. Yeah that's just a slight mistake and I've already rectified it. My work has finally allowed me to dive into C++17 and I'm discovering all sorts of crazy features of the new standard. I've created a bunch of tests trying them out ever since!
Zekilk

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: Cache aware allocator

Post by albinopapa » June 21st, 2018, 4:53 pm

I was partially sure you weren't using it in live code lol. Just wanted to make sure someone running across the post didn't get the wrong information and cause frustration.

The next standard C++20 seems like it's going to have some stuff in it that seems kind of complex. Metaclasses, Reflection, Coroutines, Concepts, etc...
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

User avatar
chili
Site Admin
Posts: 3948
Joined: December 31st, 2011, 4:53 pm
Location: Japan
Contact:

Re: Cache aware allocator

Post by chili » June 23rd, 2018, 5:48 am

Concepts <3
Chili

Post Reply