Posted: November 13th, 2019, 10:20 pm
Hey all...new guy here.
First of all, Chili, your stuff is INSANELY good. I've seen probably most(?) of your youtube videos, and I'm freakin' amazed at how excellent they are. Don't change a thing. Freakin' brilliant. And funny. I'm planning on doing that Patreon thing once I get it figured out.
And a question for anyone who might have some insight into this...
My goal has been for a long time to learn about and write a very simple ray tracer that uses my GPU (GTX-1080ti). I've found some github stuff that implements the excellent "Raytracing in a Weekend" simple raytracer by Peter Shirley, but also incorporates CUDA, and was written by an NVIDIA engineer (Roger Allen). The problem for me is that both of these efforts merely print the image to a PPM text file, not directly to a window.
The thing that's got me a bit overwhelmed is trying to figure out how to merge that with Chili's super excellent Framework (gfx.PutPixel) and maybe incorporate the excellent IMGUI that Chili describes, so I can have a realtime raytracer with a nice GUI to move the camera, etc. But there are the choices of DirectX, CUDA, or maybe OpenGL, and so on, and all of that has me kinda scratching my head on which way to go.
Anyway, now with the gaming interest in the new NVIDIA RTX raytracing stuff (something else I'd like to learn about with all of this), I'm hoping that there's more stuff out there explaining how to make it all work.
Any insights would be greatly appreciated. I don't *think* that Chili has a specific raytracing video, right?
Anyway, thanks Chili for the incredible learning opportunities. I'm thinking this weekend I'll take a shot at integrating the Chili framework/windows into the code for the CUDA raytracer, and hope that I can do a gfx.PutPixel to put it on the screen without it all blowing up.
Posted: November 14th, 2019, 1:36 am
CUDA + Chili Framework are the only things you need for Raytracing. If you want, you can also try Vulkan and use Nvidia's raytracing extensions but that limits your software to Nvidia's raytracing cards.
Posted: November 14th, 2019, 10:44 am
DX12 - DXR is raytracing for any card, but Windows specific. You don't want to use the PutPixel function, that's for sure. There's also C++ AMP, which is a C++ front end with a Windows D3D back end.
There's OpenCL, which is a multi-platform GPGPU option (you can use it for RT, and it works on any OS and any GPU that supports OpenCL, which most do). Then use the Chili framework to copy the raytraced pixel data to the back buffer and present it with the Graphics::EndFrame() function.
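Copying the raytraced pixel data into the back buffer in one pass (instead of per-pixel PutPixel calls) might look something like this. This is just a sketch: the float-RGB framebuffer layout and the function names here are assumptions, and the 32-bit XRGB packing is what I'd expect the framework's sysbuffer to use, not something confirmed in the thread.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Pack one float-RGB sample (0.0..1.0) into a 32-bit XRGB pixel
// (assumed sysbuffer layout; the top byte is left as 0).
uint32_t PackPixel( float r, float g, float b )
{
    auto to8 = []( float c )
    {
        c = std::clamp( c, 0.0f, 1.0f );
        return static_cast<uint32_t>( c * 255.0f + 0.5f );
    };
    return (to8( r ) << 16) | (to8( g ) << 8) | to8( b );
}

// Copy a whole raytraced float framebuffer (rgb triples) into the
// 32-bit back buffer in one loop, instead of width*height PutPixel calls.
void BlitToBackBuffer( const std::vector<float>& fb,
                       std::vector<uint32_t>& backBuffer )
{
    const size_t nPixels = backBuffer.size();
    for( size_t i = 0; i < nPixels; ++i )
    {
        backBuffer[i] = PackPixel( fb[i * 3], fb[i * 3 + 1], fb[i * 3 + 2] );
    }
}
```

After a blit like this, a single EndFrame-style present pushes the whole frame at once.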
Posted: November 14th, 2019, 12:12 pm
Thanks. I've pretty much got the Chili framework set up, and I integrated about 50% of the ray tracing code into that solution, and it's working fine, rendering out a simple raytraced sphere. Though as you say it's slow as balls (especially when I increase the # of samples per pixel) since I'm using the serial PutPixel.
I'm thinking I'll try starting with the code written by the NVIDIA guy which combines the core raytracing code plus CUDA to have it all done on the GPU (which I've also got working fine), and next I'll add the Chili framework to get it to render to a window, at least as a starting point.
Worst case is that it will render and fill the framebuffer super fast via the GPU, then relatively slowly draw to the window using PutPixel. Not optimal, but at least a first step. And then figure out how to swap the FB directly to the screen with the framework. Though I recall that's what Chili already did in his advanced series, I just need to figure out how to replicate it.
Posted: November 14th, 2019, 6:21 pm
Cool!! I integrated the ray tracer with the Chili framework and came up with the basic raytraced render above. Had to tweak some stuff, but it seems to work fine now. This is with only 3 samples per pixel, and using the PutPixel function. It updates the screen every 4-5 seconds with 3 samples per pixel. Ideally, it should be a LOT more samples to get it less grainy, but it's a start.
So now if I can just integrate a graphics pipeline with, say, CUDA and do that framebuffer swapping thing it should be pretty cool. And of course add something like IMGUI for some control.
Posted: November 14th, 2019, 6:29 pm
And here's a yummy raytraced image with a much nicer 20 samples per pixel. Takes about 20 or so seconds to render, but hey, it's a start.
Posted: November 14th, 2019, 6:46 pm
Looks like it was using only 1 thread of my 8-core Ryzen, which I suppose makes sense since I didn't do any parallelizing stuff.
Ahhh...okay...lesson learned. I cranked it up to 100 samples and got the above render (nice and smooth) but also changed from Debug to Release, and each update to the render took only about 5 seconds, rather than what would be probably minutes in Debug. Sweet. And this is with PutPixel. Damn.
Posted: November 14th, 2019, 9:21 pm
Yeah, an 8 thread ray tracer in your case would be a theoretical 0.625 second frame render, which is still under 2 fps (1.6 fps). If you aren't using SIMD instructions, you could get some performance gains by doing so; that would boost it up to a theoretical ~6 fps.
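The back-of-envelope numbers above can be written out as a tiny scaling model. This assumes perfect (embarrassingly parallel) scaling, which a ray tracer gets close to since pixels are independent:

```cpp
// Ideal scaling estimate: a serial frame time split evenly over N threads.
double FrameSeconds( double serialSeconds, int threads )
{
    return serialSeconds / threads;
}

double Fps( double frameSeconds )
{
    return 1.0 / frameSeconds;
}

// FrameSeconds( 5.0, 8 ) gives 0.625 s per frame, i.e. 1.6 fps;
// a 4-wide SIMD path on top of that gives the ~6 fps mentioned above.
```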
There might be other ways to increase performance depending on how "naive" your implementation is. Are you using search and sorting algorithms to track objects and lights in the scene so you aren't iterating over every object N squared times, for instance? Is your data close together to allow the CPU cache to be used efficiently? These are the only two I could think of right now, but I'm sure there are other such optimizations that could carry over to GPU rendering to squeeze the most performance out.
Algorithms - complex algorithms require more cpu time
Data structure - data that is close together is prefetched and has quicker access
Instruction set - scalar x86/x64 vs SSE/AVX vector instructions
Branch prediction - complex or multiple if statements can cause the CPU to choose the wrong branch, requiring it to start over
Thread count - the more threads you have, the more potential data gets processed
Contention - Threads fighting for resources requires locking the resource so that a data race does not happen, locking resources is costly
Allocations - While allocations can be costly, deallocations can be more costly because the allocator has to look the pointer up in some list or table. This has been my experience anyway, but since you have to deallocate anything you allocate, most people just treat the allocation as the bottleneck.
These things will all affect the performance of your program a lot.
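The thread-count and contention points in the list above can be sketched together: split the image by rows across hardware threads, so each thread writes a disjoint slice of the buffer and no locking is needed. RenderPixel here is a hypothetical stand-in for the tracer's real per-pixel work:

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

// Stand-in for the per-pixel ray tracing work; a real tracer would
// shoot samplesPerPixel rays here. Returns a 32-bit XRGB value.
uint32_t RenderPixel( int x, int y )
{
    return (uint32_t( x & 0xFF ) << 16) | uint32_t( y & 0xFF );
}

// Hand out interleaved rows, one stride per hardware thread.
// Each thread writes only its own rows, so there's no contention
// and no locks (per the list above).
void RenderThreaded( std::vector<uint32_t>& fb, int width, int height )
{
    const unsigned nThreads =
        std::max( 1u, std::thread::hardware_concurrency() );
    std::vector<std::thread> pool;
    for( unsigned t = 0; t < nThreads; ++t )
    {
        pool.emplace_back( [&, t]
        {
            for( int y = int( t ); y < height; y += int( nThreads ) )
                for( int x = 0; x < width; ++x )
                    fb[size_t( y ) * width + x] = RenderPixel( x, y );
        } );
    }
    for( auto& th : pool ) th.join();
}
```

Interleaved rows also balance the load better than one big band per thread, since scene complexity usually varies from top to bottom of the frame.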
Posted: November 14th, 2019, 9:38 pm
Thanks. Yeah, my goal here really has little to do with performance, and more to do with learning: figuring out how to implement a raytracer, how to do a GPU graphics pipeline with a raytracer, and how to implement a GUI to allow me to control the scene contents in real time. I have pretty much zero interest in games and stuff, so fps really isn't on my list.
Right now I've got a ray tracing core, and it's rendering to a window, and now I need to figure out how to add the GPU and then maybe IMGUI. Though I'm still scratching my head about how to proceed. I'm hoping it won't be a big deal to hand over the Chili framework window handle or whatever to CUDA so the GPU can do its thing. Or maybe I'll try to get IMGUI implemented first, then go for the whole CUDA thing. Or maybe I should follow the Chili tutorials on Direct3D and IMGUI to get both of these accomplished.
Heck, I dunno....
Posted: November 14th, 2019, 10:00 pm
I don't think CUDA uses window handles. You would allocate a buffer on the GPU and have the image rendered to that, then copy the buffer to the Graphics::pSysBuffer (for instance), and that buffer will be copied to the D3D buffer on the GPU and displayed. This is probably a huge oversimplification, but it's the general idea.
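The host side of that handoff might look roughly like this. Note the hedges: DeviceToHostCopy is a CPU stand-in for cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToHost) so the sketch runs without a GPU, and the names PresentFrame and pSysBuffer's 32-bit pixel layout are assumptions, not confirmed framework details:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for cudaMemcpy( dst, src, bytes, cudaMemcpyDeviceToHost ).
// With a real GPU buffer, this call is the only line that would change.
void DeviceToHostCopy( uint32_t* dst, const uint32_t* src, size_t bytes )
{
    std::memcpy( dst, src, bytes );
}

// One frame of the pipeline described in the post above:
// 1. the GPU kernel renders into its own device buffer,
// 2. that buffer is copied down to a host staging buffer,
// 3. the staging buffer is copied into the framework's sysbuffer,
//    which Graphics::EndFrame() then pushes to the D3D texture.
void PresentFrame( const uint32_t* deviceBuffer,
                   std::vector<uint32_t>& staging,
                   uint32_t* pSysBuffer, size_t nPixels )
{
    DeviceToHostCopy( staging.data(), deviceBuffer,
                      nPixels * sizeof( uint32_t ) );
    std::memcpy( pSysBuffer, staging.data(),
                 nPixels * sizeof( uint32_t ) );
}
```

The round trip through host memory is the slow part; CUDA's D3D interop can map the D3D texture directly into CUDA and skip it, but the copy version above is the simpler first step.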