Pray Engine: 3D Point-Cloud Ray-Tracing Engine (with source)
-
- Posts: 4373
- Joined: February 28th, 2013, 3:23 am
- Location: Oklahoma, United States
Re: Pray Engine: 3D Point-Cloud Ray-Tracing Engine (with sou
Instead of CUDA, I would go with DirectCompute since it works on all DX 11 and higher cards, not just nVidia.
As far as gimbal lock, you would need to use quaternion rotations. Unfortunately, I've not been able to wrap my head around how they work or how to implement them; I guess I just haven't had it explained well enough. I know the basic concept of rotating by half the angle in one direction and half in another, and that it's represented as a hypersphere. I'm kind of thinking you rotate a normal around the X, Y and Z axes, then scale.
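For anyone following along, here is a minimal sketch of a quaternion rotation, not code from the engine. The names Vec3, Quat, fromAxisAngle, mul and rotate are made up for illustration. The half angle mentioned above shows up when the quaternion is built from an axis and an angle. The vector is then rotated with q * v * conjugate(q), so the orientation is stored as one quaternion instead of three separate X, Y and Z angles.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; };

// Build a unit quaternion for a rotation of `angle` radians about `axis`.
// The axis is assumed to be unit length already.
Quat fromAxisAngle(Vec3 axis, float angle) {
    assert(std::fabs(axis.x * axis.x + axis.y * axis.y + axis.z * axis.z - 1.0f) < 1e-3f);
    const float h = 0.5f * angle;          // the "half angle"
    const float s = std::sin(h);
    return { std::cos(h), axis.x * s, axis.y * s, axis.z * s };
}

// Hamilton product. mul(a, b) applies rotation b first, then a.
Quat mul(Quat a, Quat b) {
    return {
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w
    };
}

// Rotate v by the unit quaternion q: v' = q * (0, v) * conjugate(q).
Vec3 rotate(Quat q, Vec3 v) {
    const Quat p{ 0.0f, v.x, v.y, v.z };
    const Quat c{ q.w, -q.x, -q.y, -q.z };
    const Quat r = mul(mul(q, p), c);
    return { r.x, r.y, r.z };
}
```

Rotating (1, 0, 0) by 90 degrees about the Z axis this way should give roughly (0, 1, 0). Two rotations can be combined with mul before being applied to any points.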
As far as occlusion goes, you could start with bounding boxes. Transforming the bounding boxes to 2D screen space should make it easier to detect whether the object is visible; if it isn't, you can throw it out and not have to transform the rest of the vertices for that object. You may even be able to calculate which points need to be drawn based on that information as well.
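A rough sketch of the bounding box idea, under some assumptions that are not from the engine: the box is axis-aligned and already in camera space, the camera is a simple pinhole at the origin looking down +Z, and `focal` is in pixels. Box and boxMaybeVisible are hypothetical names. Note that this only rejects objects that fall off screen. Objects hidden behind other objects would need a depth comparison on top of it.

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float x, y, z; };
struct Box  { Vec3 min, max; };   // axis-aligned, in camera space

// Returns false only when the projected box is certainly outside the
// width x height viewport; anything doubtful is kept.
bool boxMaybeVisible(const Box& b, float focal, float width, float height,
                     float nearZ = 0.1f) {
    assert(focal > 0.0f && nearZ > 0.0f);
    if (b.max.z < nearZ) return false;   // entirely behind the camera
    if (b.min.z < nearZ) return true;    // straddles the near plane: keep it

    float minX = 1e30f, maxX = -1e30f, minY = 1e30f, maxY = -1e30f;
    for (int i = 0; i < 8; ++i) {        // visit the 8 corners
        const Vec3 c{ (i & 1) ? b.max.x : b.min.x,
                      (i & 2) ? b.max.y : b.min.y,
                      (i & 4) ? b.max.z : b.min.z };
        // Perspective divide, then shift the origin to the screen center.
        const float sx = c.x * focal / c.z + 0.5f * width;
        const float sy = c.y * focal / c.z + 0.5f * height;
        minX = std::min(minX, sx); maxX = std::max(maxX, sx);
        minY = std::min(minY, sy); maxY = std::max(maxY, sy);
    }
    // Overlap test between the projected rectangle and the viewport.
    return maxX >= 0.0f && minX < width && maxY >= 0.0f && minY < height;
}
```

If the function returns false, the whole object can be skipped before any of its points are transformed. The projected rectangle could also serve as a starting point for working out which screen region needs drawing.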
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com
Re: Pray Engine: 3D Point-Cloud Ray-Tracing Engine (with sou
The only reason I would use CUDA is because it's the only GPU compute language that seems to make intuitive sense to me and it should also be very fast.
-
- Posts: 4373
- Joined: February 28th, 2013, 3:23 am
- Location: Oklahoma, United States
Re: Pray Engine: 3D Point-Cloud Ray-Tracing Engine (with sou
Hey, can you check and see if this runs on your computer? It runs on mine, but that's not saying much, as the last SSE program I uploaded didn't work on others... probably Intel chips.
Some stuff got cut like the specular lighting by accident. Also, the view is off, it's rotated clockwise 90 degrees for some reason.
The biggest slowdown after all that is actually filling in the holes between points when you are super close or zoom in a lot.
- Attachments
-
- SSE_PRayEngine.zip
- Implemented SSE for rendering anyway.
- (1.6 MiB) Downloaded 167 times
Re: Pray Engine: 3D Point-Cloud Ray-Tracing Engine (with sou
Won't run for me, just crashes when I try to start it. I'm using an AMD FX-4300 Quad Core.
-
- Posts: 4373
- Joined: February 28th, 2013, 3:23 am
- Location: Oklahoma, United States
Re: Pray Engine: 3D Point-Cloud Ray-Tracing Engine (with sou
Dang it, I need more core architectures to work with.
Does it give you the option to debug when it crashes? Perhaps it would tell you where the problem is. I'll try it on my other 3 computers and see if I find anything.
- viruskiller
- Posts: 399
- Joined: June 14th, 2012, 5:07 pm
Re: Pray Engine: 3D Point-Cloud Ray-Tracing Engine (with sou
Just ran albinopapa's version and it seems to go a full 60 fps until about 3-4k focal length, then the fps jumps up and down depending on the box orientation.
Not quite clear, but at 6k focal length it seems to go OK on the red part, jump up on the part clipping into the camera, and then drop to 30-ish when starting on the back side.
For some reason the transform operation takes longer when showing the bar-coded side of the box :/
-
- Posts: 4373
- Joined: February 28th, 2013, 3:23 am
- Location: Oklahoma, United States
Re: Pray Engine: 3D Point-Cloud Ray-Tracing Engine (with sou
Ok, made some more changes to various things to force the matchbox to show correctly and I think the view is right side up...can't really tell with only one object on screen.
I've tried this on 2 other core architectures, an AMD APU from a laptop and an Intel Core 2 Duo, and it runs really well on both. The transform times are between 2 and 5 ms at start instead of 24, and that's on all 4 of my computers; 2 of them are Phenom II models, just at different speeds, and one uses DDR2 versus DDR3.
You must have a 64-bit machine, and if you recompile, you must compile as 64-bit or it will fail. I didn't use _aligned_malloc; it would crash if I used it while compiling in 64-bit mode, so for now I just use __declspec(align(16)) in front of classes and structs that need to be aligned, and I just use new for memory allocation.
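A small sketch of the alignment approach described above, written with standard C++ so it is not tied to one compiler. Vec4 here is a stand-in, not the engine's class, and isAligned16 and makePoints are hypothetical helpers. `alignas(16)` is the portable spelling of the MSVC alignment attribute. A likely reason plain new only works in the 64-bit build is that the 64-bit Windows heap returns 16-byte aligned blocks, while the 32-bit heap only guarantees 8 bytes; that is an assumption about the cause, not something confirmed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// 16-byte alignment allows aligned SSE loads and stores such as _mm_load_ps.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

static_assert(sizeof(Vec4) == 16, "Vec4 should be one SSE register wide");
static_assert(alignof(Vec4) == 16, "Vec4 should be 16-byte aligned");

// True when p sits on a 16-byte boundary.
bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Allocate a zero-initialized array of Vec4 with plain new[] and verify
// the alignment once, instead of crashing later inside an aligned load.
std::unique_ptr<Vec4[]> makePoints(std::size_t count) {
    std::unique_ptr<Vec4[]> points(new Vec4[count]());
    assert(isAligned16(points.get()));
    return points;
}
```

Checking the pointer once at allocation time makes a misaligned build fail with a clear message. If _aligned_malloc is used instead, the memory has to be released with _aligned_free rather than free or delete.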
Anyway, here are all the changes I made that I can remember in no particular order.
-Compiled using x64.
-Moved all the rendering code to a Render class, just to clean up the ComposeFrame function.
-Reduced the number of if statements that check for offscreen points; I was able to use SSE to do most of the calculations for boundaries and clipping.
-Now offsets the pixels on load instead of getting the offset from center for every point every frame. This took me a while to get the calculations in the right order.
-For some things I decided to use the DirectXMath library, as I've not figured out how to do the inverse of a matrix and my rotations were off a bit, so I used it to create the rotation matrices from the camera and object orientation vectors.
-Added a vec4 class to use with the matrix class, as I was getting confused about how to do the math with a 3x3 matrix for rotation and translation. Another reason is that the Vect class was only 12 bytes, and I wanted to align to 16-byte boundaries for better speed.
-Also, on load, after offsetting all the vertices I moved them to an array, which I think will allow me to make even a few more tweaks for speed.
-Added a backup case for creating a d3d9 device, as the Core 2 Duo I have doesn't support Hardware Vertex Processing and Pure Device since it only has onboard graphics.
-Optimized the memory copy from system to video card using SSE to transfer 16 bytes at a time instead of 4 bytes at a time using memcpy.
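To illustrate the offscreen check from the list above, here is a sketch of testing four points against the screen bounds in one go with SSE. It is not the engine's code, and insideMask4 is a made-up name. It assumes the x and y coordinates are stored in separate arrays. A scalar fallback is included so it still builds where SSE is unavailable.

```cpp
#include <cassert>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// x and y each point at four screen coordinates. Returns a 4-bit mask;
// bit i is set when point i lies inside [0, width) x [0, height).
int insideMask4(const float* x, const float* y, float width, float height) {
    assert(x != nullptr && y != nullptr);
#if HAVE_SSE
    const __m128 vx = _mm_loadu_ps(x);     // unaligned loads, safe for any pointer
    const __m128 vy = _mm_loadu_ps(y);
    const __m128 zero = _mm_setzero_ps();
    // Each compare yields all-ones or all-zeros per lane; AND them together.
    __m128 ok = _mm_and_ps(_mm_cmpge_ps(vx, zero),
                           _mm_cmplt_ps(vx, _mm_set1_ps(width)));
    ok = _mm_and_ps(ok, _mm_and_ps(_mm_cmpge_ps(vy, zero),
                                   _mm_cmplt_ps(vy, _mm_set1_ps(height))));
    return _mm_movemask_ps(ok);            // collect the sign bit of each lane
#else
    int mask = 0;
    for (int i = 0; i < 4; ++i)
        if (x[i] >= 0.0f && x[i] < width && y[i] >= 0.0f && y[i] < height)
            mask |= 1 << i;
    return mask;
#endif
}
```

A mask of 0 means all four points can be skipped with a single branch, and a mask of 15 means all four can be drawn without further checks. With 16-byte aligned data, _mm_load_ps could replace the unaligned loads.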
Some things that need to be re-added:
-ability to distinctly render more than one item. What I mean is, right now you could add all the vertices from all objects in at once and render them. You would be relying on the vertex shader code to clip everything which would be slow.
-ability to render more than one light. I'm not sure this is broken, just have it statically set to one light for testing.
-specular lighting <- was removed by accident, should still be there just commented out.
I think that's it, hope it works.
- Attachments
-
- SSE_PRayEngine_preAlpha.zip
- Aug 13, more optimizations
- (1.61 MiB) Downloaded 146 times
Re: Pray Engine: 3D Point-Cloud Ray-Tracing Engine (with sou
Yeah that alpha upload is working for me, very nice. Lots of things broken though.