RPU: A Programmable Ray Processing Unit for Realtime Ray Tracing

Sven Woop, Jörg Schmittler, and Philipp Slusallek

accepted for: SIGGRAPH 2005
PDF (4.8 MB), BibTeX (full version only in conference proceedings)


The RPU is a fully programmable ray tracing hardware architecture with support for programmable materials, geometry, and lighting. It combines the efficiency of GPUs with the advantages of ray tracing. The instruction set of the RPU is GPU-like, which is well suited to shading. In addition, the RPU supports fast ray traversal through a k-D tree using a dedicated hardware unit, as well as recursive function calls, which are useful for recursive ray tracing. To increase efficiency, rays are always handled in packets of four, and multi-threading allows for high utilization of the hardware units.
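To illustrate how a packet of four rays can share one k-D tree traversal decision, here is a minimal Python sketch. It is not the RPU's hardware logic: the function names `classify_ray` and `packet_children`, the tuple layout of a ray, and the endpoint-based plane test are all assumptions made for this example. The idea shown is only that a packet descends into a child node whenever at least one of its rays needs that child.

```python
def classify_ray(origin, direction, t_min, t_max, axis, split):
    """Return (needs_left, needs_right) for the ray segment [t_min, t_max].

    The segment's two end points are projected onto the split axis; the
    segment touches the left child if its smaller coordinate is <= split
    and the right child if its larger coordinate is >= split.
    """
    a = origin[axis] + t_min * direction[axis]
    b = origin[axis] + t_max * direction[axis]
    needs_left = min(a, b) <= split
    needs_right = max(a, b) >= split
    return needs_left, needs_right


def packet_children(rays, axis, split):
    """Decide which children a whole packet has to visit.

    `rays` is a list of (origin, direction, t_min, t_max) tuples
    (hypothetical layout). The packet visits the union of the children
    required by its individual rays.
    """
    flags = [classify_ray(o, d, t0, t1, axis, split) for (o, d, t0, t1) in rays]
    visit_left = any(left for left, _ in flags)
    visit_right = any(right for _, right in flags)
    return visit_left, visit_right


# A packet of four rays tested against the plane x = 1.
packet = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.0, 10.0),   # crosses the plane
    ((0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), 0.0, 10.0),  # moves away from it
    ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.0, 10.0),   # parallel to it
    ((2.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.0, 10.0),   # starts behind it
]
per_ray = [classify_ray(o, d, t0, t1, 0, 1.0) for (o, d, t0, t1) in packet]
visit = packet_children(packet, 0, 1.0)
```

In this example only the first ray needs both children, yet the packet as a whole visits both. Rays that do not need a child would simply be masked out while the packet traverses it, which is why packets pay off most when the four rays are coherent.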

A working prototype of this hardware architecture has been developed based on FPGA technology. The ray tracing performance of the FPGA prototype running at 66 MHz is comparable to the OpenRT ray tracing performance of a Pentium 4 clocked at 2.6 GHz, even though the memory bandwidth available to our RPU prototype is only about 350 MB/s. These numbers show the efficiency of the design, and one can estimate the performance levels reachable with today's high-end ASIC technology: high-end graphics cards from NVIDIA provide 23 times more programmable floating point performance and 100 times more memory bandwidth than our prototype. The prototype can be parallelized across several FPGAs, each holding a copy of the scene. A setup with two FPGAs, delivering twice the performance of a single FPGA, is running in our lab. Scalability up to 4 FPGAs has been tested.

Screenshots are presented at 1024x768 resolution, with oversampling turned on for most scenes. Please note that all lights, shadows, and reflections are computed and that no light maps or environment maps have been used. Detailed measurements of all scenes can be found in the paper.

The following video was computed in realtime on two FPGA cards.
Video: 36 MB, MPEG-4, 512 x 384



Spheres: 15,000 triangles, 6 objects

Some spheres bouncing around. The spheres are intersected analytically by a special geometry shader. The caustic in the center is not physically correct; it is approximated by a kind of shadow shader.
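As an illustration of what intersecting a sphere analytically means, the following Python sketch solves the standard ray-sphere quadratic. It is not the RPU geometry shader itself, which is written for the RPU instruction set; the function name `intersect_sphere`, its parameters, and the `eps` self-intersection offset are assumptions made for this example.

```python
import math


def intersect_sphere(origin, direction, center, radius, eps=1e-6):
    """Return the nearest hit distance t > eps along the ray, or None.

    Substituting the ray origin + t * direction into the sphere equation
    |p - center|^2 = radius^2 gives a quadratic a*t^2 + b*t + c = 0.
    """
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    t_near = (-b - root) / (2.0 * a)
    t_far = (-b + root) / (2.0 * a)
    if t_near > eps:
        return t_near  # ray starts outside and hits the front surface
    if t_far > eps:
        return t_far   # ray starts inside the sphere
    return None        # the sphere lies entirely behind the ray


hit = intersect_sphere((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), 1.0)
miss = intersect_sphere((0.0, 3.0, -5.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), 1.0)
inside = intersect_sphere((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), 1.0)
```

Compared with a triangulated sphere, an analytic test of this kind yields an exact, smooth surface from a single primitive.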

Porsche: 82,836 triangles, 1 object

A Porsche 996 model with realistic car paint and glass shaders. One can see the correct reflection of the environment on the car.


Quake3-p: 52,790 triangles, 17 objects

This scene shows some animated monsters running around and casting correct shadows.


Gael: 52,479 triangles, 1 object

Scene taken from UT2003 illuminated by a light source.


Conference: 282,805 triangles, 54 objects

A conference room. Each chair is a separate object and can thus be moved around. The last image shows the edge filtering performed for adaptive oversampling: additional rays are shot dynamically only for the detected edges, to achieve high image quality.
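The idea behind edge-driven adaptive oversampling can be sketched in a few lines of Python. The edge filter used on the RPU is not described here, so the neighbor-difference test, the threshold, and the names `detect_edges` and `samples_per_pixel` below are assumptions chosen only to illustrate the principle of spending extra rays where an edge was detected.

```python
def detect_edges(image, threshold):
    """Flag pixels whose intensity differs strongly from a neighbor.

    `image` is a list of rows of grayscale values. Each pixel is compared
    with its right and bottom neighbor; both pixels of a pair are flagged
    when their absolute difference exceeds `threshold`.
    """
    height, width = len(image), len(image[0])
    edges = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for dx, dy in ((1, 0), (0, 1)):
                nx, ny = x + dx, y + dy
                if nx < width and ny < height:
                    if abs(image[y][x] - image[ny][nx]) > threshold:
                        edges[y][x] = True
                        edges[ny][nx] = True
    return edges


def samples_per_pixel(edges, extra):
    """One primary ray per pixel, plus `extra` rays on flagged pixels."""
    return [[1 + extra if flagged else 1 for flagged in row] for row in edges]


# A 4x2 image with a vertical edge between columns 1 and 2.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
edges = detect_edges(image, threshold=0.5)
counts = samples_per_pixel(edges, extra=3)
total_rays = sum(sum(row) for row in counts)
```

In this toy image only the four pixels next to the edge receive extra rays, so 20 rays are traced in total instead of the 32 that uniform four-fold oversampling would need.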



Scene 6: 806 triangles, 1 object

A very simple scene with a single point light source.




Sven Woop, 28.06.05