2007-04-06

2007-04-05 Fun With Matrices

There is nothing interesting to talk about today, so I won't even try. Instead, I will write about transformation matrices. Not very interesting, I admit, but I have this scratch sheet on my desk that I want to throw away without losing the information it contains. That's why I decided to simply blog it. Enjoy.

Suppose you have a camera, a tripod, an angle gauge and a measuring tape. You take a photo of something and measure exactly where the camera is standing and the direction in which it is looking, relative to a certain reference object. This is usually done in professional advertising photography and of course in Hollywood movies, when computer-generated elements are added to the scene in post production. Needless to say, in reality the whole measurement is done by a computer.
If, for some reason, no measurement took place (handheld cameras, inaccessible environment or just laziness), it can be done by software, too. In fact, a Technical Oscar was given to the guys who first figured it out for Lord Of The Rings. Nowadays it has become much simpler. There is free software out there that can compute the position of the camera relative to the photographed objects, as long as the user provides some information like the size of the object. What the software usually returns is a matrix that contains all the necessary information. I'm not going into detail here, but I'm talking about the OpenGL modelview matrix. If you're not into nerdy talk, this is where you might want to stop reading. Seriously.

Here is how to get yaw, pitch, roll, direction, distance and height of the camera from this modelview matrix.

Let M be the 4x4 column-major modelview matrix, with M0 to M15 denoting its entries in memory order (M0 to M3 are the first column, M12 to M15 the last). Yaw is defined as the rotation of the looking-axis around the y-axis. Pitch is the angle between the looking-axis and its projection onto the xz-plane. Roll is the rotation around the looking-axis. Direction is the angle of the camera's position around the y-axis, as seen from the reference object. The distance is actually the distance projected onto the xz-plane, and the height is the camera's distance to the xz-plane. Here we go:

M = translate( t ) * rotate( roll, z-axis ) * rotate( pitch, x-axis ) * rotate( yaw, y-axis )

M0 = cos(roll)*cos(yaw) - sin(roll)*sin(pitch)*sin(yaw)
M1 = sin(roll)*cos(yaw) + cos(roll)*sin(pitch)*sin(yaw)
M2 = -cos(pitch)*sin(yaw)
M3 = 0
M4 = -sin(roll)*cos(pitch)
M5 = cos(roll)*cos(pitch)
M6 = sin(pitch)
M7 = 0
M8 = cos(roll)*sin(yaw) + sin(roll)*sin(pitch)*cos(yaw)
M9 = sin(roll)*sin(yaw) - cos(roll)*sin(pitch)*cos(yaw)
M10 = cos(pitch)*cos(yaw)
M11 = 0
M12 = tx
M13 = ty
M14 = tz
M15 = 1
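
A quick way to double-check entries like these is to multiply the three rotations numerically instead of by hand. The following is only a sketch: the function names (rot_x, rot_y, rot_z, matmul3, modelview) are made up for this post, angles are assumed to be in radians, and the rotations are assumed to follow the usual right-handed OpenGL convention.

```python
import math

def rot_x(a):
    """3x3 rotation about the x-axis (row-major nested lists)."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(a):
    """3x3 rotation about the y-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(a):
    """3x3 rotation about the z-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul3(a, b):
    """Plain 3x3 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def modelview(yaw, pitch, roll, t):
    """Return the 16 entries M0..M15 in column-major order."""
    r = matmul3(rot_z(roll), matmul3(rot_x(pitch), rot_y(yaw)))
    m = []
    for col in range(3):
        # Each column of the rotation, padded with a 0 in the fourth row.
        m += [r[0][col], r[1][col], r[2][col], 0.0]
    # Fourth column: the translation t and the homogeneous 1.
    m += [float(t[0]), float(t[1]), float(t[2]), 1.0]
    return m
```

For example, modelview(0.3, 0.2, 0.1, (1, 2, 3))[6] comes out as sin(0.2), which matches M6 = sin(pitch).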

As intimidating as it looks, we only need to pick a few of these entries and invert the trigonometric functions in them. That's all there is to it.

But first some observations:
- The rotation part R, the upper left 3x3 block of M, is orthonormal, which means that its column vectors are normalized and mutually orthogonal. Hence R^-1 = R^T.
- We run into a gimbal lock if cos(pitch) is zero, i.e. if we look straight up or down. Even worse, atan2 is numerically unstable if cos(pitch) is close to zero, because both of its arguments are then close to zero.
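
Both observations can be turned into sanity checks before trusting any extracted angles. This is just a sketch under my own assumptions: the names is_orthonormal and near_gimbal_lock and the tolerance values are invented here, and m is assumed to be a flat list of the 16 column-major entries.

```python
import math

def is_orthonormal(m, tol=1e-9):
    """Check that the three rotation columns have unit length and are
    mutually orthogonal (tol is an arbitrary tolerance)."""
    cols = [m[0:3], m[4:7], m[8:11]]
    for i in range(3):
        for j in range(3):
            dot = sum(a * b for a, b in zip(cols[i], cols[j]))
            expected = 1.0 if i == j else 0.0
            if abs(dot - expected) > tol:
                return False
    return True

def near_gimbal_lock(m, eps=1e-6):
    """hypot(M2, M10) equals |cos(pitch)|, so a tiny value means the
    camera looks almost straight up or down (eps is arbitrary)."""
    return math.hypot(m[2], m[10]) < eps
```

If near_gimbal_lock returns True, yaw and roll can no longer be told apart, so it is probably safer to fix one of them (say roll = 0) than to trust atan2.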

And the solution is:
yaw = atan2( -M2, M10 )
pitch = asin( M6 )
roll = atan2( -M4, M5 )
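
In code, the three formulas above might look like this. It is a minimal sketch, assuming m is a flat list of the 16 column-major entries; the function name euler_from_modelview is made up, the result is in radians, and the clamping of M6 is my own precaution against rounding errors rather than part of the formulas.

```python
import math

def euler_from_modelview(m):
    """Return (yaw, pitch, roll) in radians from M0..M15."""
    yaw = math.atan2(-m[2], m[10])
    # Clamp to [-1, 1] so that rounding noise cannot make asin fail.
    pitch = math.asin(max(-1.0, min(1.0, m[6])))
    roll = math.atan2(-m[4], m[5])
    return yaw, pitch, roll
```

Use math.degrees on the results if the rendering software expects degrees.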

For the translation part we need -R^-1 t = -R^T t, where R is the upper left 3x3 part of M and t is the translation part of M. This is the position of the camera in the coordinate system of the reference object.

x = -M0*M12 - M1*M13 - M2*M14
y = -M4*M12 - M5*M13 - M6*M14
z = -M8*M12 - M9*M13 - M10*M14

direction = -atan2( x, z )
distance = sqrt( x^2 + z^2 )
height = y
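
The translation part can be sketched the same way. Again, the function name camera_placement is made up, m is assumed to be a flat list of the 16 column-major entries, and direction comes back in radians.

```python
import math

def camera_placement(m):
    """Return (direction, distance, height) from M0..M15."""
    tx, ty, tz = m[12], m[13], m[14]
    # Camera position: minus the transposed rotation applied to t.
    x = -(m[0] * tx + m[1] * ty + m[2] * tz)
    y = -(m[4] * tx + m[5] * ty + m[6] * tz)
    z = -(m[8] * tx + m[9] * ty + m[10] * tz)
    direction = -math.atan2(x, z)
    distance = math.hypot(x, z)  # distance projected onto the xz-plane
    height = y
    return direction, distance, height
```

With an identity rotation and t = (0, -2, -5), the camera sits at (0, 2, 5), so this gives a distance of 5 and a height of 2.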

These data can comfortably be fed into rendering software to add artificial objects to a photorealistic scene.

I'm awfully sorry that I couldn't make it sound interesting. I had a professor once who could. Computer graphics by itself can be absolutely cool, but he made it drop-dead gorgeous.
