Lec 3. Transformation

2025-10-11

Transformations and homogeneous coordinates

Basic idea: $f$ transforms point $x$ to point $f(x)$ .

Transformation

Linear transformations: cheap to compute.
Composition of linear transformations is still linear.
Scale
- Uniform scale
  $S_a(x) = ax$
- Non-uniform scale
  $S_s = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}$
Rotation
$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$
Preserves length and angle.
$\|R_\theta x\| = \|x\|$
Translation (not linear)
$T_b(x) = x + b$
Reflection
$Re_{x} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \quad Re_{y} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$
Shear
$H_{xs} = \begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix}, \quad H_{ys} = \begin{bmatrix} 1 & 0 \\ s & 1 \end{bmatrix}$

Summary of transformations

Linear: scale, rotation, reflection, shear
Non-linear: translation
Affine: linear + translation
$f(x) = g(x) + b$
Euclidean (isometric)
- Translation, rotation, reflection
- Preserves length and angle
  $\|f(x) - f(y)\| = \|x - y\|$
Rigid body transformations are distance-preserving motions that also preserve orientation.
Composition: order of operations matters.
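
To see why order matters, the following minimal sketch composes a rotation and a non-uniform scale in both orders and applies each result to the same point. The helper names (`matmul`, `apply`, `rotation`, `scale`) are illustrative and not part of the lecture.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, p):
    """Apply matrix m to a point p given as a flat list."""
    return [sum(m[i][k] * p[k] for k in range(len(p))) for i in range(len(m))]

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def scale(sx, sy):
    return [[sx, 0.0], [0.0, sy]]

R = rotation(math.pi / 2)   # rotate 90 degrees counter-clockwise
S = scale(2.0, 1.0)         # stretch x by a factor of 2

p = [1.0, 0.0]
scale_then_rotate = apply(matmul(R, S), p)  # S acts first, then R
rotate_then_scale = apply(matmul(S, R), p)  # R acts first, then S

print(scale_then_rotate)  # approximately [0, 2]
print(rotate_then_scale)  # approximately [0, 1]
```

The rightmost matrix in a product is the one applied to the point first.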

Homogeneous coordinates (in 2D)

Idea: represent 2D points with THREE values.
$\begin{bmatrix} x \\ y \end{bmatrix} \rightarrow \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$
Transforms are represented as $3\times 3$ matrices.
This lets us encode translations as linear transformations (matrix multiplications).
Recover final 2D point by dividing by last coordinate.
$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} \rightarrow \begin{bmatrix} x'/w' \\ y'/w' \end{bmatrix}$
Scale and rotation
$S_s = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$
Translation with 2D-H
$T_b = \begin{bmatrix} 1 & 0 & b_x \\ 0 & 1 & b_y \\ 0 & 0 & 1 \end{bmatrix}$
Moving to 3D (and 3D-H)
- Scale
  $S_s = \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$
- Shear (in x, based on y and z)
  $H_{x, d} = \begin{bmatrix} 1 & d_y & d_z & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$
- Translation
  $T_b = \begin{bmatrix} 1 & 0 & 0 & b_x \\ 0 & 1 & 0 & b_y \\ 0 & 0 & 1 & b_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$
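
As a small illustration of the 2D case, the sketch below lifts a point to homogeneous coordinates, applies a rotation followed by a translation as a single $3\times 3$ matrix, and divides by the last coordinate to recover the 2D result. Function names are hypothetical and chosen only for this example.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, p):
    """Apply matrix m to a homogeneous point p."""
    return [sum(m[i][k] * p[k] for k in range(len(p))) for i in range(len(m))]

def translation(bx, by):
    return [[1.0, 0.0, bx], [0.0, 1.0, by], [0.0, 0.0, 1.0]]

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def to_homogeneous(x, y):
    return [x, y, 1.0]

def from_homogeneous(p):
    x, y, w = p
    return [x / w, y / w]  # divide by the last coordinate

# Rotate by 90 degrees, then translate by (3, 0): one matrix does both.
M = matmul(translation(3.0, 0.0), rotation(math.pi / 2))
q = from_homogeneous(apply(M, to_homogeneous(1.0, 0.0)))
print(q)  # approximately [3, 1]
```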

Inverse transformation

$M^{-1}$ is the inverse of transform $M$ in both a matrix and geometric sense.
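
As a quick worked check against the 2D homogeneous matrices defined earlier (these closed forms are standard identities, not taken from the lecture itself):
$T_b^{-1} = T_{-b}, \quad R_\theta^{-1} = R_{-\theta} = R_\theta^T, \quad S_s^{-1} = \begin{bmatrix} 1/s_x & 0 & 0 \\ 0 & 1/s_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \ (s_x, s_y \neq 0)$
For a composite transform, the inverse undoes each step in reverse order:
$(T_b R_\theta)^{-1} = R_\theta^{-1} T_b^{-1} = R_{-\theta} T_{-b}$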

Rotation and $SO(n)$

Topology

Structural properties of a manifold.
Two surfaces $M$ and $N$ are topologically equivalent if there is a differentiable bijection between $M$ and $N$ .

Orientation

Use rotation to represent the relative orientation between two frames.
- Space frame: $\{s\} = \{x_s, y_s, z_s\}$
- Body frame: $\{b\} = \{x_b, y_b, z_b\}$
- $R_{sb}$ rotates the frame of the space to the frame of the body after the origins are aligned.

The set of rotations

The special orthogonal group in $n$ dimensions.
$SO(n) = \{ R \in \mathbb{R}^{n \times n} | R^T R = I, \det(R) = 1 \}$
- $SO(2)$ is topologically equivalent to a circle.
- $SO(3)$ is topologically equivalent to the 3-sphere $S^3$ with antipodal points identified (real projective space $\mathbb{RP}^3$).
- Circles do not have the same topology as $(-1, 1)^n$ , which means there is no differentiable bijection between $SO(2)$ and $(-1, 1)^n$ .
- The topology of $SO(3)$ is also different from $(-1, 1)^n$ .

Parameterizing rotations is tricky

Ideal parameterization $f: U \to SO(2)$ for use in networks:
1. Domain should be $(-l, l)^n$ (as network output).
2. $f$ must be a differentiable bijection.
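Since no such $f$ exists, any parameterization by an angle $\theta$ has a discontinuity, and issues arise:
- Input data points are close, but their $\theta$ targets are far apart (e.g., just below $\pi$ and just above $-\pi$).
- A continuous network function must pass through intermediate values, which leads to poor predictions in between.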
Special network designs are required to address these challenges.
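
The sketch below illustrates the problem and one common workaround. The workaround is an assumption of this example, not a design stated in the lecture: instead of predicting $\theta$ directly, a hypothetical output head predicts a raw 2-vector and normalizes it to $(\cos\theta, \sin\theta)$. This gives up the bijection requirement (many vectors map to the same rotation) in exchange for continuity.

```python
import math

def rotation_from_vector(a, b):
    """Hypothetical output head: normalize a raw 2-vector (a, b) into a rotation."""
    n = math.hypot(a, b)
    c, s = a / n, b / n
    return [[c, -s], [s, c]]

# Two nearly identical rotations on either side of the +/- pi cut.
theta1 = math.pi - 0.01
theta2 = -math.pi + 0.01

# As raw angles the two targets are far apart (about 6.26).
angle_gap = abs(theta1 - theta2)

# As matrices built from (cos, sin) they are close (about 0.02).
R1 = rotation_from_vector(math.cos(theta1), math.sin(theta1))
R2 = rotation_from_vector(math.cos(theta2), math.sin(theta2))
matrix_gap = max(abs(R1[i][j] - R2[i][j]) for i in range(2) for j in range(2))

print(angle_gap, matrix_gap)
```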

3D Rotation representation

Euler angles

Rotation about principal axes:
$R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}$ $R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}$ $R_z(\gamma) = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}$
Combined rotation (e.g., ZYX order) for arbitrary rotation:
$R = R_z(\gamma) R_y(\beta) R_x(\alpha)$
Gimbal lock
- Euler angles are not unique for some rotations.
  $R_z(45^\circ) R_y(90^\circ) R_x(45^\circ) = R_z(90^\circ) R_y(90^\circ) R_x(90^\circ)$
- For example, when $\beta = \pi / 2$:
  $R = R_z(\gamma) R_y(\pi/2) R_x(\alpha) = \begin{bmatrix} 0 & \sin(\alpha - \gamma) & \cos(\alpha - \gamma) \\ 0 & \cos(\alpha - \gamma) & -\sin(\alpha - \gamma) \\ -1 & 0 & 0 \end{bmatrix}$
  - Only the difference $\alpha - \gamma$ appears, so $\alpha$ and $\gamma$ are coupled.
  - Lose one degree of freedom.
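
The non-uniqueness example can be checked numerically. This sketch builds the ZYX product from the principal-axis matrices defined above and compares the two angle triples; the function names are illustrative.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def euler_zyx(gamma, beta, alpha):
    """R = R_z(gamma) R_y(beta) R_x(alpha)."""
    return matmul(rot_z(gamma), matmul(rot_y(beta), rot_x(alpha)))

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

d = math.radians
A = euler_zyx(d(45), d(90), d(45))
B = euler_zyx(d(90), d(90), d(90))
C = euler_zyx(d(45), d(90), d(0))

same = max_abs_diff(A, B)       # close to 0: different angles, same rotation
different = max_abs_diff(A, C)  # clearly nonzero: a genuinely different rotation
print(same, different)
```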

Axis-angle

Euler's theorem: any rotation in $SO(3)$ is equivalent to a rotation about a fixed axis $\hat\omega \in \mathbb{R}^3$ through a positive angle $\theta$ .
$\hat{\omega}$ : unit vector of rotation axis ( $\|\hat{\omega}\| = 1$ ).
$\theta$ : angle of rotation
$\hat\omega \times a = [\hat\omega] a$
$[\hat\omega] = \begin{bmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{bmatrix}$
$R\in SO(3) = Rot(\hat{\omega}, \theta)$
$Rot(\hat{\omega}, \theta) = e^{[\hat\omega] \theta} = I + \sin\theta [\hat\omega] + (1 - \cos\theta) [\hat\omega]^2$
Is there a unique parameterization? No.
1. $(\hat\omega, \theta)$ and $(-\hat\omega, -\theta)$ represent the same rotation.
2. when $\theta = 0$ , $\hat\omega$ can be any unit vector.
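Excluding case 2 and restricting $\theta \in (0, \pi)$ gives a unique representation; at $\theta = \pi$ , $\hat\omega$ and $-\hat\omega$ still describe the same rotation.
- If $tr(R) \neq -1$ (and $R \neq I$ ), then
  $\theta = \cos^{-1} \frac{1}{2}\left[ tr(R) - 1 \right]$ $[\hat\omega] = \frac{1}{2 \sin\theta} (R - R^T)$
- If $tr(R) = -1$ , then $\theta = \pi$ and $\sin\theta = 0$ , so the formula above is undefined. In this case $R + I = 2\hat\omega\hat\omega^T$ , so $\hat\omega$ is any nonzero column of $R + I$ , normalized.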
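
A minimal sketch of the exponential formula and its inverse for the generic case ( $tr(R) \neq -1$ and $R \neq I$ ) follows. The names `rot` and `axis_angle` are chosen for this example; the $\theta = \pi$ and $\theta = 0$ cases are deliberately not handled.

```python
import math

def skew(w):
    """Skew-symmetric matrix [w] such that [w] a = w x a."""
    return [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot(w_hat, theta):
    """Rodrigues' formula: I + sin(theta) [w] + (1 - cos(theta)) [w]^2."""
    K = skew(w_hat)
    K2 = matmul(K, K)
    s, c = math.sin(theta), math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + (1.0 - c) * K2[i][j]
             for j in range(3)] for i in range(3)]

def axis_angle(R):
    """Inverse map, valid only when tr(R) != -1 and R != I."""
    tr = R[0][0] + R[1][1] + R[2][2]
    # Clamp to guard against rounding slightly outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, 0.5 * (tr - 1.0))))
    k = 1.0 / (2.0 * math.sin(theta))
    # Read the axis off the skew-symmetric matrix (R - R^T) / (2 sin(theta)).
    w_hat = [k * (R[2][1] - R[1][2]),
             k * (R[0][2] - R[2][0]),
             k * (R[1][0] - R[0][1])]
    return w_hat, theta

axis = [0.0, 0.6, 0.8]          # unit vector
R = rot(axis, 1.2)
w_rec, theta_rec = axis_angle(R)
print(w_rec, theta_rec)          # recovers the axis and the angle 1.2
```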
The distance between rotations $R_1, R_2$ is the minimal rotation angle needed to take the body from pose $R_1$ to pose $R_2$ .
$(R_2 R_1^T) R_1 = R_2$ $dist(R_1, R_2) = \cos^{-1} \frac{1}{2} [tr(R_2 R_1^T) - 1]$
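
The distance formula can be sketched directly. Here it is checked on two rotations about the $z$ axis, where the expected answer is simply the difference of the angles; `rotation_distance` is a name chosen for this example.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rot_z(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotation_distance(R1, R2):
    """Angle of the relative rotation R2 R1^T, in radians."""
    M = matmul(R2, transpose(R1))
    tr = M[0][0] + M[1][1] + M[2][2]
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, 0.5 * (tr - 1.0))))

dist = rotation_distance(rot_z(0.3), rot_z(1.0))
print(dist)  # approximately 0.7
```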

Quaternion

A unit quaternion $q$ ( $\lVert q \rVert = 1$ ) can represent a rotation.
Rotate a vector $\vec{x}$ by quaternion $q$ :
$x' = q x q^{-1}$
where
$x = (0, \vec{x})$ $q = \left( \cos\frac{\theta}{2}, \hat\omega \sin\frac{\theta}{2} \right)$
Compose rotations by multiplying quaternions.
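Given a unit quaternion $q = (w, \vec{v})$ , the rotation matrix is
$R(q) = E(q)G(q)^T$
where $E$ and $G$ are $3 \times 4$ matrices and $[\vec{v}]$ is the skew-symmetric matrix of $\vec{v}$ :
$E(q) = [-\vec{v}, wI + [\vec{v}]]$ $G(q) = [-\vec{v}, wI - [\vec{v}]]$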
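
The two ways of applying a quaternion rotation can be compared in a short sketch. It assumes quaternions stored as `(w, x, y, z)` tuples with the Hamilton product, and uses the fact that the inverse of a unit quaternion is its conjugate; all function names are illustrative.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)

def qconj(q):
    """Conjugate; equals the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def from_axis_angle(w_hat, theta):
    h = 0.5 * theta
    s = math.sin(h)
    return (math.cos(h), w_hat[0] * s, w_hat[1] * s, w_hat[2] * s)

def rotate(q, v):
    """x' = q x q^{-1} with x = (0, v)."""
    x = (0.0, v[0], v[1], v[2])
    return list(qmul(qmul(q, x), qconj(q))[1:])

def skew(v):
    return [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]

def to_matrix(q):
    """R(q) = E(q) G(q)^T with E = [-v, wI + [v]] and G = [-v, wI - [v]]."""
    w, v = q[0], q[1:]
    K = skew(v)
    E = [[-v[i]] + [(w if i == j else 0.0) + K[i][j] for j in range(3)]
         for i in range(3)]
    G = [[-v[i]] + [(w if i == j else 0.0) - K[i][j] for j in range(3)]
         for i in range(3)]
    return [[sum(E[i][k] * G[j][k] for k in range(4)) for j in range(3)]
            for i in range(3)]

q = from_axis_angle([0.0, 0.0, 1.0], math.pi / 2)  # 90 degrees about z
v = [1.0, 0.0, 0.0]
by_quaternion = rotate(q, v)
R = to_matrix(q)
by_matrix = [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]
print(by_quaternion, by_matrix)  # both approximately [0, 1, 0]
```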

Summary

| Representation | Inverse? | Composing? |
| --- | --- | --- |
| Rotation matrix | Good | Good |
| Euler angles | Complicated | Complicated |
| Axis-angle | Good | Complicated |
| Skew-symmetric matrix | Good | Complicated |
| Quaternion | Good | Good |

Viewing transformation

How to take a photo?
- Find a good place and arrange people (model transformation)
- Find a good "angle" to put the camera (view transformation)
- Cheese! (projection transformation)

View / Camera Transformation

Camera
- Position: $\vec{e}$
- Look-at direction: $\hat g$
- Up direction: $\hat t$ (perpendicular to $\hat g$ )
If the camera and all objects move together, the image does not change.
So apply $M_{view}$ to both, chosen so that the camera ends up at the origin, looking along $-z$ with up along $+y$ .
$M_{view} = R_{view} T_{view}$
- Translation:
  $T_{view} = \begin{bmatrix} 1 & 0 & 0 & -e_x \\ 0 & 1 & 0 & -e_y \\ 0 & 0 & 1 & -e_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$
  Move the camera and objects so that the camera is at the origin.
- Rotation:
  $R_{view} = \begin{bmatrix} x_{\hat g \times \hat t} & y_{\hat g \times \hat t} & z_{\hat g \times \hat t} & 0 \\ x_{\hat t} & y_{\hat t} & z_{\hat t} & 0 \\ -x_{\hat g} & -y_{\hat g} & -z_{\hat g} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$
  Project world coordinates to camera coordinates. (Equivalent to rotating the camera frame to align with world frame)
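
A minimal sketch of building $M_{view}$ from the camera parameters follows. It assumes, as stated above, that $\hat t$ is perpendicular to $\hat g$ ; the function name `view_matrix` and the sample camera are made up for illustration.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply(m, p):
    return [sum(m[i][k] * p[k] for k in range(4)) for i in range(4)]

def view_matrix(e, g, t):
    """M_view = R_view T_view for eye e, look-at direction g, up direction t."""
    g, t = normalize(g), normalize(t)
    r = cross(g, t)  # camera right direction, g x t
    T = [[1.0, 0.0, 0.0, -e[0]],
         [0.0, 1.0, 0.0, -e[1]],
         [0.0, 0.0, 1.0, -e[2]],
         [0.0, 0.0, 0.0, 1.0]]
    R = [r + [0.0],
         t + [0.0],
         [-g[0], -g[1], -g[2], 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    return matmul(R, T)

# Sample camera: at (1, 2, 3), looking along world +x, with world +z as up.
M = view_matrix([1.0, 2.0, 3.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])

eye = apply(M, [1.0, 2.0, 3.0, 1.0])     # the camera position -> origin
ahead = apply(M, [2.0, 2.0, 3.0, 1.0])   # one unit in front -> (0, 0, -1)
above = apply(M, [1.0, 2.0, 4.0, 1.0])   # one unit up -> (0, 1, 0)
print(eye, ahead, above)
```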

Projection Transformation

3D to 2D
Orthographic / Perspective projection
After the MVP (Model-View-Projection) transformation and the division by $w$ , points are in normalized device coordinates (NDC).
How to do perspective projection?
1. "Squish" the frustum into a cuboid: $M_{persp\to ortho}$ .
2. Do orthographic projection: $M_{ortho}$ , already known.
"Squish" matrix
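$M_{persp\to ortho} = \begin{bmatrix} n & 0 & 0 & 0 \\ 0 & n & 0 & 0 \\ 0 & 0 & n + f & nf \\ 0 & 0 & -1 & 0 \end{bmatrix}$
- Here $n, f > 0$ are the distances to the near and far planes. The camera looks along $-z$ (matching $R_{view}$ ), so the planes lie at $z = -n$ and $z = -f$ , and the last row gives $w' = -z > 0$ .
- Matrix elements only depend on $n$ and $f$ .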
- The third row can be derived by the requirement that $z$ coordinate at near and far planes remains unchanged after transformation.
- Notice that the depth mapping is non-linear.
Orthographic projection matrix
$M_{ortho} = \begin{bmatrix} \frac{2}{r - l} & 0 & 0 & -\frac{r + l}{r - l} \\ 0 & \frac{2}{t - b} & 0 & -\frac{t + b}{t - b} \\ 0 & 0 & -\frac{2}{f - n} & -\frac{f + n}{f - n} \\ 0 & 0 & 0 & 1 \end{bmatrix}$
Perspective projection matrix
$M_{persp} = M_{ortho} M_{persp\to ortho} = \begin{bmatrix} \frac{2n}{r - l} & 0 & \frac{r + l}{r - l} & 0 \\ 0 & \frac{2n}{t - b} & \frac{t + b}{t - b} & 0 \\ 0 & 0 & -\frac{f + n}{f - n} & -\frac{2fn}{f - n} \\ 0 & 0 & -1 & 0 \end{bmatrix}$
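
The perspective matrix can be sanity-checked on a few points. This sketch assumes the convention implied by the last row of $M_{persp}$ : the camera looks along $-z$ , and $n, f$ are positive distances, so the near plane is at $z = -n$ . The frustum bounds used here are arbitrary sample values.

```python
def perspective(l, r, b, t, n, f):
    """The M_persp matrix above, for a frustum with near-plane bounds l, r, b, t."""
    return [[2.0 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
            [0.0, 2.0 * n / (t - b), (t + b) / (t - b), 0.0],
            [0.0, 0.0, -(f + n) / (f - n), -2.0 * f * n / (f - n)],
            [0.0, 0.0, -1.0, 0.0]]

def project(M, p):
    """Apply M to a homogeneous point and divide by w to get NDC."""
    x, y, z, w = [sum(M[i][k] * p[k] for k in range(4)) for i in range(4)]
    return [x / w, y / w, z / w]

# Sample frustum: near plane [-1, 1] x [-1, 1] at distance 1, far plane at 3.
M = perspective(-1.0, 1.0, -1.0, 1.0, 1.0, 3.0)

near_corner = project(M, [1.0, 1.0, -1.0, 1.0])  # -> (1, 1, -1)
far_corner = project(M, [3.0, 3.0, -3.0, 1.0])   # -> (1, 1, 1)
mid_depth = project(M, [0.0, 0.0, -2.0, 1.0])    # -> z = 0.5, not 0

print(near_corner, far_corner, mid_depth)
```

The point halfway between the near and far planes lands at NDC depth 0.5 rather than 0, which is the non-linear depth mapping noted above.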
MVP matrix
$M_{MVP} = M_{persp} M_{view} M_{model}$
Finally transform the NDC space to screen space (another affine transformation).