Lec 3. Transformation
2025-10-11
Transformations and homogeneous coordinates
- Basic idea: $f$ transforms a point $\mathbf{x}$ to the point $f(\mathbf{x})$.
Transformation
Linear transformations: cheap to compute.
Composition of linear transformations is still linear.
Scale
Uniform scale
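$$S_a(\mathbf{x}) = a\mathbf{x}$$
Non-uniform scale
$$S_{\mathbf{s}} = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}$$
Rotation
$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
Preserves length and angle.
$$\|R_\theta \mathbf{x}\| = \|\mathbf{x}\|$$
Translation (not linear)
$$T_{\mathbf{b}}(\mathbf{x}) = \mathbf{x} + \mathbf{b}$$
Reflection
$$\mathrm{Re}_x = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \quad \mathrm{Re}_y = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$$
Shear
$$H_{x,s} = \begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix}, \quad H_{y,s} = \begin{bmatrix} 1 & 0 \\ s & 1 \end{bmatrix}$$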
Summary of transformations
Linear: scale, rotation, reflection, shear
Non-linear: translation
Affine: linear + translation
$$f(\mathbf{x}) = g(\mathbf{x}) + \mathbf{b}, \quad g \text{ linear}$$
Euclidean (isometric)
Translation, rotation, reflection
Preserves length and angle
$$\|f(\mathbf{x}) - f(\mathbf{y})\| = \|\mathbf{x} - \mathbf{y}\|$$
Rigid body transformations are distance-preserving motions that also preserve orientation.
Composition: order of operations matters.
Homogeneous coordinates (in 2D)
Idea: represent 2D points with THREE values.
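$$\begin{bmatrix} x \\ y \end{bmatrix} \to \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
Transforms are represented as $3 \times 3$ matrices.
This lets us encode translations as linear transformations (matrix multiplications).
Recover the final 2D point by dividing by the last coordinate.
$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} \to \begin{bmatrix} x'/w' \\ y'/w' \end{bmatrix}$$
Scale and rotation
$$S_{\mathbf{s}} = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Translation with 2D-H
$$T_{\mathbf{b}} = \begin{bmatrix} 1 & 0 & b_x \\ 0 & 1 & b_y \\ 0 & 0 & 1 \end{bmatrix}$$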
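The following is a minimal sketch of 2D homogeneous coordinates in plain Python. It shows translation written as a matrix product, the final division by $w$, and that the order of composition matters. The helper names (`matmul`, `translation`, `rotation`, `apply`) are made up for this illustration.

```python
import math

def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def translation(bx, by):
    """3x3 homogeneous translation matrix T_b."""
    return [[1.0, 0.0, bx], [0.0, 1.0, by], [0.0, 0.0, 1.0]]

def rotation(theta):
    """3x3 homogeneous rotation matrix R_theta (counter-clockwise)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(M, point):
    """Lift (x, y) to (x, y, 1), multiply by M, then divide by w."""
    x, y = point
    xh, yh, w = (row[0] * x + row[1] * y + row[2] for row in M)
    return (xh / w, yh / w)

T = translation(2.0, 0.0)
R = rotation(math.pi / 2)

# The rightmost matrix acts first.
rotate_then_translate = apply(matmul(T, R), (1.0, 0.0))
translate_then_rotate = apply(matmul(R, T), (1.0, 0.0))

print(rotate_then_translate)  # approximately (2, 1)
print(translate_then_rotate)  # approximately (0, 3)
```

The two results differ because the rotation is about the origin: translating first moves the point away from the origin before it is rotated.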
Moving to 3D (and 3D-H)
Scale
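$$S_{\mathbf{s}} = \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Shear (in x, based on y and z)
$$H_{x,\mathbf{d}} = \begin{bmatrix} 1 & d_y & d_z & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Translation
$$T_{\mathbf{b}} = \begin{bmatrix} 1 & 0 & 0 & b_x \\ 0 & 1 & 0 & b_y \\ 0 & 0 & 1 & b_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$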
Inverse transformation
$M^{-1}$ is the inverse of transform $M$ in both a matrix and geometric sense.
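As a short worked illustration (assuming nonzero scale factors), the inverses of the basic transforms can be read off from their geometric meaning: undo a scale by the reciprocal scale, a translation by the opposite translation, and a rotation by the opposite angle. For a composition, the inverses are applied in reverse order.

$$S_{\mathbf{s}}^{-1} = S_{(1/s_x,\ 1/s_y,\ 1/s_z)}, \qquad T_{\mathbf{b}}^{-1} = T_{-\mathbf{b}}, \qquad R_\theta^{-1} = R_{-\theta} = R_\theta^{T}, \qquad (AB)^{-1} = B^{-1}A^{-1}$$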
Rotation and SO(n)
Topology
Structural properties of a manifold.
Two surfaces M and N are topologically equivalent if there is a differentiable bijection between M and N.

Orientation
Use rotation to represent the relative orientation between two frames.
Space frame: $\{s\} = \{x_s, y_s, z_s\}$
Body frame: $\{b\} = \{x_b, y_b, z_b\}$
$R_{sb}$ rotates the frame of the space to the frame of the body after the origins are aligned.
The set of rotations
The special orthogonal group in n dimensions.
$$SO(n) = \{R \in \mathbb{R}^{n \times n} \mid R^T R = I,\ \det(R) = 1\}$$
$SO(2)$ is topologically equivalent to a circle.
$SO(3)$ is topologically equivalent to the 3-sphere $S^3$ with antipodal points identified.
Circles do not have the same topology as $(-1,1)^n$, which means there is no differentiable bijection between $SO(2)$ and $(-1,1)^n$.
The topology of $SO(3)$ is also different from $(-1,1)^n$.
Parameterizing rotations is tricky
Ideal parameterization $f: U \to SO(2)$, $\theta \in U$, in networks:
The domain $U$ should be $(-l, l)^n$ (as network output).
$f$ must be a differentiable bijection.
Issues arise when:
Input data points are close, but their $\theta$ predictions are far apart (for example, angles just below $\pi$ and just above $-\pi$).
A continuous network must then pass through intermediate values of $\theta$ between them, which gives poor predictions there.
Special network designs are required to address these challenges.
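The wrap-around problem can be seen with two nearly identical 2D orientations. The sketch below compares their gap as raw angle values with their gap after embedding on the unit circle as $(\cos\theta, \sin\theta)$. Predicting such a 2-vector instead of $\theta$ is one common workaround and is an assumption of this example, not a design prescribed by these notes; all function names are illustrative.

```python
import math

def angle_gap(a, b):
    """Naive distance between two angle values, treated as plain numbers."""
    return abs(a - b)

def embed(theta):
    """Represent a 2D rotation by the point (cos, sin) on the unit circle."""
    return (math.cos(theta), math.sin(theta))

def embed_gap(a, b):
    """Euclidean distance between the two circle embeddings."""
    (ca, sa), (cb, sb) = embed(a), embed(b)
    return math.hypot(ca - cb, sa - sb)

def decode(c, s):
    """Map any nonzero 2-vector (e.g. a raw network output) back to an angle."""
    return math.atan2(s, c)

a = math.radians(179.0)
b = math.radians(-179.0)  # same orientation as `a` up to 2 degrees

print(angle_gap(a, b))  # about 6.25 rad: the raw values look far apart
print(embed_gap(a, b))  # about 0.035: the rotations are actually close
```

The embedding is continuous on the circle, so nearby rotations get nearby targets; the price is that the output lives in $\mathbb{R}^2$ rather than in an interval.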
3D Rotation representation
Euler angles
Rotation about principal axes:
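$$R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}$$
$$R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}$$
$$R_z(\gamma) = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Combined rotation (e.g., ZYX order) for arbitrary rotation:
$$R = R_z(\gamma) R_y(\beta) R_x(\alpha)$$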
Gimbal lock
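Euler angles are not unique for some rotations.
$$R_z(45^\circ) R_y(90^\circ) R_x(45^\circ) = R_z(90^\circ) R_y(90^\circ) R_x(90^\circ)$$
For example, when $\beta = \pi/2$:
$$R = R_z(\gamma) R_y(\pi/2) R_x(\alpha) = \begin{bmatrix} 0 & \sin(\alpha - \gamma) & \cos(\alpha - \gamma) \\ 0 & \cos(\alpha - \gamma) & -\sin(\alpha - \gamma) \\ -1 & 0 & 0 \end{bmatrix}$$
$\alpha$ and $\gamma$ are coupled: only the difference $\alpha - \gamma$ matters.
Lose one degree of freedom.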
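A quick numerical check of gimbal lock, written as a sketch in plain Python with the ZYX convention $R = R_z(\gamma) R_y(\beta) R_x(\alpha)$ used above. The helper names are illustrative. It confirms that the two Euler triples from the example give the same matrix, while changing only $\alpha$ gives a different one.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def euler_zyx(gamma, beta, alpha):
    """R = Rz(gamma) Ry(beta) Rx(alpha), angles in radians."""
    return matmul(rot_z(gamma), matmul(rot_y(beta), rot_x(alpha)))

def max_abs_diff(A, B):
    """Largest entry-wise difference between two matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

deg = math.radians
R1 = euler_zyx(deg(45), deg(90), deg(45))
R2 = euler_zyx(deg(90), deg(90), deg(90))
R3 = euler_zyx(deg(45), deg(90), deg(0))  # only alpha changed

print(max_abs_diff(R1, R2))  # tiny (floating-point noise): same rotation
print(max_abs_diff(R1, R3))  # clearly nonzero: a different rotation
```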
Axis-angle
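Euler's theorem: any rotation in $SO(3)$ is equivalent to a rotation about a fixed axis $\omega \in \mathbb{R}^3$ through a positive angle $\theta$.
$\hat{\omega}$: unit vector of the rotation axis ($\|\hat{\omega}\| = 1$).
$\theta$: angle of rotation
$$\hat{\omega} \times \mathbf{a} = [\hat{\omega}]\mathbf{a}$$
$$[\hat{\omega}] = \begin{bmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{bmatrix}$$
$$R = \mathrm{Rot}(\hat{\omega}, \theta) \in SO(3)$$
$$\mathrm{Rot}(\hat{\omega}, \theta) = e^{[\hat{\omega}]\theta} = I + \sin\theta\,[\hat{\omega}] + (1 - \cos\theta)\,[\hat{\omega}]^2$$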
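The closed form above (Rodrigues' formula) translates almost directly into code. The following is a minimal sketch in plain Python; `skew` and `rot` are illustrative names, and the axis is assumed to already have unit length.

```python
import math

def skew(w):
    """Skew-symmetric matrix [w] such that [w] a = w x a."""
    w1, w2, w3 = w
    return [[0.0, -w3, w2], [w3, 0.0, -w1], [-w2, w1, 0.0]]

def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot(axis, theta):
    """Rot(w, theta) = I + sin(theta) [w] + (1 - cos(theta)) [w]^2, unit axis w."""
    K = skew(axis)
    K2 = matmul(K, K)
    s, c1 = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c1 * K2[i][j]
             for j in range(3)] for i in range(3)]

# A quarter turn about the z axis should reproduce Rz(90 degrees).
R = rot((0.0, 0.0, 1.0), math.pi / 2)
for row in R:
    print([round(v, 6) for v in row])

# Flipping both the axis and the angle gives the same rotation.
R_flipped = rot((0.0, 0.0, -1.0), -math.pi / 2)
```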
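Is there a unique parameterization? No.
1. $(\hat{\omega}, \theta)$ and $(-\hat{\omega}, -\theta)$ represent the same rotation.
2. When $\theta = 0$ ($R = I$), $\hat{\omega}$ can be any unit vector.
Excluding case 2 and restricting $\theta \in (0, \pi]$ gives a unique representation, except that at $\theta = \pi$ the axes $\hat{\omega}$ and $-\hat{\omega}$ still give the same rotation.
If $\mathrm{tr}(R) \neq -1$, then
$$\theta = \cos^{-1}\left(\tfrac{1}{2}\left[\mathrm{tr}(R) - 1\right]\right)$$
$$[\hat{\omega}] = \frac{1}{2\sin\theta}\left(R - R^T\right)$$
If $\mathrm{tr}(R) = -1$, then $\theta = \pi$ (a half-turn, for example a rotation by $\pi$ about the x, y, or z axis); $\sin\theta = 0$, so the formula above for $\hat{\omega}$ does not apply.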
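As a sketch of the generic case, the function below recovers the axis and angle from a rotation matrix using the trace and the skew-symmetric part of $R$. It deliberately refuses the degenerate cases $\theta = 0$ and $\theta = \pi$ instead of handling them; the name `axis_angle` and the tolerance `eps` are illustrative choices.

```python
import math

def axis_angle(R, eps=1e-9):
    """Return (unit axis, angle in (0, pi)) for a 3x3 rotation matrix R.

    Only the generic case is handled; identity and half-turns raise ValueError.
    """
    tr = R[0][0] + R[1][1] + R[2][2]
    # Clamp to [-1, 1] to guard against round-off before acos.
    cos_theta = max(-1.0, min(1.0, 0.5 * (tr - 1.0)))
    theta = math.acos(cos_theta)
    s = math.sin(theta)
    if s < eps:
        raise ValueError("theta is 0 or pi: axis not determined by R - R^T")
    # [w] = (R - R^T) / (2 sin theta); read w off the skew-symmetric entries.
    w = ((R[2][1] - R[1][2]) / (2.0 * s),
         (R[0][2] - R[2][0]) / (2.0 * s),
         (R[1][0] - R[0][1]) / (2.0 * s))
    return w, theta

Rz90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
axis, theta = axis_angle(Rz90)
print(axis, theta)  # axis (0, 0, 1), angle pi/2
```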
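The distance between rotations $R_1$ and $R_2$ is the (minimal) effort to rotate the body from pose $R_1$ to pose $R_2$.
$$(R_2 R_1^T) R_1 = R_2$$
$$\mathrm{dist}(R_1, R_2) = \cos^{-1}\left(\tfrac{1}{2}\left[\mathrm{tr}(R_2 R_1^T) - 1\right]\right)$$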
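The distance formula is the rotation angle of the relative rotation $R_2 R_1^T$. A minimal sketch in plain Python, with illustrative function names, is:

```python
import math

def rotation_distance(R1, R2):
    """Angle (radians) of the relative rotation R2 R1^T."""
    # tr(R2 R1^T) equals the sum of entry-wise products of R2 and R1.
    tr = sum(R2[i][j] * R1[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against round-off before acos.
    return math.acos(max(-1.0, min(1.0, 0.5 * (tr - 1.0))))

def rot_z(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

A = rot_z(math.radians(10.0))
B = rot_z(math.radians(40.0))
d = rotation_distance(A, B)
print(math.degrees(d))  # about 30
```

The result is symmetric in its arguments and is always in $[0, \pi]$.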
Quaternion
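A unit quaternion ($\|q\| = 1$) can represent a rotation.
Rotate a vector $\mathbf{x}$ by quaternion $q$:
$$x' = q\,x\,q^{-1}$$
where
$$x = (0, \mathbf{x})$$
$$q = \left(\cos\tfrac{\theta}{2},\ \hat{\omega}\sin\tfrac{\theta}{2}\right)$$
Compose rotations by multiplying quaternions.
Given a quaternion $q = (w, \mathbf{v})$, the rotation matrix is
$$R(q) = E(q)\,G(q)^T$$
where
$$E(q) = \left[-\mathbf{v},\ wI + [\mathbf{v}]\right]$$
$$G(q) = \left[-\mathbf{v},\ wI - [\mathbf{v}]\right]$$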
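The sandwich product $q\,x\,q^{-1}$ can be sketched with quaternions stored as `(w, x, y, z)` tuples. For a unit quaternion the inverse equals the conjugate, which is what the code uses. The storage order and all function names are choices made for this example.

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)

def qconj(q):
    """Conjugate; equals the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def from_axis_angle(axis, theta):
    """q = (cos(theta/2), axis * sin(theta/2)) for a unit axis."""
    h = 0.5 * theta
    s = math.sin(h)
    return (math.cos(h), axis[0] * s, axis[1] * s, axis[2] * s)

def rotate(q, v):
    """x' = q x q^{-1} with x = (0, v); returns the vector part."""
    _, x, y, z = qmul(qmul(q, (0.0, *v)), qconj(q))
    return (x, y, z)

qz = from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
qx = from_axis_angle((1.0, 0.0, 0.0), math.pi / 2)

v1 = rotate(qz, (1.0, 0.0, 0.0))
print(v1)  # approximately (0, 1, 0)

# Composition: qx * qz applies qz first, then qx.
v2 = rotate(qmul(qx, qz), (1.0, 0.0, 0.0))
print(v2)  # approximately (0, 0, 1)
```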
Summary
| Representation | Inverse? | Composing? |
|---|---|---|
| Rotation Matrix | Good | Good |
| Euler Angle | Complicated | Complicated |
| Axis-angle | Good | Complicated |
| Skew-symmetric Matrix | Good | Complicated |
| Quaternion | Good | Good |
Viewing transformation
How to take a photo?
Find a good place and arrange people (model transformation)
Find a good "angle" to put the camera (view transformation)
Cheese! (projection transformation)

View / Camera Transformation
Camera
Position: $\mathbf{e}$
Look-at direction: $\hat{g}$
Up direction: $\hat{t}$ (perpendicular to $\hat{g}$)

If the camera and all objects move together, the image does not change.
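Transform the camera (and all objects) by $M_{view}$.
$$M_{view} = R_{view} T_{view}$$
Translation:
$$T_{view} = \begin{bmatrix} 1 & 0 & 0 & -e_x \\ 0 & 1 & 0 & -e_y \\ 0 & 0 & 1 & -e_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Move the camera and objects so that the camera is at the origin.
Rotation:
$$R_{view} = \begin{bmatrix} x_{\hat{g} \times \hat{t}} & y_{\hat{g} \times \hat{t}} & z_{\hat{g} \times \hat{t}} & 0 \\ x_{\hat{t}} & y_{\hat{t}} & z_{\hat{t}} & 0 \\ x_{-\hat{g}} & y_{-\hat{g}} & z_{-\hat{g}} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Expresses world coordinates in camera coordinates. Equivalently, it rotates the camera frame to align with the world frame: $\hat{g} \times \hat{t}$ goes to $x$, $\hat{t}$ to $y$, and $-\hat{g}$ to $z$.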
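The view matrix can be assembled directly from $\mathbf{e}$, $\hat{g}$ and $\hat{t}$. The sketch below uses plain Python and illustrative helper names, and assumes, as stated above, that the up direction is perpendicular to the look-at direction. It checks that the camera position lands at the origin and that a point straight ahead of the camera lands on the $-z$ axis.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def matmul(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def view_matrix(e, g, t):
    """M_view = R_view T_view for camera position e, look-at g, up t (t perpendicular to g)."""
    g, t = normalize(g), normalize(t)
    x = cross(g, t)  # camera x axis
    R = [[x[0], x[1], x[2], 0.0],
         [t[0], t[1], t[2], 0.0],
         [-g[0], -g[1], -g[2], 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    T = [[1.0, 0.0, 0.0, -e[0]],
         [0.0, 1.0, 0.0, -e[1]],
         [0.0, 0.0, 1.0, -e[2]],
         [0.0, 0.0, 0.0, 1.0]]
    return matmul(R, T)

def transform_point(M, p):
    """Apply a 4x4 matrix to a 3D point in homogeneous coordinates."""
    x, y, z, w = (sum(M[i][k] * c for k, c in enumerate((*p, 1.0)))
                  for i in range(4))
    return (x / w, y / w, z / w)

e = (1.0, 2.0, 3.0)  # camera position
g = (1.0, 0.0, 0.0)  # looking along world +x
t = (0.0, 0.0, 1.0)  # world +z is up

M = view_matrix(e, g, t)
cam_origin = transform_point(M, e)             # the camera itself
ahead = transform_point(M, (6.0, 2.0, 3.0))    # 5 units in front of the camera
above = transform_point(M, (1.0, 2.0, 4.0))    # 1 unit above the camera
print(cam_origin, ahead, above)
```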
Projection Transformation
3D to 2D
Orthographic / Perspective projection

After the MVP (Model-View-Projection) transformation, points are in normalized device coordinates (NDC).
How to do perspective projection?
"Squish" the frustum into a cuboid: $M_{persp \to ortho}$.
Do orthographic projection: $M_{ortho}$, already known.
"Squish" matrix

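$$M_{persp \to ortho} = \begin{bmatrix} n & 0 & 0 & 0 \\ 0 & n & 0 & 0 \\ 0 & 0 & n + f & nf \\ 0 & 0 & -1 & 0 \end{bmatrix}$$
Here the camera looks down $-z$, and $n, f > 0$ are the distances to the near and far planes (the planes $z = -n$ and $z = -f$). This is the convention that matches $M_{ortho}$ and $M_{persp}$ below.
Matrix elements only depend on near and far plane distances.
The third row follows from requiring that the $z$ coordinate of points on the near and far planes is unchanged by the transformation (after dividing by $w$).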
Notice that the depth mapping is non-linear.
Orthographic projection matrix

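$$M_{ortho} = \begin{bmatrix} \dfrac{2}{r-l} & 0 & 0 & -\dfrac{r+l}{r-l} \\ 0 & \dfrac{2}{t-b} & 0 & -\dfrac{t+b}{t-b} \\ 0 & 0 & -\dfrac{2}{f-n} & -\dfrac{f+n}{f-n} \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Perspective projection matrix
$$M_{persp} = M_{ortho} M_{persp \to ortho} = \begin{bmatrix} \dfrac{2n}{r-l} & 0 & \dfrac{r+l}{r-l} & 0 \\ 0 & \dfrac{2n}{t-b} & \dfrac{t+b}{t-b} & 0 \\ 0 & 0 & -\dfrac{f+n}{f-n} & -\dfrac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{bmatrix}$$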
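The factorization can be checked numerically. The sketch below assumes the camera looks down $-z$ with $n, f > 0$ as distances, so the squish matrix has third row $(0, 0, n+f, nf)$ and fourth row $(0, 0, -1, 0)$; with that assumption the product $M_{ortho} M_{persp \to ortho}$ reproduces the closed-form $M_{persp}$ entry by entry. Function names and the frustum values are made up for the example.

```python
def matmul(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def ortho(l, r, b, t, n, f):
    return [[2 / (r - l), 0, 0, -(r + l) / (r - l)],
            [0, 2 / (t - b), 0, -(t + b) / (t - b)],
            [0, 0, -2 / (f - n), -(f + n) / (f - n)],
            [0, 0, 0, 1]]

def squish(n, f):
    # Assumption: camera looks down -z; n and f are positive distances.
    return [[n, 0, 0, 0],
            [0, n, 0, 0],
            [0, 0, n + f, n * f],
            [0, 0, -1, 0]]

def persp(l, r, b, t, n, f):
    """Closed-form perspective matrix."""
    return [[2 * n / (r - l), 0, (r + l) / (r - l), 0],
            [0, 2 * n / (t - b), (t + b) / (t - b), 0],
            [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
            [0, 0, -1, 0]]

def ndc_depth(M, z):
    """NDC depth of a point with eye-space depth z (x = y = 0)."""
    zc = M[2][2] * z + M[2][3]
    wc = M[3][2] * z + M[3][3]
    return zc / wc

l, r, b, t, n, f = -1.0, 2.0, -1.0, 1.5, 1.0, 10.0
P = matmul(ortho(l, r, b, t, n, f), squish(n, f))
closed = persp(l, r, b, t, n, f)
err = max(abs(P[i][j] - closed[i][j]) for i in range(4) for j in range(4))

print(err)                            # tiny: the two matrices agree
print(ndc_depth(P, -n), ndc_depth(P, -f))  # about -1 and 1
print(ndc_depth(P, -(n + f) / 2))     # about 0.82, not 0: depth is non-linear
```

The last line illustrates the non-linear depth mapping: the point halfway between the near and far planes ends up much closer to the far end of the NDC range.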
MVP matrix
$$M_{MVP} = M_{persp} M_{view} M_{model}$$
Finally transform the NDC space to screen space (another affine transformation).
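A minimal sketch of that last step, assuming NDC $x, y \in [-1, 1]$ and a screen whose origin is the bottom-left corner with `width` by `height` pixels. Both the convention and the function name are assumptions for illustration; real graphics APIs differ in where they place the origin and pixel centers.

```python
def ndc_to_screen(x_ndc, y_ndc, width, height):
    """Scale and translate NDC in [-1, 1]^2 to screen coordinates.

    Assumes the screen origin is at the bottom-left corner.
    """
    return ((x_ndc + 1.0) * 0.5 * width, (y_ndc + 1.0) * 0.5 * height)

print(ndc_to_screen(0.0, 0.0, 800, 600))   # (400.0, 300.0): screen center
print(ndc_to_screen(-1.0, 1.0, 800, 600))  # (0.0, 600.0): top-left corner
```

The map is a scale followed by a translation, so it can also be written as a homogeneous matrix and appended to the MVP chain.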
