Let’s talk coordinate systems.

Yes, I understand that this will involve that most dreaded of four-letter words, math. I promise I’ll be gentle, and I’ll try my best to make this as easy to follow as possible.

Some of this may be a little basic; feel free to skip ahead if you already know something I’m about to talk about.

## 2D Coordinates

First, let’s talk about finding the position of a pixel on a computer screen. This is something most of us are pretty familiar with. So if we want to specify the location of the yellow pixel above, well, we’d find the column (“x”) under which the pixel is located, and we’d find the row (“y”). For the yellow pixel above, x = 3, y = 1, and that’s our pixel’s location.

We can generalize this idea. For example, if you want to find the location of a pin on a piece of paper, you can use a ruler to measure its distance from the left edge and its distance from the top edge. Those two measurements (for example, x = 1.3 inches, y = 4.5 inches) give us the location of that point on the piece of paper.

Notice in both cases our position is a measure of distance from some relative location: for our point on a piece of paper our “units” are inches, and the relative location is the upper-left corner of the piece of paper. For our computer display, our “units” are pixels, and it’s all measured relative to the upper-left corner of the screen.

Some terms.

With this in mind, it’s useful to define some terms.

We will call the point from which we’re measuring our point’s relative location the Origin, because this is where our measurements originate. Often we represent the origin in a diagram with a zero.

We will call the relative directions we measure along (horizontally and vertically across the page, across rows and down columns of our display) our Axes. (In math, an “Axis” is a line that serves to orient ourselves–and it can also be something we rotate around.)

Note: The whole concept of axis in Math is a bit of a rabbit hole, but we’ll ignore it for now. For now we’ll just treat it as “up/down”, “left/right” and “near/far.”

And we’ll introduce some compact notation:

`(x, y)`

This is a compact way for us to represent x and y.

So our yellow pixel is at `(3,1)`, our point on a piece of paper is at `(1.3 inches, 4.5 inches)`. Usually, however, our units are implied–so we’d write our point on paper as at `(1.3, 4.5)`.

## 3D Coordinates

This whole idea of 2D coordinates can be extended to the world of 3D by considering “depth.” Take our yellow dot above. Relative to our X and Y axes, the object is 4 units along X and 3 units up along Y. (What this means is, looking at the diagram above, if you were to measure from the wall that goes up along the Y-Z plane, the yellow dot is 4 units away from that wall.)

From the floor (that is, from the X-Z plane), our yellow dot is 3 units up.

And it’s 5 units away from the X-Y plane.

We would represent the location of our yellow dot as x = 4 units, y = 3 units and z = 5 units, or–more compactly, as

`(4, 3, 5)`

## On the math below.

I think at this point it is fair to say that we’re about to dive into a topic called “linear algebra”, and while I’ll only write out the things that are useful to us, the whole topic of linear algebra contains some interesting stuff that can be useful if you decide to do more interesting things in computer graphics than just draw a handful of pretty pictures with half-understood equations.

And for that, if you want, there is a wonderful series, The Essence of Linear Algebra by Grant Sanderson of 3Blue1Brown. It is absolutely fantastic, and explains all this stuff at a level of detail, with pretty pictures, that can be understood by nearly anyone.

For our purposes we’ll introduce the stuff we need–but if you want to understand “why”, you’ll want to check out his videos. They’re well done, well thought out and, I think, very easy for anyone to understand.

## Perspective

Now when we talk about computer graphics, what we’re really thinking about is the whole idea of “perspective:” the idea that as things get farther away they appear smaller.

The moon, for example, is roughly 2,160 miles in diameter. A quarter, on the other hand, is slightly less than an inch in diameter. But if you hold up the quarter to the night sky, you can cover the moon with your quarter. That’s perspective: your quarter, being inches from your eye, is a lot closer than the moon, which is about 240,000 miles away.

You can think of this by drawing a line from a far away object to your eye: The blue dot is your eye looking at the far away object.

Now suppose you are looking at the object on a computer screen a short distance from your eye. The object would appear smaller–because the screen is nearer than the object you are looking at. So how big is that object on the screen?

For convenience’s sake, let’s call the distance from the screen to your eye `1 unit`. A little basic geometry gives us an answer: suppose the distance to the object from your eye is Z and its height is Y. Then the law of similar triangles says the height Y’ on the screen is given by:

$$\frac{Y'}{1} = \frac{Y}{Z}$$

This means the size on our screen (that is one unit away) would be Y/Z.
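The similar-triangles result can be sketched in a few lines of C++. This is only an illustration: the function name `projectToScreen` and the `screenDistance` parameter are made up for this example, and it assumes the object is in front of the eye (`z > 0`).

```cpp
// Hypothetical helper: project a height y, seen at distance z from the eye,
// onto a screen that sits screenDistance units from the eye.
// By similar triangles: yOnScreen / screenDistance = y / z.
// Assumes z > 0 (the object is in front of the eye).
float projectToScreen(float y, float z, float screenDistance = 1.0f)
{
    return y * screenDistance / z;
}
```

With the screen one unit away, an object 6 units tall at a distance of 12 units comes out 0.5 units tall on the screen. Move it twice as far away and it shrinks to 0.25.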

## Perspective and Homogeneous Coordinates

When we talk about perspective, we are talking about Projective Geometry, or the geometric principles of projecting stuff.

Which is what we are doing when we project our object onto a computer screen.

And in 1827, Ferdinand Möbius devised the concept of homogeneous coordinates as a means of representing projective coordinates–the coordinates of things as they are projected somewhere else.

Now the whole topic of homogeneous coordinates, like mathematical axes, is a rabbit hole we can easily spend a lot of time on. But for our purposes, the way this works is as follows:

First, we add a new dimension, `w`, associated with all of our points. Generally we can take a coordinate (x, y, z) and map it into homogeneous coordinates by adding w = 1, which we show as a fourth value in our coordinates.

`(x, y, z) -> (x, y, z, 1)`

And we can map any homogeneous point (x, y, z, w) back to 3D coordinates by:

`(x, y, z, w) -> (x/w, y/w, z/w)`

Now why divide by w and not z? That’s in part a matter of convention, and in part because it works well with our 3D clipping for hand-wavey reasons.

(Basically, `w` serves as a projection in our 3D coordinate system. And when it comes to clipping we want to preserve all the information we can so that we can represent clipping with as much accuracy as possible, and because we may wish to represent things like points on an “infinite” sphere–like stars in the sky–through using ‘w = 0’.)
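The two mappings above are small enough to write out directly. The sketch below uses made-up type names (`Point3`, `Point4`) and function names purely for illustration; they are not taken from any library.

```cpp
// Illustrative types only; these names are assumptions for this example.
struct Point3 { float x, y, z; };
struct Point4 { float x, y, z, w; };

// (x, y, z) -> (x, y, z, 1)
Point4 toHomogeneous(const Point3 &p)
{
    return Point4{ p.x, p.y, p.z, 1.0f };
}

// (x, y, z, w) -> (x/w, y/w, z/w)
// Only meaningful when w != 0. A point with w == 0 is a point at
// infinity and has no finite 3D position.
Point3 fromHomogeneous(const Point4 &h)
{
    return Point3{ h.x / h.w, h.y / h.w, h.z / h.w };
}
```

Note that many homogeneous points map back to the same 3D point: `(6, 8, 10, 2)` and `(3, 4, 5, 1)` both come back as `(3, 4, 5)`.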

Why is this interesting?

One thing you can use homogeneous coordinates for is dealing with all your rotations, scaling operations, and translation operations (moving things around in your system) using matrix multiply operations.

Here’s an example. Suppose we have an object at (3, 4, 5)–and we want to represent it moving over by 2 units along X.

Normally we’d do this by addition: `(3, 4, 5) + (2, 0, 0) = (3+2, 4, 5) = (5, 4, 5).`

Matrix multiplication may seem a little more convoluted–but trust me, this will make our lives easier.

## Small rabbit hole: matrix multiplication.

Before we talk about translation with matrix multiplication, let’s define a few terms.

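First, for our purposes, a matrix is just a two-dimensional array of numbers. Throughout computer graphics we always use 4×4 matrices, so whenever you hear “matrix” in computer graphics, usually what you should hear is:

$$M = \begin{bmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ a_{10} & a_{11} & a_{12} & a_{13} \\ a_{20} & a_{21} & a_{22} & a_{23} \\ a_{30} & a_{31} & a_{32} & a_{33} \end{bmatrix}$$

That is, a 4×4 array of numbers.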

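Now when we multiply a matrix by a vector, we essentially treat the vector as a 4×1 matrix (if you’re following along by looking stuff up on Wikipedia), and we essentially do the following:

$$\begin{bmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ a_{10} & a_{11} & a_{12} & a_{13} \\ a_{20} & a_{21} & a_{22} & a_{23} \\ a_{30} & a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} a_{00}x + a_{01}y + a_{02}z + a_{03}w \\ a_{10}x + a_{11}y + a_{12}z + a_{13}w \\ a_{20}x + a_{21}y + a_{22}z + a_{23}w \\ a_{30}x + a_{31}y + a_{32}z + a_{33}w \end{bmatrix}$$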

We can see this with the code we use for multiplying matrices, though we store a matrix as a 2D array:

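```cpp
// Sets this vector to m * v, treating v as a 4x1 column vector.
// The index order a[row][column] is assumed here. v must not be *this,
// because x, y and z are overwritten while v is still being read.
void G3DVector::multiply(const G3DMatrix &m, const G3DVector &v)
{
    x = m.a[0][0] * v.x + m.a[0][1] * v.y + m.a[0][2] * v.z + m.a[0][3] * v.w;
    y = m.a[1][0] * v.x + m.a[1][1] * v.y + m.a[1][2] * v.z + m.a[1][3] * v.w;
    z = m.a[2][0] * v.x + m.a[2][1] * v.y + m.a[2][2] * v.z + m.a[2][3] * v.w;
    w = m.a[3][0] * v.x + m.a[3][1] * v.y + m.a[3][2] * v.z + m.a[3][3] * v.w;
}
```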

The ways, whys and wherefores of matrix multiplication aren’t really that important here, though you can read more at the above linked Wikipedia article.

## Back to Translations

Notice something interesting about our matrix multiplication results: they include addition as well as multiplication. We can use this to our advantage by constructing our matrix appropriately.

So back to our point at `(3, 4, 5)`.

If we represent this as a homogeneous coordinate `(3, 4, 5, 1)` (because remember: in general when we go from a point in 3D space to a homogeneous coordinate we append a `w = 1` at the end), then we can construct a matrix to handle translating by x = 2:

$$\begin{bmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \\ 5 \\ 1 \end{bmatrix}$$

(Follow along with the animation if you need to convince yourself this is the correct answer.)

Now if we perform the actual multiplications and additions we get our final result `(5, 4, 5, 1)`, and from our rules above, in 3D space, this would be `(5/1, 4/1, 5/1) = (5, 4, 5)`.
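To check the arithmetic by machine rather than by hand, here is a minimal, self-contained sketch. `Vec4`, `Mat4` and the two free functions are illustrative stand-ins, not the library's classes, and the matrix is assumed to be indexed as `a[row][column]`.

```cpp
// Illustrative stand-ins for the matrix and vector types.
struct Vec4 { float x, y, z, w; };
struct Mat4 { float a[4][4]; };  // assumed layout: a[row][column]

// Matrix times column vector: one matrix row per output component.
Vec4 multiply(const Mat4 &m, const Vec4 &v)
{
    return Vec4{
        m.a[0][0] * v.x + m.a[0][1] * v.y + m.a[0][2] * v.z + m.a[0][3] * v.w,
        m.a[1][0] * v.x + m.a[1][1] * v.y + m.a[1][2] * v.z + m.a[1][3] * v.w,
        m.a[2][0] * v.x + m.a[2][1] * v.y + m.a[2][2] * v.z + m.a[2][3] * v.w,
        m.a[3][0] * v.x + m.a[3][1] * v.y + m.a[3][2] * v.z + m.a[3][3] * v.w
    };
}

// Identity matrix with the offsets placed in the last column.
Mat4 translation(float tx, float ty, float tz)
{
    return Mat4{{
        { 1.0f, 0.0f, 0.0f, tx },
        { 0.0f, 1.0f, 0.0f, ty },
        { 0.0f, 0.0f, 1.0f, tz },
        { 0.0f, 0.0f, 0.0f, 1.0f }
    }};
}
```

Calling `multiply(translation(2.0f, 0.0f, 0.0f), Vec4{3.0f, 4.0f, 5.0f, 1.0f})` gives `(5, 4, 5, 1)`. The first row contributes `1*3 + 0*4 + 0*5 + 2*1 = 5`; the `w = 1` at the end of the vector is what lets the `2` in the last column get added in.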

We can generalize this by moving through Y and Z, giving us our translation matrix (that is, a matrix which moves our object by a distance in X, Y and Z) as:

$$T(x, y, z) = \begin{bmatrix} 1 & 0 & 0 & x \\ 0 & 1 & 0 & y \\ 0 & 0 & 1 & z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

There is an interesting property of these translation matrices. We can chain them together using matrix multiplication. And it just works out the way we would think: if we have two translation matrices, one that translates by (x,y,z) and another that translates by (a,b,c), if we multiply the two matrices together we get:

$$T(x, y, z)\,T(a, b, c) = \begin{bmatrix} 1 & 0 & 0 & x + a \\ 0 & 1 & 0 & y + b \\ 0 & 0 & 1 & z + c \\ 0 & 0 & 0 & 1 \end{bmatrix} = T(x + a,\; y + b,\; z + c)$$

(If you have to read the article on matrix multiplications and follow along, that’s fine. I can wait.)
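The chaining property can be checked with a plain 4×4 matrix product. This is a sketch under the same assumptions as before: `Mat4` is a made-up stand-in type, indexed as `a[row][column]`.

```cpp
// Illustrative stand-in for a 4x4 matrix; assumed layout a[row][column].
struct Mat4 { float a[4][4]; };

// Standard matrix product: out[i][j] = sum over k of l[i][k] * r[k][j].
Mat4 multiply(const Mat4 &l, const Mat4 &r)
{
    Mat4 out{};  // value-initialized, so every entry starts at zero
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                out.a[i][j] += l.a[i][k] * r.a[k][j];
    return out;
}

// Identity matrix with the offsets placed in the last column.
Mat4 translation(float tx, float ty, float tz)
{
    return Mat4{{
        { 1.0f, 0.0f, 0.0f, tx },
        { 0.0f, 1.0f, 0.0f, ty },
        { 0.0f, 0.0f, 1.0f, tz },
        { 0.0f, 0.0f, 0.0f, 1.0f }
    }};
}
```

Multiplying `translation(1, 2, 3)` by `translation(10, 20, 30)` produces a matrix whose last column holds `11, 22, 33, 1`, which is exactly `translation(11, 22, 33)`.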

This hints at something terribly clever going on here:

With a single matrix you can represent the concatenation of a whole bunch of rotations, scale operations and translations.

And it means we aren’t constantly moving things around the screen a pixel at a time, through a whole chain of “rotate”, “move” and “scale” operations. We simply concat all this into a single matrix through multiplication, and then all points multiplied by that matrix are moved around according to our whole chain of rotations, movements and scale operations.

## Scaling and Rotations and chaining it all together

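Scaling an object by (sx,sy,sz)–that is, multiplying each object’s (x,y,z) coordinate by (sx,sy,sz)–has the following matrix representation:

$$S(s_x, s_y, s_z) = \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

And rotation by an angle θ around the X, Y and Z axes (written here in the usual right-handed convention) looks like:

$$R_x(\theta) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad R_y(\theta) = \begin{bmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Now we can chain these matrices together by pre-multiplying the matrix. Meaning if we want to first rotate our object around the Y axis by an angle, then translate the whole thing by (tx,ty,tz), we could first multiply our vector by the rotation matrix, then by the translation matrix, multiplying right to left. Writing T for the translation matrix and v for our point:

$$v' = T(t_x, t_y, t_z)\,\bigl(R_y(\theta)\,v\bigr)$$

Or, you know, we could multiply the matrices together first–then multiply by all of our points:

$$v' = \bigl(T(t_x, t_y, t_z)\,R_y(\theta)\bigr)\,v$$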
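Here is a minimal sketch of that rotate-then-translate chain. The types and function names are illustrative assumptions, the matrix is indexed as `a[row][column]`, and the Y rotation uses the common right-handed convention, which may differ in sign from other sources.

```cpp
#include <cmath>

// Illustrative stand-ins; assumed layout a[row][column].
struct Vec4 { float x, y, z, w; };
struct Mat4 { float a[4][4]; };

// Matrix times column vector.
Vec4 multiply(const Mat4 &m, const Vec4 &v)
{
    return Vec4{
        m.a[0][0] * v.x + m.a[0][1] * v.y + m.a[0][2] * v.z + m.a[0][3] * v.w,
        m.a[1][0] * v.x + m.a[1][1] * v.y + m.a[1][2] * v.z + m.a[1][3] * v.w,
        m.a[2][0] * v.x + m.a[2][1] * v.y + m.a[2][2] * v.z + m.a[2][3] * v.w,
        m.a[3][0] * v.x + m.a[3][1] * v.y + m.a[3][2] * v.z + m.a[3][3] * v.w
    };
}

// Matrix times matrix: out[i][j] = sum over k of l[i][k] * r[k][j].
Mat4 multiply(const Mat4 &l, const Mat4 &r)
{
    Mat4 out{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                out.a[i][j] += l.a[i][k] * r.a[k][j];
    return out;
}

Mat4 translation(float tx, float ty, float tz)
{
    return Mat4{{ {1, 0, 0, tx}, {0, 1, 0, ty}, {0, 0, 1, tz}, {0, 0, 0, 1} }};
}

Mat4 scaling(float sx, float sy, float sz)
{
    return Mat4{{ {sx, 0, 0, 0}, {0, sy, 0, 0}, {0, 0, sz, 0}, {0, 0, 0, 1} }};
}

// Rotation about the Y axis, angle in radians (right-handed convention).
Mat4 rotationY(float radians)
{
    const float c = std::cos(radians);
    const float s = std::sin(radians);
    return Mat4{{ {c, 0, s, 0}, {0, 1, 0, 0}, {-s, 0, c, 0}, {0, 0, 0, 1} }};
}
```

To rotate first and translate second, the rotation goes on the right: `multiply(translation(10, 0, 0), rotationY(angle))`. Applying that combined matrix to a point gives the same answer (up to floating-point rounding) as applying `rotationY` and then `translation` one after the other, but the matrix product only has to be computed once for all of your points.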

‘Cause like I said above:

With a single matrix you can represent the concatenation of a whole bunch of rotations, scale operations and translations.

This is, by the way, what happens in an OpenGL graphics pipeline.

Display drivers which display 3D in hardware are very good at multiplying 4×4 matrices and 4×4 matrices with vectors. This math allows them to move objects around in real time on your screen.

Of course we’re not going to be moving stuff around in real time with an Arduino, but the same principles apply.

Oh, yeah. So we’ve gone down a rabbit hole using 4×4 matrices to move our objects around in three-dimensional space. But what about displaying objects in perspective?

There are a number of perspective matrices out there which essentially put the z depth into the w column, so at the end we get `(x/z, y/z, ...)`–we then can display our lines using the (x/z, y/z) value at the end.

The perspective matrix I prefer–and you would multiply this as the last multiplication operation before displaying your stuff–is: Notice what we do here: we move the z coordinate into the w column, and the w column into the z column. This has the nice result that we (eventually) divide by z–and all that perspective stuff we did before works very well.

Note: This is not the perspective matrix used by OpenGL. I provide a link above explaining why this is a preferable way to represent perspective.
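To see why moving z into w produces a divide by z, here is a deliberately bare sketch. It is not the perspective matrix preferred above, and not OpenGL's either: it is just the plain "swap z and w" idea, with no field of view, no clipping planes and no sign handling. All names are made up for the example.

```cpp
// Illustrative types only.
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

// Applies the bare "swap z and w" matrix
//   | 1 0 0 0 |
//   | 0 1 0 0 |
//   | 0 0 0 1 |
//   | 0 0 1 0 |
// written out by hand: (x, y, z, w) -> (x, y, w, z).
Vec4 swapZAndW(const Vec4 &v)
{
    return Vec4{ v.x, v.y, v.w, v.z };
}

// Homogeneous divide: (x, y, z, w) -> (x/w, y/w, z/w). Requires w != 0.
Vec3 perspectiveDivide(const Vec4 &v)
{
    return Vec3{ v.x / v.w, v.y / v.w, v.z / v.w };
}
```

Feeding `(2, 4, 4, 1)` through `swapZAndW` gives `(2, 4, 1, 4)`, and the divide then yields `(0.5, 1, 0.25)`: x and y have each been divided by the original z, which is the Y/Z result from the perspective section.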

## A word about computer graphics math

I once worked with a computer graphics library as part of a project a long time ago–it was required by the project manager, even though I could have rolled my own more quickly.

But the manual did make the following observation:

In the end, you either have an image, or you do not.

There are a lot of things that can go wrong on your path towards drawing something. You can accidentally flip the sign of something–and think you’re rendering an object in front of you when it is behind you and invisible. You can accidentally rotate left when you intended to rotate right. You can stack the perspective matrices backwards, or multiply the matrices together wrong.

So here are some helpful hints to keep in mind when we get to the part where we start putting the code together.

Test the translation matrix first.

This will allow you to make sure you haven’t transposed the matrix: that you haven’t flipped the rows and the columns. If you take a point at (1,2,3,1) and multiply it by the transformation matrix for moving an object by x=5, if you get (6,2,3,1) you’re probably on the right path. If you get (1,2,3,6), you’ve transposed your matrix.
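That check is easy to automate. The sketch below uses illustrative stand-in types (indexed as `a[row][column]`) and includes a `transposed` helper only so the failure case can be reproduced on purpose.

```cpp
// Illustrative stand-ins; assumed layout a[row][column].
struct Vec4 { float x, y, z, w; };
struct Mat4 { float a[4][4]; };

// Matrix times column vector.
Vec4 multiply(const Mat4 &m, const Vec4 &v)
{
    return Vec4{
        m.a[0][0] * v.x + m.a[0][1] * v.y + m.a[0][2] * v.z + m.a[0][3] * v.w,
        m.a[1][0] * v.x + m.a[1][1] * v.y + m.a[1][2] * v.z + m.a[1][3] * v.w,
        m.a[2][0] * v.x + m.a[2][1] * v.y + m.a[2][2] * v.z + m.a[2][3] * v.w,
        m.a[3][0] * v.x + m.a[3][1] * v.y + m.a[3][2] * v.z + m.a[3][3] * v.w
    };
}

Mat4 translation(float tx, float ty, float tz)
{
    return Mat4{{ {1, 0, 0, tx}, {0, 1, 0, ty}, {0, 0, 1, tz}, {0, 0, 0, 1} }};
}

// Swap rows and columns; used here only to reproduce the mistake.
Mat4 transposed(const Mat4 &m)
{
    Mat4 t{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            t.a[c][r] = m.a[r][c];
    return t;
}

// Exact comparison is fine here because the test values are small integers.
bool sameVector(const Vec4 &p, float x, float y, float z, float w)
{
    return p.x == x && p.y == y && p.z == z && p.w == w;
}
```

With the matrix the right way round, `multiply(translation(5, 0, 0), Vec4{1, 2, 3, 1})` matches `(6, 2, 3, 1)`. Run the same point through `transposed(translation(5, 0, 0))` and the 5 lands in w instead, giving `(1, 2, 3, 6)`.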

Test to make sure you haven’t flipped a sign somewhere.

It’s so common that it’s nearly a joke amongst those involved in the computer graphics industry: you will inevitably flip the sign of some term somewhere. Look through your code for a “+” where there should be a “-” and vice versa.

Right-hand rule versus left-hand rule.

This is sort of related to the last rule.

Take your right hand, and make a “gun” with your fingers–with your thumb pointing up, your pointing finger pointing out. Now your middle finger points at a right angle to your pointing finger, and your last two fingers curl around.

If your thumb is “X”, your pointing finger “Y” and your middle finger “Z”, this is the right-handed coordinate system. And this is the coordinate system our perspective matrix works in. Notice something interesting here: if you rotate your hand so your thumb is pointing horizontally to the right, your pointing finger is pointing up–well, the middle finger is pointing towards you, not away.

This means if you want to move your object out in front of your virtual camera, you must subtract some value from the Z axis.
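One way to double-check handedness without waving your hand around is the cross product, a standard vector operation that this article doesn't otherwise use. In a right-handed system, X cross Y points along +Z. The `Vec3` type and `cross` function below are illustrative only.

```cpp
// Illustrative 3D vector type.
struct Vec3 { float x, y, z; };

// Cross product of a and b. In a right-handed coordinate system,
// cross(X, Y) points along +Z.
Vec3 cross(const Vec3 &a, const Vec3 &b)
{
    return Vec3{
        a.y * b.z - a.z * b.y,
        a.z * b.x - a.x * b.z,
        a.x * b.y - a.y * b.x
    };
}
```

`cross(Vec3{1, 0, 0}, Vec3{0, 1, 0})` comes out as `(0, 0, 1)`. With X to the right and Y up, that +Z direction is the one pointing toward you, which is why objects in front of the camera end up with negative Z values.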

When you start building your translations for display–say, you want to move a block representing an arm of a robot to the shoulder–don’t be afraid to build each of the translations slowly, using small values. Bump your arm out by 1 unit. Or rotate it by 5 to 10 degrees. Small motions help you visualize if you’re building the transformation stack (what we call all that matrix to matrix multiplication) correctly.

And start simply: it’s easier to start with an image (even if it’s a boring cube) and make something cool with it, than to start with a blank screen and pull your hair out trying to figure out why.