Colour Spaces and Transformations
RGB is the most familiar way to represent colour in digital images, but it is far from the only one. Different colour spaces encode the same visual information in different ways, and choosing the right one for a task can make the difference between an algorithm that works and one that struggles. This lesson covers the most important colour spaces in computer vision and explains when and why to use each.
What is a Colour Space?
A colour space is a mathematical model that describes how colours can be represented as tuples of numbers. Each colour space defines a coordinate system where any representable colour corresponds to a specific point.
You already know RGB: each colour is a point in a three-dimensional space defined by red, green and blue axes. But there are many other ways to decompose colour information, each with its own strengths.
RGB: The Default
Red, Green, Blue is the native colour space of most cameras and displays. It is intuitive (the three channels correspond directly to physical properties of light and display hardware) but it has a significant weakness: the three channels are highly correlated.
In a bright image, all three channels tend to be high. In a dark image, all three tend to be low. This correlation makes it harder to separate colour information from brightness information, which matters enormously for many vision tasks.
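One way to see this correlation in numbers is to take a single surface colour, scale it by several illumination levels and measure how closely the channels move together. The sketch below does that with made-up data: `base_colour`, `illumination` and the helper `pearson` are illustrative names, not part of any library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one orange surface seen under five illumination levels.
base_colour = (200, 120, 40)
illumination = [0.2, 0.4, 0.6, 0.8, 1.0]
pixels = [tuple(round(c * k) for c in base_colour) for k in illumination]

reds = [p[0] for p in pixels]
greens = [p[1] for p in pixels]
blues = [p[2] for p in pixels]

# Brightness changes move all three channels together, so both
# correlations come out very close to 1.
red_green = pearson(reds, greens)
red_blue = pearson(reds, blues)
print(red_green, red_blue)
```

Real images mix many surface colours, so the correlation is lower than in this idealised case, but the same effect is what ties brightness and colour together in RGB.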
Grayscale Conversion
Converting a colour image to grayscale collapses the three RGB channels into a single channel representing perceived brightness. The formula used by most image processing libraries is:
Grayscale = 0.299 × R + 0.587 × G + 0.114 × B
These weights are not arbitrary. Human vision is most sensitive to green light, less sensitive to red and least sensitive to blue. The weights reflect this perceptual sensitivity so that the resulting grayscale image looks natural to human eyes.
Grayscale images are useful when colour is not relevant to the task (counting objects, measuring shapes, detecting edges), and with one channel instead of three they cut the data a model needs to process to one third.
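The weighted sum above can be sketched in a few lines of plain Python. This is a minimal illustration that assumes 8-bit channel values and represents an image as a nested list of (R, G, B) tuples; the function names are chosen for this example only.

```python
def rgb_to_gray(r, g, b):
    """Weighted sum of 8-bit R, G, B values, rounded to the nearest integer."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def image_to_gray(image):
    """Convert a nested list of (R, G, B) pixels into a nested list of grey values."""
    return [[rgb_to_gray(*pixel) for pixel in row] for row in image]


# A tiny 1 x 3 image: pure red, pure green, pure blue.
image = [[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]
gray = image_to_gray(image)
print(gray)  # [[76, 150, 29]]
```

Pure green comes out much brighter than pure blue, which matches the perceptual weighting described above.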
HSV: Hue, Saturation, Value
HSV separates colour information into three more meaningful components:
- Hue (H): the pure colour, expressed as an angle from 0 to 360 degrees (red ≈ 0°, green ≈ 120°, blue ≈ 240°)
- Saturation (S): how vivid the colour is, from 0 (grey) to 1 (fully saturated)
- Value (V): the brightness, from 0 (black) to 1 (full brightness)
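HSV is extremely useful for tasks that involve selecting objects by colour. Suppose you want to detect a red ball in a video. In RGB, "red" spans a wide and irregular region of the colour space because changes in brightness and lighting move all three channel values at once. In HSV, the same red stays within a narrow band of hue values as brightness changes, so a hue threshold combined with minimum saturation and value thresholds is usually enough. Two caveats apply: red sits at the wrap-around point of the hue circle, so it covers hues just above 0° and just below 360°, and hue becomes unreliable for pixels with very low saturation or value. With these handled, colour-based segmentation is far more reliable than in RGB.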
Classic use cases for HSV:
- Detecting traffic lights by colour
- Skin detection for gesture recognition
- Segmenting objects with a known colour in robotics
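As a rough sketch of hue-based selection, the example below uses Python's standard `colorsys` module to convert RGB to HSV and test whether a pixel is "red". The function names and the threshold values (`hue_tolerance`, `min_saturation`, `min_value`) are assumptions made for illustration and would need tuning on real footage.

```python
import colorsys


def rgb_to_hsv_degrees(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation 0-1, value 0-1)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s, v


def is_red(r, g, b, hue_tolerance=15, min_saturation=0.5, min_value=0.2):
    """Return True if the pixel's hue lies within hue_tolerance degrees of 0."""
    h, s, v = rgb_to_hsv_degrees(r, g, b)
    # Red sits at the wrap-around point of the hue circle, so measure the
    # circular distance to 0 degrees rather than testing a single interval.
    hue_distance = min(h, 360 - h)
    return hue_distance <= hue_tolerance and s >= min_saturation and v >= min_value


bright_red = (230, 20, 20)
shadowed_red = (115, 10, 10)  # the same red at half the brightness

print(rgb_to_hsv_degrees(*bright_red))
print(rgb_to_hsv_degrees(*shadowed_red))
print(is_red(*bright_red), is_red(*shadowed_red), is_red(20, 20, 230))
```

The two reds differ in every RGB channel but share the same hue and saturation; only the value changes. Note that libraries scale these ranges differently: OpenCV's 8-bit HSV images, for instance, store hue as 0 to 179 rather than 0 to 360.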
YCbCr: Luminance and Chrominance
YCbCr separates an image into:
- Y: luminance (brightness), essentially a weighted grayscale version of the image
- Cb: blue-difference chrominance, how blue the colour is relative to neutral
- Cr: red-difference chrominance, how red the colour is relative to neutral
YCbCr is the colour space used internally by JPEG compression, digital video standards and many broadcast systems. The key insight that makes it useful is that the human visual system is more sensitive to variations in brightness than to variations in colour. YCbCr exploits this by allocating more precision to the Y channel and compressing the Cb and Cr channels more aggressively: you lose some colour detail but the image still looks sharp to human eyes.
In computer vision, YCbCr is sometimes used for skin tone detection. Skin colour, across a wide range of ethnicities, occupies a surprisingly compact region in the Cb-Cr plane, making it easier to detect than in RGB.
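The conversion itself is a fixed linear transform. The sketch below assumes the full-range variant used by JPEG (JFIF), with 8-bit inputs and a neutral chroma value of 128; video standards use slightly different ranges and coefficients. The `subsample_2x2` helper is a hypothetical illustration of storing chroma at lower resolution by averaging 2 x 2 blocks.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return min(255, max(0, round(value)))


def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (JFIF-style BT.601 weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)


def subsample_2x2(channel):
    """Average each 2 x 2 block of a channel (even height and width assumed)."""
    return [
        [
            (channel[y][x] + channel[y][x + 1]
             + channel[y + 1][x] + channel[y + 1][x + 1]) / 4
            for x in range(0, len(channel[0]), 2)
        ]
        for y in range(0, len(channel), 2)
    ]


print(rgb_to_ycbcr(128, 128, 128))  # neutral grey keeps both chroma values at 128
print(rgb_to_ycbcr(255, 0, 0))      # pure red pushes Cr to the top of its range
print(subsample_2x2([[100, 110], [120, 130]]))
```

The Y value uses the same weights as the grayscale formula earlier in the lesson, which is why the Y channel looks like a grayscale version of the image.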
LAB: Perceptually Uniform Colour Space
The LAB colour space (CIELAB, also written L*a*b*) was designed to be approximately perceptually uniform: equal numerical distances between two colours in LAB space correspond roughly to equal differences as perceived by the human eye.
- L: lightness, from 0 (black) to 100 (white)
- a: position on a green-to-red axis
- b: position on a blue-to-yellow axis
LAB is particularly useful when you need to measure how similar or different two colours appear to a human observer or when you want to apply colour corrections that look natural. It is also used in some colour-based image segmentation and transfer algorithms.
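To make "measuring how different two colours appear" concrete, the sketch below converts 8-bit sRGB values to LAB and computes the CIE76 colour difference, which is simply the Euclidean distance in LAB. It assumes sRGB input and the D65 white point; the function names are illustrative, and later difference formulas such as CIEDE2000 refine CIE76 but are longer to implement.

```python
from math import sqrt

# D65 reference white (2-degree observer), the white point of sRGB.
WHITE = (0.95047, 1.0, 1.08883)


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_lab(r, g, b):
    """Convert 8-bit sRGB to CIELAB (L*, a*, b*) relative to D65."""
    rl, gl, bl = (srgb_to_linear(v / 255) for v in (r, g, b))
    # Linear sRGB to CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # Cube root with a linear segment near zero, as defined for CIELAB.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / WHITE[0]), f(y / WHITE[1]), f(z / WHITE[2])
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def delta_e76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance in LAB."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


red = rgb_to_lab(255, 0, 0)
orange = rgb_to_lab(255, 128, 0)
blue = rgb_to_lab(0, 0, 255)
print(delta_e76(red, orange), delta_e76(red, blue))
```

Red and orange end up much closer together than red and blue, in line with how the colours look.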
Histogram Equalisation
A histogram of an image plots the distribution of pixel intensity values: how many pixels have each brightness level, from 0 to 255 in an 8-bit image. In a well-exposed image, the histogram is spread across the full range. In an underexposed or overexposed image, the histogram is compressed into a narrow region.
Histogram equalisation is a technique that redistributes pixel values to flatten the histogram, spreading intensity values more evenly across the full range. The result is an image with improved contrast: detail that was hidden in dark or bright regions becomes visible.
For colour images, histogram equalisation is normally applied only to the luminance or lightness channel (Y in YCbCr, L in LAB), not to the colour channels. Equalising R, G and B independently changes the balance between them and shifts the perceived colours of the image.
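The redistribution step can be sketched as a lookup table built from the cumulative histogram. This is a minimal version for a single channel stored as a flat list of integers; the function name and the handling of a constant image are choices made for this example.

```python
def equalise(pixels, levels=256):
    """Histogram-equalise a flat list of integer intensities in [0, levels - 1]."""
    # Count how many pixels take each intensity.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution: number of pixels at or below each intensity.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        return list(pixels)  # a constant image has nothing to spread out

    # Map each intensity so that the cumulative distribution becomes roughly linear.
    lut = [
        max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [lut[p] for p in pixels]


# A low-contrast "image": all values squeezed between 100 and 103.
dark = [100, 100, 101, 101, 102, 102, 103, 103]
stretched = equalise(dark)
print(stretched)  # [0, 0, 85, 85, 170, 170, 255, 255]
```

For a colour image, the same function would be run on the Y or L channel only, with the two colour channels left unchanged before converting back to RGB.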
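A more sophisticated variant called CLAHE (Contrast Limited Adaptive Histogram Equalisation) applies equalisation locally to small tiles of the image rather than globally, and clips each tile's histogram at a set limit before equalising. Working locally adapts the contrast to each region, while the clip limit keeps noise in flat regions from being over-amplified, producing more natural-looking results. CLAHE is widely used in medical imaging.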
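The "contrast limited" part can be illustrated on its own. The sketch below shows only the histogram clipping step for a single tile, under the assumption that the clipped excess is spread evenly over all bins; it is not a full CLAHE implementation, which would also split the image into tiles and interpolate between their mappings. The name `clip_histogram` is hypothetical.

```python
def clip_histogram(hist, clip_limit):
    """Cap each bin at clip_limit and spread the clipped excess over all bins."""
    excess = sum(max(0, count - clip_limit) for count in hist)
    clipped = [min(count, clip_limit) for count in hist]
    share, remainder = divmod(excess, len(hist))
    # Give every bin an equal share; hand out any remainder one count at a time.
    return [c + share + (1 if i < remainder else 0) for i, c in enumerate(clipped)]


# A tile dominated by a single intensity, as in a flat, noisy region.
tile_hist = [12, 0, 0, 0]
limited = clip_histogram(tile_hist, clip_limit=4)
print(limited)  # [6, 2, 2, 2]
```

The total pixel count is preserved, but the dominant bin is flattened, so the cumulative distribution built from it rises less steeply and small intensity differences are stretched less. After redistribution a bin can end up slightly above the limit again, which some implementations handle by repeating the step.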
Choosing the Right Colour Space
| Task | Recommended space |
|---|---|
| Colour-based object detection | HSV |
| Edge detection, shape analysis | Grayscale |
| Compression, video | YCbCr |
| Colour similarity measurement | LAB |
| General neural network input | RGB (normalised) |
| Medical imaging contrast | Grayscale + CLAHE |
Converting between colour spaces is computationally inexpensive and a routine part of building vision pipelines. The key habit to develop is asking: does the colour space I am using make my task easier, or am I working against it?
Quiz:
- Why is HSV preferred over RGB for detecting a coloured object under varying lighting conditions?
- What does histogram equalisation do to an image, and why is it applied to the luminance channel rather than all three channels?