Long-Term Time-Lapse Experiments — part 2

| tags: photo

As a sequel to part 1, let's look at more... alternative ways of rendering a composite image.

Non-Negative Matrix Factorization

Consider the following problem: we have a set of \(N\) images, of size \(3 \times I_h \times I_w\) — that's \(I_h\) pixels tall, \(I_w\) pixels wide and with 3 color channels. We would like to represent each of our images as a weighted sum of \(K\) basis images, where \(K < N\) and the basis images are in some sense optimal, under the constraint that both the weights and the basis images are nonnegative.

(Note that this particular problem does not make all that much physical sense. More reasonable applications can be found in, e.g., decomposing approximately additive mixtures of multiple sound sources.)

By reshaping our images into single \(3 I_h I_w\)-element column vectors, we can formulate the above as a matrix factorization problem, \[ \mathbf{V} \approx \mathbf{W} \mathbf{H}, \] where \(\mathbf{V}\) is the \(3 I_h I_w \times N\) matrix of original images, \(\mathbf{W}\) the \(3 I_h I_w \times K\) dictionary matrix of vectors representing the basis images, and \(\mathbf{H}\) the \(K \times N\) matrix of weights for each image. Typically, we then solve for \(\mathbf{W}\) and \(\mathbf{H}\) so that they minimize the Kullback-Leibler divergence between \(\mathbf{V}\) and \(\mathbf{W} \mathbf{H}\).
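The factorization can be sketched in a few lines of NumPy, using the classic Lee–Seung multiplicative updates for the KL objective. This is a minimal illustration on random stand-in data (the function name and toy dimensions are mine), not the nimfa code used for the actual renders:

```python
import numpy as np

def nmf_kl(V, K, n_iter=200, eps=1e-9, seed=0):
    """Rank-K NMF V ≈ W @ H via multiplicative updates for the KL divergence."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((M, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + eps)
    return W, H

# Toy stand-in: 32 "images" of 10x10 pixels with 3 channels, flattened to columns.
V = np.random.default_rng(1).random((3 * 10 * 10, 32))
W, H = nmf_kl(V, K=4)  # columns of W are the basis images, H holds the weights
```

The updates keep both factors nonnegative automatically, since each step only multiplies by nonnegative ratios.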

For more details, you may start from the Non-negative matrix factorization Wikipedia article. There are also some similarities to the concept of eigenfaces, often applied in facial recognition systems. Eigenfaces do not have the non-negativity restrictions, however.

What I have rendered here are the basis images for a rank-4 (i.e., \(K=4\)) approximation of \(N=32\) frames from "view #1". The outcome depends a lot on the (random) initialization; two representative samples are shown below. The nimfa Python library was used to do the hard part.

NMF basis images, run #1.

NMF basis images, run #2.

Some common points can be found in both sets of images. For example, both sets have a component image (or two) approximately corresponding to a sunny day, but also images with a "negative sun" in the same location. When approximating an overcast day using these images, the two suns more or less cancel out.

Hierarchical Clustering

Another machine-learning-inspired way (can you guess my day job yet?) of looking at our image set is to apply a hierarchical clustering algorithm to it. For the following image, I've used a very simple greedy agglomerative clustering algorithm, restricted to produce a full binary tree from \(2^n\) source images — in this case, \(2^5 = 32\).

Start with the set of \(2^n\) clusters, each containing a single image. Repeatedly find the pair of clusters with the smallest distance between them, merge them into a single cluster, and remove both from the set. When the set is empty, reinitialize it with the \(2^{n-1}\) pairs just generated. Continue the process until all images are in a single cluster.

There are several valid ways to define distances between images and clusters. Here, the distance between two images was defined as the mean Euclidean distance between corresponding pixels, considered as vectors in \(\mathbb{R}^3\), matching the distances used for the TSP sorting in part 1. The distance between two clusters was defined as the mean over all pairwise image distances (average linkage clustering, UPGMA).
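The level-by-level pairing scheme can be sketched as follows, again on random stand-in frames rather than the real ones (function names are my own):

```python
import numpy as np

def image_distance(a, b):
    """Mean Euclidean distance between corresponding RGB pixels of (H, W, 3) arrays."""
    return np.linalg.norm(a - b, axis=-1).mean()

def cluster_distance(ca, cb):
    """Average linkage (UPGMA): mean over all pairwise image distances."""
    return np.mean([image_distance(a, b) for a in ca for b in cb])

def pairwise_merge(images):
    """Greedily merge 2^n images into a full binary tree, one level at a time.

    Returns a list of levels; each level is a list of clusters (lists of images).
    """
    levels = [[[im] for im in images]]
    while len(levels[-1]) > 1:
        pool = list(levels[-1])
        merged = []
        while pool:
            # Find the closest pair among the clusters still in the pool...
            i, j = min(
                ((i, j) for i in range(len(pool)) for j in range(i + 1, len(pool))),
                key=lambda ij: cluster_distance(pool[ij[0]], pool[ij[1]]),
            )
            # ...merge it, and remove both members from the pool.
            merged.append(pool[i] + pool[j])
            del pool[j], pool[i]  # delete the larger index first
        levels.append(merged)
    return levels

rng = np.random.default_rng(0)
frames = [rng.random((8, 8, 3)) for _ in range(8)]  # 2^3 toy frames
levels = pairwise_merge(frames)
print([len(level) for level in levels])  # [8, 4, 2, 1]
```

Since each round pairs up an even number of clusters, the pool always empties cleanly, which is why the image count is restricted to a power of two.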

Hierarchical clustering of 32 frames.

The bottom row contains all 32 original source images. For clusters higher up in the hierarchy, the average over all cluster members is shown. The clustering does manage to group "snow" and "not snow" rather nicely. (Admittedly, this is not a difficult task.)

Most Unique Pixels

Going on to less practical visualizations — and that's saying a lot — we can look at the composite image formed by selecting, for each pixel position, the pixel from the source image that is furthest from the overall mean image at that position. Here, we show both the per-pixel decisions as well as the result of smoothing the distance maps with a 15-pixel Gaussian blur before the selection. This could be considered an opposite of sorts to the common practice of selecting the median pixel in order to, e.g., remove people from photographs.
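The unsmoothed selection is a one-liner with NumPy's advanced indexing. A sketch on a random toy stack (the array shapes are illustrative, not the real frame dimensions):

```python
import numpy as np

# Toy stack of N frames, shape (N, H, W, 3), values in [0, 1].
rng = np.random.default_rng(0)
frames = rng.random((32, 16, 16, 3))

mean = frames.mean(axis=0)                     # (H, W, 3) mean image
dist = np.linalg.norm(frames - mean, axis=-1)  # (N, H, W) per-pixel distances
# For the smoothed variant, blur each dist[i] spatially first,
# e.g. with scipy.ndimage.gaussian_filter, before taking the argmax.
winner = dist.argmax(axis=0)                   # (H, W) index of the furthest frame

# Gather the winning pixel from each frame via advanced indexing.
rows, cols = np.indices(winner.shape)
composite = frames[winner, rows, cols]         # (H, W, 3) composite image
```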

(I have heard it said that this looks like a 1980s album cover.)

Pixels most distant from the mean, individually.

Pixels most distant from the mean, smoothed.

Variance Across Overlapping Images

Of course, we can also look at simpler things. Here are the sample variances (for each color channel) across the corresponding pixels from each frame.
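With the frames stacked into a single array, this is a one-line reduction (toy shapes again; `ddof=1` gives the unbiased sample variance):

```python
import numpy as np

frames = np.random.default_rng(0).random((32, 16, 16, 3))  # (N, H, W, 3) stack
var = frames.var(axis=0, ddof=1)  # (H, W, 3) per-pixel, per-channel sample variance
img = var / var.max()             # normalize to [0, 1] for display
```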

Per-pixel sample variance, view #1.

Per-pixel sample variance, view #2.

Histogram Equalization

As a purely stylistic choice, we can also apply some histogram equalization to the simple average images, to make them (arguably) "more otherworldly". Samples follow.
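A minimal CDF-based equalizer covering both variants might look like the sketch below. It approximates luminance as the plain channel mean rather than a Rec. 709 weighted sum, and the function names are mine:

```python
import numpy as np

def equalize(channel, n_bins=256):
    """Histogram-equalize one channel (floats in [0, 1]) by remapping through its CDF."""
    hist, bin_edges = np.histogram(channel, bins=n_bins, range=(0.0, 1.0))
    cdf = hist.cumsum().astype(float)
    cdf /= cdf[-1]  # normalize the CDF to [0, 1]
    return np.interp(channel, bin_edges[:-1], cdf)

rng = np.random.default_rng(0)
img = rng.random((16, 16, 3)) ** 2  # skewed toy image

# "All color channels": equalize R, G, B independently (this shifts hues).
eq_rgb = np.stack([equalize(img[..., c]) for c in range(3)], axis=-1)

# "Luminance only": equalize the brightness, then rescale RGB to match,
# preserving the hue. Crude mean-of-channels luminance, for illustration.
lum = img.mean(axis=-1)
eq_lum = equalize(lum)
eq_keep_hue = np.clip(img * (eq_lum / np.maximum(lum, 1e-9))[..., None], 0.0, 1.0)
```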

Histogram equalization for luminance, view #1.

Histogram equalization for all color channels, view #1.

Histogram equalization for luminance, view #2.

Histogram equalization for all color channels, view #2.