FFMPEG Fun

Published: Friday, January 05, 2024

After writing the script to create an image collage using Python and its OpenCV2 library, I got curious about using FFMPEG to achieve the same results. The Python script works, but it's kinda slow. FFMPEG is written in C, so it should be a lot faster. Digging into the topic, I found an FFMPEG "filter" called xstack which operates on a concept that's simple to understand, but somewhat onerous to type out. For example, if you want to generate a 4x4 grid (16 inputs), you'd have to type out,

xstack=inputs=16:layout=0_0|0_h0|0_h0+h1|0_h0+h1+h2|w0_0|w0_h0|w0_h0+h1|w0_h0+h1+h2|w0+w4_0|w0+w4_h0|w0+w4_h0+h1|w0+w4_h0+h1+h2|w0+w4+w8_0|w0+w4+w8_h0|w0+w4+w8_h0+h1|w0+w4+w8_h0+h1+h2

which represents,

input1(0, 0)       | input5(w0, 0)       | input9 (w0+w4, 0)       | input13(w0+w4+w8, 0)
input2(0, h0)      | input6(w0, h0)      | input10(w0+w4, h0)      | input14(w0+w4+w8, h0)
input3(0, h0+h1)   | input7(w0, h0+h1)   | input11(w0+w4, h0+h1)   | input15(w0+w4+w8, h0+h1)
input4(0, h0+h1+h2)| input8(w0, h0+h1+h2)| input12(w0+w4, h0+h1+h2)| input16(w0+w4+w8, h0+h1+h2)
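Typing that layout by hand gets old fast, so here's a rough sketch of a shell function that generates it. The name xstack_layout is my own, not something ffmpeg provides, and it assumes the same column-by-column ordering as the table above: x offsets add up the width of the first input in each column to the left, and y offsets add up the heights of the inputs in the first column.

```sh
# Print a column-major xstack layout string for a COLS x ROWS grid.
# xstack_layout is a hypothetical helper name, not part of ffmpeg.
# Note: plain sh has no local variables, so these names are global.
xstack_layout() {
  cols=$1 rows=$2
  layout=""
  c=0
  while [ "$c" -lt "$cols" ]; do
    # x offset: widths of the first input in every column to the left
    x="" k=0
    while [ "$k" -lt "$c" ]; do
      x="${x:+$x+}w$((k * rows))"
      k=$((k + 1))
    done
    r=0
    while [ "$r" -lt "$rows" ]; do
      # y offset: heights of the inputs above, taken from the first column
      y="" k=0
      while [ "$k" -lt "$r" ]; do
        y="${y:+$y+}h$k"
        k=$((k + 1))
      done
      # an empty offset means the cell sits at 0 on that axis
      layout="${layout:+$layout|}${x:-0}_${y:-0}"
      r=$((r + 1))
    done
    c=$((c + 1))
  done
  printf '%s\n' "$layout"
}

xstack_layout 2 2   # prints 0_0|0_h0|w0_0|w0_h0
```

Calling xstack_layout 4 4 should reproduce the 16-input layout string above, so it can be dropped into a filter as layout=$(xstack_layout 4 4).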

There is also a grid option, which is a little simpler to type out. I don't know which one is better, so let's start with xstack and go from there.

Starting with a directory with a bunch of random images, it'd be nice to make them easy to refer to in the CLI. We can rename all files ending with .jpg so they become 001.jpg, 002.jpg, etc. using,

find . -name "*.jpg" | cat -n | while read -r n f; do mv -n "$f" "$(printf "%03d.jpg" "$n")"; done

Note: mv's -n flag is used to avoid overwriting existing files.
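If piping find through cat -n feels fragile, the same renaming can be sketched with a plain glob, which copes with odd filenames without extra quoting tricks. This is only a sketch under a couple of assumptions: renumber_jpgs is a name I made up, and unlike find it only looks at the given directory, not its subdirectories.

```sh
# Rename every .jpg directly inside DIR (default: current directory)
# to 001.jpg, 002.jpg, ... in the shell's glob (alphabetical) order.
# renumber_jpgs is a hypothetical helper name.
renumber_jpgs() {
  dir=${1:-.}
  n=0
  for f in "$dir"/*.jpg; do
    [ -e "$f" ] || continue   # the glob matched nothing
    n=$((n + 1))
    # -n keeps mv from overwriting a file that already has the target name
    mv -n -- "$f" "$dir/$(printf '%03d.jpg' "$n")"
  done
}

# Usage (renames files, so try it on a copy first):
#   renumber_jpgs ./images
```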

So after experimenting for a while, I ended up with the following xstack solution, which I think is pretty nifty.

ffmpeg -i 2.png -i 001.jpg -i 003.jpg -filter_complex "[0:v] scale=-1:800 [o1]; [1:v] scale=400:-1 [o2]; [2:v] scale=400:-1 [o3]; [o1][o2][o3] xstack=inputs=3:layout=0_0|w0_0|w0_h1:fill=black [o_final1]; [o_final1] scale=600:-1 [o_final2]" -map "[o_final2]" output.jpg

  • ffmpeg -i 2.png -i 001.jpg -i 003.jpg
    • selects the three images we want to combine. Notice how they can be different types of images.
  • -filter_complex sets up a "filtergraph" which allows us to do a bunch of cool stuff.
  • [0:v], [1:v], and [2:v] refer to the video stream of the first, second, and third input (inputs are numbered from 0). ffmpeg uses square brackets for labels, and these particular ones are provided by ffmpeg automatically rather than named by us.
  • [0:v] scale=-1:800 [o1] takes the first image, 2.png, sets its height to 800, and maintains its aspect ratio by using -1 as the scale's width parameter. This scaled image is then labeled as [o1].
  • We do the same thing with the second and third image, then tell xstack we are going to "stack" these resized images with [o1][o2][o3] xstack=inputs=3
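  • layout=0_0|w0_0|w0_h1 gives the position of each image's upper left corner as x_y. The first image goes at (0, 0), the second at (w0, 0), right beside the first, and the third at (w0, h1), directly below the second. So the second and third image form a column beside the first image.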
  • fill=black [o_final1] tells ffmpeg the uncovered background should be black, and we label this collage as [o_final1] for later use.
  • With [o_final1] scale=600:-1 [o_final2] we scale the final image to a width of 600 pixels, with the height following the aspect ratio.
  • -map "[o_final2]" output.jpg tells ffmpeg to take [o_final2] and save it as output.jpg

Easy, right?
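If the one-liner gets hard to read, one option is to build the filtergraph string piece by piece and hand it to ffmpeg afterwards. Here's a minimal sketch of that idea; collage_filter and its three parameters are names I picked for illustration, and the labels and layout are copied straight from the command above.

```sh
# Print the filtergraph for the three-image collage.
# collage_filter is a hypothetical helper name.
#   $1 = height of the tall first image
#   $2 = width of the two side images
#   $3 = width of the final collage
collage_filter() {
  tall_h=$1 side_w=$2 out_w=$3
  f="[0:v] scale=-1:${tall_h} [o1]; "
  f="${f}[1:v] scale=${side_w}:-1 [o2]; "
  f="${f}[2:v] scale=${side_w}:-1 [o3]; "
  f="${f}[o1][o2][o3] xstack=inputs=3:layout=0_0|w0_0|w0_h1:fill=black [o_final1]; "
  f="${f}[o_final1] scale=${out_w}:-1 [o_final2]"
  printf '%s\n' "$f"
}

# Usage, equivalent to the command above:
#   ffmpeg -i 2.png -i 001.jpg -i 003.jpg \
#     -filter_complex "$(collage_filter 800 400 600)" \
#     -map "[o_final2]" output.jpg
```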

Here are the three images (not to scale).

Here is what the ffmpeg pipeline does with these three images (not to scale).

 

How about doing something similar with video?

ffmpeg -i 111.webm -i 222.webm -i 333.webm -filter_complex "[0:v] scale=500:-1 [o1]; [1:v] scale=500:-1 [o2]; [2:v] scale=500:-1 [o3]; [o1][o2][o3] xstack=inputs=3:layout=0_0|w0_250|0_h0:fill=black [o_final]" -map "[o_final]" -c:v libvpx-vp9 -b:v 500k -fs 3M output.webm

This command is pretty much the same as the image collage, but here we,

  • Specify a codec, VP9, for the final video output with -c:v libvpx-vp9.
  • Turn the video bit rate way down to 500k to save on file size with -b:v 500k.
  • Use -fs 3M to set a weakly-enforced file size limit of 3 MB on the output. ffmpeg stops writing once the limit is exceeded, so the file can end up slightly larger, and a longer video simply gets cut off.
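Those last two options interact: at a given average bit rate, the size limit works out to a rough ceiling on how long the output can be. Here's a back-of-the-envelope sketch, assuming 500k means 500,000 bits per second and 3M means 3,000,000 bytes, and ignoring container overhead and the fact that the encoder only treats the bit rate as a target. max_seconds is a made-up name.

```sh
# Rough number of seconds of video that fit under a size cap
# at a given average bit rate. max_seconds is a hypothetical helper name.
#   $1 = size limit in bytes
#   $2 = bit rate in bits per second
max_seconds() {
  size_bytes=$1 bits_per_second=$2
  echo $(( size_bytes * 8 / bits_per_second ))
}

max_seconds 3000000 500000   # prints 48
```

So with these settings, anything much past 48 seconds or so would likely get cut off by -fs.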

Here is an example of the output.

 

Attached Files:
output.webm