For every JPG or PNG image in a directory, we can embed its
associated JSON file (output from gallery-dl’s --write-metadata flag)
into the EXIF field UserComment.
import os
from PIL import Image, ExifTags
import json

image_directory = '/'

for filename in os.listdir(image_directory):
    # The metadata file is expected next to the image as <image filename>.json
    json_filename = f'{filename}.json'
    image_path = os.path.join(image_directory, filename)
    json_path = os.path.join(image_directory, json_filename)
    if filename.lower().endswith(('.png', '.jpg', '.jpeg')) and os.path.isfile(json_path):
        d = {}
        with open(json_path, mode='r') as f:
            data = json.load(f)
        # Keep only a subset of the metadata fields
        d['download_url'] = data.get('url', '')
        d['origin_url'] = data.get('link', '')
        d['auto_alt_text'] = data.get('auto_alt_text', '')
        d['created_at'] = data.get('created_at', '')
        d['description'] = data.get('description', '')
        d['grid_title'] = data.get('grid_title', '')
        d['dominant_color'] = data.get('dominant_color', '')
        with Image.open(image_path) as img:
            exif = img.getexif()
            # Store the JSON as bytes in the UserComment tag
            exif[ExifTags.Base.UserComment] = json.dumps(d).encode()
            # Write a copy so the original image stays untouched
            new_image_path = os.path.join(image_directory, f'modified_{filename}')
            img.save(new_image_path, exif=exif)
To extract the embedded data, just do a regular dict lookup and decode the bytes.
exif[ExifTags.Base.UserComment].decode()
Remarks:
img.save() alone will alter an image’s md5sum. The snippet
below demonstrates this. I figure PIL (Pillow) must be altering the file
structure somehow.
from PIL import Image
import hashlib
image_path = '/home/user/Desktop/temp/flickr.jpg'
def get_md5sum(filename):
    # Read the file in binary mode and make sure it gets closed again
    with open(filename, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()
print(f'md5sum, initial: {get_md5sum(image_path)}')
with Image.open(image_path) as img:
    img.save(image_path)  # save() will alter the md5sum
print(f'md5sum, final: {get_md5sum(image_path)}')
Is this expected? Not really, because cp flickr1.jpg flickr2.jpg
will result in flickr1.jpg and flickr2.jpg having the same md5sum.
So what is PIL doing? For the first few iterations of this script,
PIL is modifying 4 bytes in the image. After several runs, the byte
count of both images will start to differ and the differences begin
to compound somehow.
cmp -l -b flickr1.jpg flickr2.jpg
byte val1 val2
237,697 117 77
237,700 57 63
294,748 317 313
294,751 214 215
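For readers without cmp at hand, the same comparison can be approximated in Python. This is only a sketch, not a reimplementation of cmp: the function names diff_bytes and print_diff are invented here, both files are read fully into memory, and the only thing mirrored from cmp -l is that offsets are 1-based and byte values are printed in octal.

```python
def diff_bytes(path_a, path_b):
    """Return (offset, value_a, value_b) for every differing byte.

    Offsets are 1-based. Only the common prefix of the two files is
    compared; a length difference is not reported.
    """
    with open(path_a, 'rb') as fa, open(path_b, 'rb') as fb:
        a, b = fa.read(), fb.read()
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def print_diff(path_a, path_b):
    """Print one line per differing byte: offset, then both values in octal."""
    for offset, x, y in diff_bytes(path_a, path_b):
        print(f'{offset:>9,} {x:>3o} {y:>3o}')
```

Running print_diff('flickr1.jpg', 'flickr2.jpg') after each save makes it easier to watch how the set of differing bytes grows from run to run.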
Why is this happening? We will have to ask the PIL devs.
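One way to start narrowing this down, offered as a sketch and not as an answer from the Pillow developers: save() writes a newly encoded file from the decoded image instead of copying the original bytes, so it can help to hash the decoded pixels separately from the file bytes. The function names below are made up for illustration.

```python
import hashlib
from PIL import Image


def file_md5(path):
    """md5 of the raw file bytes (what md5sum reports)."""
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()


def pixel_md5(path):
    """md5 of the decoded pixel data only, ignoring container and metadata."""
    with Image.open(path) as img:
        return hashlib.md5(img.tobytes()).hexdigest()
```

If file_md5 changes between saves while pixel_md5 stays the same, only the encoding or metadata changed. If pixel_md5 changes too, which is possible with a lossy format such as JPEG, the image content itself was altered by the re-save.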