Ayase Quart, Image Search

Published: Wednesday, September 17, 2025

Ayase Quart aka AQ has search plugins, and I wrote one to search for images. Images are first auto-tagged, and stored in a sqlite database. Here’s what we have.

  • sqlite version: 3.47.0
  • number of images tagged: 518,822
  • number of (image, tag) pairs: 12,151,375
  • number of unique tags: 10.6k
  • average tags per image: 23.42

I’ve put in a few days worth of effort into getting faster image search results at https://ayasequart.org/fts. Previously, the whole server-side request-response process (which includes page rendering) was always taking 0.1 - 1.2s, with image search being nearly all of that. Now it’s always below 0.05s.

Search results by themselves have been reduced to ~0.005s. That’s 100x faster!

The secret sauce relied on many ingredients. A better schema, proper indicies, but also a better query structure. I was originally doing,

-- GROUP-BY

select image.num
from image
    join image_tag using (image_id)
where image.board = 'g'
    and image.general >= 0.1
    and image_tag.tag_id in (9173,4)
    and image_tag.prob >= 0.4
group by image.image_id
having count(*) = 2
limit 250;

-- QUERY PLAN
-- |--SEARCH image_tag USING COVERING INDEX idx_image_tag_tag_prob_image (tag_id=? AND prob>?)
-- |--SEARCH image USING INTEGER PRIMARY KEY (rowid=?)
-- |--USE TEMP B-TREE FOR GROUP BY

USE TEMP B-TREE FOR GROUP BY led to relatively bad performance.

I searched StackOverflow and came across a few query structures for this problem. The best structure I tested used N joins, given N tag ids. This was 6) Sean’s solution in the linked SA answer.

-- MULTI-JOIN

select image.num
from image
    join image_tag t0 using (image_id)
    join image_tag t1 using (image_id)
where image.board = 'g'
    and image.general >= 0.1
    and t0.tag_id = 9173 and t0.prob >= 0.4
    and t1.tag_id = 4 and t1.prob >= 0.4
limit 250;

-- QUERY PLAN
-- |--SEARCH t0 USING COVERING INDEX idx_image_tag_tag_prob_image (tag_id=? AND prob>?)
-- |--SEARCH image USING INTEGER PRIMARY KEY (rowid=?)
-- |--SEARCH t1 USING INDEX sqlite_autoindex_image_tag_1 (image_id=? AND tag_id=?)

Extra performance was also obtained by joining less common tags first. As you’ll see, tag frequency has a large impact on performance. There is a large difference between tags that appear in 0, 1, 10k and 100k images because of the idx_image_tag_tag_prob_image (tag_id, prob) index. I’m not sure if there is way to make this faster.

The second best structure was 4) Derek,

-- IMAGE-IN

select image.num
from image
where board = 'g'
    and general >= 0.1
    and image_id in (select image_id from image_tag where tag_id = 9173 and prob >= 0.4)
    and image_id in (select image_id from image_tag where tag_id = 4 and prob >= 0.4)
limit 250;

-- QUERY PLAN
-- |--SEARCH image USING INTEGER PRIMARY KEY (rowid=?)
-- |--LIST SUBQUERY 1
-- |  --SEARCH image_tag USING COVERING INDEX idx_image_tag_tag_prob_image (tag_id=? AND prob>?)
-- |--LIST SUBQUERY 2
-- |   --SEARCH image_tag USING COVERING INDEX idx_image_tag_tag_prob_image (tag_id=? AND prob>?)

The worst structure, 5) Erwin 2, used exists statments in the where clause,

-- EXISTS

select image.num
from image
where image.board = 'g'
  and image.general >= 0.1
  and image.num is not null
  and exists (
        select 1
        from image_tag it
        where it.image_id = image.image_id
          and it.tag_id = 4
          and it.prob >= 0.4
  )
  and exists (
        select 1
        from image_tag it
        where it.image_id = image.image_id
          and it.tag_id = 9173
          and it.prob >= 0.4
  )
limit 250;

-- QUERY PLAN
-- |--SEARCH image USING INDEX idx_image_board_general_image_id (board=? AND general>?)
-- |--CORRELATED SCALAR SUBQUERY 1
-- |  --SEARCH it USING INDEX sqlite_autoindex_image_tag_1 (image_id=? AND tag_id=?)
-- |--CORRELATED SCALAR SUBQUERY 2
-- |   --SEARCH it USING INDEX sqlite_autoindex_image_tag_1 (image_id=? AND tag_id=?)

Here are the results for the queries above. 1girl, tag id 4, is the most common tag in my database, appearing in 184k images, meanwhile tohru_(maidragon), tag id 9173, only appears in 540 images.

group_by    sum_of_all_min_times=0.1303   wins=0
join        sum_of_all_min_times=0.0011   wins=1
exists      sum_of_all_min_times=0.1747   wins=0
image_in    sum_of_all_min_times=0.0516   wins=0

Here are the other tag combinations. ‘high’, ‘mid’, and ‘low’ crudely refer to a tags frequency in the db. The rightmost column is the number of results obtained from the query.

per_page=250
tag_prob=0.4
rating_general=0.1
board='g'

1high_1mid          group_by   tag_count=2    min=0.1294   avg=0.1327   max=0.1414   225
1high_1mid          join       tag_count=2    min=0.001    avg=0.0011   max=0.0012   225
1high_1mid          exists     tag_count=2    min=0.1725   avg=0.1742   max=0.1754   225
1high_1mid          image_in   tag_count=2    min=0.0506   avg=0.0516   max=0.053    225
3high               group_by   tag_count=3    min=0.3625   avg=0.3656   max=0.3688   98
3high               join       tag_count=3    min=0.0011   avg=0.0012   max=0.0014   98
3high               exists     tag_count=3    min=0.174    avg=0.1755   max=0.1777   98
3high               image_in   tag_count=3    min=0.1372   avg=0.1376   max=0.1385   98
1mid                group_by   tag_count=3    min=0.3585   avg=0.3602   max=0.3625   98
1mid                join       tag_count=3    min=0.0012   avg=0.0012   max=0.0013   98
1mid                exists     tag_count=3    min=0.1751   avg=0.1769   max=0.1799   98
1mid                image_in   tag_count=3    min=0.1376   avg=0.1391   max=0.1428   98
1low                group_by   tag_count=1    min=0.0      avg=0.0      max=0.0001   3
1low                join       tag_count=1    min=0.0      avg=0.0      max=0.0      3
1low                exists     tag_count=1    min=0.1742   avg=0.1772   max=0.1819   3
1low                image_in   tag_count=1    min=0.0      avg=0.0      max=0.0001   3
2low                group_by   tag_count=2    min=0.0001   avg=0.0002   max=0.0006   0
2low                join       tag_count=2    min=0.0001   avg=0.0001   max=0.0002   0
2low                exists     tag_count=2    min=0.1707   avg=0.1724   max=0.1733   0
2low                image_in   tag_count=2    min=0.0001   avg=0.0002   max=0.0004   0
1high_1low          group_by   tag_count=2    min=0.2339   avg=0.2344   max=0.2357   24
1high_1low          join       tag_count=2    min=0.0      avg=0.0001   max=0.0003   24
1high_1low          exists     tag_count=2    min=0.1707   avg=0.1741   max=0.1765   24
1high_1low          image_in   tag_count=2    min=0.0825   avg=0.0832   max=0.0839   24
2high_1mid          group_by   tag_count=3    min=0.3619   avg=0.3636   max=0.366    250
2high_1mid          join       tag_count=3    min=0.0028   avg=0.003    max=0.0033   250
2high_1mid          exists     tag_count=3    min=0.1289   avg=0.1293   max=0.1299   250
2high_1mid          image_in   tag_count=3    min=0.1362   avg=0.1373   max=0.138    250
1high_1mid_1low     group_by   tag_count=3    min=0.1297   avg=0.1309   max=0.1318   0
1high_1mid_1low     join       tag_count=3    min=0.0      avg=0.0001   max=0.0002   0
1high_1mid_1low     exists     tag_count=3    min=0.172    avg=0.1733   max=0.1745   0
1high_1mid_1low     image_in   tag_count=3    min=0.0004   avg=0.0004   max=0.0006   0
2high               group_by   tag_count=2    min=0.3421   avg=0.3465   max=0.3498   250
2high               join       tag_count=2    min=0.0022   avg=0.0023   max=0.0024   250
2high               exists     tag_count=2    min=0.003    avg=0.0032   max=0.0036   250
2high               image_in   tag_count=2    min=0.1259   avg=0.1283   max=0.1294   250
4high               group_by   tag_count=4    min=0.5133   avg=0.5143   max=0.5154   250
4high               join       tag_count=4    min=0.0287   avg=0.0294   max=0.0304   250
4high               exists     tag_count=4    min=0.021    avg=0.0216   max=0.0225   250
4high               image_in   tag_count=4    min=0.1912   avg=0.1916   max=0.1926   250
6high               group_by   tag_count=6    min=0.7655   avg=0.7737   max=0.7797   250
6high               join       tag_count=6    min=0.1519   avg=0.1574   max=0.1629   250
6high               exists     tag_count=6    min=0.2394   avg=0.2402   max=0.2418   250
6high               image_in   tag_count=6    min=0.2833   avg=0.2861   max=0.2903   250
14high              group_by   tag_count=14   min=1.2962   avg=1.3005   max=1.305    3
14high              join       tag_count=14   min=0.1844   avg=0.1855   max=0.1878   3
14high              exists     tag_count=14   min=0.264    avg=0.265    max=0.2659   3
14high              image_in   tag_count=14   min=0.4708   avg=0.473    max=0.4741   3

The multi-join query structure is clearly the winner across a variety of tag searches being made.

group_by    sum_of_all_min_times=4.4931   wins=2
join        sum_of_all_min_times=0.3734   wins=9
exists      sum_of_all_min_times=1.8655   wins=1
image_in    sum_of_all_min_times=1.6158   wins=0

Note that sqlite has a limit of 64 joins.

Comment
Optional
No comments yet...