Apache Lucene 9.9, the fastest Lucene release ever

Apache Lucene development has always been vibrant, but the last few months have seen an especially high number of optimizations to query evaluation. No single optimization can be singled out; the gains come from a combination of many improvements around mechanical sympathy and better algorithms.

What is especially interesting is that these optimizations do not benefit only a few very specific cases. They translate into actual speedups in Lucene's nightly benchmarks, which aim to track the performance of queries that are representative of the real world. Hover over the annotations on the benchmark charts to see where a speedup (or sometimes a slowdown!) comes from. By the way, special thanks to Mike McCandless for maintaining Lucene's nightly benchmarks on his own time and hardware for almost 13 years now!

Here are some speedups that nightly benchmarks observed between Lucene 9.6 (May 2023) and Lucene 9.9 (December 2023):

AndHighHigh: 35% faster
AndHighMed: 15% faster
OrHighHigh: 60% faster
OrHighMed: 38% faster
CountAndHighHigh: 15% faster
CountAndHighMed: 11% faster
CountOrHighHigh: 145% faster
CountOrHighMed: 155% faster
TermDTSort: 24% faster
TermTitleSort: 290% faster (not a typo!)
TermMonthSort: 7% faster
DayOfYearSort: 25% faster
VectorSearch: 5% faster
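
A note on reading these figures: the short sketch below converts a "% faster" number into a throughput multiple and into the share of per-query time saved. It assumes the percentages are throughput (queries per second) gains, which is an assumption about how the numbers were derived, and the function names are purely illustrative.

```python
def throughput_ratio(percent_faster: float) -> float:
    """Throughput multiple implied by an 'N% faster' figure.

    Assumption: 'N% faster' means throughput grew by N percent.
    """
    return 1.0 + percent_faster / 100.0


def time_saved_fraction(percent_faster: float) -> float:
    """Fraction of per-query time saved, given the same assumption."""
    return 1.0 - 1.0 / throughput_ratio(percent_faster)


# A few of the figures listed above.
speedups = {
    "OrHighHigh": 60,
    "CountOrHighMed": 155,
    "TermTitleSort": 290,
}

for task, pct in speedups.items():
    ratio = throughput_ratio(pct)
    saved = time_saved_fraction(pct) * 100
    print(f"{task}: {ratio:.2f}x throughput, about {saved:.1f}% less time per query")
```

Under that assumption, "290% faster" means nearly four times the throughput, not a threefold improvement, which is why the TermTitleSort figure stands out.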

In case you are curious about these changes, here are resources that describe some of the optimizations that we applied:

Bringing speedups to top-k queries with many and/or high-frequency terms (annotation FK)
More skipping with block-max MAXSCORE (annotation FU)
Accelerating vector search with SIMD instructions
Vector similarity computations FMA-style

Lucene 9.9 was just released and is expected to be integrated into Elasticsearch 8.12, which should be released soon. Stay tuned!
