Case study: Finding your puppy with image search
Have you ever been in a situation where you found a lost puppy on the street and didn’t know if it had an owner? Using vector search through image processing in Elasticsearch, this task can be as simple as reading a comic strip.
Imagine this scene: On a tumultuous afternoon, Luigi, a small and lively puppy, found himself wandering alone through busy streets after accidentally slipping out of his leash during a walk around Elastic. His desperate owner was searching for him at every corner, calling his name with a voice full of hope and anxiety. Meanwhile, somewhere in the city, an attentive person noticed the puppy with a lost expression and decided to help. Quickly, they took a photo of Luigi and, using the vector image search technology of the company they worked for, began a search in the database hoping to find some clue about the owner of the little runaway.
If you want to follow along and run the code while reading, open the Python code in a Jupyter Notebook (Google Colab).
The architecture
We'll solve this problem using a Jupyter Notebook. First, we download the images of the puppies to be registered, and then we install the necessary packages.
Note: To implement this sample, we need to create an index in Elasticsearch before populating our vector database with the image data.
- Begin by deploying Elasticsearch (we have a 14-day free trial for you).
- During the process, remember to store the credentials (username, password) to be used in our Python code.
- For simplicity, we will use Python code running on a Jupyter Notebook (Google Colab).
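If you would rather not type the credentials each time, one option is to read them from environment variables. The following is only a minimal sketch: the function load_credentials and the variable names ES_CLOUD_ID, ES_USER and ES_PASS are hypothetical and not part of the original notebook, which prompts for the values interactively.

```python
import os
from typing import Mapping, Optional, Tuple

# Hypothetical variable names; choose whatever suits your environment.
REQUIRED_VARS = ("ES_CLOUD_ID", "ES_USER", "ES_PASS")


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str, str]:
    """Return (cloud_id, username, password) read from an environment-like mapping."""
    env = os.environ if env is None else env
    # Collect every missing or empty variable so the error lists all of them at once.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing credentials: {', '.join(missing)}")
    return env["ES_CLOUD_ID"], env["ES_USER"], env["ES_PASS"]
```

The three returned values could then be passed to the Elasticsearch client in place of the interactive prompts.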
Download the code and install the necessary packages
!git clone https://github.com/salgado/image-search-01.git
!pip -q install Pillow sentence_transformers elasticsearch
Let's create four classes to assist us in this task:
- Util class: responsible for handling preliminary tasks and Elasticsearch index maintenance.
- Dog class: responsible for storing the attributes of our little dogs.
- DogRepository class: responsible for data persistence tasks.
- DogService class: it will be our service layer.
Util class
The Util class provides utility methods for managing the Elasticsearch index, such as creating and deleting the index.
Methods:
- create_index(): Creates a new index in Elasticsearch.
- delete_index(): Deletes an existing index from Elasticsearch.
### Util class
from elasticsearch import Elasticsearch, exceptions as es_exceptions
import getpass

class Util:
    @staticmethod
    def get_index_name():
        return "dog-image-index"

    @staticmethod
    def get_connection():
        es_cloud_id = getpass.getpass('Enter Elastic Cloud ID: ')
        es_user = getpass.getpass('Enter cluster username: ')
        es_pass = getpass.getpass('Enter cluster password: ')
        es = Elasticsearch(cloud_id=es_cloud_id,
                           basic_auth=(es_user, es_pass)
                           )
        es.info()  # should return cluster info
        return es

    @staticmethod
    def create_index(es: Elasticsearch, index_name: str):
        # Specify index configuration
        index_config = {
            "settings": {
                "index.refresh_interval": "5s",
                "number_of_shards": 1
            },
            "mappings": {
                "properties": {
                    "image_embedding": {
                        "type": "dense_vector",
                        "dims": 512,
                        "index": True,
                        "similarity": "cosine"
                    },
                    "dog_id": {
                        "type": "keyword"
                    },
                    "breed": {
                        "type": "keyword"
                    },
                    "image_path": {
                        "type": "keyword"
                    },
                    "owner_name": {
                        "type": "keyword"
                    },
                    "exif": {
                        "properties": {
                            "location": {
                                "type": "geo_point"
                            },
                            "date": {
                                "type": "date"
                            }
                        }
                    }
                }
            }
        }
        # Create index
        if not es.indices.exists(index=index_name):
            index_creation = es.indices.create(index=index_name, ignore=400, body=index_config)
            print("index created: ", index_creation)
        else:
            print("Index already exists.")

    @staticmethod
    def delete_index(es: Elasticsearch, index_name: str):
        # delete index
        es.indices.delete(index=index_name, ignore_unavailable=True)
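The mapping above sets "similarity": "cosine" on the image_embedding field. To make that setting concrete, here is a small, dependency-free sketch of cosine similarity. The helper names cosine_similarity and knn_score are ours, not part of any library. As we understand the Elasticsearch documentation, the kNN score for cosine similarity is reported as (1 + cosine) / 2 before any boost is applied, which knn_score mirrors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def knn_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Map a cosine in [-1, 1] to a score in [0, 1] using (1 + cosine) / 2."""
    return (1.0 + cosine_similarity(a, b)) / 2.0
```

Because cosine similarity ignores vector length, two embeddings pointing in the same direction score the same regardless of their magnitude.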
Dog class
The Dog class represents a dog and its attributes, such as ID, image path, breed, owner name, and image embeddings.
Attributes
- dog_id: The dog's ID.
- image_path: The path to the dog's image.
- breed: The dog's breed.
- owner_name: The dog's owner's name.
- image_embedding: The dog's image embedding.
Methods
- __init__(): Initializes a new Dog object.
- generate_embedding(): Generates the dog's image embedding.
- to_dict(): Converts the Dog object to a dictionary.
import os
from sentence_transformers import SentenceTransformer
from PIL import Image

# domain class
class Dog:
    # The CLIP model is loaded once and shared by all Dog instances
    model = SentenceTransformer('clip-ViT-B-32')

    def __init__(self, dog_id, image_path, breed, owner_name):
        self.dog_id = dog_id
        self.image_path = image_path
        self.breed = breed
        self.image_embedding = None
        self.owner_name = owner_name

    @staticmethod
    def get_embedding(image_path: str):
        # Open the image and encode it into an embedding vector
        temp_image = Image.open(image_path)
        return Dog.model.encode(temp_image)

    def generate_embedding(self):
        self.image_embedding = Dog.get_embedding(self.image_path)

    def __repr__(self):
        return (f"Dog(dog_id={self.dog_id}, image_path={self.image_path}, "
                f"breed={self.breed}, image_embedding={self.image_embedding}, "
                f"owner_name={self.owner_name})")

    def to_dict(self):
        return {
            'dog_id': self.dog_id,
            'image_path': self.image_path,
            'breed': self.breed,
            'image_embedding': self.image_embedding,
            'owner_name': self.owner_name
        }
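The index mapping declares the embedding field with 512 dimensions, so a vector of any other length would be rejected at indexing time. A minimal sketch of a guard that converts an embedding to a plain list of floats and checks its length is shown here. The helper to_indexable_vector and the constant EXPECTED_DIMS are hypothetical names introduced for illustration.

```python
from typing import Iterable, List

EXPECTED_DIMS = 512  # assumed to match "dims" in the index mapping


def to_indexable_vector(embedding: Iterable[float], dims: int = EXPECTED_DIMS) -> List[float]:
    """Convert an embedding to a plain list of floats and check its length."""
    vector = [float(x) for x in embedding]
    if len(vector) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(vector)}")
    return vector
```

A check like this could run just before a document is sent to Elasticsearch, so that a mismatch between the model and the mapping surfaces as a clear error message.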
DogRepository Class
The DogRepository class provides methods for persisting and retrieving dog data from Elasticsearch.
Methods
- insert(): Inserts a new dog into Elasticsearch.
- bulk_insert(): Inserts multiple dogs into Elasticsearch in bulk.
- search_by_image(): Searches for similar dogs by image.
from typing import List, Dict

# persistence layer
class DogRepository:
    def __init__(self, es_client: Elasticsearch, index_name: str = "dog-image-index"):
        self.es_client = es_client
        self._index_name = index_name
        Util.create_index(es_client, index_name)

    def insert(self, dog: Dog):
        dog.generate_embedding()
        document = dog.to_dict()
        self.es_client.index(index=self._index_name, document=document)

    def bulk_insert(self, dogs: List[Dog]):
        operations = []
        for dog in dogs:
            # Make sure each dog has an embedding before it is indexed
            if dog.image_embedding is None:
                dog.generate_embedding()
            operations.append({"index": {"_index": self._index_name}})
            operations.append(dog.to_dict())
        self.es_client.bulk(body=operations)

    def search_by_image(self, image_embedding: List[float]):
        field_key = "image_embedding"
        knn = {
            "field": field_key,
            "k": 2,
            "num_candidates": 100,
            "query_vector": image_embedding,
            "boost": 100
        }
        # The fields to retrieve from the matching documents
        fields = ["dog_id", "breed", "owner_name", "image_path", "image_embedding"]
        try:
            resp = self.es_client.search(
                index=self._index_name,
                body={
                    "knn": knn,
                    "_source": fields
                },
                size=1
            )
            # Return the search results
            return resp
        except Exception as e:
            print(f"An error occurred: {e}")
            return {}
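To build intuition for what the knn clause asks Elasticsearch to do, the following sketch performs an exact, brute-force nearest-neighbor search over a toy in-memory index. It is not how Elasticsearch works internally: Elasticsearch uses an approximate search, where num_candidates controls how many candidates are considered per shard. The function exact_knn and the three-dimensional toy vectors are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def exact_knn(query: Sequence[float],
              vectors: Dict[str, Sequence[float]],
              k: int = 2) -> List[Tuple[str, float]]:
    """Return the k (id, similarity) pairs most similar to the query, best first."""
    scored = [(doc_id, _cosine(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real CLIP embeddings have 512 dimensions.
toy_index = {
    "Luigi": [0.9, 0.1, 0.0],
    "Buddy": [0.1, 0.9, 0.0],
    "Bella": [0.0, 0.2, 0.9],
}
neighbours = exact_knn([1.0, 0.0, 0.0], toy_index, k=2)
best_id = neighbours[0][0]  # analogous to size=1: keep only the top hit
```

In the repository method above, k=2 asks for the two nearest neighbors while size=1 returns only the best of them, which is what taking neighbours[0] does here.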
DogService Class
The DogService class provides business logic for managing dog data, such as registering and searching for dogs.
Methods
- register_dog(): Registers a new dog in Elasticsearch.
- register_dogs(): Registers multiple dogs in Elasticsearch in bulk.
- find_dog_by_image(): Searches for similar dogs by image.
# service layer
class DogService:
    def __init__(self, dog_repository: DogRepository):
        self.dog_repository = dog_repository

    def register_dog(self, dog: Dog):
        self.dog_repository.insert(dog)

    def register_dogs(self, dogs: List[Dog]):
        self.dog_repository.bulk_insert(dogs)

    def find_dog_by_image(self, image_path: str):
        image_embedding = Dog.get_embedding(image_path)
        return self.dog_repository.search_by_image(image_embedding)
The classes presented above provide a solid foundation for building a dog data management system. The Util class provides utility methods for managing the Elasticsearch index. The Dog class represents the attributes of a dog. The DogRepository class offers methods for persisting and retrieving dog data from Elasticsearch. The DogService class provides the business logic for efficient dog data management.
The main code
Our code has two main phases:
- Register the Dogs with basic information and image.
- Perform a search using a new image to find the Dog in the vector database.
Phase 01: Registering the Puppy
To store the information about Luigi and the company's other little dogs, we'll use the Dog class.
Let's code this step by step:
Start registering the puppies
# Start a connection
es_db = Util.get_connection()
Util.delete_index(es_db, Util.get_index_name())
# Create the repository and service layers
dog_repo = DogRepository(es_db, Util.get_index_name())
dog_service = DogService(dog_repo)
# Display the image of the dog we are about to register
from IPython.display import display
from IPython.display import Image as ShowImage
filename = "/content/image-search-01/dataset/dogs/Luigi.png"
display(ShowImage(filename=filename, width=300, height=300))
Output
Registering Luigi
dog = Dog('Luigi', filename, 'Jack Russel/Rat Terrier', 'Ully')
dog_service.register_dog(dog)
Registering all the other puppies
import json
# JSON data
data = '''
{
"dogs": [
{"dog_id": "Buddy", "image_path": "", "breed": "Labrador Retriever", "owner_name": "Berlin Red"},
{"dog_id": "Bella", "image_path": "", "breed": "German Shepherd", "owner_name": "Tokyo Blue"},
{"dog_id": "Charlie", "image_path": "", "breed": "Golden Retriever", "owner_name": "Paris Green"},
{"dog_id": "Bigu", "image_path": "", "breed": "Beagle", "owner_name": "Lisbon Yellow"},
{"dog_id": "Max", "image_path": "", "breed": "Bulldog", "owner_name": "Canberra Purple"},
{"dog_id": "Luna", "image_path": "", "breed": "Poodle", "owner_name": "Wellington Brown"},
{"dog_id": "Milo", "image_path": "", "breed": "Rottweiler", "owner_name": "Hanoi Orange"},
{"dog_id": "Ruby", "image_path": "", "breed": "Boxer", "owner_name": "Ottawa Pink"},
{"dog_id": "Oscar", "image_path": "", "breed": "Dachshund", "owner_name": "Kabul Black"},
{"dog_id": "Zoe", "image_path": "", "breed": "Siberian Husky", "owner_name": "Cairo White"}
]
}
'''
# Convert JSON string to Python dictionary
dogs_data = json.loads(data)
# Traverse the list and register each dog, building its image path from its dog_id
image_dogs = "/content/image-search-01/dataset/dogs/"
for dog_info in dogs_data["dogs"]:
    dog = Dog(dog_info["dog_id"], image_dogs + dog_info["dog_id"] + ".png", dog_info["breed"], dog_info["owner_name"])
    dog_service.register_dog(dog)
Visualizing the new dogs
# visualize new dogs
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import math
image_dogs = "/content/image-search-01/dataset/dogs/"
num_dogs = len(dogs_data["dogs"])
cols = int(math.sqrt(num_dogs))
rows = int(math.ceil(num_dogs / cols))
# Set the figure size
plt.figure(figsize=(5, 5))
# Loop over the dogs and display each image
for i, dog_info in enumerate(dogs_data["dogs"]):
    filename = image_dogs + dog_info["dog_id"] + ".png"
    img = mpimg.imread(filename)
    plt.subplot(rows, cols, i+1)  # (number of rows, number of columns, subplot index)
    plt.imshow(img)
    plt.axis('off')
plt.show()
Output
Phase 02: Finding the lost dog
Now that we have all the little dogs registered, let's perform a search. Our developer took this picture of the lost puppy.
filename = "/content/image-search-01/dataset/lost-dogs/lost_dog1.png"
display(ShowImage(filename=filename, width=300, height=300))
Output
Let's see if we can find the owner of this cute little puppy.
# find dog by image
result = dog_service.find_dog_by_image(filename)
Get the results
Let's see what we found...
filename = result['hits']['hits'][0]['_source']['image_path']
display(ShowImage(filename=filename, width=300, height=300))
Output
Voilà!! We found it!!!
But who is the owner, and what is their name?
# Print the dog's details
print(result['hits']['hits'][0]['_source']['dog_id'])
print(result['hits']['hits'][0]['_source']['breed'])
print(result['hits']['hits'][0]['_source']['owner_name'])
Output
- Luigi
- Jack Russel/Rat Terrier
- Ully
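The lookups above assume that the search returned at least one hit. Since search_by_image returns an empty dictionary when an error occurs, a slightly more defensive way to read the result might look like the sketch below. The helper top_hit_source is a hypothetical name, and the sketch assumes the response can be read like a nested dictionary.

```python
from typing import Any, Dict, Optional


def top_hit_source(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the _source of the best hit, or None when there are no hits."""
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    return hits[0].get("_source")


# A hand-written response fragment, shaped like the fields used in this post.
sample_response = {
    "hits": {
        "hits": [
            {"_source": {"dog_id": "Luigi", "breed": "Jack Russel/Rat Terrier", "owner_name": "Ully"}}
        ]
    }
}
source = top_hit_source(sample_response)
owner = source["owner_name"] if source else None
```

With a helper like this, an empty result leads to a None check in the calling code instead of a KeyError or IndexError.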
Happy ending
We found Luigi! Let's notify Ully.
filename = "/content/image-search-01/dataset/lost-dogs/Ully.png"
display(ShowImage(filename=filename, width=300, height=300))
Output
In no time, Ully and Luigi were reunited. The little puppy wagged his tail with pure delight, and Ully hugged him close, promising to never let him out of her sight again. They had been through a whirlwind of emotions, but they were together now, and that was all that mattered. And so, with hearts full of love and joy, Ully and Luigi lived happily ever after.
Conclusion
In this blog post, we have explored how to use vector search to find a lost puppy using Elasticsearch. We have demonstrated how to generate image embeddings for dogs, index them in Elasticsearch, and then search for similar dogs using a query image. This technique can be used to find lost pets, as well as to identify other objects of interest in images.
Vector search is a powerful tool that can be used for a variety of applications. It is particularly well-suited for tasks that require searching for similar objects based on their appearance, such as image retrieval and object recognition.
We hope that this blog post has been informative and that you will find the techniques we have discussed to be useful in your own projects.