CBIR (Content-Based Image Retrieval) using SIFTpack Saliency Maps

Yonatan Dishon

Technion

ydishon at tx.technion.ac.il

Lihi Zelnik-Manor

Technion

lihi at ee.technion.ac.il

Based on: "SIFTpack: a compact representation for efficient SIFT matching", Gilinsky & Zelnik-Manor, ICCV'13


Abstract

CBIR (Content-Based Image Retrieval) is an active and important field of research.
As in content-based document search, the goal is:
given a query image, retrieve similar images from a database based on the assumed main interest points (or "main story") of the images.

In [1], Gilinsky & Zelnik-Manor propose a compact, image-like representation of the popular SIFT descriptor that reduces redundancy and saves memory.
The authors further suggest applying algorithms designed for images directly to this representation.
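To give a feel for the redundancy argument, here is a minimal Python sketch (our own illustration, not the authors' code; the array shapes and the 1-cell step are assumptions) of a SIFTpack-style layout: dense SIFT descriptors computed at a small step share most of their 4x4 cell grids, so storing each cell's orientation histogram once in an image-like array is far cheaper than storing every 128-D descriptor separately.

```python
import numpy as np

def pack_dense_sift(cell_hists):
    # cell_hists: (H, W, 8) array of per-cell orientation histograms
    # (toy values). In a SIFTpack-style layout this single array *is*
    # the storage; every overlapping 4x4-cell descriptor is merely a
    # view into it rather than a separate copy.
    def descriptor(i, j):
        # Reconstruct the classic 128-D SIFT vector whose top-left
        # cell sits at grid position (i, j).
        return cell_hists[i:i + 4, j:j + 4, :].reshape(-1)
    return descriptor

# Memory comparison at a step of 1 cell:
H, W, B = 20, 20, 8
cells = np.random.rand(H, W, B).astype(np.float32)
desc = pack_dense_sift(cells)
n_desc = (H - 3) * (W - 3)       # number of overlapping descriptors
naive = n_desc * 128             # floats if each descriptor is stored separately
packed = H * W * B               # floats in the shared image-like array
print(naive, packed, desc(0, 0).shape)   # packed is ~11x smaller here
```

The saving grows as the sampling step shrinks, which is exactly the dense-sampling regime this project pushes the representation toward.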

In this project we aimed to test the performance of applying image saliency [2] to the SIFTpack representation and performing a CBIR task on a challenging scene database.


Project Details

In this project we investigate the saliency of features in the feature space when they are arranged in an image-like representation. We observed a high correlation between salient areas in the image space and in the feature space, yielding similar saliency maps.

The objective was to see whether the distinct features in the feature space are also the salient regions in the image space, i.e. whether the most unique features (outliers in the feature space) lie at the essence of image understanding. Many computer-vision algorithms rely on some form of bag-of-words (BOW) over features, or on sampling and quantization of the feature space to lower its dimensionality, usually by applying K-means or another clustering algorithm to find cluster centers. We claim that by doing so, the distinct features in the feature space are ignored, and consequently much of the important information in the image is lost. The first stage of the project therefore involved modifying the algorithm in [2] to compute saliency in higher dimensions, extending the algorithm in [1] to finer (dense) sampling steps, and building a similar representation for the very popular HOG descriptor used in recognition.
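A toy numeric example of the claim above (our own illustration, not code from the project): after quantization against a K-means-style vocabulary, an outlier feature is simply absorbed into its nearest word's bin, and its distinctiveness leaves no trace in the BOW histogram.

```python
import numpy as np

# Two tight clusters of ordinary features plus one highly distinct outlier.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
cluster_b = rng.normal(loc=5.0, scale=0.1, size=(50, 2))
outlier = np.array([[2.5, 40.0]])            # the "unique" feature
features = np.vstack([cluster_a, cluster_b, outlier])

# Assume a 2-word vocabulary learned by K-means (centers given here).
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

# Hard-assign every feature to its nearest word and build the histogram.
dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
words = dists.argmin(axis=1)
hist = np.bincount(words, minlength=2)
print(hist)   # the outlier is absorbed into one of the two bins
```

The histogram is indistinguishable from one built on 101 ordinary features; the information that one feature was an outlier is gone, which is what feature-space saliency is meant to preserve.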

We then searched for applications of feature-space saliency, finally applying it to a retrieval task similar to one of the tasks evaluated in [3]. We used the same dataset (see link below) as the original paper. We modified their histogram-ranking algorithm to take weighted histograms into account, i.e. to give some features/areas in the image higher importance than others, which produced some interesting results. We did not manage to improve on their performance for the given task, but the results suggest that feature-space saliency can be useful for recognition and detection.
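The weighted-histogram modification can be sketched as follows. This is a simplified single-level sketch under our own assumptions (a per-feature saliency weight in [0, 1]; histogram intersection as the similarity); the actual spatial pyramid of [3] sums intersections over multiple pyramid levels.

```python
import numpy as np

def weighted_histogram(words, weights, n_bins):
    # Each feature contributes its saliency weight to its visual word's
    # bin, instead of a flat count of 1.
    return np.bincount(words, weights=weights, minlength=n_bins)

def intersection_similarity(h1, h2):
    # Histogram intersection, the ranking measure used in spatial
    # pyramid matching.
    return np.minimum(h1, h2).sum()

# Example: query and database images with their visual-word assignments
# and hypothetical per-feature saliency weights.
words_q = np.array([0, 0, 1, 2])
words_d = np.array([0, 1, 1, 2])
w_q = np.array([0.9, 0.8, 0.1, 0.2])
w_d = np.array([0.5, 0.7, 0.6, 0.3])
hq = weighted_histogram(words_q, w_q, 3)
hd = weighted_histogram(words_d, w_d, 3)
print(intersection_similarity(hq, hd))
```

With flat counts the two images would match on 3 of 4 words; with saliency weights, agreement on low-saliency words contributes little to the score.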


Results

Qualitative:

Saliency Maps:

Figure 1: Saliency examples from the 15 scenes dataset


Image Retrieval using Spatial Pyramid with Saliency
(Top-5 results from 15 Scenes Dataset):

Figure 2 shows how the saliency-based methods (SIFTpack, HOGpack & Image Saliency) retrieve the inner structure of the image
- note how similar the retrieved images are in their composition.

The query image is on the left, and the best 5-NN images according to the selected method are on the right.

Figure 2: Comparison between 4 different methods for image retrieval

Unsuccessful Retrieval

- even here we can understand why the saliency-based methods chose these images (notice the emerging structure)

Figure 3: Comparison between 4 different methods for image retrieval


Quantitative evaluation:

Figure 4: Number of errors in each Class vs. Method used (15 scenes dataset)

Figure 5: Confusion map of Spatial Pyramid (unweighted) between scene classes (15 scenes dataset)

Figure 6: Confusion map of SIFTpack between scene classes (15 scenes dataset)

Figure 7: Confusion map of HOGpack between scene classes (15 scenes dataset)

Figure 8: Confusion map of Image Saliency between scene classes (15 scenes dataset)

Method                Success Rate in Retrieval Task
Spatial Pyramid       77.5%
SIFTpack Saliency     67%
Image Saliency        48.7%
HOGpack Saliency      63.6%

Table 1: Success rate vs. retrieval method on the 15 scenes dataset


Implementation Details

There are several independent implementations and adaptations done in this project:

	1. The saliency code was modified to support vector spaces of dimension higher than 3.
	2. The SIFTpack algorithm was tweaked to support higher-resolution sampling (steps below 4 pixels).
	3. The Spatial Pyramid code was changed to support weighted histograms (to account for the importance of features).
	4. A HOGpack representation was implemented (analogous to SIFTpack) in order to compare the performance of the two representations.

Links

  • The 15 Scenes Dataset including 4485 images of 15 different scenes.

  • The SIFTpack Page.

  • The Distinct Saliency Page.

  • Matlab Code

    Matlab code is available for research and commercial use;
    please contact Hovav Gazit for details.


    References

    • [1] Alexandra Gilinsky & Lihi Zelnik-Manor, "SIFTpack: a compact representation for efficient SIFT matching", ICCV'13
    • [2] Ran Margolin, Ayellet Tal & Lihi Zelnik-Manor, "What Makes a Patch Distinct?", CVPR'13
    • [3] Svetlana Lazebnik et al., "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories", CVPR'06