Image Retrieval

Image retrieval using pLSA

The dataset is the first 1000 images of MIRFLICKR-25000 (link).
The method is Probabilistic Latent Semantic Analysis (pLSA), an unsupervised learning algorithm. See the paper for details.

A coarse color histogram quantized to 1000 levels is used as the feature. The 1000 levels used are listed here.
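As a rough illustration of the feature, here is a minimal sketch of a coarse color histogram. It assumes the 1000 levels come from a uniform 10 x 10 x 10 grid over RGB, which is only a guess for illustration; the levels actually used in this project are the ones linked above. The function name `coarse_color_histogram` and the parameter `bins_per_channel` are hypothetical.

```python
import numpy as np

def coarse_color_histogram(image, bins_per_channel=10):
    """Count pixels per coarse RGB level.

    `image` is an (H, W, 3) array of 8-bit RGB values.
    ASSUMPTION: a uniform grid of bins_per_channel**3 levels (1000 by default);
    the project's real level table may differ.
    """
    pixels = np.asarray(image, dtype=np.int64).reshape(-1, 3)
    # Map each channel value 0..255 to a bin index 0..bins_per_channel-1.
    idx = pixels * bins_per_channel // 256
    # Combine the three per-channel bins into a single level index.
    levels = (idx[:, 0] * bins_per_channel + idx[:, 1]) * bins_per_channel + idx[:, 2]
    return np.bincount(levels, minlength=bins_per_channel ** 3)
```

Each image then becomes a vector of 1000 counts, which plays the same role as a word-count vector for a text document.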

pLSA can also be applied to text. For example, it can cluster a collection of articles into topics such as sports, politics, movies, etc.

In this application, the first 1000 images are used to learn the topic model from their coarse color histograms. The learnt model is then used to retrieve the k most similar images, where "similar" means similar in terms of topics. For this application, 6 topics are chosen.
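The learn-then-retrieve flow described above can be sketched roughly as follows. This is not the repository's code: it is a minimal EM implementation of pLSA on a dense count matrix, and it assumes cosine similarity between per-image topic mixtures as the similarity measure, which the text above does not specify. The names `fit_plsa` and `retrieve` and all their parameters are hypothetical.

```python
import numpy as np

def fit_plsa(counts, n_topics=6, n_iter=50, seed=0):
    """Fit pLSA with EM on a (n_docs, n_words) count matrix.

    Returns p_z_d of shape (n_docs, n_topics), i.e. P(topic | image),
    and p_w_z of shape (n_topics, n_words), i.e. P(level | topic).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero
    # Random, row-normalized starting points.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]      # (docs, topics, words)
        post /= post.sum(axis=1, keepdims=True) + eps
        # M-step: re-estimate both distributions from count-weighted posteriors.
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z

def retrieve(p_z_d, query_index, k=5):
    """Return indices of the k images whose topic mixture is closest to the query.

    ASSUMPTION: closeness is cosine similarity between rows of p_z_d.
    """
    p_z_d = np.asarray(p_z_d, dtype=float)
    unit = p_z_d / (np.linalg.norm(p_z_d, axis=1, keepdims=True) + 1e-12)
    sims = unit @ unit[query_index]
    sims[query_index] = -np.inf  # do not return the query itself
    return np.argsort(-sims)[:k]
```

With 1000 images, 1000 levels and 6 topics, the dense posterior array holds 6 million floats, so this simple version fits comfortably in memory; a larger dataset would need a sparse or batched variant.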

We can see from the learnt topic model below that each topic has a recognizable color pattern.
1. Topic 1 has shades of brown as its dominant color.
2. Topic 2 has shades of light-ish green plus some assorted other colors.
3. Topic 3 has shades of orange as its dominant color. (I think it's orange.)
4. Topic 4 has shades of blue as its dominant color.
5. Topic 5 has shades of grey as its dominant color, plus some dark colors. (No pun(s) intended ;)
6. Topic 6 has shades of green as its dominant color.
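One way to check patterns like these is to list the most probable color levels for each topic. The sketch below assumes a topic-by-level probability matrix (called `p_w_z` here) and, again, a uniform 10 x 10 x 10 RGB grid with level index `(r * 10 + g) * 10 + b`; both the function name `dominant_colors` and that level layout are illustrative assumptions, not the project's actual level table.

```python
import numpy as np

def dominant_colors(p_w_z, n_top=5, bins_per_channel=10):
    """Return, per topic, RGB bin centers of the n_top most probable levels.

    `p_w_z` has shape (n_topics, bins_per_channel**3).
    ASSUMPTION: level = (r_bin * B + g_bin) * B + b_bin on a uniform grid.
    Output shape is (n_topics, n_top, 3), dtype uint8.
    """
    p_w_z = np.asarray(p_w_z, dtype=float)
    # Indices of the most probable levels for each topic, best first.
    top_levels = np.argsort(-p_w_z, axis=1)[:, :n_top]
    # Undo the level encoding to recover per-channel bin indices.
    r, rem = np.divmod(top_levels, bins_per_channel ** 2)
    g, b = np.divmod(rem, bins_per_channel)
    # Use the center of each bin as its representative color.
    width = 256 / bins_per_channel
    centers = (np.stack([r, g, b], axis=-1) + 0.5) * width
    return centers.astype(np.uint8)
```

Rendering each returned row as a strip of color swatches gives a quick visual summary of a topic.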