Visual Search Engines
Kevin Curran, Computer Lecturer - Magee College
Search technology may be the foundation of the Internet, but anyone who has ever tried conducting a search for rich media content can vouch for the ineffectiveness of text-based searches.
As non-textual content on the Web grows exponentially, text-based search engines are becoming less equipped to provide good results. In response to this deficit, several university labs and commercial software companies have developed tools that allow a visual search of images and products. With visual search, a user can make selections based on images rather than text.
Most of these systems operate in a similar way: the user performs a query by choosing an image that resembles the desired results, and the engine then performs a pattern-recognition search using global or local comparisons of color, shape, or texture. For example, you find a sample photograph of a sunset and ask the search engine to return images with similar red and gold colors.
This approach works when the entire image scene is distinctive and relevant, but it produces many obviously wrong matches on complex images or large databases.
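The global color-comparison idea above can be sketched in a few lines. The following is a minimal, self-contained illustration (not eVision's implementation): each image is reduced to a coarse RGB histogram, and two images are scored by histogram intersection. Images here are simply lists of (r, g, b) pixel tuples; real engines decode image files and also compare shape and texture, but the principle is the same.

```python
from collections import Counter

BINS = 4  # quantize each channel into 4 levels -> at most 64 histogram bins


def color_histogram(pixels):
    """Return a normalized, coarsely binned RGB histogram."""
    counts = Counter(
        (r * BINS // 256, g * BINS // 256, b * BINS // 256) for r, g, b in pixels
    )
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical color distributions."""
    return sum(min(hist_a.get(b, 0.0), hist_b.get(b, 0.0)) for b in hist_a)


# Two 'sunsets' (reds and golds) score higher against each other than
# against a green landscape -- the sunset query from the text.
sunset_1 = [(220, 80, 40)] * 60 + [(240, 180, 60)] * 40
sunset_2 = [(200, 70, 50)] * 50 + [(250, 185, 55)] * 50
green_field = [(60, 160, 70)] * 100

h1, h2, h3 = map(color_histogram, (sunset_1, sunset_2, green_field))
print(similarity(h1, h2) > similarity(h1, h3))  # True
```

Because the histogram discards all spatial information, this is exactly the kind of global comparison that breaks down on complex scenes, as the article notes.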
A visual search engine (VSE) toolkit that implements an object-based approach can improve the accuracy of visual search results. One such toolkit, from eVision, is Java-based and available as a free download; basic visual search applications can be built using just its high-level API.
The approach that eVision takes is to:
1) treat photographs as a collection of objects rather than as one big undifferentiated image, and
2) make it easier for users to specify clearly what they mean by "this" when they tell the search engine to "find something that looks like THIS".
When we look at photographs, we look for patterns and objects. We identify a photograph that is 10% brown and 90% green as a brown horse in a grassy field, so when searching for similar images we would not be confused by a photograph of a green river dotted with 10% brown fallen tree branches. A general-purpose VSE, however, could rate the horse in the field and the branches in the river as very similar, because it treats each image as one big undifferentiated collection of RGB values.
An object-based VSE like eVision instead tries to identify the objects in an image before doing a comparative search. While it cannot attribute the meaning "horse" to the brown object, it can say that the photo is composed of two distinct objects: a brown one with a particular shape and a green background. It then runs visual comparisons against other images based on these regions.
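The horse-versus-branches example can be made concrete with a toy sketch, under the simplifying assumption that "objects" are connected regions of similar color on a small grid (eVision's actual segmentation is far more sophisticated). Both scenes below are exactly 10% brown and 90% green, so a global histogram cannot tell them apart; segmenting first reveals one compact brown object (the "horse") versus ten scattered fragments (the "branches").

```python
def brown_objects(grid):
    """Return the sizes of connected brown regions (4-connectivity, flood fill)."""
    rows, cols = len(grid), len(grid[0])
    seen, sizes = set(), []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == "B" and (r, c) not in seen:
                stack, size = [(r, c)], 0
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == "B" and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                sizes.append(size)
    return sorted(sizes, reverse=True)


# 10x10 grids: 'B' = brown, 'G' = green; both scenes are exactly 10% brown.
horse_in_field = [
    "GGGGGGGGGG", "GGGGGGGGGG", "GGGGGGGGGG", "GGGGGGGGGG",
    "GGGBBBBBGG", "GGGBBBBBGG",
    "GGGGGGGGGG", "GGGGGGGGGG", "GGGGGGGGGG", "GGGGGGGGGG",
]
branches_in_river = [
    "GBGGGGGBGG", "GGGGBGGGGG", "GGGGGGGGGB", "GBGGGGGGGG",
    "GGGGGGBGGG", "GGGGGGGGBG", "GGGBGGGGGG", "GGGGGGGGBG",
    "GGGGGGGGGG", "GBGGGGGGGG",
]

print(brown_objects(horse_in_field))     # [10] -- one compact object
print(brown_objects(branches_in_river))  # ten size-1 fragments
```

Identical global color statistics, completely different object structure: this is the distinction an object-based VSE exploits when it compares region shapes rather than whole-image pixel distributions.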
Thanks to Tony DeYoung of WebReference for the material in this project topic. Tony DeYoung is a Web developer and technophile living in San Francisco. As an independent contractor, he makes his living finding new media technologies and developing applications or consulting services around them. His original article appeared on WebReference.