How Should We Retrieve Images?

Consider the image in the following figure, a small portion of The Garden of Delights by Hieronymus Bosch (1453-1516), now in the Prado museum in Madrid. This is a famous painting, yet even a human viewer may be stumped in trying to state the painter's intent. If we are aiming at automatic retrieval of images, then, it should be unsurprising that capturing the semantics (the meaning) of an image is an even more difficult challenge. A proper annotation of such an image certainly should include the descriptor "people". On the other hand, should this image be blocked by a "Net nanny" screening out "naked people"?

Most major web search engines now offer a search facility for multimedia content, as opposed to text. For Bosch's painting, a text-based search will very likely do the best job, should we wish to find this particular image. Yet we may be interested in fairly general searches, say for scenes with deep blue skies and orange sunsets. By precalculating some fundamental statistics about the images stored in a database, we can usually find simple scenes such as these.

At its inception, retrieval from digital libraries borrowed ideas from traditional information retrieval disciplines, and this line of inquiry continues. For example, in one such approach, images are classified into indoor and outdoor classes using basic information-retrieval techniques: for a training set of images and captions, the number of times each word appears in a document is divided by the number of times that word appears over all documents in a class. A similar measure is devised for statistical descriptors of the content of image segments, and the two information-retrieval-based measures are combined into an effective classification mechanism.
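As a minimal sketch, assuming the measure works roughly as described (per-word caption counts divided by per-class word totals, then summed), the following Python classifies a caption as indoor or outdoor. All captions, names, and scoring details here are hypothetical illustrations, not the cited system.

```python
from collections import Counter

def class_word_counts(captions):
    """Total count of each word over all captions in one class."""
    counts = Counter()
    for cap in captions:
        counts.update(cap.lower().split())
    return counts

def caption_score(caption, class_counts):
    """Sum, over the caption's words, of each word's in-caption count
    divided by its total count within the class: one plausible reading
    of the information-retrieval measure described above."""
    tf = Counter(caption.lower().split())
    return sum(n / class_counts[w] for w, n in tf.items() if w in class_counts)

# Hypothetical training captions for the two classes.
indoor = class_word_counts(["sofa lamp room", "kitchen table chairs"])
outdoor = class_word_counts(["deep blue sky sunset", "beach sky waves"])

query = "orange sunset over the beach"
label = "outdoor" if caption_score(query, outdoor) > caption_score(query, indoor) else "indoor"
print(label)  # -> outdoor
```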

However, most multimedia retrieval schemes have moved toward an approach favoring multimedia content itself, without regard to or reliance upon accompanying textual information. Only recently has attention returned to the deeper problem of addressing semantic content in images, once again making use of accompanying text. If the data consist of statistical features built from objects in images and also of text associated with the images, then each modality, text and image, provides semantic content omitted from the other. For example, an image of a red rose will not normally carry the manually added keyword "red", since this is generally assumed. Hence image features and associated words may disambiguate each other.

In this chapter, however, we shall focus only on the more standardized systems that use image features to retrieve images from databases or from the web. The features typically used are statistical measures such as the color histogram of an image. Consider a colorful image, say a Santa Claus plus sled. The combination of bright red, flesh tones, and browns might be enough of an image signature to allow us to at least find similar images in our own image database (of office Christmas parties).

Recall that a color histogram is typically a three-dimensional array that counts pixels with specific red, green, and blue values. The nice feature of such a structure is that it does not care about the orientation of the image (since we are simply counting pixel values, not their positions) and is also fairly impervious to object occlusions. A seminal paper on this subject launched a tidal wave of interest in such so-called "low-level" features for images.
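As a concrete sketch, the following Python builds such a quantized 3-D histogram and compares two images with histogram intersection, a similarity measure commonly paired with color histograms. The bin count and the two toy images are assumptions for illustration.

```python
import numpy as np

def color_histogram(img, bins=8):
    """3-D histogram counting pixels by quantized (R, G, B) value.
    `img` is an H x W x 3 uint8 array; the result is normalized so
    that images of different sizes remain comparable."""
    # Map each 0-255 channel value to one of `bins` buckets.
    q = (img.astype(np.uint32) * bins) // 256
    hist = np.zeros((bins, bins, bins))
    np.add.at(hist, (q[..., 0], q[..., 1], q[..., 2]), 1)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical color distributions.
    Because only pixel values are counted, the measure is unchanged
    by rotation and tolerant of partial occlusion."""
    return np.minimum(h1, h2).sum()

# Two hypothetical images: one reddish, one bluish.
red_img = np.zeros((64, 64, 3), np.uint8); red_img[..., 0] = 200
blue_img = np.zeros((64, 64, 3), np.uint8); blue_img[..., 2] = 200
print(histogram_intersection(color_histogram(red_img),
                             color_histogram(blue_img)))  # ~0.0
```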

Other simple features used are descriptors such as color layout, meaning a simple sketch of where in a checkerboard grid covering the image to look for blue skies and orange sunsets. Another feature used is texture, meaning some type of descriptor typically based on an edge image, formed by taking partial derivatives of the image itself and classifying edges according to closeness of spacing and orientation. An interesting version of this approach uses a histogram of such edge features; both ideas are sketched below. Texture layout can also be used. Search engines devised on these features are said to be content-based: the search is guided by image-similarity measures based on the statistical content of each image.
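The sketch below shows minimal versions of both descriptors, under simple assumptions: color layout as the mean color in each cell of a grid, and texture as a histogram of edge orientations computed from the image's partial derivatives. The grid size, bin count, and edge threshold are illustrative choices, not values from any cited system.

```python
import numpy as np

def color_layout(img, grid=4):
    """Color layout: mean (R, G, B) in each cell of a grid x grid
    checkerboard over the image; 'blue sky on top, orange sunset
    below' shows up directly in the cell averages."""
    h, w = img.shape[:2]
    cells = np.zeros((grid, grid, 3))
    for i in range(grid):
        for j in range(grid):
            cell = img[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            cells[i, j] = cell.reshape(-1, 3).mean(axis=0)
    return cells

def edge_orientation_histogram(gray, n_bins=8, threshold=30.0):
    """Texture descriptor: histogram of edge orientations, built from
    partial derivatives of the (grayscale) image."""
    g = gray.astype(float)
    gx = np.zeros_like(g); gy = np.zeros_like(g)
    gx[:, 1:-1] = (g[:, 2:] - g[:, :-2]) / 2.0   # d/dx, central differences
    gy[1:-1, :] = (g[2:, :] - g[:-2, :]) / 2.0   # d/dy
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                     # orientation in (-pi, pi]
    strong = mag > threshold                     # keep only real edges
    hist, _ = np.histogram(ang[strong], bins=n_bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)             # normalize; guard empty case
```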

Typically, we might be interested in looking for images similar to our current favorite Santa. A more industry-oriented application might be seeking a particular image of a postage stamp. Subject fields associated with image database search include art galleries and museums, fashion, interior design, remote sensing, geographic information systems, meteorology, trademark databases, criminology, and an increasing number of other areas.

A more difficult type of search involves looking for a particular object within images, which we can term a search-by-object model. This requires a much more complete catalog of image contents and is a much more difficult objective. Generally, users will base their searches on search by association, meaning a first-cut search followed by refinement based on similarity to some of the query results. For general images representative of a kind of desired picture, a category search returns one element of the requested set, such as one or several trademarks in a database of such logos. Alternatively, the query may be based on a very specific image, such as a particular piece of art: a target search. A minimal ranking loop for such searches is sketched below.
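To make the search-by-association loop concrete, here is a minimal, hypothetical ranking step. It assumes the color_histogram and histogram_intersection helpers from the earlier sketch, and a dictionary mapping image names to precomputed histograms.

```python
def rank_by_similarity(query_hist, database, top_k=5):
    """Return the top_k database images most similar to the query.
    In search by association, the user inspects these results, picks
    one, and its histogram becomes the next query (refinement); a
    target search instead stops when the one sought image appears."""
    scores = sorted(((histogram_intersection(query_hist, h), name)
                     for name, h in database.items()), reverse=True)
    return scores[:top_k]

# Hypothetical database of precomputed histograms:
# database = {"santa1.png": h1, "sunset2.png": h2, ...}
# print(rank_by_similarity(color_histogram(query_img), database))
```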

How can we best characterize the information content of an image?

Another axis to bear in mind when comparing the many existing search systems is whether the domain being searched is narrow, such as a database of trademarks, or wide, such as a set of commercial stock photos.

For any system, we are up against the fundamental nature of machine systems that aim to replace human endeavors. The main obstacles are neatly summarized in what the authors of a well-known survey term the sensory gap and the semantic gap. The sensory gap is the gap between the object in the world and the information in a (computational) description derived from a recording of that scene.

The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation.

Image features record specifics about images, but the images themselves may elude description in such terms. And while we may certainly be able to describe images linguistically, the message in the image, the semantics, is difficult to capture for machine applications.

