Neuroinformatics Group

Universität Bielefeld, Technische Fakultät

Invariant Recognition with Generative Models

A generative model must be able to integrate bottom-up and top-down signals to improve recognition rates and, especially, to learn new objects from few training views. Furthermore, this process must adapt to unseen objects and extrapolate to unseen views of known objects by predicting (or generating) their appearance from the estimated intrinsic variables and other contextual information. The basic idea is as follows:

  1. First, propagate recognition signals upwards, where they are matched against the intermediate representations; then
  2. exploit stored experiences (observed transformations) to generate hypotheses (predictions) for missing or ambiguous pieces of information and pass them downwards (shape, hidden parts, angle, movement, ...).
  3. In the lower layers, check whether the feedback signals contradict the actual inputs or can be integrated in a plausible manner.
  4. This check allows a hypothesis to contradict the input signal, as long as another hypothesis claims to “explain” the same part of the input signal.
  5. This upward matching and downward checking can be seen as a communication process that builds up a plausible representation.
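The steps above can be sketched in code. The following is a hypothetical toy example, not the group's actual system: stored "experiences" are reduced to 1-D feature templates, occlusion is modeled by a binary mask, and the names (duck, banana) and all values are purely illustrative.

```python
import numpy as np

# Stored templates play the role of learned intermediate representations
# (toy 1-D "appearances"; real features could be patches or Gabor responses).
TEMPLATES = {
    "duck":   np.array([1.0, 1.0, 0.0, 0.0]),
    "banana": np.array([0.0, 0.0, 1.0, 1.0]),
}

def bottom_up_candidates(observed, mask, top_k=2):
    """Step 1: match observed (possibly occluded) features upwards
    against the stored templates and return the best candidates."""
    scores = {}
    for name, template in TEMPLATES.items():
        # Compare only unoccluded entries (mask == 1).
        diff = np.abs(template - observed) * mask
        scores[name] = -diff.sum()
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def top_down_check(observed, mask, hypothesis):
    """Steps 2-4: generate the predicted appearance for a hypothesis and
    check it downwards against the input. An occluded entry (mask == 0)
    may contradict the prediction, because the occlusion itself
    'explains' that part of the signal."""
    prediction = TEMPLATES[hypothesis]
    consistent = (np.abs(prediction - observed) < 0.5) | (mask == 0)
    return bool(np.all(consistent))

# A duck whose right half is occluded by foliage (pixel value 0.5).
observed = np.array([1.0, 1.0, 0.5, 0.5])
mask     = np.array([1,   1,   0,   0])   # 0 = occluded entry

candidates = bottom_up_candidates(observed, mask)
plausible  = [h for h in candidates if top_down_check(observed, mask, h)]
print(plausible)  # only the duck survives the downward check
```

The communication process of step 5 would iterate these two functions, refining the candidate set until the upward matches and downward checks agree.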

The Generative Loop

(Figure: the generative loop)

In hard vision problems, pure feed-forward detection may not suffice, because the feature detectors cannot always provide reliable results. In the picture above, the features are just small image patches; this is only for simplicity, and one can think of others, e.g. Gabor filters. Here, occlusion leads to large differences when the input is compared (thin black arrows) to the stored features in the discriminative processing. However, there may still be enough information to generate a list of candidate objects to feed into a model of the scene.

From such a representation, it becomes feasible to generate expectations for the different hypotheses, which can be compared to the image or to the early features in the forward processing (red dotted arrows). Of course, this raises the need for a comparison within single features. One can imagine that the bonsai plant has been detected, and that this detection serves as an explanation for the green pixels in the features that should match the duck or the banana. Clearly, for hard scenarios an iterative approach is needed.

Basically, the question can be posed as follows: How can the process of learning and recognition be shaped such that the dependence on the infinitely many possible appearances (feature vectors) is minimized, while the higher-layer representation captures more of the abstract essence of the objects and of the typically applied transformations? In other words, we aim for plausible explanation instead of pure pattern matching.
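The "explaining away" idea in the bonsai example can be sketched as follows. This is again a hypothetical illustration: the objects, the 1-D image, and the tolerance are invented for the example, and each hypothesis is reduced to a fixed predicted appearance.

```python
import numpy as np

# Toy 1-D "image" and per-hypothesis predictions (all values illustrative).
image       = np.array([0.2, 0.9, 0.9, 0.2])
duck_pred   = np.array([0.2, 0.9, 0.2, 0.2])  # duck alone cannot explain pixel 2
bonsai_pred = np.array([0.0, 0.0, 0.9, 0.2])  # the bonsai covers pixel 2

def unexplained(image, predictions, tol=0.1):
    """Return indices of pixels that no active hypothesis predicts
    within tolerance `tol` — i.e. the parts of the input that the
    current scene model fails to explain."""
    explained = np.zeros(len(image), dtype=bool)
    for prediction in predictions:
        # A pixel counts as explained if ANY hypothesis claims it.
        explained |= np.abs(prediction - image) < tol
    return np.flatnonzero(~explained)

print(unexplained(image, [duck_pred]))               # pixel 2 contradicts the duck
print(unexplained(image, [duck_pred, bonsai_pred]))  # the bonsai explains it away
```

With the duck hypothesis alone, pixel 2 remains unexplained and would count against it; adding the bonsai hypothesis explains that pixel, so the duck no longer has to account for it. An iterative loop would keep adding or revising hypotheses until the unexplained residue is small.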