The design of such a statistical model is outlined. It assumes a given parametric form for the probability density of the model features and then accounts for the translation, rotation, and projection of an object onto an image, the unknown correspondence between model and image features, and the unknown partition of image features into object and background. The unknown parameters of the density are estimated from images showing different views of the object. The missing-information principle, which leads to the EM algorithm, makes it possible to estimate the parameters of a 3D model density from 2D images without hand-labeling the correspondences between model and image features. This is the basis for largely automatic training of the statistical model.
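To illustrate how EM treats unknown correspondences as missing data, the following is a minimal sketch on a deliberately simplified problem, not the model described here: scalar observations, Gaussian model features with a shared, known standard deviation, equal feature priors, and no projection or background component. The function names `gaussian` and `em_means` and all parameter values are assumptions made for this illustration.

```python
import math


def gaussian(x, mean, sigma):
    """Density of a 1D normal distribution at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_means(observations, init_means, sigma=1.0, iterations=50):
    """Estimate feature means when the feature-to-observation assignment is unknown.

    Simplifying assumptions: scalar features, known shared sigma,
    equal prior probability for every model feature.
    """
    means = list(init_means)
    n_features = len(means)
    for _ in range(iterations):
        # E-step: posterior probability that each observation was
        # generated by each model feature (the "missing" correspondence).
        responsibilities = []
        for x in observations:
            weights = [gaussian(x, m, sigma) for m in means]
            total = sum(weights)
            if total == 0.0:
                # Numerical underflow: fall back to a uniform assignment.
                responsibilities.append([1.0 / n_features] * n_features)
            else:
                responsibilities.append([w / total for w in weights])
        # M-step: re-estimate each mean as a responsibility-weighted average.
        for k in range(n_features):
            mass = sum(r[k] for r in responsibilities)
            if mass > 0.0:
                weighted_sum = sum(
                    r[k] * x for r, x in zip(responsibilities, observations)
                )
                means[k] = weighted_sum / mass
    return means
```

Calling `em_means(data, [1.0, 4.0])` on unlabeled samples drawn from two well-separated Gaussians returns one estimated mean per component, without any observation having been assigned to a component by hand.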
Once the statistical model is determined, localization is treated as maximum likelihood estimation of the localization parameters, and classification is the conventional selection of the class with maximum a posteriori probability. However, localization is computationally expensive since it requires a global search of the parameter space.
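In symbols, the two decision rules can be sketched as follows. The notation is an assumption introduced for this illustration: $O$ is the set of observed image features, $B_\kappa$ the trained model parameters of class $\Omega_\kappa$, $\theta$ the localization parameters (rotation and translation), and $p(\Omega_\kappa)$ the class prior. The second line applies Bayes' rule with the pose fixed at its maximum likelihood estimate rather than integrated out.

```latex
\begin{align}
  \hat{\theta}_\kappa &= \operatorname*{argmax}_{\theta} \; p(O \mid B_\kappa, \theta), \\
  \hat{\kappa} &= \operatorname*{argmax}_{\kappa} \; p(\Omega_\kappa)\, p(O \mid B_\kappa, \hat{\theta}_\kappa).
\end{align}
```

The first maximization is the step that requires the global search over the pose parameters, and it has to be carried out once for each candidate class.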
We present experimental results showing the feasibility of this approach. Possible extensions concern the selected features, the efficiency of search, and the modeling of inter-object dependencies to obtain statistical scene models.