Efficient discriminant viewpoint selection for active Bayesian recognition
Catherine Laporte and Tal Arbel
Given a database of labeled objects, the object recognition problem requires
associating a label with previously unseen images of these objects. The pose
estimation problem consists of determining the pose of the objects seen in these
images with respect to the reference frames defined by the objects in the database.
These problems are difficult due to the ambiguities in appearance which are
intrinsic to the particular database. For example, two different objects may
have a similar appearance when seen from certain points of view, or an object may
look the same in several different positions. In order to resolve those
ambiguities, it is helpful to use multiple observations of the object instead of
one. Further improvements can be obtained by applying active vision techniques to
select observations in order to optimise the process [2, 3, 4]. Such decision making,
however, comes at a computational cost. This work focuses on
the development of observation selection criteria which are efficient both in terms
of the number of views required to solve the problem, and in terms of computational
tractability.
Sequential Bayesian recognition
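Consider a database of objects $\{o_i\}_{i=1}^{N_o}$,
and a mobile camera
facing an unknown object from this set whose class and pose are to be determined.
The objects may be positioned in $N_\theta$ discrete poses $\{\theta_j\}_{j=1}^{N_\theta}$ defined
according to a global reference frame.
The observed scene may be illuminated by one of $N_\ell$ different light sources, $\{\ell_k\}_{k=1}^{N_\ell}$.
The camera measurement is parameterised by a feature vector $\mathbf{x}$, which depends on the identity
of the object $o_i$, its pose $\theta_j$, the light source $\ell_k$ and the viewing position $\mathbf{a}$.
Under uncertainty, this relationship is represented through a conditional
probability density function $p(\mathbf{x} \mid o_i, \theta_j, \ell_k, \mathbf{a})$,
which may be obtained from a physical model
or estimated off-line from training data.
Given a measurement $\mathbf{x}_1$, a known viewing position $\mathbf{a}_1$ and a prior
distribution $P(o_i, \theta_j, \ell_k)$ over object class, object pose and light
source, the probability of each class-pose-light source tuple is computed using
Bayes' rule:

$$P(o_i,\theta_j,\ell_k \mid \mathbf{x}_1,\mathbf{a}_1) = \frac{p(\mathbf{x}_1 \mid o_i,\theta_j,\ell_k,\mathbf{a}_1)\,P(o_i,\theta_j,\ell_k)}{\sum_{i',j',k'} p(\mathbf{x}_1 \mid o_{i'},\theta_{j'},\ell_{k'},\mathbf{a}_1)\,P(o_{i'},\theta_{j'},\ell_{k'})} \qquad (1)$$

Assuming that successive measurements are independent of each other given object class, pose, light source
and viewing positions, one obtains the recursive Bayesian update rule, where $\mathbf{x}_{1:n}$ and $\mathbf{a}_{1:n}$ denote the first $n$ measurements and viewing positions:

$$P(o_i,\theta_j,\ell_k \mid \mathbf{x}_{1:n},\mathbf{a}_{1:n}) = \frac{p(\mathbf{x}_n \mid o_i,\theta_j,\ell_k,\mathbf{a}_n)\,P(o_i,\theta_j,\ell_k \mid \mathbf{x}_{1:n-1},\mathbf{a}_{1:n-1})}{\sum_{i',j',k'} p(\mathbf{x}_n \mid o_{i'},\theta_{j'},\ell_{k'},\mathbf{a}_n)\,P(o_{i'},\theta_{j'},\ell_{k'} \mid \mathbf{x}_{1:n-1},\mathbf{a}_{1:n-1})} \qquad (2)$$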
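The recursive update can be sketched for a finite set of class-pose-light source hypotheses as follows. This is a minimal Python illustration, not the authors' implementation: the hypothesis tuples, the likelihood tables and the function name `bayes_update` are hypothetical, and the likelihood callable is assumed to return $p(\mathbf{x}_n \mid o_i,\theta_j,\ell_k,\mathbf{a}_n)$ for the current measurement.

```python
from typing import Callable, Dict, Hashable, Tuple

# Hypothetical representation: a hypothesis is an (object, pose, light source) tuple.
Hypothesis = Tuple[Hashable, Hashable, Hashable]


def bayes_update(
    prior: Dict[Hypothesis, float],
    likelihood: Callable[[Hypothesis], float],
) -> Dict[Hypothesis, float]:
    """One step of the recursive update: posterior proportional to likelihood * prior.

    `likelihood(h)` is assumed to return p(x_n | o, theta, l, a_n) for the
    current measurement x_n taken from the current viewing position a_n.
    """
    unnormalised = {h: likelihood(h) * p for h, p in prior.items()}
    evidence = sum(unnormalised.values())  # denominator of Bayes' rule
    if evidence <= 0.0:
        raise ValueError("measurement has zero likelihood under every hypothesis")
    return {h: v / evidence for h, v in unnormalised.items()}


# Toy example (hypothetical numbers): two objects, one pose, one light source.
belief = {("A", 0, "L1"): 0.5, ("B", 0, "L1"): 0.5}
likelihood_tables = [
    {("A", 0, "L1"): 0.6, ("B", 0, "L1"): 0.2},  # first measurement favours A
    {("A", 0, "L1"): 0.2, ("B", 0, "L1"): 0.4},  # second measurement favours B
]
history = []
for table in likelihood_tables:
    belief = bayes_update(belief, table.__getitem__)
    history.append(belief[("A", 0, "L1")])
print(history)
```

With these toy numbers the posterior of object A rises to 0.75 after the first measurement and falls to 0.6 after the second. Each step only rescales the previous posterior, so the cost per measurement is linear in the number of hypotheses.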
A sequential recognition engine based on this evidence fusion scheme exploits the information
provided by the appearance of the object and, more importantly,
by its spatial structure [7].
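Finally, a probability distribution over object class and pose
is obtained by marginalising
(2) over light sources:

$$P(o_i,\theta_j \mid \mathbf{x}_{1:n},\mathbf{a}_{1:n}) = \sum_{k=1}^{N_\ell} P(o_i,\theta_j,\ell_k \mid \mathbf{x}_{1:n},\mathbf{a}_{1:n}) \qquad (3)$$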
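Marginalising over light sources amounts to summing the joint posterior over the light index. A minimal sketch, assuming a hypothetical dictionary representation of the posterior keyed by (object, pose, light source) tuples; the function name `marginalise_light` is illustrative only.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple


def marginalise_light(
    joint: Dict[Tuple[Hashable, Hashable, Hashable], float],
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Sum P(o, theta, l | data) over light sources l to obtain P(o, theta | data)."""
    marginal: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    for (obj, pose, _light), p in joint.items():
        marginal[(obj, pose)] += p
    return dict(marginal)


# Hypothetical joint posterior over two objects, one pose and two light sources.
joint = {
    ("A", 0, "L1"): 0.3, ("A", 0, "L2"): 0.1,
    ("B", 0, "L1"): 0.4, ("B", 0, "L2"): 0.2,
}
object_pose = marginalise_light(joint)
best = max(object_pose, key=object_pose.get)  # most probable (object, pose) pair
print(object_pose, best)
```

The light source acts here as a nuisance variable: it is kept in the joint posterior so that measurements are interpreted under the right illumination, but the final decision is made on the (object, pose) marginal.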
Active viewpoint selection
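The object recognition and pose estimation problem
is difficult, mainly because for certain choices of viewing position $\mathbf{a}$,
the observed data may be well explained by more
than one hypothesis.
This difficulty can be alleviated by
choosing a shift in viewpoint such that
competing hypotheses appear as different as possible, which makes them easier to tell apart.
Considering the recognition task as a series of pairwise discrimination subtasks,
and given a measure of dissimilarity $d(p_1, p_2)$ between two probability
density functions $p_1$ and $p_2$, the following general form is proposed as a
criterion for the selection of a viewpoint $\mathbf{a}_{n+1}$ at step $n+1$:

$$\mathbf{a}_{n+1} = \arg\max_{\mathbf{a}} \sum_{i,j}\sum_{i',j'} P_n(o_i,\theta_j)\,P_n(o_{i'},\theta_{j'})\; d\big(p_{ij}(\mathbf{x}\mid\mathbf{a}),\, p_{i'j'}(\mathbf{x}\mid\mathbf{a})\big) \qquad (4)$$

where $P_n(o_i,\theta_j) = P(o_i,\theta_j \mid \mathbf{x}_{1:n},\mathbf{a}_{1:n})$ is the current posterior over object class and pose, and
$p_{ij}(\mathbf{x}\mid\mathbf{a}) = \sum_{k=1}^{N_\ell} p(\mathbf{x}\mid o_i,\theta_j,\ell_k,\mathbf{a})\,P(\ell_k \mid o_i,\theta_j,\mathbf{x}_{1:n},\mathbf{a}_{1:n})$ is the appearance model of hypothesis $(o_i,\theta_j)$ at viewpoint $\mathbf{a}$, marginalised over light sources.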
Each term of the sum achieves a pairwise comparison of two hypotheses. Since
the most probable hypotheses account for most of the ambiguity, more effort is
made to disambiguate highly probable hypotheses.
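The pairwise weighting described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each (object, pose) hypothesis is assumed to have a one-dimensional Gaussian appearance model per candidate viewpoint, stored in a hypothetical lookup table, and `pooled_mahalanobis` is one simple illustrative dissimilarity. The score sums, over ordered pairs of distinct hypotheses, the product of their posterior probabilities times the dissimilarity of their appearance models; the candidate viewpoint with the highest score is selected.

```python
from itertools import permutations
from typing import Callable, Dict, Hashable, Iterable, Tuple

Hypothesis = Tuple[Hashable, Hashable]  # hypothetical: (object, pose)
Gaussian1D = Tuple[float, float]        # (mean, variance) of a scalar feature


def pooled_mahalanobis(p: Gaussian1D, q: Gaussian1D) -> float:
    """Squared mean separation scaled by the summed variances (illustrative choice)."""
    return (p[0] - q[0]) ** 2 / (p[1] + q[1])


def weighted_dissimilarity(
    posterior: Dict[Hypothesis, float],
    appearance: Callable[[Hypothesis, Hashable], Gaussian1D],
    dissimilarity: Callable[[Gaussian1D, Gaussian1D], float],
    viewpoint: Hashable,
) -> float:
    """Sum over ordered pairs of distinct hypotheses of P(h) * P(g) * d(model_h, model_g)."""
    score = 0.0
    for h, g in permutations(posterior, 2):  # self-pairs omitted: d(p, p) is assumed 0
        model_h = appearance(h, viewpoint)
        model_g = appearance(g, viewpoint)
        score += posterior[h] * posterior[g] * dissimilarity(model_h, model_g)
    return score


def select_viewpoint(posterior, appearance, dissimilarity, candidates: Iterable[Hashable]):
    """Return the candidate viewpoint with the largest weighted dissimilarity."""
    return max(
        candidates,
        key=lambda a: weighted_dissimilarity(posterior, appearance, dissimilarity, a),
    )


# Toy example (hypothetical): A and B look identical from the front but differ from the side.
models = {
    (("A", 0), "front"): (1.0, 1.0), (("B", 0), "front"): (1.0, 1.0),
    (("A", 0), "side"): (0.0, 1.0), (("B", 0), "side"): (3.0, 1.0),
}
posterior = {("A", 0): 0.5, ("B", 0): 0.5}


def lookup(h: Hypothesis, a: Hashable) -> Gaussian1D:
    return models[(h, a)]


chosen = select_viewpoint(posterior, lookup, pooled_mahalanobis, ["front", "side"])
print(chosen)
```

Because the weights are products of posterior probabilities, pairs involving unlikely hypotheses contribute little to the score, whatever dissimilarity is plugged in.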
The general form of the criterion makes no assumption about the dissimilarity
measure $d$. Good choices for this measure include the Mahalanobis distance, when
appearance models are well represented by their mean and variance, or the information-theoretic
Jeffreys or Kullback-Leibler divergences (and fast approximations thereof) for more general cases
[5].
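For Gaussian appearance models, the Kullback-Leibler and Jeffreys divergences have closed forms, which keeps each evaluation of $d$ cheap. The sketch below assumes diagonal covariance matrices (an assumption made here for brevity, not stated in the text) and takes the Jeffreys divergence as the symmetric sum KL(p||q) + KL(q||p); the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical representation: (mean vector, vector of per-dimension variances).
DiagGaussian = Tuple[Sequence[float], Sequence[float]]


def kl_divergence(p: DiagGaussian, q: DiagGaussian) -> float:
    """KL(p || q) between two Gaussians with diagonal covariance matrices."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * sum(
        vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def jeffreys_divergence(p: DiagGaussian, q: DiagGaussian) -> float:
    """Symmetric Jeffreys divergence: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


p = ([0.0, 0.0], [1.0, 1.0])
q = ([1.0, 0.0], [1.0, 4.0])
print(kl_divergence(p, q), kl_divergence(q, p), jeffreys_divergence(p, q))
```

The Kullback-Leibler divergence is asymmetric, which matters when the criterion sums over ordered pairs of hypotheses; the Jeffreys divergence removes this asymmetry at the cost of one extra KL evaluation, and its logarithmic terms cancel.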
Generally speaking, it is possible to choose a dissimilarity measure $d$
that can be computed quickly, making the viewpoint selection algorithm
faster than those based on mutual information [4]. The viewpoint selection
criterion can also be simplified by neglecting terms involving extremely low-probability
object class and pose pairs, which contribute little to the sum.
This makes the computation
of (4) increasingly fast as the Bayesian
inference engine converges toward a single winning hypothesis.
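Neglecting low-probability terms can be sketched by thresholding the posterior before evaluating the pairwise sum. The fixed threshold `min_prob` and the helper names are assumptions for illustration; the text does not specify how negligible terms are identified. The kept probabilities are deliberately not renormalised, so the surviving terms of the sum keep their original weights and the negligible ones are simply omitted.

```python
from typing import Dict, Hashable


def prune_hypotheses(posterior: Dict[Hashable, float], min_prob: float = 1e-3) -> Dict[Hashable, float]:
    """Keep only hypotheses whose posterior probability is at least `min_prob`.

    Probabilities are not renormalised: the surviving terms of the pairwise
    sum keep their original weights, and the negligible ones are dropped.
    """
    kept = {h: p for h, p in posterior.items() if p >= min_prob}
    if not kept:  # never return an empty set; fall back to the most probable hypothesis
        best = max(posterior, key=posterior.get)
        kept = {best: posterior[best]}
    return kept


def pair_evaluations(n_hypotheses: int) -> int:
    """Dissimilarity evaluations per candidate viewpoint (ordered pairs of distinct hypotheses)."""
    return n_hypotheses * (n_hypotheses - 1)


# Hypothetical posterior in which the inference has nearly converged.
posterior = {"h0": 0.6, "h1": 0.3}
posterior.update({f"h{i}": 0.01 for i in range(2, 12)})
kept = prune_hypotheses(posterior, min_prob=0.05)
print(sorted(kept), pair_evaluations(len(posterior)), pair_evaluations(len(kept)))
```

As the posterior concentrates on one hypothesis, fewer terms survive the threshold, and the number of dissimilarity evaluations per candidate viewpoint shrinks quadratically with the number of surviving hypotheses.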
Experiments
The proposed active object recognition and pose estimation framework
may be used in conjunction with a broad
variety of feature extractors and appearance models. The results presented here are based on
a low dimensional appearance-based object representation obtained by principal component analysis (PCA)
[6].
A Gaussian appearance model was fitted
to the projections of labeled training images onto the eigenspace.
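A minimal sketch of such an appearance model, assuming the eigenspace basis (orthonormal principal components) and the mean training image have already been computed, and assuming a diagonal covariance for each Gaussian. The function names, the toy data and the variance floor `min_var` are hypothetical choices for illustration; the maximum-likelihood (population) variance is used.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = Sequence[float]


def project(image: Vector, mean_image: Vector, basis: Sequence[Vector]) -> List[float]:
    """Coordinates of a flattened image in the eigenspace spanned by `basis`."""
    centred = [x - m for x, m in zip(image, mean_image)]
    return [sum(b_k * c_k for b_k, c_k in zip(b, centred)) for b in basis]


def fit_diagonal_gaussian(samples: Sequence[Vector], min_var: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Maximum-likelihood mean and per-dimension variance of the projected samples."""
    n = len(samples)
    dims = range(len(samples[0]))
    mean = [sum(s[k] for s in samples) / n for k in dims]
    var = [max(sum((s[k] - mean[k]) ** 2 for s in samples) / n, min_var) for k in dims]
    return mean, var


def log_likelihood(x: Vector, mean: Vector, var: Vector) -> float:
    """log p(x | hypothesis) under the fitted diagonal Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


# Hypothetical training data: 3-pixel "images" labelled by (object, pose).
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # assumed orthonormal principal components
mean_image = [0.0, 0.0, 0.0]
training: Dict[Tuple[Hashable, Hashable], List[Vector]] = {
    ("A", 0): [[1.0, 2.0, 9.0], [3.0, 4.0, 9.0]],
}
models = {
    label: fit_diagonal_gaussian([project(img, mean_image, basis) for img in images])
    for label, images in training.items()
}
mean_a, var_a = models[("A", 0)]
print(mean_a, var_a, log_likelihood(mean_a, mean_a, var_a))
```

The variance floor guards against degenerate dimensions when a class has few training images; `log_likelihood` is what a recognition engine would evaluate for each hypothesis when a new measurement arrives.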
The first case study was conducted with a database of 31 synthetic 3D models of
aircraft [1]. Figure 1 shows sample rendered
images of these objects.
Figure 1: Sample objects from the first case study.
The problem considered was that of identifying an unknown object
and estimating its pose under the illumination of a single light source.
The observer is a virtual camera moving with two degrees of freedom
on a sphere, within which
the object pose can vary according to two degrees of freedom (pan and tilt).
In a first set of experiments, the proposed observation selection strategy was compared to a random navigation strategy in which no active selection of observations was
employed.
The object recognition and pose estimation results are summarised in table 1.
Table 1: Comparison of the accuracy of recognition and pose estimation results for the aircraft database, using the random and proposed observation selection strategies.

| Strategy | Recognition rate | Average pose error |
|---|---|---|
| Random navigation | 81% | 1.84 degrees |
| Proposed strategy | 83% | 2.19 degrees |
While there are no significant differences in accuracy between the two methods, the proposed active view selection method requires fewer views for recognition.
Similar experiments were then performed on 14 of the synthetic objects using an observation selection criterion based on mutual information [4].
The proposed strategy achieved recognition rates and numbers of views similar to those obtained with mutual information,
although mutual information yielded a lower average pose error. These results are summarised in table 2.
However, the proposed observation selection strategy is much less computationally expensive than
mutual information, as illustrated in figure 2.
The time required for the first decision is an order of magnitude
lower with the proposed strategy than with the strategy based on mutual information. Furthermore, the time needed
for decision making with the proposed strategy decreases dramatically as the recognition process progresses.
Table 2: Performance comparison of the recognition and pose estimation results for 14 objects of the aircraft database, using the
proposed and mutual information observation selection strategies.

| Strategy | Recognition rate | Average pose error | Average number of views |
|---|---|---|---|
| Proposed strategy | 82% | 2.63 degrees | 3.4 |
| Mutual information | 81% | 0.43 degrees | 3.3 |
Figure 2: Comparison of the average time required for viewpoint selection as the recognition task progresses, using the mutual information and
proposed observation selection strategies.
Case study 2: real imagery
The second case study considers the more general problem of object recognition and
pose estimation under varying lighting conditions.
The study was conducted with a set of 13 objects,
custom-built to make the recognition task difficult,
and two light sources.
Images of 5 sample objects seen from different points
of view are shown in figure 3, and one object is shown
under each of the two possible light sources in figure 4.
Figure 3: Sample objects used for the second case study, seen from eight different points of view.
Figure 4: Sample views of object 1 illuminated by two different light sources.
The proposed observation selection strategy was compared to a random navigation strategy (without active viewpoint selection). The proposed strategy was found to
perform poorly under realistic conditions because the appearance model did not fit the data very well. While the appearance model could be changed to better reflect
the experimental conditions, some modelling error will always remain. Rather than refining the model, a heuristic was therefore introduced into the navigation strategy whereby a viewpoint cannot be visited more than once
[3].
This prevents the system from repeatedly using the same biased information to perform inference and select viewpoints.
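The non-repeating constraint only changes which viewpoints are eligible at each step. A minimal sketch, assuming a hypothetical scoring function over a finite set of candidate viewpoints (for instance a weighted dissimilarity criterion); the scores are held fixed here purely for illustration, whereas in practice they would be recomputed after each Bayesian update. The function name `select_unvisited` is illustrative only.

```python
from typing import Callable, Hashable, Iterable, Optional, Set


def select_unvisited(
    candidates: Iterable[Hashable],
    visited: Set[Hashable],
    score: Callable[[Hashable], float],
) -> Optional[Hashable]:
    """Best-scoring viewpoint not visited yet, or None once every viewpoint was used."""
    remaining = [a for a in candidates if a not in visited]
    if not remaining:
        return None
    return max(remaining, key=score)


# Hypothetical, fixed criterion values for three candidate viewpoints.
scores = {"v1": 3.0, "v2": 2.0, "v3": 1.0}
visited: Set[Hashable] = set()
order = []
while (view := select_unvisited(scores, visited, scores.__getitem__)) is not None:
    visited.add(view)
    order.append(view)
print(order)
```

Without the constraint, the same fixed scores would select v1 at every step, feeding the same biased measurement into the posterior again and again.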
Table 3 and figures 5 and 6 summarise
the results.
Table 3: Comparison of the results obtained with and without the non-repeating navigation constraint for the random and proposed
viewpoint selection approaches.

| Strategy | Recognition rate | Average pose error | Average number of views |
|---|---|---|---|
| Random | 87% | 0.69 degrees | 10.06 |
| Proposed strategy | 76% | 2.66 degrees | 8.54 |
| Random non-repeating | 93% | 0.49 degrees | 9.36 |
| Proposed strategy non-repeating | 94% | 1.71 degrees | 6.85 |
Figure 5: Comparison of the average number of steps required for recognition and pose
estimation of the house-like objects of the second case study using both the
random and proposed observation selection strategies.
Figure 6: A comparison between the repeating and non-repeating versions of the random and proposed navigation strategies based on the
evolution of recognition accuracy over time, using real data. (a) Evolution of the recognition rates over time for the different navigation strategies. (b) Evolution of the pose estimation error over time for the different navigation strategies.
Clearly, the non-repeating
navigation constraint improves the accuracy of the recognition results. The average rate of
correct classification obtained with the non-repeating version of the proposed observation selection strategy (94%) is comparable to that obtained with the non-repeating random
navigation strategy (93%).
The slight degradation in the accuracy of the pose estimates with the proposed observation selection strategy
is largely offset by the lower cost of acquiring measurements. As shown in figure
5, the number of views
required for recognition and pose estimation of the different objects is consistently and significantly lower with the proposed navigation strategy
than with random navigation.
The proposed active viewpoint selection criterion allows
competing hypotheses to be disambiguated
effectively and is an efficient alternative to popular techniques that
maximise mutual information.
The proposed observation selection strategy is
much quicker than a strategy based on mutual information, and requires fewer
measurements than a random navigation strategy.
Conceivably, the new approach could be combined with instance-based learning techniques
to further accelerate the viewpoint selection process [7].
[1] Radio control - computer aided design gallery. http://www.rccad.com/Gallery-Classic8.htm.
[2] T. Arbel and F. P. Ferrie. Entropy-based gaze planning. Image and Vision Computing, 19:779-786, 2001.
[3] H. Borotschnig, L. Paletta, M. Prantl, and A. Pinz. Appearance-based active object recognition. Image and Vision Computing, 18:715-727, 2000.
[4] J. Denzler and C. M. Brown. Information theoretic sensor data selection for active object recognition and state estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2):145-157, 2002.
[5] D. J. C. MacKay. Information-based objective functions for active data selection. Neural Computation, 4(4):589-603, 1992.
[6] H. Murase and S. K. Nayar. Visual learning and recognition of 3-D objects from appearance. International Journal of Computer Vision, 14:5-24, 1995.
[7] L. Paletta, M. Prantl, and A. Pinz. Learning temporal context in active object recognition using Bayesian analysis. In Proceedings of the 15th International Conference on Pattern Recognition, pages 695-699, Barcelona, Spain, 2000.
For more information on this work, see the following publications
Catherine Laporte and Tal Arbel,
"Efficient discriminant viewpoint selection for active Bayesian recognition",
International Journal of Computer Vision, 68(3):267-287, July 2006.
Catherine Laporte, Rupert Brooks and Tal Arbel,
"A fast discriminant approach to active object recognition and pose estimation",
In Proceedings of the 17th International Conference on Pattern Recognition, vol. 3, pages 91-94, Cambridge, U.K., 2004.
Catherine Laporte,
"A fast discriminant approach to active Bayesian visual recognition", M. Eng. thesis, McGill University, Montreal, Canada, 2004.
Catherine Laporte
2006-08-14