Monday, 10 March 2014

Random Ferns for patch description

Random Ferns for patch description


  • Let us consider the problem of object recognition in which we have to decide if a given patch of image contains the desired object or not.
  • One way to achieve this to characterize the texture about a key-points across images acquired under widely varying poses and lightning conditions.

    Due to its robustness to partial occlusions and computational ef´Čüciency, recognition of image patches extracted around detected key-points is crucial for many vision problems.
  • One way to achieve this is to compute a statistical model for the object/patch and then estimate if a given patch has a good probability of being sampled from the object model.
  • Let us consider a mathematical representation of the patch to be classified by representing it in terms of a feature vector.
  • Let $f$ be the set of features computed over the patch. \begin{eqnarray} \bar{c_i}=argmax P(C = c_i | f)\\ P(C = c_i | f) = \frac{P(f | C=c_i) P(C=c_i)}{P(f)} \\ \bar{c_i} = argmax P(f | C=c_i) \\ \end{eqnarray} For real time application we require features that can be easily computed and encoded.The concept of Local Binary patterns has become popular for texture analysis and patch description due to it small memory and computational requirement.
  • In case of Local binary features each patch is described by a bit string. The value of each bit is derived by performing some binary test on the image patch
  • The test's can in principle be in any binary test that can effectively encode the information present in the patch as 0 or 1.
  • Typically each binary feature is determined by performing intensity comparisons . The intensity comparisons can be between two different pixel locations or between image and transformed image at specified pixel location.
  • A set of binary test constitutes the feature vector.
  • However these features are very simple and hence a large number of such features may be required to accurately describe a patch.
  • If we consider a joint probability distribution we would require to store $2^N$ entries for each class.However if we assume independence between features using Naive Bayes assumption we are required to store N feature ,however this completely ignores any correlation between features.
  • We consider a randomized fern algorithm,which consists of randomly selecting S features out of total pool of N features into single group.
  • All the features with a group are considered to be dependent and a joint distribution is constructed over these features.
  • Each group is called a fern and for each fern $f_k$ we can compute $P(f_k | C = c_i)$, Thus we have K random classifier that provide us the probability that feature vector $f_k$ belong to class $P(f_k | C = c_i)$.
  • Here each fern is constructed from a random subset of S features.
  • $F_k$ corresponds binary feature vector for each fern k. Since each group contains S features and we are required to maintain a joint PDF we require $2^S$ entries.And since there are M groups total entires required are $M*2^S$.
  • The feature points used in binary tests are selected randomly from a normal distribution. The grouping of features points is also performed randomly.
  • We will approximate the joint PDF using histogram.Each fern consists of S binary features
  • Since features are binary ,total number of bins the joint histogram are $2^S$
  • For example for a fern with 2 features the histogram bins are indexed as 00,01,10,11.Each bin can be indexed by a integral value.
  • During the training phase we need estimate the class conditional probabilities that require the estimation of $2^S*M$ parameters.

    An ensemble techniques is used to combine the result of K classifiers \begin{eqnarray} \bar{c_i} = argmax \frac{1}{K} \sum{k=1}^{K} P(f_k | C=c_i) \\ \end{eqnarray}
  • A important feature of this scheme of modelling does not require us to store the training data.
  • Also we can perform incremental learning since we have to update only the counts in the statistical model.

    Implementation Details

  • The points where the features are computed are stored in 2D vector of length $numFerns*2*numFeatures$, this is stored in a vector where each element is of type $Point2f$.
  • As mentioned above to maintain a Joint PDF we have to maintain a joint histogram of $2^{numFeatures}$ and there are $numFerns$ groups.The data is stored in a single data structure of type std::vector.
  • 3 such vectors are maintained positive,negative and posterior probability distribution.
  • Each of the locations in the joint PDF can be addressed using a integer data type ie The feature vector extracted for each Fern/group is a integer data type.
  • Whenever we learn a positive/negative sample,we increment the count corresponding to the bins indexed by the integer valued feature.Since feature vector is of length $_numFerns$ the vectors are indexed as $i*numIndices+features[i]$ where $_numIndices=2^{numFeatures}$ $i$ represents the fern and $features[i]$ represents integer feature value.
  • Thus to maintain histogram ,the bin indices corresponding to decimal equivalent to binary features are incremented.
  • Positive vector is updated upon learning positive sample ,Negative vector is updated upon learning negative sample and Posteriors probability vector is updated after updating positive or negative samples and contains the confidence/probability that binary feature belongs to positive patch.
  • Give a binary vector to compute probability that it represents a positive sample we just look at the posterior probabilities of each group $P(F_k | C=c_i)$ and compute the average over all groups $k=1\ldots numFerns$.


  • The code for the same can be found at git repo in $ImgFeatures/randomferns.cpp$ and $ImgFeatures/randomferns.hpp$ files
  • Some important functions are also provided below
  • class $RandomFerns$ is the main class for the file