Thursday, 24 April 2014

Markov chain implementation in C++ using Eigen


Markov Chains

Introduction

In this article we will look at markov models and its application in classification of discrete sequential data.
  • Markov processes are examples of stochastic processes that generate random sequences of outcomes or states according to certain probabilities.
  • Markov chains can be considered mathematical descriptions of Markov models with a discrete set of states.
  • Markov chains are integer time process ${X_n,n\ge 0}$ for which each random variable $X_n$ is integer valued and depends on the past only through most recent random variable $X_{n-1}$ for all integer $n\geq 1$.
  • ${X_n,n\in N}$ is a discrete Markov chain on state space $S={1,\ldots,M}$
  • At each time instant t,The system changes state ,and makes a transition.
  • The markov chains follow the markovian and stationarity property.
  • For a first order markov chain,the markov property states that the state of the system at time $t+1$ depends only on the state of the system at time $t$.The markov chain is also said to be memoryless due to this property. \begin{eqnarray*} & Pr(X_{t+1} = x_{t+1} |X_{t} = {x_1 \ldots x_t} = Pr(X_{t+1} = x_{t+1} |X_{t} = x_t) \end{eqnarray*}
  • A stationarity assumption is also made which implies that markov property is independent of time. \begin{eqnarray*} & Pr(X_{t+1} = x_i | X_{t} = x_j) = P_{i,j} & \text{for $\forall$ t and $\forall i,j \in {0 \ldots M}$} \end{eqnarray*}
  • Thus we are looking at processes whose sample functions are sequence of integers between ${1 \ldots M}$.
  • Thus markov process is parameterized by transition probability $P_{ij}$ and intital probability $P_{i0}$
  • Markov chains can be represented by directed graphs,where each state is represented by a node and directed arc represents a non zero transition probability.
  • If a markov chain has M states then transition probability can be represented by a $MxM$ matrix. \begin{eqnarray*} &T =\begin{bmatrix} P_{11} & P_{22} & \ldots &P_{1M} \\ P_{21} & P_{22} & \ldots & P_{2M} \\ \ldots \\ P_{M1} & P_{M2} & \ldots & P_{MM} \\ \end{bmatrix} \\ &\sum_{j} P_{ij} = 1 \end{eqnarray*}
  • The matrix T is stochastic matrix where elements in each row sum to 1
  • This implies that it is necessary for transition to occur from present state to one of the M states.
  • The probability of sequence being generated by markov chain is given by \begin{eqnarray*} &P({X}) = \pi(x_0)*\prod_{t=1}^T p(x_t | x_{t-1}) \\ & \text{$p(x_t | x_{t-1})$ is the probability of observing the sequence $x_t$ at} \\ \end{eqnarray*} time instant t given the present state is $t-1$
  • Let us consider a 2 models with following initial transition and probability matrix. \begin{eqnarray*} & \pi_1 = \begin{bmatrix} 1 & 0 & 0 \\ \end{bmatrix} \\ & T_1 = \begin{bmatrix} 0.6 & 0.4 & 0 \\ 0.3 & 0.3 & 0.4 \\ 0.4 & 0.1 & 0.5 \\ \end{bmatrix}
    & \pi_2 = \begin{bmatrix} 0.1 & .5 & 0.4 \end{bmatrix} \\ & T_2 = \begin{bmatrix} 0.9 & 0.05 & 0.05 \\ 0.3 & 0.1 & 0.6 \\ 0.3 & 0.5 & 0.2 \\ \end{bmatrix} \end{eqnarray*}

    Sequence Classification

  • The sequence generate from these two markov chains \begin{eqnarray*} & S_1= \begin{bmatrix} 1 & 2 & 1 & 2 & 3 \\ 3 & 3 & 2 & 1 & 1 \\ \end{bmatrix} \\ & S_2= \begin{bmatrix} 3 & 2 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ \end{bmatrix} \end{eqnarray*}
  • In sequence 1 since $P_{13}=0$,we can observe there is no transition from 1 to 3 and dominant transition is expected to be from $1->1$.
  • In sequence 2 since dominant transition is from $1->1$ we can observe a long sequence of 1's.
  • We can also compute the probability that the sequence has been generate from a given markov process.
  • The sequence 1 has probability of $8.6400e^{-05}$ from the 1st model and $ 2.4300e^{-07}$ from second model.
  • The sequence 2 has probability of 0 being generate from 1st model and 0.00287 from 2nd model.
  • Thus if we have sequence and know it is being generate from 1 of 2 models we can always predict the model the sequence has been generated from by choosing the model which generates the maximum probability.
  • Thus we can use markov chain for sequence modelling and classification.

    Generating a Sequence

  • The idea behind generating a sequence from a markov process is to use a uniform random number generator.
  • For each row of initial probability or transition matrix select state which is most likely.
  • For example if the row contains values $[0.6,0.4,0]$
  • If a uniform random value generates a value between 0 and 0.6 then state 0 is returned
  • If a random value between 0.6 and 1 is generated then state 1 is returned.
  • First step is to use the above method to select a inital state of matrix by passing the initial probability matrix as input.
  • Next random state will be selected from the transition probability by passing the transition probability matrix as input.

    Code

    The code can be found in https://github.com/pi19404/OpenVision in files {ImgML/markovchain.cpp} and {ImgML/markovchain.hpp} .
The PDF Version of the document can be found below