#
Markov Chains

#### Introduction

In this article we will look at markov models and its application in classification of discrete sequential data.- Markov processes are examples of stochastic processes that generate random sequences of outcomes or states according to certain probabilities.
- Markov chains can be considered mathematical descriptions of Markov models with a discrete set of states.
- Markov chains are integer time process ${X_n,n\ge 0}$ for which each random variable $X_n$ is integer valued and depends on the past only through most recent random variable $X_{n-1}$ for all integer $n\geq 1$.
- ${X_n,n\in N}$ is a discrete Markov chain on state space $S={1,\ldots,M}$
- At each time instant t,The system changes state ,and makes a transition.
- The markov chains follow the markovian and stationarity property.
- For a first order markov chain,the markov property states that the state of the system at time $t+1$ depends only on the state of the system at time $t$.The markov chain is also said to be memoryless due to this property. \begin{eqnarray*} & Pr(X_{t+1} = x_{t+1} |X_{t} = {x_1 \ldots x_t} = Pr(X_{t+1} = x_{t+1} |X_{t} = x_t) \end{eqnarray*}
- A stationarity assumption is also made which implies that markov property is independent of time. \begin{eqnarray*} & Pr(X_{t+1} = x_i | X_{t} = x_j) = P_{i,j} & \text{for $\forall$ t and $\forall i,j \in {0 \ldots M}$} \end{eqnarray*}
- Thus we are looking at processes whose sample functions are sequence of integers between ${1 \ldots M}$.
- Thus markov process is parameterized by transition probability $P_{ij}$ and intital probability $P_{i0}$
- Markov chains can be represented by directed graphs,where each state is represented by a node and directed arc represents a non zero transition probability.
- If a markov chain has M states then transition probability can be represented by a $MxM$ matrix. \begin{eqnarray*} &T =\begin{bmatrix} P_{11} & P_{22} & \ldots &P_{1M} \\ P_{21} & P_{22} & \ldots & P_{2M} \\ \ldots \\ P_{M1} & P_{M2} & \ldots & P_{MM} \\ \end{bmatrix} \\ &\sum_{j} P_{ij} = 1 \end{eqnarray*}
- The matrix T is stochastic matrix where elements in each row sum to 1
- This implies that it is necessary for transition to occur from present state to one of the M states.
- The probability of sequence being generated by markov chain is given by \begin{eqnarray*} &P({X}) = \pi(x_0)*\prod_{t=1}^T p(x_t | x_{t-1}) \\ & \text{$p(x_t | x_{t-1})$ is the probability of observing the sequence $x_t$ at} \\ \end{eqnarray*} time instant t given the present state is $t-1$
- Let us consider a 2 models with following initial transition and probability
matrix.
\begin{eqnarray*}
& \pi_1 =
\begin{bmatrix}
1 & 0 & 0 \\
\end{bmatrix}
\\
& T_1 =
\begin{bmatrix}
0.6 & 0.4 & 0 \\
0.3 & 0.3 & 0.4 \\
0.4 & 0.1 & 0.5 \\
\end{bmatrix}

& \pi_2 = \begin{bmatrix} 0.1 & .5 & 0.4 \end{bmatrix} \\ & T_2 = \begin{bmatrix} 0.9 & 0.05 & 0.05 \\ 0.3 & 0.1 & 0.6 \\ 0.3 & 0.5 & 0.2 \\ \end{bmatrix} \end{eqnarray*}#### Sequence Classification

- The sequence generate from these two markov chains \begin{eqnarray*} & S_1= \begin{bmatrix} 1 & 2 & 1 & 2 & 3 \\ 3 & 3 & 2 & 1 & 1 \\ \end{bmatrix} \\ & S_2= \begin{bmatrix} 3 & 2 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ \end{bmatrix} \end{eqnarray*}
- In sequence 1 since $P_{13}=0$,we can observe there is no transition from 1 to 3 and dominant transition is expected to be from $1->1$.
- In sequence 2 since dominant transition is from $1->1$ we can observe a long sequence of 1's.
- We can also compute the probability that the sequence has been generate from a given markov process.
- The sequence 1 has probability of $8.6400e^{-05}$ from the 1st model and $ 2.4300e^{-07}$ from second model.
- The sequence 2 has probability of 0 being generate from 1st model and 0.00287 from 2nd model.
- Thus if we have sequence and know it is being generate from 1 of 2 models we can always predict the model the sequence has been generated from by choosing the model which generates the maximum probability.
- Thus we can use markov chain for sequence modelling and classification.
#### Generating a Sequence

- The idea behind generating a sequence from a markov process is to use a uniform random number generator.
- For each row of initial probability or transition matrix select state which is most likely.
- For example if the row contains values $[0.6,0.4,0]$
- If a uniform random value generates a value between 0 and 0.6 then state 0 is returned
- If a random value between 0.6 and 1 is generated then state 1 is returned.
- First step is to use the above method to select a inital state of matrix by passing the initial probability matrix as input.
- Next random state will be selected from the transition probability
by passing the transition probability matrix as input.
#### Code

The code can be found in https://github.com/pi19404/OpenVision in files {ImgML/markovchain.cpp} and {ImgML/markovchain.hpp} .

**The PDF Version of the document can be found below**