Saturday, 29 May 2010

Viterbi Algorithm

 

Viterbi algorithm definition

1. Formal definition of the algorithm

The algorithm may be summarised formally as:

For each i, i = 1, ..., n, let:

\[ \delta(i, 1) = \pi(i)\, b_{i k_1} \]

- this initialises the probability calculations by taking the product of the initial hidden state probabilities with the associated observation probabilities.

 

For t = 2, ..., T, and i = 1, ..., n, let:

\[ \delta(i, t) = \max_{j}\bigl(\delta(j, t-1)\, a_{ji}\bigr)\, b_{i k_t} \]

\[ \phi(i, t) = \operatorname{argmax}_{j}\bigl(\delta(j, t-1)\, a_{ji}\bigr) \]

- thus determining the most probable route to the next state, and remembering how to get there. This is done by considering all products of transition probabilities with the maximal probabilities already derived for the preceding step. The largest such product is remembered, together with the state that provoked it.

 

Let:

\[ i_T = \operatorname{argmax}_{i}\, \delta(i, T) \]

- thus determining which state at system completion (t=T) is the most probable.

 

For t = T - 1, ..., 1, let:

\[ i_t = \phi(i_{t+1},\, t + 1) \]

- thus backtracking through the trellis, following the most probable route. On completion, the sequence i_1, ..., i_T will hold the most probable sequence of hidden states for the observation sequence in hand.
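The four steps above translate directly into code. Below is a minimal sketch in Python (the language, the function name viterbi and the data layout are illustrative choices, not part of the original tutorial): pi[i] holds π(i), a[j][i] the transition probability a_ji, b[i][k] the observation probability b_ik, and observations are passed as integer indices.

    # A minimal Viterbi decoder following the four steps above.
    # pi[i]   : initial probability of hidden state i
    # a[j][i] : probability of a transition from state j to state i
    # b[i][k] : probability of observing k while in hidden state i
    def viterbi(obs, pi, a, b):
        n = len(pi)       # number of hidden states
        T = len(obs)      # length of the observation sequence
        delta = [[0.0] * n for _ in range(T)]   # delta[t][i] ~ delta(i, t+1)
        phi = [[0] * n for _ in range(T)]       # phi[t][i]   ~ phi(i, t+1)

        # Step 1: initialisation, delta(i, 1) = pi(i) * b(i, k1)
        for i in range(n):
            delta[0][i] = pi[i] * b[i][obs[0]]

        # Step 2: recursion, keeping a back pointer to the best predecessor
        for t in range(1, T):
            for i in range(n):
                best_j = max(range(n), key=lambda j: delta[t - 1][j] * a[j][i])
                phi[t][i] = best_j
                delta[t][i] = delta[t - 1][best_j] * a[best_j][i] * b[i][obs[t]]

        # Step 3: termination - pick the most probable final state
        path = [0] * T
        path[-1] = max(range(n), key=lambda i: delta[-1][i])

        # Step 4: backtracking through the phi pointers
        for t in range(T - 2, -1, -1):
            path[t] = phi[t + 1][path[t + 1]]
        return path

For example, with an invented weather HMM (states sunny = 0, cloudy = 1, rainy = 2; observations dry = 0, damp = 1, soggy = 2; all numbers made up for illustration):

    pi = [0.6, 0.2, 0.2]
    a = [[0.500, 0.375, 0.125],
         [0.250, 0.125, 0.625],
         [0.250, 0.375, 0.375]]
    b = [[0.60, 0.30, 0.10],
         [0.25, 0.50, 0.25],
         [0.05, 0.30, 0.65]]
    print(viterbi([0, 1, 2], pi, a, b))   # -> [0, 1, 2]: sunny, cloudy, rainy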

 

2. Calculating individual δ's and φ's

The calculation of δ's is similar to the calculation of the partial probabilities (α's) in the forward algorithm. Compare this diagram, showing δ's and φ's being calculated, with the diagram at the end of section 2 under the forward algorithm.

[Figure: a trellis showing how the δ's and φ's are calculated]

The only difference is that the summation (Σ) in the forward algorithm is replaced with a max to calculate the δ's - this important difference picks out the most likely route to the current position, rather than the total probability. For the Viterbi algorithm we also remember the best route to the current position by maintaining a `back-pointer', via the argmax calculation of the φ's.
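In code terms the difference is a single operator. A sketch, reusing the table layout of the hypothetical viterbi() function above (alpha stands for the forward algorithm's partial-probability table):

    # Forward algorithm: alpha sums over every route into state i at time t
    alpha[t][i] = sum(alpha[t - 1][j] * a[j][i] for j in range(n)) * b[i][obs[t]]
    # Viterbi: delta keeps only the single most probable route into state i
    delta[t][i] = max(delta[t - 1][j] * a[j][i] for j in range(n)) * b[i][obs[t]]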

 

 

Finding most probable sequence of hidden states

We often wish to take a particular HMM, and determine from an observation sequence the most likely sequence of underlying hidden states that might have generated it.

1. Exhaustive search for a solution

We can use a picture of the execution trellis to visualise the relationship between states and observations.

We can find the most probable sequence of hidden states by listing all possible sequences of hidden states and finding the probability of the observed sequence for each of the combinations. The most probable sequence of hidden states is that combination that maximises

 

Pr(observed sequence | hidden state combination).

 

For example, for the observation sequence in the trellis shown, the most probable sequence of hidden states is the sequence that maximises:

Pr(dry, damp, soggy | sunny, sunny, sunny)
Pr(dry, damp, soggy | sunny, sunny, cloudy)
Pr(dry, damp, soggy | sunny, sunny, rainy)
. . .
Pr(dry, damp, soggy | rainy, rainy, rainy)

 

This approach is viable, but to find the most probable sequence by exhaustively calculating each combination is computationally expensive. As with the forward algorithm, we can use the time invariance of the probabilities to reduce the complexity of the calculation.
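To see why: a brute-force decoder must score every one of the n^T candidate state sequences (n hidden states, T observations), where the Viterbi recursion needs only O(n²T) work. A sketch of the exhaustive version, under the same hypothetical data layout as the viterbi() function above:

    from itertools import product

    # Exhaustive decoding: scores all n**T candidate state sequences.
    def exhaustive(obs, pi, a, b):
        n, T = len(pi), len(obs)
        best_p, best_seq = -1.0, None
        for seq in product(range(n), repeat=T):
            p = pi[seq[0]] * b[seq[0]][obs[0]]
            for t in range(1, T):
                p *= a[seq[t - 1]][seq[t]] * b[seq[t]][obs[t]]
            if p > best_p:
                best_p, best_seq = p, seq
        return list(best_seq)

With 3 states and 3 observations this is only 3³ = 27 sequences, but the count grows exponentially with the length of the observation sequence.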

 

2. Reducing complexity using recursion

We will consider recursively finding the most probable sequence of hidden states given an observation sequence and a HMM. We will first define the partial probability δ, which is the probability of reaching a particular intermediate state in the trellis. We then show how these partial probabilities are calculated at t = 1 and at t > 1.

 

These partial probabilities differ from those calculated in the forward algorithm since they represent the probability of the most probable path to a state at time t, and not a total.

2a. Partial probabilities (δ's) and partial best paths

Consider the trellis below showing the states and first order transitions for the observation sequence dry, damp, soggy:

For each intermediate and terminating state in the trellis there is a most probable path to that state. So, for example, each of the three states at t = 3 will have a most probable path to it, perhaps like this:

We will call these paths partial best paths. Each of these partial best paths has an associated probability, the partial probability or δ. Unlike the partial probabilities in the forward algorithm, δ is the probability of the one (most probable) path to the state.

 

Thus δ(i, t) is the maximum probability of all sequences ending at state i at time t, and the partial best path is the sequence which achieves this maximal probability. Such a probability (and partial path) exists for each possible value of i and t.
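Written out formally (my notation; q_1, ..., q_t ranges over candidate hidden state sequences and k_1, ..., k_t are the observations seen so far):

\[ \delta(i, t) = \max_{q_1, \ldots, q_{t-1}} \Pr\bigl(q_1, \ldots, q_{t-1},\; q_t = i,\; k_1, \ldots, k_t\bigr) \]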

In particular, each state at time t = T will have a partial probability and a partial best path. We find the overall best path by choosing the state with the maximum partial probability and choosing its partial best path.
2b. Calculating δ's at time t = 1
We calculate the δ partial probabilities as the most probable route to our current position (given particular knowledge such as the observation and the probabilities of the previous state). When t = 1 the most probable path to a state does not sensibly exist; however we use the probability of being in that state given t = 1 and the observable state k_1, i.e.

\[ \delta(i, 1) = \pi(i)\, b_{i k_1} \]

- as in the forward algorithm, this quantity is compounded by the appropriate observation probability.
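As a worked example with the invented weather numbers from the sketch above (π = (0.6, 0.2, 0.2) for sunny, cloudy, rainy, and dry-observation probabilities 0.6, 0.25 and 0.05): seeing dry at t = 1 gives δ(sunny, 1) = 0.6 × 0.6 = 0.36, δ(cloudy, 1) = 0.2 × 0.25 = 0.05 and δ(rainy, 1) = 0.2 × 0.05 = 0.01.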
2c. Calculating δ's at time t (> 1)
We now show that the partial probabilities δ at time t can be calculated in terms of the δ's at time t - 1.

Consider the trellis below :

We consider calculating the most probable path to the state X at time t; this path to X will have to pass through one of the states A, B or C at time (t-1).

Therefore the most probable path to X will be one of

 

 

  (sequence of states), . . ., A, X
  (sequence of states), . . ., B, X
or (sequence of states), . . ., C, X

 

 

We want to find the path ending AX, BX or CX which has the maximum probability.

 

Recall that the Markov assumption says that the probability of a state occurring given a previous state sequence depends only on the previous n states. In particular, with a first order Markov assumption, the probability of X occurring after a sequence depends only on the previous state, i.e.

Pr (X | . . ., A) = Pr (X | A)

 

Following this, the most probable path ending AX will be the most probable path to A followed by X. Similarly, the probability of this path will be

Pr (most probable path to A) . Pr (X | A) . Pr (observation | X)

So, the probability of the most probable path to X is:

\[ \Pr(\text{most probable path to } X) = \max_{i \in \{A, B, C\}} \bigl( \Pr(\text{most probable path to } i) \cdot \Pr(X \mid i) \cdot \Pr(\text{observation} \mid X) \bigr) \]

where the first term is given by δ at t - 1, the second by the transition probabilities and the third by the observation probabilities.
Generalising the above expression, the probability of the partial best path to a state i at time t, when the observation k_t is seen, is:

\[ \delta(i, t) = \max_{j}\bigl(\delta(j, t-1)\, a_{ji}\bigr)\, b_{i k_t} \]
Here, we are assuming knowledge of the previous state, using the transition probabilities and multiplying by the appropriate observation probability. We then select the maximum such product.
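Continuing the worked example with the invented numbers: at t = 3, with soggy observed, the recursion for the rainy state compares δ(sunny, 2)·a_sunny,rainy = 0.054 × 0.125, δ(cloudy, 2)·a_cloudy,rainy = 0.0675 × 0.625 and δ(rainy, 2)·a_rainy,rainy = 0.0135 × 0.375 (the t = 2 values come from the same recursion); the cloudy term wins, so δ(rainy, 3) = 0.0675 × 0.625 × 0.65 ≈ 0.0274.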
2d. Back pointers, φ's
Consider the trellis

At each intermediate and end state we know the partial probability, δ(i, t). However the aim is to find the most probable sequence of states through the trellis given an observation sequence - therefore we need some way of remembering the partial best paths through the trellis.

Recall that to calculate the partial probability, δ at time t, we only need the δ's for time t - 1. Having calculated this partial probability, it is thus possible to record which preceding state was the one to generate δ(i, t) - that is, in what state the system must have been at time t - 1 if it is to arrive optimally at state i at time t. This recording (remembering) is done by holding for each state a back pointer φ which points to the predecessor that optimally provokes the current state.

 

Formally, we can write

\[ \phi(i, t) = \operatorname{argmax}_{j}\bigl(\delta(j, t-1)\, a_{ji}\bigr) \]

Here, the argmax operator selects the index j which maximises the bracketed expression.

Notice that this expression is calculated from the δ's of the preceding time step and the transition probabilities, and does not include the observation probability (unlike the calculation of the δ's themselves). This is because we want these φ's to answer the question `If I am here, by what route is it most likely I arrived?' - this question relates to the hidden states, and therefore confusing factors due to the observations can be overlooked.
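In the worked example above, the back pointer for the rainy state at t = 3 is φ(rainy, 3) = cloudy, because δ(cloudy, 2)·a_cloudy,rainy ≈ 0.0422 beats the sunny (0.00675) and rainy (≈ 0.00506) alternatives - and the soggy observation probability plays no part in that comparison.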
2e. Advantages of the approach
Using the Viterbi algorithm to decode an observation sequence carries two important advantages:

 

  1. There is a reduction in computational complexity by using the recursion - this argument is exactly analogous to that used in justifying the forward algorithm.
  2. The Viterbi algorithm has the very useful property of providing the best interpretation given the entire context of the observations. An alternative to it might be, for example, to decide on the execution sequence i_1, i_2, ..., i_T where

     \[ i_1 = \operatorname{argmax}_{j}\bigl(\pi(j)\, b_{j k_1}\bigr) \quad \text{and} \quad i_t = \operatorname{argmax}_{j}\bigl(a_{i_{t-1}\, j}\, b_{j k_t}\bigr), \; t = 2, \ldots, T \]

    Here, decisions are taken about a likely interpretation in a `left-to-right' manner, with an interpretation being guessed given an interpretation of the preceding stage (with initialisation from the π vector).


    This approach, in the event of a noise garble half way through the sequence, will wander away from the correct answer.

     

    Conversely, the Viterbi algorithm will look at the whole sequence before deciding on the most likely final state, and then `backtracking' through the φ pointers to indicate how it might have arisen. This is very useful in `reading through' isolated noise garbles, which are very common in live data.
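    For contrast, here is a sketch of that left-to-right scheme in the same hypothetical Python setting as viterbi() above; it commits to one state at each step and never revisits a decision:

        # Greedy left-to-right decoding: pick the locally best state each step.
        def greedy(obs, pi, a, b):
            n = len(pi)
            path = [max(range(n), key=lambda i: pi[i] * b[i][obs[0]])]
            for t in range(1, len(obs)):
                prev = path[-1]
                path.append(max(range(n), key=lambda i: a[prev][i] * b[i][obs[t]]))
            return path

    A single garbled observation can send greedy() down the wrong branch for the remainder of the sequence, whereas viterbi() can recover because it defers its decision until the whole sequence has been seen.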

3. Section Summary

The Viterbi algorithm provides a computationally efficient way of analysing observations of HMMs to recapture the most likely underlying state sequence. It exploits recursion to reduce computational load, and uses the context of the entire sequence to make judgements, thereby allowing good analysis of noise.

 

In use, the algorithm proceeds through an execution trellis calculating a partial probability for each cell, together with a back-pointer indicating how that cell could most probably be reached. On completion, the most likely final state is taken as correct, and the path to it traced back to t=1 via the back pointers.

 

 

Source: http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/viterbi_algorithm/s1_pg1.html
