Free Outlook Express Spam Filter | Anti-Spam Blocker Software For Microsoft

Markovian Discrimination

A hidden Markov model (HMM) or Markovian Discrimination is a statistical model in which the system being modelled is assumed to be a Markov process with unknown parameters. The challenge is to determine the hidden parameters from the observable data, based on this assumption. The extracted model parameters can then be used for further analysis, for example in pattern recognition applications.

In a regular Markov model, the state is directly visible to the observer, so the state transition probabilities are the only parameters. A hidden Markov model adds outputs: each state has a probability distribution over the possible output tokens. As a result, a sequence of tokens generated by an HMM does not directly reveal the sequence of states that produced it.

Using Markov models

There are three canonical problems to solve with HMMs and Markovian Discrimination:

  • Given the model parameters, compute the probability of a particular output sequence. Solved by the forward algorithm.
  • Given the model parameters, find the most likely sequence of (hidden) states which could have generated a given output sequence. Solved by the Viterbi algorithm.
  • Given an output sequence, find the most likely set of state transition and output probabilities. Solved by the Baum-Welch algorithm, which converges to a locally optimal set of parameters.
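
The first of these problems can be sketched in a few lines of Python. This is a minimal illustration of the forward algorithm, not a reference implementation. The two-state model, its state names "A" and "B", its output tokens "x" and "y", and all of its probabilities are hypothetical values chosen only for the example.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Return the probability of the observation sequence under the model."""
    # alpha[s] = probability of the observations seen so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    # Sum over every state the sequence could have ended in
    return sum(alpha.values())


# Hypothetical two-state model (all numbers are assumptions for illustration)
states = ("A", "B")
start_p = {"A": 0.5, "B": 0.5}
trans_p = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
emit_p = {"A": {"x": 0.7, "y": 0.3}, "B": {"x": 0.1, "y": 0.9}}

likelihood = forward(("x", "y"), states, start_p, trans_p, emit_p)
print(likelihood)
```

Because the running table is indexed only by the current state, the cost grows linearly with the length of the sequence rather than exponentially with the number of possible state paths.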

A concrete example of Markovian Discrimination

Assume you have a friend who lives far away and whom you call daily to talk about what each of you did that day. Your friend has only three things he's interested in: walking in the park, shopping, and cleaning his apartment. The choice of what to do is determined only by the weather on a given day. You have no definite information about the weather where your friend lives, but you know general trends. Based on what he tells you he did each day, you try to guess what the weather must have been like.

You believe that the weather operates as a discrete Markov chain. There are two states, "Rainy" and "Sunny", but you cannot observe them directly; that is, they are hidden from you. On each day, there is a certain chance that your friend will do one of the following activities, depending on the weather: "walk", "shop", or "clean". Since your friend tells you about his activities, those are the observations. The entire system is a hidden Markov model (HMM).

You know the general weather trends in the area and you know what your friend likes to do on average. In other words, the parameters of the HMM are known, and they can be written down in the Python programming language.
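
The parameter listing itself is not present in this copy of the text. The sketch below shows what such a listing could look like and how the Viterbi algorithm would use it to guess the weather. Every probability here is an illustrative assumption rather than a figure from the source, and the function name viterbi is likewise chosen for the example.

```python
# Illustrative parameters (assumed values, not taken from the source text)
states = ("Rainy", "Sunny")
start_probability = {"Rainy": 0.6, "Sunny": 0.4}
transition_probability = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emission_probability = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[s] = (probability, path) of the most likely path ending in state s
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Pick the predecessor state that maximises the path probability
            prob, path = max(
                ((best[r][0] * trans_p[r][s], best[r][1]) for r in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit_p[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


probability, path = viterbi(
    ("walk", "shop", "clean"),
    states,
    start_probability,
    transition_probability,
    emission_probability,
)
print(path)  # ['Sunny', 'Rainy', 'Rainy'] for the assumed values above
```

With these assumed numbers, three days of "walk", "shop", "clean" are best explained by one sunny day followed by two rainy ones.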

Applications of hidden Markov models and Markovian Discrimination

  • speech recognition or optical character recognition
  • natural language processing
  • bioinformatics and genomics
    o prediction of protein-coding regions in genome sequences
    o modelling families of related DNA or protein sequences
    o prediction of secondary structure elements from protein primary sequences
  • and many more...

In mathematics, a (discrete-time) Markov chain, named after Andrei Markov, is a discrete-time stochastic process with the Markov property. In such a process, the past is irrelevant for predicting the future given knowledge of the present.

Scientific applications

Markovian systems appear extensively in physics, particularly statistical mechanics, whenever probabilities are used to represent unknown or unmodelled details of the system. This applies if the dynamics can be assumed to be time-invariant and no relevant history needs to be considered beyond what is already included in the state description.

Markov chains can also be used to model various processes in queueing theory and statistics. Claude Shannon's famous 1948 paper A Mathematical Theory of Communication, which in a single step created the field of information theory, opens by introducing the idea of entropy through Markov modelling of the English language. Such idealised models can capture many of the statistical regularities of systems. Even without describing the full structure of the system perfectly, such signal models make effective data compression possible through entropy coding techniques such as arithmetic coding. They also allow effective state estimation and pattern recognition. The world's mobile telephone systems depend on the Viterbi algorithm for error correction, while hidden Markov models (where the Markov transition probabilities are initially unknown and must also be estimated from the data) are extensively used in speech recognition and in bioinformatics, for instance for coding region and gene prediction.

The PageRank of a webpage as used by Google is defined by a Markov chain. It is the probability of being at page i in the stationary distribution of the following Markov chain on all (known) webpages. If N is the number of known webpages and page i has k_i links, then its transition probability is (1-q)/k_i + q/N to each page it links to and q/N to each page it does not link to. The parameter q is taken to be about 0.15.
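
The stationary distribution described above can be approximated by repeatedly applying the transition probabilities to a starting distribution. The sketch below is a simplified illustration, not Google's implementation. The function name pagerank, the three-page link graph, and the uniform treatment of pages with no outgoing links (a case the formula above does not cover) are assumptions made for the example.

```python
def pagerank(links, q=0.15, iterations=100):
    """Approximate the stationary distribution by power iteration.

    links maps each page to the set of pages it links to.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for i in pages:
            out = links[i]
            for j in pages:
                if not out:
                    p_ij = 1.0 / n  # assumption: a page with no links jumps anywhere
                elif j in out:
                    p_ij = (1 - q) / len(out) + q / n
                else:
                    p_ij = q / n
                new_rank[j] += rank[i] * p_ij
        rank = new_rank
    return rank


# Hypothetical three-page web: a links to b and c, b links to c, c links to a
links = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}
ranks = pagerank(links)
print(ranks)
```

In this toy graph, page c is linked from both other pages and ends up with the highest rank, while page b, linked only from a, ends up with the lowest.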

CRM114 is a system to examine incoming e-mail (including spam, phishing, and email fraud), system log streams, data files or other data streams, and to sort, filter, or alter the incoming files or data streams according to the user's wildest desires. This includes spammy direct marketing from Internet service providers and domains. Criteria for categorization of data can be by satisfaction of regexes, by sparse binary polynomial matching with a Bayesian Chain Rule evaluator, by a hidden Markov model, or by other means. Accuracy of the SBPH/BCR classifier has been seen in excess of 99 per cent for 1/4 megabyte of learning text. In other words, CRM114 learns, and it learns fast; it is said to learn faster than naive Bayesian filtering techniques. Programs are being developed using CRM114 to reduce spam in Outlook, including Outlook for Macintosh.

Markov chain methods have also become important for generating sequences of random numbers that accurately reflect complicated desired probability distributions, a process called Markov Chain Monte Carlo, or MCMC for short. In recent years this has revolutionised the practicability of Bayesian inference methods.

Markov chains also have many applications in biological modelling, particularly for population processes, which are useful in modelling processes that are (at least) analogous to biological populations.
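
The Markov Chain Monte Carlo idea mentioned above can be illustrated with a small Metropolis sampler. This is a minimal sketch under stated assumptions: the five-state target distribution, its unnormalised weights, and the function name metropolis are all hypothetical, and the proposal simply steps to a neighbouring state on a ring.

```python
import random
from collections import Counter


def metropolis(weights, steps, seed=0):
    """Sample state indices in proportion to positive, unnormalised weights."""
    rng = random.Random(seed)
    n = len(weights)
    state = 0
    samples = []
    for _ in range(steps):
        # Symmetric proposal: move one step left or right, wrapping around
        proposal = (state + rng.choice((-1, 1))) % n
        # Accept with probability min(1, weight ratio); otherwise stay put
        if rng.random() < weights[proposal] / weights[state]:
            state = proposal
        samples.append(state)
    return samples


# Hypothetical target: weights only need to be known up to a constant factor
weights = [1, 2, 3, 2, 1]
samples = metropolis(weights, steps=100_000)
counts = Counter(samples)
for s, w in enumerate(weights):
    print(s, counts[s] / len(samples), w / sum(weights))
```

The sampler never needs the normalising constant of the target, only ratios of weights, which is a large part of why such methods are practical for Bayesian inference.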

A recent use of Markov chains is in geostatistics, where they are used in two- and three-dimensional stochastic simulations of discrete variables conditional on observed data. Such an application is called "Markov chain geostatistics", by analogy with kriging geostatistics. The Markov chain geostatistics method is still in development.

Markov parody generators

Markov processes and Markovian Discrimination can also be used to produce superficially "real-looking" text given a sample document: they are used in various pieces of recreational "parody generator" software (Jeff Harrison, Mark V Shaney). Markov chains have also been used in music composition.
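
A parody generator of this kind can be sketched as a first-order Markov chain over words. The code below is an illustration only and is not how any of the named programs are implemented. The function names build_chain and generate and the sample sentence are hypothetical.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the sample."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, seed=0):
    """Walk the chain from a start word, choosing each next word at random."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word was never followed by anything
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Hypothetical sample document
sample = "the cat sat on the mat and the cat ate the fish"
text = generate(build_chain(sample), "the", 8)
print(text)
```

Every pair of adjacent words in the output also occurs somewhere in the sample, which is what makes the result look locally plausible even when the whole sentence is nonsense.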


 

© Copyright 2005, SPAMarkov.com.