首先看一下来自Wolfram的定义
马尔可夫链是随机变量{X_t}的集合(t贯穿0,1,…),给定当前的状态,未来与过去条件独立。
Wikipedia的定义更清楚一点儿
…马尔可夫链是具有马尔可夫性质的随机过程…[这意味着]状态改变是概率性的,未来的状态仅仅依赖当前的状态。
马尔可夫链具有多种用途,现在让我看一下如何用它生产看起来像模像样的胡言乱语。
算法如下,
代码如下
import random
class Markov(object):
def __init__(self, open_file):
self.cache = {}
self.open_file = open_file
self.words = self.file_to_words()
self.word_size = len(self.words)
self.database()
def file_to_words(self):
self.open_file.seek(0)
data = self.open_file.read()
words = data.split()
return words
def triples(self):
\"\"\" Generates triples from the given data string. So if our string were
\"What a lovely day\", we\'d generate (What, a, lovely) and then
(a, lovely, day).
\"\"\"
if len(self.words) < 3:
return
for i in range(len(self.words) - 2):
yield (self.words[i], self.words[i+1], self.words[i+2])
def database(self):
for w1, w2, w3 in self.triples():
key = (w1, w2)
if key in self.cache:
self.cache[key].append(w3)
else:
self.cache[key] = [w3]
def generate_markov_text(self, size=25):
seed = random.randint(0, self.word_size-3)
seed_word, next_word = self.words[seed], self.words[seed+1]
w1, w2 = seed_word, next_word
gen_words = []
for i in xrange(size):
gen_words.append(w1)
w1, w2 = w2, random.choice(self.cache[(w1, w2)])
gen_words.append(w2)
return \' \'.join(gen_words)
为了看到一个示例结果,我们从古腾堡计划中拿了沃德豪斯的《My man jeeves》作为文本,示例结果如下。
In [1]: file_ = open(\'/home/shabda/jeeves.txt\') In [2]: import markovgen In [3]: markov = markovgen.Markov(file_) In [4]: markov.generate_markov_text() Out[4]: \'Can you put a few years of your twin-brother Alfred, who was apt to rally round a bit. I should strongly advocate the blue with milk\'
[如果想执行这个例子,请下载jeeves.txt和markovgen.py
马尔可夫算法怎样呢?
这是一个示例文本。
复制代码 代码如下:\”The quick brown fox jumps over the brown fox who is slow jumps over the brown fox who is dead.\”
这个文本对应的语料库像这样,
{(\'The\', \'quick\'): [\'brown\'],
(\'brown\', \'fox\'): [\'jumps\', \'who\', \'who\'],
(\'fox\', \'jumps\'): [\'over\'],
(\'fox\', \'who\'): [\'is\', \'is\'],
(\'is\', \'slow\'): [\'jumps\'],
(\'jumps\', \'over\'): [\'the\', \'the\'],
(\'over\', \'the\'): [\'brown\', \'brown\'],
(\'quick\', \'brown\'): [\'fox\'],
(\'slow\', \'jumps\'): [\'over\'],
(\'the\', \'brown\'): [\'fox\', \'fox\'],
(\'who\', \'is\'): [\'slow\', \'dead.\']}
现在如果我们从\”brown fox\”开始,接下来的单词可以是\”jumps\”或者\”who\”。如果我们选择\”jumps\”,然后当前的状态就变成了\”fox jumps\”,再接下的单词就是\”over\”,之后依此类推。
提示