As an exercise to practice Ruby, you can try to compete a random text generator using an underlying Markov chain model. The codes in the following github account are incomplete. You are supposed to fill in or create methods that will create randomly generated texts given seed texts.
http://github.com/eandrejko/text-dynamo
Markov chain is like a state machine, but the key is the what causes state transition only depends on the current state. In this case, how do you determine probability of selecting which word next? It’s quite simple. You go through the seed text and count frequency of next words, and that determines the frequency. For example, “am” is likely to folllow “I” most frequently. Next might be “do” or other verbs.
There are multiple ways to do it, and I think more elegant way is to use two or more classes – one class for nodes and one class for edges. Each word is a node, and each node is connected via an edge. Weight is given to an edge, which represents probability. Weight could be just incremented by one, thus larger the weight is more likely that word (node) could be chosen. The edge is also directional.
I did much more simpler way, by creating hashes using a pair of keys. First key is each word, and next key is what the next word would be. The value is the weight. Thus, node[“I”][“am”] will probably have the biggest value, largest weight, and highest probability.
text_generator.rb
require File.dirname(__FILE__) + '/markov_chain'
class TextGenerator
attr_reader :markov_chain
def initialize
@markov_chain = MarkovChain.new
end
def seed(text)
sentences(text).each do |sentence|
words = sentence.split
words.each_with_index do |word, index|
if words[index+1]
@markov_chain.increment_probability(word, words[index+1])
end
end
end
end
def sentences(text)
return text.split(/(?=)\s+(?=[A-Z])/)
end
def generate(start)
@markov_chain.random_walk(start).join(" ")
end
end
markov_chain.rb
require File.dirname(__FILE__) + '/weighted_directed_graph'
class MarkovChain
attr_accessor :graph
def initialize
@graph = WeightedDirectedGraph.new
end
def increment_probability(a,b)
@graph.add_node(a)
@graph.connect(a,b)
end
def next_node(node)
total = @graph.node(node).values.inject(0) {|sum, v| sum + v}
n = rand*total
@graph.node(node).each do |key, weight|
return key if n < weight
n -= weight
end
end
def random_walk(start)
sentence = []
node = start
while @graph.out_degree_of(node) > 0
sentence << node
node = next_node(node)
end
sentence << node
end
end
weighted_directed_graph.rb
class WeightedDirectedGraph
attr_accessor :nodes
def initialize
@nodes = Hash.new
end
def add_node(name)
@nodes[name] ||= Hash.new
end
def connect(a,b,weight=1)
@nodes[a][b] = (@nodes[a][b] ||= 0) + weight
end
def edge_weight(a,b)
@nodes[a][b]
end
def contains?(name)
@nodes[name]
end
def out_degree_of(name)
@nodes[name].size rescue 0
end
def node(name)
@nodes[name]
end
end
When you run the above code, you get some interesting random texts.
Nearly all that science of the least permits me to hate long time that he took refuge was rekindled within me. 'By your departure of several nights." "You have it." "I do upon me. "I have come so strange sight of the judges from behind us along the gentle manners and be the south.
Nearly all associated with the old man, but you will necessarily keep her head to my door, and at no one request, of obtaining the date of madness by quitting his speech, interchanging each of genius and he, "I intended services towards him. "Autumn passed three weeks before every sight was useful to this deposition did hope that countenance.
Nearly all possible consequences of the very remembrance of medicine, and abhorred!) that of the town, when he said she. "How can that morning, before us, indicating that day more ominous and take place since found that he was obliged to intercept him of spring; all their gay apparel she might obtain absolution; but of the world and their cottage. "I have no one request, of this?