My answer to text-dynamo

As an exercise to practice Ruby, you can try to compete a random text generator using an underlying Markov chain model. The codes in the following github account are incomplete. You are supposed to fill in or create methods that will create randomly generated texts given seed texts.

Markov chain is like a state machine, but the key is the what causes state transition only depends on the current state. In this case, how do you determine probability of selecting which word next? It’s quite simple. You go through the seed text and count frequency of next words, and that determines the frequency. For example, “am” is likely to folllow “I” most frequently. Next might be “do” or other verbs.

There are multiple ways to do it, and I think more elegant way is to use two or more classes – one class for nodes and one class for edges. Each word is a node, and each node is connected via an edge. Weight is given to an edge, which represents probability. Weight could be just incremented by one, thus larger the weight is more likely that word (node) could be chosen. The edge is also directional.

I did much more simpler way, by creating hashes using a pair of keys. First key is each word, and next key is what the next word would be. The value is the weight. Thus, node[“I”][“am”] will probably have the biggest value, largest weight, and highest probability.


require File.dirname(__FILE__) + '/markov_chain'

class TextGenerator
  attr_reader :markov_chain
  def initialize
    @markov_chain =
  def seed(text)
    sentences(text).each do |sentence|
      words = sentence.split
      words.each_with_index do |word, index|
	if words[index+1]
	  @markov_chain.increment_probability(word, words[index+1])
  def sentences(text)
    return text.split(/(?=)\s+(?=[A-Z])/)
  def generate(start)
    @markov_chain.random_walk(start).join(" ")


require File.dirname(__FILE__) + '/weighted_directed_graph'

class MarkovChain
  attr_accessor :graph
  def initialize
    @graph =
  def increment_probability(a,b)

  def next_node(node)
    total = @graph.node(node).values.inject(0) {|sum, v| sum + v}
    n = rand*total
    @graph.node(node).each do |key, weight|
      return key if n < weight 
      n -= weight

  def random_walk(start)
    sentence = []
    node = start
    while @graph.out_degree_of(node) > 0
      sentence << node
      node = next_node(node)
    sentence << node



class WeightedDirectedGraph

  attr_accessor :nodes

  def initialize
    @nodes =

  def add_node(name)
    @nodes[name] ||= 
  def connect(a,b,weight=1)
    @nodes[a][b] = (@nodes[a][b] ||= 0) + weight
  def edge_weight(a,b)
  def contains?(name)
  def out_degree_of(name)
    @nodes[name].size rescue 0
  def node(name)


When you run the above code, you get some interesting random texts.

Nearly all that science of the least permits me to hate long time that he took refuge was rekindled within me. 'By your departure of several nights." "You have it." "I do upon me. "I have come so strange sight of the judges from behind us along the gentle manners and be the south.

Nearly all associated with the old man, but you will necessarily keep her head to my door, and at no one request, of obtaining the date of madness by quitting his speech, interchanging each of genius and he, "I intended services towards him. "Autumn passed three weeks before every sight was useful to this deposition did hope that countenance.

Nearly all possible consequences of the very remembrance of medicine, and abhorred!) that of the town, when he said she. "How can that morning, before us, indicating that day more ominous and take place since found that he was obliged to intercept him of spring; all their gay apparel she might obtain absolution; but of the world and their cottage. "I have no one request, of this?

Reblog this post [with Zemanta]

Post to Twitter

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    Markdown is turned off in code blocks:
     [This is not a link](

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see