My answer to text-dynamo

As an exercise to practice Ruby, you can try to complete a random text generator built on an underlying Markov chain model. The code in the following GitHub repository is incomplete. You are supposed to fill in or create methods that generate random text from a given seed text.

http://github.com/eandrejko/text-dynamo

A Markov chain is like a state machine, but the key is that what causes a state transition depends only on the current state. In this case, how do you determine the probability of selecting each word next? It’s quite simple. You go through the seed text and count how often each word follows the current one, and those counts determine the probabilities. For example, “am” is likely to follow “I” most frequently. Next might be “do” or other verbs.
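To make the counting idea concrete, here is a minimal sketch in plain Ruby. It is not the code from the text-dynamo repository; the method names (build_transitions, next_word, generate) are made up for illustration, and it assumes words are simply separated by whitespace.

```ruby
# Hypothetical helper names; none of these come from the text-dynamo repository.

# Build a table mapping each word to a hash of {following_word => count}.
def build_transitions(text)
  words = text.split
  transitions = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  words.each_cons(2) { |current, following| transitions[current][following] += 1 }
  transitions
end

# Pick the next word at random, weighted by how often it followed `word`
# in the seed text. Returns nil when `word` was never followed by anything.
def next_word(transitions, word, rng = Random.new)
  followers = transitions.fetch(word, nil)
  return nil if followers.nil? || followers.empty?

  target = rng.rand(followers.values.sum)
  followers.each do |candidate, count|
    return candidate if target < count
    target -= count
  end
end

# Walk the chain from `start`, producing at most `length` words.
def generate(transitions, start, length, rng = Random.new)
  output = [start]
  (length - 1).times do
    following = next_word(transitions, output.last, rng)
    break if following.nil?
    output << following
  end
  output.join(" ")
end
```

Calling generate(build_transitions(seed_text), "I", 20) would then produce up to 20 words starting from “I”. Only the last word is ever consulted when choosing the next one, which is exactly the Markov property described above.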

Korea will be left behind

It’s okay to be patriotic. You should love your country, but it should not keep you from being objective.

In one way, Koreans are kind of xenophobic. That’s not exactly the right word, because they don’t hate foreigners; rather, they think they are better than others. It was very clear when I lived in Korea for two years. It’s the media’s fault, since the media is pretty much a propaganda machine for everything to do with Korea. Come to think of it, this kind of blind loyalty is rampant in Korea.

When you are in the middle of it, it’s really hard to tell others about different things. But it becomes crystal clear when you are outside Korea. Whatever Koreans think they are best at, people in other parts of the world simply don’t care.

Now, you may ask, “Why do you care?” I shouldn’t. What Korea does or doesn’t do doesn’t affect me. So, why? I used to ask myself that, and I found an answer: because I am a Korean, too (well, 1/2 of me is. Not that I am mixed, but I just happened to live 1/2 of my life in the US). I didn’t want to care, but I can’t help it.

Anyhow, I think Korea is in big trouble. It will be completely left behind in 10 years or so, because it doesn’t invest in important technologies. But, you might say, “C’mon. Korea has the highest rate of broadband penetration! Their mobile technology is way ahead of the …

How to convert from MySQL to Postgres

I have been using MySQL for probably as long as I can remember. For Bloglation, search capability is an important feature, since it’s hard to browse each post one by one. I will probably implement tagging functionality, but even so, it’s important to be able to search the contents by keyword. While Ultrasphinx works well, Heroku only supports WebSolr… I was using acts_as_ferret with /tmp for the index files, but the problem with using the /tmp directory is that the ferret index files are most likely to disappear at some point.

Then I found out that Postgres supports full-text search, and since Heroku uses Postgres, I could use other plug-ins like acts_as_tsearch or texticle for free. Free is important to me, since Bloglation is not making any money.

Searching online, I found various ways to do it, like Pivotal Labs’ script or Heroku’s Taps gem, but I wanted to do it the old-fashioned way, like AEdifice, so I could check that everything was going all right at each step.

1. The first thing to do is to back up MySQL

For me, it was important to preserve the encoding in the backup, since the data was in many different languages. First I pulled the database from Heroku, thinking that I …

Bloglation – Translate, Save, and Share!

Last Thursday, I released a private alpha version of Bloglation, which lets a user translate any web page, then save and share the translation. It’s supposed to be private, but I need to get some good feedback from real users. If you are bilingual (or not) and interested in translating cool ideas, concepts and/or knowledge, please go ahead. And don’t forget to send me any comments or feedback you have.

I also wanted to maintain a separate blog just for Bloglation. You can find it here.

Those who say it cannot be done shouldn’t interrupt the people doing it

Michael Arrington at TechCrunch had a new post about Don Dodge’s forced departure from Microsoft, and in that post, he had a great quote.

What a great quote it is! In the same post, he linked to his previous blog post about Yossi Vardi, which has another great quote, this one from Theodore Roosevelt. This is the kind of thing that drives entrepreneurs.

It is not the critic who counts; not the man who points out how the strong man stumbles, or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again, because there is no effort without error and shortcoming; but who does actually strive to do the deeds; who knows great enthusiasms, the great devotions; who spends himself in a worthy cause; who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly, so that his place shall never be with those cold and timid souls who neither know victory nor defeat.

Installing acts_as_ferret with pagination and deploying on Heroku

OMG!

This shouldn’t have been this difficult, but it was, because while there are many cool tutorials out there, they are mostly outdated, and for some reason, the instructions on Heroku were not accessible.

While I picked acts_as_ferret because Heroku supports it, many seemed to prefer Thinking Sphinx. So, if you are not constrained (like me with Heroku), you should try that out too.

1. Install acts_as_ferret

Full instructions are outlined on GitHub, so you should check them out. You can also find the installation instructions and a complete list of methods here.

Although the instructions ask you to specify a version, Heroku only has version 0.4.3 installed, so specifying a version will break it.

Get WP-Syntax

I was just using the blockquote, code and pre HTML tags to mark up code in my blog posts, but they just looked horrible!

But I just installed WP-Syntax, and it’s fantastic. Now all the code in my blog should be much easier to read.

If you use WordPress and post lots of code, you probably use some sort of markup tool already. If you don’t, get WP-Syntax.

Counting rows and modifying MySQL to work with Postgres or Heroku

Now I am moving on to the Open Translation Project. I’ve done some translation work before, including one of Paul Graham’s essays, Why to Not Not Start a Startup. BTW, he finally added a link from the essay to my translation. I used Google Translate as a base, but I couldn’t believe how bad the translation was. Yahoo’s Babel Fish was a little better, but not by much. That’s where I got the idea of creating this possibly massive project.

Anyhow, I wanted to find a way of selecting the articles or blog posts that had been translated the most. I had one model that stored basic information about the original article or blog post. Its children are the translations. So, I needed to count the child rows that share the same parent. In MySQL, I had the following statement in Rails.

@top_origs = OrigPost.find(:all,
                              :select => 'orig_posts.*, count(posts.id) as post_count',
                              :joins => 'left outer join posts on posts.orig_post_id = orig_posts.id',
                              :group => 'orig_posts.id',
                              :order => 'post_count DESC',
                              :limit => 5)
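
For readers who don’t think in SQL, here is a rough plain-Ruby sketch of what the query above computes: count the translations per original, keep originals with zero translations (that is what the left outer join does), sort by the count in descending order, and take the top five. OrigPostRow, PostRow and top_originals are hypothetical stand-ins invented for this illustration; they are not part of the Rails app.

```ruby
# Hypothetical in-memory stand-ins for rows of the orig_posts and posts tables.
OrigPostRow = Struct.new(:id, :title)
PostRow = Struct.new(:id, :orig_post_id)

# Returns up to `limit` [original, translation_count] pairs,
# most-translated first. Originals with no translations get a count of 0,
# mirroring the LEFT OUTER JOIN in the query.
def top_originals(orig_posts, posts, limit = 5)
  counts = Hash.new(0)
  posts.each { |post| counts[post.orig_post_id] += 1 }

  orig_posts
    .map { |orig| [orig, counts[orig.id]] }
    .sort_by { |_orig, count| -count }
    .first(limit)
end
```

This is only meant to show the logic; with real data you would still want the database to do the counting, as in the find call above.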