When JSON encoding does not play well with character encoding

Every developer eventually faces a bug so weird that it leaves them speechless and scrambling for an answer wherever one can be found.

After a seemingly innocent deployment to production, I noticed a huge spike in errors like the following:

ActionView::Template::Error: "\xEB" from ASCII-8BIT to UTF-8

The offending line was the JSON encoding of geolocation information provided in a request header by our cache provider. \xEB is not an ASCII character at all (ASCII stops at \x7F); in ISO-8859-1 it is the ë character, so it was probably part of some city name.

Our site’s character set is UTF-8, and Ruby 2.0’s default source encoding is UTF-8. I also verified that Encoding.default_internal and Encoding.default_external are both UTF-8. So it was our cache provider that was inserting a non-UTF-8 character into the request header.
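The failure can be reproduced outside Rails. This is only a minimal sketch: the value "Zo\xEB" is a made-up stand-in for the real header, and it assumes the header arrives tagged as binary (ASCII-8BIT), which is what the error message suggests.

```ruby
# Hypothetical header value: "Zoë" as ISO-8859-1 bytes, tagged as binary.
raw = "Zo\xEB".b

begin
  # Roughly the conversion the JSON encoder presumably attempted.
  raw.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  # Ruby cannot know what byte 0xEB means in a binary string, so it refuses.
  puts "#{e.class}: #{e.message}"
end
```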

No problem. I will just force UTF-8 encoding on the string.

Then I was faced with another kind of problem:

ActionView::Template::Error: partial character in source, but hit end

Something is off.
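What went wrong can be seen in isolation. force_encoding only changes the label on the string; it does not touch the bytes. A lone \xEB is the first byte of a three-byte UTF-8 sequence, so a string that ends with it is not valid UTF-8. A small sketch, again with a made-up value:

```ruby
# Hypothetical header value: ISO-8859-1 bytes tagged as binary.
raw = "Zo\xEB".b

# Relabel the same bytes as UTF-8; no conversion happens here.
forced = raw.dup.force_encoding("UTF-8")

puts forced.encoding          # UTF-8
puts forced.valid_encoding?   # false
p forced.bytes                # the bytes are unchanged
```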

Off to the best debugging method in Ruby – printing the string and its encoding. It turned out to be ISO-8859-1 (also known as Latin-1).
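For this kind of debugging it helps to print more than the encoding name. The helper below is hypothetical (describe_string is not from our codebase); it reports the encoding label, whether the bytes are valid under that label, and the raw bytes in hex.

```ruby
# Hypothetical debugging helper for inspecting a suspicious string.
def describe_string(value)
  {
    encoding: value.encoding.name,   # the label Ruby has attached
    valid: value.valid_encoding?,    # do the bytes make sense under that label?
    hex: value.unpack1("H*")         # the raw bytes, independent of any label
  }
end

# Made-up sample: the same bytes under two different labels.
p describe_string("Zo\xEB".b.force_encoding("ISO-8859-1"))
p describe_string("Zo\xEB".b.force_encoding("UTF-8"))
```

The hex dump is the part to trust: the label can be wrong, the bytes cannot.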

So the final solution was to label the string as ISO-8859-1 and then transcode it to UTF-8:

string.to_s.force_encoding("ISO-8859-1").encode("UTF-8")
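Wrapped in a method, the fix might look like the sketch below. The name latin1_to_utf8 and the sample value are made up for illustration, and the method assumes the header bytes are always ISO-8859-1. The dup is my addition: force_encoding modifies its receiver, so without it the caller's string would be relabeled too.

```ruby
require "json"

# Hypothetical helper; assumes the incoming bytes are always ISO-8859-1.
def latin1_to_utf8(value)
  value.to_s.dup                   # copy, so the original string is untouched
       .force_encoding("ISO-8859-1") # label the bytes as what they really are
       .encode("UTF-8")              # transcode: 0xEB becomes the two bytes C3 AB
end

city = latin1_to_utf8("Zo\xEB".b)
puts city.valid_encoding?
puts({ "city" => city }.to_json)
```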

However, I am not 100% happy with the solution. It just smells. It works for now, but I need to look for a better solution when I get a chance.
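One reason it smells: if the provider ever starts sending proper UTF-8, forcing ISO-8859-1 would turn every multi-byte character into mojibake. A slightly more defensive variant, which I have not run in production, is to keep the bytes as UTF-8 when they are already valid and fall back to ISO-8859-1 otherwise. The method name header_to_utf8 is hypothetical.

```ruby
# Hypothetical helper: prefer UTF-8 when the bytes are already valid UTF-8,
# otherwise assume ISO-8859-1 and transcode.
def header_to_utf8(value)
  text = value.to_s.dup.force_encoding("UTF-8")
  return text if text.valid_encoding?

  text.force_encoding("ISO-8859-1").encode("UTF-8")
end

puts header_to_utf8("Zo\xEB".b)       # ISO-8859-1 bytes
puts header_to_utf8("Zo\xC3\xAB".b)   # the same name, already in UTF-8
```

This is still a guess about the sender's encoding, not a guarantee; some ISO-8859-1 byte sequences happen to be valid UTF-8 as well.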

As a bonus, check this out – The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
