When JSON encoding does not play well with character encoding

There comes a time when a developer faces a bug so weird that it leaves them speechless and scrambling to find an answer wherever possible.

After a seemingly innocent deployment to production, I noticed a huge spike in errors like the following:

ActionView::Template::Error: "\xEB" from ASCII-8BIT to UTF-8

The offending line was the JSON encoding of geolocation information provided in a request header by our cache provider. \xEB is the ë character in ISO-8859-1 (plain ASCII stops at \x7F), so it was probably part of some city name with an accented letter.
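To make the failure concrete, here is a minimal sketch that reproduces the same class of error outside Rails. The value `"Zo\xEB"` is a made-up stand-in for the header, and tagging it as binary with `String#b` is an assumption about how the raw header bytes arrive.

```ruby
# Hypothetical header value: two ASCII letters followed by the byte 0xEB,
# tagged as binary (ASCII-8BIT), i.e. "bytes with no known character set".
raw = "Zo\xEB".b

raw.encoding      # binary, reported as ASCII-8BIT
raw.bytes.last    # 235, i.e. 0xEB

# Transcoding binary data to UTF-8 fails for any byte above 0x7F, because
# Ruby has no way to know which character the byte is meant to be.
conversion_error = nil
begin
  raw.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  conversion_error = e.message
end

puts conversion_error
```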

Our site’s character set is UTF-8, and Ruby 2.0’s default source encoding is UTF-8. I also verified that Encoding.default_internal and Encoding.default_external are both UTF-8. It was our cache provider inserting a non-UTF-8 character into the request header.

No problem. I will just force UTF-8 encoding on the string.
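Why this does not help: `force_encoding` only changes the label on the string and leaves the bytes alone. A small sketch, again with a made-up value, and with the caveat that the exact JSON error message depends on the version of the json gem:

```ruby
require "json"

raw = "Zo\xEB".b                               # hypothetical header bytes
relabelled = raw.dup.force_encoding("UTF-8")   # new label, same bytes

relabelled.bytes == raw.bytes    # true: not a single byte changed
relabelled.valid_encoding?       # false: 0xEB starts a multi-byte UTF-8
                                 # sequence that is never completed

# The JSON generator expects valid UTF-8 and rejects the string.
json_error = nil
begin
  JSON.generate({ "city" => relabelled })
rescue JSON::GeneratorError => e
  json_error = e
end
```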

Then I was faced with another kind of problem:

ActionView::Template::Error: partial character in source, but hit end

Something is off.

Off to the best debugging method in Ruby – print the string encoding. It turned out to be ISO-8859-1 (also known as Latin-1).

So, the final solution was to label the string as ISO-8859-1 with force_encoding and then transcode it to UTF-8.
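A sketch of what that inspection might look like. `encoding_report` is a hypothetical helper, not part of Ruby or Rails. Note that every byte sequence is technically valid ISO-8859-1, so Ruby cannot confirm that guess for you; the hex dump is what tells you which character set you are probably looking at.

```ruby
# Hypothetical helper: summarise what Ruby knows about a string
# without modifying it.
def encoding_report(value)
  {
    encoding:   value.encoding.name,
    hex_bytes:  value.bytes.map { |b| format("%02X", b) }.join(" "),
    valid_utf8: value.dup.force_encoding("UTF-8").valid_encoding?
  }
end

report = encoding_report("Zo\xEB".b)   # made-up header value
puts report
```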

string.to_s.force_encoding("ISO-8859-1").encode("UTF-8")
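Here is roughly what that one-liner does to the bytes, using the same made-up value as a stand-in for the header. The relabel step tells Ruby how to read the single byte 0xEB, and the encode step rewrites it as the two-byte UTF-8 sequence for the same character.

```ruby
require "json"

raw   = "Zo\xEB".b    # hypothetical header bytes
fixed = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

fixed.encoding           # UTF-8
fixed.bytes.last(2)      # [195, 171], i.e. C3 AB, the UTF-8 form of ë
fixed.valid_encoding?    # true

# The JSON generator now accepts the value.
json = JSON.generate({ "city" => fixed })
puts json
```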

However, I am not 100% happy with the solution. It just smells. It works for now, but I need to look for a better solution when I get a chance.
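One possible refinement, offered as a sketch rather than a definitive fix: the one-liner mangles values that are already valid UTF-8 (the two bytes of ë would be read as two separate Latin-1 characters), and `force_encoding` modifies the string it is called on. The hypothetical helper `to_utf8` below leaves valid UTF-8 alone and only falls back to ISO-8859-1 otherwise. That fallback is still an assumption about what the cache provider sends.

```ruby
# Hypothetical helper: return a valid UTF-8 copy of a header value.
# Assumes that any bytes which are not valid UTF-8 are ISO-8859-1.
def to_utf8(value)
  bytes = value.to_s.b                         # binary copy; the caller's string is untouched
  as_utf8 = bytes.dup.force_encoding("UTF-8")
  return as_utf8 if as_utf8.valid_encoding?    # already UTF-8 (or plain ASCII): keep as-is

  bytes.encode("UTF-8", "ISO-8859-1")          # read the bytes as Latin-1 and transcode
end

to_utf8("Zo\xEB".b)    # Latin-1 bytes are transcoded
to_utf8("Zo\u00EB")    # valid UTF-8 passes through unchanged
to_utf8(nil)           # nil becomes an empty string
```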

As a bonus, check this out – [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)][1]

One thought on “When JSON encoding does not play well with character encoding”

  1. I had this issue just now, and a quick search brought me to your post.
    Thanks for saving me the time to figure this out myself!
