When JSON encoding does not play well with character encoding

There comes a time for a developer to face a bug so weird that it leaves him/her speechless and scramble to find an answer wherever possible.

After seemingly innocent deployment to production, I noticed huge spike in errors like the following:

ActionView::Template::Error: "\xEB" from ASCII-8BIT to UTF-8

Offending line was JSON encoding of geo location information provided in request header by our cache provider. \xEB happens to be ß character in ASCII, so it was probably some German city name.

Our site’s character type is UTF-8, and Ruby 2.0’s default encoding is UTF-8. I also verified that Encoding.default_internal and Encoding.default_external are both UTF-8. It was our cache provider inserting non-UTF-8 character in request header.

No problem. I will just force UTF-8 encoding on the string.

Then, I was faced with another kind of problems.

ActionView::Template::Error: partial character in source, but hit end

Something is off.

Off to the best debugging method in Ruby – print the string encoding. And it’s ISO-8859-1 (or Latin-1).

So, the final solution was to force ISO-8859-1 encoding and encode it again in UTF-8.

string.to_s.force_encoding("ISO-8859-1").encode("UTF-8")

However, I am not 100% happy with the solution. It just smells. It works for now, but I need to look for better solution when I get a chance.

For bonus, check this out – The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Post to Twitter

One thought on “When JSON encoding does not play well with character encoding

  1. I had this issue just now, and a quick search brought me you post.
    Thanks for saving me the time to figure this out myself!

Leave a Reply to Jochen Cancel reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>