Tech Blog - Employee Portal
wildlink.com

Wildlink's Technology Blog
An occasionally updated list of informative articles.
Pulled from our internal wiki.

ColdFusion - Dealing with MS Word special characters
2016-01-18

Problem

Users love to paste text from Word documents into textareas, and unfortunately MS Word loves to replace normal characters with non-ASCII special characters such as curly ("smart") quotes.

Browsers and popular fonts handle these special characters better than they used to, but the characters still cause problems. It is often better to remove the "weird" characters from the data before they can cause trouble elsewhere.

Solution

This function is included in most of our apps as a simple way to replace several of the common "weird" characters with plain ASCII equivalents, then strip out anything else that falls outside the ASCII range.

string function stripWord(required string text) {
    var returnValue = trim(arguments.text);
    returnValue = replace(returnValue, chr(8220), '"', 'all');      // left double quote
    returnValue = replace(returnValue, chr(8221), '"', 'all');      // right double quote
    returnValue = replace(returnValue, chr(8216), "'", 'all');      // left single quote
    returnValue = replace(returnValue, chr(8217), "'", 'all');      // right single quote / apostrophe
    returnValue = replace(returnValue, chr(8211), "-", 'all');      // en dash
    returnValue = replace(returnValue, chr(8212), "-", 'all');      // em dash
    returnValue = replace(returnValue, chr(8226), "*", 'all');      // bullet
    returnValue = replace(returnValue, chr(8230), "...", 'all');    // ellipsis

    // now strip everything outside of "normal ASCII" range
    returnValue = REReplace(returnValue, '[^\x00-\x7F]', "", 'all');    // anything outside ASCII 0 - 127

    return trim(returnValue);
} // stripWord

This is included in our common_utils.cfm file in all apps.
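
A quick way to see what the function does is to build a test string from the same character codes it handles. The snippet below is only a sketch: it assumes common_utils.cfm sits in the same directory as the calling template (the path is a guess, adjust as needed), and the sample text is made up.

```cfscript
// Assumption: common_utils.cfm is in the same directory and defines stripWord().
include "common_utils.cfm";

// Build a sample the way Word might paste it:
// curly double quotes, an em dash, a curly apostrophe and an ellipsis.
raw = chr(8220) & "Quarterly report" & chr(8221) & " " & chr(8212)
    & " it" & chr(8217) & "s done" & chr(8230);

cleaned = stripWord(raw);

writeOutput(cleaned);   // "Quarterly report" - it's done...
```

Building the input with chr() keeps the example independent of the source file's encoding.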

A Caveat

This solution is very English-specific. It could be expanded to handle accented characters from other languages, but as written it simply removes them, so "café" becomes "caf".
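
One possible way to soften this, sketched below, is to fold a handful of accented letters to their plain ASCII equivalents before the text reaches stripWord. The function name foldAccents and the list of characters are hypothetical; neither is part of common_utils.cfm, and the list covers only a few lowercase Latin-1 letters as an illustration.

```cfscript
// Hypothetical helper, not part of the original common_utils.cfm.
// Replaces a small, hand-picked set of accented lowercase letters
// with their unaccented ASCII equivalents.
string function foldAccents(required string text) {
    var returnValue = arguments.text;
    var pair = "";
    // Each entry is [Unicode code point, ASCII replacement]; extend as needed.
    var accentMap = [
        [224, "a"], [225, "a"], [226, "a"], [228, "a"],   // a grave, acute, circumflex, diaeresis
        [231, "c"],                                       // c cedilla
        [232, "e"], [233, "e"], [234, "e"], [235, "e"],   // e grave, acute, circumflex, diaeresis
        [237, "i"],                                       // i acute
        [241, "n"],                                       // n tilde
        [243, "o"], [246, "o"],                           // o acute, diaeresis
        [250, "u"], [252, "u"]                            // u acute, diaeresis
    ];

    for (pair in accentMap) {
        returnValue = replace(returnValue, chr(pair[1]), pair[2], 'all');
    }

    return returnValue;
} // foldAccents
```

With this in place, calling stripWord(foldAccents(text)) would turn "café" into "cafe" rather than "caf". Any accented character not in the map is still removed by stripWord.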
