Users love to paste text from Word documents into textareas, and unfortunately MS Word loves to replace normal characters with non-ASCII special characters such as curly ("smart") quotes.
Browsers and popular fonts handle these special characters better than they used to, but the characters can still cause problems, so it is often better to remove the "weird" characters from the data before they cause trouble elsewhere.
This function is included in most of our apps as a simple way to replace several of the most common "weird" characters with plain ASCII equivalents and then strip out everything else that is not ASCII in one pass.
string function stripWord(required string text) {
    var returnValue = trim(arguments.text);
    returnValue = replace(returnValue, chr(8220), '"', 'all'); // left double quote (U+201C)
    returnValue = replace(returnValue, chr(8221), '"', 'all'); // right double quote (U+201D)
    returnValue = replace(returnValue, chr(8216), "'", 'all'); // left single quote (U+2018)
    returnValue = replace(returnValue, chr(8217), "'", 'all'); // right single quote / apostrophe (U+2019)
    returnValue = replace(returnValue, chr(8211), "-", 'all'); // en dash (U+2013)
    returnValue = replace(returnValue, chr(8212), "-", 'all'); // em dash (U+2014)
    returnValue = replace(returnValue, chr(8226), "*", 'all'); // bullet (U+2022)
    returnValue = replace(returnValue, chr(8230), "...", 'all'); // horizontal ellipsis (U+2026)
    // now strip everything outside of the "normal" 7-bit ASCII range
    returnValue = REReplace(returnValue, '[^\x00-\x7F]', "", 'all'); // all non-ASCII (outside 0 - 127)
    return trim(returnValue);
} // stripWord
This function is included in our common_utils module, which is available in all 5.5 or newer apps (older apps will find it in the common_utls.cfm file).
A Caveat
This solution is very English-specific. It could be extended to handle accented characters from other languages, but as currently written it simply removes them (for example, "café" becomes "caf").
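One possible way to extend it is to fold accented letters down to their unaccented base letters before calling stripWord. The sketch below is a hypothetical helper, foldAccents, which is not part of common_utils. It assumes the CF engine runs on a JVM and exposes Java's java.text.Normalizer through createObject("java", ...), as Adobe ColdFusion and Lucee do. It has not been tried in our apps, so treat it as a starting point rather than a drop-in addition.

```cfml
// Hypothetical helper (NOT part of common_utils): folds accented letters to
// their base letters, e.g. "café" -> "cafe". Assumes java.text.Normalizer
// is reachable via createObject("java", ...) on the underlying JVM.
string function foldAccents(required string text) {
    var normalizer = createObject("java", "java.text.Normalizer");
    var nfd = createObject("java", "java.text.Normalizer$Form").NFD;
    // NFD (canonical decomposition) splits "é" into "e" + combining acute accent (U+0301)
    var decomposed = normalizer.normalize(javaCast("string", arguments.text), nfd);
    // Remove the combining marks with Java's regex engine; \p{M} = any Unicode mark
    return decomposed.replaceAll("\p{M}+", "");
} // foldAccents
```

It would need to run first, as in stripWord(foldAccents(someText)), because stripWord would otherwise delete the accented letters before they can be folded. Letters that have no canonical decomposition into a base letter plus an accent, such as ß, æ, or ø, pass through foldAccents unchanged and would still be removed by stripWord.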