Internationalization Concerns - JavaScript

If you are planning to create a Web site that can be accessed from anywhere in the world, or a Web application that can be installed anywhere in the world, internationalization is a concern. Entire libraries, available in numerous programming languages, help you with internationalization of software, ranging from typical C++ applications to Web-based systems. Companies spend hundreds of hours examining their Web sites and Web applications for internationalization purposes, but they often forget to examine JavaScript code.

Detecting language using JavaScript

You were introduced to the navigator object and its properties earlier. One of the properties that has not been discussed in detail is the language property, which returns the language and country code in which the browser is currently operating (for example, “en-us” for United States English):

var sLang = navigator.language; // won't work in older versions of IE

Mozilla, Opera, and Safari/Konqueror all support this property, but older versions of Internet Explorer do not. Instead, Internet Explorer provides three properties: browserLanguage (indicates the language being used by the browser), userLanguage (essentially identical to browserLanguage), and systemLanguage (indicates the language of the operating system). The browserLanguage property is essentially the same as language, so you can make a simple addition to the previous code to detect the language for all browsers:

var sLang = navigator.language || navigator.browserLanguage;

Using this code, you can determine if someone is viewing your page from a browser with an unsupported language setting and take appropriate action, such as redirecting the visitor to a more appropriate page. For example, a script can check whether the language is French (represented as “fr”) and, if so, redirect to another page.
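A minimal sketch of this check, under a few assumptions: the file name index_fr.htm is hypothetical, the helper names getBrowserLanguage and getRedirectTarget are illustrative, and the navigator-like object is passed in as a parameter so the logic can also be exercised outside a browser. The comparison looks only at the part of the code before the hyphen, so regional variants such as “fr-ca” are treated as French too.

```javascript
// Returns the browser language in lowercase, e.g. "en-us" or "fr".
// oNavigator is any object shaped like the browser's navigator object.
function getBrowserLanguage(oNavigator) {
    var sLang = oNavigator.language || oNavigator.browserLanguage || "";
    return sLang.toLowerCase();
}

// Returns the page to redirect to, or null if no redirect is needed.
// "index_fr.htm" is a hypothetical file name used for illustration.
function getRedirectTarget(sLang) {
    // Compare only the language part so "fr", "fr-fr", and "fr-ca" all match.
    var sPrimary = sLang.split("-")[0];
    return sPrimary === "fr" ? "index_fr.htm" : null;
}

// In a browser, the two helpers could be combined like this:
if (typeof window !== "undefined" && typeof navigator !== "undefined") {
    var sTarget = getRedirectTarget(getBrowserLanguage(navigator));
    if (sTarget) {
        window.location.replace(sTarget);
    }
}
```

Using location.replace() rather than assigning to location keeps the redirecting page out of the browser history, so the Back button does not bounce the visitor forward again.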

Strategies

The most important step in internationalizing your JavaScript is to avoid hard-coded strings. For example, don’t do this:

alert("The date you entered is incorrect.");

In this example, the string “The date you entered is incorrect.” is hard-coded.

When a value is hard-coded, it cannot be changed without directly editing the line that uses it. Compare this with placing the message string into a variable called sIncorrectDateMessage and passing that variable to alert(). All other internationalized strings should be stored alongside this variable so you can change any and all values in only one place.
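One way this could look is sketched below. The variable name sIncorrectDateMessage comes from the text; the helper showDateError and its fnNotify parameter are hypothetical, introduced only so the message can be delivered by alert() or by anything else.

```javascript
// All user-visible strings are declared together in one place.
var sIncorrectDateMessage = "The date you entered is incorrect.";

// showDateError is a hypothetical helper; fnNotify stands in for alert().
function showDateError(fnNotify) {
    fnNotify(sIncorrectDateMessage);
}

// In a browser:
// showDateError(function (sMessage) { alert(sMessage); });
```

Translating the page now means changing the value of sIncorrectDateMessage, not hunting for every call that displays it.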

The best way to handle internationalized strings is to move all strings into a separate JavaScript file (similar to the way JSP applications use properties files). Each language you support should have its own JavaScript file. For example, suppose you have three languages to support: English (language code en), German (de), and French (fr). Each language should have its own JavaScript file containing any strings necessary for the Web site or Web application. The easiest way to do this is to give each file a filename that differs only in the language code. For example, these filenames make selecting the correct file easy:

  • Strings_en.js
  • Strings_de.js
  • Strings_fr.js

Then, using a little server-side logic, you can ensure that the correct one is included. In PHP, for example, you could take a variable named $lang that contains the language to use and match it against an array of supported languages ($supported). If the language is supported, the JavaScript file for that language is loaded; otherwise, the default (English) language script is loaded. This ensures that the correct JavaScript string values are used for the given language and that there is a default language to fall back on if an unsupported language is encountered.
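The selection logic itself is small. The sketch below expresses it in JavaScript rather than PHP, purely to illustrate the idea; the names aSupported, getStringsFile, and getStringsScriptTag are hypothetical, and the file names follow the Strings_xx.js pattern listed above.

```javascript
// Language codes that have a matching Strings_xx.js file.
var aSupported = ["en", "de", "fr"];

// Returns the string file for sLang, falling back to English.
function getStringsFile(sLang) {
    var sCode = String(sLang || "").toLowerCase();
    for (var i = 0; i < aSupported.length; i++) {
        if (aSupported[i] === sCode) {
            return "Strings_" + sCode + ".js";
        }
    }
    return "Strings_en.js"; // default language
}

// Builds the script tag that a server-side template could emit.
function getStringsScriptTag(sLang) {
    return "<script type=\"text/javascript\" src=\"" +
        getStringsFile(sLang) + "\"><\/script>";
}
```

Matching against a fixed list, instead of inserting the requested code straight into the file name, also keeps unexpected input from producing a reference to a file that does not exist.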

String considerations

The first edition of ECMAScript introduced support for Unicode characters (which number upwards of 65,000 as compared to 128 ASCII characters), effectively assuring that ECMAScript can handle strings of any kind, including typically problematic double-byte characters.

What exactly is Unicode?

According to the official Unicode home page, “Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.”

Unicode was developed to provide a common encoding to handle all the characters that exist in the world. Prior to Unicode, each language had its own encoding, meaning that characters in different languages could be represented by the same code, so the letter A in English could use the same code as a different letter in a different language (obviously, not optimal).

Unicode was originally designed to represent each character as a 16-bit number, allowing for over 65,000 possible characters, and JavaScript strings are still made up of 16-bit units (characters beyond that range occupy two units each). This makes it an ideal solution to internationalization concerns. Additionally, the first 128 Unicode characters are, in fact, the 128 ASCII characters, making compatibility with older English-language applications much easier.

Representation in JavaScript

All Unicode characters, including ASCII, are represented in JavaScript as a four-digit hexadecimal value prefixed with \u to indicate a Unicode character. For example, \u0045 is the Unicode form of the letter E (which can also be represented using ASCII syntax as \x45).

This representation of characters can be used in comments and strings in JavaScript just as you use special characters like \n. For example:

alert("\u0048\u0045\u004C\u004C\u004F\u0020\u0057\u004F\u0052\u004C\u0044");

Not sure what this line does? It presents an alert with the text “HELLO WORLD” to the user. Using the Unicode character set, you can create messages in any number of languages. Even though the plain text form of such messages isn’t human readable, escape sequences let you include multibyte characters from other languages no matter how the script file itself is encoded.
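Escape sequences like these do not have to be typed by hand. As a rough sketch, a small helper can generate them from ordinary text; toUnicodeEscapes is a hypothetical name, not a built-in function.

```javascript
// "\u0045" and "\x45" are both escape sequences for the letter E.
var bSame = ("\u0045" === "E") && ("\x45" === "E");

// Converts every 16-bit unit of sText to its \uXXXX escape sequence.
// toUnicodeEscapes is a hypothetical helper name.
function toUnicodeEscapes(sText) {
    var sResult = "";
    for (var i = 0; i < sText.length; i++) {
        var sHex = sText.charCodeAt(i).toString(16).toUpperCase();
        // Pad to four hexadecimal digits.
        while (sHex.length < 4) {
            sHex = "0" + sHex;
        }
        sResult += "\\u" + sHex;
    }
    return sResult;
}

var sEscaped = toUnicodeEscapes("HELLO WORLD");
```

The result held in sEscaped is plain ASCII text that can be pasted between quotation marks in a script file.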

Browser versus operating system support

Just because JavaScript can display and understand Unicode characters doesn’t necessarily mean the operating system can. Why should this concern Web developers who care only about what the browser supports, you may ask? The answer is that JavaScript uses some operating system functionality to do its job, although most developers never realize it. For internationalization, you must be aware of this very important boundary.

Any time you use alert(), confirm(), or prompt(), you are using an operating system dialog box. Unless the client operating system has foreign language support installed, you can end up with a dialog full of gibberish. Most of the time, the browser reflects the language of the operating system; however, you can never tell what individuals will do with their browsers.

When using operating system dialogs with internationalization, be aware that these problems can occur. When dealing with a distributed Web application, it may be enough to inform the customer of this limitation; on public Web sites, however, it may be best to avoid using these dialogs altogether.
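One possible way to avoid these dialogs is to write the message into the page itself, where the browser handles the rendering. The sketch below assumes a page element set aside for messages; showMessage and the divMessage id are hypothetical names.

```javascript
// Writes sMessage into a page element instead of opening a dialog.
// oTarget is any DOM element (or any object with a textContent property).
function showMessage(oTarget, sMessage) {
    oTarget.textContent = sMessage;
}

// In a browser, assuming the page contains <div id="divMessage"></div>:
// showMessage(document.getElementById("divMessage"), sIncorrectDateMessage);
```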

Error-proofing strings

Oftentimes in internationalized Web pages, developers try to pass strings from a server-side variable into a JavaScript variable by writing the server-side value directly between the quotation marks of a JavaScript string literal.

Suppose a JSP page uses this technique with the intent of outputting the string “Hello” into a JavaScript variable. When this page gets to the browser and you view the source, the output looks correct and the JavaScript functions as expected.

Now suppose the string output from the JSP itself contains quotation marks. The first quotation mark inside the string ends the JavaScript string literal early, which creates a syntax error in JavaScript. This is a very common mistake when internationalizing Web pages that use JavaScript to output strings. You must be aware of quotation marks contained within strings if the string is to be output into JavaScript code. The best way to deal with this is to replace the quotation marks in all strings before outputting them to JavaScript.

In JSP, this can be done with the Java replaceAll() method, converting each quotation mark to a backslash followed by a quotation mark. The first argument is a string representation of a regular expression; the second argument is the replacement string. Remember that a backslash must itself be escaped in both, because it is a special character in Java string literals, in regular expressions, and in replaceAll() replacement strings.

When the converted string is output to JavaScript, every quotation mark inside it arrives preceded by a backslash, so the result is a valid string literal. The JavaScript code is syntactically correct and runs without error.
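The same problem and fix can be illustrated in JavaScript itself. This is only a sketch, not the JSP code: the sample message and the helper names escapeForJsString, buildAssignment, and isValidSource are hypothetical, and the escaping covers only backslashes and quotation marks (a production version would also need to handle line breaks and similar characters).

```javascript
// Escapes backslashes and quotation marks so sText can be placed
// between quotation marks in generated JavaScript source.
function escapeForJsString(sText) {
    return sText.replace(/\\/g, "\\\\").replace(/"/g, "\\\"");
}

// Builds one line of JavaScript source, as a server-side template would.
function buildAssignment(sValue) {
    return "var sMessage = \"" + sValue + "\";";
}

// Returns true if sSource parses as JavaScript, false on a syntax error.
function isValidSource(sSource) {
    try {
        new Function(sSource);
        return true;
    } catch (oError) {
        return false;
    }
}

// Hypothetical server-side message containing quotation marks.
var sServerMessage = "He said, \"Hello\"";

// Unescaped output: the embedded quotation marks break the literal.
var sBroken = buildAssignment(sServerMessage);

// Escaped output: each embedded quotation mark is preceded by a backslash.
var sFixed = buildAssignment(escapeForJsString(sServerMessage));
```

Backslashes are escaped first so that the backslashes added in front of quotation marks are not doubled by the second replacement.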

Use double quotes

Another common mistake is to use apostrophes to indicate strings instead of quotation marks. As you remember, JavaScript allows either to represent strings, so the same text delimited by apostrophes or by quotation marks produces equal strings.

Just because JavaScript lets you use either syntax doesn’t mean you can use them interchangeably when you want internationalization. In fact, because apostrophes are much more common than quotation marks in everyday language (especially in languages like French), you run into the same problem we just explored with quotation marks, but far more often. Because of this, it’s considered best practice to only use quotation marks to represent strings.
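A short sketch makes the difference concrete. The French sample text and the helper names buildAssignment and isValidSource are hypothetical; the helpers simply generate a line of source with the chosen delimiter and check whether it parses.

```javascript
// Both delimiters produce the same string value.
var bEqual = ('Hello' === "Hello");

// A French message containing an apostrophe (hypothetical sample text).
var sFrench = "L'heure est incorrecte.";

// Builds a line of source using the given delimiter around sValue.
function buildAssignment(sValue, sDelimiter) {
    return "var sMessage = " + sDelimiter + sValue + sDelimiter + ";";
}

// Returns true if sSource parses as JavaScript, false on a syntax error.
function isValidSource(sSource) {
    try {
        new Function(sSource);
        return true;
    } catch (oError) {
        return false;
    }
}

// Apostrophe-delimited output breaks at the apostrophe in "L'heure".
var bApostrophesWork = isValidSource(buildAssignment(sFrench, "'"));

// Quotation-mark-delimited output parses without any escaping.
var bQuotesWork = isValidSource(buildAssignment(sFrench, "\""));
```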

Following the guidelines in this section ensures that your internationalized JavaScript code works seamlessly.

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd