So you're on-line and you decide to check out some Czech Web sites. Maybe you want some news or a weather forecast, or maybe you're looking for a hotel. You expect words with diacritics and you know it's important to notice them, but you keep getting what looks like the insertion of incomprehensible garbage, or maybe a language you've never encountered. Words like "p?esto_e" crop up in the midst of what otherwise looks like Czech.
What's wrong? It's your character set. More precisely, it's your computer's character repertoire. U.S. computers often are blissfully unaware of languages and alphabets other than English. However, unlike you, your computer can be taught where to place the proper accents instantly and effortlessly.
But let's say you want to view Czech Web pages without changing over your keyboard. Is it possible? Yes.
Perhaps the most painless way to explain "character encoding" and how it differs from straightforward "font selection" is to look at the problem historically. Americans, being at the forefront of computer development, invented the ASCII (American Standard Code for Information Interchange) character set. As you might expect, being American, ASCII ignored languages other than English. It defined 94 printable characters used in English, plus the space and 33 control functions, such as those attached to the "backspace" or "enter" keys.
Later, ISO standards defined separate character sets for groups of languages: Latin 1 covers Western European languages, for example, while Latin 2 covers Central European languages such as Czech. The character set for each of these language groups is served by many font families, such as Helvetica or Times Roman, which, of course, come in different sizes and variants (such as italic or bold). Thus, you first select your ISO language group; that's called character encoding, or character set selection. Then you select a font family and its variants. Newer browsers, such as Netscape Navigator and Microsoft Internet Explorer, versions 4.0 or higher, now allow you to pick out different font styles for different character sets. In other words, they are "character encoding aware."
On the other hand, if your computer does not have Latin 2 fonts, or the Web page does not tell it to use them, it will use its default character set. For U.S. computers, as we mentioned, this is Latin 1. When you are rendering a Latin 2 page with a Latin 1 character set, it's a bit like being 10 years old again and designing your own secret code. In the places where Latin 2 has z-caron (or z with hacek), Latin 1 substitutes "_" for the Czech letter. Thus "p?esto_e."
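The substitution can be reproduced in a few lines of Python. This is only a sketch: the sample word "přestože" is our guess at the word behind "p?esto_e," and the exact garbage you see on screen depends on your fonts and software. With Python's own ISO 8859-1 codec, the two Czech letters come out as "ø" and "¾" rather than "?" and "_".

```python
# Assumed sample word; "přestože" is Czech for "although".
text = "přestože"

# The bytes as a Latin 2 (ISO 8859-2) page would store them.
latin2_bytes = text.encode("iso-8859-2")

# A Latin 1 reader maps the very same byte values to different letters.
misread = latin2_bytes.decode("iso-8859-1")   # "pøesto¾e"

# A strictly ASCII reader has no letter at all for byte values above 127,
# so each one becomes the replacement character U+FFFD.
ascii_view = latin2_bytes.decode("ascii", errors="replace")

print(latin2_bytes)        # b'p\xf8esto\xbee'
print(misread == text)     # False
```

Nothing in the file is damaged: the bytes are unchanged, and only the table used to turn them into letters is wrong.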
As we all become more international and our computers acquire more memory, we are ready for something more. And modern browsers can do more. They can display words written in all the languages and alphabets ever used on the same page -- a feature required and used by, for example, people involved in comparative linguistics.
Such a Universal Character Set, or Unicode, is defined by a different international standard, which includes all the above-mentioned ISO groups and adds some more. This new standard requires about twice as much computer memory to store text written in English as ISO Latin 1 does, and it is not yet widespread. However, all kinds of alphabets and about 16,000 characters already have been numbered, named and cataloged in a large database, which is available on the Internet.
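Both claims can be checked with a short Python sketch. It assumes that the "two times" figure refers to a 16-bit encoding of Unicode (UTF-16 here), and it uses Python's standard `unicodedata` module as a stand-in for the character database; the sample phrase is arbitrary.

```python
import unicodedata

english = "weather forecast"

# ISO Latin 1 stores one byte per character.
latin1_size = len(english.encode("iso-8859-1"))

# Assumption: Unicode stored as 16-bit units (UTF-16), two bytes per character here.
utf16_size = len(english.encode("utf-16-le"))

print(latin1_size, utf16_size)  # 16 32

# Every character has a number (code point) and an official name.
for ch in "řž":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0159 LATIN SMALL LETTER R WITH CARON
# U+017E LATIN SMALL LETTER Z WITH CARON
```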
References and Related Links
Hedgie's references include sites that offer free downloads of Latin 2 fonts. (Installation of such additional font files is an essential step in the internationalization of your computer.) We also describe the invisible information that should be contained in a "properly designed" Web page, enabling a browser to select the proper character set automatically. Unfortunately, about 80 percent of Czech pages on the Czech Internet, pages in the top-level domain .cz, are not "properly designed" in this sense.
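As a rough illustration of that invisible information, the sketch below assumes it takes the common form of an HTML `meta` tag naming the charset. The sample page, the `declared_charset` helper and the Latin 1 fallback are all hypothetical; a real browser's detection logic is more involved.

```python
import re

# Hypothetical page; the meta tag carries the "invisible information".
page = (
    b'<html><head>'
    b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">'
    b'<title>P\xf8esto\xbee</title></head></html>'
)

def declared_charset(raw, default="iso-8859-1"):
    """Return the charset named in the page, or the default if none is declared."""
    match = re.search(rb'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', raw, re.IGNORECASE)
    return match.group(1).decode("ascii").lower() if match else default

charset = declared_charset(page)

# With the declared charset, the title bytes decode to proper Czech letters.
title = re.search(r"<title>(.*?)</title>", page.decode(charset)).group(1)

print(charset)  # iso-8859-2
```

A page without the tag falls through to the default, which is exactly the situation in which a Latin 2 page ends up displayed as Latin 1.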
The remedy, though, is simple: you just supply the missing information, the proper encoding, with three clicks of your mouse. Depending on which browser you're using, select "View," then "Font" or "Character set," and click on the closest equivalent to Central European ISO Latin 2 you are offered. Microsoft calls the nearest equivalent of ISO Latin 2 "Central European (windows) or win1250."
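Picking an encoding from the menu amounts to decoding the same bytes again with a different table. The sketch below tries three choices on an assumed sample word stored as ISO Latin 2; the codec names are Python's, not the labels in your browser's menu. It also suggests why "nearest equivalent" is not quite "identical": under these assumptions, the Windows table gets the r-caron right but not the z-caron.

```python
text = "přestože"                  # assumed sample word
raw = text.encode("iso-8859-2")    # bytes from a page that declares no encoding

# Decode the same bytes with each candidate encoding, as the menu would.
choices = ("iso-8859-1", "cp1250", "iso-8859-2")
results = {name: raw.decode(name) for name in choices}

for name in choices:
    # Print only whether the word came out right, to keep the output plain ASCII.
    print(name, results[name] == text)
# iso-8859-1 False
# cp1250 False
# iso-8859-2 True
```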
If downloading and installing applications is your idea of torture, and you just want to see those intriguing Czech Web sites correctly, nab a computer geek to internationalize your computer.
© Copyright 2004-2005 Hedgehog Holding s.r.o.