Computers Can Speak Czech

Internationalize your computer to read characters correctly

So you're on-line and you decide to check out some Czech Web sites. Maybe you want some news or a weather forecast, or maybe you're looking for a hotel. You expect words with diacritics and you know it's important to notice them, but you keep getting what looks like the insertion of incomprehensible garbage, or maybe a language you've never encountered. Words like "p?esto_e" crop up in the midst of what otherwise looks like Czech.

What's wrong? It's your character set. More precisely, it's your computer's character repertoire. U.S. computers often are blissfully unaware of languages and alphabets other than English. However, unlike you, your computer can be taught where to place the proper accents instantly and effortlessly.

Localization: The Y2Z Problem

If you are already in the Czech Republic, perhaps you have noticed that in many Czech Internet cafes, you can choose between Czech and English localization. Computers can be localized to several languages. However, switching brings certain problems. You might also have noticed that if the computer is localized to Czech, not only are all the messages and labels in Czech, but the keyboard set-up changes and you can't touch type in the same patterns as in English. Y becomes Z, for instance. Writing to your true love, "I miss zou verz much. Zou are mz onlz love." might alarm rather than enchant the recipient. If you are e-mailing using a Czech keyboard, you might want to include a disclaimer that your spelling is not due to an excess of beer but to this Y2Z problem.

But let's say you want to view Czech Web pages without changing over your keyboard. Is it possible? Yes.

Internationalize Your Computer

Instead of localizing your computer to several national languages, you can "internationalize" your computer. In this case the keyboard, commands and system messages remain the same -- in plain old English -- which uses the ASCII character set. All the previous functions will remain the same, but some character encoding-aware applications, like your browser, some e-mail programs and most word processors, will be able to render other languages correctly.

Perhaps the most painless way to explain "character encoding" and how it differs from straightforward "font selection" is to look at the problem historically. Americans, being at the forefront of computer development, invented the ASCII (American Standard Code for Information Interchange) alphabet. As you might expect, being American, ASCII ignored languages other than English. It defined 94 printable characters used in English, plus the space, and 33 control characters, such as those attached to the "backspace" or "enter" keys.
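The size of the ASCII table can be checked directly. The short Python sketch below simply counts the code positions; the variable names are ours, chosen for illustration.

```python
# ASCII assigns the 7-bit codes 0..127.
all_codes = range(128)

# Control characters: codes 0-31 plus DEL (127).
control = [c for c in all_codes if c < 32 or c == 127]

# Visible (graphic) characters: codes 33-126.
# The space (code 32) is usually counted separately.
graphic = [c for c in all_codes if 33 <= c <= 126]

print(len(control), len(graphic))  # 33 94

# Backspace is code 8; the "enter" key sends carriage return, code 13.
print(repr(chr(8)), repr(chr(13)))

# Every graphic character is plain English text material: no accents anywhere.
print("".join(chr(c) for c in graphic))
```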

Keyboard of Babel

As other language groups demanded alphabets of their own, localization came next. Each nation or language had its own national keyboard. Later, languages with similar alphabets were grouped together and the original ASCII was extended to serve whole groups. The character set for all West European languages, including French and German, is now called ISO Latin 1. ISO is the International Organization for Standardization. The second Latin alphabet, ISO Latin 2, also called "Central European ISO-8859-2," covers the languages of Central and Eastern Europe and includes Czech. ISO now defines 15 such character sets under its standard number 8859. The series now includes Cyrillic, Arabic, Hebrew, etc.
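To see what "extending ASCII" means in practice, here is a minimal Python sketch. It decodes one and the same byte value under several parts of ISO 8859, using the codec names Python ships with. The byte 0xBE is just a convenient example; it happens to be the position of z-caron in Latin 2.

```python
# One byte from the upper half of the table (values 128-255).
raw = bytes([0xBE])

# The same byte means a different character in each ISO 8859 part.
for codec in ("iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7"):
    print(codec, raw.decode(codec))

latin1_char = raw.decode("iso-8859-1")  # a fraction sign in Latin 1
latin2_char = raw.decode("iso-8859-2")  # z-caron in Latin 2

# The lower half (plain ASCII) is identical in every part.
ascii_same = {b"z".decode(c) for c in ("iso-8859-1", "iso-8859-2", "iso-8859-5")}
print(ascii_same)  # {'z'}
```

The byte itself carries no hint of which part was meant. That is why the choice of character set has to be communicated separately from the text.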

The character sets for each of these language groups are served by many font families, such as Helvetica or Times Roman, which, of course, come in different sizes and variants (such as italic or bold). Thus, you first select your ISO language group; that's called character encoding, or character set selection. Then you select a font family and its variants. Newer browsers, such as Netscape and Microsoft Internet Explorer versions 4.0 or higher, now allow you to pick out different font styles for different character sets. In other words, they are "character encoding aware."

Reading All the Languages of the World

In the process called "encoding negotiation," your browser can pick up invisible information about the required character set and switch automatically to the proper alphabet. This means pages served in different languages, even in languages from different ISO groups, will be rendered properly. Thus, if that Czech, French or Arabic Web page you open was designed properly and your computer has been internationalized, each page will be perfectly rendered.
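One common form of this invisible information is the charset parameter that a Web server sends in its Content-Type header. The Python sketch below is only an illustration of the idea, not a description of how any particular browser works: the function name, the header value and the sample text are all made up for the example, and the fallback to Latin 1 is an assumption that mirrors the default described in this article.

```python
from email.message import Message


def charset_from_header(content_type, default="iso-8859-1"):
    """Return the charset named in a Content-Type value, or a default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter in lower case,
    # or None when the header does not carry one.
    return msg.get_content_charset() or default


# Hypothetical page body, as bytes on the wire, plus its header.
raw = "Počasí v Praze".encode("iso-8859-2")
header = "text/html; charset=ISO-8859-2"

text = raw.decode(charset_from_header(header))
print(text)

# Without the charset parameter the reader falls back to the default.
print(charset_from_header("text/html"))
```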

On the other hand, if your computer does not have Latin 2 fonts, or the Web page does not tell it to use them, it will use its default character set. For U.S. computers, as we mentioned, this is Latin 1. When you are rendering a Latin 2 page with a Latin 1 character set, it's a bit like being 10 years old again and designing your own secret code. In the places where Latin 2 has z-caron (or z with hacek), Latin 1 substitutes "_" for the Czech letter. Thus "p?esto_e."
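The secret code is easy to reproduce. The Python sketch below stores a Czech word as Latin 2 bytes and then reads those bytes back as Latin 1. Which stand-in symbols you actually see on screen depends on your system and fonts; with Python's Latin 1 codec the two Czech letters come out as a slashed o and a fraction sign.

```python
word = "přestože"

# What a Czech page sends: the word stored as ISO Latin 2 bytes.
raw = word.encode("iso-8859-2")

# What a Latin 1 reader makes of the very same bytes.
garbled = raw.decode("iso-8859-1")
print(garbled)  # pøesto¾e

# Nothing was lost: reinterpreting the bytes as Latin 2 restores the word.
repaired = garbled.encode("iso-8859-1").decode("iso-8859-2")
print(repaired)  # přestože
```

Only the two letters that ASCII lacks are affected, which is why the rest of the word still looks like Czech.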

As we all become more international and our computers acquire more memory, we are ready for something more. And modern browsers can do more. They can display words written in all the languages and alphabets ever used on the same page -- a feature required and used by, for example, people involved in comparative linguistics.

Such a Universal Character Set, or Unicode, is defined by a different international standard, which includes all above-mentioned ISO groups and adds some more. This new standard requires about two times more computer memory to store text written in English than ISO Latin 1 does, and it is not yet widespread. However, all kinds of alphabets and about 16,000 characters already have been numbered, named and cataloged in a large database, which is available on the Internet.
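The "two times" figure can be checked with a few lines of Python, assuming the 16-bit form of Unicode (UTF-16). The sketch also measures UTF-8, a variable-width form of Unicode that the article does not discuss; it is included only for comparison, since it stores plain English text in the same space as Latin 1.

```python
english = "Computers can speak Czech"
czech = "přestože"

# Bytes needed for the English sentence in each encoding.
sizes_english = {
    "iso-8859-1": len(english.encode("iso-8859-1")),
    "utf-16-le": len(english.encode("utf-16-le")),  # 2 bytes per character here
    "utf-8": len(english.encode("utf-8")),          # 1 byte per ASCII character
}
print(sizes_english)  # {'iso-8859-1': 25, 'utf-16-le': 50, 'utf-8': 25}

# Bytes needed for the Czech word in each encoding.
sizes_czech = {
    "iso-8859-2": len(czech.encode("iso-8859-2")),
    "utf-16-le": len(czech.encode("utf-16-le")),
    "utf-8": len(czech.encode("utf-8")),  # the two accented letters take 2 bytes each
}
print(sizes_czech)  # {'iso-8859-2': 8, 'utf-16-le': 16, 'utf-8': 10}
```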

That Little Czech Check Mark

Take, for example, that little check mark above some consonants, which Czechs call a "hacek" and English speakers call a "caron." The letter z with this diacritic mark is officially called "Latin small letter z with caron" (character ž). Five languages use it, including Bosnian, Czech and Upper Sorbian. Its glyph, or graphical image, is called z-caron.
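You can look up these official names yourself. Python's standard unicodedata module carries a copy of the Unicode character database, so the following sketch prints the number and name of the letter and of its check mark.

```python
import unicodedata

ch = "ž"

name = unicodedata.name(ch)
code_point = f"U+{ord(ch):04X}"
print(code_point, name)  # U+017E LATIN SMALL LETTER Z WITH CARON

# The letter can also be split into a plain z plus a separate combining mark.
parts = unicodedata.normalize("NFD", ch)
part_names = [unicodedata.name(p) for p in parts]
print(part_names)  # ['LATIN SMALL LETTER Z', 'COMBINING CARON']
```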

References and Related Links
Hedgie's references include sites that offer free downloads of Latin 2 fonts. (Installation of such additional font files is an essential step in the internationalization of your computer.) We also describe the invisible information that should be contained in a "properly designed" Web page, enabling a browser to select the proper character set automatically. Unfortunately, about 80 percent of Czech pages on the Czech Internet, pages in the top-level domain .cz, are not "properly designed" in this sense.

Conclusion

If your computer has been internationalized and Czech Web pages still do not load correctly, they probably have not been equipped with these proper invisible guidelines for your browser.

The remedy, though, is simple: you just supply the missing information, the proper encoding, with three clicks of your mouse. Depending on which browser you're using, select "View," then "Font" or "Character set," and click on the closest equivalent to Central European ISO Latin 2 you are offered. Microsoft calls the nearest equivalent of ISO Latin 2 "Central European (Windows)," or windows-1250.
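Choosing the encoding by hand amounts to telling the decoder which table to use. The Python sketch below imitates that menu choice with a made-up helper, decode_with_override, and a sample word of our own. It also shows why the choice matters: ISO Latin 2 and the Windows set (called cp1250 in Python) agree on most Czech letters but not on all of them.

```python
def decode_with_override(raw, chosen_encoding):
    """Decode page bytes with the encoding the reader picked from the menu."""
    # errors="replace" avoids a crash on byte values the chosen table leaves undefined.
    return raw.decode(chosen_encoding, errors="replace")


# Lower-case Czech letters with diacritics.
letters = "ěščřžýáíéůúťďň"

# Letters whose byte value differs between the two Central European sets.
differs = [ch for ch in letters if ch.encode("iso-8859-2") != ch.encode("cp1250")]
print(differs)

# A hypothetical page written with the Windows set but sent without a label.
raw = "přestože".encode("cp1250")
print(decode_with_override(raw, "cp1250"))      # correct choice
print(decode_with_override(raw, "iso-8859-2"))  # close, but z-caron is wrong
```

If the first choice still shows stray symbols in place of a few letters, trying the other Central European entry in the menu is a reasonable next step.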

If downloading and installing applications is your idea of torture, and you just want to see those intriguing Czech Web sites correctly, nab a computer geek to internationalize your computer.

Hedgie's Info Resources on Czech
Index to Hedgie's Language Section
Learning Czech: A Quick Guide, plus Resources
Technical References

© Copyright 2004-2005 Hedgehog Holding s.r.o.