Czech as a Slavic Language: Explore and Enjoy!

Languages, Alphabets, and Character Sets

The Slavic Subfamily of Indo-European Languages

The Slavic subfamily is part of the Indo-European family of languages, which also includes Sanskrit, one of its oldest attested members. The Germanic, Romance, and Slavic subfamilies are spread over most of Europe.

The division of the Roman Empire in 395 AD into a Latin-speaking western half and a Greek-speaking eastern half (later known as Byzantium) is reflected in the two main alphabets used in Europe today, Latin and Cyrillic.

The Language Gradient

Moving from the Atlantic Ocean eastward, European languages become softer; they contain more "soft" sounds.

The transition from "hard" s, z, c ... to "soft" sh, zh, ch ... is called "palatalization" because the tongue moves toward the palate when we switch from s to sh, from z to zh, and so on. While in English only fricative consonants do this, Slavic languages palatalize most consonants (n, d, t ...) and even the liquids l and r. Czech is unique in using a palatalized R, which is hard to produce if you pronounce your R in the back of your mouth, as in English. Czechs, like Scots, roll their R's in the front of their mouths.

Different languages/alphabets use different methods of dealing with the fact that we have only 26 letters in the Latin alphabet but 40 to 60 phonemes (sounds) in Indo-European languages. Three of these methods are:

  1. Adding new characters (used by Eastern Slavs)

  2. Using digraphs, a.k.a. bi-grams (used in English, Polish, Hungarian, . . .)

  3. Using diacritics (used in French, German, and Western Slavic languages)

All three methods are in use today:

  1. The Russian alphabet, Cyrillic, has single characters for soft fricatives (sh, zh, . . . ). It also has a special character, a "soft sign", which has the same function as the English h: it palatalizes (softens) the sound, changing "hard" sounds to "soft" ones, as with l or n (as in "nyet").

  2. In English, soft fricatives are represented by digraphs containing the letter h: sh, ch in words like shell or church. The letter "h" plays the role of the "soft sign" although this term is not commonly used in referring to English sounds.

  3. In Czech, "softening" is indicated by a diacritic sign called a caron. In Czech the sign is called a "hacek" which translates as "hook," although this is not its proper technical name.
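The three methods above can be compared side by side. The following is a minimal Python sketch, assuming the "sh" sound as the example; the dictionary keys are illustrative labels, and the characters are written as escapes so that the listing itself stays plain ASCII.

```python
import unicodedata

# Three spellings of the same "sh" sound, one per method.
spellings = {
    "new character (Cyrillic)": "\u0448",  # CYRILLIC SMALL LETTER SHA
    "digraph (English)": "sh",             # two plain Latin letters
    "diacritic (Czech)": "\u0161",         # LATIN SMALL LETTER S WITH CARON
}

for method, text in spellings.items():
    # List the official Unicode name of every character in the spelling.
    names = [unicodedata.name(ch) for ch in text]
    print(f"{method}: {len(text)} character(s): {names}")
```

Only the digraph needs two characters; the other two methods spend a single character on the sound, at the cost of enlarging the character repertoire.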


Czech Orthography

Until the 14th century, Czech used digraphs to represent soft sounds, just as English does today.

Czech used z, rather than h, as the "soft sign".

(This is reflected in the English name of the language, people, and lands: Czech rather than Chech.)
Hungarian and Polish continue to use z as a soft sign.

Jan Hus, in his book Orthographia bohemica (1406-1412 A.D.), introduced two diacritics:

  1. The caron (originally written as a dot above the letter), to indicate palatalization.

  2. The "acute accent," as in á, to indicate the length of the vowel.
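In Unicode terms, a diacritic can be treated as a separate mark attached to a base letter. The sketch below, in Python, assumes the standard combining marks U+030C (caron) and U+0301 (acute); `compose` is a hypothetical helper name introduced here for illustration.

```python
import unicodedata

COMBINING_CARON = "\u030c"  # COMBINING CARON
COMBINING_ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

def compose(base: str, mark: str) -> str:
    """Attach a combining mark to a base letter and fold the pair into
    one precomposed character where Unicode defines one (NFC form)."""
    return unicodedata.normalize("NFC", base + mark)

z_caron = compose("z", COMBINING_CARON)
a_acute = compose("a", COMBINING_ACUTE)

print(ord(z_caron), unicodedata.name(z_caron))
print(ord(a_acute), unicodedata.name(a_acute))
```

The base letter stays recognizable and the mark carries the extra information (softness or length), which is the idea behind both of the diacritics listed above.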

Diacritics have been in use in Greek and Latin orthography since antiquity to specify the details of pronunciation, as explained, for example, in this long, academic PDF document.

Words with long vowels, e.g. Door, would be written as Dór using Hus' method.

Letters with the caron are not shown on this page because they would not appear correctly on a typical American computer.

The issue of representing different languages on computers is complex, and descriptions often get technical, but users who want to extend their horizons beyond the Anglo-Saxon universe will want to understand basic concepts such as localization vs. internationalization of a computer. Briefly, this means differentiating between
1) selecting which language (and alphabet, keyboard, ...) is native for the machine
vs.
2) making your computer multilingual: being able to understand and display a wide repertoire of "foreign" characters while keeping the same native language for interacting with you.

Computers localized to English use a character set called (extended) ASCII or, more exactly, Latin-1. This has some accented characters but no characters to represent letters with a caron.
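This limitation can be checked directly. The following is a minimal Python sketch; `fits_in` is a hypothetical helper introduced here, and the codec names are the ones Python uses for Latin-1 and Latin-2.

```python
def fits_in(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

a_acute = "\u00e1"  # a with acute accent
z_caron = "\u017e"  # z with caron

print(fits_in(a_acute, "ascii"))       # plain 7-bit ASCII
print(fits_in(a_acute, "latin-1"))     # Latin-1 has accented vowels
print(fits_in(z_caron, "latin-1"))     # but no letters with a caron
print(fits_in(z_caron, "iso-8859-2"))  # Latin-2 does have them
```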

Your computer can be localized to English and still show foreign characters properly.
If you have Latin-2 fonts properly installed, you will see letters with carons. If not, your computer shows Latin-2 characters using the best match in the "native character set" of your computer. If you will be viewing pages that use Latin-2 fonts, it is definitely worth installing them.
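One way to picture a "best match" is to drop the diacritic and keep the base letter. The Python sketch below is only an illustration of that idea, assuming Unicode decomposition as the matching rule; it is not a description of what any particular operating system or browser actually does, and `strip_diacritics` is a hypothetical name.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Approximate each accented letter by its unaccented base letter."""
    # NFD splits a precomposed letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# The Czech word for the caron, written with escapes for its two accents.
word = "h\u00e1\u010dek"
print(strip_diacritics(word))
```

The result is readable but lossy: once the marks are gone, the softness and vowel length they recorded cannot be recovered.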
See our information on Unicode, below, and our Technical References for more information.

The ability to read accented characters is handy when you are visiting the Czech Republic, as illustrated in Learning Czech: Hedgie's 10 Minute Tips.

Character sets

If this jump into character sets seems too daunting, see our easier essay Computers Can Speak Czech.

As a rather unfortunate consequence of the spread of computer literacy, every nation wanted computers which would use their national alphabet and their national keyboard.

This led to the creation of Latin-2, then Latin-3, and so on through the ISO 8859 series, all the way to ISO 8859-15. Later, when people from these nations tried to communicate using computers hooked to the Internet, they found that these character sets were incompatible with one another.
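The incompatibility comes from the fact that each of these sets assigns its own meaning to the same byte values. A minimal Python sketch, using z with caron as the example character:

```python
import unicodedata

# Encode z with caron on a machine that "speaks" Latin-2.
raw = "\u017e".encode("iso-8859-2")

# Decode the very same byte on the sending and on the receiving side.
as_latin2 = raw.decode("iso-8859-2")  # what the sender meant
as_latin1 = raw.decode("iso-8859-1")  # what a Latin-1 machine shows

print(raw.hex())
print(unicodedata.name(as_latin2))
print(unicodedata.name(as_latin1))
```

Nothing is corrupted in transit; the byte is simply read against the wrong table, so the reader sees a fraction sign where the writer typed a letter.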

That finally led to Unicode, a universal character set which now has well over 100,000 characters, enough for all the natural languages and even a few unnatural ones, such as IPA, the International Phonetic Alphabet.

Each character has been given a name and a number. Naturally, no one remembers all of those names and numbers, and so they have all been entered into a large database.

The Institute of the Estonian Language (Eesti Keele Instituut) in Estonia linked the Unicode database to the Web. Thus you can search the Letter Database: you can enter the name of a character and get its number, its glyph (or grapheme), and its use in different languages.
Or you can enter the number of the character and get its name.
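The same two lookups can be tried offline. The sketch below uses Python's standard `unicodedata` module, which bundles a copy of the Unicode character names; it is not the Letter Database itself, and `number_of` and `name_of` are hypothetical helper names.

```python
import unicodedata

def number_of(name: str) -> int:
    """Full Unicode character name -> character number."""
    return ord(unicodedata.lookup(name))

def name_of(number: int) -> str:
    """Character number -> full Unicode character name."""
    return unicodedata.name(chr(number))

print(number_of("LATIN SMALL LETTER Z WITH CARON"))
print(name_of(8364))
```

Note that `unicodedata.lookup` needs the complete name; it does not search for partial matches.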

Knowing the character names and numbers is useful if you write web pages in HTML, since modern browsers can display Unicode characters from the different Latin sets when given their character numbers. If you do that, you should also read our page on encoding negotiation to use HTML page headers properly.
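In HTML, a character number is written as a numeric character reference of the form `&#NUMBER;`. A minimal Python sketch of producing and reading such references, assuming a pure-ASCII page as the target:

```python
import html

# The Czech word for the caron, with its two accented letters as escapes.
word = "h\u00e1\u010dek"

# Replace every non-ASCII character by its numeric character reference.
markup = word.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(markup)

# A browser performs the reverse step; html.unescape imitates it.
print(html.unescape(markup) == word)
```

Because the markup contains only ASCII, it survives any of the Latin-N encodings unchanged, which is why this notation was popular before pages were routinely saved in a Unicode encoding.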

Search the Unicode database

The search is case-insensitive but requires an exact substring match.

For example, using zh (z with caron):

If you enter just z, you get a large number of characters, all of which have z in their names, including Zero and z in Cyrillic and Greek.
You get exactly the same result if you enter a capital Z (the search is insensitive to the case of the letter).

If you enter just "caron", you get a list of ALL letters with carons.

If you enter "z caron" or z< or zh, you get nothing.

That's because the proper and full name of this character in the Unicode database is

"LATIN SMALL LETTER Z WITH CARON"

and "z caron" is not a substring of the full name.

You should enter

"Z WITH CARON"
or "latin small LETTER Z"
or "SMALL LETTER z with",

since those are exact substrings once case is disregarded.

You will then get (among other information, such as its full name) the Unicode number of this character, namely 382.
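The matching rule described above (case-insensitive, exact substring) can be sketched in a few lines of Python. This is an illustration of the rule only, not the Letter Database's own code: it searches the character names bundled with Python, and it is limited by assumption to the first 0x250 code points (the basic and extended Latin letters) to keep the output short. `search_names` is a hypothetical name.

```python
import unicodedata

def search_names(query: str, limit: int = 0x250) -> list[tuple[int, str]]:
    """Return (number, name) pairs whose name contains query,
    ignoring case but requiring an exact substring match."""
    needle = query.upper()  # Unicode names are all upper case
    hits = []
    for number in range(limit):
        name = unicodedata.name(chr(number), "")  # "" if unnamed
        if name and needle in name:
            hits.append((number, name))
    return hits

print(search_names("z caron"))          # not a substring of any name
for number, name in search_names("Z WITH CARON"):
    print(number, name)
```

The second query also returns a few related letters, such as the capital Z with caron, because the substring occurs in their names as well.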

If you know the Unicode number, use the lower entry box to get the name of a character (e.g. 8364 for the euro sign or 382 for zh).
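The numbers quoted here are decimal. Many Unicode charts instead print the same number in hexadecimal with a "U+" prefix, so it can help to convert between the two. A minimal Python sketch, with hypothetical helper names:

```python
def to_u_notation(number: int) -> str:
    """Decimal character number -> 'U+XXXX' chart notation."""
    return f"U+{number:04X}"

def from_u_notation(label: str) -> int:
    """'U+XXXX' chart notation -> decimal character number."""
    return int(label[2:], 16)

print(to_u_notation(382))
print(to_u_notation(8364))
print(from_u_notation("U+017E"))
```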


More Resources


Hedgie's Info Resources on Czech
Index to Hedgie's Language Section
Learning Czech: A Quick Guide, plus Resources
Essay: Computers Can Speak Czech
Technical References


© Copyright 2004-2005 Hedgehog Holding s.r.o.