The Slavic subfamily of languages is part of the Indo-European
family of languages, which also includes
Sanskrit. The Germanic, Romance, and Slavic
language subfamilies are spread over most of Europe.
The Eastern Slavic nations, who accepted their written language along with Orthodox Christianity from Constantinople, use the Cyrillic alphabet.
The Western Slavs, who accepted Christianity from Rome, use the Latin alphabet.
The splitting in 395 AD of the Roman Empire into an Eastern half (Byzantium) and a
Western half centered on Rome is reflected in the two alphabets used in
Europe today.
Moving from the Atlantic Ocean eastward, European languages become softer; they contain more "soft" sounds.
The "hard" to "soft" transition of s, z, c ... to sh, zh, ch ... is called "palatalization" because the tongue moves to the palate when we switch from s to sh, from z to zh, etc. While in English only fricative consonants do this, Slavic languages palatalize most consonants (n, d, t ...) and even the liquids l and r. Czech is unique in using a palatalized R, which is hard to do if you are pronouncing an English R in the back of your mouth. Czechs, like Scots, roll their R's in the front of their mouths.
Different languages/alphabets use different methods of dealing with the fact that we have only 26 letters in the Latin alphabet but 40 to 60 phonemes (sounds) in Indo-European languages. Three of these methods are:
Using diacritics (used in French, German, and Western Slavic languages)
All three methods are in use today:
In Czech, "softening" is indicated by a diacritic sign called a caron. In Czech the sign is called a "hacek" which translates as "hook," although this is not its proper technical name.
Until the 14th century, Czech used digraphs to represent soft sounds, just as English does today.
Jan Hus, in his book Orthographia bohemica (1406-1412 A.D.), introduced two diacritics:
The "acute accent," as in á, to indicate the length of the vowel.
Words with long vowels, e.g. Door, would be written as Dór using Hus' method.
Letters with the caron are not shown on this page because they would not appear correctly on a typical American computer.
The issue of representing different languages on computers is complex
and descriptions often get technical, but users who want to extend
their horizons beyond the Anglo-Saxon universe will want to understand
basic concepts such as localization vs. internationalization of
a computer. Briefly, this means differentiating between
1) selecting which language (and alphabet, keyboard, ...) is native for the machine
vs.
2) making your computer multilingual: being able to understand and display a wide repertoire of "foreign" characters while
keeping the same native language for interacting with you.
Computers localized to English use a character set called (extended) ASCII or, more exactly, Latin-1. This has some accented characters but no characters to represent letters with a caron.
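The gap in Latin-1 can be checked directly. The following is a minimal sketch in Python, assuming the standard codecs "latin-1" and "iso8859_2" (Latin-2); the helper name fits_in is made up for this illustration. The caron letter is written as an escape so that it does not have to appear on the page.

```python
# z with caron, written as an escape so the page itself stays caron-free.
Z_CARON = "\u017e"   # Unicode number 382
E_ACUTE = "\u00e9"   # an accented letter that Latin-1 does contain


def fits_in(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(fits_in(E_ACUTE, "latin-1"))    # True
print(fits_in(Z_CARON, "latin-1"))    # False
print(fits_in(Z_CARON, "iso8859_2"))  # True
```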
The ability to read accented characters is handy when you are visiting the Czech Republic, as illustrated in Learning Czech: Hedgie's 10 Minute Tips.
As a rather unfortunate consequence of the spread of computer literacy, every nation wanted computers which would use their national alphabet and their national keyboard.
This led to the creation of Latin-2, then Latin-3, and so on through the ISO 8859 series, which runs to ISO 8859-16 (Latin-10). Later, when natives of these nations tried to communicate using computers hooked to the Internet, they found certain incompatibilities.
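One such incompatibility is easy to reproduce: the same byte means different characters in different Latin sets. A small sketch, assuming Python's "latin-1" and "iso8859_2" codecs and an arbitrarily chosen byte value:

```python
raw = bytes([0xBE])  # one byte, as it might arrive from another computer

as_latin1 = raw.decode("latin-1")    # U+00BE, the fraction "three quarters"
as_latin2 = raw.decode("iso8859_2")  # U+017E, small z with caron

# Same byte, two different characters, depending on the assumed set.
print(hex(ord(as_latin1)), hex(ord(as_latin2)))  # 0xbe 0x17e
```

Unless sender and receiver agree on the set, the reader sees the wrong letter.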
That finally led to Unicode, a universal character set which now has well over 100,000 characters, enough for all the natural languages and even a few unnatural ones, such as IPA, the International Phonetic Alphabet.
All possible characters in all possible alphabets have been given names and numbers. Naturally, no one remembers all of those names and numbers, and so they have all been entered into a large database.
The Eesti Keele Instituut (Institute of the Estonian Language) in Estonia linked the Unicode
database to the Web. Thus you can search the Letter Database right
here and now: You can enter the name of the character and get its number,
its glyph (or grapheme), and its use in different languages.
Or you can enter the number of the character and get its name.
Knowing the character names is useful if you write web pages in HTML, since modern browsers can display the Unicode characters of different Latin sets using the character numbers. If you are doing that, then you should also read our page on encoding negotiation to use HTML page headers properly.
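In HTML, a character number is written as a numeric character reference, for example &#382; for zh. The sketch below uses Python's standard html module; the helper name to_numeric_reference is invented for this example.

```python
import html


def to_numeric_reference(ch: str) -> str:
    """Build a decimal HTML character reference such as '&#382;'."""
    return f"&#{ord(ch)};"


ref = to_numeric_reference("\u017e")   # small z with caron
print(ref)                             # &#382;
print(html.unescape(ref) == "\u017e")  # True
```

The reference itself is plain ASCII, so it survives whatever Latin set the page happens to be saved in.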
The search is case-insensitive but requires an exact substring match.
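This means, using as an example the zh (zcaron), that entering "z caron" finds nothing.
That's because the proper and full name of this character in the Unicode database is LATIN SMALL LETTER Z WITH CARON,
and "z caron" is not a substring of the full name.
You should enter a fragment that does occur in the full name, such as "z with caron".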
You will then get (among other information, such as its full name) the Unicode number of this character, namely 382.
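The same kind of lookup can be approximated offline. This is only a sketch of a case-insensitive exact-substring search, using Python's standard unicodedata module rather than the Letter Database itself; the function name search_names and its limit parameter are assumptions made for the example.

```python
import unicodedata


def search_names(fragment, limit=0x10000):
    """Case-insensitive exact-substring search over Unicode character names.

    Returns (number, name) pairs for code points below `limit`.
    """
    needle = fragment.upper()
    hits = []
    for code in range(limit):
        name = unicodedata.name(chr(code), "")  # "" for unnamed code points
        if needle in name:
            hits.append((code, name))
    return hits


# "z caron" is not a substring of any name among the Latin letters.
print(search_names("z caron", limit=0x250))  # []

# A fragment of the full name does match.
for code, name in search_names("small letter z with caron", limit=0x250):
    print(code, name)
```

The second search also lists related letters whose names contain the same fragment, which is a side effect of plain substring matching.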
If you know the Unicode number, use the lower entry box to get the name of a character (e.g. 8364 for euro or 382 for zh).
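The reverse lookup, from number to name, can be sketched the same way with Python's unicodedata module; the helper name name_of is hypothetical.

```python
import unicodedata


def name_of(number: int) -> str:
    """Return the Unicode name of the character with the given number."""
    return unicodedata.name(chr(number))


for number in (382, 8364):
    print(number, name_of(number))
# 382 LATIN SMALL LETTER Z WITH CARON
# 8364 EURO SIGN
```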
© Copyright 2004-2005 Hedgehog Holding s.r.o.