Chinese characters are not some kind of alphabet. It's like an intermediate lang...

lolinder · on Oct 28, 2024

> In fact many would argue Chinese languages were never unified (mandarin/cantonese/etc) but the scripts were.

This is, in fact, the default stance held by most non-CCP linguists. If you read what experts in the Chinese language family say, it's basically "Chinese languages are mutually unintelligible and more distinct than the Romance languages, but because the government of China says they're just dialects and we (as linguists) recognize that the line between dialect and language is basically arbitrary, we'll call them dialects so we can just study the languages and avoid getting sucked into nasty political discussions."

As the saying goes, a language is a dialect with an army and a navy—and this works both to define distinct languages that are otherwise mutually intelligible and to merge dialects that aren't.

niceguy1827 · on Oct 29, 2024

> In fact many would argue Chinese languages were never unified (mandarin/cantonese/etc) but the scripts were.

This is the correct understanding, even within mainland China, and across all times. The practice of assigning a "mandarin" based on where the capital is/was dates back at least hundreds of years, if not thousands. You can easily Google "Mandarin in Republic of China" to see the Republic of China's attempt to standardize its mandarin. It's really not a CCP issue.

int_19h · on Oct 28, 2024

Linguists often use terms like "isolect" these days just to dodge this whole debate and the associated (often very toxic) politics. Not just with Chinese - it's also an issue around e.g. Serbo-Croatian.

GranularRecipe · on Oct 28, 2024

I think it's also an issue of translation. "Dialect" in English is a variety of a language. 方言 (fangyan, translated as dialect) in Chinese is a regional language or speech. Linguistically, Cantonese and other varieties of Chinese are recognised to be part of the Chinese language group / family.

Cf https://baike.baidu.com/item/%E6%B1%89%E8%AF%AD/22488993?fro...

djtango · on Oct 28, 2024

I'm sure there are plenty of scholarly academics with a whole body of literature to argue otherwise, but I've never bought all the arguments of how different Cantonese and Mandarin are.

To my mind assimilating Cantonese from Mandarin or vice versa is way easier than French <> Italian or Italian <> Spanish. Spanish <> Portuguese is an interesting contender.

Good luck pretending Mandarin and Cantonese are distinct languages while comparing German and French lol.

I say this as someone with a whole lot of Cantonese DNA in my heritage, before people get all up in arms. I've personally always figured the barriers to learning both Mandarin and Cantonese were cultural, and there are plenty of people in Guangzhou who are perfectly bilingual.

broken_clock · on Oct 28, 2024

How many of these languages do you speak?

I currently speak and understand English, Spanish, Cantonese and Mandarin to varying degrees.

I was forced to take French for 2 years in high school. Never even took it seriously. But because of that, after 6 weeks of private Spanish tutoring, I was able to hold hour+ long conversations with strangers while backpacking LatAm.

I've spoken Cantonese my entire life (but not truly native level). I took an entire year of college level accelerated "Mandarin for Other Chinese Language Speakers". I took it quite seriously. I'm backpacking China right now. I still can't even talk to anyone for more than a couple minutes without having to use a translator or look up words.

> there are plenty of people in Guangzhou who are perfectly bilingual

There are also plenty who move to Guangzhou and Shenzhen and can't pick up Cantonese at all. Turns out having an authoritarian government force Mandarin on you will make the Cantonese speakers bilingual rather quickly.

djtango · on Oct 28, 2024

I have learned, to varying degrees, Mandarin, Teochew, Cantonese, Japanese, Spanish, French and German (and Latin lol). I was shocked at how intuitive Italian was when simply walking around Italy after having a grounding in so many adjacent languages (and learning classical music).

Admittedly I am atypical in my exposure to languages and I do enjoy linguistics but it seems to me there's a high initial barrier to the dialects but after the initial wall is overcome it just becomes a mapping exercise and a handful of idioms.

I'd be curious to know which bits of Mandarin you find difficult? Vocab? The grammar is close enough that you have a huge advantage over speakers of almost every other language in the world, especially for the everyday stuff. As for reading and writing, if you know traditional you'll pick up simplified in no time (speaking from experience backpacking through China armed with only a paper dictionary; we didn't have smartphones back in my day). The Cantonese tones are quite wild, but if you can do tones you have a huge advantage over speakers of languages which don't have tones.

If I'm allowed an uncharitable take, my experience has been that a lot of people from China don't feel a drive to learn more languages maybe with the sole exception of English. Maybe it's the result of being in a country of a billion+ that all ostensibly speak the same language. I've always found it so frustrating encountering people who move to the UK to study and they can barely hold a conversation in English despite doing A levels, Bachelor's and Masters in the UK sometimes. For all the complaints that dialects are hard a lot of south east Asian people back in the day would pick up a handful of them and often learn the basics of other languages like Bahasa. This kind of mindset and interaction reminds me more of Europe in the sense that people are more adaptable out of necessity

cyberax · on Oct 28, 2024

> Chinese characters are not some kind of alphabet.

Yes, they are. Modern Hanzi are a very bad phonetic alphabet.

While a minority of characters are indeed pure logograms (小，大，田，etc.), most modern Chinese words are two-syllabic. And syllables often don't have meaningful connection to the meaning of the word: 东西（"east-west" literally, but means "a thing, object"), some characters have lost _any_ semantic meaning in most words (“子”), and many more characters can only be used as a part of another word ("bound forms", e.g. "据").

Classical Chinese was more logographic and less phonetic, but modern Chinese is not really close to it anymore.

est · on Oct 28, 2024

> Modern Hanzi are a very bad phonetic alphabet

Alphabets universally have one common property: they are sortable.

I challenge you to sort Chinese characters.

This is an idea from James Gleick's The Information. The Chinese may never be able to invent morse code alone, because encoding Chinese scripts is extremely hard, even today (think of all those massive code-points in CJK Unicode, with dups and errors)

Chinese text on the Internet may have some emulation of phonemes, but it's never systematically standardized. It just borrows some aspects here or there.

vntok · on Oct 28, 2024

> I challenge you to sort Chinese characters.

Chinese characters are in fact definitely sortable. There are multiple keys, the most popular ones being by stroke or by sound.

Example: https://en.m.wikipedia.org/wiki/Stroke-based_sorting
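To make "sortable by stroke" concrete, here is a minimal Python sketch. The `STROKES` table is hand-entered for six characters and is purely illustrative, and breaking ties by Unicode code point is an assumption of this sketch; real stroke-based orders use more elaborate tie-breaking rules.

```python
# Illustrative, hand-entered stroke counts for a few characters (not a real database).
STROKES = {"一": 1, "人": 2, "大": 3, "小": 3, "田": 5, "西": 6}

def stroke_key(ch):
    # Primary key: stroke count. Tie-breaker: code point (an assumption of this
    # sketch, chosen only because it is simple and stable).
    return (STROKES[ch], ord(ch))

chars = ["西", "田", "小", "大", "人", "一"]
by_stroke = sorted(chars, key=stroke_key)
print(by_stroke)  # ['一', '人', '大', '小', '田', '西']
```

Any function that maps a character to a comparable key gives a total order this way; the hard part in practice is agreeing on the key, not doing the sort.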

unscaled · on Oct 28, 2024

Chinese dictionaries have been sorted in various ways for at least two millennia, but there are some aspects which make alphabetic order sorting simpler:

1. Less ambiguous order: With classic Kangxi radicals for example, it's not always clear which radical to pick, and there is no clear order when there are multiple characters with the same radical and stroke count. There are other, more modern systems out there, but they all have some ambiguities.

2. Phonetic lookup: If you hear a word and don't know how to write it, you can just try to look it up phonetically. Unless the writing system is extremely perverse (I'm looking at you Ongloti, er, I mean English), you can kinda guess how it's written or at least how it starts. With Chinese characters that is not possible. Sure, Chinese dictionaries often have Pinyin or Zhuyin (Bopomofo) indexes, but Pinyin and Zhuyin are alphabets.

est · on Oct 28, 2024

good luck dealing with duplicates and hand-written variants.

tsimionescu · on Oct 28, 2024

That's a problem in most alphabets as well. Several Latin letters (and the number symbols we use as well) differ significantly between their printed and handwritten versions, and several handwritten variants are in circulation (g and z have some of the most variations).

freilanzer · on Oct 28, 2024

> alphabets, universally have one common property: they are sortable.

Isn't this just an arbitrary order? Why could I not assign numbers to chinese characters and sort them? I know next to nothing about Chinese.

seanc · on Oct 28, 2024

The sort order of the alphabet symbols is arbitrary, but since all of the words are composed of an ordered set of symbols then sorting the words relative to one another is trivial.
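A small Python sketch of that point: pick any order for the symbols, and words sort lexicographically for free. The reversed-alphabet `ORDER` below is a made-up example, chosen only to show that the symbol order really is arbitrary.

```python
# A made-up symbol order: the Latin alphabet reversed.
ORDER = "zyxwvutsrqponmlkjihgfedcba"
RANK = {ch: i for i, ch in enumerate(ORDER)}

def word_key(word):
    # A word becomes a list of symbol ranks; Python compares lists element by
    # element, and a shorter list sorts before a longer one it is a prefix of.
    return [RANK[ch] for ch in word]

words = ["cab", "abc", "ca", "bca"]
sorted_words = sorted(words, key=word_key)
print(sorted_words)  # ['ca', 'cab', 'bca', 'abc']
```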

est · on Oct 28, 2024

> Isn't this just an arbitrary order

Yeah, but there is a very limited number of alphabetical letters and a commonly agreed order as a convention.

There's no such thing in Chinese. For example, you can't easily sort names A-Z in Chinese except by Pinyin (or by Unicode code points, for that matter).
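For what it's worth, both of those fallbacks are easy to try in Python. This is a rough sketch: the three surnames and the toneless `PINYIN` table are hand-written for illustration, and a real application would need a full pronunciation dictionary plus a policy for characters with multiple readings.

```python
names = ["王", "李", "张"]

# Python compares strings code point by code point, so a plain sort
# is a sort by Unicode code point.
by_codepoint = sorted(names)

# Hand-written, toneless Pinyin for these three names only (illustrative).
PINYIN = {"王": "wang", "李": "li", "张": "zhang"}
by_pinyin = sorted(names, key=lambda name: PINYIN[name])

print(by_codepoint)  # ['张', '李', '王']
print(by_pinyin)     # ['李', '王', '张']
```

The two orders disagree, which is the point: the code-point order is stable but tells a reader nothing, while the Pinyin order is only as good as the pronunciation table behind it.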

fwip · on Oct 28, 2024

Dictionaries written in Chinese exist. They are in a sorted order, just like English dictionaries, so users can quickly look up the word they have in mind.

https://en.wikipedia.org/wiki/Chinese_character_orders

est · on Oct 29, 2024

The thing is, it's sorted only after Pinyin was invented, which sorta proves the point.

You can't easily compile an encoding out of it, but for alphabets it's intuitive to invent an index for each letter into dash-dots as morse code. It's extremely difficult to do so for Chinese.
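As a sketch of why this is easy for an alphabet, the whole International Morse table for the 26 Latin letters fits in one small Python dictionary. The `encode` helper is a hypothetical name for this example; it handles letters only and ignores digits and punctuation.

```python
# International Morse code for the 26 Latin letters.
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}

def encode(text):
    # Letters only; one space between the codes of successive letters.
    return " ".join(MORSE[ch] for ch in text.upper())

message = encode("sos")
print(message)  # ... --- ...
```

A table of 26 entries can be memorized by an operator; a table with thousands of entries cannot, which is the practical difficulty being described.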

Back to the topic: OP talks about "character amnesia". If you think of Chinese characters as emoji, then yeah, you can talk about actions represented in emoji while forgetting exactly how they are drawn. You can't sort emoji, and emojis don't generally have a sound.

unscaled · on Oct 28, 2024

Alphabet is a very specific thing: it's a small set of letters (usually less than 30) where each letter usually represents a single phoneme.

Sometimes a letter might represent a phoneme cluster (such as the letters "x" and "j" in English, that usually represent the consonant clusters /ks/ and /dʒ/ respectively). Sometimes there might be some ambiguity, like two letters being used for the same sound (both "c" and "k" can produce the sound /k/ in English) or one letter having two different pronunciations ("c" can be pronounced as either /k/ or /s/).

What distinguishes alphabets from all other similar written systems is that a single letter cannot represent a combination of a consonant and a vowel and that vowels can be independently represented by letters.

Other similar scripts are Abjad (like ancient Hebrew), where letters only represent consonants and vowels are implied from the context. The Ancient Hebrew script (which is different than the square Aramaic alphabet used to write Hebrew after circa 300 BC) is a later variant of the Proto-Canaanite script, an abjad which served as a basis to all later European alphabets (Etruscan, Greek, Latin, Runic and Cyrillic) and other Near Eastern alphabets (such as Aramaic, Arabic and Syriac). The only pre-modern alphabet (or abjad or abugida) I'm aware of that is not derived from Proto-Canaanite is Hangul (which is a true alphabet, unlike the two Japanese Kana).

Modern Hebrew and Arabic are mixed-alphabets, since some vowels can be represented by consonants, but not all of the vowels, and the letters that represent a vowel leave some ambiguity with regards to which vowels they represent (or whether they represent a vowel or a consonant).

The next type of similar system is abugida, which covers most of the Ethiopian, South Asian and South-East Asian scripts (Ge'ez, Devanagari, Tamil, Tibetan, Thai, Burmese, Khmer and many more). These are all probably derived from the Aramaic alphabet. In abugidas most letters represent a consonant that comes with default vowel (e.g. क in Devanagari used to write Hindi represents /ka/), but there are special diacritics that can modify a letter to have a different vowel (e.g. कॆ represents /ke/ in Devanagari) or even insert extra consonants or glides before the vowel. These combined forms together with the diacritics can get fairly irregular (especially in Ge'ez) and consonant clusters can become quite unwieldy and then about 80% of the consonants would just get dropped in Tibetan. But that's the general idea.
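The consonant-plus-vowel-sign structure is visible in how Unicode encodes Devanagari. A minimal Python sketch using the standard `unicodedata` module: the choice of U+0947 (the vowel sign E usual in Hindi) is an assumption made for this demonstration, and other vowel signs attach to the consonant in the same way.

```python
import unicodedata

KA = "\u0915"      # consonant letter, read /ka/ with its inherent vowel
SIGN_E = "\u0947"  # dependent vowel sign that replaces the inherent vowel
ke = KA + SIGN_E   # one written syllable, stored as two code points

names = [unicodedata.name(ch) for ch in ke]
for ch, name in zip(ke, names):
    print(f"U+{ord(ch):04X} {name}")
```

The vowel sign has no independent existence in running text; it only modifies the consonant it follows, which is what separates an abugida from an alphabet with free-standing vowel letters.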

Then you've got syllabaries: these are pretty straightforward systems, where every letter represents a combination of a consonant (or a consonant cluster) and a vowel (sometimes a diphthong or a vowel with a glide). These scripts require you to remember more letters, but the combinations are simpler and more regular than most alphabets (let alone abjads and abugidas). This is the kind of writing system you see getting developed independently more often than others: Linear B, Japanese Kana, Cherokee, Vai, Yi.

Chinese characters are none of these. Characters never represent a single consonant or a stand-alone vowel that can combine with another consonant. In fact, bar a few exceptions (such as 儿 in Mandarin), every character represents a full syllable and does not combine to form a syllable. But Chinese characters are not syllabaries either, since there are many characters that can be used to write each sound and they are not interchangeable with each other. A specific character has to be used based on the meaning of the word. This is how logographic writing systems work, and modern Chinese is a logographic language par excellence.

To appreciate that you have to compare Chinese characters with other logographic languages. Let's take Akkadian cuneiform (the writing system used for writing Babylonian and Assyrian) for example.

Cuneiform was first developed to write Sumerian, but this language was mostly dead by the time of Hammurabi (18th century BC), and it was a far-gone relic during the heyday of the Neo-Babylonian Empire of Nebuchadnezzar II. The Akkadians (i.e. the various Eastern Semitic language speakers of Mesopotamia) needed to write their own language with characters that represented Sumerian concepts, and they used the same methods modern Chinese (or Japanese) speakers use today: using a single Sumerian logogram in its own original meaning (but Akkadian pronunciation), transcribing a word using syllables that represent different words with the same sounds and combining multiple logograms to form a new meaning. Like Japanese (but unlike Chinese), Akkadian cuneiform characters can represent a multi-syllable word and multiple logograms can combine into a new word with a completely different (and unexpected) pronunciation. Akkadian also commonly uses logograms as word classifiers (e.g. to indicate geographical locations, gender, type of object and many other things[1]). These classifiers were written, but rarely (if ever?) pronounced.

Egyptian hieroglyphs, which I am even less familiar with than cuneiform, seem to have a far more developed system of classification (determinatives). They also seem to exhibit combinations of logograms to denote new meanings and phonetic writing from a very early stage. In fact, Egyptian hieroglyphs, the quintessential "pictographic" script in contemporary imagination, are mostly phonetic. Each hieroglyph generally represents a cluster of 1-3 consonants, which probably came from the original pronunciation of the word it represented.

But this is like an abjad! And yes, the Proto-Canaanite abjad probably originated in a simplification of Egyptian hieroglyphs. And like abjads, which developed into mixed scripts (like Modern Hebrew and Arabic) and developed optional diacritics, Egyptian hieroglyphs also needed a method to disambiguate the multitude of similar-sounding words. And for that reason most phonetic Egyptian words (as far as I know) are accompanied by a logographic determinative [2] (classifier) that signifies whether it's the name of a god, a city, a house, a lotus flower, a lotus bud, another part of the lotus (stem, stalk or rhizome) or fox skins. Yeah, these classifiers get rather specific. [3]

No system out there (including Japanese Kanji) is exactly like Chinese characters as used in Mandarin Chinese and other modern Chinese languages, but what I want to show here is that even though Modern Chinese is quite different from Classical Chinese, the writing system is still logographic. All logographic systems (including classical Chinese) have some phonetic features, at the very least in order to account for words that have no agreed-upon logogram. But what makes them logographic, is the pervasive use of logograms in a semantic role to disambiguate meanings.

[1] https://sumerianlanguage.tumblr.com/post/167245277900/hi-i-w...

[2] https://www.ucl.ac.uk/museums-static/digitalegypt/writing/sy...

[3] http://web.ff.cuni.cz/ustavy/egyptologie/pdf/Gardiner_signli...

dehrmann · on Oct 28, 2024

> Chinese characters are not some kind of alphabet. It's like an intermediate language (IL) of mind.

I realized this in Taiwan when I started being able to recognize characters, know what it means in English, and have absolutely no idea what the word is in Mandarin. The written language is almost orthogonal to the spoken one.

saithound · on Oct 28, 2024

> The written language is almost orthogonal to the spoken one.

I'm almost certain that this is true of Chinese script (after all, it was and is used for writing many languages!), but it might not be deducible based on this sort of experience.

I say this because I had a very similar experience after I had to spend a month in the UAE. Thanks to frequent bilingual signs, I started recognizing common Arabic words, but I had no idea what the words are in Arabic or how to say them. But as far as I know, written Arabic is not at all orthogonal to spoken Arabic; every word is written exactly as it sounds.

cjohnson318 · on Oct 28, 2024

Arabic is significantly more difficult for English speakers than a Romance language, but you're still able to draw a straight line from symbols to sounds. Once you learn the alphabet, sounding words out in your head isn't difficult. (Naturally, you will not sound like a native speaker for a long time.)

sandbach · on Oct 29, 2024

It's true that Chinese words don't inflect, but not all the grammatical categories you list are missing. There are aspect markers like 了 and 正在, and nouns are definite or indefinite even if they're not marked as such by articles: 有 can only have an indefinite object, for example.

djtango · on Oct 28, 2024

> which makes the scripts very fast to parse

Yes my wife is bilingual and she thinks in English but prefers reading in Chinese because it's more terse