Linguistics <- StackExchange top 100

1: What’s the difference between phonetics and phonology? (score 397050 in 2011)

Question

Having practiced armchair linguistics for some years I should be able to sum up the difference off the top of my head, yet often I don’t know which term to use.

And looking them up on Wikipedia doesn’t help a lot…

Wikipedia on phonology:

Phonology is, broadly speaking, the subdiscipline of linguistics concerned with “the sounds of language”.

Wikipedia on phonetics:

Phonetics is a branch of linguistics that comprises the study of the sounds of human speech.

Can it be that the difference is that phonology deals with language sounds and phonetics deals with human speech sounds? And if so, well what does that mean?

Answer accepted (score 36)

Phonetics is about the physical aspect of sounds: it studies the production and perception of speech sounds, called phones. Phonetics has some subcategories, but if not otherwise specified, we usually mean “articulatory phonetics”: that is, “the study of the production of speech sounds by the articulatory and vocal tract by the speaker”. Phonetic transcriptions are written in square brackets, [ ].

Phonology is about the abstract aspect of sounds: it studies phonemes (phonemic transcriptions are written between slashes, / /). Phonology is about establishing what the phonemes of a given language are, i.e. those sounds that can bring about a difference in meaning between two words. A phoneme is a phonic segment with a meaning-distinguishing value, as seen in minimal pairs:

  1. bat - pat
  2. had - hat

Or in Italian:

  1. pèsca -> /ɛ/ = Peach (the fruit)
  2. pésca -> /e/ = Fishing (the activity)
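
The minimal-pair test described above is mechanical enough to sketch in code. This is a toy illustration (the five-word lexicon and its segmentations are invented for the example): two words form a minimal pair if their transcriptions have the same length and differ in exactly one segment.

```python
# Toy minimal-pair finder. Transcriptions are lists of phoneme symbols;
# two words form a minimal pair if their transcriptions have the same
# length and differ in exactly one position.
LEXICON = {
    "bat": ["b", "æ", "t"],
    "pat": ["p", "æ", "t"],
    "had": ["h", "æ", "d"],
    "hat": ["h", "æ", "t"],
    "cat": ["k", "æ", "t"],
}

def is_minimal_pair(seg_a, seg_b):
    """True if the two segment sequences differ in exactly one position."""
    if len(seg_a) != len(seg_b):
        return False
    return sum(a != b for a, b in zip(seg_a, seg_b)) == 1

def minimal_pairs(lexicon):
    """All minimal pairs in the lexicon, each listed once."""
    words = sorted(lexicon)
    return [(w1, w2)
            for i, w1 in enumerate(words)
            for w2 in words[i + 1:]
            if is_minimal_pair(lexicon[w1], lexicon[w2])]

# ('bat', 'pat') evidences the /b/-/p/ contrast; ('had', 'hat') the /d/-/t/ one
print(minimal_pairs(LEXICON))
```

Finding such pairs is how one argues that two phones are separate phonemes rather than allophones.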

Answer 2 (score 22)

My advisor, Dennis Preston, used to tell students that the ear hears phonetics, but the brain hears phonology. That is, your ear is capable of processing whatever linguistic sounds are given to it (assuming someone with normal hearing), but your language experience causes your brain to filter out only those sound patterns that are important to your language(s).

Of course, this summary simplifies things considerably. Phonologists are often as interested in patterns related to the manner of articulation as they are in the patterns of the speech waves. Phoneticians, meanwhile, would have no way to analyze their data sets if they didn’t have phonological categories to help organize them.

Generally, phonetics is the study of the fine-grained details of speech sounds, while phonology has traditionally dealt with analysis at greater levels of abstraction. For understandable reasons, the line between the two disciplines is blurring, particularly as our modeling capabilities become more sophisticated. Still, the distinction is useful.

Answer 3 (score 15)

I think the big difficulty with the phonetics-phonology divide is not only that linguists don’t really agree on the difference, but also that there is no good analogy with any other pair of subfields.

This is the way I’ve seen it (cards on the table, although there are more extreme folks, I’m fairly far on the “phonology doesn’t exist” camp, and that is probably influencing my answers)…

Phonology is the study of the cognitive processes that turn words into instructions to hand down to the physical body parts that produce the sounds. These instructions, personified into human commands, might sound like, “close your lips, now move your tongue to touch your alveolar ridge; begin lowering the diaphragm at a normal rate and constrict the vocal cords to this degree”. On the acoustic side, phonology’s role is much harder to specify (at least to me), but I would say that the “phonology” center takes in sequences/matrices of interpreted linguistic features, for example “between 442-488ms, palatalization level 2”. Phonology would then turn that into the abstract “underlying” representations that can be mapped to morphological parsers and the lexicon.

Phonetics is the study of how the “commands” end up translating into specific articulator and vocal tract movements. For instance, how the command to retract the tongue at some particular time “really” maps to minute physical details like exactly when tongue section X touches mouth section Y and then in turn how that affects parts of the resultant acoustic signal. Phonetics also makes observations of how certain groups of instructions can cause very specific consequences. On the acoustic side, phonetics turns the mental spectrogram we receive from the nerve endings in our cochleas into feature sets and timings of the sort that it received from the phonological center during articulation.

Articulatory phonology is an attempt to consolidate the two; as far as I can tell, it is basically phonetics taken one level deeper so that it receives underlying segments as inputs. Articulatory phonology also moves much of what phonology proper treated as cognitive processes into physically motivated processes during articulation.

In short, nobody really knows the difference, but the broad agreement is that phonetics is lower-level and more articulator-centric and phonology is higher-level and more cognition-centric.

2: What’s the difference between syntax and grammar? (score 171722 in 2017)

Question

From what I’ve read, both terms have to do with the rules of formation of sentences. I’ve seen grammar used in mathematical contexts, in computability theory, where it has a precise definition. But from what I’ve read about syntax, I cannot see the difference between the two terms. So, what’s the difference? Or do they mean the same?

Crosspost at English.SE: What’s the difference between grammar and syntax?

Answer accepted (score 38)

Grammar is a (occasionally the) set of rules for the organization of meaningful elements into sentences; their economy, in one sense of that word.

There are two basic varieties of grammar; all languages have some of both kinds, but, depending on the kind of language involved, there’s a lot of variation in how much of each kind they have.

One part of grammar is called Morphology. It has to do with the internal economy of words. So a word like bookkeepers has four morphemes (book, keep, -er, -s) and is put together with morphology. English doesn’t have nearly as much morphology as most European languages; Russian grammar, for instance, has much more morphology than syntax. Russian is a synthetic (inflected) language.

The other part is called Syntax. It has to do with the external economy of words, including word order and agreement; compare the sentence For me to call her sister would be a bad idea with its syntactic transform It would be a bad idea for me to call her sister. That’s syntax. English grammar is mostly syntax. English is an analytic (uninflected) language.
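
The decomposition of bookkeepers into its four morphemes can be sketched as a toy suffix-stripper. The suffix and root lists below are invented for this single example, and real morphological analysis is far harder than this:

```python
# Toy morphological segmentation: repeatedly strip known suffixes from
# the right edge, then split the remaining stem against known roots.
SUFFIXES = ["s", "er"]          # tried outermost-first
ROOTS = {"book", "keep"}

def segment(word):
    """Return the morphemes of `word`, e.g. bookkeepers -> book+keep+er+s."""
    suffixes = []
    while True:
        for suf in SUFFIXES:
            if word.endswith(suf) and len(word) > len(suf):
                suffixes.insert(0, "-" + suf)   # innermost suffix ends up first
                word = word[: -len(suf)]
                break
        else:
            break   # no suffix matched; stop stripping
    # greedily split the remaining stem into known roots
    stems, i = [], 0
    while i < len(word):
        for root in sorted(ROOTS, key=len, reverse=True):
            if word.startswith(root, i):
                stems.append(root)
                i += len(root)
                break
        else:
            return None   # stem not analysable with this toy lexicon
    return stems + suffixes

print(segment("bookkeepers"))   # ['book', 'keep', '-er', '-s']
```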

Answer 2 (score 5)

English Grammar and Syntax defines the two as follows:

Grammar is a set of rules that set forth the correct standard of usage in a language. These rules dictate how we should say things correctly. For example, agreement between words in relation to other constructions in the sentence.

Syntax is the study of sentences and their structure, and the constructions within sentences. Syntax tells us what goes where in a sentence.

3: Examples of Linguistic Features? (score 84277 in 2015)

Question

I’m taking a course, “Introduction to Translation”, and I’ve read that one of the things a translator should do before translating a text is to identify its salient linguistic features. Can you please provide me with some examples of those?

Answer accepted (score 1)

Linguistic features is an extremely broad phrase. In the context of translating, the things I would take particular note of include:

  • register: a variety of a language used for a particular purpose or in a particular social setting, a sociolect, ethnolect, etc.
  • dialect, slang
  • jargon: “the technical terminology or characteristic idiom of a special activity or group”
  • tense/aspect: these may not have one-to-one mappings between languages
  • lexical aspect
  • alliteration
  • puns & double entendres
  • idiomatic expressions
  • cultural proverbs, e.g. a kotowaza in Japanese
  • assumed social knowledge that is evident in certain language use, like detecting or displaying sarcasm, irony, and more nuanced tones in writing
  • changes in script, orthography, use of capitals, italicization, small caps, quotes, underlining: for example, how do you represent all-capitals text from a language with two letter cases in a language that has only majuscules?

In my opinion, those are things that would be examples when translating. However, feature has a plethora of meanings in linguistics. In semantics, it could refer to a semantic class or category. In phonetics, say, a place of articulation. In morphology, a lexical category or inflectional category.

4: Syntax trees for sentences (score 76347 in 2012)

Question

I am having trouble drawing a syntax tree for 3 sentences and I would appreciate it if someone could help me.

The sentences are:

  1. This giraffe reads books about psychopharmacology.
  2. Monarchs will fly to Mexico.
  3. The computer said that a fatal error occurred.

I would also be thankful if you can explain to me how to draw them, because I do not understand them very well.

Answer accepted (score 7)

The sentences are parsed as follows:

  1. This giraffe reads books about psychopharmacology.

    [tree diagram omitted]
  2. Monarchs will fly to Mexico.

    [tree diagram omitted]
  3. The computer said that a fatal error occurred.

    [tree diagram omitted]

The trees were made with the site phpSyntaxTree.

What is a tree?

A tree is a mathematical object consisting of a set of points called nodes between which certain relations hold. The nodes correspond to syntactic units; left-right order on the page corresponds to temporal order of utterance between them; and upward connecting lines represent the relation ‘is an immediate subpart of’. Nodes are labeled to show categories of phrases and words, such as noun phrase (NP); preposition phrase (PP); and verb phrase (VP). (Scholz et al 2011)
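
The definition above (nodes, left-right order, and the ‘is an immediate subpart of’ relation) maps naturally onto nested tuples. This sketch encodes one plausible parse of sentence 2; the particular labels and structure are illustrative, not the trees from the answer’s images:

```python
# A syntax tree as nested tuples: (label, child1, child2, ...).
# Leaves are plain strings. One plausible parse of sentence 2.
tree = ("S",
        ("NP", ("N", "Monarchs")),
        ("VP",
         ("Aux", "will"),
         ("VP",
          ("V", "fly"),
          ("PP", ("P", "to"), ("NP", ("N", "Mexico"))))))

def leaves(node):
    """Left-to-right terminals, i.e. the words of the sentence."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in leaves(child)]

def bracketed(node):
    """Labeled-bracket notation, e.g. [S [NP ...] [VP ...]]."""
    if isinstance(node, str):
        return node
    return "[" + node[0] + " " + " ".join(bracketed(c) for c in node[1:]) + "]"

print(" ".join(leaves(tree)))   # Monarchs will fly to Mexico
print(bracketed(tree))
```

The labeled-bracket string produced here is exactly the input format that sites like phpSyntaxTree accept for drawing.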

Answer 2 (score 2)

In my opinion, and I’m not entirely sure about the PP and the DetP (it could be a DP that splits into ‘this’ and the NP ‘giraffe’), it should rather look like this (using Radford’s structure):

(I used http://mshang.ca/syntree/ to draw the tree.)

5: Does an IPA to ‘English’ translator exist? (score 73697 in 2013)

Question

IPA is really tricky to read, especially for beginners like me.

Are there any online tools that can almost ‘convert’ pasted IPA into phonetic pronunciations or similar?

I’ve tried Wolfram|Alpha which doesn’t seem to understand IPA, and a Google search wasn’t much help.

Any advice? Thanks.

Answer accepted (score 8)

I haven’t heard of the kind of program that you’ve described, but it might not matter, because there is no substitute for learning the IPA. You may find the following links to be helpful:

First, here are some links that have sound files to go with the IPA characters so that you can hear how they’re pronounced.

Consonants: http://www.yorku.ca/earmstro/ipa/consonants.html

Non-Pulmonic Consonants: http://www.yorku.ca/earmstro/ipa/nonpulmonics.html

Vowels: http://www.yorku.ca/earmstro/ipa/vowels.html

Diphthongs: http://www.yorku.ca/earmstro/ipa/diphthongs.html

Diacritics: http://www.yorku.ca/earmstro/ipa/diacritics.html

Other Symbols: http://www.yorku.ca/earmstro/ipa/othersymbols.html

Suprasegmentals: http://www.yorku.ca/earmstro/ipa/suprasegmentals.html

Second, there is at least one way of transcribing the IPA with ASCII characters, to wit:

http://www.let.rug.nl/~gilbers/onderwijs/tools/sampa.html

Third, here is an online IPA keyboard.

http://westonruter.github.com/ipa-chart/keyboard/

Answer 2 (score 3)

Preface: I don’t know of an online tool for this, and I agree that the real solution is to practice IPA.

That said, there are a number of alternative phonetic alphabets (as James Grossman mentioned, though SAMPA is probably worse than IPA). Some of them might be easier to read: ARPAbet (http://en.wikipedia.org/wiki/Arpabet) is relatively approachable, for example. However, it still requires some learning to be able to read quickly.

You’d also need either a dictionary containing both transcriptions to search for words in (extant words only), or an IPA-to-ARPAbet converter; the latter is actually nontrivial because the phones in an IPA string aren’t necessarily delimited, there’s not an exact correspondence between most phonetic alphabets, and IPA may be used for either phonemic or phonetic transcriptions.

Googling, I found this Praat script which appears to contain an IPA-to-ARPAbet conversion function (http://students.washington.edu/riebold/files/Arpabet%20Vowel%20Analyzer.praat); and this Haskell file which seems to contain ARPAbet-to-IPA (http://rd.slavepianos.org/sw/sw-83/Sound/SC3/Lang/Data/CMUdict.hs).
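
To see why an IPA-to-ARPAbet converter is nontrivial, here is a minimal greedy longest-match sketch. The symbol table is a tiny invented subset, not a complete correspondence, and real IPA input defeats this approach as soon as diacritics, tie bars, or ambiguous segmentations appear:

```python
# Toy IPA -> ARPAbet converter using greedy longest-match. Real IPA
# strings are not reliably segmentable this way (diacritics, tie bars,
# phonemic vs. phonetic usage), which is exactly why the task is hard.
IPA_TO_ARPABET = {
    "tʃ": "CH", "dʒ": "JH", "ʃ": "SH", "ŋ": "NG",
    "æ": "AE", "i": "IY", "u": "UW", "ə": "AH",
    "k": "K", "t": "T", "s": "S", "l": "L", "p": "P",
}

def ipa_to_arpabet(ipa):
    out, i = [], 0
    while i < len(ipa):
        # try the longest keys first so "tʃ" wins over plain "t"
        for key in sorted(IPA_TO_ARPABET, key=len, reverse=True):
            if ipa.startswith(key, i):
                out.append(IPA_TO_ARPABET[key])
                i += len(key)
                break
        else:
            raise ValueError(f"no mapping for {ipa[i]!r}")
    return " ".join(out)

print(ipa_to_arpabet("kætʃ"))   # "catch" -> K AE CH
```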

6: When should one use slashes or square brackets when transcribing in IPA? (score 65871 in 2011)

Question

When should one use /fubar/ and when [fubar] when transcribing in IPA? What are the differences?

Answer accepted (score 42)

Square brackets ([fubar]) are generally used for what is known as narrow transcription - this includes as much detail as the transcriber feels is necessary. Slashes (/fubar/) represent the broad transcription, which does not include “predictable” information.

For example, in English, voiceless plosives are aspirated word-initially and in stressed onsets. Thus, a narrow transcription of “cool” might be [kʰul], while a broad transcription would be /kul/. Similarly, “lack” could be represented as [læk] and /læk/ - note that broad /k/ can become narrow [k] or [kʰ], depending on its position in the word and surrounding sounds. Because this information is predictable by the above rule, the aspiration is left out of the broad transcription.

In linguistic description, the “broad” and “narrow” designations are defined somewhat loosely, generally according to whatever convention the linguist reporting the language feels is most useful. Under the Generative Phonology framework, however, these are assumed to represent two distinct stages of phonological processing, each with a psychological reality. Square brackets denote the final stage of processing (which is sent to the articulators), called “phonetic transcription”, while slashes denote the form stored in the mental lexicon (stripped of all predictable information), called “phonemic transcription”.

To perform this kind of analysis, first you must determine which sounds are contrastive in the output. For example, in English, [pʰ] and [p] are not contrastive, because [pʰæt] and [pæt] “pat” are judged to be “the same” word by (most) native speakers (even though [pæt] would be ill-formed). However, in Hindi, [kapʰi] “coffee” and [kapi] “copy” are two separate words. The generative hypothesis is that each set of non-contrastive sounds is stored as a single unit in the brain, called a phoneme (in slashes), which is transformed into a final form (in brackets) passed to the articulators by a series of serially-ordered rules or simultaneous constraints on the possible output forms.
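
The phoneme-to-phone step described by the generative hypothesis can be sketched as a single ordered rule. This toy covers only word-initial aspiration of /p t k/ (ignoring the stressed-onset cases) and illustrates the idea, not a real phonological grammar:

```python
# One allophonic rule for English: voiceless plosives /p t k/ are
# aspirated word-initially. Transcriptions are lists of segment symbols.
VOICELESS_PLOSIVES = {"p", "t", "k"}

def phonemic_to_phonetic(phonemes):
    """Map a broad (phonemic) transcription to a narrower (phonetic) one."""
    phones = list(phonemes)                      # don't mutate the input
    if phones and phones[0] in VOICELESS_PLOSIVES:
        phones[0] = phones[0] + "ʰ"              # add word-initial aspiration
    return phones

print(phonemic_to_phonetic(["k", "u", "l"]))   # /kul/ -> [kʰul]
print(phonemic_to_phonetic(["l", "æ", "k"]))   # /læk/ -> [læk], rule inapplicable
```

Because the rule is fully predictable, the aspiration never needs to be stored in the phonemic form, which is the whole point of the broad transcription.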

Answer 2 (score 38)

Yes, /fubar/ is typically used for phonemic transcriptions, and [fubar] for phonetic transcriptions. But, just to clarify the terminology, phonemic vs. phonetic is not necessarily the same thing as broad vs. narrow transcription.

Many linguists talk about using both broad and narrow phonetic transcriptions, which just refers to the level of detail used in representing the actual speech sounds. A narrow phonetic transcription would represent every tiny little characteristic of the speech sounds as they were produced in an utterance of Language X, while a broad phonetic transcription would indicate some of the most salient characteristics of the transcribed phones, without being exhaustive but also without necessarily making claims about which segments are phonemically/phonologically contrastive.

Phonemic transcription, on the other hand, can really only be broad, in that it only represents the sounds that are purported to be contrastive in the given language, without any detail that is not directly relevant to forming these contrasts. Phonemic transcription does not describe how an utterance actually sounds when produced by a particular speaker speaking in a particular style in a particular situation - phonemic transcription is the ‘idealised’ representation of the speech sounds, and supposed to represent the underlying contrasts that are meaningful to speakers. You can only do phonemic transcription when you have already done quite a bit of work on Language X (collected a range of lexical items, found some minimal pairs and/or worked out the patterns of allophonic variation, etc etc) because phonemic transcription implies that you/someone has decided which phones relate to contrastive phonemes (different sounds that can occur in exactly the same environment) and which phones occur as the result of free/conditioned variation.

The examples given by @Alek Storm above illustrate the differences between [phonetic] and /phonemic/ transcription - just remember that phonetic transcription can be either broad or narrow.

Answer 3 (score 15)

In addition to slashes and square brackets, sometimes also used are double-slashes, pipes, and angle brackets. Their uses are:

  • Angle brackets — ⟨cats⟩ — orthography

    Indicates a linguistic entity, like a word or grapheme, written according to a language’s orthography. Alternatively, the orthography is often given in italics or quotes, or simply not indicated.
  • Square brackets — [ˈkʰæʔt͡s] — phonetic transcription

    Indicates a transcription that records the phones (speech sounds) that are spoken, without attempting to classify them into phonemes. The amount of phonetic detail transcribed can vary; the example here is a fairly narrow (detailed) transcription. A broader transcription might be [ˈkʰæts].
  • Slashes — /ˈkæts/ — phonemic transcription

    Indicates a transcription that records only information asserted to be contrastive. This records a sequence of phonemes (as well as any phonemic suprasegmental features like stress or tone), ignoring allophonic differences. In this example, the fact that the initial /k/ is pronounced aspirated ([kʰ]) is ignored, because /k/ and /kʰ/ do not contrast in English.
  • Double-slashes or pipes — //ˈkæt z// or |ˈkæt z| — morphophonemic transcription

    Indicates a transcription that attempts to record the underlying sounds of morphemes, before they are combined to form words. It abstracts away from sound changes, conditioned by the phonetic environment, that apply in actual speech and may yield different phonemes. To illustrate: here the final sound is written ⟨z⟩ (rather than ⟨s⟩, as in its pronunciation) because it represents the English morphophoneme //z//, used as the plural marker. //z// can be realized as /z/ (as in pigs or kangaroos), /s/ (as in cats), or /ɪz/ (as in horses). The posited morphophoneme cannot be an actual phoneme, because English distinguishes /z/ /s/ /ɪz/ in other contexts (like whores, horse, horiz(ontal)). (The choice of ⟨z⟩ for the transcription of this morphophoneme is arbitrary, but makes sense because it is realized as /z/ in the absence of an adjacent voiceless segment.)
More examples
  • Morphophonological processes are especially prominent in French. Example phrase: On a laissé la fenêtre ouverte. ‘We left the window open.’:

    • orthographic ⟨On a laissé la fenêtre ouverte.⟩
    • phonetic [ɔ̃.na.le.se.laf.nɛː.tχu.vɛχt]
    • phonemic /ɔ̃naleselafnɛːtʁuvɛʁt/
    • morphophonemic //ɔn a les e la fənɛːtʁ uvɛʁt ə//
  • French petit ‘small’: orthographic ⟨petit⟩, phonetic [pɵt̪ʲi], phonemic /pəti/, morphophonemic //pətit//

  • Standard Russian vowel reduction

    In the standard (Moscow-based) Russian accent, five vowels /i e a o u/ are distinguished in stressed syllables, but at most three, /ɪ~ɨ ɐ~ə ʊ/, in unstressed syllables. /i e/ reduce to /ɪ/, /a o/ to /ɐ/, and /u/ to /ʊ/. Russian also has word-final obstruent devoicing and assimilation of voicing and palatalization. Russian orthography is roughly morphophonemic. Examples:

    • meaning: poppy, mage, poppies, mages, earth (nom. pl.), earth (gen. sg.), transport (infinitive), transports (present, 3sg)
    • orthographic ⟨ма́к⟩ ⟨ма́г⟩ ⟨ма́ки⟩ ⟨ма́ги⟩ ⟨зе́мли⟩ ⟨земли́⟩ ⟨вози́ть⟩ ⟨во́зит⟩
    • phonetic [ˈmak] [ˈmak] [ˈmakʲɪ] [ˈmaɡʲɪ] [ˈzʲe.mʲlʲɪ] [zʲɪˈmʲlʲi] [vɐˈzʲitʲ] [ˈvo.zʲɪt]
    • phonemic /ˈmak/ /ˈmak/ /ˈmakʲɪ/ /ˈmaɡʲɪ/ /ˈzʲe.mʲlʲɪ/ /zʲɪˈmʲlʲi/ /vɐˈzʲitʲ/ /ˈvo.zʲɪt/
    • morphophonemic //ˈmak// //ˈmaɡ// //ˈmak‿ʲi// //ˈmaɡ‿ʲi// //ˈzʲe.mlʲi// //zʲeˈmlʲi// //voˈzʲitʲ// //ˈvo.zʲit//
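
The realization of the English plural morphophoneme //z// as /s/, /z/, or /ɪz/, described above, can be sketched as a rule conditioned on the stem-final segment. The sibilant and voiceless sets below are abridged for the example:

```python
# English plural allomorphy: //z// surfaces as /ɪz/ after sibilants,
# /s/ after other voiceless segments, and /z/ elsewhere.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}

def plural_allomorph(final_segment):
    """Choose the surface form of //z// from the stem's final segment."""
    if final_segment in SIBILANTS:
        return "ɪz"
    if final_segment in VOICELESS:
        return "s"
    return "z"

print(plural_allomorph("t"))   # cat  + //z// -> /s/
print(plural_allomorph("ɡ"))   # pig  + //z// -> /z/
print(plural_allomorph("s"))   # horse + //z// -> /ɪz/
```

Because the choice is fully determined by the environment, only the single unit //z// needs to be stored for the plural marker.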

7: What is recursion? (score 63018 in 2013)

Question

What is recursion? I’ve looked at the Wikipedia’s explanation (recursion and then recursion in language) but that explanation is not really clear.

Answer accepted (score 5)

Recursion is a property of language.

From a Linguistics viewpoint, recursion can also be called nesting. As I’ve stated in this answer to what defines a language (third-last bullet point), recursion “is a phenomenon where a linguistic rule can be applied to the result of the application of the same rule.”

Let’s see an example of this. Consider the sentence:

Alex has a red car.

An application of recursion would give:

Alex, whom you know very well, has a red car.

And then:

Alex, whom you know very well, has a red car which is parked there.

And so on. This can go on endlessly, even if in real situations recursion stops at a certain point, since the idea being expressed would become too confusing. Recursion can also be applied to a noun and its adjectives:

Nice Alice.

And

Nice and cute Alice.

And again

Nice and cute Alice, sweet, gentle and considerate.

Answer 2 (score 1)

A definition which appeals to what is being defined is recursive.

For instance, a phrase structure rule for coordination of sentences which defines an S using S in the definition, S -> S “and” S (A sentence may consist of a sentence followed by “and” followed by a sentence), is recursive.

It is possible for a set of rules to be recursive even if no single one of the rules is recursive. For example, the set of rules S -> NP V; NP -> “that” S is recursive, because fully interpreting S requires interpreting NP, which in turn contains another S.

Similarly in programming, a procedure is recursive when among the procedures which are called to complete some computation is that very procedure itself.

In mathematics, a definition which appeals to the term being defined is recursive. For instance, “positive integer” can be defined as “1” or the successor of some positive integer.
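
The phrase structure rule S -> S “and” S can be mirrored directly by a recursive function, giving a toy sentence generator; the base clauses below are invented for the example:

```python
import random

# Mirror of the recursive phrase-structure rule  S -> S "and" S,
# with the base case  S -> a simple clause.
BASE_CLAUSES = ["it rains", "she sings", "we wait"]

def generate_s(depth, rng):
    """Expand S: at depth 0, or with probability 1/2, emit a base clause;
    otherwise apply the rule to its own output."""
    if depth == 0 or rng.random() < 0.5:
        return rng.choice(BASE_CLAUSES)
    return generate_s(depth - 1, rng) + " and " + generate_s(depth - 1, rng)

print(generate_s(3, random.Random(0)))
```

The `depth` bound plays the role of the real-world cutoff mentioned in the accepted answer: the grammar itself licenses unbounded nesting, but any actual utterance stops somewhere.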

8: The Origin of the Word ‘God’ (score 56061 in 2012)

Question

I originally posted this a while ago on my blog, but someone recently suggested that I pose it as a question here.

A brief Wikipedia search on the origin of the word ‘god’ reveals the following:

The earliest written form of the Germanic word god comes from the 6th century Christian Codex Argenteus. The English word itself is derived from the Proto-Germanic *ǥuđan. Most linguists agree that the reconstructed Proto-Indo-European form *ǵhu-tó-m was based on the root *ǵhau(ə)-, which meant either ‘to call’ or ‘to invoke’.

Google came up with this link which presents a survey of various sources that attempt to decipher the origin of the word. It begins with a short summary of its conclusion, which contains the following sentence:

The word God is a relatively new European invention, which was never used in any of the ancient Judaeo-Christian scripture manuscripts that were written in Hebrew, Aramaic, Greek or Latin.

I am befuddled by the fact that they seem to have overlooked a very clear source.

In Genesis 30:11 - we read:

וַתֹּאמֶר לֵאָה בגד  בָּא גָד

And Leah said, “gad has come.”

Targum [Pseudo-]Jonathan interprets:

וַאֲמַרַת לֵאָה אָתָא מַזְלָא טָבָא

And Leah said, “the good ‘mazal’ (astrological sign / luck) has come.”

I would assume that if ‘mazal’ = ‘gad’ then we could have a pretty good indication of where the word ‘god’ came from.

If this isn’t good enough, note the word גדא which is mentioned several times in the Talmud. See, for example, Hullin 40a:

אמר אביי לא קשיא הא דאמר להר הא דאמר לגדא דהר דיקא נמי דקתני דומיא דמיכאל שר הגדול

…Abbaye said, “It is not problematic, for this is when he said [he was slaughtering it] for the mountain and this is when he said it was for the gada (referring to some heavenly minister that some would serve as a god) of the mountain…”

It is quite clear from the context that גדא דהר means just that: god of the mountain!

Does anyone have any knowledge that would help in figuring out whether or not these words (גד and god) are actually related, as they seem to be?

Answer accepted (score 19)

Hebrew is an Afro-Asiatic language, whereas Proto-Germanic is an Indo-European language.

Both superethnic and cultural groups (Afro-Asiatics and Indo-Europeans) were connected through the history of trade, migrations, and the written alphabet. Afro-Asiatic cultures had trade routes throughout the Middle East, Orient, and Asia; and Indo-European cultures had both migratory and trade routes throughout the Orient, the Middle East and Europe; both groups shared these routes for thousands of years. The first written alphabet was Phoenician (Afro-Asiatic), and it was used as a model for the development of the Indo-European alphabets.

But we don’t know of any relation between these two language families: they seemed entirely unrelated even in our earliest sources. If there is any relation, it is shrouded in the mists of remote prehistory (although some people hypothesise a prehistoric relation). Barring very strong evidence to the contrary, any similarity between a Hebrew word and a Germanic word must therefore be ascribed to coincidence.


There are two theories about the origin of the Germanic word, which is still uncertain:

  1. It could be from Proto-Indo-European *ghau- “call, invoke”, as the entity that is invoked;

  2. Or it could come from Proto-Indo-European *gheu- “pour”, as the entity to which libations are offered.

Do you have any information about the origin of the Hebrew word?

Answer 2 (score 13)

Theological issues aside, I can see several wrong assumptions in your question.

“The word God is a relatively new European invention, which was never used in any of the ancient Judaeo-Christian scripture manuscripts that were written in Hebrew, Aramaic, Greek or Latin.”

It’s not entirely clear what you meant there. If you are talking about the English word “god”, then why would you expect to come across an English word in a Latin or Greek text? If you are talking about the concept of “god(s)”, then your assumption is inaccurate. It is a well-known fact that various Indo-European peoples had gods, attested by names of deities (Jupiter, Zeus, Perkunas etc.) and the word ‘god’, e.g. Latin ‘deus’, Vedic ‘devas’ etc. As a matter of fact, Father Sky is a very common deity in many IE cultures. A common motif is the main god fighting/killing some chthonic creature (a snake or a dragon).

Now about “Gad” and English “god”. The wrong assumption here is what Yuri Otkupschikov called “chronological scissors” (chronological disparity). You are comparing words that don’t belong to the same time period. Yes, in OE the word was “god” but looking at other Germanic languages you can’t help but notice that originally the root vowel was not “o” but rather “u”, e.g. have a look at this.

I’m not a Hebraist (nor a Semitologist) but here’s what I’ve been able to find about the origin of the Hebrew proper noun “Gad”. I strongly recommend looking it up at least in Klein 1987:

[scan of the Klein 1987 dictionary entry omitted]

A general remark on etymology:

“A generally accepted principle (advocated by Meillet) permits only comparisons which involve both sound and meaning together. Similarities in sound alone (for example, the presence of tonal systems in compared languages) or in meaning alone (for example, grammatical gender in the languages compared) are not reliable, since they often develop independently of genetic relationship, due to diffusion, accident and typological tendencies” (Campbell 2004: 356, emphasis mine - Alex B.)

Executive summary: I am very skeptical of your hypothesis/evidence and my answer is no.

9: Why is English classified as a Germanic rather than Romance language? (score 47331)

Question

I am not a linguist, and I know neither German nor French. The majority of English vocabulary is derived from Romance languages. Given these facts, I ask for a simple and convincing demonstration (using an example) that the “basic structure” of English is of Germanic, rather than Romance, origin.

Answer accepted (score -7)

Classification of languages is a historical thing, rather than a synchronic one. Just like genetic classification of humans—someone who marries into a new family and goes and lives with them is nonetheless still genetically related to the family they came from.

The majority of the total vocabulary in English may be borrowed rather than inherited, but the majority of the most common and basic vocabulary is inherited. This includes such things as numbers (all inherited), most pronouns (except they, which is Germanic, but not inherited), many basic non-administrative nouns (wood, name, stone, man, woman, ship, way, ox, hound, house, etc.), and many basic verbs (be, have, should, can, will, go, do, live, die, think, bear, etc.).

Much of this basic vocabulary is also among the most irregular in the language, which usually indicates it’s been around much longer. Words borrowed from other languages tend to be force-fitted into the borrowing language’s most regular morphology, while inherited words suffer no such restrictions.

If you go back to Old English from before the Vikings settled in England, you can clearly see a language (or several closely related languages, if you prefer) that has a high degree of resemblance to other West Germanic languages of the time, in almost every aspect. This (and the fact that this language can be reconstructed back to the common Proto-Germanic language that all Germanic languages go back to) is really the best direct indicator that English is genetically Germanic, rather than Romance. You won’t find a stage of English where it is almost identical to any stage of any Romance language.

Answer 3 (score 4)

If you don’t want to get into details of linguistics (which I take it you don’t) the best way to see the family resemblance is to take a comparative look at English’s closest linguistic relative found on mainland Europe: Frisian.

Some sample words in Frisian, English, Dutch, and German:

  • dei, day, dag, Tag
  • rein, rain, regen, Regen
  • wei, way, weg, Weg
  • neil, nail, nagel, Nagel

Frisian is of course indisputably a Germanic language, and just from the above it’s pretty clear both that these very basic words are all related, and that the Frisian variant looks far closer to the English than the other two.

For someone who has never learned other languages, it might be an easy mistake to think that vocabulary is all there is to a language. However, that would be wrong. There’s far, far more going on structurally in a language than simple word choice.

Delving into the murky waters of linguistics a bit more, we find that Germanic languages actually share a lot of pronunciation and structural features that are not found in Romance languages. Taking it further, West Germanic languages share features not found in North Germanic languages, and Anglo-Frisian languages share features not found in the other West Germanic languages. Based on all that, it’s fairly easy to classify English as Germanic, further as West Germanic, and further still as Anglo-Frisian.

10: Drawing tree diagrams of ambiguous sentences generated by a CFG (score 43610 in 2014)

Question

Suppose I have the following CFG rules:

S -> NP VP
NP -> (D) NOM
VP -> V (NP) (NP)
NOM -> N
NOM -> NOM PP
VP -> VP PP
PP -> P NP
X -> X+ CONJ X

How should I draw the tree diagram of this sentence?

Most cats and dogs with fleas live in the neighbourhood.

As I understand it, the sentence has two different interpretations, with one attributing fleas to both cats and dogs and the other only to dogs.

For the case of attributing fleas to only dogs I drew the following diagram:

[tree diagram]

Now for the second interpretation, I figured I should add the NOM substructure to the first NOM before CONJ:

[tree diagram]

This seems correct to me, because it reflects the fact that the attribute is equally applicable to both NOMs. But on the other hand, I am writing words on the added branch that aren’t there (they were deleted to avoid repetition), and the tree reads as a longer sentence.

Is my parsing correct, and if not, how should I fix it?

Answer accepted (score 2)

If you are looking for parse trees according to your grammar, the first tree seems correct, up to the missing N symbols, and a missing NOM above fleas.

However, your second diagram should look as follows, drawing only the relevant part that changes (i.e. under NOM_1):

            NOM 1
         __/   \___
        /          \
     NOM            PP
    / | \_         /  \
   /  |   \       |    |
NOM  CONJ  NOM    P    NP
 |     |    |     |    |
 N    and   N   with  NOM
 |          |          |
cats       dogs        N
                       |
                     fleas

Now, if you wanted to draw both trees in a single diagram, that would be possible, but significantly more complicated. But I doubt your book does anything of the kind.
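
The ambiguity can also be checked mechanically. Below is a small exhaustive-parser sketch in Python; the grammar is a hypothetical instantiation of the question’s schema (the optional (D) is expanded into two NP rules, and the X -> X+ CONJ X template is instantiated as a single ternary NOM rule), so while the category names match the question, the exact expansion is my assumption.

```python
from functools import lru_cache
from itertools import product

# Hypothetical expansion of the question's CFG schema: (D) is made optional
# via two NP rules, and X -> X+ CONJ X is instantiated for NOM only.
RULES = {
    "S":   [("NP", "VP")],
    "NP":  [("D", "NOM"), ("NOM",)],
    "NOM": [("N",), ("NOM", "PP"), ("NOM", "CONJ", "NOM")],
    "VP":  [("V",), ("VP", "PP")],
    "PP":  [("P", "NP")],
}
LEXICON = {
    "D": {"most", "the"},
    "N": {"cats", "dogs", "fleas", "neighbourhood"},
    "CONJ": {"and"}, "P": {"with", "in"}, "V": {"live"},
}
SENT = "most cats and dogs with fleas live in the neighbourhood".split()

def splits(i, j, n):
    """All ways to cut the span [i, j) into n non-empty adjacent spans."""
    if n == 1:
        yield ((i, j),)
        return
    for k in range(i + 1, j):
        for rest in splits(k, j, n - 1):
            yield ((i, k),) + rest

@lru_cache(maxsize=None)
def parses(sym, i, j):
    """All parse trees of SENT[i:j] rooted in sym, as nested tuples."""
    trees = []
    if j - i == 1 and SENT[i] in LEXICON.get(sym, ()):
        trees.append((sym, SENT[i]))          # preterminal over one word
    for rhs in RULES.get(sym, ()):
        for spans in splits(i, j, len(rhs)):
            kids = [parses(s, a, b) for s, (a, b) in zip(rhs, spans)]
            for combo in product(*kids):      # every combination of subtrees
                trees.append((sym,) + combo)
    return trees

trees = parses("S", 0, len(SENT))
print(len(trees))  # 2: "with fleas" scopes over the coordination, or over "dogs" only
```

Enumerating all parses yields exactly the two trees discussed in this answer: one with the PP “with fleas” adjoined above the coordination, and one with it inside the second conjunct.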

11: How to distinguish Korean “ㅔ” /e/ and “ㅐ” /ɛ/? (score 41840 in 2012)

Question

I’ve always had trouble with the distinction between the “e”-like vowels in European languages: /e/ vs /ɛ/. But pronouncing them the same has never caused me any problems.

In fact I don’t even know whether my English “short e” is /e/ vs /ɛ/. I seem to recall it varies by English variety, even for IPA use (I always use /e/ for English IPA). In my idiolect there may even be some kind of merger. “Head” is /hEd/ and “haired” is /hEːd/ where E could be either e or ɛ - I’m not sure.

So now I’m in Korea trying to improve my Korean. Up until now I had always pronounced “ㅐ” as /æ/ and “ㅔ” as whatever my English “short e” is.

But lately people have been correcting me and telling me “ㅐ” should be what to me sounds like “short e”.

Having done some reading I find Korean doesn’t have /æ/ as I’d thought, but has two contrasting vowels that would both fall into the “short e” category for my idiolect:

“ㅐ” is /ɛ/ and “ㅔ” is /e/.

How can I learn to distinguish these sounds correctly, both for listening and speaking?

If I learn it for Korean it will also help for my linguistics generally.

Are there some minimal pairs in Korean I can practice with my native Korean-speaking friends here? (It’s not easy trying to explain to non-linguists with imperfect English what minimal pairs are.)

Answer accepted (score 7)

There’s a book called The Sounds of Korean [1] with an accompanying CD which is invaluable for getting the phonetic distinctions right.

Mechanical snail is right in that the distinction is being lost, particularly among the young. However, in speakers who maintain the distinction, ㅐ sounds like a lowered [e]. I had a little look on Forvo for examples of speakers who maintain the distinction, but all the ones I checked were from people who merged it.

As for basic minimal pairs, explain to your friends that you’re interested in the difference between 새 (new) and 세 (combining form of 셋), and they should be able to come up with more.

EDIT: I have a few more minimal pairs for you.

  • 게 crab vs 개 dog
  • 세 집 three houses vs 새 집 new house

The book also observes that while the distinction is not reliably made by many speakers, there is consistency when transcribing English words: English [æ] is reliably mapped to ㅐ, while [eɪ] and [ɛ] are mapped to ㅔ.

[1] Choo and O’Grady. The Sounds of Korean. University of Hawaii Press.

Answer 2 (score 6)

Here are the minimal pairs of more than one syllable that I could find in the English Wiktionary using a custom application I wrote in JavaScript:

    1. 모레 (more)
      the day after tomorrow
    2. 모래 (morae)
      sand
    1. 새로 (saero)
      anew, newly, for the first time
    2. 세로 (sero)
      height, length, vertical

I also found twenty minimal pairs of just a single syllable that I’ll include if requested.
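
For anyone who wants to replicate this kind of search, here is a sketch in Python (the answer’s actual tool was in JavaScript and ran over Wiktionary data; the toy word list below is my own stand-in). It relies on the standard arithmetic decomposition of precomposed Hangul syllables.

```python
# Reconstruction sketch: find ㅐ/ㅔ minimal pairs in a word list.
# Precomposed Hangul syllables decompose arithmetically:
#   syllable = 0xAC00 + (lead * 21 + vowel) * 28 + tail
AE, E = 1, 5  # jamo vowel indices of ㅐ and ㅔ

def swap_vowel(word, frm=AE, to=E):
    """Replace vowel index `frm` with `to` in each Hangul syllable of `word`."""
    out = []
    for ch in word:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:  # precomposed syllable block U+AC00..U+D7A3
            lead, vowel, tail = code // 588, (code % 588) // 28, code % 28
            if vowel == frm:
                ch = chr(0xAC00 + (lead * 21 + to) * 28 + tail)
        out.append(ch)
    return "".join(out)

def ae_e_minimal_pairs(words):
    """Pairs in `words` distinguished only by ㅐ vs ㅔ (ㅐ-form listed first)."""
    wordset = set(words)
    return sorted((w, swap_vowel(w)) for w in wordset
                  if swap_vowel(w) != w and swap_vowel(w) in wordset)

toy_lexicon = ["개", "게", "새", "세", "모래", "모레", "나무"]
print(ae_e_minimal_pairs(toy_lexicon))  # [('개', '게'), ('모래', '모레'), ('새', '세')]
```

The same swap-and-look-up idea generalizes to any single-vowel contrast by changing the two jamo indices.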


I tried the “crab v dog” test mentioned by jogloran with some Koreans here in Seoul.

  • Two guys in their 20s who I think are from Seoul both insisted they sound the same.
  • Another friend who is about 40 and not from Seoul insisted they sound different. He pronounced “crab” with a short sound like in English “bet” and “dog” with a long sound like in non-rhotic English “bear”.

Apparently both the vowel length distinction and the ㅔ v ㅐ distinction are in the process of disappearing, and it’s happening in Seoul before elsewhere. This is pretty much just what Wikipedia says.

Answer 3 (score 2)

In many Korean dialects, there is no sound difference between ㅐ and ㅔ; almost every Korean pronounces them the same. Of course, the standard pronunciation rules in both South and North Korea don’t allow it.

12: Syntax Trees examples (score 39994 in )

Question

I have just started trying out syntax trees and realize that I have a few problems, especially with two examples where I am very unsure how to handle the cases. In case 1, I do not know how to deal with fixed terms such as “the church of England”, and in case 2, I don’t know how to deal with “the girl who left us”. These are my solutions. Would someone kindly help me understand? I would be very grateful!

Case 1

Case 2

Answer accepted (score 5)

Although what is “correct” always depends on theory, there are various things that are definitely not quite right with your trees.


Tree #1

the founder of the church of England

The whole thing taken together is an NP (it starts with a definite article and can serve as the subject of a sentence, so it is something nominal, not prepositional), so the root of the tree should be labelled NP rather than PP.

In general, an XP must always have an X as its head.
Thus, when there is an NP, there must be an N as the head, and for a PP, there is a P head. This principle is not always followed in your trees.

The same goes for NPs. Now I don’t know what theory you are using, because there are basically two opposing approaches:

  1. Make the whole thing an NP, i.e. a phrase with an N head to which the determiner is a specifier:
    [tree diagram]
    The head of the NP is the N “church”. The DP consisting of the D “the” is a specifier because it is the sister of N’ and daughter of NP.

  2. Make the whole thing a DP, i.e. a phrase with a D head to which the noun phrase is a complement:
    [tree diagram]
    The head of the DP is the D “the”. The complement of this D head is an NP which consists of the single N head “church”.

I will not go into a discussion of the motivations for each approach (nor into a discussion about whether you should leave out redundant bar levels), but you need to decide what your phrase and its head are supposed to be. Having an NP branching into a D and an N violates the X-bar scheme, because a phrase must have an identifiable head and cannot branch into two lexical items (D and N); one of them must be an X’ or an XP. Either you make it an NP with an N head and the DP as a specifier, or you make it a DP with a D head and the NP as a complement.
Assuming that you want to have the whole as an NP, I’ll continue with the first approach.

So a first rudimentary picture of your tree looks like this:
[tree diagram]

You can now argue about whether the PP “of the church of England” is an adjunct rather than a complement, but in this case I find the latter approach more plausible. So within N’, we have an N head “founder” and a PP complement “of the church of England”:
[tree diagram]

Now about the PP. As said above, the head of the PP must be a P of which the complement is an NP, thus:
[tree diagram]

The NP “the church of England” again branches into the determiner and the N’ “church of England”:
[tree diagram]

Within this N’, “church” is the head and “of England” is a PP complement to the N head “church”:
[tree diagram]

Again, you could also argue about making the PP “of England” an adjunction, but here too I find a complement more plausible.

The PP “of England” itself looks similar to the other PPs, with the difference that the NP “England” doesn’t have a DP specifier:
[tree diagram]

And now you are done with your tree.
The whole phrase is an NP, of which the head is the noun “founder”, and the PP “of the church of England” is a complement with a P head “of”. The determiner “the” is located in specifier position of the NP. The PP “of the church of England” itself branches into another PP, “of England”.


Tree #2

the brother of the girl who left us

I’ll keep my explanation a bit briefer here.

Similarly as above, you have an NP in which the N’ consists of the N head “brother” and a PP complement “of the girl who left us”:
[tree diagram]

Within the PP, the complement NP “the girl” is modified by adjunction of the relative clause “who left us”:
[tree diagram]

It is also possible to locate the relative clause as an adjunct to the N’ “girl” rather than the whole NP “the girl”:
[tree diagram]
For reasons that are too complicated to discuss here, I will assume adjunction to the NP rather than N’.

The difficult part now is how to handle the relative clause “who left us”. The assumption is the following:
Within the relative clause CP, the relative pronoun “who” is assumed to start in the subject position, i.e. in the specifier position of IP (SpecI), because the NP it refers to (“the girl”) is the subject of the sentence:
[tree diagram]

This NP pronoun is then moved to the specifier of CP (SpecC) to get into the position of a relative pronoun:
[tree diagram]

The moved pronoun leaves a trace (t_i) and is now located in SpecC position, where it serves as a relative pronoun referring to “girl”.

The tree as a whole thus looks as follows:
[tree diagram]

To summarize, the whole expression is an NP, where the head N “brother” has a PP complement “of the girl who …”, and within that PP complement, the NP “the girl” is modified by adjunction of a relative clause CP in which the NP “who” was moved from SpecI to SpecC to serve as a relative pronoun referring to “the girl”.


General remarks

  • My proposal is not the one and only gold-standard solution; there cannot be one. The details of what a tree looks like always depend on theory. In particular,

    • there are opposing views on how to account for determiner + noun (making it an NP, as I did, or a DP, with consequences for their internal structure)
    • whether to omit redundant bar levels (as I did),
    • how to label the relative clause (CP or S or …),
    • where to attach the relative clause (as an adjunct to the NP “the girl” or as an adjunct to the N’ “girl”),
    • whether the PPs act as complements or as adjuncts to the NPs.
    Which solution is deemed correct depends on what theory you are using.
  • You really should take another look at the basics of how phrase structure trees work. For example, having a VP with a P head, as you did in your second tree, makes absolutely no sense. It seems that some substantial assumptions about phrase structure trees are not quite clear to you yet.
    You must always make sure that the labels of your (sub)trees are in accordance with what is in the tree: a PP consists of a P and an NP complement; if you have an NP, then it must have an N as its head; and an expression “the brother of …” is certainly not a VP.
    Once you have gained a better understanding of how phrase structure trees work, what a phrase consists of and what relations hold between constituents, it will become far more obvious to you how to assign a sentence a tree structure.

Answer 2 (score 2)

The two sentences are NPs.

check this:

It can’t be a PP, because it clearly starts with an NP, and “of” cannot be separated from “the church”, because together they form a PP (on its own, ‘of’ would be meaningless, and you would ask: of what?)

“the Church of England” is an NP that contains a subordinate PP, which is [of England]

Answer 3 (score 1)

lemontree has given an excellent answer. I agree with her that you should take another look at the basics, and I think you’d benefit from a quick review of two major concepts: constituency and heads. I’ll give a quick summary of two concepts - please consider this a supplement to lemon’s answer. :)

Describing constituency is the main purpose of syntax trees. The subset of a tree which is dominated by a single node - such as ‘of the Church of England’, which is a PP in the final version of Tree #1 in lemon’s answer - is known as a constituent because it acts as a unit in syntax. We can look for evidence for constituency in various ways, for example:

Topicalisation: [Of the Church of England] he is the founder; of the church of France he is not.

Cleft: It is [of the Church of England] that he is the founder.

Pro-form substitution: I’m talking about the Church of England itself, not the founder [thereof].

Conjunction (warning: less reliable test, must always be used with other tests): He is the founder [of the Church of England] and [of KFC].

Whenever you introduce a node in a syntax tree, you should ensure that the material it dominates forms a constituent. In your tree, it’s not at all clear that ‘the founder of’ is a constituent.

Topicalisation: *[The founder of] he is the Church of England.

Cleft: *It is [the founder of] that he is the Church of England.

Pro-form substitution: *I’m talking about the head of the Church, not [some pro-form] the Church.

Conjunction: He is [the founder of] and [the head of] the Church. (See, I told you the conjunction test is less reliable :P)

That’s why ‘the founder of’ is not a constituent.

The other major concept that needs review is the head. The head of a phrase doesn’t have a precise, necessary-and-sufficient-conditions definition, but it does have a bundle of common properties that sometimes, though not always, coincide. In general, the head is the only element that always appears in the kind of phrase it heads (except in really special cases like ellipsis), it gets to ‘choose’ (subcategorise for) its dependents other than adjuncts, and it governs its dependents.

As you can see, identifying the head isn’t always straightforward, and that’s why linguists disagree on whether NP dominates DP or DP dominates NP, or whether complement clauses are CPs or Ss, or whether sentences are IPs or TPs or Ss or something else. Zwicky (1985) lists five different ‘head-like notions’:

  • Semantic argument: In a phrase X+Y, if X+Y describes the kind of thing described by X, then X is the head. For example, in the NP ‘the Church of England’, the whole phrase describes a church, so ‘Church’ is the head.
  • The subcategorisand: In a phrase, the slot that must be listed in the lexicon and subcategorises for its sisters is the head. In the NP ‘the Church of England’, ‘the’ is a functional morpheme and ‘of England’ is a phrase; ‘Church’ is definitely listed in the lexicon and is thus the head. (Zwicky actually gives a different analysis here that favours the DP hypothesis, but let’s not get into that….)
  • The morphosyntactic locus: The bit in constituent X that carries inflectional information about the grammatical relation between X and other constituents is the head. English being an isolating language, this isn’t very obvious in your sentences, but it picks out e.g. the word ‘is’ in ‘He is nice’.
  • Government: The head governs its dependents: the dependents get markers for appearing with the head. For example, in ‘They are nice’, the copula governs ‘they’, requiring a nominative form.
  • Concord: Dependents agree with the head: For example, in French ‘le nez’ (ART.DEF.M nose), ‘le’ agrees with ‘nez’ in being masculine, so ‘nez’ is the head.

In any event, though linguists frequently disagree on which constituent in a phrase is the head, one thing is clear: In an endocentric constituent, it is always the head that determines the syntactic category of the phrase it heads, and thus a P cannot head a VP. (Note, however, that not all theories require all phrases to be endocentric.)

13: Weird behavior of two fruits’ names (ananas/pineapple, banana/plátano) (score 39842 in 2011)

Question

Some time ago I found two tables reporting the names of two fruits. They were supposed to be funny because each highlighted a single exception: among the several languages listed, the fruit’s name was different in only one of them.

I cropped the images to avoid space problems:

[image: the fruit that is ananas in many languages is pineapple in English only]
[image: the fruit that is banana in many languages is plátano in Spanish only]

They make one smile, but if we analyse the matter from a linguistic point of view, it gets interesting and deserves a closer look.

So my questions are:

  1. Why did two languages, English and Spanish, develop an alternative word to denote those fruits?
  2. And why did just one language (per case) behave like this? I mean, why not also some other Romance language in one case, or some other Germanic one in the other?

N.B. I just asked one question because I think these two occurrences are related, but if someone finds out they are two different linguistic phenomena, I can split them up. Also, feel free to retag, if necessary.

Answer accepted (score 33)

I’m going to answer about just the words for “pineapple”. In short, you’re seeing two phenomena:

  • A new item is introduced to the world, so a new name is needed to go with it. Whichever name is used by the introducers gets adopted in many other languages. See coffee or computer for some other examples of this.
  • The dataset you have above only includes examples that show the author’s intended pattern and excludes examples that don’t fit.

Let’s start by looking at the dataset. Clearly there’s a bias towards ananas-type words over pineapple-type words, but we don’t know why. One possibility is that the collector of this dataset simply included only languages with ananas and discarded any examples of pineapple (apart from English).

So let’s get some data of our own. I’m using Google Translate to pull up words for pineapple in all the languages they provide. (With a few restrictions: there must be a result other than what I typed in English, and there must be a romanization available.)

  • Afrikaans: pynappel
  • Armenian: ark’ayakhndzor
  • Azerbaijani, Belarusian, Bulgarian, Croatian, Czech, Danish, Dutch, Finnish, French, German, Icelandic, Italian, Macedonian, Maltese, Norwegian, Polish, Romanian, Russian, Serbian, Slovenian, Swedish, Turkish, Ukrainian: ananas
  • Basque, Galician, Spanish: piña
  • Bengali: Ānārasa
  • Catalan, Filipino: pinya
  • Chinese: bōluó
  • English: pineapple
  • Estonian: ananass
  • Georgian: ananasi
  • Greek, Portuguese, Slovak: ananás
  • Gujarati: Anēnāsa
  • Haitian: anana
  • Hindi: Anannāsa
  • Hungarian: ananász
  • Indonesian, Malay: nanas
  • Irish: anann
  • Japanese: Painappuru
  • Kannada: Anānas haṇṇu
  • Korean: pain-aepeul
  • Latvian: ananāsu
  • Lithuanian: ananasas
  • Swahili: mananasi
  • Tamil: Aṉṉāci
  • Telugu: Anāsa paṇḍu
  • Thai: S̄ạbpard
  • Vietnamese: dứa
  • Welsh: phîn-afal

At a glance, it looks like we have six basic types here:

  • ananas: 42 languages
  • piña/pineapple: 10 languages
  • ark’ayakhndzor: 1 language
  • bōluó: 1 language
  • S̄ạbpard: 1 language
  • dứa: 1 language

Clearly the ananas words are the most common, but what does that mean? Should Serbian and Croatian (which are generally mutually intelligible) each get a “vote” while Mexican Spanish and Castilian Spanish have to share a “vote”? Which languages you count and which ones you don’t is fairly arbitrary.
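
The grouping-and-tallying step can be sketched in code. The root classifier below is my own rough heuristic, and the data is an abridged subset of the list above, just to illustrate the idea; it is not how the counts of 42 and 10 were actually produced.

```python
from collections import Counter

# Abridged subset of the Google Translate forms listed above.
FORMS = {
    "Afrikaans": "pynappel", "Armenian": "ark’ayakhndzor", "French": "ananas",
    "German": "ananas", "Russian": "ananas", "Turkish": "ananas",
    "Spanish": "piña", "Catalan": "pinya", "Chinese": "bōluó",
    "English": "pineapple", "Estonian": "ananass", "Haitian": "anana",
    "Indonesian": "nanas", "Swahili": "mananasi", "Vietnamese": "dứa",
    "Welsh": "phîn-afal", "Japanese": "painappuru", "Thai": "S̄ạbpard",
}

def root_type(form):
    """Rough, hypothetical heuristic for the basic types named above."""
    f = form.lower()
    if "nanas" in f or f.startswith("anan"):
        return "ananas"
    if f.startswith(("pin", "pyn", "piñ", "pain", "phîn")):
        return "piña/pineapple"
    return form  # the remaining forms are one-language types of their own

counts = Counter(root_type(f) for f in FORMS.values())
print(counts.most_common(2))  # [('ananas', 8), ('piña/pineapple', 6)]
```

Even on this small sample, the two big families dominate, with everything else a singleton.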

Let’s break this up a different way. The pineapple only came to the Old World via European colonization of the New World, so let’s see what the European colonizers call this fruit. Here we’re only looking at languages whose parent countries had a major colonial presence in the New World before 1600:

  • Portuguese: ananás
  • Spanish: piña

One of these languages uses a pine-type word, the other an ananas-type word. Doing a bit of reading, it seems that ananas is the Guaraní word for “pineapple”; it was borrowed by Portuguese and spread from there. The Spanish (going all the way back to Columbus) called the fruit piña “pine” because of its resemblance to a pinecone.

From these two origins come many of the names for “pineapple” in the various languages of the world. The only phenomenon you’re seeing here is that ananas simply became more popular. My best guess as to the distribution of the two forms would be the dominance of French (with ananas) in previous centuries. Languages that are more heavily influenced by Spanish (like Filipino) end up with pine-type words. Languages more heavily influenced by English (like Japanese) end up with the full pineapple type.

Answer 2 (score 20)

@Joe has covered words for “pineapple”, so here’s some info on words for “banana”.

There is a fairly straightforward explanation for why Spanish has the word plátano instead of a variant of banana, compared to the other languages in the above list.

Plátano already existed in Spanish to refer to another sort of plant, namely the ‘plane tree’, or trees of the genus Platanus, whose name can be traced back via Latin through Greek to a Proto-Indo-European root *plat- “to spread”. The name is thought to refer to either the largish leaves some of these trees have, or their broad, flat expanses of bark. Various cognates also carry the meaning of ‘broad’, ‘spreading’, and so on - in English, plants of the genus Plantago, commonly called plantains, are similarly named for their broad, round leaves, via a borrowing from French. (Edit: to clarify, the plantains just mentioned are a small, herbaceous, bog-loving plant, completely unrelated to any sort of banana).

So, once the Spanish-speaking world had access to bananas, plátano was usefully extended to refer to (some) banana plants; given that banana trees have quite enormous leaves, the name is quite appropriate.

There is an alternative hypothesis that Spanish got plátano from the Carib word platana (from Arawakan pratane) and that this was altered to make it more similar to Spanish plátano ‘plane tree’, but there is not much evidence to support this, and either way plátano ‘plane tree’ is somewhat responsible.

But, the list above is a bit disingenuous, because Spanish does have the word banana, and this is the word that Spanish initially borrowed from Wolof, a Niger-Congo language, to refer to the fruit. Depending on what sort of Spanish you speak, banana can refer to the smaller, sweeter fruits we are most familiar with, while plátano might refer to the larger, starchier, less sweet fruits that are generally used in cooking rather than eaten raw. In English, the latter are referred to as plantains or plantain bananas (probably on analogy with Spanish - Edit: because although English already had the word ‘plantain’, it wasn’t used to refer to bananas). Both sorts are of the genus Musa. But at least in Mexico, plátano refers to the sweet variety, and plantains are plátano macho.

Most modern languages that use a variant of the word banana got the word via either Spanish or Portuguese, and those languages initially borrowed it from Wolof. So, to reiterate Joe’s point, “whichever name is used by the introducers gets adopted in many other languages”.

But in fact, the history of ‘words for banana’ started long before the Spanish and Portuguese borrowed the modern word from Wolof, and if you look at languages across the world, the word banana hardly comes into play.

A recent study by a team of geneticists, archeologists, agricultural scientists and linguists investigated the history of different banana varieties based on the evidence for human cultivation and dispersal of bananas. The evidence suggests that bananas likely originated in New Guinea, and the linguistic information associated with this is pretty interesting. Mark Donohue put together a list of over 1,100 words for ‘banana’ in languages from Melanesia and South East Asia, the regions in which the banana was first dispersed. The paper can be accessed here (sorry, abstract only unless you have access), but the supplementary materials are available publicly, so if you want to see over 1,100 words for ‘banana’, go to this page and click the link for Table_S04.

Using comparative methods, the researchers reconstructed root forms for different words for ‘banana’, and found four major cognate sets with distinct, but overlapping, geographical distributions. The root forms were muku, punti, qaRutay, and baRat. You can see some maps of the distribution of these variants here (pdf). There were other minor groupings and a lot of ‘miscellaneous’ lexical items that had no clear group of cognates. Banana is labelled as such, and only shows up twice in this whole list (granted, the list doesn’t include many African or continental European languages).

In sum, while plátano is an oddity in the short list presented above, Spanish actually did use the word banana first, and some varieties of Spanish still do use it for sweet bananas, where plátano specifically refers to plantain bananas. Furthermore, crosslinguistically, banana is actually an oddity in itself.

Answer 3 (score 6)

The reason that “only one language” calls it a “pineapple” is that the table left out other languages which also do. Spanish, for example, calls it piña, and a quick glance at Wiktionary suggests that several other languages, including Welsh, Catalan, and Afrikaans, do too. It’s not surprising that many different languages would have only two words for the fruit, because it’s a new world fruit and those are all old world languages, so they all take it as a loan-word.

As for banana, it seems that many, many languages have different words for it. See Wiktionary and note in particular the geographical variation in the Spanish word.

The moral of this story is that if we pick the right samples we can show anything.

14: What is the difference between native language, first language, mother tongue and L1? (score 39533 in )

Question

Note: I’m not a linguist, and I realize I might be treading in a grey area here.

I’m wondering what the differences (and/or similarities) between native language, first language, mother tongue and L1 are. The first three, I find, are often used interchangeably in casual conversation. In academic linguistics though, are there generally accepted differences between these terms? Also, where does that leave L1? I believe it’s an academic term, but I often see it defined relative to the other three casual terms.

A quick Google search seems to reveal there is a lot of confusion around these terms, and possibly no straightforward answer. It also doesn’t help that the Wikipedia article groups the four terms into one article and has fewer citations than I would like. The only thing I’ve been able to gather is that the tendency seems to be for “native language” to mean proficient and for “first language” to mean chronologically first.

In particular, I’d like to find out (if possible) what term or terms are applicable to the following scenarios:

  • A chronologically first language with which the speaker is no longer fluent or even competent.
  • A language learned to fluency in adulthood (with or without a foreign accent - though I realize the latter is rare).
  • A language learned to fluency in childhood (within the critical period) that is not chronologically first.

I realize fluency might not be the best word to use…but the words I would normally use are the ones that I’m seeking clearer definitions for!

Answer accepted (score 12)

OK, the fact of the matter is that everybody learns their own languages, in their own ways, in their own times, places, and circumstances. It is normal for kids to have several languages at home, and to pick up others as needed, by playing with other kids. Those languages either flourish through use, or wither and get forgotten by disuse, like any human skill.

Plus, people vary not only in their unique language experience, but in their skill at apprehending and using it. Also like any human skill. That’s a vast amount of individual variation.

By contrast, labels like

Native language,
First language,
Mother tongue
L1
L2
.. etc.

are invented by people who need abbreviations for commonly-referenced groups of characteristics, usually characteristics that are common only in monolingual places like the USA, where almost everybody speaks only English, and often finds multilingualism threatening.

They are not terms defined in the Qur’an or the APA Style Manual; they are just abbreviations, which may be useful in certain contexts among certain kinds of professional. That’s all, really.

These terms, and others, may or may not be applicable to the situations you mention. Or to others one can easily imagine. I repeat, they’re just nonce forms, with localized definition and localized utility.

They are Not Ready For Prime Time, in other words, so you shouldn’t take them too seriously. And they certainly don’t cover every possibility.

Answer 2 (score 4)

This is a very good question because it highlights the multiple terms used to describe what appear to be similar, if not identical, phenomena. However, as has been pointed out above, there are contextual differences between the terms. To clarify them:

First language and L1 are the same: L1 is the abbreviated form of first language. Mother tongue and native language are likewise interchangeable. Essentially, these two terms are socio-cultural constructs. Meaning, the terms native and/or mother language are a way to conjure a transportation of a language from one culture and geography into another geography/culture. Their use triggers the counterpart, foreign. Thus, in addition to declaring the order of language acquisition, these two terms also reveal an immigration component to the language. In contrast, the terms L1/first language are clinical terms to describe language acquisition in individuals who have acquired one or more languages.

As far as the scenarios listed in the original post:

  1. I would describe this scenario as a non-dominant L1 individual, i.e. underscoring language acquisition and subsequent language shift.
  2. L2 acquisition post critical period.
  3. This scenario needs to be further qualified because the critical period may include for some a wide time frame. For example, I would call an individual acquiring a second language up to age 4 or 5 a sequential L2, and/or maybe depending on the input exposure opportunity in both languages, a simultaneous bilingual, known as 2L1.

Answer 3 (score 3)

I wrote a blog post about this very topic last week, on the International Mother Language Day. http://multilingualparenting.com/2014/02/21/mother-tongue/

There is unfortunately no clear-cut answer if you speak more than one language. The different terms are used in different contexts and for varying purposes. For me, ‘mother tongue’ and ‘native language’ are more or less interchangeable. The term ‘first language’ is, as far as I understand, not the chronologically first language, but the one a speaker is fluent in and feels most comfortable speaking. This means your ‘first language’ can change depending on where you live and which language you speak the most.

Your first question is intriguing and I also wrote about the scenario in my blog. If you no longer speak the language(s) you learnt as a child, based on the monolingual research terminology, you would be “mother-tongue-less” - however you would have a L1.

If you have become fluent in a language later in life, you have ‘native level fluency’. It’s difficult to draw the line with regards to accents - different “native” speakers have vastly varying accents as well, so why would an accent from another area prevent you from being called fluent?

If you have learnt a language as a child and you are fluent in it, it is one of your ‘native languages’ - the notion that there could only be one ‘mother tongue’ or ‘native language’ comes from a monolingual perspective and doesn’t apply to bilingual people.

15: Why did England not maintain French as a spoken language? (score 38494 in 2014)

Question

In many countries around the world, especially in Africa, the people natively speak both an indigenous language and French due to French colonization.

The Norman conquest of England left us with many, many French words and grammatical structures, but England maintained only one language.

What factors caused this difference? The French colonization of the world left many countries speaking French, but only left England with a vastly different language from what it started with.

Answer accepted (score 15)

After the Norman Conquest in 1066, French quickly replaced English in all domains associated with power. French was used at the royal court, by the clergy, by the aristocracy, and in law courts. But the vast majority of the population continued to speak English. Had the aristocracy and clergy miraculously vanished in 1100, English would have taken over right away. In reality it took until the 14th c. for English to slowly supplant French in many of these domains. There are several reasons for this:

  • John Lackland (King of England) lost Normandy to the King of France. This meant that his and the Norman aristocracy’s focus shifted to England. He still had possessions in the South of France, but these were too far off to shift the focus away from England.
  • Society used to be split into a French-speaking aristocracy and clergy who wielded all the power, and English-speaking peasants without power. Now an urban and English-speaking middle class (traders, artisans, etc.) came up, and acquired wealth and power.
  • The French-speaking population was ultimately rather small in number. Looking at it from this perspective, one might ask why French stayed that important for such a long time. French did remain the language of power for two centuries or so, but ultimately the aristocracy slowly shifted towards English because their attachment to France had waned.

By the 14th c. people started making fun of the French spoken by the Norman Aristocracy. Chaucer, in the Prioress’s Tale in the Canterbury Tales says about the Prioress (a nun):

And Frenssh she spak ful faire and fetisly,

After the scole of Stratford-atte-Bowe,

For Frenssh of Parys was to hir unknowe.

The Prioress only knew the kind of French taught in England (Stratford here) and not the kind of French spoken in Paris (seen as more desirable). This was at a time when textbooks for teaching French to the aristocracy appeared. They now needed instruction in French because they didn’t learn it at home any more.

Answer 2 (score 13)

The Norman conquest was hardly a case of ‘French’ colonization. France barely existed at the time. The Normans were fervently not French in their self-identity and can’t even really be said to have spoken ‘French’- rather they spoke a dialect of the Latin-based languages spoken across the old Roman world, the Parisian dialect of which would later develop into the standard French language of more recent centuries. The Normans of 1066 would simply have called their language ‘Romanz’ i.e. Roman.

The Normans appear to have adopted English as their first language far more quickly than generally thought; some scholars believe this transition was complete as early as the 1150s. There is a court record of a knight unable to speak Norman/French at all soon after that date. From that point on a more modern French (rather than the older Norman dialect) was spoken as an acquired prestige language, rather as the clergy spoke Latin. So the idea of a Francophone aristocracy throughout the medieval period is misleading - the powers that be did speak Norman and then French during the early Middle Ages, but mostly as a second language and only in certain contexts. Eventually this fascinating cultural fashion simply died out.

Answer 3 (score 3)

I suspect the difference may be due to the fact that language used to be less “politicized”, i.e., the conquerors did speak French, but there was no conscious effort to impose the French language on the colonized people: Everybody essentially just used whatever language seemed most useful to communicate in a given situation. This still favored the conquerors’ language, because they tended to be the people with money and power, but over time, there was a creolization of the languages.

In the 19th century, as the concept of a nation state developed, language started being considered part of the national identity, so there was more of an effort to impose the national language on every subject of a nation. As an example, the English Education Act of 1835 switched money previously spent on educating Indian elites in Sanskrit and Arabic to educating them in English.

16: What is the difference between voiced and voiceless stop consonants? (score 37742 in 2012)

Question

As a native speaker of American English, when I was listening to the different sounds in this IPA chart, I was really surprised when I realized that I could not differentiate between p/b, t/d, and k/g. (I think I’ve always been distinguishing the pairs based on whether or not the consonant is aspirated.)

I know the difference has to do with vibrations of the vocal cords, but I am not quite sure what to listen for.

Answer accepted (score 30)

Since you asked about that particular chart and the accompanying voice samples, I took the samples and laid them out in Praat, a tool designed for speech analysis.

The top half of the visual representation you see is a waveform, a visualization of air pressure on the y-axis and time on the x-axis. This is the actual energy your ear picks up and interprets as sound.

The bottom half is a spectrogram, which is a mathematical transformation of the waveform into its constituent frequencies. On the y-axis is the frequency (0 Hz to 5000 Hz) and on the x-axis is still time. A dark region at 3000 Hz at some particular time X means that the acoustic signal at that moment had strong “energy” in the 3000 Hz region.
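To make the waveform-to-spectrogram transformation concrete, here is a minimal sketch (not part of the original Praat analysis; the sample rate and test tone are made up for illustration) that computes a spectrogram with scipy and recovers the dominant frequency:

```python
# Sketch: a spectrogram is the waveform decomposed into its constituent
# frequencies over time. We use scipy here rather than Praat.
import numpy as np
from scipy.signal import spectrogram

fs = 10_000                        # sample rate (Hz); frequencies shown up to fs/2 = 5000 Hz
t = np.arange(0, 1.0, 1 / fs)      # one second of signal
x = np.sin(2 * np.pi * 3000 * t)   # a pure 3000 Hz tone

f, times, Sxx = spectrogram(x, fs=fs)

# Strong energy concentrated near 3000 Hz at every time step is what a
# "dark band at 3000 Hz" means in the visual display.
peak_freq = f[Sxx.mean(axis=1).argmax()]
print(peak_freq)
```

With a real speech sample in place of the sine wave, the low-frequency "voicing bar" discussed below would show up the same way, as persistent energy near the bottom of the frequency axis.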

Red and blue lines and regions are explained later.

A note to those who haven’t heard the actual samples in question. They are in the environment /a/ /a_a/.

Here are the results (annotations as we go along):

Bilabials

p

b

Corresponding speech samples:

The important difference between [b] above and [p] below is the relatively low-amplitude vibration before the red lines. This vibration appears in the waveform as a typical wave and in the spectrogram as a “band” of energy in the lower frequencies. It is the result of the vocal cords vibrating.

In both cases, the time leading up to the red line is characterized by a complete closure of the vocal tract by the lips (hence the term “bilabial stop”). In [b], during this time, the vocal cords are vibrating. In [p], during this time, they aren’t.

Velars

k

g

Corresponding speech samples:

You see the exact same pattern here. The “stop” portions (the period immediately before the red lines) have flat waveforms and no voicing bar on the spectrograms.

Alveolars

t

d

Corresponding speech samples:

You might have expected the pattern to continue here. I did too. I believe that the speaker flubbed the pronunciation of the [d] here. As you can see, there is no voicing in the stop periods.

I voiced my own version of [d] in the same frame, and this is the resulting spectrogram and sound:

d-improved

Sound: [d] version 2


Voiceless aspirated stops

Here are three voiceless aspirated stops (what we’re used to seeing in English as /p t k/ when they occur as simple onsets at the beginning of stressed syllables):

ph

th

kh

Notice in all these examples, there is considerable delay after the red line to the blue area.

Corresponding sound samples:


The spectrum

So you won’t have to pick apart samples from above, here’s a side by side comparison between [p] aspirated, [p] unaspirated, and [b]:

ph

Onset of voicing (blue) occurs well after stop release (red).

p

Onset of voicing coincides with stop release.

b

Onset of voicing occurs well before stop release and also fills up intervocalic (between vowels) stop duration.

Corresponding speech samples:
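The red-to-blue interval being compared in these three panels is what phoneticians call voice onset time (VOT). The following toy sketch classifies a stop from VOT alone; the threshold values are illustrative assumptions on my part, not measurements from these samples:

```python
# Sketch: classify a stop by voice onset time (VOT), the interval from
# stop release (red line) to onset of voicing (blue line).
def classify_stop(release_ms: float, voicing_onset_ms: float) -> str:
    vot = voicing_onset_ms - release_ms
    if vot < 0:
        return "voiced"                 # voicing begins before the release, as in [b]
    if vot <= 25:
        return "voiceless unaspirated"  # voicing starts at (about) the release, as in [p]
    return "voiceless aspirated"        # long lag before voicing, as in [ph]

print(classify_stop(100, 40))   # voicing well before release
print(classify_stop(100, 105))  # onset roughly coincides with release
print(classify_stop(100, 160))  # considerable delay after release
```

The three branches correspond directly to the [b], [p], and [ph] panels above.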


Conclusions

So what does this mean for you? As a native English speaker, you have only ever been exposed to [p] aspirated and [p] unaspirated. You understand the former to be /p/ and the latter to be /b/.

When you get presented with a sound you’ve never heard before in that particular phonetic environment, your brain just associates it with the nearest match. Your brain disregards the voicing before the stop release and notices that there is no aspiration duration, concluding immediately that it is also /b/. This is helped in large part by the fact that, in even slightly rapid speech, the English sound /b/ appears not as an unaspirated [p] but as a fully voiced [b] (e.g., in the word tabs).

And that’s why you may not be able to tell.

It certainly didn’t help that their [d] was undervoiced.

Answer 2 (score 8)

If you’re still having trouble distinguishing between voiced and voiceless consonants by sound alone, you can try a simple trick I learned in one of my introductory linguistics classes.

Practice producing the voiced/voiceless pairs (I think it works best if you say them in the context of a word) while pressing your fingers to your throat. You’ll actually be able to feel the vibrations from your vocal cords when you utter the voiced consonant. I’ve found that it works pretty well for differentiating between more similar sounds, which are kind of hard to hear and aren’t necessarily expressed in the orthography, such as the dental fricatives in English (voiceless θ, as in ‘thistle’, and voiced ð, as in ‘father’).

17: English text corpus for download (score 37582 in 2013)

Question

I need a free English-language corpus with at least 15 million words, consisting of one or more plain text files. There should be no tagging, just raw text. I would prefer a corpus of modern English with a mixture of TV, radio, film, news, fiction, technical material, etc., or better still, just plain everyday conversation, but this is not a requirement. I will be processing each sentence of the text with the Python programming language. Can anyone direct me to such a resource?

Answer accepted (score 19)

Corpora containing more than 15 million words are often not freely available due to copyright issues (such as the British National Corpus and the Corpus of Contemporary American English).

The open part of the American National Corpus (OANC) might fulfill your criteria. It contains almost 15 million words, it’s free, and it contains conversations and other genres. The spoken part consists mainly of the telephone-based Switchboard corpus. If you want more face-to-face conversations, consider adding the Santa Barbara Corpus of Spoken American English.

The OANC comes in versions with different annotation schemes. If you take the version with the least amount of annotation you should be able to strip off all the annotations with a regex find and replace scheme in Python or Notepad++.
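As a rough sketch of the suggested regex approach: the tag pattern and sample markup below are assumptions for illustration; check what annotation scheme your OANC download actually uses and adjust the pattern accordingly.

```python
# Sketch: stripping inline XML-style annotation with a regex find-and-replace.
import re

def strip_tags(text: str) -> str:
    text = re.sub(r"<[^>]+>", "", text)          # drop anything tag-shaped
    return re.sub(r"[ \t]+", " ", text).strip()  # collapse leftover whitespace

# Hypothetical annotated sentence, not real OANC markup:
sample = '<s id="1">Hello <w pos="NN">world</w> .</s>'
print(strip_tags(sample))  # -> Hello world .
```

Run over each file in the download, this leaves you with the raw text the question asks for.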

Answer 2 (score 12)

Here is a variety of language corpora with millions of sentences each: http://corpora.informatik.uni-leipzig.de/download.html

18: What is the definition of “complementary distribution”? (score 37308 in 2014)

Question

Allophones are defined by means of complementary distribution. As I understand it, a complementary distribution is a “mutually exclusive” relationship between two phones, with regard to a certain phonetic environment. That is, one of the phones will only be found in that environment (and nowhere else) while the other phone will never be found in that environment (but will be found somewhere else).

For example, In English, [pʰ] and [p] are allophones of the phoneme /p/, since [pʰ] can be found at the beginning of syllables ([pʰɪn]) and nowhere else. Likewise, [p] is never found at the beginning of syllables, but can be found in other positions ([spɪn]). Summarizing in a table:

Phoneme /p/ in English

Now, in Brazilian Portuguese, [w] and [l] are considered allophones of /l/. The reasoning is similar: [l] is never found at the end of syllables, but can be found in other places ([a.’kli.vi], “slope”). [w], on the other hand, is found at the end of syllables ([saw], “salt”). The problem is: [w] can also be found elsewhere ([a.’kwi.fe.ru], “aquifer”). The table for this distribution, therefore, would be like this:

Phoneme /l/ in Brazilian Portuguese

This means that, in some environments, both [w] and [l] can be found, as in the [a.’kli.vi] versus [a.’kwi.fe.ru] example. So, evidently, I’m failing to understand what “complementary distribution” really is. Any help to clarify the concept would be much appreciated!

Answer accepted (score 9)

I think you need to tease apart the concept of allophony from the concept of complementary distribution. As people have mentioned in the comments, the two don’t necessarily go hand in hand–i.e. it is not necessarily the case that two phones that are in complementary distribution can be assumed to be allophones of the same phoneme (as Alex B. mentioned), nor is it the case that the distribution of two phones–each of which is a possible realization of a single phoneme–will never overlap in their distribution. It’s the second assertion that is relevant for your Brazilian Portuguese example.

The “trick” is that sometimes different phonemes can have allophones “in common”, i.e. Phoneme A might sometimes be realized as Phone Y, but Phoneme B might also sometimes be realized as Phone Y:

/A/ –> [X] in some environment, [Y] in some other environment

/B/ –> [Y] in some environment, [Z] in some other environment

This happens a lot in neutralizations and mergers. In languages with word-final devoicing, for example, maybe /d/ is realized as [t] word-finally but as [d] elsewhere. But /t/ would also be realized as [t] word-finally. If one encounters a [t] at the end of a word, one cannot be sure without additional information whether it is a realization of /t/ or /d/.

In your example, [w] is probably an allophone of some phoneme that can sometimes be realized as [w] and sometimes [l] (in some sense the “naming” of phonemes is arbitrary, but I’d probably call it /l/ since that seems to be the clear “elsewhere” realization) and [w] is probably also an allophone of a different phoneme (one that I would call /w/).

Now, it is true that allophones of the same phoneme must be in complementary distribution, but the two [w]s and the [l] in your table are not actually all realizations of the same phoneme. So, here is a revised assessment of their distribution, taking their underlying forms into account:

[l] (from /l/) not found at the end of a syllable, but found elsewhere

[w] (from /l/) found at the end of a syllable, but not found elsewhere

[w] (from /w/) found elsewhere (and perhaps also at the end of a syllable?)

So, the [l] and the first [w] are in fact in complementary distribution. I hope that makes sense!
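This revised assessment can be checked mechanically: two phones are in complementary distribution exactly when the sets of environments they occur in are disjoint. A toy sketch, with made-up environment labels rather than real corpus data:

```python
# Sketch: complementary distribution as disjointness of environment sets.
def complementary(envs_a: set, envs_b: set) -> bool:
    return envs_a.isdisjoint(envs_b)

l_from_l = {"onset", "medial"}  # [l] (from /l/): not syllable-final
w_from_l = {"coda"}             # [w] (from /l/): syllable-final only
w_from_w = {"onset"}            # [w] (from /w/): the "aquifer" cases

print(complementary(l_from_l, w_from_l))  # the two realizations of /l/
print(complementary(l_from_l, w_from_w))  # surface [l] and [w] overlap
```

The first check succeeds because both phones come from the same phoneme /l/; the second fails, which is exactly the overlap that confused the question, since that [w] belongs to a different phoneme.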

Answer 2 (score 1)

Complementary distribution means that the allophones of a particular phoneme occur in different phonetic environments; the phonetic environment determines which of the allophones is used in pronouncing a word. For example, the two allophones of the phoneme /p/, namely [pʰ] and [p], are in complementary distribution: the first occurs at the beginning of a word, as in pat and pot, while the second may occur at the end of a word, as in help. The fact of their complementary distribution means that these allophones do not occur in minimal pairs in English.

19: Why was Korea able to remove kanji but Japan wasn’t when both languages use homophones? (score 35335 in 2013)

Question

I am strictly interested in the question of homophones and kanji. Korean has homophones yet they removed the Chinese characters and are getting by just fine? Or are they?

Japanese kanji lovers say kanji is required in order to deal with homophones. Unlike English, Japanese has many homophones and you can’t know from context which is being used. As such we need to use kanji in order to deal with homophones.

Why was Korea able to remove it even though Korean has homophones, but Japan hasn’t? I am strictly focusing on the question: is kanji necessitated by homophones, and if yes, how?

————- EDIT —————————

Apparently relying on context alone is not enough. On Japanese TV they frequently use subtitles/kanji to remove ambiguity. Have a look at this text also: http://www.cjk.org/cjk/reference/japhom.htm. A single Japanese homophone may have 20 meanings, and for these cases they argue for kanji.

Given what I have mentioned do you think the kanji is still required or could it be removed?

Answer accepted (score 16)

the problem is a bit in the framing of your question: “Why was korea able to remove kanji but japan wasn’t when both languages use homophones?”. i see two problems with this.

firstly, all human languages have homophones to a greater or lesser degree. it is true that Chinese and especially Japanese are somewhat internationally renowned for having large numbers of homophones (there’s a humorous short story about two neighbors running into each other in the street and having an extended chat about nougaku, not realizing that while the one is talking about 農学 (agriculture), the other is talking about 能楽 (Noh theater)).

secondly, your question is biased in favor of abolition of kanji, making it look like it was the natural and desirable way for any language that has been using kanji to replace that writing system with a more sound-based one; that may or may not be true, but this partisan stance begs the question whether the Japanese and the Koreans have ever collectively wanted to stop using kanji.

i guess the somewhat baffling answer to this question is that for a long time in their histories, neither the Japanese nor the Koreans have wanted to abolish kanji / hanja. as for the Japanese, while the well-known kanji usage regulations stipulated after WWII did limit the number of kanji, they are still used in great numbers. one often sees obscure kanji and kanji with obscure readings. every japanese can at almost any given point in time opt to switch to using kana in place of kanji, and in fact this sometimes happens (like my former landlady who posted a sign containing ‘石っけん’ rather than ‘石鹸’, as she deemed the former to make for an easier read for her foreign student residents). the fact that many japanese continue to write many kanji clearly shows they’re not willing to abolish this system.

likewise, the Koreans have, for the 500 years between Sejong’s invention of hangeul and its widespread, nation-wide and almost exclusive adoption, resisted the urge to ‘rid’ themselves of kanji. it is only in the 20th century that hangeul rose from being an auxiliary script that many literati looked down on to the primary means of written expression that we see today.

rephrasing your question, i think we should better ask: why did the Koreans stop using kanji, and the Japanese didn’t?, plain and simple, and i think the answer lies to a big part in: a matter of preference and choice. homophones have relatively little to do with that; as has been pointed out, when you do have a ‘phonetic’ script at your hand, and you can state yourself unambiguously in speaking, then the same should hold when writing with that phonetic script, homophones or not.

i’m also afraid i can’t quite follow @lzmtky‘s argumentation with those vowel counts and whatnot. firstly, reading ‘どうぶつゆらいかんせんしょう’ i have just as much difficulty with that string as you do, but that is really caused by reading habits, not by the string being in principle unreadable.

further, there are rather more than 400 syllable blocks in Korean—Unicode defines 11,172 precomposed hangeul syllables (not all of which are in frequent use). your number for Japanese ‘sounds’ is also slightly misleading: there are 45 (not 46) kana with distinct sounds, but: there are an additional 25 kana with (han-) dakuten, of which 2 have to be subtracted because they’re phonetic duplicates; further, there are the youon digraphs (a la しゅ), of which there are 27 with distinct sounds. that leaves us with 45 + 23 + 27 = 95. but what have we counted here? well, that’s about the number of different morae in Japanese, not what is commonly assumed the basic building block in segmental phonetics—that would be the number of distinct vowels and consonants.

to compare an alleged number of 400 hangeul syllable blocks with an alleged number of 46 distinct kana is pretty unjustified; it is comparing two writing systems at best anyhow, not two sound systems. to do that, you must obviously (at least) count vowels, consonants, and their possible combinations. if your intention was to present syllable counts (and thought, reasonably, that the amount of distinct, common hangeul blocks will just about equal the number of distinct Korean syllables), well then you’d have to take into account that a Japanese syllable may have more than one mora: おおさか has four morae, but only three syllables; likewise, かわい has three morae, but only two syllables. i think we can safely say that Japanese has something in the vicinity of 200 distinct syllables.

commenting on your statement that “even in Korean, there are different ways of writing a word/name which sounds the same. For example, 민아 and 미나, 윤아 and 유나, 각오 and 가고. Therefore, this variety eliminates the need for kanji in Korean”, let me ask: are you saying that because there’s more than one way to spell a given Korean word, therefore it is not necessary (‘eliminates the need’) to use kanji? that’s, like, because there’s two ways to reach my bank from my house, therefore i don’t have to ride the bicycle? i don’t get that logic.

Answer 2 (score 4)

It’s probably just a government policy thing - Japanese kids start learning kanji basically as soon as they’re done with kana, while Korean kids don’t start learning hanja until they’re in middle school.

As for the homophony, Japanese likely could be written phonetically just fine. I’ve actually read a few texts written by nonconformists that use kana and spaces instead of kana and kanji. Meanings are contextually obvious frequently enough that it’s not really a problem most of the time (and I can imagine that if the switch was made, those problem words would be replaced with more distinct ones fairly quickly). It’s also true that Korean has more potential readings for hanja than Japanese has for kanji (it allows syllable-final consonants, for one thing), but I’m not sure how much this affects anything.

Answer 3 (score 3)

As a native Korean speaker, I can say the issue of homophones in our vocabulary does exist. However, as several people mentioned here, there are far more letter blocks and syllables that are pronounceable and can be created in Korean using Hangul, as opposed to Japanese, whose pronunciation is somewhat more limited, so to speak, than Korean, due to the fact that there are only 5 vowels and each kana represents one syllable.

Moreover, for example, しょう and しょ are two different on-readings of a word and a part of some words containing that sound and, therefore, have different kanji representing them. But they can sound almost identical to someone who’s not proficient in Japanese and is not used to the context from which a Japanese person would naturally understand the meaning of the word being mentioned. Another reason why we are successful with our exclusive usage of our phonetic alphabet while only using Hanja in school curriculum and in media solely for the purpose of clarifying any potential ambiguity or identifying a term being emphasized, is the use of spacing in sentences.

Kanji is important in Japanese in terms of identifying ‘blocks of meaning’ in a sentence because Japanese doesn’t use spacing. Sure, Japanese can be written entirely in its phonetic script. However, it would be cumbersome to look at such kana-only sentences and identify the different parts of the sentence from a long series of individually meaningless characters in order to understand it. That is not the case in Korean, where every major part of a sentence is grouped with spaces between each of them. Even so, Koreans still need to have a working knowledge of at least a few hundred Hanja as well as their readings - we only have the 音 reading, just like the Chinese do - and their meanings, because 70% of our vocabulary is built on Hanja and consists of words adopted from China and a lot of words from Japan which convey modern concepts and ideas.

Although Korea seems to be doing fine without a mixture of Hanja and Hangul in printed materials, a problem is manifesting itself in which a significant number of young Koreans often make mistakes properly spelling some homophones and are negatively affected by a wave of English loan words taking the place of existing Korean vocabulary. This seems to me a good enough reason to revive extensive Hanja education and usage in printed materials and to require Hanja proficiency at all levels of society. I strongly believe Korea needs to readopt Hanja at a comprehensive level in order to resolve the crisis of the Korean language in the 21st century.

Back in the 70s and 80s when Korean newspapers heavily used Hanja except for native Korean words and other grammatical elements like particles, it was possible to ‘speed-read’ an article and comprehend the content under time constraint, for example, when they had to get to work in the morning just by skimming through the keywords printed in Hanja. For the same reason, Korean and Japanese people could, to a certain extent, understand news articles from each other’s country.

20: What is the difference between complements and adjuncts? (score 34922 in 2019)

Question

What is the difference between complements and adjuncts? I always have a problem drawing a tree diagram for the syntax structure of a sentence with placing complements with word level category and adjuncts with different nodes; I don’t actually know how to distinguish both and where to put them.

Answer accepted (score 15)

The distinction is between arguments (sometimes also called complements) and adjuncts. In general, arguments are expressions that complete a predicate, and that are required by the predicate. Adjuncts, on the other hand, are not required by the predicate, but they do add (usually temporal or locative) information.
Here are some examples:

(1) Paul lives in London. 
(2) Paul met Peter in London.  

Paul in (1-2) is the subject of the verbs lives in (1) and met in (2). Subjects are arguments. Peter is the object of the verb met in (2). Objects are arguments, too. The PP in London is an argument in (1), because without a locative PP lives would mean something else. In (2), however, in London can be omitted without affecting the meaning of the predicate, hence the PP in (2) is an adjunct.
Look at this page on Wikipedia for more information.

Answer 2 (score 3)

Adjuncts are invariably interpreted as nth-order predicates of heads of different ontological types (for n > 1; a first-order predicate is never a syntactic adjunct, it is a syntactic ‘predicate’), whereas complements are interpreted as arguments of the predicates that constitute their heads. That holds whether the head is a noun or expanded noun, a verb or expanded verb, an adjective, a few adjective-derived adverbs that do take a complement, or, in fact, any other category, including ‘functional’ ones.

If, on the contrary, you try to distinguish complements from adjuncts by syntactic criteria, under current Larsonian/Cinquean ‘all-in-spec’ approaches there is no way to do so, and criteria like optionality/obligatoriness (or inclusion of complements in phrases substituted by pro-forms like one or do so) are notoriously problematic, as well, as certain complements are arguably ‘optional’, whereas would-be adjuncts like very well in e.g. They treated us very well are obviously obligatory.

Depending on your syntactic assumptions, of course, there may be ‘locality’ criteria that also distinguish complements from adjuncts, but in surface structure such locality constraints may be cancelled by ‘displacement’ due to information-structure or by related PF constraints. In general, the above-mentioned semantic criterion is, as far as I know, the easiest and most consistent heuristic tool in this respect.

Answer 3 (score 1)

Adjuncts and complements are different. An adjunct is not necessary, and adds extra information. A complement is necessary in order to complete the meaning:
[S]He [V]put [O]some salt [C]in the soup. The verb put must have a complement saying where something is put. Without the complement (in the soup), the clause would not be complete. We cannot just say He put some salt.

21: Is there an online tool to convert IPA symbols into audio sound? (score 33861 in 2017)

Question

As many amateurs and beginners know, IPA is difficult to memorize and internalize at first. Does software exist where one can paste in IPA text and hear synthesized speech (ideally in the form of a web page converter)?

I realize that such software would have to come with caveats: it would be language-specific, since IPA-to-sound mappings can vary among languages; it would sound artificial and thus not be completely accurate compared to the real sound of a given dialect; and it might not get syllable stress right without diacritics. Nonetheless, it could be helpful when one reads an IPA representation of a dialect that one does not have a recording of and has not heard in real life.

This question is similar, but not the same as it is basically asking for a text-to-text lookup table. Some of the links in this answer are good for audio of individual phonemes (fascinating in itself), but do not seem to aggregate them into words.

This online forum mentions a converter here, but it seems to be mostly discontinued and not IPA as far as I can tell.

If this does not exist, I am considering writing a crude one based on the clips from here. I wonder what I will learn through that process.

Answer accepted (score 14)

I’m currently using espeak, an open-source program. Not bad, even if the voices sound artificial. Details here.

E.g. (Seneca, Epistulae ad Lucilium, 1.1.1):

espeak -v eo --ipa -s120 -p60 -a20 "[['ita'fak'mi:lu:'ki:li:]]"

i.e. speak with an Esperanto voice (-v eo), print the IPA on the console (--ipa), set the speed to 120 words per minute (-s120; range: 80-500, 260 recommended by the docs), the pitch to 60 (-p60; range: 0-99), and the amplitude to 20 (-a20; range: 0-100).

Answer 2 (score 12)

Amazon Web Services’ Polly text-to-speech service supports Speech Synthesis Markup Language (SSML) and specifically its <phoneme> element.

You will need to create an AWS account, but you can then use the ‘get started’ demo to hear the speech of any (supported) SSML. The demo is here.

For example, I took the IPA /bɪnˈtuːrɒŋ/ from the Wikipedia article for the binturong and put it in like this:

<phoneme alphabet="ipa" ph="/bɪnˈtuːrɒŋ/ ">Binturong</phoneme>

It seems to work pretty well and you can choose from a wide selection of voices.

Answer 3 (score 8)

You might try using mbrola. http://tcts.fpms.ac.be/synthesis/mbrola.html

It’s a downloadable speech synthesizer, where you type in a sound and it uses diphone synthesis to play it back.

To play a given sound, you would type in something like this:

_ 200 10 120

a 300 10 120

_ 200 10 120

The above text means, line by line: “play silence for 200 ms, and at 10% of that 200 ms duration the pitch should be 120 Hz; then play the sound /a/ for 300 ms, with the pitch at 120 Hz at 10% of its duration; then play 200 ms of silence again.” So each line goes: phoneme [tab] duration in ms [tab] one or more (pitch-position %, pitch-value Hz) pairs.

It doesn’t use IPA, it uses X-SAMPA, but you can easily convert between the two scripts, and the sound file databases will also have IPA in them.
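The conversion between the two scripts can indeed be sketched with a small lookup table. The mapping below covers only a handful of common X-SAMPA symbols; a real converter needs the complete chart, and multi-character symbols (like "r\") must be matched before single characters, hence the greedy longest-match loop:

```python
# Minimal X-SAMPA-to-IPA converter sketch. Table is deliberately tiny;
# extend it from the full X-SAMPA chart for real use.

XSAMPA_TO_IPA = {
    "r\\": "ɹ", "S": "ʃ", "Z": "ʒ", "T": "θ", "D": "ð", "N": "ŋ",
    "E": "ɛ", "I": "ɪ", "U": "ʊ", "Q": "ɒ", "{": "æ", "@": "ə",
}

def xsampa_to_ipa(text: str) -> str:
    """Greedy longest-match transliteration; unknown symbols pass through."""
    symbols = sorted(XSAMPA_TO_IPA, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for sym in symbols:
            if text.startswith(sym, i):
                out.append(XSAMPA_TO_IPA[sym])
                i += len(sym)
                break
        else:
            out.append(text[i])  # not in the table: copy verbatim
            i += 1
    return "".join(out)

print(xsampa_to_ipa("TIN"))  # θɪŋ ("thing")
```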

22: What’s the ‘official’ term for when a word is at the tip of your tongue? (score 33137 in 2011)

Question

If I remember correctly from the half year I studied linguistics, there is a sort of official name for the situation or state your brain (or your speech center) is in when a word is at the tip of your tongue but you can’t quite think of what it was again.

What was that term again?

Answer accepted (score 14)

There are multiple words for the phenomenon you are speaking of.

  • tip-of-the-tongue (TOT) is the term actually used for the moment at which the subject has this specific difficulty; a more technical term is presque vu (‘almost seen’).

  • dysnomia is the term for a general, chronic difficulty in retrieving vocabulary items. A more common but broader term is aphasia, which covers any kind of language impairment, though it is often used for retrieval difficulties.

Answer 2 (score 10)

This is one of the phenomena that experts study extensively, alongside related facts and from many points of view. The field we are interested in here is psycholinguistics, an interdisciplinary field between neuropsychology, cognitive psychology and linguistics.

This phenomenon, in which people seem to temporarily forget a certain word, is normal and happens to everyone at least once in their life. The common expression is tip-of-the-tongue, but the terminology adopted by psycholinguists seems to be simply TOT.

If you want to go more in depth you can read Sparkling at the end of the tongue: The etiology of tip-of-the-tongue phenomenology by Bennett L. Schwartz.

Answer 3 (score 1)

Lethologica - in other words, when you have the word on the tip of your tongue but keep losing it and can’t quite grab hold of it.

23: Automated French/Italian/German to IPA transcription (score 31608 in 2016)

Question

I’m looking for a website or software that will take text written in a source language and produce a transcription in IPA. The languages I am interested in are French, Italian and German, but if you know resources for other languages I would be happy to know.

For example, English Phonetic Transcription does English - IPA for free. Here’s a sample of their transcription.

All that glitters is not gold.

ɒl ðæt glɪtərz ɪz nɑt gold.

I’m looking for a tool like this. If you’re wondering why one would need such a tool, besides the obvious application of helping people learning a new language, it’s also useful to singers/actors who have to perform in a foreign language.

Answer accepted (score 22)

The open source eSpeak program can do this:

espeak -v lang --ipa "text goes here"

where lang is:

  • fr for French
  • it for Italian
  • de for German

It is not 100% accurate in pronunciation in all cases, but will speak out the pronunciations.

You can also use Kirshenbaum-like ASCII IPA:

espeak -v lang --ipa "[[ipa transcription goes here]]" 

If you use -x instead of --ipa you get the phonemes in the format that espeak accepts.

Answer 2 (score 5)

I happened to find this: EasyPronunciation.com

Works okay for French, but it goofs up on some words, so watch out. Looks like English, Spanish and Chinese are also available there.

Here’s something for German: Donnerstag

Answer 3 (score 3)

Have you tried IPANow? It does IPA transcriptions for Latin, Italian, German, and French.

24: Why is “Aurora Borealis” from Greek, but “Aurora Australis” from Latin? (score 28797 in 2013)

Question

In astronomy we have the Aurora Australis in the south and the Aurora Borealis in the north. According to Wikipedia, auster is in fact the Latin equivalent of the Greek νότος, or southern wind. However, boreas is a Greek word, βορέας, not Latin! The Latin equivalent is aquilo. So, are the “southern lights” derived from a Latin word whereas the “northern lights” are derived from a Greek word? I can think of other places where auster/boreas are used as south/north. I find it unusual that this common pair would be derived from two different languages. How did this convention arise?

Answer accepted (score 12)

The term “aurora borealis” was arguably first used by the French scientist Pierre Gassendi (Latinized as Petrus Gassendus) in 1621, in his treatise “Physics.” For further discussion, see Siscoe, George. 1986. An historical footnote on the origin of ‘Aurora Borealis.’ In History of geophysics, volume 2.

The phenomenon itself has been known for a long time in Europe; for example, the ancient Greeks called it “blazing skies” or “flaming sky dragons” (Hesiod, Theogony).

The term “aurora australis” was arguably first used in 1741 (OED).

Notice that when those terms were coined, they were used as Latin words. In other words, the Latin word “borealis” was used, not the name of a Greek god.

Answer 2 (score 1)

They are both Latin: Aurora Borealis means “morning light coming from the north” and Aurora Australis means “morning light coming from the south”.

Answer 3 (score 0)

Common-sense guess: the northern lights have been known in the West for much longer than the southern lights. Maybe the ancient Greeks themselves knew of them and named them. The southern lights were discovered (by the West), and thus named, much later, by people who had neither Greek nor Latin as their first language. Maybe they didn’t see a problem in mixing Greek and Latin terms.

Notice how many of the older scientific terms for animals and plants are a mishmash of Greek and Latin (now they are a mishmash of just about anything). Following that tradition, a mix of Greek and Latin to describe another natural phenomenon is perfectly understandable.

25: Simultaneous bilingualism vs Sequential bilingualism (score 28381 in 2011)

Question

Simultaneous bilingualism (or multilingualism) is when a child acquires two (or more) languages simultaneously, for example when raised by parents speaking more than one language.

Sequential bilingualism is when the child acquires the second language(s) after having substantially acquired the first, for example when the parental tongue differs from the main language of the community or education system.

I’ve heard that these two will result in different types of bilinguals. How will they differ? Which one will become the better speaker of the two languages, assuming both languages are acquired during the critical period?

Answer accepted (score 7)

People who grow up with more than one language can end up fluent in each, with native-speaker competency, but there will still be a dominant language. This can arise from social stigma (e.g. avoiding a language that is not dominant in the community), from the lack of a full community to engage with (e.g. speaking a parent’s native language in a community where few, if any, others speak it; such speakers are commonly called heritage speakers), or from simple cognitive barriers. These barriers are usually associated with when the L2 is learned and how much it is used, which leads to four main groups: early critical period/high proficiency, early critical period/low proficiency, late critical period/high proficiency, and late critical period/low proficiency.

So even with a Simultaneous Bilingual, there will be sociolinguistic pressures that make one of the languages a dominant language. In this sense you can analyse the development of the bilingualism in much the same way as you would any second language acquisition. Unfortunately there is no guarantee that the Simultaneous Bilingual will have better cross-linguistic ability when compared to the Sequential Bilingual, as this is a function of many variables, but I would feel comfortable putting my money on betting that the Simultaneous Bilingual would have a higher probability of being a high proficiency user of the L2, simply because they have more time, and we can assume that the L2 is acquired before the critical period is over. Whereas the Sequential Bilingual might not begin learning the L2 until later in the critical period. And it is important to recognize that the critical period fades over time, and at different rates for each person, and that the early critical period is more valuable than later stages of the critical period.

But to answer your question more directly, there will be differences between the two bilinguals, but neither is a solid predictor of language competency. Which I can attest to from personal experience, where I learned German at a later age (7) than my brother (5), and I have a greater competency than he does. A good book I can recommend is Second Language Acquisition, which has a lot of good information on bilingualism.

Answer 2 (score 4)

Let us assume the existence of a critical period for language learning that extends from early childhood to puberty and affects both the acquisition of the first language and the acquisition of the second (Penfield and Roberts, 1959).

In my opinion both cases are examples of bilingualism and what changes is the form of acquisition.

  1. In simultaneous bilingualism, it is known that at the beginning the child tends to mix both languages, which creates some initial confusion, but once fluency is acquired in both, they can be used interchangeably, even switching from one to the other automatically. Depending on exposure to each language, two things can happen:

    1. One of the languages can become dominant.

    2. The two can be used indifferently in different contexts, achieving perfect bilingualism.

  2. Sequential bilingualism is when the second language is learned after a mother tongue has already been acquired, usually when the child starts school; this L2 then becomes the main and dominant language through its academic use. The reverse can occur at home if the parents do not maintain the L1, since its linguistic structures may remain more deficient.

In my experience, the two types of bilingualism have nuances depending on the family and the social context. Going further, I believe the abilities to understand, write, read and speak should be considered separately. Even within schools, use of the spoken language can produce rejection in sequential bilinguals, because they consider themselves disadvantaged with respect to simultaneous ones. This point would deserve a proper analysis.


26: Is rhyming a uniquely English language construct? (score 26368 in 2012)

Question

I will freely admit that this question is based in ignorance of languages other than English (well, American). But do other languages have the concept of rhyming? Thinking back to my few Spanish classes in high school 25 years ago, I cannot think of very many words that I would consider to be rhyming. Mostly due to the different verb forms and feminine/masculine noun forms, it seems like it would be difficult to build a coherent rhyming sentence, given the way (my understanding of) the language works. For other languages that are even more complex, it just seems impossible to me that rhyming could occur easily or frequently.

Answer accepted (score 11)

Rhyming is not a uniquely English phenomenon. It is present in Chinese, for example. As one article puts it, “Chinese words are made up of so few possible syllables that it’s almost hard NOT to rhyme.” (original article: Rhyming in Mandarin)

The main difference between rhyming in English and Chinese, however, is that Chinese rhymes must take into account both the pronunciation of the syllable and the tone (i.e. the tones must match).

Rhyming was an integral part of classical Chinese poetry and there were even large dictionaries of Chinese characters indicating which characters rhymed with each other. (Since the pronunciation of Chinese has changed over time, many classical Chinese poems don’t actually rhyme anymore when read using modern Chinese pronunciation, however.)

Your question mentions the possible difficulty of constructing rhymes that form coherent sentences in some languages. This is a valid point, for example, for languages that have different syntax than English. Tibetan and Japanese generally have SOV (subject-object-verb) word order where the verb always comes last. Thus if you are constructing a poem in these languages and you want the last words/syllables of two sentences to rhyme, you would be forced to find two verbs that rhyme with each other.

I imagine there are other languages which are also easier to rhyme in than English due to differences in syllable structure. The Hawaiian language, for example, has a (C)V syllable structure, which means that syllables always end in a vowel. Thus I imagine it would be fairly easy to rhyme in Hawaiian.

But then again, perhaps other languages have different criteria for what constitutes a “rhyme”.
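One common criterion (the usual English one) is that two words end-rhyme when they match from the final vowel onward. A toy, spelling-based sketch of that idea follows; real rhyme detection needs phonemic transcriptions, since English orthography is unreliable (a spelling-based test says "love"/"move" rhyme, though they don't in speech):

```python
# Two words end-rhyme if they share everything from the last vowel
# letter to the end of the word. Orthographic only: a crude stand-in
# for comparing phonemic (IPA) transcriptions.

VOWELS = set("aeiou")

def final_rime(word: str) -> str:
    """Everything from the last vowel letter to the end of the word."""
    word = word.lower()
    for i in range(len(word) - 1, -1, -1):
        if word[i] in VOWELS:
            return word[i:]
    return word  # no vowel letter at all; compare whole strings

def end_rhymes(a: str, b: str) -> bool:
    return final_rime(a) == final_rime(b)

print(end_rhymes("bat", "pat"))  # True: shared rime "at"
print(end_rhymes("bat", "dog"))  # False
```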

Answer 2 (score 8)

End-rhyming is actually not even an originally English mechanism! Old English poems didn’t rhyme like we do now. Instead, alliterative verse was widely found in the early Germanic languages. Alliterative verse uses head-rhymes (alliteration) and stress-timing, instead of modern end-rhymes and syllable-timing.

The modern English rhyming scheme was borrowed from the Classical and Romance traditions.

As you say, end-rhymes come more naturally to languages that make wide use of suffixal inflections (like most Romance languages and Latin/Greek). Syllable timing is also more natural to many of these languages, since the interval between syllabic nuclei is approximately equal. Vowel quantity also played an important role in Latin and Greek meter. Stress timing, on the other hand, where the interval between stressed syllables is approximately equal, is more natural to languages like English and Portuguese that have notable vowel-reduction processes. And since English has nearly no suffixal inflections, end-rhymes are less common, though of course still plentiful.

See my answer to a similar question on English.SE for a fuller explanation of the verse form.

Answer 3 (score 4)

Well, Mark covered a lot of what I would have said. So I’ll simply add a couple of other types of rhyming and languages that prefer them:

  • Biblical Hebrew was fond of rhyming of ideas, where consecutive lines express similar concepts rather than similar sounds.
  • Apart from Western influences, Japanese seems quite fond of onomatopoeia: rhyming between the sound of a word and its concept.

27: How many “words” do I need to learn? (score 25627 in 2013)

Question

I am interested in learning a second language. To do this I have created a list of the 1000 most common words and phrases for a given language. I’ve also written sentences that contain each of these words. Acknowledging the challenge of making a truly comprehensive list of this type, I’m doing my best to include as much relevant information as I can.

I want to know what the average vocabulary size is for someone capable of engaging in everyday casual discussions. Approximately how many words and phrases must one know to have a reasonably fluent conversation in a language? From this answer I plan to expand my vocabulary list.

Examples of words and phrases that I think might well be in this list for English: apple, banana, gay, orange, apartment, sunny, leg, sex, word, sentence, pink, second, holidays, carry, finally, potato, onion, sometimes, rice, shower, whatever, pasta, since, plastic, always, fish, really, last, french, beef, pork, water, sea, fall, love, river, penis, chicken, knife, hit, fork, stolen, spoon, cup, brilliant, probably, cool, breast, dollar, box, foot, circle, fix, journey, help, always, between, ass, remember, sit, clock, run, buy, either, want, normal, fart, hairy, straight, time, internet, on the other hand, after all, kind of, hurry up, shut up, as well, not for long, lie down, oh my god, america, shampoo, pair, around, okay, under, table, money, milk, toilet paper, come on, pick it up, put it down, never, remember, business.

Examples of words and phrases that might not find it onto the list: grumpy, lemon, adore, lawn, infant, hero, witnessed, miracle, punch, gorilla, tip, heel, knuckles, approximately, fluent, author, return, truly, salmon, moon, purple, pineapple, elbow, napkin, accordingly, assumption, as a matter of urgency, have a blast, go to town, get real, don’t get me started, lighten up, get down

Answer accepted (score 14)

From what I’m seeing it’s generally accepted that most English speakers have a vocabulary size of 15000 to 20000 words. It should be noted that some sources place this estimate as high as 50000 to 75000 words.

Of course, only a fraction of those words are used regularly. This study found that knowing as few as 2000 words could lead to a 95% comprehension rate for English speakers. If one increases one’s vocabulary to 5000 words (2.5 times the previous size), the comprehension rate only increases to 96%. This is why focusing on just the most common words in a language, as you are doing, is a common tactic for language learners.
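The diminishing return in those figures can be made intuitive with Zipf's law. If the k-th most frequent word occurs with frequency proportional to 1/k, the running-text coverage of the top n words out of a V-word vocabulary is H(n)/H(V), where H is the harmonic number. The numbers below are an idealized model, not the study's actual corpus counts; a pure 1/k law understates real coverage (real text leans even harder on the top function words), but it reproduces the qualitative point:

```python
# Idealized Zipf model of vocabulary coverage: coverage(n) = H(n)/H(V).
# Illustration only; real corpora deviate from a pure 1/k law.

def harmonic(n: int) -> float:
    return sum(1.0 / k for k in range(1, n + 1))

def zipf_coverage(top_n: int, vocab_size: int) -> float:
    """Share of running text covered by the top_n most frequent words."""
    return harmonic(top_n) / harmonic(vocab_size)

for n in (2000, 5000):
    print(n, round(zipf_coverage(n, 50_000), 3))
# More than doubling the list from 2000 to 5000 words gains only a few
# percentage points of coverage.
```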

For lower levels of competency, I’ve found this Yahoo! Answer (and several like it) suggesting that 400-500 words and 150 phrases are enough to communicate in a second language (albeit with a fair amount of difficulty). However, I cannot find any academic research to back up these claims.

The exact number of words you need to know depends on how you define “fluency”, the language you’re speaking, and the context you’re speaking in (for example, business speech, casual speech, and even academic speech are all different registers with different vocabularies). Additionally, this number (as well as how you define “word”) will change depending on the language.

Another problem with these estimates is that they vary in how they define “vocabulary size”. It could mean the number of words that a speaker knows and can produce in conversation or it could mean the number of words that a speaker is able to recognize in conversation, even if they would never use it.

So in summary, the exact number of words one needs to know to be “fluent” in a language varies depending on how you define fluency, which language you’re talking about, whether you mean comprehension, speaking, or both, and how you define “vocabulary size”. The best lower-bound estimate I can find is 2000 words. Note that that particular study was based only on comprehension, so you might get away with a slightly smaller “active” or “producible” vocabulary (words you know how to use in conversation).

However, as I’m sure you know, there’s more to speaking a language fluently than vocabulary size alone. For starters, there’s your grammar skill. Also, remember that “fluency” is derived from the same origin as “fluid”, so a big part of “being fluent” is being able to speak smoothly, evenly, and at a reasonable pace. Similarly, you need to be able to listen to and understand other speakers in order to respond appropriately. This all takes lots and lots of practice, preferably with native speakers. Even just getting your tongue used to pronouncing foreign words at a reasonable speed can be incredibly difficult. While learning the most frequent vocabulary words is a good exercise, it alone cannot teach you a language.

Answer 2 (score 2)

That depends on the dominant semantic scope of the vocabulary for a given language. Some languages, like English, Spanish or Chinese, tend to pack more than one meaning into a single item.

Others, like Japanese, Finnish or French, target specific, or ‘occasional’, semantics. E.g., Finnish has distinct words for ‘nest’ and ‘a nest within a hollow of a tree trunk’, or for ‘a lone pine’ and ‘a pine in a forest’, and Japanese has distinct words for ‘you (to an underling)’, ‘you (to an equal)’ and ‘you (to a superior)’.

The former languages usually have fewer words in their vocabularies than the latter.

Or a language with a greater number of cases might have no word for, say, ‘at’ or ‘within’, while a word like ‘up’ might come in more than three different varieties.

Answer 3 (score 1)

Going by the words you provided, you might need somewhere around 5000–8000 separate words in English, where inflected forms are counted as one word each. Your phrases and expressions will have to be extrapolated from this. If you can give us a few more words that you think should be in one’s vocabulary, and a few more that shouldn’t, more precision could be achieved. But it is still a bit arbitrary, unscientific, and non-transferable to other languages…

How your words rank in the list of most frequent words in TV and movie scripts (Wiktionary corpus):

814 apartment

2927 orange

3021 holidays


5182 banana

8110 elbow

8127 assumption

(> 10,000) pineapple

(> 10,000) accordingly

28: What is the relationship between syntax and semantics? (score 25035 in )

Question

There are a number of positions you can take on what the relationship between syntax and semantics is.

You could think that syntax is prior, and so think that an expression’s syntactic function determines (or, more weakly, constrains) the expression’s semantic role. Or you could hold the converse (i.e., an expression’s semantic role determines/constrains its syntactic function). Finally, you could deny that any significant connection exists and think that they are two largely orthogonal parts of language.

Most people I talk to seem to agree that there is some relationship, but they’re not sure what exactly it is.

What views have linguists developed about the relationship between these two branches? What are some of the classic/canonical articles and what is the current state of the art?

Answer accepted (score 11)

I’ll speak for the research tradition I work in, namely Construction Grammar (CxG). In CxG you have constructions: symbolic units that directly link form and meaning. A construction can be a word, say tree, but it can also be an idiom (kick the bucket), a semi-fixed construction ([the mother of all X] or [what is X doing Y], as in what is that fly doing in my soup?), or an argument structure construction (kinda hard to represent). Syntax is then intrinsically interrelated with semantics, and there is a really strong link between the two.

There are many flavours of Construction Grammar: on the one hand some very formalist approaches, like Sign-Based CxG by Kay and Fillmore, and on the other some more cognitive takes on the matter, like those of Langacker, Croft, Bybee or Goldberg. I prefer the latter approaches; if you are interested, you should check out Croft’s 2001 Radical Construction Grammar and Langacker’s Cognitive Grammar, and Goldberg has a very important book, Constructions: A Construction Grammar Approach to Argument Structure.

Answer 2 (score 6)

It is not easy to describe the relation between syntax and semantics, but it is probably easy to say why: there are different perspectives on syntax and semantics, so the relation depends on what you understand by form and meaning, structure and content. If you look at the history of Chomskyan linguistics, you will find a chapter in which a group of people were working on deep structures so much that they were actually doing semantics, not syntax. Moreover, anaphors and quantifiers became really problematic for the framework, so it proved insufficient to explain the linguistic phenomena under discussion. But Chomsky and others were not happy about the division, and today there are many syntacticians who keep away from the “dangerous” interface with semantics.

Now, to be more specific about your question, but still general about the definitions, I think you could see syntax as independent of semantics, but not the other way around. Let’s say the goal of syntax is to develop theories about the similarities and differences between linguistic structures within and across languages. Let’s also assume that we can study the elements that are necessary for those structures to be well-formed, and that their meaning is not essential for the interpretation of the whole structure. Then it is possible to say that syntax does not need semantics, or that it is structure that determines meaning. Whether that is interesting or helpful is up to the syntacticians who work under such a view. As for semantics, it simply cannot be studied without reference to syntax, for any meaningful phrase or sentence is always just that, a phrase or a sentence, so it must have a certain structure.

If we want to study language in a more comprehensive way, I think the relation between syntax and semantics must be one of interdependence, and thus it is more fruitful to study the way structures are built up along with the meaning that arises from such building operations. Just as we have structure building from a syntactic perspective, we have function application from a semantic perspective. This is one of several general descriptions of the relation between syntax and semantics, but again, the specific views depend on the theories of syntax and semantics you are working with. Even if the view is that syntax and semantics are related in some way, there are approaches in which syntactic and semantic structures are generated independently, for instance 1 below. There’s also an interesting recent article about the syntax-semantics interface that you might find useful, 2. Another interesting presentation of the mutual influence of syntax and semantics can be found in 3 (link to video).

  1. Jackendoff (2002) Foundations of language: brain, meaning, grammar, evolution.
  2. Hackl (2013) The syntax-semantics interface.
  3. Conference by Barbara Partee (2009) video

Answer 3 (score 4)

As a supplement to MGN’s great answer on CxG in general, there’s a closely related approach by Ray Jackendoff and Peter Culicover called “Parallel Architecture” that I think is relevant here.

The question asks whether syntax determines/constrains semantics, semantics determines/constrains syntax, or neither. But there’s a fourth possibility: both.

Syntax is inherently tree-structured; that’s presumably uncontroversial. And semantics is also inherently tree-structured: PAST(LIKE(John, PROPERTY-OF(Mary, smile))) doesn’t make sense unless the argument to PAST is the whole liking, and one of the arguments to LIKE is Mary’s smile. There are also both syntactic and semantic mechanisms for coreference. And so on.

The way PA handles this is to treat each construction as having not a syntactic structure with dependent semantics, or a semantic structure with dependent syntax, but instead both a semantic structure and a syntactic structure, plus an interface between the two that determines how components are linked up (and possibly adds additional constraints).1

If you look at a fixed idiom construction, like MGN’s “kick the bucket”, you get a simple semantic structure with a nonsimple syntax:2

  • cognitive structure: DIE1
  • syntactic structure: [VP kickV [NP^ the bucket]]1

In other words, at the syntactic level, we have a perfectly normal verb phrase, but the verb, noun head of the complement, etc. aren’t attached to any meaning; rather, the entire verb phrase is attached to a meaning (that’s what the coindexed subscript represents).

Or, for an idiom with a “variable” in it:

  • cognitive structure: SUPERLATIVE(x1)2
  • syntactic structure: [NP theD motherN [PP ofP [NP allD [N’ a1 -s]]]2

Here, both semantics and syntax are compositional, but with different structures. The “a” inside the syntactic construction is linked to the “x” in the semantic construction, and the entire syntactic tree is linked to the entire semantic tree, but, e.g., the PP syntactic subtree isn’t linked to anything.

So, which tree do you “build first” when producing or processing a sentence? Neither. As you go along, you select constructions that have both syntax and semantics and assemble them into larger compositions that also have syntax and semantics, following constraints on both.
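If it helps to see the machinery concretely, here is a minimal sketch in code of a PA-style construction. The `Node`/`Construction` classes and their field names are my own illustrative choices, not Jackendoff and Culicover’s formalism; the point is only that a construction stores a syntactic tree, a semantic tree, and an interface of coindexed links between them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    label: str
    index: Optional[int] = None          # coindexation subscript, if any
    children: List["Node"] = field(default_factory=list)

@dataclass
class Construction:
    syntax: Node                         # the syntactic tree
    semantics: Node                      # the semantic tree
    links: List[Tuple[int, int]] = field(default_factory=list)  # (syn, sem) index pairs

# "kick the bucket": a perfectly normal VP syntactically, but its parts carry
# no meaning of their own; the VP as a whole is linked to the meaning DIE.
kick_the_bucket = Construction(
    syntax=Node("VP", index=1, children=[
        Node("V: kick"),
        Node("NP", children=[Node("D: the"), Node("N: bucket")]),
    ]),
    semantics=Node("DIE", index=1),
    links=[(1, 1)],
)
```

A variable idiom like “the mother of all X-s” would work the same way, with one extra link pairing the variable slot in the syntactic tree with the `x` argument in the semantic tree.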

In fact, all of the things that traditional generative grammar considers “rules” are like this—the simple active sentence construction is just an idiom with more, and more complicated, variables.

Of course often both syntax and semantics are perfectly compositional, and coindexed in the obvious way, like this construction:3

  • cognitive structure: PROPERTY-OF1(x2, y3)
  • syntactic structure: [NP [DP [NP a2] -’s] bN,3]1

… but that’s just a very common feature of many constructions, not actually a rule that applies everywhere, as it would be in a syntax-first or semantics-first theory.

If you could show that in every case, the semantic structure can be derived from the syntactic structure (as traditional generative grammar holds), or vice-versa (as Generative Semantics claimed), then this whole mechanism would just be extra complexity for no real purpose. But Jackendoff and Culicover argue that can’t be done. Their book Simpler Syntax is a reasonably approachable introduction to their argument.


1. In fact, there’s also a phonological structure, and there are interfaces between all three pairs. For example, that’s how you handle the altered prosody that goes with topicalization, or syntactic effects of phonological weight in cases like heavy NP shift.

2. The “standard” way to draw things is with two trees, and arrows between nodes in the trees, because using “bracket trees” and indexes gets hard to follow as soon as you try to deal with large phrases or anything nonlocal like anaphora or coordination. But I’ll try to keep things simple enough that we can stay with brackets.

3. This is oversimplified; “Mary’s” is probably a separable component at the semantic level as well as the syntactic level, and “-’s” probably has an inherent meaning that the “x’s y” construction relies on rather than duplicating. And I’ve left out most of the semantic constraints on all of these trees—for example, as drawn, there’s no reason you couldn’t use these constructions to build a phrase like “John’s mother of all Marys kicked the bucket”, which you obviously can’t. But I think this is enough to show the idea.

29: What are the historical origins of terms for north, south, east and west? (score 23401 in 2019)

Question

In the course of researching the etymology of the word “Australia”, I was trying to find the Latin words for north and south (the cardinal directions). I found some websites that translate north as “Septentrionalis”, but I understand this to refer to the seven oxen, or what we today call the Big Dipper, as it is in the northern sky. Other websites translate north as “boreas” or “aquilon”, though I think that the former is actually a Greek wind god and the latter is his Roman name.

Now of course in English the words north, south, east, and west have no transparent etymologies; they are just the words for the directions. But it seems that Latin and Greek had direction words that were derived from other things (gods, stars, oxen, winds).

This line of thought leads to a question: historically and cross-linguistically, how did terms for the cardinal directions arise? Or differently stated, is it the case that the cardinal direction terms are derived from similar processes across different language families?

Answer accepted (score 31)

Direction words arise from the need to coordinate direction. Thus, their nature and usage can vary widely from one language to another. To start, here are a few aspects of a people and their land that can influence the form that direction words / direction morphemes end up taking:

  1. The types of activities requiring coordination (migration, herding, hunting, fishing, gathering, farming, trade relations, war, etc.)

  2. The geographical features of the area (coastline, a central river, tributaries, woods, islands, steppe, mountains).

  3. The current state of inherited cultural knowledge about direction (including mythology; this may largely go back to (1) and (2) from prior states of the language).

The systems that thus arise may be influenced to be similar to the common cardinal directions in English, or instead make the primary distinction “upstream” vs. “downstream,” “inland” vs. “toward the coast,” or “direction towards” vs. “away from a certain very distant landmark.” Taken beyond that landmark, a person might not be as effective at communicating navigation. (Although, technically, the same thing can happen in English: explain which way to go from the North Pole!)

In Orientation Systems of the North Pacific Rim by Michael Fortescue, a close examination of the orientation systems of Wakashan, Tsimshianic, Haida, Tlingit, Eyak, Athabaskan, Eskimo-Aleut, Tungusic, Nivkh, and Ainu leads to the enticing conclusion that, by conceiving of a language’s orientation system diachronically, looking at the etymologies of direction words, and inspecting any inherently paradoxical ways of expressing direction, we can support a hypothesis as to whether a language is relatively new to a region (say, having arrived within the past 2000 years), and perhaps even find out where its speakers came from!

For example, if people in a dialect continuum settle a curving peninsula, the meaning of a direction word at the source may differ from its meaning at the tip. Similarly, whereas most IE languages use the PIE root *aus- “to shine, dawn” for “east,” the Latin word australis “south” may be from the same root. This “perhaps is based on a false assumption about the orientation of the Italian peninsula, ‘with shift through “southeast” explained by the diagonal position of the axis of Italy’ … Or perhaps the connection is more ancient, and [directly] from PIE root *aus- ‘to shine,’ source of aurora, which also produces words for ‘burning,’ with reference to the ‘hot’ south wind that blows into Italy.” OE

To address your question about English (all quotations from the Online Etymology Dictionary, OE):

  • north < norð < *nurtha possibly derives from PIE *ner- “left, below,” “as north is to the left when one faces the rising sun (cf. Skt. narakah ‘hell,’ Gk. enerthen ‘from beneath,’ Oscan-Umbrian nertrak ‘left’).” OE

  • south < suð < *sunthaz is “perhaps related to base of *sunnon ‘sun,’ with sense of ‘the region of the sun.’” OE

  • east < east < *aus-to-, *austra-, “from PIE *aus- ‘to shine,’ especially ‘dawn’ (cf. Skt. ushas ‘dawn,’ Gk. aurion ‘morning,’ O.Ir. usah, Lith. auszra ‘dawn,’ L. aurora ‘dawn,’ auster ‘south’), lit. ‘to shine.’” OE

  • west < west < *wes-t- “from PIE *wes- (source of Gk. hesperos, L. vesper ‘evening, west’), perhaps an enlarged form of root *we- ‘to go down’ (cf. Skt. avah ‘downward’), and thus lit. ‘direction in which the sun sets.’” OE

Some languages use non-compound words for the ordinal (secondary) directions. For example, Finnish (with help from Wiktionary and Finnish Wikipedia):

  • luode “northwest,” possibly the same etymology as identical luode meaning “ebb / low tide,” loan from a Germanic language, cognate of German Flut, Swedish flod.

  • pohjoinen “north,” from pohja “bottom”: the sun is in the north when it is below the horizon, and possibly also the back of a dwelling should face north so as to maximize warmth.

  • koillinen “northeast,” from koi “dawn.”

  • itä “east,” possibly related to itää “to germinate,” the idea being that the sun grows in the east.

  • kaakko “southeast,” equated with kaakkuri “red-throated diver (loon)” and kuikka “black-throated diver (loon).” Compare Latin ornithias “bird-wind,” the spring wind that brings the birds.

  • etelä “south,” antonymously to pohjois, is the direction in which the front of the house should face. Compare eteen “to the front.” The Estonian cognate edel means “southwest.”

  • lounas “southwest, lunch” indicates the direction the sun is in at lunchtime. The Estonian cognate lõuna means “south.”

  • länsi “west” may have to do with the day being “flattened” (läntätään ~ litistetään) as the evening arrives.

Answer 2 (score 13)

I will look specifically at the western Classical origins of these terms. The Ancient Greeks in particular had extensive mythology and naming associated with all the Winds (Ἄνεμοι Anemoi) and directions, not just the cardinals. The Greeks were a seafaring people and wind direction was central to their lives.

North

L septentriō (adj. septentriōnālis) = septem “seven” + triō, that is, “the seven plough-oxen” (the stars of Ursa Major). triō (pl. triōnes) is problematic: most sources give it as meaning “plough-oxen”, but this term for plough-oxen is used nowhere else in Latin. A few sources posit that triō < PIE *(s)tē̆r- “star”, with loss of the initial s as in Indic, and that later mythology led people to reinterpret the root as “oxen”. The transparent meaning of this word to the Romans was “in the direction of the constellation the Plough (Ursa Major)”.

L boreās (adj. boreālis) was also used to mean “north” or “North Wind” and was a direct borrowing from Greek Βορέας. The Romans also called this Wind Aquilō; this latter word is of unsure etymology. There have been attempts to relate it to aquila “eagle”, aquilus “dark” and aqua “water”, viz. “rainy wind”.

Gk Βορέας boreas “north, the North Wind” is a Greek word also of unsure etymology. It has cognates in other Balkan and Slavic languages, such as Alb borë “snow” and Srb бура “cold north wind”. It is often said to come from PIE *gʷor- “mountain”. This is very likely a reference to a north wind, cold and perhaps arising from mountains, that was prominent in these peoples’ original homeland.

South

L merīdiēs (adj. merīdiōnālis) meant “noon, midday” < medius “middle” + diēs “day”. Since the sun is in the South at midday in the Northern hemisphere, this word is self-explanatory.

L auster (adj. austrālis) was the Latin name of the South Wind and the South. Now here is a fascinating bit of history and its relation to language change. Most scholars believe that auster < PIE *aus- “shine”, which is the same root that gives rise to the words for “dawn” and “east” in other IE languages! How can the same root be used for different cardinal directions in sister languages? One theory is that since the Italian peninsula runs diagonally NW–SE, the word for “east” shifted to mean “south”, since both were in the direction of the Tyrrhenian Sea. Another theory is that since the lands to the South were burning hot, or alternatively since the sirocco was hot, the root *aus- referred to this heat.

Gk νότος notos was the South wind of the Greeks. I have no idea what its etymology is.

East

L oriēns (adj. orientālis) was the usual Latin term. The meaning was transparent in Latin: “rising”, viz. “in the direction of the rising sun”. oriēns is the present participle of the deponent orior “rise” < PIE *or-, o-grade of “move”.

Gk ἠώς eos was used in Greek and also meant “dawn”, which is cognate with Latin aurora and Germanic east. ἠώς < PIE h₂ewsṓs/h₂ausōs < *aus- “shine”. This is conjectured to be a reference to the shining dawn; but see ‘south’.

West

L occidēns (adj. occidentālis) was the usual Latin term. The meaning was similarly transparent in Latin: “going down/setting”, viz. “in the direction of the setting sun”. occidēns is the present participle of occidō, “fall/go down” < ob “towards/facing” + cadō “fall” < Proto-Indo-European *ḱad- “fall”.

L vesper “evening” was also used to mean “west” in reference to the setting sun.

Gk ἕσπερος hesperos was found in Greek, cognate to Latin vesper and Germanic west. ἕσπερος < PIE wesperos/wekeros “evening” < wes “wind, blow” + *pero “source”. In origin this may have been something like “the direction from which the wind blows”; one can imagine that this is possibly a reference to the prevailing winds in the PIE urheimat.

Answer 3 (score 0)

To add to the above, in PIE

  • South: *aug-
  • North: *(s)kewer- (also North wind)
  • East: *aus(t)-

and the word for the West is unknown.

30: Is “double positive meaning negative” a common phenomenon? (score 22743 in 2014)

Question

The following joke is popular:

An MIT linguistics professor was lecturing his class the other day. “In English,” he said, “a double negative forms a positive. However, in some languages, such as Russian, a double negative remains a negative. But there isn’t a single language, not one, in which a double positive can express a negative.”

A voice from the back of the room piped up, “Yeah, right.”

But I wonder how common is the phenomenon. “yeah, right” isn’t exactly double-positive, or at least it relies on intonation to convey the negative meaning. However, in Bulgarian “да, да” (“da, da” = “yes, yes”) almost always means “no” (still, intonation plays a role).

What other languages exhibit this phenomenon?

Romanian also exhibits this feature. “Da, da”, said in a certain way, means “in no way” or “never”.

Answer accepted (score 10)

Yes. For example, it’s the same in Italian “sì, sì” (= yes, yes), but it’s ambiguous: it depends on intonation and not on the words themselves. This means that “double positive = negative” is wrong if we are talking about the words alone, but it works through other means. With a different intonation, that same “sì, sì” can be absolutely positive as well. In written language we also use a small spelling variation to substitute for the intonation: we write “seh seh” or “se se”… more or less like the English slang variation “ye ye”.

By the way, the “double positive” also works for French “c’est ça oui” or Spanish “sí sí”, still, it’s the intonation that plays an important role.

And going back to that… intonation is a suprasegmental prosodic feature, along with pitch, stress, and rhythm; they all belong to prosody. (There might be other features, but right now I don’t remember them.)

The prosodic features of a “unit” in spoken language are called suprasegmental because they occur simultaneously with the utterance (the segments). When you say words, forming a sentence, you include rhythm, intonation, stress, and pitch, and these two “categories”, segments and prosody, occur at the same time.

You don’t utter words first and intonation later.

So, usually that’s what makes that utterance have a positive or a negative meaning to the hearer.

Answer 2 (score 9)

Ja, ja in Dutch and German can express disbelief too.

But I don’t think ja, ja, да, да, yeah yeah, or yeah, right should really count; they do not express a double positive in the sense that one positive modifies the other or that both modify the same thing. They are just strong (because repeated) positives used ironically; they indicate disbelief—but is it an actual negation? You could use any locution to express a negative illocution:

Oh, I’m sure your grandmother won’t be shocked if you announce that you’re gay on Christmas Eve and kiss your girlfriend. She will no doubt be thrilled.

As an alternative, it is possible that ja, ja originates in a polite phrase that no-one believes any more (instead of a faded ironic phrase).

The fact that doubling a negative gives a positive is inherent in negations, because they invert whatever falls under their scope. Positives do not invert; they do not change what falls under their scope like that.

Then again, who is to say what counts? Depending on one’s definition, it could be argued that there are two positives, and that the whole is negative. I suppose it is a trivial matter at any rate.

Answer 3 (score 7)

There is a difference between semantics (literal or metaphorical meanings) of utterances and their pragmatics (the real world contextual implications).

Sarcasm (though possibly marked phonetically in English by change in prosody) is a phenomenon of pragmatics.

In English, two positives never make a negative; the ostensible semantic derivation is that repetition is emphasis, or, in mathematical terminology, that it is idempotent.

The surface meaning of ‘yeah, right’ or ‘yeah, yeah’ is assuredly positive. It is the situation that lets us see that the student is only implicitly contradicting the teacher.

As to other languages, I’m sure the same joke can be translated, but that doesn’t make the utterance of two positives a negative there either. Your Bulgarian example may be a native speaker’s ostensible definition of ‘da, da’, but two positives forming a true negative is not a well-known phenomenon among the world’s languages (if it genuinely exists at all).

31: What is the difference between a word root and a word stem? (score 22692 in 2014)

Question

What is the difference between a word’s root and a word’s stem?

Answer accepted (score 15)

This is a metaphor. Both terms refer to plants, but words are not plants.

Metaphors are rarely exact, so there’s no reason to expect the difference between root and stem to be consistent for all languages. The distinction is only useful in a highly inflected language like Latin; in English both words are used in the same way – to indicate what one adds an affix to. Since there are very few affixes in English, it really doesn’t matter.

In Latin, however, it does. Latin verbs typically have at least two, and frequently three, different stems: the infinitive stem, which forms the non-perfect tenses and some non-finite forms; the perfect stem, which forms the perfect tenses; and the participial stem, which forms the participles and other non-finite forms.

The case, gender, tense, person, number, mood, and/or voice suffixes are added to the appropriate stem. But each verb’s stems are all formed from one basic root. I.e., the metaphor is that the root is the base, and there are several stems growing out of it, all covered with fully inflected leaves formed by adding suffixes.

For example, in Latin

  • am- is the root for ‘love’, with infinitive stem amā-, perfect stem amāv-, and participial stem amāt-
  • vid- is the root for ‘see’, with infinitive stem vidē-, perfect stem vīd-, and participial stem vīs-
  • aug- is the root for ‘increase’, with infinitive stem augē-, perfect stem aux-, and participial stem auct-
  • cap- is the root for ‘take’, with infinitive stem capi-, perfect stem cēp-, and participial stem capt-

More details are available here and here.
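As a toy illustration of the root-and-stems metaphor, the amāre fragment above can be written out in code. The endings used here are just a few sample inflections, not a full conjugation, and the flat stem + ending concatenation glosses over real Latin sandhi:

```python
# Root -> stems -> inflected forms, for Latin amāre 'to love'.
ROOT = "am"
STEMS = {"infinitive": "amā", "perfect": "amāv", "participial": "amāt"}

def inflect(stem_kind: str, ending: str) -> str:
    """Attach an inflectional ending to the chosen stem (naive concatenation)."""
    return STEMS[stem_kind] + ending

amare  = inflect("infinitive", "re")    # amāre  'to love'
amavi  = inflect("perfect", "ī")        # amāvī  'I (have) loved'
amatus = inflect("participial", "us")   # amātus 'loved' (perfect passive participle)
```

Every stem “grows out of” the root, which is the point of the metaphor: `all(stem.startswith(ROOT) for stem in STEMS.values())` holds.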

Answer 2 (score 2)

I thought to quote from two websites that aided me, but to facilitate reading, I edit slightly and eschew blockquotes (>). The first quote is written with plainer and simpler diction and so ought to be read before the second with more formal diction.


1 of 2 quotes

Bases, stems, and roots are the main components of words, just like cells, atoms, and protons are the main components of matter.

In linguistics, the root is the core of the word. It is the morpheme that comprises the most important part of the word, and it is the primary unit of the family of the same word. Keep in mind that the root is mono-morphemic, i.e. made of just one “chunk”, or morpheme. Without the root, the word would not have any meaning. If you take the root away, all that you have left is affixes before or after it, and such affixes do not have a lexical meaning on their own.

An example of a root is the word “act”.

Now let’s look at what is a stem and a base and apply them to the root “act” so that you can see how they differ and interconnect to transform a lexical word altogether.

The stem occurs after affixes have been added to the root, for example:

Re-act ↝ Re-act-ion

Hence a stem is a form to which affixes (prefixes or suffixes) have been added. It is important to differentiate it from a root, because the root alone cannot be applied in discourse, whereas the stem exists precisely to be applied to discourse.

A base is the same as a root except that the root has no lexical meaning while the base does: “to act” is the infinitive of “act” and is structured with the base “act”. In many words in our language, a word can be all three: a root, base, and stem (e.g. “deer”). They differ in how they are applied during discourse (stem, base) and whether, on their own, they have any lexical meaning (stem, base) or no lexical meaning whatsoever (root).

An example of root, base and stem joined together is the word “refrigerator”:

The Latin root is frīg, which has no meaning in English on its own, and which requires a change in spelling for suffixes.

⟹ refrigerāre = Latin prefix + root + suffix, with no meaning in English of its own yet.

⟹ re- + friger + -ate + -tor = prefix + root + 2 suffixes.
The two suffixes now produce lexical meaning = stem; spelling changes are required for suffixes.

[The links included with the answer contain the Glossary of Linguistic Terminology for further information.]

Sources: http://www-01.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsABoundRoot.htm
http://www-01.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAStem.htm


2 of 2 quotes

Root, stem, base
Taken from: […] English [W]ord-[F]ormation […] by Laurie Bauer, 1983 (published by Cambridge University Press).

‘Root’, ‘stem’ and ‘base’ are all terms used in the literature to designate that part of a word that remains when all affixes have been removed.

A root is a form which is not further analysable, either in terms of derivational or inflectional morphology. It is that part of word-form that remains when all inflectional and derivational affixes have been removed. A root is the basic part always present in a lexeme. In the form ‘untouchables’ the root is ‘touch’, to which first the suffix ‘-able’, then the prefix ‘un-‘ and finally the suffix ‘-s’ have been added. In a compound word like ‘wheelchair’ there are two roots, ‘wheel’ and ‘chair’.

A stem is of concern only when dealing with inflectional morphology. In the form ‘untouchables’ the stem is ‘untouchable’, although in the form ‘touched’ the stem is ‘touch’; in the form ‘wheelchairs’ the stem is ‘wheelchair’, even though the stem contains two roots.

A base is any form to which affixes of any kind can be added. This means that any root or any stem can be termed a base, but the set of bases is not exhausted by the union of the set of roots and the set of stems: a derivationally analysable form to which derivational affixes are added can only be referred to as a base. That is, ‘touchable’ can act as a base for prefixation to give ‘untouchable’, but in this process ‘touchable’ could not be referred to as a root because it is analysable in terms of derivational morphology, nor as a stem since it is not the adding of inflectional affixes which is in question.
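Bauer’s root/stem distinction can be sketched mechanically, assuming the word has already been segmented and its affixes labeled by hand. Real morphological segmentation is a separate, hard problem; the labels below are hypothetical annotations for the text’s own example, not the output of any analyzer:

```python
# Bauer (1983): the root is what remains when ALL affixes are stripped;
# the stem is what remains when only INFLECTIONAL affixes are stripped.
def analyze(word_parts):
    """word_parts: ordered (morph, kind) pairs, where kind is
    'root', 'deriv' (derivational affix), or 'infl' (inflectional affix)."""
    root = "".join(m for m, k in word_parts if k == "root")
    stem = "".join(m for m, k in word_parts if k != "infl")
    return root, stem

# 'untouchables' = un- + touch + -able + -s
untouchables = [("un", "deriv"), ("touch", "root"), ("able", "deriv"), ("s", "infl")]
root, stem = analyze(untouchables)
# root == "touch", stem == "untouchable"
```

Note what this sketch cannot capture: Bauer’s “base” is relative to a particular affixation step (“touchable” is the base for un- prefixation), so it is a role in a derivation, not a property you can read off a flat segmentation like this.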

Answer 3 (score 0)

Root, stem, and base are used in various linguistic fields with slightly different meanings, so in each case you have to find out how a particular subfield or a particular author uses these terms. In Latin grammar, root and stem have one meaning; in the IE field, another. So a general definition of these three terms achieves nothing.

32: Why is /h/ called voiceless vowel phonetically, and /h/ consonant phonologically? (score 21498 in )

Question

Why is /h/ called voiceless vowel phonetically, and /h/ consonant phonologically?

Answer accepted (score 24)

A good question, and a very basic one that illustrates an important difference between Phonetics and Phonology (or, as it used to be called, Phonemics): They use different criteria for what’s a vowel and what’s a consonant.

First, an important caveat:

  • This is only true of English; i.e., it’s the English phoneme /h/ we’re talking about.
    (This is not, for example, true of the Malay phoneme /h/.)
    Phonemes and Phonology are localized to individual languages,
    whereas Phonetics is independent of individual language systems.

What that means is that when an English speaker pronounces the words

  • heat, hit, hate, hen, hat, hot,
    for instance,

they are pronouncing the phoneme strings

  • /hit/, /hɪt/, /het/, /hɛn/, /hæt/, /hat/

Phonemically, and using /h/ as a Phonological consonant.

Phonological consonants are sounds that pattern in some language

  1. on the borders of syllables,
  2. and that are not used in those languages as syllable nuclei, like voiced vowels are in English.

The key word here is Pattern.
Phonology is all about the patterns that sounds fit into in a given language.
In English, /h/ patterns as a consonant, and that’s that.

However, in pronouncing those phoneme strings
– which represent the way speakers hear the words –
the actual /h/ sounds that the speaker says can be classified physiologically as voiceless vowels,
because a Phonetic vowel is defined by how it’s pronounced,
rather than how it patterns with other sounds.

Phonetic vowels are produced by passing lung air through the open mouth
and without significant contact between any articulators.
I.e., vowels are differentiated only by the positions of the tongue and the lips.

For historical reasons, English /h/ only occurs before vowels.
It never occurs before a consonant, or at the end of a word (i.e., before Zero).
It used to occur everywhere, but those /h/’s went silent or mutated,
and are represented in English spelling as GH.
Which is why words with GH in them are so perplexing.

Now, the biggest difference between an /h/ and a following vowel is that
the vowel is voiced, whereas the /h/ is voiceless.

Further, not much friction is necessary to distinguish an /h/ from its absence,
which is the only thing it contrasts with; all that is really needed is a transition
from voiceless to voiced occurring after the vowel has started.
Rather like the Greek concept of a “rough breathing” (Greek only had /h/ before vowels, too).

And the easiest way to accomplish this reliably and efficiently turned out to be
to pronounce /h/ with a different allophone for every vowel it preceded, like

  • [i̥it], [ɪ̥ɪt], [e̥et], [ɛ̥ɛn], [æ̥æt], [ḁat]

A vowel symbol with a circle below represents a voiceless (whispered) vowel.
Any English speaker can prove this to themself easily:

  • whisper eat, it, ate, ett, at, ott, holding the vowel long, to hear its voiceless sound;
    then whisper each vowel, but start voicing the vowel and continue with the word.
    You’ll hear an /h/ in each case, if you’re a native English speaker.
    And, if you’re paying attention, you’ll notice you don’t move your tongue – only your larynx –
    which means you’re saying a different voiceless vowel in each case.

English has no other uses for voiceless vowels,
so they’re available as allophones for /h/.
Language rarely wastes resources.
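The allophone pattern described above can be written as a toy rule: /h/ surfaces as a voiceless copy of whatever vowel follows it. The vowel inventory and the one-symbol-per-segment input below are my simplifying assumptions; real /h/ realization is more gradient than this:

```python
# Toy allophony rule for English /h/ -> voiceless copy of the following vowel.
VOICELESS = "\u0325"  # combining ring below, the IPA voicelessness diacritic

def h_allophone(word: str) -> str:
    """Map a phonemic string like 'hit' to a phonetic one like 'i̥it'."""
    vowels = set("iɪeɛæaɑɔoʊuə")  # rough inventory, one character per vowel
    out = []
    for i, seg in enumerate(word):
        if seg == "h" and i + 1 < len(word) and word[i + 1] in vowels:
            out.append(word[i + 1] + VOICELESS)  # voiceless vowel in place of [h]
        else:
            out.append(seg)
    return "".join(out)

h_allophone("hit")  # /hit/ 'heat' -> i̥it
h_allophone("hæt")  # /hæt/ 'hat'  -> æ̥æt
```

The rule is purely allophonic: nothing contrastive is lost, which is exactly why English can recycle its otherwise unused voiceless vowels this way.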

Answer 2 (score 5)

English h is not a voiceless vowel; it’s a voiceless glide – the non-syllabic counterpart to a voiceless vowel. (If it were syllabic, it would be a voiceless vowel.)

I disagree with Lawler’s answer above only in regard to his omission of a non-syllabic mark on the first sound of [i̥it], [ɪ̥ɪt], [e̥et], [ɛ̥ɛn], [æ̥æt], [ḁat], which he writes as though each word had two syllables.

33: Why does English have both Latin and Greek origins (score 20408 in 2013)

Question

I always assumed that Latin and Greek were related because English has so many roots from both – but they aren’t, right? So why does English have so many Greek and Latin roots?

Answer accepted (score 20)

English (and most other Western European languages) adopted many words from Latin and Greek throughout history, because Latin especially was the lingua franca all through Antiquity, the Middle Ages, the Renaissance, and later.

However, English has many more words borrowed from Latin than other Germanic languages do, which it owes to the conquest of England by the Normans in the year 1066. The Normans spoke Norman French, which was still much closer to Latin than modern French is, especially in spelling. From then on, French was used as the language of administration for a while, and much of its vocabulary was incorporated into English even as the influence of Norman culture in England waned.

Note that, very, very long ago, in prehistoric times, the Germanic and Italic branches (the ancestor of Latin) diverged from the (supposed) proto-language called Proto-Indo-European. That’s why e.g. English, Greek, Russian, Persian, Urdu, and Latin have certain things in common, although most similarities are now only apparent to the trained eye. The similarities you see between English and Latin are mostly caused by what happened after 1066.

Answer 2 (score 5)

Latin and Greek are related due to both being descended from the same prehistoric ancestor language. English also shares a common prehistoric ancestor with Latin and Greek.

Most languages have a single origin (though creoles and mixed languages have two).

But the origins of a language need not be the origins of each and every word. English has borrowed words from very many languages, but from French it has borrowed en masse, due to the Normans, who spoke an old variety of French, invading and ruling England almost a millennium ago.

The grammar and core vocabulary of English have origins in Germanic.

Many English words have origins outside English. Many of those had their origins in Norman French.

Norman French, of course, has Latin as its origin.

So it’s not correct to say “English has both Latin and Greek origins”. Origin means starting point and the Norman borrowing happened much later than the starting point of English, though still in a remote time from our point of view.

English has Germanic origins. Individual English words have hundreds of exotic origins from languages all over the world. A large percentage of English words have Norman French origins. Norman French had Latin origins. The majority of Norman French words had Latin origins (though there are surely words Norman French borrowed from other languages it was in contact with and passed on to English).

But of course even the Germanic and Latin languages didn’t spring out of nowhere. They also had origins. They are both traced back to a common origin in Proto-Indo-European. Some linguists try to trace the origins even further back and have many hypotheses, but most linguists agree that such retracing is not possible.

Answer 3 (score -3)

Greek words are also seen in the Bible, along with Latin phrases.

34: Why are consonants distinguished differently than vowels? (score 19938 in 2013)

Question

Consonants are normally distinguished by features like place of articulation, manner of articulation, voiced/voiceless, etc., while vowels are usually distinguished by features like tongue position and roundedness. Why can’t a universal methodology be used to classify both of them? Some of them are similar anyway, e.g. /i/ vs /j/.

Answer accepted (score 15)

Because that is precisely what separates consonants from vowels: consonants are sounds produced with a partial closure of the vocal tract. Depending on what type of closure the speaker makes (in combination with other factors), a different consonant will be pronounced. And, as you have already seen, consonants can be analyzed according to several features: place of articulation, manner of articulation, aspiration etc.

Vowels, on the other hand, are pronounced with an open vocal tract. Therefore, by definition, it is not possible to apply those features to classify vowels. Other features, such as roundedness, must be used in this case. So, for example, it wouldn’t make sense to speak of place of articulation for vowels.

Answer 2 (score 8)

Basically, vowels are syllable nuclei, and consonants are syllable peripheries. Consonants are the sounds that don’t occur in the middle of a syllable, and vowels are the ones that do.

That’s all, really.

Aside from diphthongs like /ay/ in light, which involve tongue movement during pronunciation, vowels are determined by the shape of the resonating chambers formed by the tongue in the mouth (this is what decides the formants).

Consonants, however, are always modulated by the approach or touch of an articulator (usually some part of the tongue) to another part of the vocal chamber, and they have very complex waveforms, with fricatives being especially noisy.

Independent articulators outside the inner mouth cavity can apply to either kind of sound because they don’t participate in this alternation. For example, the lungs are independent and therefore there are both aspirated vowels and aspirated consonants; the lips can round vowels or consonants; the larynx can glottalize vowels or consonants, as well as voice either, etc.

Answer 3 (score 3)

The SPE feature system is actually the “universal methodology” that you seek. This is because the features represent independently controllable aspects of articulation. In his book Speech Sounds and Features, Gunnar Fant called this property of the feature system “orthogonality”. It means that for the binary features, all combinations of + - are possible, so that 2^(number of features) different sound segments can be distinguished.
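To make the arithmetic of orthogonality concrete, here is a small sketch that enumerates every +/− combination of a handful of binary features (the feature names are chosen for illustration and are not the full SPE feature set):

```python
from itertools import product

# Four toy binary features; orthogonality means every +/- combination
# describes a possible segment, so n features yield 2**n segments.
features = ["vocalic", "high", "back", "voiced"]

segments = [dict(zip(features, values))
            for values in product("+-", repeat=len(features))]

print(len(segments))  # 16 feature bundles, i.e. 2**4
```

A restriction such as the ban on [+high, +low] mentioned in the original answer would amount to filtering some combinations out of this product, i.e. a departure from full orthogonality.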

The appropriate feature difference between “consonants” and “vowels” is not terribly clear, but let’s say vowels are [+vocalic] and consonants are [-vocalic]. Then the number of possible vowels is the same as the number of possible consonants, and for the consonants classified by the place of articulation features, there are also vowels classified in exactly the same way, by place of articulation.

Chomsky and Halle do depart from the orthogonality assumption when they declare the combination [+high,+low] to be impossible.

The first person to make theoretical use of feature orthogonality, so far as I know, was William Holder in Elements of Speech. Holder argued that there must be a velar nasal consonant in the series [m, n, …] to match the places of articulation of [p, t, k].

35: How many different distinctive sounds can an average human make? (score 19047 in )

Question

If we wanted to create an all-new alphabet composed of as many letters as possible, with each letter corresponding to one distinctive sound, what’s the maximum number of letters we could have?

Oh and please don’t answer “an infinite amount”.

Answer accepted (score 7)

The notion of ‘distinctive’ sounds indicates that the discussion must be limited to phoneme inventories found in a single language. To do this we can consider the largest known inventories of contrastive consonants, vowels and tonal features (‘contrastive’ being what I take ‘distinctive’ to mean for the purposes of this answer).

Consonant inventories

According to the World Atlas of Language Structures online, chapter 1 ‘Consonant inventories’, the language with the largest known set of consonants is !Xóõ (Southern Khoisan), which has 122 consonants. My understanding is that it is not entirely clear that all the !Xóõ clicks contrast across enough environments that we can be fully satisfied with this number, so we could take instead the largest such inventory that does not include clicks; this is usually said to be Ubykh (Abkhazo-Adyghean) with around 80 consonants (not counting the consonants that only occur in onomatopoeia and loanwords).

Vowel inventories

WALS takes a simple approach to counting vowel inventories, only counting the vowel qualities rather than all distinctive vocalic segments. Thus they only consider the three features of height, backness and lip rounding, and ignore length contrasts, nasalisation, etc. With this approach they find the largest inventories to be around 14 vowels (German, Indo-European). If we include diphthongs and triphthongs then we have, e.g., Fering/North Frisian (Indo-European) with 22 vowels. Of course, if we move even further from the WALS approach and include secondary articulations such as nasalization, phonation, tongue root retraction, pharyngealization, etc then we could probably find much larger inventories.

Tone

There are a few languages thought to have up to 9 tones, e.g. the Kam language (Tai-Kadai).

Largest attested segmental inventories

Combining the WALS maps we can identify a small group of languages that have large inventories of both C and V. !Xóõ is the largest with Ubykh following, but several others have total C+V inventories approaching 60 or so segments:

  • !Xóõ (Southern Khoisan) depending on analysis has anything from 107 phonemes up to 142 (as well as 2 tones) so comes close to the theoretical maximum outlined below
  • Ubykh (Abkhazo-Adyghean) has 86 phonemes
  • Sindhi (Indo-European) with 62 phonemes, but this includes some vowel length contrasts as well as a phonation contrast on consonants.

Largest theoretically possible inventory

If we take the more conservative numbers of 80 C, 14 V and ignore tone (potentially 1,120 CV syllables) then we require the orthography to represent only(!) 94 phonemes.

If however we combine the maximum consonant, vowel and tonal inventories then we could theoretically have a language with up to 122 C, 22 V and 9 T. This would require an orthography with 144 segmental symbols and some kind of tone-marking as well. If tone marking used segmental symbols (as is common in the Tai-Kadai and Hmong-Mien families) then a further 9 symbols could be added bringing the total to 153. It’s worth noting that such a language would theoretically have over 24,000 possible CV syllables!
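The totals in the two scenarios above are straightforward arithmetic; a quick sketch to verify them:

```python
# Conservative scenario: 80 consonants, 14 vowels, tone ignored
conservative_symbols = 80 + 14     # phonemes the orthography must represent
conservative_syllables = 80 * 14   # possible CV syllables

# Maximal scenario: 122 consonants, 22 vowels, 9 tones
segmental_symbols = 122 + 22                # segmental symbols needed
with_tone_letters = segmental_symbols + 9   # if tones get their own symbols
tonal_cv_syllables = 122 * 22 * 9           # tone-bearing CV combinations

print(conservative_symbols, conservative_syllables)  # 94 1120
print(with_tone_letters, tonal_cv_syllables)         # 153 24156
```

24,156 is the “over 24,000 possible CV syllables” figure quoted above.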

Answer 2 (score 4)

I do not know about the average person, but I can describe my own situation.

Let’s count how many phonemes I can recognize and produce. My native language is Russian, so my phonetics is based on that language with some additions. For sounds existing in Russian I will use the transcription system taught in Russian schools, based on Cyrillic, because IPA is not quite suitable for the case.

Some notes.

  • Most consonants (with exceptions I will mention) come in 4 variants for me: voiced hard, voiced soft, voiceless hard, voiceless soft. Some of these variants do not exist in Russian, yet they are readily distinguishable to me and I can produce them. Some quadruples pair with each other as plosive and fricative variants, so as to form octets.

  • The vowels come in 2 variants for me: the variant coming after a soft consonant and the variant coming after a hard consonant. I would call them “soft” and “hard” vowels, although Russian phonology does not use this distinction. For me the distinction is like that between German u and ü, or o and ö. A soft vowel always comes after a “soft” consonant and a hard vowel after a hard consonant (that’s why they are not considered distinct phonemes in Russian phonology), yet for me either a “soft” or a “hard” vowel can occur at the beginning of a word, a distinction that does not exist in Russian but does in German.

    Consonants

    [б], [п], [б'], [п'] - four variants
    [в], [ф], [в'], [ф'] - four variants, forms plosive/fricative octet with the previous
    
    [д], [т], [д'], [т'] - four variants
    [ҙ], [ѳ], [ҙ'], [ѳ'] - [ҙ] represents English "th" in "this", [ѳ] represents 
                           the "th" in "thick". The soft-hard distinction is not 
                           phonemic in any language known to me, yet I can easily 
                           produce and recognize it. 
                           Forms a plosive/fricative octet with the previous.
    
    [з], [c], [з'], [c'] - four variants
    [ӡ], [ц], [ӡ'], [ц'] - Russian has only the hard voiceless variant and it is
                           difficult for me to recognize the soft variants because
                           they mix for me with [д'] and [т'].
                           Forms an affricate/non-affricate octet with the previous.
    
    [ж], [ш], [ж'], [ш'] - Russian does not have the soft variant [ж'] 
                           yet I can easily recognize and produce it.
    [ӂ], [ч], [ӂ'], [ч'] - Russian has only the soft voiceless variant.
                           Forms an affricate/non-affricate octet with the previous.
    
    [г], [к], [г'], [к'] - four variants
    [ғ], [х], [й'], [х'] - for me the hard variant of й is equal to the voiced velar
                           fricative, which does not exist in Russian.
                           Forms a plosive/fricative octet with the previous.
    
    [р], [р']            - this has no voiceless variant for me
    [л], [л']            - this has no voiceless variant for me, 
                           grouped with the previous as glides.
    
    [м], [м']            - this has no voiceless variant for me
    [н], [н']            - this has no voiceless variant for me, 
                           grouped with the previous as nasals.
    
    [ŋ], [ŋ']            - velar nasal not existing in Russian

    So this constitutes 50 consonants.

    I also recognize the following vowels:

    [а], ['а] - in Russian only the first variant can start a word
    [о], ['о] - in Russian only the first variant can start a word
    [у], ['у] - in Russian only the first variant can start a word
    [э], ['э] - in Russian only the second variant can start a word
    [и], ['и] - in Russian only the second variant can start a word, 
                except for some foreign placenames starting with the letter Ы.

    This gives 10 vowels, yet the “softness” of a vowel is totally determined by the preceding consonant, except at the word’s beginning.

    Also, each vowel can be stressed or not, but there is no more than one stressed vowel per word.

    Of course there are other phonemes in different languages, but I have just described what I myself can recognize and produce.
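A quick tally confirms the counts above (the grouping mirrors the listing: ten hard/soft × voiced/voiceless quadruples, plus the hard/soft-only pairs for р, л, м, н and the velar nasal, and five vowel qualities each in a “hard” and a “soft” variant):

```python
quadruples = 10   # б/п, в/ф, д/т, ҙ/ѳ, з/с, ӡ/ц, ж/ш, ӂ/ч, г/к, ғ/х groups
pairs = 5         # р, л, м, н and the velar nasal: hard/soft only

consonants = quadruples * 4 + pairs * 2
vowels = 5 * 2    # а, о, у, э, и in "hard" and "soft" variants

print(consonants)           # 50
print(vowels)               # 10
print(consonants + vowels)  # 60 distinguishable sounds in total
```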

Answer 3 (score 1)

131,072 is the number of distinguishable sound segments in a system of binary distinctive features with 17 different features. My informal count of the distinctive features in the Sound Pattern of English feature theory is 17.

If you want to derive a writing system that can distinguish 131,072 distinctive sounds, I’d suggest the use of diacritics (lots of them).

36: What characteristics are unique to English (or at least rare among language as a whole)? (score 18699 in 2018)

Question

After wondering about this today at work, I turned to the Internet.

A short piece that focuses on pronunciation points toward “none”. I’ve scoured ELU and Google (perhaps not as thoroughly or effectively as some others might), but cannot find an answer specific to this question.

I realize this may be considered a broad question, though, so let me clarify what I mean by “unique”. I am not referring to words that only occur in English or one-off exceptions to grammatical rules, i.e. trivial language-specific features (that are innumerable and don’t belong here anyway). I’d like to see something more along the lines of what is presented in this paper on unique features of Lithuanian.

The author gives nine unique traits to Lithuanian (I list some with my comments in parentheses):

  1. frequentative past tense
  2. 13 participles in active use (more than other languages?)
  3. four functional locative cases (more than others?)
  4. no irregular, or suppletive forms in the comparative and superlative forms in adjective and adverbial systems
  5. uniform stressed syllable intonation
  6. preserved several words or forms exactly as they are reconstructed for the distant proto-language (PIE)
  7. all the basic possessive adjectives and possessive pronouns are expressed by genitive form

I’m not sure if #2 and #3 are unique in the sense that English could also claim X participles where only the number is significant, not the existence of certain participles unique to the language.

I am not looking for a thesis, but perhaps a short list along the same lines as the paper above. In short, what does English do that no other language does?

Edit 1: For anyone voting to close, perhaps you can help me rephrase my question so that it’s in line with the kind of succinct answer I’m hoping for (e.g. some “unique features of Lithuanian”). Comments/critiques are welcome.

Edit 2: John Lawler mentioned WALS, and the kind of features it catalogs is what I’m after. For instance, double-headed relative clauses or optional triple negation are only found in a few languages. Again, it may be the case that English is too “mixed” with cross-linguistic features for it to have any unique quality.

Answer accepted (score 9)

English commonly resorts to post-auxiliary ellipsis as in

  1. They arrived late, as I thought they would.

This is already pretty rare among the world languages. But, as far as I know, English is unique in the variety of constructions where ellipsis is tolerated. For instance, it tolerates voice mismatch as in

  1. This problem was to have been looked into, but obviously nobody did.

or even non-verbal antecedent as in:

  1. Mubarak’s survival is impossible to predict and, even if he does [survive], his plan to make his son his heir apparent is now in serious jeopardy.

I have never heard of another language tolerating constructions like (2) and (3).

Answer 3 (score 18)

If expletive infixation isn’t unique to English, what about recursive expletive infixation, however impractical it may be?

Ex:

“Holy mother-mother-fucking-fuck!”

37: Differences between phonemic and phonetic transcriptions (score 18384 in 2015)

Question

As far as I know, there are three main differences between phonemic and phonetic transcriptions:

  1. Phonetic transcriptions deal with phones or sounds, which can occur across different languages and speakers of these languages all over the world. On the other hand, phonemic transcriptions deal with phonemes, which can change the meaning of the words in which they are contained if replaced; for example, /bɪt/ and /pɪt/.

  2. Phonetic transcriptions provide more details on how the actual sounds are pronounced, while phonemic transcriptions represent how people interpret such sounds.

  3. We use square brackets to enclose phones or sounds and slashes to enclose phonemes. So, for instance, it would be incorrect to place the aspirated allophone of the phoneme “t” between slashes; it should go inside square brackets because it’s not a phoneme in its own right (replacing it with the unaspirated variant wouldn’t change the meaning of the word that contains it).

Are there any more differences between phonemic and phonetic transcriptions?

Answer accepted (score 5)

Different linguists have different ideas about phonetic versus phonemic. Mine is one that I think is close to the original conception of Baudouin de Courtenay and his student Kruszewski (to whom we owe the term phoneme). My version is a little dumbed down.

Phonemes are what we hear. Phonetics is what we say.

At least, naive native speakers hear phonemes, not phonetics. After a course in articulatory phonetics, speakers start to hear some phonetic detail. Recognition of dialects requires some ability to hear phonetics, and that doesn’t depend on schooling.

This is why traditional alphabetic writing systems are mostly phonemic, rather than phonetic, why poetic rhyme is phonemic, and why secret languages like pig latin are based on phonemics.

Answer 2 (score 1)

Phonemic and phonetic transcriptions are created with different purposes in mind.

Phonemic transcriptions are often kind of a proto-orthography for spoken languages, and they are used to describe languages (in example sentences or grammars) that lack a proper orthography.

Phonetic transcriptions are often used by dialectologists to describe how the sounds of dialects differ in space and time. Sub-phonemic sound shifts are a typical research topic in this area.

38: Why do Japanese people have difficulties in pronouncing English? (score 17826 in 2017)

Question

When I watch Anime, I notice that Japanese English pronunciation is really bad: they twist all the sounds, and they can’t pronounce sounds like “L”. I think English is the easiest language when it comes to pronunciation, and I don’t think it has anything to do with their native language. As for me, I learned Arabic, French and English, and I have no problems with the pronunciation, even though my mother language (Tamazight) has nothing related to these languages.

Answer accepted (score 106)

Several reasons:

English pronunciation isn’t easy

Don’t think that, just because you find it easy, most people in the world will; English pronunciation is actually quite complex by any measure. The language has something around 10 vowels (not counting diphthongs) and 44 phonemes; well above the average, and more than double Japanese’s 5 vowels and 17 phonemes. What’s more, English syllables are unusually complex, and may have long sequences of consonants (as in “lengths”) and consonant-only syllables (as in “bottle”). Even people in Spain or Italy will unconsciously add vowels between the consonants in order to simplify English syllables—and Japanese syllables are even simpler.

People generally have trouble with foreign sounds

You say you find it easy to pronounce sounds not in your native language, and I don’t doubt you. However, you’re being unfair to the Japanese by singling them out; just listen to second-language speakers anywhere in the world, and you’ll find that people generally have trouble distinguishing sounds not in their languages. For example, the Chinese speakers I know find it difficult to distinguish sounds like b/d or e/ɛ when learning my native Portuguese. English gringos have trouble with the nasal “ã”, or even with saying a simple /o/ without turning it into a diphthong [oʊ]. I have trouble with English t/th; anyone from a non-tonal language has trouble with Chinese tones, and so on and so forth.

Of course, people can learn these sounds (if properly instructed); but it’s a well-known fact that many don’t, even after years living in another country. This is interesting, because babies always learn all the sounds people speak around them (well, except for a few cases of “speech dysfunction”, like lisps; but these are unusual). Adult foreign learners have a lot more difficulty.

English is incorporated into the Japanese sound system

Did you know that around 58% of English words came from Latin and French? But they are pronounced very differently from actual Latin or French. The word “Latin” itself, for example, is like [ˈlæ.ʔn̩] rather than /ˈla.tin/. Et cetera is /ˌɛtˈsɛtɹə/ instead of /ɛt ˈkeːtera/, and so on.

My point is, when you incorporate lots of words from a foreign language into your own, you adapt the pronunciation to fit yours. It would be a bother to keep changing from English sounds to Latin sounds all the time.

Japanese has something like 30% “foreign” (gairaigo) words (not counting Chinese), most of which are English. Just like English speakers adapt words like “Et cetera”, “Paris” or “Mexico” to fit their own sound system when speaking English, Japanese adapt English words (and others) to fit their own sound system when speaking Japanese. So “hamburger” becomes hambāgā, “cut” becomes katto, etc.

Unfortunately, this means Japanese people have an easily available but inexact “version” of English already in their minds, and this makes it harder to learn actual English pronunciation. When English speakers learn Latin, they have to begin by unlearning the way they pronounce Latin words and Latin letters, and get used to actual Latin sounds. Japanese people, too, have to learn to set aside their native gairaigo pronunciation.

If a Japanese person needs to actually speak English, for example to live in England, they will learn it; but if they’re just living in Japan and talking to other Japanese people, they’ll just use the Japanized English words, for the same reason that English speakers use an Anglicized pronunciation of Latin words when talking to each other.

English classes in Japan usually suck

Japan has mandatory but atrocious English education, which is worse than no education. Richard Schmidt pointed out long ago that foreign learners won’t acquire non-native sounds unless you draw conscious attention to them first. However, the typical English course in Japan doesn’t try to teach the basics of articulatory phonetics or how to enunciate the sounds; they just keep doing grammar drills and such. I’m living in Germany, and I got a Japanese student to pronounce German-only sounds like “ö” and “ü” in one afternoon by explicitly explaining tongue and lip positions to her. I’m sure that, if a Japanese person is actually taught English phonology, they’ll be able to pronounce English without major issues. I’ve met plenty of Japanese individuals with better English pronunciation than mine (admittedly, they usually have lived overseas, and therefore had an incentive to learn actual English pronunciation, as opposed to their native Japanized English).

Answer 2 (score 16)

Here’s an answer from developmental psychology:

When a baby is born they can natively pronounce the phonemes of every language, but as they develop, their brains are constantly calculating and keeping track of which phonemes are said more often. This causes the baby to lose the ability to natively pronounce, or even differentiate, phonemes other than the ones spoken in the languages of their environment. This does mean that babies raised hearing two languages will be able to pronounce more phonemes natively.

After what is known as the “critical period”, it is very, very difficult to retrain the brain to learn to pronounce/differentiate new phonemes.

A quote from Neuroscience, 2nd Edition, specifically addressing the pronunciation of l/r:

Very young human infants can perceive and discriminate between differences in all human speech sounds, and are not innately biased towards the phonemes characteristic of any particular language. However, this universal appreciation does not persist. For example, adult Japanese speakers cannot reliably distinguish between the /r/ and /l/ sounds in English, presumably because this phonemic distinction is not present in Japanese. Nonetheless, 4-month-old Japanese infants can make this discrimination as reliably as 4-month-olds raised in English-speaking households (as indicated by increased suckling frequency or head turning in the presence of a novel stimulus). By 6 months of age, however, infants show preferences for phonemes in their native language over those in foreign languages, and by the end of their first year no longer respond to phonetic elements peculiar to non-native languages. The ability to perceive these phonemic contrasts evidently persists for several more years, as evidenced by the fact that children can learn to speak a second language without accent and with fluent grammar until about age 7 or 8. After this age, however, performance gradually declines no matter what the extent of practice or exposure.

-The Development of Language: A Critical Period in Humans

As other answers note, Japanese and English don’t necessarily have much phoneme overlap.

From a very famous and informative TED talk about this topic, which explains that it’s very difficult for the brain of a native Japanese speaker to differentiate between an “l” and an “r”:

What you see here is performance on that head-turn task for babies tested in Tokyo and the United States, here in Seattle, as they listened to “ra” and “la” — sounds important to English, but not to Japanese. So at six to eight months, the babies are totally equivalent. Two months later, something incredible occurs. The babies in the United States are getting a lot better, babies in Japan are getting a lot worse, but both of those groups of babies are preparing for exactly the language that they are going to learn.

-Patricia Kuhl, The linguistic genius of babies

The TED talk even shows demonstration videos on this effect, check it out.

Answer 3 (score 5)

Part of a theory of foreign pronunciation proceeds straightforwardly from David Stampe’s theory of Natural Phonology. Every natural language is phonetically difficult for a child, because many sounds tend to be changed into easier ones. A child’s task is to learn not to let this happen in the languages he must learn to pronounce. See Stampe’s article in CLS 5, The acquisition of phonetic representation. The phonological system of a language consists in the set of those simplifications of sounds, the “processes”, which needn’t be unlearned by children in order for them to pronounce their language correctly.

There is a considerable difference between what must be learned by children learning Japanese and children learning English. For instance, children have a tendency to drop word-final obstruents (p/b/t/d/k/g/f/v/s/z/…), and they have to get over that to pronounce English correctly; but to learn Japanese, they needn’t bother, because Japanese doesn’t have word-final obstruents. This is why Japanese speakers trying to pronounce English drop such obstruents at first. It is simply something they didn’t have to learn when learning Japanese as children.

I don’t know whether it is available online, but Julie Lovins’ dissertation Loanwords and the phonological structure of Japanese is an excellent account along these lines of what we can infer about the structure of Japanese from borrowings of English words into Japanese. See Julie Beth Lovins.

In summary, English words are easy to pronounce for English speakers, because they’ve learned how to do that, but difficult for Japanese speakers who haven’t yet learned how. Pretty obvious, really (though the details are complicated).

39: Why are affix hopping and head movement considered as distinct operations? (score 17654 in 2011)

Question

Affix hopping is a morphological operation by which an unattached affix in the T position is lowered onto a verb. This attachment is done by the “Phonetic Form component” (the posited component in the mind that transforms the inputs it receives from the syntactic component into phonetic spell-outs). For example:

The boy kicks the ball.

The underlying syntax tree for this sentence can be represented as:

The boy kicks the ball

Here, the suffix -s (that indicates third person, singular, present tense) has been taken from its original position and then attached to the end of the verb, producing the form kicks.

Head movement is another operation, by which the head of a phrase is moved (in fact, copied) from its original position to another one. This is a syntactic operation. An example of head moving for a question in English:

Will you marry me?

which can be syntactically represented as:

Will you marry me?

In this case, the auxiliary will has been moved from its original position to become attached to the null complementizer that marks a question.

Both operations involve moving a head from one position to another. But affix hopping is assumed to occur at a different level (or different “mind component”) than head movement. What is the fundamental difference that makes linguists classify these two apparently similar kinds of movements as two very distinct operations?

Answer accepted (score 10)

There are a couple of reasons that syntacticians make this distinction:

  • Word order facts
  • Locality facts

As to the first, in a language like French (which has obligatory V -> T head movement), adverbs can separate the (main) verb from its object:

(1)  Jean embrasse souvent Marie
     J.   kisses    often   Mary
(2) *Jean souvent embrasse Marie
     J.   often   kisses    Mary

In English (no V -> T movement), the facts are exactly the reverse:

(3) *John kisses often Mary
(4)  John often kisses Mary

The second point has to do with the fact that Lowering cannot jump across an intervening projection, whereas Head Movement can (1). So in a structure like the following, we predict that T cannot lower onto V, because of the intervening Neg. This is the motivation for do-support: do is inserted to “support” the features in T (which would otherwise be flagged as ungrammatical by the Stray Affix Filter, which says basically that you cannot pronounce a bound morpheme in isolation).

[syntax tree diagram]

*"John -ed not kiss Mary" --> "John did not kiss Mary"

Other languages which lack head movement have different strategies for avoiding this clash. Mandarin, for example, which lacks verb movement to C, has a non-affixal question complementizer ma.

Analyses which merely say “and then it moves in the phonology!” are punting the question of what happens in Lowering derivations. If you are curious about how to say something substantive about Lowering/Affix Hopping in the phonology, this 2001 paper by Embick and Noyer (pdf link) is a recent-ish attempt to construct a theory of post-syntactic movement. (It also has several examples of post-syntactic movement other than the standard English verb-inflection one.)



(1): Many syntacticians would like to say that head movement in fact cannot skip intervening heads, and must stop at every one along the trail from source to destination. But this isn’t agreed on by everyone; “long head movement” is the name that putative counterexamples go by.

Answer 2 (score 2)

Oh dear, I haven’t heard about affix hopping since the good olde GB days. In Minimalist syntax (a.k.a. the Minimalist Program), there is Agree. So, in your first example, T has an uninterpretable phi-feature, number (among many others), and it gets its value from (or is checked by) the DP in the specifier position of (little) v. See Adger’s 2003 Core Syntax for a start.

Answer 3 (score 0)

Concerning minimalism: even Adger’s sophisticated concept of features does not get rid of the basic problem that do-support is irregular. It applies only to negated declaratives and to interrogatives, both negative and positive (when no extra emphatic flavour is intended). Adger tries to capture this via last-resort do-insertion, which in my opinion is far too powerful a concept. Fun fact: diachronically speaking, do-support used to be regular; in Early Modern English it could be used in a non-emphatic sense even in positive affirmative sentences.

40: Why does linguistics focus on spoken languages rather than written ones? (score 17248 in 2011)

Question

I might be wrong, since I’m unable to find any sources supporting this, but my increasing gut feeling is that linguistics focuses on spoken languages as opposed to written ones. If this is the case, why?

Answer accepted (score 17)

Linguistics is the scientific study of language. A language is narrowly defined as the set of rules that “speakers” (speaking or signing) acquire when they are very, very young. There is evidence for processes of language acquisition underway at the very youngest testable ages (under a year old).

A speaker with a grammar like mine knows, without ever being told, that strings of words like colorless green ideas sleep furiously are well-formed, but strings like furiously sleep ideas green colorless are not. I can say that the first sentence would be true if (i) there existed some things that were colorless, green, and ideas, (ii) such things could sleep, and (iii) such sleepings could happen furiously. The other string supports no such interpretation. Since I don’t have to look up these two strings in some kind of book to tell whether one is good or bad, we say that my grammatical rules are internalized.

The rules of grammar (again, in the linguist’s sense) are learned implicitly. They are not taught, as the rule “i before e except after c” has to be learned when one learns how to spell, but rather inferred on the basis of experience in a linguistic environment (i.e., a community of speakers who do some speaking in the presence of the child).

Lastly, knowledge of language is universal. Barring severe cognitive deficits or social deprivation, every normally-developing child acquires a native tongue.

Written language stands in sharp contrast: children learn to read and write much later (usually once they start attending school) than they learn to understand and produce spoken language; written language has no internalized rules, it reflects only the speakers’ internal rules and, possibly, a speaker’s regard for the prescriptive rules of style; written language has to be learned explicitly (often laboriously); and, written language is not universal—there are many, many people on this planet who have knowledge of language but can’t read or write.

“Written language” is an artifact of (some) human cultures who already had spoken languages. Spoken language is definitional of our species, and it is this kind of species-specific capacity that linguists study. This is not to say that the capacity for writing is not complex; it is in fact more complex than the capacity for language, in that explaining how we can have writing systems requires in part an explanation of what the language we’re writing down is in the first place.

Answer 2 (score 10)

Your question is ambiguous, so I’ll just cast my net wide.

There is a lot of research into different aspects of language that don’t necessarily involve speaking. Psycho- and neurolinguistics, both thriving subdisciplines, focus their efforts on understanding the psychological and neurological components of language production and processing. Morphology, phonology and syntax focus on the rules that govern the production of language, not necessarily the particular modality of production. A lot of language acquisition research, such as that focusing on how babies process pointing gestures or gazes, or on the strategies that children use to learn a language, has very little to do with spoken language.

A lot of disciplines that prima facie deal with spoken language, such as phonetics or phonology, can be and have been successfully adapted to sign languages. For example, research has shown that there are sign-language analogues of place and manner of articulation, and that sign language has units of production that can rightfully be referred to as phonemes.

Written language is a reflection of spoken language, so most of what we know about spoken language can be transferred to written language. The converse is not true. Still, there is a great deal of research focusing on how written language is processed.

Spoken language is a part of what makes us human. Every child will become fluent in their native language unless they suffer from massive cognitive or social deficits. Even children with intellectual disabilities, or others with low IQs, still achieve a great deal of fluency. If someone is deaf or mute, they will either acquire or invent a signed language. Acquiring literacy, on the other hand, is a laborious process. Most languages do not have a written form, and even in those that do, not everyone achieves literacy. Spoken language is more widespread and universal, and therefore more interesting, than the mostly cultural invention we call ‘writing’.

Finally, a lot of linguistics has to be done with spoken language. A lot of phonetics, for instance, is about sound production, transmission, and perception. Children learn spoken language far earlier than written language, so most interesting research on language acquisition is about spoken or signed language. As I pointed out above, most languages do not have a written form, so any linguistic research concerning them has to focus on their spoken forms.

Answer 3 (score 9)

The attitude that spoken language is somehow primary to written (usually in a more general sense than genealogically) is known as logocentrism and has been extensively studied by, amongst others but arguably most famously, Jacques Derrida in his Of Grammatology.

41: Difference between discourse analysis and pragmatics (score 16694 in )

Question

Could you explain the main difference between pragmatics in linguistics and discourse analysis? Both relate to the study of the use of language in the real world.

Answer accepted (score 2)

As far as I understand it:

Speaker 1: would you like to go for a drink?

Speaker 2: great! what time?

The discourse analyst would be looking at how this communicative event works mechanically, whereas the pragmatist would be looking at underlying (implicit) meanings (in this case, ‘drink’ means a trip to the pub, for example).

Answer 2 (score 1)

As far as I can infer, pragmatics focuses on the implicit meaning that is being conveyed with reference to a specific context, while discourse analysis focuses on communicative aspects.

Answer 3 (score 0)

AFAIK, given a paragraph of sentences, discourse analysis looks at how the sentences are glued together, while pragmatics looks at meanings that are not encoded in the sentences.

e.g. The teachers allowed the children to play because they are busy with some work. / The teachers allowed the children to play because they are too tired to listen in class.

Here the word “they” refers to a different group in each of the two sentences, and resolving it requires external world knowledge. It appears to me that pragmatics is a deeper branch of discourse analysis.

42: What is the difference between “anaphora” and “deixis”? (score 16375 in 2017)

Question

I’ve been trying to understand an answer from another SE site (Japanese Language & Usage) where the answer involves the differences between anaphora and deixis. Here is the link.

What would be the difference between anaphoric and deictic usage? If possible, please provide some examples showing the differences.

Answer accepted (score 9)

I don’t know any Japanese, but generally, an anaphor (an anaphoric expression) is one that refers to something mentioned earlier in the text:

John is tired because he has been working all day.

Mary said the moon is made of cheese. I don’t believe that.

He in the first example refers to John; that in the second one refers to the moon being made of cheese.


A deictic expression is one whose exact meaning varies with the context in which it is uttered.

  • Who exactly is meant by you? That depends on who is being spoken to.
  • Where is here? Wherever the speaker is, or wherever he is pointing.
  • What time will it be in one hour? That depends on when it is being said.

43: Why is jargon sometimes used instead of familiar words? (score 16304 in 2018)

Question

I wonder why certain words are used in some occupations when a commonly known synonym could be used instead?

Examples:

  • in law, desist instead of stop, cease
  • in economics, parity instead of equality
  • in medicine, onset of a disease instead of start, beginning

What role or function does professional jargon have in these instances?

Answer accepted (score 5)

Reasons for the use of jargon are precision and the tendency of communities (professional or otherwise) to develop their own vocabulary.

Precision

Members of a community, such as lawyers, might sometimes feel the need to invent new words for concepts that exist in general usage to make it clear that they are referring to a strictly defined term. This might help avoid confusion between the looser meaning of the word in general usage and the narrower meaning of the one used in specialised discourse.

Another option is to take the word from general usage and (re)define it in a way that suits the needs of the professional community. For example, German Klage means both lawsuit (law jargon, among others) and complaint, lamentation. Lawyers know that Klage, when used by their peers, refers to a lawsuit and not to a complaint without legal consequences; there is no ambiguity involved. So if precision were the only principle shaping the language of professional communities, one would expect very little, or at least less, coining of new terms to occur, especially since redefining an existing word would be more economical than coining a whole new word.

Identity

Another, perhaps more important, argument I think is that communities tend to come up with their own vocabulary. This is in part because they need to refer to concepts those who do not belong to the community have no interest in or knowledge of. But jargon can also arise in cases where the concept that a new word refers to is not particularly complicated or new, such as in the examples given in the question (desist for stop).

Jargon serves to create a common identity among the members of the group. If you know the jargon, you belong to the group; if you don’t, everybody will notice that you don’t belong. Spolsky (1998), for example, says that

A specialised jargon serves not just to label new and needed concepts, but to establish bonds between members of the in-group and enforce boundaries for outsiders. If you cannot understand my jargon, you don’t belong to my group.

In this sense jargon is comparable to slang, which also serves to mark group boundaries. The difference, as Kollataj (2009) points out, is that slang is stigmatised but jargon is not. How groups of people, whether they work in the same professional community or belong to more loosely defined social groups, develop their own ways of communicating has been described by Penelope Eckert as communities of practice.

Answer 3 (score 1)

In addition to precision, jargon is used to mark status and which social group(s) you are a member of, or wish you were a part of. Such jargon then coincides/overlaps with sociolect. For instance, the Alice of “Alice in Wonderland” used “looking glass” for “mirror”. See also U and non-U English on Wikipedia.

44: Which language has the biggest vocabulary? (score 16210 in 2013)

Question

I am thinking that it is English, because it has so many borrowed words and most French, Italian, or German words can be written in English as is. Am I right?

Answer accepted (score 4)

I would propose a three-way distinction to illuminate the complexity:

  1. Vocabulary - Set of words familiar to any one individual speaker
  2. Lexicon - Set of words available to a particular linguistic community
  3. Dictionary - Complete set of words attested in a language across all linguistic communities

(Note: this is simply for the purposes of elucidating the question of size - not an objective, hard and fast distinction.)
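
Read as sets, the three-way distinction amounts to a nesting; here is a schematic sketch (the example words are invented, not data):

```python
# Schematic nesting of vocabulary < lexicon < dictionary (invented words).
vocabulary = {"run", "dog", "happy"}             # one individual speaker
lexicon = vocabulary | {"heifer", "silage"}      # one linguistic community
dictionary = lexicon | {"wherefore", "groovy"}   # all communities, all time

assert vocabulary < lexicon < dictionary  # each level contains the last
print(len(vocabulary), len(lexicon), len(dictionary))  # 3 5 7
```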

But I would hypothesize that both vocabulary and lexicon would be roughly comparable in size across similar types of communities over time. So a rural community in modern America will have roughly the same lexicon as a rural community in Ancient Rome. Equally, a community of scholars in a discipline will be operating with lexica of similar magnitude.

If one were to look for a difference, it would be in dictionaries. And here English has the claim to probably one of the largest dictionaries purely by virtue of spanning a number of linguistic communities in many different environments over a long period of time. But the vocabulary of any one individual speaker will be much more limited.

Some of the things people have mentioned here are a bad place to look for size differences:

  1. Identifying any one area where one language has more synonyms than another (e.g. English having multiple words for many verbs) - each language has different areas of diversity and redundancy and areas of compression.
  2. Looking at typological differences (e.g. German creating new words through compounds vs. English using phrases) - these phenomena only account for very marginal differences.
  3. Borrowing or tendency to avoid borrowing (e.g. French vs. Russian) - all languages in contact borrow - or create new words to reflect needs.
  4. Homonyms - some languages (particularly those with more limited phonologies/phonotactics) may have more homonyms than others, but even that doesn’t impact overall size - unless we limit ourselves to counting unique headwords regardless of sense.
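
The headword-versus-sense point in item 4 can be made concrete with a toy count (the entries below are illustrative, not real dictionary data):

```python
# Senses vs. unique headwords for homonyms (invented entries).
entries = [("bank", "edge of a river"),
           ("bank", "financial institution"),
           ("bat", "flying mammal"),
           ("bat", "club used in sport")]
headwords = {word for word, sense in entries}

print(len(entries))    # 4 -> counting every sense inflates the total
print(len(headwords))  # 2 -> counting unique headwords does not
```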

Ultimately, this question has no precise answer nor does it really need one. The important thing for a language is its facility to create as many new words as are necessary to its speakers. And in this all languages are the same.

Answer 3 (score 3)

If you count the number of separate roots, English seems to be among those with the highest number. If you count word stems, English is far behind, say, German and Russian. But in the latter case one cannot make an exact comparison, because the latter two languages have extensive rules for forming new stems by means of suffixes and root concatenation, which produces astronomical numbers of possible combinations, most of which are meaningless or hardly useful.

45: What do all languages have in common? (score 16196 in )

Question

What do all languages have in common? I’m looking for a list of features (such as grammatical, semantic or phonetic elements) that are present in all natural languages.

Answer accepted (score 3)

Any linguistic answer to this question has to be at least partly theory laden. There are many approaches to linguistic universals.

The most general points would be:

  1. All natural languages can be acquired by people born into a community of speakers, or learned by people as a second language (with well-known limitations).
  2. The propositional content of any natural language can be translated into any other, given limitations of vocabulary and contextual understanding.
  3. The non-propositional content of all languages can be conveyed from one to another through some means of human communication.

From the above it follows that all languages will have the same physiological, cognitive, and neural underpinnings (whatever those may be).

However, that seems quite trivial. So when people ask those questions, they are generally looking for some feature like nouns, vocabulary, syntactic constraints. Here the people making the strongest claims are:

  1. Universal Grammar which claims that all languages share certain constraints on their syntax which are the only way to explain their learnability. However, there are only a handful of these and those are virtually incomprehensible. This is a position most often associated with Chomsky and most popularly explained by Steven Pinker.
  2. Semantic Primes claims that there are about 60 words that are common to all languages and through which all meaning can be described (they call this Natural Semantic Metalanguage). It is based around the work of Anna Wierzbicka and the group does some really interesting research even if their broader claim is a bit less palatable.
  3. Modern linguistic typology (whose perspective was taken on in @Darkgamma‘s answer) makes much weaker claims, few of which stand and fall with a single counterexample. The most accessible explication of this position is RMW Dixon’s ‘Basic Linguistic Theory’ which, while not always the most up to date on all issues, gives an accessible overview (over three volumes). Historically, this work on universals was associated with Joseph Greenberg, but now the project is much less interested in universals than in common patterns and tendencies.
  4. Historical linguistics also makes some claims about universals, mostly to do with common origins. The strongest claims are made by branches which try to find a common origin for all languages, e.g. Nostraticists, who are not necessarily taken seriously by the mainstream. But even more traditional historical linguistics makes some claims about fairly universal principles of language change.

By the way, both 3 and 4 sometimes call themselves comparative linguistics, so it’s sometimes hard to know what is meant by the term without some context.

Personally, I recommend that every linguist, or indeed anybody interested in language, spend some time with modern linguistic typology (which has moved far beyond the inflectional/agglutinating business). While Dixon’s ‘Basic Linguistic Theory’ may not be the first thing on the list, it should be required reading for any linguistics graduate long before they read anything by Chomsky.

Finally, The World Atlas of Linguistic Structures (WALS) http://wals.info is a great place to check any claims as to universality, even if it (unavoidably) relies on data of uneven quality. But it also nicely illustrates the extent and the limitations of the current evidentiary base for claims about universals.

46: What meaningful distinction is there between morphology and syntax? (score 15707 in 2011)

Question

While I am not interested in hearing the common distinction made in introductory textbooks, I am interested in hearing what meaningful distinction there can be between morphology and syntax. Is there any structural, or rather formal, difference between the two? Also, if that distinction requires the concept of word, how does one formally describe that?

Answer accepted (score 13)

The Oxford Concise Dictionary of Linguistics defines morphology as ‘The study of the grammatical structures of words and the categories realised by them’, and syntax as ‘The study of grammatical relations between words and other units within the sentence’.

As you point out in your question these (pretty standard) definitions depend on the notion ‘word’ to delimit the domains of each subject. The notion of ‘word’ is problematic as there’s no good definition that applies across all languages. It seems that it’s best (or easiest) to define ‘word’ on a language by language basis, although even then there are problems as different kinds of ‘word’ can often be distinguished depending on the sets of criteria used: often there is a ‘prosodic word’ (defined on phonological bases) which differs from the ‘grammatical word’ (defined in terms of morphosyntax).

If ‘word’ can be precisely defined in a given language then the domains of morphology and syntax can be fairly clearly distinguished, but if not then there is not a clear boundary.

So cross-linguistically the notion of ‘word’ is fuzzy, as are the terms ‘morphology’ and ‘syntax’. But for most languages, and in most cases, it’s still pretty clear what’s meant by these terms; and of course there’s the term ‘morphosyntax’ for boundary cases. So while these two terms are not precise they are still useful as a general way of referring to certain domains within a language.

Answer 2 (score 10)

The very first thing on the Syntax Topics handout for my linguistics classes was:

Syntax and morphology are the two parts of grammar.

  1. Morphology deals with the internal economy of the word.
  2. Syntax deals with the external economy of the word.

By which is meant that things that take place within the word are morphological, and things that take place between words are syntactic. Without a sense of “word”, there would be no easy way to distinguish. But we’d need to distinguish anyway, because there are big differences between grammar as expressed morphologically and grammar as expressed syntactically. And plenty of examples to deal with.

In particular, there are many analytic languages where grammar is almost all syntax, like English or Mandarin; and also many synthetic languages where grammar is almost all morphology, like Russian or Inuit. In an analytic language word order is terribly important, and there are many little auxiliaries and articles and particles to augment the word orders. In a synthetic language, word order is mostly a matter of style or taste, because the structure of the sentence is all tied up and color-coded with the agreement phenomena that morphology enables.

There are other differences as well. Bound morphemes are very fussy about their order, and normally don’t tolerate any variance. Syntax, however, spins off variation at the drop of a syllable, resulting in many competing patterns which frequently die off, but just as frequently fission into separate constructions. These are very different kinds of things, even if they do somehow “cover the same ground” semantically. It’s not necessary to restrict one’s observations to only the generalizations one wants to capture. Generalizations are nice, but facts are more useful.

You may encounter people who insist that “syntax” means “grammar”. That’s just the theory talking; normally they’re just parroting somebody else’s dictum, and normally this marks them as generative theory adherents. That’s harmless unless you think they mean what they say; really they just want you to use their terminology as a mark of its superiority. Sort of like saying that “Mercedes-Benz” means “automobile”. I.e, it’s not true, or even really helpful, but it lays out your allegiance clearly and that’s the important thing.

Answer 3 (score 6)

I think another way of getting at the same kind of answer is to ask what can be done in morphology that can’t be done in syntax. There are approaches to morphology where much of what is traditionally considered morphology is handled with the same kind of formalism as is used in syntax, and morphology may be referred to as “word syntax” on occasion. However, I do not know of approaches to syntax that strive to account for most common syntactic phenomena using the language of morphology.

When word derivations have hierarchical structures, and affixes are given the status of functional heads, it may seem like it’s syntax all the way down. There are some facts about inflectional morphology, however, which make syntax-like accounts difficult to pursue to their logical end (see Ch. 1 of Stump 2001, Inflectional Morphology, for discussion). One of these is non-concatenative morphology. In affixal morphology, it is not a real problem to treat the stem and affix as constituents of a tree: [[dog]N.root [-z]PL]N -> dogs. When the morphology is non-affixal, there is a dilemma: how is “men” derived? In Distributed Morphology, stem changes are handled in the following way: a zero affix triggers a “readjustment rule” which changes the stem vowel: men = [[man]N.root [zero]PL] -> /man-zero/ -> men. Stump considers this a resort to “extraordinary means”, since it seems like quite a large coincidence that so many languages should have an identical (zero) affix triggering stem changes. Non-concatenative morphology is a challenge for syntax-like approaches to morphology because the basic units of syntax are taken to stand in a strict precedence relationship: for two constituents X and Y, X must either precede Y or follow it; tertium non datur.

Another issue is morphologically-conditioned phonological rules. Particular morphemes often trigger specific types of phonological alternations in a language (e.g. changes induced by suffixation of -ity on English nouns), but comparable syntactic phenomena, where, say, a particular type of category triggered an idiosyncratic phonological process in a sister constituent, seem very rare.

A third issue is variable exponence. A given property, when realized morphologically, may have a wide range of unrelated exponents. In a single language, plural on nouns may be realized by a prefix, a change in the stem vowel, or a change in tone, depending on the word. Parallel phenomena are hard to find in syntax.

If the issue is whether there should be a distinction at all, I suspect that it is in theory possible to remodel our theory of language structure so that syntax and morphology are modeled in the same way, but at the moment there are quite a few details of morphology which do not lend themselves to a syntactic explanation.

47: Is learning German easier for people who know Sanskrit, and vice versa? (score 15672 in 2012)

Question

I’ve heard many times that learning German is easier for those who speak Sanskrit, and vice versa. Is there any linguistic basis for this? What similarities exist between the two languages that may be able to explain this?

This article, for example, shows a few superficial[1] similarities and claims that Indians[2] can learn German easily.


1: i.e., relating to specific words and not language structure
2: Most Indic languages are descended from Prakrit (a “sister language” of Sanskrit), so what applies to Sanskrit may sort of apply to these as well.

Answer accepted (score 17)

This is an answer not to the part about whether it is easier to learn German after Sanskrit (I don’t know), but rather, a few more assorted points re. “What similarities exist between the two languages”, or even more generally, “Why would people make such a claim?”

As Cerberus noted, most of these claims come from people whose familiarity, outside of Indian languages, is mainly with English, and perhaps a bit of French (or rarely, Spanish or Italian). So even though many similarities noted between Sanskrit and German are in fact those shared by many members of the Indo-European family, the claim just means that, among the few languages considered, German’s similarities are remarkable.

[My background: I have a reasonable familiarity with Sanskrit; not so much with German. For impressions about German I’ll rely on the Wikipedia articles, and, (don’t lynch me) Mark Twain’s humorous essay The Awful German Language — of course I know it’s unfair and not a work of linguistics, but as examples of what the average English speaker might find unusual in German, it is a useful document.]

With that said, some similarities:

Cases

German apparently has four cases; Sanskrit has eight cases (traditional Sanskrit grammar counts seven, not counting the vocative as distinct). Cerberus notes above that “Sanskrit and German have several functional cases, whereas French/Spanish/Italian/Portuguese/Dutch/English/etc. do not. Those are the languages one might be inclined to compare Sanskrit with”.

Compound words

Although English does have short compound words (like bluebird, horseshoe, paperback or pickpocket), German has a reputation for long compound words. (Twain complains that the average German sentence “is built mainly of compound words constructed by the writer on the spot, and not to be found in any dictionary — six or seven words compacted into one, without joint or seam — that is, without hyphens”.) He mentions Stadtverordnetenversammlungen and Generalstaatsverordnetenversammlungen; Wikipedia mentions Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz and Donaudampfschiffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft. But these are nothing compared to the words one routinely finds in ornate Sanskrit prose. See for example this post. Sanskrit, like German, allows compounds of arbitrary length, and compounds made of four or five words are routinely found in even the most common Sanskrit texts.

Verb appearing late

It appears that German verbs tend to come later in the sentence than English speakers are comfortable with. I notice questions on this SE showing that German has V2 word order, not SOV. However, many English speakers seem to find late verbs in German worth remarking on. One of my favourite sentences from Hofstadter goes

“The proverbial German phenomenon of the “verb-at-the-end”, about which droll tales of absentminded professors who would begin a sentence, ramble on for an entire lecture, and then finish up by rattling off a string of verbs by which their audience, for whom the stack had long since lost its coherence, would be totally nonplussed, are told, is an excellent example of linguistic pushing and popping.”

Twain too, says “the reader is left to flounder through to the remote verb” and gives the analogy of

“But when he, upon the street, the (in-satin-and-silk-covered-now-very-unconstrained-after-the-newest-fashioned-dressed) government counselor’s wife met,”

and also

“In the daybeforeyesterdayshortlyaftereleveno’clock Night, the inthistownstandingtavern called `The Wagoner’ was downburnt. When the fire to the onthedownburninghouseresting Stork’s Nest reached, flew the parent Storks away. But when the bytheraging, firesurrounded Nest itself caught Fire, straightway plunged the quickreturning Mother-stork into the Flames and died, her Wings over her young ones outspread.”

Well, this is exactly typical Sanskrit writing. Those sentences might have been translated verbatim from a Sanskrit text. Sanskrit technically has free word order (i.e., words can be put in any order), and this is made much use of in verse, but in prose, usage tends to be SOV.

Of Sanskrit’s greatest prose work, Kādambarī, someone named Albrecht Weber wrote in 1853 that in it,

“the verb is kept back to the second, third, fourth, nay, once to the sixth page, and all the interval is filled with epithets and epithets to these epithets: moreover these epithets frequently consist of compounds extending over more than one line; in short, Bāṇa’s prose is an Indian wood, where all progress is rendered impossible by the undergrowth until the traveller cuts out a path for himself, and where, even then, he has to reckon with malicious wild beasts in the shape of unknown words that affright him.” (“…ein wahrer indischer Wald…”)

(This is unfair criticism: personally, I have been lately reading the Kādambarī with the help of friends more experienced in Sanskrit, and I must say the style is truly enjoyable.) Now, the fact that this was a German Indologist writing for the Journal of the German Oriental Society somewhat goes against the claim of Sanskrit and German being similar. But one could say: for someone familiar with Sanskrit’s long compounds and late verbs that even Germans find difficult, the same features in German will pose little difficulty.

Adjectives decline like nouns

In Sanskrit, as it appears to be in German, an adjective takes the gender, case, and number of whatever it is describing. (Twain: “would rather decline two drinks than one German adjective”)

Gender of nouns has to be learned

By and large, it is so in Sanskrit as well. Twain notes that in German “a tree is male, its buds are female, its leaves are neuter; horses are sexless, dogs are male, cats are female – tomcats included, of course; a person’s mouth, neck, bosom, elbows, fingers, nails, feet, and body are of the male sex, and his head is male or neuter according to the word selected to signify it, and not according to the sex of the individual who wears it – for in Germany all the women have either male heads or sexless ones; a person’s nose, lips, shoulders, breast, hands, and toes are of the female sex; and his hair, ears, eyes, chin, legs, knees, heart, and conscience haven’t any sex at all”. (He goes on to write a “Tale of the Fishwife and its Sad Fate.”) It does not seem quite so bad in Sanskrit, but yes, the gender of words needs to be learned. (In Sanskrit there exists a word for “wife” in each of the three genders.) However, this is a feature common to many languages (including, say, languages like Hindi or French that have only two genders), so I shouldn’t list it among similarities.

Spelling

This is something quite trivial, and linguists often don’t even consider orthography a part of the language proper, but spelling seems to be a pretty big deal to Indians learning other languages. The writing systems of most Indian languages are phonetic, in the sense that the spelling deterministically reflects the pronunciation and vice versa. There are no silent letters, no wondering how a word spelled in a particular way is pronounced. Indian learners of English often complain about the ad-hoc, inconsistent spelling of English; it seems a bigger deal than it should be. From this point of view, the claim that for German, “After one short lesson in the alphabet, the student can tell how any German word is pronounced without having to ask”, means that this aspect of German is easier to learn.

The harmony of sound and sense

This is extremely subjective and will be controversial, and perhaps I will seem biased, but to me, in Sanskrit, it seems possible to pick words whose sounds match the desired feeling, better than in other languages. I have seen people who knew many languages say the same thing, and also Western translators from Sanskrit etc., so it is interesting for me to see Twain make a similar remark about German. Anyway, this is subjective, so I’ll not dwell on this much.

Non-similarities

There are of course many; e.g. Sanskrit does not have articles (the, etc.), unlike German. It also has very few prepositions (only a handful, like “without”, “with”, “before”), as the work of prepositions like “to”, “from”, or “by” is done by case endings. The difficulty of German prepositions does not seem to be present in Sanskrit.

TL;DR version

Some alleged difficulties of learning German, such as cases, long compounds, and word order, are present to a far greater extent in Sanskrit, so in principle someone who knows Sanskrit may be able to pick them up more easily than someone trying to learn German without this knowledge. However, this may not be saying anything more than that knowing one language helps you learn others.

Answer 2 (score 2)

Well, since your question contains nothing to compare to, and is therefore ambiguous, the answer can only be yes. Had the question been framed as a comparison between the ease and speed with which one could acquire German given a working proficiency in Sanskrit, relative to the ease with which one could acquire German given a working proficiency in some language X, then there would be enough considerations in play to allow for a “no” answer.

Your question is syntactically tantamount to asking “Is the act of person X walking from point A to point B easier, and vice versa?”. Easier than what, precisely?

48: What are “hard” and “soft” consonants? (score 15539 in 2016)

Question

Many writing systems make a distinction between “hard” and “soft” phonemes represented by the same grapheme or an accented version thereof. What writing systems make this distinction and what are the consonants so distinguished?

Answer accepted (score 4)

This is probably the most widely used non-technical, non-literal term employed to refer to a technical concept. As a general term it most often refers to palatalized consonants in Slavic languages (soft) versus non-palatalized ones (hard). This underlies English usage, where “soft” c, g refer to the sibilant versions (cent, gentleman) as opposed to the velar stop versions (car, gut). It is also used to refer to voicing in Dutch (hard is voiceless), and to oral obstruents vs. nasals in Tamil, with a third category for oral sonorants. It has been used to refer to emphatic (pharyngealized) consonants in Arabic. The adjective is also applied to vowels (hard is back) in Spanish and Swedish; in Dinka, it refers to creaky phonation (also termed “harsh”).

49: Argot vs Jargon (score 15370 in 2012)

Question

I’m stuck understanding the difference between argot and jargon.

According to many sources, e.g. Wikipedia:

Argot is a secret language used by various groups—including, but not limited to, thieves and other criminals—to prevent outsiders from understanding their conversations.

Jargon is terminology which is especially defined in relationship to a specific activity, profession, group, or event.

My confusion comes from reading that the term argot has been used to describe the language of lower-class young people in France. I can’t recall the link, however.

Do they both refer to the same thing?
To be specific, do computer professionals speak argot or jargon?
To be even more specific, let’s stick to a community of computer pros whose native language is other than English. The terminology in question includes “file”, “deadline”, “overflow”, “hit” (as in a Web page hit), and so on.

Answer accepted (score 8)

Based on just the definitions you quote, computer professionals do not speak argot, they speak jargon. The jargon of computer professionals was not constructed for the purpose of hiding the meaning of what they are saying from outsiders - it may have that effect, but that was not the purpose. The purpose is to have shorthand words with specific defined meanings that allow for more efficient communication. For example, the word “file” can replace the phrase “a block of information stored as a unit on an information storage device”. So jargon is a matter of efficiency.

From your definitions, “argot” has the purpose of secrecy: preventing eavesdroppers from understanding the meaning of the conversation.

Answer 2 (score 2)

In the particular meaning of argot, the words are more or less synonymous, though I think argot is now much less common than jargon in this sense. As you say, argot has another meaning, that of street slang.

Answer 3 (score 1)

It seems possible to conflate professional vs. ad-hoc usage, and inclusivity vs. intent to broadly communicate.

In much fiction, “thieves’ cant” is BOTH professional and inclusive, and would be considered the poster child for the term argot. If you do not know this language, you are not welcome to the contents of its conversations either - an added social aspect.

Slang is ad-hoc but is inherently formed out of the intent to broadly communicate.

Jargon is professional terminology. In addition, whether or not it is ad-hoc (which may lead to internal slang; see also “TLA”/three-letter acronym and similar metalinguistic phrases), it is driven not so much by the attempt to broadly communicate as to DEEPLY communicate - to provide more content in the same amount of verbal space.

To complete the grid, a language of broad communicative intent but with a professional aspect would be one with an enforced formal grammar and approved words. Possibly French. For writing, there are the Jōyō kanji, with which Japanese newspapers are printed so that enough of the country can read them, even though the full writing system has many more symbols.

Ad-hoc language linking groups together that have difficulty communicating otherwise could be a pidgin and would probably avoid slang lest the already rocky communication fail entirely.

There is a blurred line in the computer industry, because there was a small group of people who collected the industry-and-academia slang and anecdotes into a compilation entitled “The Jargon File”.

Hope that helps.

50: Online etymology dictionary for Latin (score 14699 in 2014)

Question

Is there an etymology dictionary for Latin that is available on the Internet? For example, I know of http://etymonline.com/, which is a great resource for English etymology, but I have not been able to find an equivalent site for Latin.

So if someone knows of such site, I would be grateful.

Answer accepted (score 9)

Unfortunately, most of the resources are behind a paywall. For example, The Indo-European Etymological Dictionaries Online database (by Brill) already includes eleven dictionaries, including de Vaan’s Etymological Dictionary of Latin and the other Italic Languages.

Answer 2 (score 8)

The English Wiktionary has lots of Latin entries, and of those many have etymologies.

If you find one that lacks an etymology and you’d really like to see it added, it’s a little-known fact that you can request it. Click the edit link on the page; if it’s a page with entries for several words in various languages that happen to share a spelling, click the edit link next to the “Latin” heading.

Sometimes there will already be one or more “Etymology” headings with no details in them. This is mainly where two or more unrelated words in the same language are known to have different origins but coincidentally ended up with the same spelling. In this case the “Etymology” headings will be numbered.

Otherwise the Etymology sections are not numbered.

Etymology sections come after the language heading and before any “part of speech” heading. I’m never certain whether they should go before or after a “Pronunciation” section if there is one. You’ll be forgiven if you get it wrong though.

Below the Etymology heading you then use the rfe template, which stands for “Request For Etymology”. The result will look something like this:

==Latin==

===Etymology===
{{rfe|lang=la}}

===Noun===
... stuff ...

Or this:

==Latin==

===Etymology 1===
{{rfe|lang=la|Possibly related to Greek "foo"?}}

====Noun====
... some stuff ...
...
...

===Etymology 2===
{{rfe|lang=la|Could this be cognate with Sanskrit "phu"?}}

====Verb====
... some other stuff ...
...

The la is the ISO 639 language code for Latin. The comment after the | is optional and will appear in the entry to give other Wiktionary contributors a place to start looking.
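Since the template is just a pipe-separated string in double braces, the assembly described above can be sketched in code. The following Python helper is purely illustrative (it is not part of any Wiktionary tooling); it builds the `{{rfe}}` wikitext from a language code and an optional comment, matching the examples shown earlier:

```python
def rfe(lang_code, comment=None):
    """Build a Wiktionary {{rfe}} (Request For Etymology) template string.

    lang_code: an ISO 639 code, e.g. "la" for Latin.
    comment:   optional free-text hint shown to other contributors.
    """
    parts = ["rfe", "lang=" + lang_code]
    if comment:
        parts.append(comment)
    # Wikitext templates are {{name|param|param...}}
    return "{{" + "|".join(parts) + "}}"

print(rfe("la"))
# -> {{rfe|lang=la}}
print(rfe("la", 'Possibly related to Greek "foo"?'))
# -> {{rfe|lang=la|Possibly related to Greek "foo"?}}
```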

To know when there’s a response, click the star in the menu and check your watchlist. People may then discuss it on the talk page first, or they may just add an etymology.

Other-language Wiktionaries probably have similar systems, but they’d be a bit different, so I can’t comment on them here.

Answer 3 (score 5)

Good up-to-date dictionaries are under copyright and not on line. I suggest you get a reader’s ticket at a well stocked university library.

51: What’s the difference between a “false cognate” and a “false friend”? (score 14572 in )

Question

There are two terms used for pairs of words (in the same or different languages) that look similar but are actually unrelated: false friend and false cognate. Are these terms synonymous? If not, what’s the difference?

Answer accepted (score 39)

They are distinct.

Definitions
  • False cognates are words that are similar in their modern forms despite having different etymologies. This is regardless of whether the modern meanings are similar.
  • False friends are words that are similar in their modern forms despite having different modern meanings. This is regardless of whether the words are etymologically connected.
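Since the two definitions vary along independent axes (shared etymology vs. shared modern meaning), every word pair falls into one of four cells. A minimal Python sketch of that grid; the German Gift ‘poison’ / English gift pair used in the comment is a commonly cited example, not one taken from this answer:

```python
def classify(shared_etymology, shared_meaning):
    """Place a word pair in the cognate/friend grid described above."""
    cognates = "true cognates" if shared_etymology else "false cognates"
    friends = "true friends" if shared_meaning else "false friends"
    return cognates + ", " + friends

# German "Gift" (poison) and English "gift" share an etymology but not
# a modern meaning: a standard true-cognate, false-friend pair.
print(classify(True, False))
# -> true cognates, false friends
```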
Examples
True cognates, true friends

Words with a common etymology and modern form and meaning. These are extremely numerous, of course.

True cognates, false friends

Words with a common etymology but which have shifted to have different meanings.

False cognates, true friends

Words etymologically unrelated that nevertheless now have similar meanings (typically by coincidence, but sometimes by influence/reinforcement).

False cognates, false friends

These are completely unrelated words that happen to look similar.

Complications
  • Reinforcement and language contact. In the case of false cognates/true friends, if the two languages are in contact (or it occurs within a single language), the similar meanings tend to reinforce each other. They may even be re-analyzed as forms of the same word, thereby merging the lexemes.
  • False recent cognates that are true cognates more distantly. Example: Malay nama ‘name’ might look like a loanword from English name (Malaya was a British colony), but it is actually an older loanword from Sanskrit (cognate to the English via Proto-Indo-European).
  • Re-analysis: false-cognate words may have their spelling altered to match the form of a true friend in the other language. English examples: indict, victuals, from French but with elided Latin consonants re-added. Spelling and pronunciation are borrowed via different paths, so the spelling could be considered cognate though the pronunciation is not. (In this case, both are more distantly cognate via Latin; the French words were sound-changed.)
  • Calques or loan-translations: the form of the word is cognate, but its components are often not. Example: Latin insecta, Ancient Greek ἔντομον ‘insect’, from different roots meaning ‘cut’.

Answer 2 (score 5)

Most often, the expression “false cognate” is used as a synonym for “false friend”. If you google them, you will mostly find pages that use them synonymously.

However, other meanings have also been proposed. In Concise Encyclopedia of Semantics edited by Keith Allan, the article “False friends”, p. 308–309, describes false cognates as a special case of false friends, namely false friends that are not etymologically related. And in a Unilang discussion, it was suggested (without references) that “false cognates are words that look the same and have the same meaning but have different roots”, so that false cognates would not be false friends at all.

Thus, the expression “false cognate” is best avoided, since it has different meanings to different people. When you encounter it, assume that it probably means “false friend”, but with some suspicion: it might mean something rather different.

52: Meaning of star/asterisk in linguistics (score 14209 in 2011)

Question

In some dictionaries/lexica, I’ve seen the asterisk in front of old words. What does it mean/stand for?

Example: http://en.wikipedia.org/wiki/Proto-Germanic#Pre-Proto-Germanic

*ǵʰóstis “stranger” > *gʰóstis > *gastiz “guest”

Answer accepted (score 30)

An asterisk is generally used to indicate that a certain form or construction is not found in natural language. To be precise, it means there is insufficient evidence to assume that it could exist or could once have existed in natural language.

When describing proto-languages, this usually means that a certain root or word has been reconstructed: based on phonological rules, we think it must have been somewhat like this—but we cannot be sure, as it is always possible that some unique irregularity would result in a different form, and we have no written sources that contain this form. When a form has never been found in a real source, we say it has not been attested; in other words, it is unattested. In most proto-languages, all forms are hypothetical, so that they should all be preceded by an asterisk.

With modern languages, we usually have plenty of sources to establish whether a certain form is possible. In fact, many linguists make up example sentences when they need to, because many forms and constructions are not at all controversial. So I might use Achilles hated Hector in an article if I needed an example. However, I may also want to use an example that is ungrammatical, i.e. that I believe would never be used naturally in the language under observation. Then I would put an asterisk in front of it:

*Achilles did hated Hector.

Note that it depends on context whether a construction is grammatical: if I were writing about standard English, I’d have to use an asterisk; but, if I were writing about a certain dialect where this construction is actually used by some, the asterisk is out of place.

Note also that “natural language” is a flexible and sometimes unclear concept; if you are researching English poetry of the 16th century, it is much harder to acquire enough evidence than with modern prose. Context or explanation should make it clear what natural language is supposed to be in a certain text.


Most people would use a question mark instead of an asterisk with well-attested languages if the form or construction is doubtful. In proto-languages, the question mark is sometimes used to indicate a form that is even more uncertain than the common hypothetical ones—for example, if I am in doubt as to the most probable form of a certain root.

Some people use the percentage sign to mark something that is only grammatical in a certain non-standard dialect.

Answer 2 (score 6)

It’s true that an asterisk indicates a reconstructed word or morpheme in historical linguistics.

In theoretical linguistics, as Cerberus suggested, it indicates ungrammaticality. There are a few other such markers: question marks indicate relative unacceptability, and hash marks indicate anything from semantic incoherence (ex. (1c)) to infelicity, a pragmatic failure (ex. (1d)).

(1) a. *John slept the baby.
    b. ?John ate much rice.
    c. #Colorless green ideas sleep furiously.
    d. #His horse sleeps until dawn. [In response to the question “How are you doing?”]
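For quick reference, the marker conventions described in these answers can be collected into a small lookup table. This is only a sketch of the usage given above; conventions vary somewhat between authors:

```python
# Conventional acceptability/attestation markers, per the answers above.
markers = {
    "*": "ungrammatical (or, in historical linguistics, a reconstructed, unattested form)",
    "?": "marginal / relatively unacceptable",
    "#": "semantically incoherent or pragmatically infelicitous",
    "%": "grammatical only in certain non-standard dialects",
}

for symbol, meaning in markers.items():
    print(symbol, "=", meaning)
```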

Answer 3 (score 5)

In the context of historical linguistics, it means the word is just a hypothetical reconstruction, not an actually attested word.

(The asterisk has another meaning, which is to mark an ungrammatical utterance.)

53: What are the exact relations between Slovak and Slovene? (score 14176 in 2019)

Question

The former seems to have more speakers, while the latter seems to have the longer history.

Slovak is said to be a West Slavic language, while Slovene seems to belong to the South Slavic group (reputedly having some traits shared with the West Slavic languages).

Are these two languages mutually intelligible? To what degree? Which dialects of Slovene resemble Slovak the most?

Answer accepted (score 10)

Bluntly, Slovak and Slovenian have nothing in common other than both being Slavic languages - no more than Slovak and Serbian, or Slovak and Ukrainian. This is a question driven by the superficial similarity of their names, which I’m sure members of both nations are thoroughly sick of.

On the other hand, it is true that Czechs and Slovaks will perceive Slovenian as less alien than Serbian, from which we could possibly draw some tentative conclusions. But that still doesn’t make the languages mutually intelligible, nor is there any meaningful basis for comparison other than the name.

Answer 2 (score 5)

I am Slovenian, and here’s my perspective:

To me, the Slovak and Czech languages are very similar - in fact, I wouldn’t be able to tell them apart.

Among all the Slavic languages, the one that’s easiest for a Slovenian to understand is Croatian. Not Slovak!

Are they mutually intelligible? To which degree?

Only to the degree that all Slavic languages are somewhat mutually intelligible. Just like all Germanic languages are somewhat mutually intelligible. For example, Dutch and German.

Check the tree of Slavic languages - you can see that Slovenian and Slovak are in very different branches. The name similarity comes from the word for “Slavic”: “Slovanski” (Slovenian), “Slovansky” (Slovak, Czech).

Answer 3 (score 2)

I am Slovak, and I think there are two sides you need to be aware of before you consider whether the languages are similar. For me, reading Slovene makes it and Slovak seem similar: I can understand enough to get the context and point of a conversation. But listening to it? I would say that is where the “it’s not similar” arises. I cannot understand spoken Slovene without “subtitles”. Say I was in a cafe and a pair at a nearby table were speaking Slovene (and I could hear them properly). Like many Slovaks, I would straight away sense a Slavic “homely” feeling - all the similar sounds that Slavic words have. Then I would concentrate and try to see what they were talking about. I would get frustrated because I somehow could not understand, and after realizing that it was not Slovak or Czech (it is easy for Slovaks to understand Czech, not so much vice versa), I would try to work out what language it was, from how well I really understood the words and from what I know of each language’s typical sounds.

Order of best understanding (listening only):

  1. Slovak
  2. Czech
  3. Sorbian
  4. Polish
  5. Russian
  6. Croatian
  7. Bosnian
  8. Ukrainian
  9. Serbian
  10. Belorussian (losing understanding of context from this language onwards)
  11. Slovenian
  12. Macedonian
  13. Bulgarian

This is the basic order for me personally. If the speaker has a ‘deep’-sounding voice, it makes it massively easier to understand (still not past Belorussian, though). One more thing that should be noted: I have not learned Russian (many people were forced to learn fluent Russian in Slovak schools in the past). From the list, one can tell that it is not easy for me to understand the South Slavic languages, while the East Slavic ones vary. REMEMBER, this is listening only - people speaking, without looking at them!

54: Why do so many core Romanian words with Latin roots come from different roots than in the other Romance languages? (score 13197 in )

Question

Romanian is a Romance language, like Catalan, Italian, French, Portuguese, and Spanish, so much of its core vocabulary is derived from Latin.

Why then, even in core vocabulary, does Romanian so often seem to be based on different Latin roots than its sister languages?

to go

  • Romanian: a merge < mergere
  • Catalan: anar < ambulāre
  • French: aller < ambulāre
  • Italian: andare < ambulāre
  • Portuguese: ir < īre
  • Spanish: ir < īre

Or is this perhaps a false impression I get just because these words stand out more? I’ll add some more examples if I remember some. (I’m not immersed in Romanian anymore.)

Answer accepted (score 16)

Romanian is a language that’s been well established in the Balkan Sprachbund (Sprachbund being the German term linguists use for a collection of languages that have come to share features over a certain period of time).

Alongside Romanian in this Sprachbund are languages like Greek, Albanian and Macedonian (among others), and each has exerted some influence on the others, in all sorts of linguistic domains: phonology, morphology, and syntax.

An interesting thing to note about Romanian is that, although it derives from Latin, which had no articles, Proto-Romance had already begun, while East and West Latin were still splitting up, the grammaticalisation process that resulted in the determiners observable today in Modern Romance. As Italian and Romanian both belong to the East-Latin branch, they both originally had determiners that PRE-modified the elements they were attached to; over the course of time, due to influence from languages like Bulgarian (a Slavic language) and Albanian, Romanian came to place its determiners AFTER the noun.

So, to quote the oft-cited example here, look at the Italian for ‘the man’, which is ‘l’uomo’, and compare it with the Romanian word ‘om’ (man; notice the similarity). After developing in this Sprachbund, surrounded by other languages that put their word for ‘the’ AFTER the noun, Romanian took this on as well, and has ‘the man’ as ‘om-ul’.

So, however much cross-linguistic comparison you do among the Romance languages, you will find time and time again that Romanian is often the black sheep of the family, due to its development in this language area, where linguistic features are shared and different characteristics can be shown to have spread to neighbouring languages after a long period of bi- and multilingualism throughout their history.

To swing this back to your point, this is exactly why not just the phonology but also the word-stock (vocabulary) of Romanian is often very different from that of its sister languages in Western Europe. Sharing of features at the level of linguistic structure happens only in a very ‘linguistically intimate’ setting, and when two or more languages are that close, the first things to cross over are words. So, if you see that structure has been shared, you’d be wise to make an educated guess that the level of vocabulary absorption is considerably higher, and in this case you’d be right.

Romanian’s history after splitting off, developing in a zone of co-evolution, means that it has slowly moved further and further away from its core features in the Italic family, not just in words (though very much so) but also in many other aspects of structure (as I mentioned before).

I hope this helps you with your query.

Answer 2 (score 15)

Romanian does, in fact, have a reflex of Latin ambulāre: a umbla “to walk”. (There are reflexes of īre as well, though it has become a future modal.) I’ve often found that the familiar roots from other Romance languages do indeed exist in Romanian, though, as in this case, with slightly different meanings.

The reasons for this are Romanian’s relative isolation from other Romance languages, and its cultural domination by Slavic, Greek, and Hungarian neighbors for most of its history.

Answer 3 (score 14)

Regarding Romanian’s history: it’s a member of the Eastern Romance (sensu stricto) subfamily, the “Vlach” languages. The peculiarities of Romanian are shared by its sisters in that branch, such as Aromanian in Greece, Macedonia, and Albania; Megleno-Romanian in Greece, Macedonia, and Turkey; and Istro-Romanian in, well, Istria. Their peculiar history was shaped by at least 4 distinct factors:

  1. Being the earliest area to split off from the Empire, in 271 AD - leading to loss of stabilizing linguistic influence from Rome
  2. Geographical and later social separation from the other Romance languages, leading to less influence by and fewer borrowings from Germanic. E.g. ‘white’, instead of coming from Germanic blank, is alb from Latin albus
  3. The Balkan Sprachbund shared with Slavic, Greek, and Albanian. The Sprachbund had a significant influence on both grammar and, to a lesser extent, lexicon.
  4. The influence of the Paleo-balkan substratum which is not inconsiderable - some scholars report up to 500 roots may be inherited from Dacian

Of course, every modern Romance language has its own peculiar history, but I think the above covers the bases on what makes the Daco-Romanian language different.

55: Is there a language whose writing is 100% phonemic? (score 12961 in 2015)

Question

Is there a language that has a complete one-to-one correspondence between the graphemes (letters) and the phonemes of the language?

In other words, is there a language that is 100% ideally phonemic?

Answer accepted (score 14)

Finnish is the usual exemplar for that.

Many recent alphabetizations, like those of Native American languages (Lushootseed is one example), are still phonemic in the sense that the spoken language hasn’t had time yet to change away from the phonemic system it had when the alphabet was developed. Or in other cases, where there are no native speakers any more, all language learning is based on the alphabetic representation, by necessity.

Of course, there are plenty of non-alphabetic writing systems with little or no useful correspondence between phonemes and graphemes. Since phonemics is an alphabetic representation system, it can’t be put into 1-1 correspondence with an abjad or an abugida, let alone with a lexically-based system like Chinese.

Answer 2 (score 8)

Abugidas are going to be more phonetic than other systems; look at Indian languages. Still, some languages there have ambiguity in pronouncing letters based on position. E.g., whether a Tamil letter is pronounced voiced or unvoiced depends on the letter’s position as well as on some conventions. Kannada, on the other hand, is 100% phonetic. Not all people speaking Kannada might know how to pronounce the letters that were borrowed from Sanskrit, but the writing system is definitely phonetic.

56: How similar are Spanish and French? (score 12731 in 2015)

Question

I know that Spanish and French both belong to the Romance branch and are very alike. But what I want to make clear is how similar they are. I mean: if I have mastered one of them, how much easier will it be for me to learn the other?

Answer accepted (score 3)

French and Spanish are indeed members of the Romance branch, but French is an oddity within it. If you speak only English, the phonetics of Spanish are probably much easier than French sounds, so you’d probably make a quicker start in Spanish. On the other hand, if you learn French first, Spanish would then be relatively easy.

Answer 2 (score 1)

I mean that if I have mastered one of them, how much easier will it be for me to learn another one?

First, knowing two languages instead of only one already makes it easier to learn a third. At the very least, you won’t take the quirks and irregularities of your first language for granted, and you will be prepared to expect something different.

Second, learning a language of a given linguistic family definitely makes it easier to learn a further language of that family. This effect is incremental: if you know two languages of a given family, it is even easier to learn a third. You will already be acquainted with at least some of the lexical entries one needs to learn.

Third, specifically: no, French and Castilian are not particularly similar. If you master Castilian, you will probably be able to read a text in Portuguese or Catalan and understand most of it; but this is not true of French. However, English has borrowed so many lexical entries from French that a person whose first language is English, and who learns Spanish, will probably be in a better position to extract a basic comprehension from a French text than a person whose first language is another one, even another Romance language.

So, in short, learning one of them won’t remove the need for specialised classes to learn the other. Those classes will be somewhat easier, though, especially if you learn Spanish before French.

Think of the relation between both like the relation of English to German. There is a similarity, but the languages are far from mutually intelligible.

57: Is there a difference between an affricate and a plosive+fricative consonant cluster? (score 12568 in 2011)

Question

Is there a difference between an affricate and a plosive+fricative consonant cluster?

According to wikipedia, there is a difference between a plosive+fricative sequence, as in the following example

  • catch it /kæt͡ʃ.ɪt/
  • cat shit /kæt.ʃɪt/

But I honestly can’t hear the phonetic difference unless the speaker carefully puts a pause between the morphological boundaries.

The tie bar (the small arc) in the IPA /t͡ʃ/ seems to suggest that /t/ and /ʃ/ are coarticulated, or at least articulated together more closely than in a typical consonant cluster.

Is there any difference between the affricate /t͡ʃ/ and the cluster /tʃ/? (similarly with /dɮ/, /p̪f/, /ʥ/, and so on) If so, what’s the difference, and how could one tell the difference between an affricate and a simple plosive+fricative sequence?

Answer accepted (score 28)

But I honestly can’t hear the phonetic difference unless the speaker carefully puts a pause between the morphological boundaries.

You have very good instincts, because this statement is halfway to the answer. An exploration of the topic is best started in a discussion of the differences between the words shoe and chew, which differ only in that the former has a fricative initially and the latter has an affricate.

The period of silence (caused by voiceless stop closures like [t]) that characterizes stops and the “stop” portion of affricates is indistinguishable from regular silence. We only ever “identify” the presence of “stoppiness” as well as the nature of the stoppiness (e.g., bilabial vs. velar) by the influences that the silences have on neighboring segments.

In the case of the postalveolar fricative versus the postalveolar affricate, the stop in the affricate asserts its status by altering the nature (a non-technical term; I will elaborate) of the following fricative.

Let’s compare my totally amateur recordings of shoe versus chew.

Shoe

shoe

Chew

chew

What do you notice?

There are two major differences that linguists have narrowed down as the cues our ears and minds use to differentiate shoe from chew:

  • The bare fricative is, in general, longer in duration than the fricative portion of the affricate.
  • The loudest point of the frication occurs much later, as a proportion of total frication length, in the bare fricative than in the affricate.

You can even test this out for yourself by doing some basic audio editing in Praat or Audacity.
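
These two cues are easy to quantify. As a purely illustrative sketch (not from the original answer), here is how one might compute total frication duration and the relative position of the amplitude peak with NumPy; the function name `frication_cues` and all signal parameters are invented, and real measurements would of course use recorded speech rather than synthetic noise:

```python
# Hypothetical sketch: measuring the two cues on synthetic "frication" noise.
# All names and parameters are invented for illustration only.
import numpy as np

def frication_cues(signal, sample_rate):
    """Return (duration in seconds, peak position as a 0..1 fraction)."""
    envelope = np.abs(signal)
    duration_s = len(signal) / sample_rate
    peak_index = int(np.argmax(envelope))
    peak_position_ratio = peak_index / len(signal)  # 0.0 = onset, 1.0 = offset
    return duration_s, peak_position_ratio

sr = 16_000
rng = np.random.default_rng(0)

# "shoe"-like bare fricative: longer noise whose amplitude ramps up slowly,
# so the loudest point falls late in the segment.
fric = rng.standard_normal(int(0.20 * sr)) * np.linspace(0.1, 1.0, int(0.20 * sr))

# "chew"-like affricate frication: shorter noise that is loudest right after
# the stop burst, then decays.
affr = rng.standard_normal(int(0.10 * sr)) * np.linspace(1.0, 0.2, int(0.10 * sr))

d_f, p_f = frication_cues(fric, sr)
d_a, p_a = frication_cues(affr, sr)
print(f"fricative: {d_f:.2f} s, peak at {p_f:.0%} of its length")
print(f"affricate: {d_a:.2f} s, peak at {p_a:.0%} of its length")
```

On real recordings one would first segment the frication interval (in Praat or Audacity) before applying measurements like these.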

When our brains hear catch it or cat shit, they apply these two metrics to figure out which one they heard. Slow climb to maximum amplitude and relatively long frication noise? Must be shit.

This only really tells part of the story, though. Let’s explore the production side a little.

When an oral stop happens, the tongue (or lips) completely constricts the oral pathway. No air (and consequently, to an extent, sound) can get in or out. That said, when a fricative or an oral sonorant follows a stop in the same syllable (this does not apply all the time, but the full story is really complicated), what happens in the transition is that in the last few dozen milliseconds air starts coming up from the bottom of the vocal tract. Air pressure behind the stop closure exceeds the air pressure outside it, and when the stop is released, air molecules rush from behind the stop to in front of it relatively quickly (on the release of any oral stop, aspirated or not, especially into a vowel, you can feel a marked burst of air on your lips, or on your hand if you hold it in front of your mouth). This is what we call the stop burst.

Now, many different sorts of things can happen following the stop burst. In the case of the fricatives after stops, the closure relaxes slightly, causing the air to burst, but it remains relatively tight, and air traveling turbulently as it goes through and comes out of that tight corridor is what causes frication noise.

So the story of the affricate (stop into fricative) is that significantly higher than atmospheric pressure builds up behind the closure. When the stop bursts and the tongue goes into constriction position for the fricative, that high pressure of air is released over the early span of the fricative, causing a high amount of noise.

Compare this to what happens when you pronounce a bare fricative. In those cases (e.g., shoe), your tongue goes into fricative position (small constriction), and then your lower vocal tract just starts pushing air through the constriction. The noise does not begin amidst extremely high pressure behind the constriction. That lack of an air pressure gradient causes it to be quieter.

The quicker climb in the affricate case causes a “critical noise/vibration” level to be reached more quickly and the tongue to retreat from its constriction position sooner, hence the shorter frication duration in affricates (the critical-level hypothesis is not that thoroughly explored to my knowledge, but it seems to explain a lot).

So that’s the story of fricative versus affricate.

Bringing it all the way back to your case: what happens when there’s a “syllable break” is important. When there is a syllable break after the stop, the pressure built up behind the stop closure releases (sometimes audibly, depending on the dialect). Then the tongue assumes the fricative constriction position during a period of neutral pressure, the air starts flowing, and the sound emerges as a regular fricative.

I’m sure this is poorly edited, so I welcome volunteers who would be kind enough to correct me on all the errors I may have made.

@aedia asked in a comment a very interesting question that I’ll address as an edit:

perhaps try something like at shoe vs. achoo

That’s actually a slightly different case in my mind. The prosodic characteristics of at (being a very weak unit) in at shoe may cause segmentation issues. While I still haven’t read anything that’s totally convinced me that phonemic affricates exist in English, I suspect that proponents of phonemic affricates might argue that the resegmented form is proof that /tʃ/ the cluster and /tʃ/ the affricate are distinct. Rambling aside, this doesn’t appear to prevail in my dialect and the fricative in at shoe emerges like a vanilla fricative:

At shoe

at shoe

Achoo

achoo

Answer 2 (score 9)

The main difference is phonological, not phonetic. In many languages including English, /t͡ʃ/ is a separate phoneme from /t/ and /ʃ/, although the phonetic pronunciation could be identical in some dialects. That is, most speakers would perceive the phoneme /t͡ʃ/ as sounding different from /t/ and /ʃ/, even if the waveforms are identical.

In many circumstances, however, the waveforms may not be identical and you may be able to tell the difference between an affricate and a plosive+fricative sequence due to other phonological processes in a particular language. For example, many American English speakers would pronounce cat shit (phonemes: /kæt.ʃɪt/) with unreleased final t’s [kʰæt̚.ʃɪt̚] or even final glottal stops [kʰæʔ.ʃɪʔ].

Answer 3 (score 5)

Like @Krubo, I always thought of the distinction between an affricate and a stop-fricative sequence as being primarily defined in phonological terms. (But of course this does not preclude phonetic differences.) But also, like @hippietrail, I was under the impression that native speakers generally don’t have intuitions about whether sounds are separate phonemes or not.

So I always thought that in order to tell whether a stop+fricative corresponds to a single affricate phoneme, or a sequence of phonemes, one has to examine its (or their) phonological behaviour with respect to previously motivated phonological restrictions.

For example, if a language disallows syllable margins of a certain complexity, and the only way you could analyze an example with the relevant stop-fricative according to your previously-motivated syllable template is in a way that treats the stop-fricative as a single phoneme (or a sequence of phonemes), then you have an argument that the stop-fricative is a single phoneme (or seq. of phonemes).

Or if you know that vowels are always, say, lax before a complex coda, but not before a simplex coda, and vowels are not lax before the stop-fricative, then you have an argument that the stop-fricative is a single phoneme.

Or a metathesis process, or gemination process, might treat the stop-fricative as a single unit, etc.

58: Predicate vs. Predicator (score 12494 in )

Question

BACKGROUND

According to Oxford Dictionaries Online:

Predicator means “(In systemic grammar) a verb phrase considered as a constituent of clause structure, along with subject, object, and adjunct.”

Predicate means “The part of a sentence or clause containing a verb and stating something about the subject (e.g. went home in John went home).”

In the latter example of John went home, it seems, the predicate is went home whereas the predicator is went, home being an adjunct and thus not part of the predicator as defined.

Now, I looked further in this wikipedia article to better understand the difference. The article recognizes two competing notions of the predicate in theories of grammar:

  1. Predicates in traditional grammar (e.g., went home in John went home)

  2. Predicates in modern theories of syntax and grammar (e.g., went in John went home)

Which, according to the article, causes confusion as to what exactly the term predicate means; some grammarians, the article says, came up with a new term, “predicator”, specifically for use (2). No problem thus far.

What bothers me: The article in its explanation of (2) says, “Other function words – e.g. auxiliary verbs, certain prepositions, phrasal particles, etc. – are viewed as part of the predicate.” (Emphasis mine.)

Now, remember this definition of the predicate, i.e., (2) above, corresponds to the new term “predicator”.

I understand that auxiliary verbs are part of this definition of predicate, because an auxiliary verb can be part of a verb cluster. But so can certain prepositions and phrasal particles?

The article has these example sentences (Words belonging to “predicate” (2) are boldfaced as in the article itself; My comments in parentheses.):

The butter is in the drawer. (preposition in being part of the predicate)

You should give it up. (particle up being part of the predicate)

Susan is pulling your leg. (I don’t know why leg is marked as part of the predicate. Maybe a typo?)
QUESTION

Except for the last one, which I suspect is a typo, I’d like to know whether the preposition in and the particle up belong to the predicate as presented in (2) and thus belong to the new term “predicator”.

Answer accepted (score 1)

The examples in Wikipedia can be viewed as correct. The sentences

  1. The butter is in the drawer.

  2. You should give it up.

  3. Sue is pulling your leg.

correctly illustrate one particular understanding of predicates. In sentence (1) the matrix predicate is indeed is in, whereby its arguments are the butter and the drawer. In sentence (2), the matrix predicate is indeed should give…up, whereby its arguments are you and it. And in sentence (3), the matrix predicate is indeed is pulling…leg, and its arguments are Sue and your.

Sentence (3) does not contain a typo; pull X’s leg is an idiomatic expression, which means the predicate includes the object noun. ‘To pull someone’s leg’ does not mean that you actually yank on someone’s leg, but rather it means that you tease them by telling them something fictitious.
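
The predicate-argument analysis above can be written out as a small data sketch. This is purely illustrative: the tuple representation below is invented for this example, not a standard formalism.

```python
# Illustrative only: the matrix predicate and its arguments for the three
# Wikipedia sentences, as (predicate, arguments) pairs. The "..." marks
# discontinuous predicate material, following the answer's notation.
sentences = {
    "The butter is in the drawer.": ("is in", ["the butter", "the drawer"]),
    "You should give it up.":       ("should give ... up", ["you", "it"]),
    "Sue is pulling your leg.":     ("is pulling ... leg", ["Sue", "your"]),
}

for sent, (predicate, args) in sentences.items():
    print(f"{sent!r}: predicate={predicate!r}, arguments={args}")
```

The point the layout makes visible is that a predicate can be discontinuous (should give ... up) and can include a noun when the expression is idiomatic (is pulling ... leg).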

The understanding of predicates just described is similar to what one finds in extensive accounts of predicates:

 Napoli, Donna Jo. 1989. Predication Theory: A Case Study for Indexing Theory. Cambridge University Press. 

 Ackerman, Farrell, and Gert Webelhuth. 1998. A Theory of Predicates. CSLI Publications.

But a word of caution is warranted! The use of this terminology varies tremendously depending on the theory of semantics, syntax, and grammar that one adopts. Traditional grammar would certainly reject the understanding of predicates suggested with examples (1-3), since traditional grammar adopts an (in my view) overly simplistic understanding of predicates that has more to do with Aristotle’s term logic than with grammar.

The difficulty with the terminology is illustrated well when considering the term predicator. If you look that term up in linguistics dictionaries, you find that it is indeed employed as a means of overcoming the confusion associated with the term predicate. The extent to which this additional term successfully overcomes the confusion is, though, debatable.

Answer 2 (score 0)

The terms predicator and predicate mean different things from the syntactic and semantic points of view.

From the syntactic point of view, the predicate is the part of a sentence apart from the subject, within which the predicator is the verb cluster comprising the auxiliary and the main verb.

From the semantic point of view, the proposition of a sentence consists of arguments (referring expressions) and a predicator, which is the most important part of the remainder. A predicator can be realized by a verb, an adjective, a noun phrase (one that is not a referring expression), a pronoun, or even a preposition. E.g.: He is in the garden. Arguments: he, garden. The remainder: is, in. Of these two, in is more important than is in conveying the meaning; therefore, in is the predicator.

A predicate is a word or phrase that can become the predicator of a proposition (in some sentence of English). So a verb, an adjective, a noun phrase (not a referring expression), a pronoun, or even a preposition can be a predicate. This means that in one sentence a verb can be the predicator, while in another an adjective can be the predicator (Mr Brown is handsome).

Hope everything is clear to you now.

59: Do unschooled people use cases correctly, e.g. in Germany and in Russia? (score 12480 in 2014)

Question

I wonder if the case system is devised/imposed by literates and not really natural: it is said that the Vulgar Latin most people actually used didn’t have e.g. the cases (or all of them) of ‘classical’ Latin, so I wonder whether common speakers of languages with cases regularly fail to use them as prescription dictates (e.g. not caring about gender).

I would like to know about the situation for German (it seems quite analytic, so not too dependent on case for expressing meaning?) and Russian, but any example is welcome; e.g. Georgian is said to be very morphologically complex; do its speakers simplify it?

In any case, it would be best to know this for people who are not particularly educated: how well do they use case systems without having studied them at school? Studies of correct/incorrect case use among ‘uneducated’ people in countries like these (or even studies on schooled ones) would be ideal, but witnessed/anecdotal answers are welcome too.

Answer accepted (score 57)

The World Atlas of Language Structures Online Chapter 49 lists 84 languages with at least 6 distinct cases (24 of them with at least 10 cases). A number of them are spoken in remote areas of Australia or South America where schooling is limited, if it happens at all. As far as I know, speakers of these languages have no problem using the case system in an extremely consistent way. In fact, many of these languages have a degree of morphological complexity that makes me sweat just thinking about it, but native speakers, even unschooled ones, manage to use it flawlessly on a constant basis.

Answer 2 (score 39)

The question has been well answered for specifics. I’d only want to add that a little thought would have answered it in general: most of language learning happens before a learner ever goes to school, so level of schooling cannot possibly be relevant. Furthermore, for most of human history, most people have been unschooled, unlettered, and illiterate, and in many places today most speakers do not have advanced schooling. Under those conditions, how would languages with complex morphological rules have ever developed?

As a final point, consider English word order, the rules for which are quite as complex as those for case in other languages. Do you find that English speakers make mistakes in word order if they lack adequate schooling? Any native speaker of English could parse a sentence like “The man that John said Mary expected to receive a watch from hit him” and know that Mary expected the watch from the man (and not from John) and that John was hit by the man (and not the other way around). This is not something you’re taught in school, and neither is case (which is used to express exactly the same relationships).

Answer 3 (score 37)

Though, as some other posters have noted, some Russians may use dialect case forms, anyone who is out of diapers uses the full case system. Case is a core concept of the language. The very idea that using cases is a burden is alien to Russian.

If you hear someone speaking Russian while ignoring case and gender, he isn’t uneducated, he is a foreigner. He doesn’t know it, but nobody is quite sure what he is saying.

In English we have the literary language, with fairly strict grammar, and a relaxed conversational form with loose grammar. For example, the literary form is “It is I.”, whereas the conversational form is “It’s me.” In the conversational example we use the wrong case. We can do this because case is now so weak a concept in English that nobody notices that the meaning which “me”, as opposed to “I”, once conveyed does not fit logically into the sentence.

As a result, English speakers learning Russian have a very hard time accepting that case is an important concept. They assume that it, as in English, is an affectation of the educated which they can safely ignore. Nothing could be further from the truth. If you say “It’s me.” (Это меня.) in Russian, people will look at you blankly.

I have had some success explaining this to Americans using the following example: In popular culture Tarzan says “Me Tarzan!”. We understand this to mean “I am Tarzan.” This is because our ideas as to the difference between “I” and “me” are hazy. Also, we expect the subject of the sentence to come first. So we accept “me” as the subject even though the fact that it is in the objective case theoretically indicates that it is either the direct or indirect object.

But a Russian does not expect the subject to come first and very clearly understands the difference between “я” (I) and “меня” (me). He will immediately reach two firm conclusions: 1) the speaker is most definitely not Tarzan, and 2) the sentence is incomplete. The part which tells what Tarzan did to the speaker is missing. The sentence would be complete if we added a verb such as “saw”.

So no, a Russian would not consider a form of the language in which the use of cases was relaxed to be simpler. He would probably consider it to be borderline incomprehensible baby talk.

You mention a difference in case use between classical Latin and vulgar Latin. I am not well informed about this subject, but some things I have read have left the impression that classical Latin reflects an earlier form of the language before so many cases were lost. There is a general trend for languages to lose cases due to pronunciation shifts or foreign influence.

EDIT to address comments

As Andrey Chernakhovskiy and @Annix point out, “Это меня!” is a valid utterance if a subject and verb are supplied by the context. The difference between English and Russian is this: if an American says “It is me.”, the grammatical error is not even noticed, because case plays almost no role in conveying meaning. But if our American is learning Russian and translates it word-for-word as “Это меня!”, he is likely to get blank stares, because his use of case implies that there are a subject and verb in the context when there are none.

@Annix compares the omission of case to the omission of the word “of” in English. This is a brilliant example. I wish I had thought of it.

@SixthOfFour asked how Tarzan might speak while still learning Russian. In reality his speech would be incomprehensible much of the time. But in a movie it would contain errors carefully selected to impair comprehensibility as little as possible. In Russian he might leave nouns undeclined, avoid the use of pronouns, and speak in the third person. As @Annix points out, Tarzan could utter the words “Меня Тарзан!” in a context which implies “зовут”, but in my experience beginning speakers do not understand grammar sufficiently to know that this is possible.

EDIT to address points raised by @dainichi

@dainichi’s comments served to further illuminate the differences between how English and Russian speakers perceive case.

@dainichi objected to my description of “It’s me” as “wrong” according to the rules of formal literary English. Whether we approve or disapprove of the prescriptive approach, the question that was asked is whether ordinary Russians and Germans use cases and conjugate verbs as the prescriptive rules in grammars require, or whether these rules are an affectation of over-educated snobs.

As an example I referred to the well known prescriptive rule for English and Russian which states that the predicate nominative must be in the nominative case. I then cited a well-accepted expression (“It’s me”) which violates this rule.

My point was that for English it is reasonable to discuss whether and when we should follow this conservative prescriptive rule seeing as it is based on arbitrary or outdated concepts of case. But if we suggested to even the most uneducated Russian that in informal speech “Это меня” (It’s me) is an acceptable substitute for “Это я” (It’s I), he would think we were either joking or very confused. It is not that he knows and loves prescriptive rules. It is because he is acutely aware of the message about the role of the noun in the sentence which each case conveys.

If we English speakers were to ascribe this level of meaning to case, we would not be able to say “It’s me.” or “Me and my friend went to the park.” because these utterances would appear nonsensical. Why “It’s me.” is acceptable is an interesting question. Maybe, as @dainichi suggests, we now perceive “me” as nominative, or we have adopted French or Danish practice. Or maybe we just got confused. But no matter what the reason, saying “It’s me.” would have been practically impossible had we perceived cases as Russians do. This is why even uneducated Russians use the full case system.

60: What’s the difference between accusative, unaccusative, ergative, and unergative? (score 12474 in 2017)

Question

What does it mean for a language or verb to be one or the other of these typologies (examples would help)? Can it be more than one at once?

Answer accepted (score 9)

Intransitive sentences can be either unaccusative or unergative based on the subject’s thematic role. If the subject is an agent, the resulting construction is unergative (e.g. Jane sneezed or John danced).

If the subject is not an agent, the sentence is unaccusative. (e.g. Jane arrived or The snow melted)

Ergative and accusative refer to cases (noun inflections). In languages that have them, accusative case marks the object of a transitive verb, and ergative case marks the subject of a transitive verb.

Languages are often divided into Nominative-Accusative and Ergative-Absolutive, but this is an over-simplification. The Wikipedia article on the subject is pretty good.

I’m sure there is some arcane connection between ergative/accusative and unergative/unaccusative, but in practice they aren’t related.


61: X-Bar theory and Trees (score 12387 in )

Question

I am studying linguistics and this semester my subject is Syntax. I have tried a few times to draw a tree using X-bar theory and DPs, but I couldn’t manage it. Can someone please draw a tree of the sentence below for me?

I am going mad while studying Syntax. I would appreciate it if you could do it for me. Thanks a lot.

“John says that Mary believes their daughter’s constant failing of her classes surprises her teachers very much.”

Answer accepted (score 1)

Below you can see how I would do it. I didn’t use triangles to be clearer.

I follow Carnie (2012), just instead of a TP, a tense phrase, I used an IP, an inflectional phrase (see Tallerman 2005). Supposedly the IP structure is the structure that all finite verbs have and all finite verbs have an inflection as far as European languages are concerned.

To generate this tree:
[IP [NP [N' [N John]]] [I' [VP [V' [V says] [CP [C' [C that] [IP [NP [N' [N Mary]]] [I' [VP [V' [V believes] [CP [C' [C ∅] [IP
[DP [DP [D' [D their] [NP [N' [N daughter]]]]][D' [D ’s] [NP [N' [AdjP [Adj' [Adj constant]]] [N' [N failing]] [PP [P' [P of] [DP [D' [D her] [NP [N' [N classes]]]]]]]]]]] [I' [VP [V' [V' [V surprises]][DP [D' [D her] [NP [N' [N teachers]]]]]][AdvP [Adv' [Adv very much]]
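Labeled bracketings like the one above can be rendered as indented trees mechanically. Below is a small sketch of such a renderer (my own illustration, not the tool the answer used; it assumes balanced brackets and is shown on a tiny IP example rather than the full sentence):

```python
def parse(s):
    """Parse '[Label child child ...]' into (label, [children]) tuples."""
    tokens = s.replace("[", " [ ").replace("]", " ] ").split()

    def walk(i):
        assert tokens[i] == "["
        label = tokens[i + 1]
        i += 2
        children = []
        while tokens[i] != "]":
            if tokens[i] == "[":
                node, i = walk(i)
                children.append(node)
            else:  # a bare word is a leaf
                children.append(tokens[i])
                i += 1
        return (label, children), i + 1

    return walk(0)[0]

def show(node, depth=0):
    """Print the tree one node per line, indented by depth."""
    if isinstance(node, tuple):
        label, children = node
        print("  " * depth + label)
        for c in children:
            show(c, depth + 1)
    else:
        print("  " * depth + node)

show(parse("[IP [NP [N John]] [I' [VP [V' [V runs]]]]]"))
```

The same recursive walk works on any well-formed bracketing, however deep.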


62: What is juncture in phonetics and/or phonology? (score 12292 in 2013)

Question

What is juncture in phonetics and/or phonology?

From the Wikipedia article on Juncture, I gathered that juncture is a phonological (and/or phonetic?) phenomenon that allows listeners/speakers of a given language to hear and produce differences between words and phrases that each comprise the same sequence of phonemes. Juncture, apparently, allows English speakers to produce and hear the distinction between “nitrate” and “night rate”, “a name” and “an aim,” etc.

I didn’t get enough information from this article. It listed and very briefly defined different types of juncture, but did not explain them or provide examples of them. It also made too many unhelpful references to “recreational linguistics.”

Unfortunately, I haven’t been able to find better articles on the Web. So what is juncture in phonetics and/or phonology, and what are some examples?

Answer accepted (score 2)

“Juncture” is an analytic entity constructed to account for problems in phonemicization, which primarily encodes morphological and syntactic information, often involving syllable structure (syllabification tends to be assigned bottom-up as determined by the morphemic distribution of phonemes – CV always forms a syllable within a morpheme, C+V or C#V does not always). The underlying premise is that surface phones can be mapped to a subset of such sounds, the phonemes, by eliminating all purely “predictable” sounds. But the fact of there being phonetic differences such as “a name” vs “an aim”, or the night-rate / Nye-trait pair, would suggest that English has many more “phonemes” than we want to believe (for example, aspirated and unreleased t would be phonemes, because they both occur in the same surface environment). By positing an entity “juncture” which can be positioned just like any other phoneme, aspiration as a contrastive property of phonemes can be eliminated (and there are a lot of other problems that juncture eliminates).

The problem could also be solved by directly referring to morphological or syntactic structure, but at the time (prior to the 60’s) this would run afoul of the “single level” theory that prevailed, where phonemic analysis can only consider phonemes, morphophonemic analysis can only consider morphophonemes, morphological analysis can only consider morphemes. There was nothing in the theory that would prevent having “juncture” be the realization of morpheme concatenation, so by translating morphological structure into analogs on the phonemic level, mixing of analytic levels was avoided. This is essentially the driving force behind later (1980’s) prosodic theories of the syntax-phonology interface, where rather than letting phonology directly see syntactic structure in determining whether a given rule applies, a phonological analog was constructed (such as “the prosodic word” or “the prosodic phrase”): that way, phonology only refers to phonological things, which are constructed by reference to syntactic things (and similar moves were made for indirectly encoding morphological information). A classic paper using the concept “juncture” is W. Moulton (Language 1947) “Juncture in Modern Standard German”; see also Hockett’s 1958 A Course in Modern Linguistics, Lehiste 1960 An Acoustic-Phonetic Study of Internal Open Juncture, and the not totally unrelated 1956 paper by Chomsky, Halle and Lukoff “On Accent and Juncture in English” (not structuralist, though). Scheer A Guide to Morphosyntax-Phonology Interface Theories provides a reasonable overview. Juncture also had a second and more obscure interpretation, as a means of indicating facts of intonation, since supplanted by current theories of boundary tones and different kinds of accents. Basically, the reason why you can’t find web resources is that the concept has fallen into desuetude in linguistics over the past half century.

63: How to calculate the co-occurrence between two words in a window of text? (score 12267 in 2013)

Question

I want to build a keyword extractor based on the TextRank model as explained in RMPT04. But I don’t understand how to calculate the co-occurrence between two words in a window of text, as explained in section 3.1. Moreover, is a corpus necessary?

Answer accepted (score 5)

Of course you need a corpus.

Generally in statistical NLP, you train your model on a corpus. Take text classification, for example, where an input document is fed to the model and it should output the document’s class (from a list of classes). The model is trained on many documents with their corresponding classes, and when a new document is tested under that model, it uses the features (information) extracted from those documents to classify the new document.

In the case of co-occurrence of two words, you can use context vectors, which are very common in statistical NLP. They have a simple definition and are very easy to implement, but you will need a corpus:

You define a vector of fixed length (the number of unique words in your corpus) for each unique word in your corpus. The context vector for each word tells us how many times other words have co-occurred with the current word in the defined window; i.e. in a window around the current word, you look at which other words occur and increment their corresponding elements in the current word’s context vector. A simple example is shown below:

Corpus: A D C E A D F E B A C E D

Window size: 2 (the 2 words on either side)

Context vectors:

      A B C D E

A     0 1 3 2 3    
B     1 0 1 0 1     
C     3 1 0 2 2    
D     2 0 2 0 4    
E     3 1 2 4 0

Using these context vectors you can get co-occurrence counts very easily. For example, the co-occurrence of D and E is D[E] = 4.
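The procedure above can be sketched in a few lines of plain Python (the function name is my own; only the standard library is used). Each pair of tokens at most `window` positions apart is counted once, symmetrically, which reproduces the table:

```python
from collections import defaultdict

def cooccurrence(tokens, window=2):
    """Count how often each pair of words occurs within `window`
    positions of each other (on either side)."""
    counts = defaultdict(int)
    for i, w in enumerate(tokens):
        # pair the current token with the next `window` tokens
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            counts[(w, tokens[j])] += 1
            counts[(tokens[j], w)] += 1  # keep the matrix symmetric
    return counts

corpus = "A D C E A D F E B A C E D".split()
counts = cooccurrence(corpus)
print(counts[("D", "E")])  # 4, matching the table above
print(counts[("A", "C")])  # 3
```

For TextRank, these counts (or just their nonzero positions) give the edges of the word graph over which the ranking is run.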

64: Why don’t the French pronounce consonants at the ends of words? (score 12175 in 2013)

Question

I am curious what could have caused the shift in pronunciation. I presume it must have occurred after the spelling of words was standardized. According to the History of French wikipedia article, this happened in the transition to Middle French around 1500 and to Early Modern French around 1700.

Answer accepted (score 15)

Word-final consonants in French are pronounced, but only under certain conditions that have to do with group or phrase phonology; they are usually not pronounced at the end of a phrase or of a word uttered in isolation.

Descriptions of late 17th century French suggest a stage where most consonants were (still) pronounced, but some were elided mainly before other consonants. These elisions eliminated some consonant clusters inside word groups.


Modern French

Word-final consonants are pronounced when the next word starts with a vowel in certain groups, a phenomenon known as liaison.

Liaison is considered obligatory within some groups, eg between article and adjective/noun

  • les enfants /lez‿ɑ̃.fɑ̃/

between subject/object pronoun and verb (or verb followed by subject/object pronoun)

  • nous avons /nu.z‿a.vɔ̃/

but not between phrases, eg between a subject noun and verb

  • Mes amis arrivent /me.z‿a.mi (*.z‿) a.ʁiv/
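As a toy illustration of the liaison pattern just described (the representation and spellings here are my own simplification, not a model of French phonology), one can encode each word with a “latent” final consonant that surfaces only before a vowel-initial word within the same group:

```python
def pronounce(group):
    """group: list of (base, latent_final, starts_with_vowel) tuples.
    The latent final consonant surfaces only when the next word in
    the same group begins with a vowel (liaison)."""
    out = []
    for i, (base, latent, _) in enumerate(group):
        next_is_vowel = i + 1 < len(group) and group[i + 1][2]
        out.append(base + (latent if next_is_vowel else ""))
    return " ".join(out)

# "les enfants": the latent -z of "les" surfaces before a vowel
print(pronounce([("le", "z", False), ("enfants", "", True)]))  # lez enfants
# "les mains": before a consonant the -z stays silent
print(pronounce([("le", "z", False), ("mains", "", False)]))   # le mains
```

A real account would also need the syntactic grouping information, since liaison applies within groups but not across phrase boundaries.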


Late 17th Century French

Contemporary descriptions of ‘good’ pronunciation emphasize cases when word final consonants are elided and not pronounced, i.e. the reverse of modern descriptions.

An example from L’art de prononcer parfaitement la langue françoise (1689).

“il fait, on lit, vous dites, les mains, vos parens, qu’on prononce à peu près comme s’il y avoit, ifai, onli, voudite, lémains, vauparans.” (p 101)

As in modern French, in these examples the final consonant of a pronoun/article before another consonant was not pronounced, but (group-)final -s had not yet been generally dropped:

  • le(s) mains, vo(s) parens

  • vou(s) dite(s)

As in modern French, (group-)final -t in these examples was dropped; unlike in modern French, the ‘l’ of il was also elided before a consonant.

  • on li(t)

  • i(l) fai(t)

These examples suggest that consonants were elided first inside specific word groups to simplify clusters, eg from |vousdites > voudites|, |ilfait > ifait|, before they were elided anywhere else. Most of these elisions also apply in modern French, but some have been reversed, as in the case of ‘il fait.’

Compare La Génie de la langue françoise (1684) which has this to say about word final -f:

  • F is always pronounced in

    “..Bref, chef, fief, grief, nef, esquif, if, arbre; Iuif, motif, naïf, tarif, vif: nominatif, genetif, &c, indicatif, subjonctif, &c, neuf, de novus.” [p20]

But

  • F is only pronounced before vowels, and at the end of a line of verse or phrase (period) in these words:
“..bœuf, œuf, neuf, de novem. On dit un œu dure; du bœu salé; neu soldats”. [p20]

So at this point in time, for some words the -f was never elided, while for others it was elided, but only before a consonant in the same group: ‘bœuf’ (beef) but ‘bœusalé’ (salted beef), ‘neuf’ (nine) but ‘neusoldats’ (nine soldiers).

Furthermore

  • F is not pronounced before plural s
“Au pluriel on prononce toûjours des grièz, du singulier grief: & des bœus”.

Interestingly, most of the words in the first group - where -f is never elided - are adjectives, and would most commonly occur as the final word in their group. The elisions in the second group are not current in modern French, but some elisions before the plural are: 17th-century ‘bœus’ for “bœufs” is today ‘bœu’, since plural -s is now generally dropped.

Answer 2 (score 14)

Weakening of consonants (typically stops > fricatives > approximants > nothing) is a process which has affected many languages at many times. It’s particularly noticeable in French because, as you say, the orthography predates the change; but the same is true for certain patterns in English (most words that contain ‘gh’, for example, and for non-rhotic accents such as most dialects of England, ‘r’ after a vowel).

What is slightly unusual in French is that final consonants have often been lost except where followed by a vowel - it’s much more common to find consonants lost between vowels, and indeed this happened in French earlier, eg Lat. credere > Fr. croire. But the fact that the process has mostly affected only word-final consonants means that it is a different process.

Answer 3 (score 3)

Your question’s a bit ambiguous: you could be asking about silent consonants which do exist in the writing but are never pronounced, or those that are there but pronounced only depending on what sound follows (i.e. next word’s initial sound).

In the first case it’s because French orthography is veeery conservative, in some regards even more so than that of English, so it retains spellings closer to Latin even though nobody speaks like that anymore, like “est”: nobody has pronounced that “s” in more than a thousand years.

The second case is a type of sandhi: the “t” in “est” does have a sound in “est-il” but is silent in “est rien”.

65: Do absolute synonyms exist? (score 12166 in 2012)

Question

By absolute synonyms, I mean words (in the same language) that are interchangeable in all situations.
There can’t be differences in register, meaning, or emotional value.

Is there material that treats this subject, especially showing examples in English? In case absolute synonyms don’t exist, I’d like to read material with an explanation or theory of why they can’t exist in any language.

Answer accepted (score 7)

I would like to make a more general contribution, but still regarding your specific question.

There are many definitions of synonymy. Let’s agree on a simple one: it is a semantic relation between the meanings of words or sentences. We now have the problem of defining meaning. In one way or another, the concept of synonymy is understood without requiring absolute identity between words, if there is such a thing. Without going into the details, a logical account of absolute synonymy would say that two linguistic forms are synonyms if they are interchangeable salva veritate, that is, keeping the truth value of the expression they are part of. In his attack on the concept of analyticity, Quine (1951) discussed the idea of whether this kind of interchangeability was a condition strong enough for synonymy. Saying that bachelor is a synonym of unmarried man was saying that the proposition

All and only bachelors are unmarried men.

was analytical. But that is circular, and Quine wanted to discuss whether the sufficient condition for cognitive synonymy was interchangeability and not analyticity. Hundreds of articles have been written about this and I cannot review all the arguments, but the point was to illustrate one of the basic ideas about logical synonymy. If you think about the example of the correspondence between bachelor and unmarried man, you will soon realise that there are not really many words like those in natural language.

So, if we talk about words, we think that synonyms are those that can be substituted for each other in sentential contexts, and if we talk about sentences, then they are said to be synonyms if the substitution preserves truth values. But still we have to be careful with the concept of synonymy. For words, we can have different semantic values that may not have a one-to-one correspondence to each other. Think about the words old, ancient, aged, obsolete. They are not interchangeable in all contexts: old/ancient ritual vs. old/?ancient laptop. So it’s not only about the meaning of the synonyms, which is unclear, but also about their relations with other words in their contexts of appearance (the basic idea being compositionality in semantics). We can also use euphemisms as synonyms for the words we don’t want to use. Here we are opening the door to a whole new set of factors that can influence synonymy, as some previous comments/answers point out. We must then consider that synonymy (either absolute or partial) depends on which meanings we are trying to compare and on the compositional nature of the relations between words in the context of sentences. This is a matter of debate for any kind of approach to meaning, whether in semantics, pragmatics, or another subfield.

Let us admit that there can be at least two kinds of synonymy: there might be a full synonymy for words that are logically (salva veritate) interchangeable, but that is not very common. And there is also this everyday use of synonymy that treats sameness of meaning in a more or less unrestricted way. One could see meaning as some sort of continuum and words as mapping certain parts of that continuum, with overlaps between them that allow us to call them synonyms. This can be analysed theoretically and empirically in linguistics. As was mentioned before, definitions depend on the framework used to explain meaning. If the meanings of words are their referents, then it is easy to find synonyms. A second option is to understand that meanings are senses (more or less in the Fregean conception), but then you will find many debates as well. A third option is a psychological perspective where meanings are representations or concepts in the mind, and there you will find even more disagreement. The definition of synonymy is then closely related to the semantic perspective adopted and the definition of meaning endorsed. You can compare words in terms of denotations, semantic features, semantic maps, representations, etc. You can also study synonymy in terms of sociolinguistics, comparing dialects and registers, or even from a cross-linguistic point of view. The use of computational methods and corpus research is also frequent. But in the end, you can only say that there is absolute synonymy if you analyse all the possible meanings/contexts, and that is quite difficult, if not implausible.

In sum, definitions of synonymy are relative to the theoretical framework adopted, and particularly dependent on the definition of meaning. What you call “absolute synonymy” can also be what has been defined as logical or full synonymy, but not everyone will agree. Words like bachelor and unmarried man can be said to be full synonyms, but then again, these examples are extremely rare and there might be differences we don’t know yet. That is also the case of gorse and furze, mentioned in a previous answer. Perhaps the reason is that these words have a very narrow range of contexts of use. It might be the case that words with more meanings and more frequently used tend to be less close to the end of the continuum where full synonymy (if it exists) is found. Besides, two or more words that mean exactly the same seem to be not economical for language, and thus it would tend to be avoided. Are there absolute synonyms? There can be, but then again, the answer itself is not absolute. I am sorry I can’t provide references, but it seems that’s the situation, at least in linguistic semantics. Partial synonymy is assumed for methodological purposes without much discussion, while absolute synonymy is considered rare because it is inefficient and uneconomical.

Answer 2 (score 5)

Quoting an excerpt from Haiman’s 1980 paper “The Iconicity of Grammar”

The first type [of iconicity], whose existence is universally (though often only implicitly) recognized in practice, is that of a one-to-one correspondence between the signans and the signatum, whether this be a single word or a grammatical construction. The iconic assumption that such a regular bi-unique correspondence must exist motivates the inclusion under a single heading of the various meanings of a single form in both traditional dictionaries and grammars. It also serves as the unspoken basis for the commonly accepted axiom that no true synonyms exist, i.e. that different forms must have different meanings (cf. Bloomfield 1933:145, Nida 1958:282, Bolinger 1968:127 for representative statements of this axiom). Following Hjelmslev, Kurylowicz, and Martinet, I will refer to this relationship as the iconicity of ISOMORPHISM. (Haiman 1980:515–6)

The citation “Bolinger 1968” is: Entailment and the meaning of structures. Glossa 2.119-28, and “Nida 1958” is: An analysis of meaning and dictionary making. IJAL 24.279-92.

Answer 3 (score 3)

This question triggered a memory of a thread in my favourite language blog.

I couldn’t recall exactly the name of the post or which words were discussed, but in my search for it I turned up another thread containing comments about these terms:

“furze”, “gorse”, and “whin”

Then finally I found the thread I was looking for, which was about the terms:

“gennel” and “snicket”

66: Monogenesis vs. Polygenesis (score 12126 in 2017)

Question

By following the comments to another question about the evolution of Khoisan languages, I learned that there is a heated debate in Evolutionary Linguistics about the origin of language. Some quick research on Wikipedia shows that there are two major, competing hypotheses:

  • The monogenesis hypothesis, which holds that there was a single proto-language, estimated to have originated between 200,000 and 50,000 years ago.

  • The polygenesis hypothesis, according to which languages evolved as several lineages independent of one another.

What is the current state of this debate? What are the most recent (and compelling) pieces of evidence in favor of each side?

Answer accepted (score 18)

I’m not sure there really even is a “current state of the debate.” Most linguists seem to view it as a question that we don’t have appropriate models or analyses to address. To start, I would like to bring up some finer points:

  1. “Genetic relatedness” of language isn’t even really well-defined. It is entirely possible that throughout the past many millennia, languages had a tendency to borrow very differently than today.

    • For instance, suppose you were one of a group of a dozen young women from tribe B who married into tribe C, which has about a hundred people, all at the same time. If tribe B had pronouns, and tribe C did not, but inflected verbs for person, say, only in perfective aspect, you all might start using the pronouns from tribe B; if the people in tribe C like the idea enough, they might start using them. Meanwhile, if tribe A had recently imported grammatical number from tribes elsewhere and lent it to tribe B, you might carry that with you through to tribe C.

    • What we have, in sum, is a mess. Linguists working on pidgins, creoles, and creolization may be our best bet to provide breakthroughs in this.

    • This often brings me to the analogy of genetics in a biological sense: when we look at DNA within the nucleus of a eukaryotic cell, we’re definitely able to determine relatedness; however, in a colony of bacteria/organisms that may not exist anymore, the fact that one organism can “eat” a plasmid from another and maybe even splice it into its own principal DNA ring screws up the notion.

  2. “Human language” isn’t really well-defined. It’s easy enough to separate what is human language from what isn’t, in most cases, in the present day, but the tools we’ve developed for that from our present-day experience probably aren’t adequate to evaluate the situation, say, 300 000—40 000 years ago, or even describe it well. The burden then falls to the inquirer: what is it, precisely, that you are wondering originated once or multiple times?

  3. What if there really was polygenesis of language, but those of the present day all come from a single source (assuming we could even define what we meant by that, and it wasn’t silly)? What if the last language not from the same source was, say, some North American language isolate that went extinct in the 18th century, after being sparsely documented? This leads to more interesting questions, such as: what if, grammatically, it wasn’t all that out-of-the-ordinary?

But as long as we’re speculating, my bet is that the real answer is a mixture of mono- and polygenesis: there were many periods when tribes maintained close contact with each other, borrowing heavily across the grammatical spectrum throughout a dialect continuum for centuries, and then underwent periods of individuation, due to geographical or cultural effects; during one of these periods, perhaps you would say one of these speech patterns would finally qualify for whatever formal definition you have of “language,” and during the next period of mixture, maybe it was borrowed from more than it borrowed. But then, say, a thousand years later, a dialect half a continent away, to which only a small portion of those innovations filtered, innovated enough on its own to where you could call it a “language.” This whole scenario is to bear out the point that what we really have is a mess.

(I know I’ve employed a nonstandard usage of “dialect”; I just didn’t want to use the word “language” above, for obvious reasons.)

Some references:

Polygenesis, convergence, and entropy (1996), by Lutz Edzard, takes a comparable view

In Language Polygenesis: A probabilistic model (1995), David Freedman and William Wang point out the flaws in the usual probabilistic argument for monogenesis

Addendum:

(let me know if I should create another question or something):

An article in the New York Times, World’s Farmers Sowed Languages as Well as Seeds (2003), bears out the relationship between the success of the comparative method and the presence of agriculture in the cultures being studied. (This relates to my point about dialect continua/grammatical innovation/“membranes” and leads into a response to MatthewMartin’s answer.)

+1 to MatthewMartin for mentioning a vital element of the “debate” Otavio Macedo was referring to. Greenberg conducted some of the most groundbreaking and notable research in linguistic taxonomy of the past hundred years, including but by no means limited to the establishment of Niger-Kordofanian and Afro-Asiatic. There always seems to be a lot of unhelpful, not-so-under-the-surface vitriol in many reactions to the Greenberg/Ruhlen research. On the other hand, when in the Amazon review of Ruhlen’s book, Larry Trask says

of the 13 Basque items presented on page 65 (as ‘language B’), four are wrong, and two more are not even native Basque words, but are words borrowed from Latin or Spanish. And there are also some profound problems concerning the origins and earlier forms of several of the others,

this is a factual objection, which can easily be verified. Similarly, in A Siberian link with Na-Dene languages (which I feel is the most notable finding in linguistic taxonomy of the twenty-first century so far, but that’s beside the point), we read

The first person to claim a genetic link specifically between Yeniseic and Athabaskan-Tlingit (Eyak was then unrecognized as a Na-Dene language) was the Italian linguist Alfredo Trombetti (1923). Since that time, many other linguists, notably Merritt Ruhlen (1998) have repeated the same suggestion, though typically including Haida in Na-Dene

and

Merritt Ruhlen’s (1998) proposed cognate sets contain several genuine cognates, among over 75% coincidental look-alikes. These are Ruhlen’s comparisons for: head, stone, foot, breast, shoulder/arm, birch/birchbark, old, and burn/cook, and possibly a few others. The correct identification of cognate words for “birch/birchbark” is particularly noteworthy, as this basic vocabulary item is specific to families of the northern latitudes. The finding of these cognates, though it was impossible to confirm them as such in the absence of much more investigation, represents an important contribution,

in other words, (1) Greenberg and Ruhlen are correct that Na-Dené and Yeniseian are relatives, (2) Ruhlen’s wordlist is partially correct and helpful, but (3) Haida is not part of their linguistic unit, and (4) 75% of Ruhlen’s correspondences turned out to be invalid.

This is getting a little long, so I’ll summarize the rest. After his Africa research, Greenberg traveled to Papua New Guinea and to South America (IIRC; if anyone can source this I’d be grateful) and collected Swadesh-type word lists from hundreds of tribes. This, too, is incredibly important research, of a nature not many have had the constitution to undertake. The results of it are, too, very important.

What’s the conclusion? More research of this sort needs to be done, to get a lower error rate on the words in the lists. Sound changes need to be added back into the equation with gusto, so that we can find more correct correspondences and throw out more of the ones that don’t actually hold.

And on those pronouns: they certainly do hint at the possibility of Eurasiatic (for instance) as a linguistic unit. But what if, again, there were an era where pronouns were the new hot item? Then they could all reduce to loanwords. Again, the conclusion is more research needs to be done.

Answer 2 (score 11)

I thought I’d follow up Daniel’s excellent comment by giving an example of the problems faced by language reconstruction.

One of the best tools we have for establishing a relationship between two languages is the comparative method. This was developed in the 19th century, after people started noticing some deep similarities between Sanskrit, Greek, and some other languages. They hypothesized that these all shared a common ancestor, which they called “Proto-Indo-European”. By comparing the way one concept is expressed in different languages, you can come up with correspondences. If you can generalize a correspondence, you can establish a relationship between the two languages.

Let’s say you discover a new island, Ba, and on this island are two distinct cultures, each with its own distinct language: Tamo and Danu. We suspect that they share a common linguistic ancestor; the island is so isolated that we don’t expect multiple migrations. Still, we can’t rule it out.

We proceed by constructing a lexicon of the two languages using basic words, something called a Swadesh list, which contains words we expect nearly every language to have. So we’ve done this and developed a reliable list along with pronunciations. It might look something like this:

 Word     Tamo     Danu 
—————————————————————————
'Man'     /tam/    /dan/
'Woman'   /tami/   /dani/
'Dog'     /kɔd/    /god/
...       ...       ... 

So on and so forth.

We immediately notice that these words look similar. ‘Man’ is very similar in both languages: an alveolar stop, the vowel /a/, and a nasal. But there are some variations: the stop is voiced in Danu and voiceless in Tamo. And look: we get the same pattern with ‘dog’, a voiced velar stop in Danu and a voiceless one in Tamo. If this holds up across a lot of words, then that is strong evidence that these languages share a common ancestor, which we might call ‘Proto-Ba’.
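
The tallying step in this toy example can be made mechanical. Here is a minimal Python sketch (the invented Tamo/Danu data and the naive one-to-one segment alignment are illustrative assumptions; real comparative work requires proper alignment and much larger datasets):

```python
from collections import Counter

# Invented Tamo/Danu cognate candidates from the toy wordlist above.
pairs = [("tam", "dan"), ("tami", "dani"), ("kɔd", "god")]

# Naive one-to-one segment alignment; tally every differing segment pair.
corr = Counter()
for tamo, danu in pairs:
    for a, b in zip(tamo, danu):
        if a != b:
            corr[(a, b)] += 1

# A correspondence recurring across several words is candidate evidence
# of common descent (or of systematic borrowing; the tally alone can't tell).
for (a, b), n in corr.most_common():
    print(f"Tamo /{a}/ : Danu /{b}/ in {n} word(s)")
```

On this toy data, Tamo /t/ : Danu /d/ and Tamo /m/ : Danu /n/ each recur twice, which is exactly the kind of pattern the answer describes.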

There are lots of things that can go wrong with this. Word borrowings, crib words, and coincidental resemblance can all conspire to make two languages appear related when they are not. You can get overexcited about your discovery and start noticing similarities that aren’t there. You’ll want to do a lot of heavy research about the languages and the people who speak them. Who have they had contact with? Where did they come from? Have there been any major migrations that might have affected the language?

Another problem is that after a certain point, this method stops being useful. Language changes build up over time, complicating each other or wiping each other out. Our knowledge about the language becomes less and less reliable, and we start losing important pieces of the puzzle. There is a lot of controversy about whether Japanese is related to Korean, for instance, and no one knows where Basque came from.

We’d be fighting through 50,000 years of language change, most of it without any written records and with only a bare sketch of population movements, and that’s a conservative estimate for the age of language. It’s really easy to underappreciate just how long a time that is. Proto-Indo-European, for instance, started to split up around 6,000 years ago. The invention of writing is around the same age.

There are some linguists that have tried to come up with a sketch of ‘proto-world’, but they have been met with intense skepticism. Just google “proto-world” and click around on a few of the links.

Answer 3 (score 5)

Merritt Ruhlen has written a good, popularly accessible book on the issue, and my answer is based on my reading of his book.

The anti-monogenesis stance is by and large based on the Proto-Indo-European research agenda and its methods. It is not controversial that those techniques fall apart at about 5,000, maybe even 10,000, years’ worth of inference. A pretty good accessible book about this research agenda and how it agrees with the archaeological record for IE is “The Horse, the Wheel, and Language.” The story for PIE is pretty convincing, and the reconstructions almost look like a ghost of the original speakers might be able to understand them, albeit this is hardly the level of proof you get in physics or math.

Ruhlen makes the case that

  • going further back uses different techniques (e.g. focusing on the evolution of pronoun systems and ignoring step-by-step transformations), and these techniques are different from those used for reconstructing PIE
  • it has more modest goals: classification only (i.e. it will not result in a PIE-type reconstruction of Proto-World)
  • it has turned into an exercise of academics flinging feces at each other (my words, and darn if it isn’t appropriate)
  • this isn’t the first time the academic community reacted violently against a superfamily and then more or less accepted it, with Greenberg’s research on the Bantu languages as an example

And now for my personal opinion: some of the most vociferous opponents are researchers of American Indian languages. In that community, so many competing crackpot ideas have been advanced in the past (e.g. that the Indians are the lost tribes of Israel and that Amerind languages are some type of Hebrew) that the whole field has irrationally turned against the idea that Amerind languages can be rationally classified into superfamilies.

Now in other fields (genetics, archaeology, and anthropology), researchers are coming to the same conclusions, or to conclusions supported by the other fields. The classification of superfamilies is matching the migrations of ancient humans.

I think someone in the world of economics (Keynes, maybe?) said that progress in academia is made when there are funerals. I guess that makes sense, because no one wants to retract a research paper they wrote a long while back to take into account a major shift in the understanding of the world.

67: Why do English transliterations of Arabic names have so many Qs in them? (score 11986 in )

Question

I remember when the Muslim holy book was the Koran when I was in middle school, but now it’s the Quran. But it’s always been Qatar and Iraq (but still Kuwait). Who decided that ‘Q’ was going to represent that sound instead of ‘K’, and why?

Answer accepted (score 79)

In Arabic, in fact, they’ve always been separate sounds! The sound we write “K” is spelled with the letter ك in Arabic, and is pronounced a little bit further forward in the mouth; the sound we write “Q” is spelled with the letter ق and pronounced a little bit farther back. In phonetic terms, “K” is a velar sound, and “Q” is a uvular sound.

English doesn’t distinguish between these two different sounds. But since we’ve got a spare letter lying around—the English letter “Q” is actually related to the Arabic letter ق, if you go back far enough!—it’s become conventional to separate them in writing. “Kuwait” is spelled with a ك, so it gets a “K”; “Qatar” is spelled with a ق, so it gets a “Q”. The “Qur’an”, similarly, has a ق in it.

Nowadays, by the way, the letter “Q” has started to be associated with Arabic in particular, since it shows up significantly more there than in English. So in pop culture, you’ll sometimes see names like “Hakeem” respelled to “Haqim”—even though the original Arabic has a ك in it! Linguistically, this is called hypercorrection, and it’s the same reason you’ll see words like “forté” with an é: English-speakers associate the acute accent with Romance loans (café, fiancé, résumé), but it has no special meaning in English, so it sometimes gets inserted where it doesn’t actually belong (the original Italian word is simply forte).

Answer 2 (score 12)

I was going to propose Julius Klaproth, in his 1823 book Asia Polyglotta. He notates the difference between ك and ق as k versus q. In earlier works such as Hamer 1806 Ancient alphabets both were represented as “k” with a note that [q] ق is “hard”. However, I see that Christian Ravis 1649 in A discourse of the orientall tongues : viz. Ebrew, Samaritan, Calde, Syriac, Arabic, and Ethiopic together with a generall grammer for the said tongues p. 96 or 97 (in the “General grammar” part) notates the kaf / qaf distinction with k versus q as well. Subsequently, it emerged that Otho Gualtperius 1590 in Grammatica linguae sanctae described the analogous Hebrew kaph / qoph distinction as kaph c,ch (=[k,x]) versus qoph “q or k”. It is worth noting that the letter “q” itself derives historically from and still resembles the Semitic letter qoph, which represented a Semitic sound that was either [q] or [k’] and is [q] in Arabic.

So at this point I would say that it is unknown who first devised that convention.

68: How can nasalized vowels in English be explained? (score 11881 in 2013)

Question

. . .Auntie Marge’s present, see, it’s here under. . . [audio source]

In the audio above, [mɑː] sounds like this:

[..m..]
  [......ɑː.....]
---- (time) ---->

This sounds close to nasalized [ɑː], like in Indic ॐ [om]:

[..o..]
[.....m.....]
---- (time) ---->

My understanding is that, since English has no nasalization, it must sound like this, with much smaller overlap:

[..m..]
     [......ɑː.....]
---- (time) ---->

The same applies to “see” and “under”, but to a lesser extent.

My question is: how can this phenomenon be explained? Is it possible in languages that have no nasalized vowels? Or is it just a random deviation rather than a rule?

Answer accepted (score 16)

This is not a “deviation” at all. Contrary to your understanding, English employs tons of nasalization in its vowels. What it lacks is phonemic nasalization, that is, phonologically nasalized vowels whose distribution is not predictable.

Generally, vowels (and sonorant consonants) in English get phonetically nasalized (i.e. they are pronounced with the velum lowered so that air can escape through the nose) when they are adjacent to nasal consonants. So, the [i] in bead is usually not nasalized, but the [i] in mean is nasalized because it is flanked by nasal consonants. The nasal passage is open for the [m] and stays open through the duration of the vowel and the final [n]. A vowel that is between a nasal consonant and a non-nasal consonant will usually get partially nasalized (more nasalized on the side next to the nasal). This explains why you are hearing the vowels in Marge’s and under as nasalized.
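
As a toy illustration of this allophonic rule, here is a sketch that marks any vowel adjacent to a nasal consonant with a combining tilde. The tiny segment inventories and the one-symbol-per-segment assumption are deliberate simplifications, not a real phonological model:

```python
# Toy allophonic rule: nasalize any vowel adjacent to a nasal consonant.
NASALS = set("mnŋ")
VOWELS = set("aeiouɑæɪʊə")

def nasalize(phones: str) -> str:
    """Add a combining tilde (U+0303) to vowels next to a nasal."""
    out = []
    for i, seg in enumerate(phones):
        next_to_nasal = (
            (i > 0 and phones[i - 1] in NASALS)
            or (i + 1 < len(phones) and phones[i + 1] in NASALS)
        )
        out.append(seg + "\u0303" if seg in VOWELS and next_to_nasal else seg)
    return "".join(out)

print(nasalize("bid"))  # no nasal context, unchanged: bid
print(nasalize("min"))  # vowel flanked by nasals: mĩn
```

Because the rule's output is fully predictable from the context, nasalization here is allophonic, which is exactly why it carries no phonemic weight in English.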

Many speakers leave their vela slightly lowered for a majority of the time while they are speaking, resulting in rampant partial nasalization of most of their vowels and sonorants, regardless of whether they are abutted by nasal consonants or not. As an American, I have noticed this to be a pervasive tendency in many dialects of British English (such as the one in your link). This would explain nasalization in words like see (although in this particular case that vowel does not sound nasalized to my ears).

But the important point here is that, in all the cases above, the nasalization of the English vowels does not contribute to the phonological distinctiveness of those vowels. That’s why speakers who leave their vela lowered most of the time have no trouble communicating with speakers of the same dialect that don’t leave their vela lowered.

69: What are the origins of the word Hebrew? (score 11809 in 2012)

Question

I have read that this word may derive from an Egyptian decree issued by Pharaoh Merneptah (1224 BC), which referred to the ‘habitu’ (a word for a type of slave) who carried stones ‘for the great pylon of the great city of Rameses’.

But what is the actual etymology?

Answer accepted (score 10)

From Chambers Dictionary 11th Ed.:

ORIGIN: OFr Ebreu and L Hebraeus, from Gr Hebraios, from Aramaic ʻebrai (Heb ʻibrī), literally, one from the other side (of the Euphrates)

From Oxford English Dictionary 2nd. Ed.:

ME. Ebreu, a. OF. Ebreu, Ebrieu (nom. Ebreus, 12th c. in Hatz.-Darm.), ad. med.L. Ebrę̄us for cl.L. Hebræus, a. Gr. Ἑβραῖος, f. Aramaic ʻebrai, corresp. to Heb. ʻibrī ‘a Hebrew’, lit. ‘one from the other side (of the river)’;

From Etymonline:

late O.E., from O.Fr. Ebreu, from L. Hebraeus, from Gk. Hebraios, from Aramaic ’ebhrai, corresponding to Heb. ’ibhri “an Israelite,” lit. “one from the other side,” in reference to the River Euphrates, or perhaps simply signifying “immigrant;” from ’ebher “region on the other or opposite side.” The noun is c.1200, “the Hebrew language;” late 14c. of persons, originally “a biblical Jew, Israelite.”

From Collins English Dictionary:

[from Old French Ebreu, from Latin Hebraeus, from Greek Hebraios, from Aramaic ʻibhray, from Hebrew ʻibhrī ‘one from beyond (the river)’]

Answer 2 (score 6)

http://www.abarim-publications.com/Meaning/Hebrew.html#.UA7JVmGwzW4

The word Hebrew comes from the verb (עבר) meaning to pass over, through, take away. The first application of this word is in the name Eber. The second application is in the first occurrence of the word עברי, Hebrew in Genesis 14:13, where Abram is called Hebrew. The first case of narrative use of this verb is in the enigmatic cadaver vision of Genesis 15:17, “…there appeared a smoking furnace and a flaming torch that passed between these parts.”

Answer 3 (score -3)

The word for “Hebrew” in Hebrew is

עברית

pronounced

eev-rit

It’s easy to see how the various language variations could emerge from there (i.e. V changing to B, adding an H at the beginning)

70: Any difference between natural and programming languages? (score 11778 in )

Question

First of all, as a native German speaker, I apologise for my incorrect use of the English language.

After thinking about some different languages and wandering astray on this exact Stack Exchange, I began to notice similarities between natural languages and programming languages:

  • Both languages strongly distinguish between syntax and semantics.
  • Both of them serve the purpose of communication to explain what something is or what is to be done.
  • Both types of languages have a base composition

If we imagine the human mind as a very advanced compiler for all the natural languages, then we can see that:

book, Buch and something along the lines of

ADT Book
{
    variables
    title;
    pageCount;
    publicationDate;
    content;
    rating;
    ...

    functions
    read;
    rate;
    ...
}

all describe one and the same thing: a book, or specifically an object with a title and so on. It might seem like the definition of a book is much more complicated in a programming language rather than a natural language but what we forget is that for our human-compiler-mind book is a shortcut for an object with the specific properties as described in the ADT.
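
For concreteness, the ADT sketch above could be rendered as, say, a Python dataclass; the field names follow the pseudocode, while the types and example values are illustrative assumptions:

```python
from dataclasses import dataclass

# A runnable rendering of the ADT sketch above.
@dataclass
class Book:
    title: str
    page_count: int
    publication_date: str
    content: str = ""
    rating: float = 0.0

    def read(self) -> str:
        return self.content

    def rate(self, stars: float) -> None:
        self.rating = stars

b = Book("Syntactic Structures", 117, "1957", content="Colorless green ideas...")
b.rate(5.0)
print(b.title, b.rating)  # -> Syntactic Structures 5.0
```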

Now one could object that the two types react differently to syntax errors, because a syntax error in typing (teh) is still correctly recognised as the whereas a missing semicolon will cause a compile error in programming languages. What we forget however is that this implicit correction of syntax is just a “feature” of the human-compiler-mind because it is already about 200000 years old and has evolved a great deal.

Another noteworthy objection is that while a programming language is designed by us humans, the origin of natural languages is unknown, although theories exist. Based on these theories, one can assume that natural languages have evolved from some kind of pre-linguistic forms of communication, just like programming languages have evolved from someone soldering something on a board to writing code on machines running with those soldered boards.

Now regarding everything I have tried to say, my questions are: Are there really any differences between natural and programming languages? If yes, which ones are there and how big of a difference do they make?


Links used:

Programming and natural languages, Alex Chen, September 16, 2004

Natural vs Programming Languages, Rajesh Kumar, December 22, 2012

Artificial Language vs. Natural Language, Cornell University, Fall 1994

The Similarities and differences between languages and programming, Jeff Lau, December 29, 2012

Answer accepted (score 7)

All three of your assumptions about natural languages are questionable. They describe models used by linguists, many of which were inspired by computer-like algorithms rather than by language itself:

  1. Natural language does not “strongly distinguish between syntax and semantics”. In fact, they are very closely interlinked. Syntactic constructions are used to express all kinds of meanings.

  2. Natural language is used to communicate much more than “to explain what something is or what is to be done”. It’s not even clear that communicating that kind of information is what language evolved to do (see e.g. Dunbar’s thesis). If anything, focusing on the kind of semantics that can be expressed by a programming language has held back developments in semantics.

  3. Natural and programming languages are compositional in very different ways. While you can define all the compositional rules in a programming language, a natural language is much freer - which is what makes language change possible. It also makes the expressive potential of a natural language significantly larger than that of a programming language. There’s no irony in a programming language (that’s not to say that programmers cannot express puns or even parody each other’s code - but they’re communicating those to other humans, not the computer).

In summary, while the language-as-computer-code metaphor has some limited aptness and utility, it is more misleading than useful. Most linguists have long since rejected it, if they ever maintained the mapping at all. If natural and programming languages were even a little alike, we would already have working algorithm-based parsers and full-blown AI. But natural languages are not algorithmic at all: understanding and speaking them is more akin to pattern recognition than to feature analysis.

What I think is a much more interesting subject for research (and greatly understudied) is the rich and complex ways in which programmers use programming languages to communicate with the computer and with each other. In that respect, programming languages are much more like natural language. For instance, we could find parallels of dialects, accents, and registers in programmer communities. We will find eloquent programmers and those who struggle to get their meaning out there. Not understanding this complexity often leads people to erroneous statements, such as that one can learn to code in a day. But saying that memorizing a German dictionary and learning the rules of German syntax will make you ‘speak German’ is no truer than saying that learning all the rules of C will enable you to write a computer program.

Answer 2 (score -1)

An integration of a ‘formalized’ subset of natural languages and programming languages should be possible. Don’t start with a full natural language.

In the programming language Python ‘everything is an object’, and there are a few standard objects. I think there are good reasons to extend programming languages such as Python with formalized natural languages by adding more standard objects. It would improve interoperability and save time if standard objects were added that represent common concepts. As a first proposal, for example: person, organization, and ISO-standardized concepts such as countries, currencies, units of measure, etc. For each formalized natural language, a dictionary could be added with the name(s) and synonyms in that language that refer to the identifier of the object in the programming language. For each name, the language community that uses it could be recorded, to enable and distinguish homonyms. Standard kinds of binary relations could also be added, such as ‘is a kind of’ and ‘is classified as a’ (to represent instantiation). The Gellish family of formalized languages might be used to provide ideas.
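
As a toy sketch of what such a registry of standard objects and multilingual names might look like (all names and structures here are invented for illustration; this is not actual Gellish or any real Python extension):

```python
# Invented sketch: standard concept objects plus per-language name dictionaries.
concepts = {}   # concept id -> Concept
names = {}      # (language code, name) -> concept id

class Concept:
    def __init__(self, cid, kind_of=None):
        self.cid = cid
        self.kind_of = kind_of  # an 'is a kind of' relation to another concept

def define(cid, kind_of=None):
    concepts[cid] = Concept(cid, kind_of)

def add_name(language, word, cid):
    names[(language, word)] = cid

define("organization")
define("company", kind_of="organization")
add_name("en", "company", "company")
add_name("de", "Firma", "company")  # synonym from another language community

# Names from different languages resolve to the same standard object:
print(names[("en", "company")] == names[("de", "Firma")])  # -> True
```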

71: What’s the difference between open/closed class words and functional/lexical categories? (score 11753 in )

Question

These two classifications seem to point to the same types of words.

Answer accepted (score 10)

They are two different distinctions that, however, often go together.

“Open vs. closed” is a diachronic distinction: how easily a set of expressions can be extended by new elements.

“Functional/lexical” is a synchronic distinction referring to the role played by the expression (e.g. serving as a full verb or as an auxiliary).

Furthermore, the functional vs. lexical distinction is a theoretical concept and depends on one’s theoretical assumption, whereas “open/closed” is more of an empirical observation.

As jlawler said, the two distinctions often amount to the same expressions as being a functional class makes it less likely for other elements to enter that class because of the important role they play for the grammar and their grammaticalized status.

However, there are instances in which the two distinctions do not line up. For instance, prepositions are considered to be lexical categories by many, but they are a rather closed class (for a discussion of the various positions, see this article by Joost Zwarts).


NB: Even if it is true that “functional expressions” have a more abstract, i.e. more grammaticalized meaning, they often still have meaning; e.g. the tense/mood expressed by the auxiliaries in English or many other languages. But there are also cases in which the meaning is more or less lost [do-support with negation, for instance]

72: Is it possible to become a native speaker of another language for someone that already has a mother tongue? (score 11672 in )

Question

Are there any studies or research in fields like neurolinguistics (or any other field) on enabling people (via drugs, psychological training, whatever) to become a native speaker of another language? Is this a kind of holy grail?

Answer accepted (score 5)

The term ‘native speaker’ usually refers to someone who has the language in question as their mother tongue, i.e. it is their first language, acquired as an infant (of course, someone may have multiple mother tongues and so be a native speaker of multiple languages). So under that definition it is not possible to become a ‘native speaker’ of a language later in life, after acquiring a different language as a mother tongue. However it is possible to reach a native speaker level of fluency in a language which is not one’s mother tongue, and this is how the highest level of skill in a language is commonly described in linguistics. So looked at this way it is possible to become a ‘native speaker’, at least functionally.

Some individuals seem able to acquire a high level of skill in a new language with seeming ease. These individuals are sometimes referred to as ‘rapid language learners’. A famous example in the world of linguistics is the late Ken Hale.

Answer 2 (score 1)

If you grow up with two parents who speak different languages, and they both speak their own language to you, then you will start to speak one before you speak the other. However, you will be a native speaker of both languages.

One example is my son, growing up in an English-speaking country. All of his friends speak English, and I myself am a native speaker of English. So he is also a native speaker of English, as much as I myself, or any of his friends. But he gained a degree of proficiency in his mother’s language long before he spoke a single sentence of English. So he is the perfect example of someone who had a mother tongue, then became native in something else.

Answer 3 (score -3)

Your native language is the language (or languages) of your parents. For example, a Bollywood actress named Katrina Kaif was born to a British mother and an Indian father who was a businessman in the UK. So her native languages are English (since her mother is originally from Britain) and Hindi, or whatever language her father speaks; both languages her parents spoke and that she grew up with are her native languages.

73: Could we rank languages, saying one is superior to the other? (score 11569 in 2016)

Question

Now and then I am faced with claims that language A is better than B, because of some grammar rules or words or …

But is there really a standard or a method to analyse a language from different perspectives and then rank it?

Possibly: (suggestions and not real questions)

  • Language is result of social consensus, so could we say less popular dialects or languages are inferior?
  • Could we rank languages by the average amount of time learners spend to become fluent? Ranking them as harder and easier?
  • Could we, from a linguistics point of view, decide on semantics and grammar, etc., ranking languages on ambiguity, vagueness, illogic, redundancy, polysemy (multiple meanings) and overall arbitrariness?
  • Or simply compare languages based on their culture and historical background or any other possible measurement.
  • Or based on the average information rate conveyed during speech or written communication?

Finally if any such method exists will that be a partial comparison or will it be able to rank Language A as superior to B which will result in the best(or most efficient language) in the world?

Answer accepted (score 13)

No. A language is not something that can be “better” or “worse” or objectively “ranked” in any way. Any such ranking would be scientifically untrustworthy.

Going through each of your points:

Language is result of social consensus, so could we say less popular dialects or languages are inferior?

Popularity of a dialect or language depends on many factors, most of which are purely cultural and only loosely related to the language itself. Such prejudices arise from an accumulation of complicated social factors; just because you hate a nation and like to make fun of their language because they sound different or “stupid” doesn’t make the language a bad one.

Could we rank languages by the average amount of time learners spend to become fluent? Ranking them as harder and easier?

No, not that either. Even if it were possible to objectively rate the learnability of a language (which isn’t easy, because it depends highly on the individual learner and also on the point at which you consider a language fully, or halfway, acquired), the size of the lexicon or of the set of grammar rules doesn’t make a language qualitatively better or worse. Both can have advantages and disadvantages: learning lots of grammar rules may seem difficult at first, but it might help you express things much more precisely than with rather simple and widely applied grammatical constructions.

Could we, from a linguistics point of view, decide on semantics and grammar, etc., ranking languages on ambiguity, vagueness, illogic, redundancy, polysemy (multiple meanings) and overall arbitrariness?

This still seems the most reasonable linguistic approach to me, but I am convinced that every language is equally powerful.
For example, there are languages which have evidentiality affixes (so you indicate directly on the verb whether you know what you are saying from having seen it yourself, from logical reasoning, only from hearsay, …), so you might think those languages are more powerful; but it is entirely possible to express the same in English by means of constructions like “The children must be awake, I hear them talking” or “They say she broke up with her boyfriend”.
In general, there is always a trade-off between the economy of the language’s inventory and the economy of the way things are expressed: a small vocabulary and simple grammar may be economic in that the language is less bloated, but you then need more complicated ways to use it to express the same thing (like English having no evidentiality affixes and making use of syntactic means instead). Conversely, if a language permits expressing a lot of slight meaning differences by explicit linguistic means, this is again uneconomic with respect to the size of the inventory the language then needs to have.
What would you define perfect balance on, what is the optimal ratio of language complexity and expression complexity?

Or simply compare languages based on their culture and historical background or any other possible measurement.

What do you mean by that? Japanese is a worthless language as opposed to Latin because the Romans had a huge empire two thousand years ago? Finnish is superior to English because American metal music sucks? This is totally absurd.

Or based on the average information rate conveyed during speech or written communication?

This relates to what I wrote about the third point: this would still come closest to a linguistically sensible way to measure a language’s “quality” based on efficiency, but since there is always a trade-off between the complexity of the language’s lexical and grammatical inventory and the complexity of the language’s expressions for the same amount of information, there is no objective way to determine the “ideal language”.

Linking up with what @A. Toumantsev commented (thanks for the amendment):
It is indeed possible to measure single parameters, like the estimated size of the lexicon, the complexity of inflectional paradigms for certain classes of words, or text length in relation to the amount of information conveyed (as the paper you mentioned did), and it is likewise possible to compare these results between selected languages.
But it is not possible to derive any sort of objective “ranking” from such comparisons; there is no linguistically justifiable judgement about something like the superiority or inferiority of a natural language, neither based on features that are actually measurable, and even less on solely socially motivated, and therefore linguistically irrelevant, individual or cultural preferences.

You cannot just “rate” a language. You can rate something according to how well it suits the purpose it was designed for, possibly also with what efficiency it does so. But language wasn’t designed by anyone; language evolved over thousands of years from humans simply using it, and the purpose language is meant to fulfill is communication, which does work, in any language, regardless of whether you like the sound of nasals in French or consider a country culturally superior to another because it has managed to kill more people in the past. (Not presuming this is what you think! Just exaggerating to show why I think this is so absurd.)
Such individual emotions have nothing to do with serious linguistic science, and I encourage you to distrust anyone who makes assertions like “This language is so primitive, just like the people who speak it” or “See, I’ve been learning that language for two years now and I still keep making mistakes, this language is complicated and useless” or “Well, I think we all agree that Swiss German sounds just ridiculous, real German is so much better” or “Oh no, English grammar is so complicated, who even invented those rules, it is so obvious that my mother tongue does it the way that makes way more sense, any linguist will tell you so”. No, just.. no ;)

Answer 2 (score 16)

No, this is not possible.

As already said by lemontree, most of these points are just totally subjective criteria that you can’t possibly rate impartially.

The only points where one could see some chance of coming up with an objective rating are 3 and 5: ambiguity and conciseness. Indeed, as the paper you’ve linked shows, it is possible to empirically compare information content by measuring, for certain sample texts, how much information they pack.

Trouble is, there’s not really a reliable way to tell what “information content” even means. You can express a given piece of information in many different ways in any language. Finding the most concise way is an uncomputable problem, even for rigidly specified programming languages (this is called Kolmogorov complexity). It’s even more hopeless for natural languages, where it’s often not even possible to say whether two samples express the same information. Hence you also can’t judge whether language A has more ambiguity than language B, or just more subtle nuances.
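
Although Kolmogorov complexity itself is uncomputable, a crude computable proxy sometimes used is compressed size. This sketch (with invented sample strings) only illustrates the idea, and supports the point that such measures say little about whole languages:

```python
import zlib

# Invented parallel samples: roughly the same message, phrased two ways.
english = b"The children must be awake; I hear them talking."
terse = b"Kids awake: audible talking."

def compressed_size(text: bytes) -> int:
    """Crude proxy for information content: size after DEFLATE compression."""
    return len(zlib.compress(text, 9))

for label, text in [("long phrasing", english), ("short phrasing", terse)]:
    print(label, len(text), "bytes ->", compressed_size(text), "compressed")
```

The proxy only bounds redundancy in a particular string; it cannot decide whether two phrasings express the same information, which is exactly the problem described above.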

Obligatory XKCD:

XKCD comic on Kolmogorov complexity in natural language


It is possible to measure the total information content of a binary data stream, or of a physical signal channel. But this has little relation to the actual information content in languages, except by giving an absolute upper bound to what amount of information can theoretically be contained in printed text. Actual language gets nowhere close to that bound.
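The “absolute upper bound” referred to here is in the spirit of Shannon entropy. A minimal sketch (zeroth-order character entropy only, which ignores all longer-range structure and therefore overestimates the real per-character information of text):

```python
import math
from collections import Counter

def entropy_bits_per_char(text):
    """Zeroth-order Shannon entropy of the character distribution:
    an upper-bound-style estimate of bits of information per character."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy_bits_per_char("abab"))  # two equiprobable symbols -> 1.0
```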

74: Evidence for age cutoff in foreign accent acquisition (score 11524 in 2012)

Question

Steven Pinker in “The Language Instinct” claims that there is strong psychological evidence for the existence of a sharp age cutoff for the ability to acquire a flawless foreign accent (I may dig up the exact reference, if needed). In other words, there is a fairly narrow age threshold (around 20-odd years) below which exposure to a foreign accent is more or less sufficient to acquire it and above which it’s physiologically impossible. Has this phenomenon been confirmed or refuted by any systematic study?

Answer accepted (score 8)

One of the reasons the feral children data are difficult to assess is that the brains of these children are often underdeveloped or have developed differently because they weren’t stimulated with language at a young age. (For a more scientific explanation of the how of this, see Curtiss et al 544-545+.) As I mentioned in my comment to Askalon, while that data supports the idea that there is a critical period for acquiring a faculty for language, it doesn’t say much about what factors influence how successful you will be in acquiring native-like competence in a second language. Although age is often speculated to be a factor in acquiring native competence, estimates on the cutoff have ranged from as low as 3-4 to as high as puberty.

Infant Studies of Phoneme Perception
An important question to keep in mind is: why should skill in acquiring a new phonological system decrease with age? Most cognitive abilities get remarkably better with age, assuming typical development. Of course, there are obvious factors like motivation (generally not as high for L2 learners), phonological interference, or a different method of acquisition (generally analytical study and reflection rather than immersion for L2 learners), but the leading current theory seems to be that, while young infants are like a phonological tabula rasa, after exposure to their native language they narrow their focus to pay attention only to the contrasts which differentiate meaning – what are traditionally called phonemes. They lose the ability to readily learn to produce and differentiate arbitrary sounds.

Even very young children show decreased sensitivity to contrasts not present in their native language. For example, Hindi has many contrasts that English doesn’t, for example differentiating aspirated and unaspirated stops like [pʰal] and [pal]. Werker and Tees (1984) trained infants to expect a reward (a little puppet show) when they heard aspirated syllables (this is called a conditioned head turn technique). Then, they tested them, playing aspirated and unaspirated stops and measuring how often the infants turned at the appropriate times. If they failed to turn after hearing an aspirated stop, this was counted as a miss, and if they turned after hearing an unaspirated sound, this was counted as a false alarm. At 6-8 months, infants scored well – they had fewer than 2/10 misses or false alarms – while those in the 10-12 month group did remarkably poorly – approximately only 2/10 correct hits. I cannot find the citation, but I have heard vocalic categories are cemented even earlier. And keep in mind that this is before these infants are even babbling!
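For readers unfamiliar with how hit/false-alarm data of this kind are usually scored, here is a minimal sketch using the signal-detection sensitivity index d′. The rates below are hypothetical numbers in the spirit of the head-turn results described above, not Werker and Tees’s actual figures or analysis:

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Signal-detection sensitivity: z(hit rate) - z(false-alarm rate).
    Rates of exactly 0 or 1 are nudged inward, a common correction."""
    def clamp(p, eps=0.01):
        return min(max(p, eps), 1 - eps)
    z = NormalDist().inv_cdf
    return z(clamp(hit_rate)) - z(clamp(fa_rate))

# Hypothetical rates in the spirit of the two age groups above:
print(d_prime(0.8, 0.2))  # clearly above zero: good discrimination
print(d_prime(0.2, 0.2))  # near zero: chance-level performance
```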

Immigrant and Foreign Language Learner Studies
There have been studies looking at the acquisition of foreign language by people of varying ages, or by immigrants with varying age-of-arrival, and some (Johnson and Newport 1989) support the idea of a critical period and others (Snow and Hoefnagel-Höhle 1978) do not. It is important to note that these studies often look at both knowledge of grammar and pronunciation and I have unfortunately not found any studies specifically related to accent – though I will add them if I come across them. These studies generally rely on a battery of tests to assess grammatical competence (e.g. grammaticality judgments) and native speakers to rate the degree of the subjects’ accent. In general, it seems that a critical period for phonology is much more supported than a critical period for grammar – acquisition of grammar is much more linked to cognitive ability in adults, although in young children, cognitive ability has no effect (e.g. Flege et al 1999, DeKeyser 2000).

Conclusion
So yes, there does seem to be evidence for a cutoff in acquiring native-like phonology, from immigrant studies and SLA studies backed up by infant perception studies. What I would take from this, though, is that in addition to lost competence in implicit learning mechanisms, when we already have one language to work with, we will rely more on overt analysis and comparison to learn a foreign language. And while grammars are very good at detailing correct syntax and morphology, they are often abysmally poor and lazy at instructing learners in pronouncing and distinguishing foreign phonemes. (This is because they are often not written by linguists.) Add to that the fact that having an accent is not terribly stigmatized (but poor grammar is) and it’s no surprise that L2 learners will often not develop a great accent.

Answer 2 (score 7)

Native “accent acquisition” means successful acquisition of the language’s phonology (i.e. the system of sounds in a language). The cutoff you’re referring to is the critical period hypothesis, which claims that there’s a critical period (from birth up to around 7-ish or puberty, give or take) during which a person is capable of acquiring a language and achieving native-like competence, but after this critical period it becomes very difficult or impossible to achieve fully native-like competence. It’s thought that there isn’t one single cutoff point though, but rather that it varies depending on what part of language you’re talking about: phonology is thought to have the earliest cutoff age, while syntax is thought to have the latest (it’s much more common to see non-native speakers with excellent syntax but errors in their phonology than vice versa). Phonology is therefore the place where unsuccessful native-like acquisition most often manifests itself, in the form of a non-native accent.

So to get to your question about evidence for this. There is some evidence, but it’s limited, and some of the proposed evidence has been criticized as not valid. There are some cases of feral or deaf children/adults who grew up without a language (having passed at least part of the critical period before they were discovered). These people are often used as evidence that the difficulty with successfully acquiring a language (including phonology) is biologically linked to age. It’s rather clear that normal adults have trouble acquiring a new language and gaining native-like competence as well, but one could claim that that’s perhaps because of interference caused by the language they already know. Feral or non-lingual deaf people, though, don’t know any language (so there’s clearly no interference from a native language), so some linguists claim that their lack of or limited success in acquiring a language must be linked to their age.

A frequently mentioned case is that of Genie, a girl that was brought up in isolation by abusive parents and never learned a language as a child. She was found at age 13, after which a linguist worked with her and she started to learn English. She was never able to learn English beyond a rudimentary level. Studies with deaf children who started learning a sign language later in their childhood (and having never learned the spoken language of their parents) have also shown a correlation with age and the level of their success in their acquisition of the language.

Some of this evidence has been criticized. For example, feral children such as Genie have also obviously suffered traumatic experiences that have affected them psychologically, which could be a factor in their unsuccessful language acquisition. Cases of late learners of sign language provide better evidence because they were generally well cared for, despite lacking a language.

Answer 3 (score 2)

I believe that the existence of a cut-off period is impossible to measure due to the large number of factors affecting language acquisition. As an English language teacher and a person who has very near native competence in a foreign language, I believe that older people are less likely to achieve native competence in pronunciation because to mimic a native accent perfectly would, in the majority of cases, necessitate presenting oneself as having a different identity from the one you were raised with.

Children are less concerned about changing their identity than adults are and will more willingly mimic a native accent.

That is why they acquire native pronunciation much more efficiently - and the same is true for adults who are not shy or ashamed to assume a different identity: they also master foreign accents.

75: When does copula absence occur in African-American Vernacular English? (score 11399 in 2011)

Question

In what contexts can the zero copula occur in African-American Vernacular English? What rules govern its use—for example, what makes she runnin' more likely to be acceptable than ?she a runner? Some of what I know:

  • Some (I think Labov, notably) have proposed that copula absence in AAVE occurs where SAE would allow contraction, but there are counterexamples (*“How old you think his baby’s”, “How old you think his baby is”, “how old you think his baby”), so this does not fully explain the context
  • As with copula absence in other languages like Russian, AAVE zero copula depends on tense (copula absence usually occurs in present tense: *she home yesterday, she home today)
  • I’ve read that copula absence is much more common if a VP follows than a NP (perhaps Labov, again); a paper on AAVE origins mentions this, that zero copula is:

    most frequent with a gon(na) future or a progressive (she Ø gon tell him; she Ø walking), least frequent before a noun phrase (he Ø a man)…
    The general pattern for AAVE is given in (1), with predicate types listed according to increasing rates of copula absence:
    (1) NP < Loc < Adj < V-ing < gonna

  • Copula absence may not occur for habitual actions because habitual be, replacement of is with be, may occur. Habitual be is used for things that are ongoing or usual as opposed to a moment in the present (Cookie Monster be eating cookies, even if he isn’t at this moment).

Are there other predictions we can make about when copula absence can occur in AAVE? Does any cross-linguistic approach to copula constructions provide clues?

Answer accepted (score 12)

Apart from the “present tense” rule you mentioned, I found this page where you can see a rather good explanation about AAVE and its rules. I’ll paste the part regarding the copula (as usual the * indicates something ungrammatical):

The AAVE copula follows complex rules:

  1. (Rule 1) Stressed copula is never omitted:

    • “There already IS one!” | *“There already one!” | *“There already BE one!”
  2. (Rule 2) Copula (like other auxiliary verbs) is never omitted at the end of a phrase:

    • “Couldn’t nobody say what color he is.” | *“Couldn’t nobody say what color he.”
  3. (Rule 3) The copula in the special AAVE remote present perfect (not found in Standard English) is never omitted:

    • “I BEEN know that guy” (this means ‘I’ve known that guy for a long time’)
  4. (Rule 4) The copula is never omitted when negated:

    • “I ain’t no fool” | *“I no fool” | *“I be no fool”
  5. (Rule 5) The copula is never omitted in its infinitival form “be”:

    • “You got to be strong” | *“You got to strong”
    • “Be nice to him!” | *“Nice to him!” 1
  6. (Rule 6) The copula in the special AAVE habitual aspect (not found in Standard English) is never omitted:

    • “He be singing” (this means ‘he habitually sings but he is not necessarily singing right now’)
  7. (Rule 7) The copula is never omitted in the past tense:

    • “I was cool”
  8. (Rule 8) The first singular form of the copula (“am” or “`m” as in Standard English) is never omitted:

    • “I’m all right” | *“I all right” | *“I be all right”
  9. (Rule 9) The copula is never omitted in a confirmatory tag question:

    • “I don’t think you ready, are you?” | *“I don’t think you ready, you?” | *“I don’t think you ready, be you?”

You can also check the Grammar section in this page, or these documents: “Copula Variation in African American Vernacular English” and “African American vernacular English: features, evolution, educational implications”.


1: As noted in the comments, this is not an infinitive, but an imperative. However, I guess the author was referring to the form and word “be” in general and not to the particular tense.
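As a summary of the nine rules, one could sketch them as a set of categorical blocking conditions on omission; the feature names below are illustrative inventions, not standard notation:

```python
# Hypothetical feature labels for contexts where AAVE copula omission is
# categorically blocked, following the numbered rules above.
BLOCKING_CONDITIONS = {
    "stressed",        # Rule 1
    "clause_final",    # Rule 2
    "remote_perfect",  # Rule 3 ("BEEN")
    "negated",         # Rule 4
    "infinitival",     # Rule 5
    "habitual",        # Rule 6 (habitual "be")
    "past_tense",      # Rule 7
    "first_singular",  # Rule 8 ("I'm")
    "tag_question",    # Rule 9
}

def omission_possible(features):
    """Copula omission is only a candidate when none of the
    categorical blocking conditions applies to the clause."""
    return not (set(features) & BLOCKING_CONDITIONS)

print(omission_possible({"present", "third_singular"}))  # e.g. "she runnin'"
print(omission_possible({"negated"}))                    # "I ain't no fool"
```

Of course this only captures the categorical blocks; within the remaining contexts, omission is variable rather than obligatory.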

76: dear, ear, fear, gear, hear, near … why are bear/pear pronounced differently? (score 11381 in 2017)

Question

In class last week we were looking at pronunciation … and something caught me out. Why are some words spelt very similar to multiple others, yet pronounced so differently?

Is it because of their origin, or because they are different types of thing (pear and bear are living, the rest aren’t), or something else?

Are there any strange rules I could learn to help correctly pronounce a new word?

Answer accepted (score 15)

This question isn’t just about spelling, because when these spellings were standardised, it is highly likely that all these words ending with “-ear” were pronounced in the same way. However, gradually between the 15th and 17th centuries as standardisation was setting in, the Great Vowel Shift occurred, changing the pronunciation of the vast majority of the vowels in English.

This group rhyming with “ear” is usually called the NEAR lexical set in modern English, and was pronounced /eːr/ in early modern English; in Received Pronunciation this is usually /ɪə̯/, often smoothed to [ɪː]; in the General American standard, it’s usually /iɹ/.

A set which was distinct but similar in early modern English was that pronounced with /ɛːr/, as exemplified by the spelling “-are” as in the lexical set SQUARE. This is /ɛə̯/ in Received Pronunciation (smoothed to [ɛː]), whilst in General American it is /ɛɹ/.

If this is what time did to sounds of NEAR, why were words like bear, pear, swear, wear, as well as werewolf, ere, there, where (compare with the spellings of here, sphere and mere) so different, ending up joining the SQUARE set?

In fact, something analogous happened to great, break and steak. Were they to follow the regular pattern set down by eat and meat, they would have merged into the same set as meet /eː/ by the end of the 16th century, and then changed into /iː/, modern lexical set FLEECE by the end of the Great Vowel Shift. Instead, they only went as far as /ɛː/ by the end of 16th century, then went to /eː/ and then to modern /ei/, lexical set FACE.

These are all exceptions to the Great Vowel Shift. Notice that the exceptions represented by bear and by great appear to be closer to their older pronunciations than their lookalikes near and meat: the former pair resisted the Great Vowel Shift to some degree. But why and how they were able to resist it are questions as yet unanswered. On a practical level, then, these should be seen as exceptions and memorised (one particularly relevant case is the dual pronunciation of tear, which splits according to meaning).

One paper from 1962 complains specifically about how divergent the pronunciation of the spelling ea is in modern English, despite the fact that most of the words with this pronunciation are of Anglo-Saxon origin.

Answer 2 (score 2)

Michaelyus has given a good answer to the “why” part of your question. To reply only to the last part of the question: there are not any “strange” rules you could learn to help you to pronounce a new word. There are not even any non-strange rules. You do have to look up new words in a dictionary. Some things in life require effort.

77: What are the fundamental differences between Natural Language Processing and Computational Linguistics? (score 11175 in )

Question

I have a vague knowledge regarding those two fields, but I admit there are some fundamental concepts that I lack.

So, if we had to write down the actual differences between these two fields, what would they be?

I’ll suggest some points that I think the answers should cover for successfully describing each field in a complete and comprehensive way:

  • What it does and what it is about (also what it’s not);
  • Common misconceptions of the field (with consequent debunking);
  • Aims/objective of the field;
  • Tools/instruments and methods adopted by the field;
  • Subfields of each field (if any);
  • Any other points I might have forgotten.

Answer accepted (score 25)

I have a PhD in computational linguistics. I can tell you that NLP and CL are not two separate fields. Rather, CL is the superset that encompasses NLP.

In everyday CL practice, NLP focuses on the building of NL parsers and as such is central to the CL field. CL as a field includes a lot more than NLP: for instance, you can study machine translation, knowledge representation, ontology engineering, text mining, information extraction, etc., all within the CL field. CL is a pretty broad thing and (unlike CS) is not primarily focused on theory; it is highly hands-on. Most theories in CL come from theoretical CS. When it comes to the nitty-gritty, CL is the practical application of various algorithms for purposes of natural language processing.

You may occasionally encounter a reference to NLP (sans CL) within the field of CS. This is due to the fact that, originally, the generation of parsers served purposes beyond the confines of natural language (the way we mean “natural language” within CL). So, one could argue that NLP within CS is a slightly different animal than NLP within CL. In essence, it’s the same kind of object seen under slightly different light.

Answer 2 (score 16)

The above answers are all good. I’d like to offer another perspective that I learned while teaching digital libraries that draws on the analogy used in biology:

Computational biology = the study of biology using computational techniques. The goal is to learn new biology, knowledge about living systems. It is about science.

Bioinformatics = the creation of tools (algorithms, databases) that solve problems. The goal is to build useful tools that work on biological data. It is about engineering.

To make the analogy for any field X, we thus have “Computational X” and “X-omatics”. In NLP/CL, NLP is the equivalent of “Linguamatics”.

I don’t really subscribe to the notion that CL encompasses NLP or vice versa. They both have a purpose. CL studies human language to computationally understand how we as humans have the capacity to produce and understand language. NLP takes a more pragmatic perspective and says that we wish to build systems that facilitate some language interface.

Answer 3 (score 12)

The difference is that Computational Linguistics tends more towards Linguistics, and answers linguistic questions using computational tools. Natural Language Processing involves applications that process language and tends more towards Computer Science.

However, the distinction between the two terms is fading and they are being used more and more interchangeably.

78: always | never | “all the time” - what kind of words are these? (score 11078 in 2016)

Question

always never “all the time”

They aren’t ‘expletives’, but they express a non-expiry. What word would describe this type of word?

Context : he never brings me flowers; he’s always late; you criticise me ‘all the time’.

Answer accepted (score 5)

I would say those are temporal adverbs.

To be precise, all the time cannot be an adverb, since “adverb” is a word class but this phrase is not a single word. And you asked about the “kinds of words” (i.e. word classes), but you cannot assign a word class to a phrase.
Syntactically, this phrase, just like always or never (which are syntactically single-word phrases), functions as an adverbial, a phrase which modifies a verb or clause - but here we are at a different level, because this is a grammatical function, not a word class.

As you said, they are not expletives, because they are an inherent part of the sentence’s meaning.

Edit: You mentioned non-expiry - I don’t know of any specific term for that.
The classification among temporal adverbs is rather between time point/span - duration - frequency, possibly others, but apparently not whether the adverb expresses expiry, non-expiry, having expired (i.e. past events) or whatever.
I assume the reason is that such a classification wouldn’t be very meaningful, as there are types of temporal adverbs, such as those expressing frequency, that could not be classified w.r.t. expiry at all - take the word sometimes: this is neither something that expires, nor something that doesn’t expire, nor something that has already expired. So I don’t think there is an actual terminological classification w.r.t. expiry, because it would only be applicable to a rather small set of adverbs.

Answer 2 (score 3)

Always and never are adverbs of frequency. The phrase all the time is a noun phrase.

We use all three of these items as (temporal) adjuncts, a term which refers to their syntactic function, in other words what job they are doing in the sentence rather than what word or phrase category they are.

In terms of semantics and their being ‘non-expiry’ terms, I think one term that is sometimes used to denote this is that they represent unbounded periods of time.

Answer 3 (score 1)

I think your question is not really about linguistics, but about rhetoric, especially after this comment:

I added more context, which will make it more apparent that, ‘exaggerations’, was the classification I was looking for.

Syntactically, exaggerations can be almost any part of speech, any class of word or phrase, any syntactic position. Your examples include two adverbs and one noun phrase. They’re all used as adjuncts modifying the verb or verb phrase—but an idiot can come up with an example where the subject of the sentence is an exaggeration, as I just did.

Semantically, exaggerations can likewise come from almost any category. Your examples are all temporal/frequentive quantifiers, but again, that isn’t true for “an idiot” above.

But rhetorically, there is a good answer: All of your examples are cases of hyperbole: the use of exaggeration as an intensifier.

What you literally mean is that the friend you’re discussing is often late. But you also want to get across the fact that you have strong feelings about him being often late. Saying that he’s “always” late when that isn’t literally true is one good way to do that; your listener will understand that you’re complaining about your friend’s habitual tardiness rather than just stating it as a fact.

To see how hyperbole works as an intensifier, compare “He’s late so damn OFTEN”. Adding “damn”, stressing “often”, and focus postposing are all intensifiers, and the rhetorical effect is similar to “He’s always late.”

And when I say “an idiot can come up with an example”, I don’t literally mean that; only a normally-intelligent native speaker with at least a bit of relevant education could do so. But using hyperbole intensifies the point: it’s not even close to true that only temporal-adverb-like adjuncts can be exaggerations.1


1. But, as with all rhetorical devices, there’s a risk of being misconstrued. The exact same exaggeration could be used to imply that you should have figured this out for yourself and not asked this question. I obviously didn’t intend that at all - if I didn’t think you had a good question that deserved an answer, I wouldn’t have written one - but without that context, you might not have had any way of guessing that.

79: Why did early Indo-European languages seem to be morphologically complex? (score 11063 in 2017)

Question

Apparently there is a general trend that languages lose morphological marking over time. For example, according to this question PIE had 8 noun cases (nominative, accusative, genitive, etc), Latin 5, Romance 2 or even 1. Doesn’t this show that less inflection is more natural?

My question is why the early languages had so many morphological distinctions in the first place. I mean, why did the old Proto-Indo-European folks invent 8 different words to call “a lion”, or dozens of words for “to eat”, etc.? To me, it sounds more natural to call a lion a lion regardless of whether it’s nominative or accusative. What analyses have been made about this?

Answer accepted (score 40)

Really, it depends a lot on the region of the world you’re from.

The isolating nature of the English language and Chinese, and, to a slightly lesser extent, the rest of the Germanic languages, as well as the Romance languages, is certainly not a norm worldwide. Living languages, in fact, on average incorporate about as much information into their words as old Indo-European languages did. Check out polysynthetic languages to see extreme examples of this.

Even so, old Indo-European languages do seem to have rather complex rules of noun inflection and verb conjugation, incorporating many seemingly different lexical forms, which perhaps makes them in a real sense “harder” to learn than agglutinative languages. I’ll try to address why by pasting together some hypotheses and some facts.

  1. PIE may have resulted largely from the influence of a Uralic superstrate on a NW or NE Caucasian or related language, or from the influence of both on a third language. Source.

  2. PIE quickly spread far and wide, probably multiple times, leading to

    a. a premium on a modicum of mutual intelligibility for trade, concomitant with

    b. a bunch of L2 (second-language) speakers exhibiting substrate pressure in the form of phonetic changes (e.g., loss of the laryngeals) and, more importantly, grammatical changes.

  3. At the same time, there is possibly a natural cycle of isolating -> agglutinative -> inflecting -> isolating in languages. (Every step of this cycle has been observed, although the entire cycle has never been observed.) PIE was past an agglutinative stage and fit quite squarely within the “inflecting” paradigm when it spread.

I’ll mention one, perhaps rather conservative, deduction that can be drawn from this: verbal paradigms which once made sense (3) started making less sense (2b) but were kept (2a). Hence, we have to learn the principal parts of a verb when learning Greek or Latin. Most contemporaries of Proto-Indo-European probably had even richer inflecting/agglutinating structure, but probably fewer irregularities.

So many other conjectures can be formed, so I’ll leave the rest of the deductions to you, since it’s really hard to say that this results in any one particular effect rather than another.

Answer 3 (score 15)

This puzzling mystery is completely resolved when you realize that recursive embedded grammar is a feature that is not present in ancient languages, and appears only well after the evolution of writing. When you need to handle recursion, the case systems and complex morphology of pre-literate language become unnatural.

All modern fully embedded grammars are essentially the same: they are described by a context-free replacement grammar which allows adjectives, adverbs, and verb arguments to be replaced by multi-word phrases which serve the same role. The reason is not that this is a fundamental defining property of language; the reason is that the qualitative ideas behind context-free grammars were invented in Greek and Roman times, and Cicero and Aristotle explicitly and prescriptively advocated writing this way.

This type of embedded recursive grammar is extremely successful at producing convenient written expressions of complex ideas in a short, but not unduly taxing, form. Due to its convenience, all of the old-world languages adopted the recursive grammar of Cicero et al., one by one, as they acquired bilingual speakers of European languages and translations of European recursive works. Once you have multiple embedding, it is very difficult to stop doing it, and you can easily invent a way to do it in any language.

This is the reason that in modern European languages, and in those of India, Asia, or Africa, recursive clausal embedding works in almost exactly the same way, a way described well by a context-free grammar, with potentially unlimited center, initial, and final embedding. This is like a virus, spreading via bilingual speakers, and only languages which were isolated from Europe by oceans were immune.

Thankfully, a few languages maintained their non-embedded form, due to cultural isolation, most notably Pirahã (which has no center embedding, as described in the revolutionary work of Everett) and Warlpiri (which has no full recursive grammar either). Native American languages as a rule did not have a full context-free structure, and neither did ancient Sanskrit, ancient Chinese, ancient Hebrew, or any ancient language other than (remarkably) ancient Greek and Latin.

This idea is explicitly described and argued by Fred Karlsson in “Constraints on multiple center-embedding of clauses”.

Cicero’s Remarkable Invention

Embedding with context-free recursive structure became so ubiquitous, that every literate person learns this structure before adolescence, and forgets that it did not come naturally. This structure was invented, not discovered, and it was invented by structure-conscious writers in Greek and Roman times. It spread by emulation to other languages, sometimes by conscious effort of literary folks to popularize this form of expression.

This means that scholars, who for obvious reasons tend to be the most highly literate members of society, all see that every language that they learn has a roughly isomorphic recursive grammar that describes how to produce complex sentences, a grammar which is fundamentally based on a context-free replacement generative grammar. This comes as a shock: it is a jarring realization which begs for an explanation.

I am a native Hebrew speaker, and I remember learning English as a child. I remember that I was miserable for a while, because everything was new. When I finally learned enough vocabulary to make complex sentences, I was immediately struck by the fact that these sentences, unlike simple constructions, are word-for-word identical to Hebrew complex sentences. I didn’t have to learn anything more! I knew immediately how to produce any complex sentence without effort.

It is the same when you learn a new computer language. After learning a few function words and idioms, the structure of the complex expressions is immediately apparent, if you already know another programming language. The reason is that computer languages are all based on the notion of context-free grammars, explicitly abstracted from natural language by Chomsky and Schutzenberger. In the 1950s, Noam Chomsky gave a definition of a language grammar which made the embedding structure the primary ingredient. A language grammar is context-free when it allows arbitrarily deep center-embedding, and Chomsky hypothesized that all the world’s languages are described by context-free grammars because the original human language was described by a context-free grammar.
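The kind of context-free center-embedding discussed here can be made concrete with a toy example (my own illustrative sketch, not from the answer; the grammar and words are invented): a single recursive rewrite rule is enough to generate arbitrarily deep embedding.

```python
# Toy context-free grammar with one recursive rule:
#   NP -> "the rat"
#   NP -> "the cat that" NP "chased"
# Each application of the second rule center-embeds the previous
# NP inside a relative clause.

def np(depth):
    """Return a noun phrase with `depth` levels of center-embedding."""
    if depth == 0:
        return "the rat"
    return f"the cat that {np(depth - 1)} chased"

print(np(0))  # the rat
print(np(1))  # the cat that the rat chased
print(np(2))  # the cat that the cat that the rat chased chased
```

This is exactly the structure that programming-language grammars share: the recursive case treats the embedded phrase as a single unit, so the depth is unbounded in principle even though the rule itself is finite.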

This is true of all old-world languages, and without a historical appreciation, just by looking at the structure of the languages, one can mistakenly come to believe that this structure is very ancient, and the common source is in prehistoric times. This fallacy is so compelling, that it was unchallenged dogma until Everett’s work of 2005.

A Language evolution fallacy

If you see that all birds share a hole in the hip-bone, and all dinosaurs do, you are justified in concluding that birds and dinosaurs have a common ancestor which had a hole in the hip-bone. The reason is Darwinian evolution: this was the main prediction of the theory. The characteristics of the common ancestor are preserved by all descendants, and if you see two species with a common trait, you can be pretty sure that it is because they evolved in the same family.

This explains why life-forms organize into a hierarchical cladistic tree. Languages also come in a cladistics-like tree, and this is because the transmission of language is much like the transmission of genes: it preserves certain word-sounds and structures in a diverging, evolving form.

But unlike in biological evolution, bilingual speakers can transmit nontrivial structure horizontally between very distantly related languages. So in languages you find creoles, which in biology would be like an oak-tree/lizard hybrid. You find languages like English, whose vocabulary is split almost 50/50 between Germanic and Latin roots, and which are clearly Germanic with enormous Latin influence. You find completely alien loan-words in English like “kimono” and “feng shui”, which come from some of the most distantly related languages in the world.

But most significantly, grammatical constructions are also shared. The fact that all languages recurse the same way suggests one of two things:

  1. The common ancestor of all languages recursed this way
  2. Recursion was invented at one spot, and spread horizontally.

Experience with Darwinian evolution suggests the first option, and this is Chomsky’s hypothesis. It’s dead wrong. The correct answer is number 2.

This means that every one of the world’s languages (except for Greek and Latin and their descendants) has a grammatical discontinuity, the moment when it became recursive. This is usually something you can see: it is a sharp revolutionary advance on the past, and it leads to a golden age of literature in the coming centuries.

Morphological pressures in pre-recursive and post-recursive languages

In pre-recursive languages, there is no fundamental reason to put the preposition marker before the word, and a very good reason to put it after— there is already a definite/indefinite marker before the word taking up space.

If you say “the mountain” in Hebrew, you say “ha-har”, which in syllable terms puts a syllable before the word. Now if you say “I walked to the mountain” (“halachti la-har”), you are putting two syllables before the word. In Hebrew, the two syllables are merged into one, “la”, which is much like “to the” becoming “t’a”, as in “I walked t’a mountain”. But ignore that.

A word has two ends, and it is much clearer to put the definite marker on one end, and the case-marker (the preposition) on the other end, so that they don’t have to fight. This makes best use of the phoneme space, and this is the preferred solution.

“I walked the-mountain-ward” (Halachti ha-har-a)

But this solution is the casing solution, and it interferes with embedding in a way described in the body of this question: Did case systems disappear to make embedding easier? When you replace the mountain by an embedded phrase, it puts a syllable in the middle of the embedded phrase in such a way that it is difficult to shear off.

This puts pressure on languages to shed case systems and other morphological transformations in favor of stand-alone function words with a common syntax, beginning at the date that recursive embedding becomes common among all speakers of the language.

Ancient Embedding styles

Just so that I am clear: all languages embed conceptually, they just don’t all embed grammatically. The concepts in a non-recursive language are not simpler than in a recursive language, they are just expressed more verbosely.

So there is no implication that speakers of Piraha are somehow less than human, or not fully capable of philosophizing, or anything like that. These ideas only come if you associate grammatical recursion with language, an association which is false.

80: Is there a language without gender in third person pronouns? (score 10975 in 2015)

Question

English (like most Indo-European languages) has a gender-neutral third person pronoun, it, but it is typically not used for people; if one wants to be gender-neutral, one is often stuck using he or she.

Is there a group of languages which make no gender distinction in third person pronouns and have no “gendered” pronouns?

That is, instead of saying “Talk to him.” they would instead say “Talk to that/it.” If they wanted to make a distinction between gender, they would need to define the subject/target rather than use a pronoun, e.g. “Talk to the man.”

Answer accepted (score 29)

The World Atlas of Language Structures has a feature about gender distinctions in personal pronouns. According to it, there are at least 254 languages without gender distinctions and even 2 with gender distinctions in 1st and 2nd, but not 3rd person pronouns (Iraqw and Burunge).

Answer 2 (score 22)

There are many such languages. Examples include Turkic languages (as kiyoshigaang’s answer mentions), Uralic languages (such as Finnish, Hungarian, Estonian), spoken Mandarin and Cantonese, and doubtless many others. Languages which lack grammatical gender will usually lack gendered third-person pronouns specifically (although there are exceptions to this, such as English).

Answer 3 (score 7)

Turkish doesn’t have gender in third person pronouns. For example, if one says “Onu, okulda gördüm.”, it can be interpreted either as “I saw her at school” or as “I saw him at school”.

81: What is markedness? (score 10968 in 2013)

Question

I am confused about the meaning of markedness. From the Wikipedia page I read:

The dominant term is known as the ‘unmarked’ term and the other, secondary one is the ‘marked’ term. In other words, it is the characterization of a “normal” linguistic unit (i.e. the unmarked term) compared to the unit’s possible “irregular” forms (i.e. the marked term).

I believe that “dominant” means more frequent, but is it dominant inside one language or considering all natural languages together?

Answer accepted (score 14)

An example, though probably not a very good one, is “lioness” vs. “lion”. “Lion” can refer to either male or female lions, whereas “lioness” refers to only female lions. In this example “lioness” is marked and “lion” is unmarked. This is because “lion” is the more general term.

Another example is “young” vs. “old”. Here “old” is less obviously unmarked, while “young” is marked. This is a better example. Normally, when I ask a question, I ask “How old is Jim?”, because “old” is the unmarked term. This is rather general. I do not appear to be making a presumption about Jim. Conversely, if I asked “How young is Jim?”, that presupposes that Jim is young. This is because “young” is the marked term.

So marked vs. unmarked means that two terms with contrasting meaning are asymmetrical in their usage and meaning, and that one of them is more general and dominant.

Of course markedness varies across languages; there are surely some in which “young” is unmarked.

Answer 2 (score 6)

In 2005, Martin Haspelmath published a paper called “Against Markedness (and what to replace it with)”, which distinguishes twelve different senses of marked and markedness and argues that this polysemy obfuscates rather than helps and should be eliminated. Unfortunately, he seems to have been a voice crying in the wilderness so far.

The twelve senses:

  1. Trubetskoyan specification for a phonological distinction
  2. Specification for a semantic distinction
  3. Overt coding as opposed to zero
  4. Phonetic difficulty
  5. Morphological difficulty / unnaturalness
  6. Conceptual difficulty
  7. Rarity in texts
  8. Rarity in the world
  9. Restricted distribution
  10. Deviation from a default parameter setting
  11. and a multidimensional correlation between any or all of these.

Answer 3 (score 3)

There is a forthcoming volume edited by B. Samuels entitled Beyond markedness in formal phonology which addresses the question. Basically, this is a case where the term is taken to be primary, and the referent is taken to be “open to discovery” – which is a nice way of saying that it doesn’t have a fixed meaning and, depending on the school of thought, refers to unrelated facts. Trubetzkoy imported the concept of a “mark” into linguistics, and used it to refer to a distinguishing property that a linguistic unit has. In phonology, it was intimately tied to privative features, where a voiced consonant might “have the mark” and voiceless ones might lack it. Subsequent developments retained the term but radically altered what it was about. This was especially necessary when Trubetzkoy’s privative analysis was replaced with Jakobsonian binary features, whereby all oppositions become equipollent. The nature of being “marked” then had to change, and it changed in the direction of “more basic” versus “less basic”, with a presumed acquisitional bias in favor of the “more basic” value.

Greenberg explicitly tied the notion of “marked” to frequency of occurrence, and Chomsky and Halle in SPE followed up on this assumption by creating a formal theory of “markedness” whereby rules were simpler to express if they produced unmarked results; consequently, the thinking was, unmarked outputs will be more frequent. Generally speaking, since that time the term has been taken to mean “happens most often”, and the puzzle then is: what is the nature of the fact that causes something to be more frequent (i.e. “unmarked”)? SPE held that it was a list of context-sensitive specifications that come “for free”. More recently, “markedness” has been taken in OT to mean “a configuration that is to be avoided”.

It is of some note, IMO, that there is negligible interaction between the concept of markedness in phonology and its use in semantics.

82: What is the difference between a diphthong and a glide? (score 10697 in 2011)

Question

It’s easy for me to imagine the difference, but hard for me to conceptualize it. I guess one involves two vowels and the other involves a consonant, right? Am I on the right track, or is there a more precise definition?

Answer accepted (score 5)

If a language has a sequence of two vocoids, and one of them is high, there will be nothing in the signal to tell you whether you are dealing with a vowel+glide sequence or a vowel+vowel sequence. Three common sources of supplementary information (in decreasing order of empirical weight) are:

  1. Cases where there is a lexical contrast between the vowel+vowel and the vowel+glide sequence. In Vietnamese, for example, there are lexical contrasts where the relevant difference is /-ai/ vs. /-aj/, or /-ou/ vs. /-ow/.

  2. Morpho-phonological alternations or selectional restrictions that are sensitive to a vowel-consonant distinction. Two examples are given in Dixon (2010:196–9).

  3. Phonotactic considerations. Is a statement of the overall phonotactic patterns of the language simplified by choosing one analysis over the other?

There will also be cases where no evidence is available. In these cases it will have to be admitted that there are no language-internal grounds for classifying the pattern in question.

Dixon, R. M. W. (2010). Basic Linguistic Theory, vol. 1. OUP.

Answer 2 (score 4)

The simplest distinction is

  • a glide is a single phoneme that is somewhere in the middle of the continuum between consonant and vowel, but is non-syllabic (by itself).

  • a diphthong is a sequence of two vowels, where one of them is often articulated just like a glide. It is questionable whether this two-vowel combination is a single phoneme or two separate phonemes in sequence, but there are two ‘things’, one of which is glide-like and the other a full vowel.

I hesitate to call the glide-like phone in a diphthong a full-fledged glide only out of tradition, even though they are seemingly identical articulations.

The primary difference then is that a diphthong includes a glide as one of two constituent parts.

Answer 3 (score 3)

Rather than a “vowel + consonant”, a glide is a semivowel (the terms are synonymous). I posted about them once here, for the question “Are there semivowels besides /w/ and /j/ and which are most common?”. You can read more in the links.

A diphthong is obtained by the combination of two vowels occurring in the same syllable. This combination varies according to the language and the rules that belong to the language itself.

  • In Spanish for example, a diphthong is obtained by joining a closed vowel (i, u) with an open vowel (a, e, o).
  • In Italian it’s obtained by joining the vowels (i, u) in unstressed position to other vowels in stressed position, or by joining the same vowels, (/i/ and /u/ together), where one of them — in this case — can be in the stressed position.

And so on… You can read about more languages by clicking on the link.

83: Why are many ancient languages so complicated compared to many modern languages? (score 10684 in 2014)

Question

Many ancient languages have a structure that is more complex than that of the “respective” modern languages. Modern languages like English have a simpler structure, without case, gender, or declension, compared to ancient languages spoken in the same area, such as Latin. Reading Latin texts, I wonder if the ordinary Roman really communicated using such complex constructions. Or perhaps everyday language was simpler than the written one?

The same is true for ancient Greek and Sanskrit. So why have these languages become simpler over time? Were the constructions we find in ancient written texts usually used in spoken language?

Answer accepted (score 16)

No language is “more simple” than other languages. Old English had just 2 tenses, present and past; now there are 16 of them, as future and future-in-the-past forms developed over time, the continuous aspect appeared, and the perfect appeared, so the verbal system acquired many more forms than it used to have. On the other hand, the nouns lost gender and cases. It is always like that: if something is lost, some new features appear to compensate for the loss.

A good example of a language that gets more and more complicated over the course of time is Chinese. Old Chinese had no parts of speech, no number, no tense; it was a monosyllabic isolating language. Now Chinese is developing in the direction of getting more complicated: its words are mostly two-syllable now, parts of speech have appeared in it, tenses are beginning to appear, etc.

And some languages can become simpler during some period, and then again get more complicated. Hindi is like that: first it lost all the cases which were in Sanskrit, but later it developed a new system of cases.

Answer 2 (score 9)

Many modern European languages are as complex as Latin, Ancient Greek, or Sanskrit. I’d point out Lithuanian, but most Slavic languages are typologically similar to the mentioned ancient ones. And yes, native speakers use all the constructions their language provides (all languages change, of course, so there are archaic constructions, but that has nothing to do with complexity).

BTW, no human language is primitive; English has simpler morphology than Latin, but it’s more complex elsewhere.

Answer 3 (score 4)

We don’t know why Latin, ancient Greek, and Sanskrit had the grammatical systems that they did, or why modern languages related to these have developed different grammatical systems.

It’s difficult to measure the overall “complexity” of a language. The other answers give some idea of how people have tried to address this question, but as far as I know there is still no consensus among linguists about whether we can meaningfully compare the complexity of different attested natural human languages. Some people have argued that all human languages are of similar complexity; others have argued that some languages, such as pidgins and creoles, actually are less complicated overall than other languages.

As other people have pointed out, complexity is not only a matter of conjugations and declensions; we can also see complexity in areas like syntax or vocabulary, which might be harder to measure. However, the idea that languages that are simple inflectionally “compensate” by having more complex syntax is as far as I know not particularly well-supported, either theoretically or empirically. (If anyone does know of a compelling argument for this hypothesis, please let me know—I am certainly not an expert.)

As Atamiri pointed out, there are languages spoken in the present day that have systems of inflection that look as complicated, or more so, than the ancient Latin systems of inflection. So it does not seem to be impossible for ordinary people to speak inflectionally complex languages in the present day. This suggests that it is also not impossible that ordinary Latin speakers in the past used without difficulty some of the constructions that seem “complex” to Latin learners today. (We know that there were differences between refined and everyday Latin, but the things in refined Classical Latin that native Latin speakers had trouble with aren’t necessarily the same as the things that modern non-native learners of Latin have trouble with.)

There is no obvious universal (by which I mean, worldwide and exceptionless) trend of languages developing simpler inflectional systems, although there might be some kind of non-obvious or non-universal trends along these lines. I don’t think we have any strong and well-supported explanations of the possible causes of these possible trends. I think people fairly often point to things like contact between different languages or different varieties of a language, and incomplete acquisition as potential causes of “simplification” in the grammar of a language.

84: Why is “ß” not used in Swiss German? (score 10643 in 2018)

Question

What are some of the historical reasons why the orthographic symbol ß is not used in Swiss Standard German and “ss” is used instead?

Answer accepted (score 52)

It is because of the typewriter. A Swiss typewriter needed to support three languages: German, French, and Italian. Therefore there was no ß key on the Swiss typewriter. It also had only lowercase umlauts ä, ö, and ü. A picture of a Swiss typewriter can be seen here.

The lack of that key has led to a subsequent deprecation of the ß overall.

Answer 2 (score 26)

The Swiss government has an explanation on p. 18. One contributing factor is typography, namely the rise of the Antiqua typeface, which was claimed not to include ß. I cannot evaluate the truth of that claim for the relevant historical period, i.e. prior to 1901. It is certainly the case that its shape in Antiqua was not uniform.

The rules for using the letter have been complicated and much of the 1996 German spelling reform was about rules for s. As to why Switzerland was earlier and more radical in eliminating ß, this may be a cultural matter. Pairs like Flosse (fin), Floße (rafts), Buße (penance), Busse (buses) are rare and contextually not likely to lead to confusion. One predicts that Masse (mass), Maße (dimensions) might still be distinguished with ss/ß.

85: What are the chief advantages & disadvantages of describing sentences with dependency vs. phrase structure trees? (score 10523 in 2014)

Question

What are the chief advantages & disadvantages of describing sentences with dependency vs. phrase structure (aka. constituency) trees?

From what I’ve read, dependency grammar trees lack phrase nodes and mark everything as dependent on the verb.

Phrase structure trees start with the highest constituent and then analyze it into phrases like NP, VP, etc., as I’m sure we all know.

Under what circumstances would it be preferable to use one type of tree diagram over the other?

Answer accepted (score 13)

Seeing this question reminded me of a section in Peter Matthews’ book ‘Syntax’ (1981) (it’s meant to be a textbook, but it’s more like a monograph really). In Chapter 4, pp. 84–93, there’s an explicit comparison and evaluation of dependency grammars vs. constituency grammars.

Matthews shows that for any d-grammar, there is a ps-grammar which will generate the same set of sentences, and vice versa - that is to say that d-grammars and ps-grammars are weakly equivalent.

Matthews goes on to discuss a different and more interesting way in which the two can be compared: Can an analysis in one framework be shown, in each instance, to be isomorphic to an analysis in the opposing framework? In other words, can we say everything using a d-grammar that we can using a ps-grammar, and vice versa? If the answer is yes, then the two are strongly equivalent (Chomsky 1963, ‘Formal Properties of Grammars’). As far as I understand it, strongly equivalent grammars are essentially notational variants.

In comparing d-grammars and ps-grammars, we shouldn’t necessarily pay too much heed to the variable simplicity of the notation - we can only use this as an argument if it’s already been established that the two are strongly equivalent.

Matthews pretty convincingly shows that there is no mechanical procedure by which d-trees can be derived from ps-trees, and no mechanical procedure by which ps-trees can be derived from d-trees, across the board. There are things of descriptive value that can be said in d-grammars that can’t be said in ps-grammars, and vice versa, and therefore they aren’t strongly equivalent. You could entertain the idea of combining the two representations (which is something like what we have in, e.g., modern minimalism), although this is something that Matthews ultimately rejects in favour of his own d-grammar, for interesting and unrelated reasons.

Dick Hudson is a proponent of ‘word grammar’, a modern theory in the d-grammar school, and he has some powerful arguments in his various papers (available here: http://www.phon.ucl.ac.uk/home/dick/wg.htm#main_ideas) that a d-grammar is to be preferred to a ps-grammar. Given that ps-grammars and d-grammars aren’t mere notational variants, it seems to me that one or the other will ultimately turn out to be a more accurate abstraction over speakers’ mental representations. Ultimately, that’s still an open question.
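To make the two representations concrete, here is a toy sketch (my own illustration, not from Matthews; the sentence and encodings are invented) of the same sentence in both frameworks. Note that the constituency tree carries extra phrasal nodes that the dependency tree lacks, which is one informal sense in which the two notations differ.

```python
# The sentence "dogs chase cats" in both representations (toy sketch).

# Phrase-structure (constituency) tree: words plus phrasal nodes
# (S, NP, VP, V), written as nested tuples (label, children...).
ps_tree = ("S",
           ("NP", "dogs"),
           ("VP",
            ("V", "chase"),
            ("NP", "cats")))

# Dependency tree: only the words themselves; each word points to
# its head, and the finite verb is the root.
dep_tree = {"chase": None,     # root
            "dogs": "chase",   # subject depends on the verb
            "cats": "chase"}   # object depends on the verb

def count_nodes(tree):
    """Count all nodes (category labels and words) in a nested tuple tree."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_nodes(child) for child in tree[1:])

print(count_nodes(ps_tree))  # 8: five category nodes plus three words
print(len(dep_tree))         # 3: just the words
```

Both structures describe the same string; whether the extra phrasal nodes encode something real about speakers’ representations is exactly the open question discussed above.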

Answer 3 (score 4)

I think there will be an increasing amount of debate about this in the next few years. The above answers from Tim Osborne and P. Elliot give a lot of general discussion about how to compare these two forms of grammar, but no comparison with respect to specific constructions. I provide a lot of this in my article in the proceedings from the second Depling conference held in Prague last August. Those interested in more details can see these proceedings at a website called Proceedings.com. It invites you to buy these proceedings for $95, but I think you can also read individual articles online for free. Mine is on pages 197-207.

My own answer is that DG is better, partly because it is simpler (many fewer nodes), but also because of the specific difficulties for constituent grammar discussed in that article. At the instigation of PElliot, I have found another source which is free. The link is: http://ufal.mff.cuni.cz/project/depling13/proceedings/pdf/W13-3722.pdf

More recent posts on this topic show that computational linguists have come to the same conclusion. Other things being equal, less is better!

Dan Maxwell

86: What’s the difference between recursion and embedding? (score 10507 in 2013)

Question

  1. Chains of relative clauses and strings of attributive adjectives are both examples of recursion, correct?

  2. Chains of relative clauses have each non-initial relative clause embedded within the previous one:

    [the cat [that killed the bird [that ate the rat [that ate the cheese]]]]
  3. As far as I know, in a string of attributive adjectives, all the adjectives modify the same noun(s); there’s no embedding:

    solitary, poor, nasty, brutish, and short lives
  4. I’m hoping that the experts on the list can give me a clearer idea of the difference between recursion and embedding. For example, is all embedding an example of recursion?

Answer accepted (score 6)

(my slightly incoherent ramblings on recursion, Merge, and embedding)

Recursion as self-embedding

In some generative theories of syntax, recursion is usually understood as self-embedding, in the sense of putting an object inside another of the same type (Fitch 2010, Kinsella 2010, Tallerman 2012). However, Tallerman 2012 argues that HFC 2002 used recursion in the sense of phrase-building or the formation of hierarchical structure generally (p. 451).

Recursion is not Merge

cf. Berwick 1998’s observation that recursive generative capacity is an inherent property of Merge (p. 332). They are both concatenative (or combinatorial) operations; however, Merge involves hierarchy.

Merge vs. iteration

Chomsky says that Merge is putting alpha and beta together. If you want to add gamma to it, then you add gamma to an object [alpha+beta]. The issue here is that for [alpha+beta] to be different from a mere sum of alpha and beta, some new property must arise; otherwise we’ll end up with iteration. What is this new property?

Nevins et al. argue that Merge is recursive because it can combine “lexical items and phrases” of any type (p. 367, n.11), cf. Zwart 2011 “[t]he operation Merge is standardly taken to be recursive in that the output of Merge may be subject to further operations Merge” (p. 116).

When alpha and beta are Merged, either alpha or beta projects its property (N, D, V, v, T, C etc.) onto this whole new phrase. This is what Hornstein 2010 calls Label (and Chomskyan Merge is Concatenate under his proposal; also notice that some generativists don’t do labels, like Chris Collins). However, this information about the head isn’t necessarily preserved through several instances of Merge, e.g. V selects a DP and we end up with a VP. What makes Merge truly recursive is that “the output of one derivation functions as a single item in the next derivation” (Zwart 2011, p. 116). I think this is very important; otherwise, Merging gamma with [alpha+beta] will lead to simple iteration. [alpha+beta] must be treated as a single unit when it is Merged with gamma.
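
Zwart’s point, that the output of one derivation must function as a single item in the next, can be sketched in a few lines of Python (a toy illustration only, with invented function names; no claim is made about any particular minimalist formalism):

```python
# Toy illustration: Merge returns a single new object that can itself feed
# the next Merge, while iteration merely extends a flat sequence.

def merge(a, b):
    # Output is ONE unit with internal structure.
    return (a, b)

def iterate(seq, item):
    # Output is just a longer sequence; no new unit is created.
    return seq + [item]

# Recursive structure-building: [alpha+beta] counts as a single item
# when it is Merged with gamma.
ab = merge("alpha", "beta")
gamma_ab = merge("gamma", ab)

# Iteration: alpha, beta and gamma end up on a par; no unit [alpha+beta].
flat = iterate(iterate(["alpha"], "beta"), "gamma")

print(gamma_ab)  # ('gamma', ('alpha', 'beta'))
print(flat)      # ['alpha', 'beta', 'gamma']
```

In the nested tuple there is a constituent corresponding to [alpha+beta]; in the flat list there is no such unit, which is exactly the difference between Merge-style recursion and mere iteration.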

Answer 3 (score 5)

Things are rendered a bit murky by the fact that the notion of ‘category’ began to get a bit fuzzy with the introduction of features into trees with Aspects of the Theory of Syntax (1965) and especially Remarks on Nominalization (1971), where NP, VP etc were treated as [+N, +‘double bar’], [+V, +‘double bar’], etc, and people were also treating case, etc as features on NPs. So how many features do we pay attention to? The minimalist doctrine says ‘ignore all of them’, but there could also be a ‘maximalist doctrine’ that says ‘pay attention to all of them’, in which case many instances of recursion would go away due to feature differences between the levels of the embedding. So for example ‘John’s picture of Mary’ would not be recursive if you supposed that ‘of’ assigned a kind of abstract genitive case to Mary that was not present on John.

In practice, most linguists (eg Carnie in his textbook) seem to have assumed that standard PS category features count for recursion and others don’t, but there is no satisfactory delineation of what the category features are, so the Minimalist view is probably the most coherent one out there.

87: Why did English lose declensions while German retained them? (score 10327 in 2019)

Question

Why did English lose declensions (or, more specifically, what caused it to) whilst they were retained in German? I ask as I have recently been reading about the various Germanic languages, and it struck me that Old English had an equally developed case system, if not a more developed one (Old English still retained an instrumental case, albeit largely only with prepositions), and yet it has completely vanished from modern English today.

In fact, both Middle English and Middle High German had a very similar case system (although both had undergone syncretism and simplification of their case endings), as is demonstrated by comparing the two tables of strong nouns:

Middle English:

┌───────┬────────┬────────┐
│       │singular│plural  │
├───────┼────────┼────────┤
│nom    │engel   │engles  │
├───────┼────────┼────────┤
│acc    │engel   │engles  │
├───────┼────────┼────────┤
│gen    │engles  │engle   │
├───────┼────────┼────────┤
│dat    │engle   │engle(n)│
└───────┴────────┴────────┘

Middle High German:

┌───┬────────┬──────┐
│   │singular│plural│
├───┼────────┼──────┤
│nom│tac     │tage  │
├───┼────────┼──────┤
│acc│tac     │tage  │
├───┼────────┼──────┤
│gen│tages   │tage  │
├───┼────────┼──────┤
│dat│tage    │tagen │
└───┴────────┴──────┘

However, by the Early Modern English period the English language had all but abandoned cases (retaining, at most, a form of the genitive), whereas Early New High German retained them much as German does today (most significantly in articles and adjectives).

I understand that in linguistics why is a difficult question but are there any sources on this or hypotheses?

Answer accepted (score 25)

“The loss and weakening of unstressed syllables at the ends of words… had disastrous effects on the inflectional system, since many endings now became identical.” — (Barber, 1993: 157)

There is no simple answer to the question, why exactly English has lost the majority of its inflections. Here’s just one idea:

  1. The articulatory stress came to fall on the first syllable;
  2. The various inflectional endings were uniformly reduced to schwa;
  3. The inflections subsequently fell out of use and were abandoned;
  4. The inflectional markers were replaced by more complex syntactic features of the language.

TL;DR
At the end of the Old English period (end of the 11th century), the word endings (containing inflectional markers) became less articulated:

  • Inflectional vowels such as -a, -e, -u, and -an were uniformly reduced (weakened) to -e (pronounced [ə], or schwa).
  • Word-final -n after -e was apparently lost in unstressed syllables. In the course of time, the remaining -e was abandoned as well.

For example, Middle English drinken (from OE drincan) became first of all drinke and then drink (ref. Baugh and Cable, 1993; Burrows and Turville-Petre, 1992).

The same source claims that the example word drink reached its final stage in the North by the 13th century, and that the change had spread to other regions by the 15th century.

The same effect is said to have occurred in most dialects of Old English.

David Crystal (1995: 32) suggests a possible explanation for why such reduction may have occurred. Through the evolution of the Germanic languages, the articulatory stress in most words came to fall on the first syllable. He suggests that such a stress pattern may have made the inflectional endings hard to hear in day-to-day conversation, especially the phonetically close -en, -on, and -an.

One should also consider one of the potential implications of Chomskyan linguistics, the suggestion that all human languages are equally complex. This would mean, in particular, that a language with a simpler morphology is more likely to have a more complicated and less straightforward syntax. On this view, English gradually developed a more complex syntactic structure to replace the obsolete inflectional endings.

The principle of constant complexity is widely disputed, however.


Other theories may well be worth further research, including:

Leith (1996) draws attention to the argument that the OE inflectional system was inefficient. For example, in the case of mann (man) and hand (hand) there is little distinction between the cases:

            Sing    Plural  |  Sing   Plural
Nominative  mann    menn    |  hand   handa
Accusative  mann    menn    |  hand   handa
Genitive    mannes  manna   |  handa  handa
Dative      menn    mannum  |  handa  handum

…hence, the entire inflectional system may have been abandoned due to its limited usefulness.

Others suggest that OE significantly reduced its inflections because it became a trade language after the Anglo-Saxons defeated a Norse invasion in 878, leading to its simplification — as suggested in “Adventure of English” by Melvin Bragg, 2004;

As the other answers suggest, pidgin and creole languages often appear to be simpler than “local” languages. David Crystal (see above) considered this factor important;

Answer 2 (score 11)

English has not lost its cases completely yet. The distinction between nominative, oblique case (result of the merger of accusative and dative) and genitive has survived in the personal pronouns, e.g. he / him / his. The general genitive has turned into the clitic ’s.

Dutch, which branched off from German relatively recently, has reached a similar stage. Not too long ago (less than a hundred years), Dutch still had cases. But nowadays it looks as if, in contrast to English, even the genitive is going to be lost completely.

There are clear indicators of case loss in German as well.

  • In some constructions traditionally requiring a genitive the dative is now more common, at least in colloquial speech (e.g. after wegen). Even for indicating a possessive relation the genitive sometimes just doesn’t sound right any more and is avoided.
  • In most German dialects dative and accusative have been fused into a single oblique case. A notable exception is that in some Alemannic dialects nominative and accusative have been fused instead. The city dialect of Berlin is famous for the ‘confusion’ of dative and accusative pronouns, which isn’t actually confusion: in that dialect the dative and accusative pronouns exist in parallel and have exactly the same meaning. Consequently dialect speakers select between them based on criteria that have nothing to do with the case system.

I agree with Carsten that language change appears to happen slower in Standard German in general, when you compare it to English and Dutch. Standard German is the result of a conscious effort of scholars to forge a common language for a huge dialect continuum at a time when the masses became literate because the salvation of every peasant appeared to depend on the correct choice between Rome, Luther, Calvin etc. As these scholars were guided by Latin, they picked variants from various dialects to strengthen distinctions that were already on their way out.

In the German-speaking area the approach to the standard language is more similar to that in France than to that in England or the Netherlands. It is not so much seen as something that emerges from how the language is used. It is more like a foreign language that dialect speakers have to learn at school. (Many German dialects are not really mutually intelligible with Standard German.) Consequently it is somewhat detached from the dialects and has its own separate dynamic. It is influenced by the corpus of existing literature as much as by modern colloquial speech, so naturally it moves rather slowly.

Answer 3 (score 8)

Well, maybe because Standard German is an artificial construct based on the Middle High German language of the Bible, and many Germans learn it as a sort of second language (after their native dialect). So it’s deemed to be pretty conservative. English speakers, on the other hand, seem to adopt language changes into their literary language faster than speakers of other languages. If you look at German dialects, some of them have lost cases almost as completely as English; for example, Low German has almost no cases, except for an oblique case in the masculine singular.

88: What is the relation between the words “Cossack” and “Kazakh”? (score 10275 in 2013)

Question

These two words in English would appear to refer to foreign peoples / cultures known to the Rus within recorded history.

The Russian Wikipedia pages indicate a surface similarity in spelling.

Are the etymologies (and, by extension, the current meanings) of these two terms related, i.e. do they refer to similar or related groups, or is this a false connection?

Answer accepted (score 16)

You should not confuse the two terms:

  1. The Cossacks are a group of predominantly East Slavic people who became known as members of democratic, semi-military and semi-naval communities, predominantly located in Ukraine and in Southern Russia.
  2. The Kazakhs are a Turkic people of Eastern Europe and the northern parts of Central Asia (largely Kazakhstan, but also found in parts of Uzbekistan, China, Russia and Mongolia).

On top of that, the Russian Empire recognized the ethnic difference between the two groups: it called the Kazakhs “Kyrgyz” to avoid confusion between the terms Kazakh and Cossack (both names originating from Turkic “free man”).

The noun qazğaq derives from the same root as the verb qazğan (“to obtain”, “to gain”). Therefore, qazğaq denotes a type of person who seeks profit and gain.

Vasmer’s etymological dictionary traces the name to an Old East Slavic козакъ, kozak, originally from Cuman Cosac, “a free man” (in the Latin translation of this word in the Codex Cumanicus) or “a freed man” (in the Arabic translation).

In a nutshell, they have the same meaning but each group refers to a specific area.

Source: Wikipedia

I hope my answer is of help. Thanks!

Answer 2 (score 3)

These are two completely different groups; only their names share the same etymology. It comes from ancient Turkic kazak, “free man, wanderer, rambler”.

Answer 3 (score 2)

I believe a single quote from Vasmer’s dictionary will answer the question:

Слово: казаґк,
Ближайшая этимология: аґ-, укр. козаґк, др.-русск. козакъ “работник, батрак”, впервые в грам. 1395 г.; см. Срезн. I, 1173 и сл. Из укр. заимств. польск. kоzаk “казак”. Ударение в форме мн. ч. казаґки – результат влияния польско-укр. формы; оренб. казаки говорят: казакиґ; см. Зеленин, РФВ, 56, 239. Заимств. из тур., крым.-тат., казах., кирг., тат., чагат. kаzаk “свободный, независимый человек, искатель приключений, бродяга” (Радлов 2, 364 и сл.); см. Бернекер 1, 496; Мi. ТЕl. I, 330. Сюда же казаґки мн., соврем. казаґхи – тюрк. народ. Этноним касоґг не родствен казаґк, вопреки Эльи (505).

A loose translation:

Word: kazak [qɑzɑq]
Nearest etymology: Ukrainian коза́к, Old Russian козакъ “worker, farmhand”, first attested in a charter of 1395; see Sreznevsky I, 1173 ff. […] Borrowed from Turkish, Crimean Tatar, Kazakh, Kyrgyz, Tatar, Chagatai kazak “a free, independent person, adventurer, tramp” (Radlov 2, 364 ff.); see Berneker 1, 496; Mi. TEl. I, 330. Here also belong казаки́ (plural), and the modern каза́хи “Kazakhs”, a Turkic people. […]

89: What’s the largest dictionary in the world? (score 10262 in 2019)

Question

I’m curious to know: what is the largest dictionary in the world? The English and Malagasy Wiktionaries are surely not far off with 3.8M and 3.5M entries, but I found a blog post talking about a Chinese-Korean dictionary with nearly half a million entries composed of 55,000 different characters. Is there something larger?

Just like with cities there are various possible metrics. But metrics for dictionaries are much less fluid than for cities. I’m mainly thinking about number of entries (i.e. words) but it’d also be fun to know about the physically largest dictionary.

(I answered the question myself once I found some more information but I’d still like to see an answer with authoritative sources.)

Answer accepted (score 7)

First of all, I assume that with “largest dictionary” you mean the dictionary with the largest number of entries (lemmas), not the one that fills the largest number of pages. In this form, the question probably cannot be answered, given the fact that different languages have different systems of lemmatisation.

To begin with, Wiktionary includes proper nouns, while traditional dictionaries exclude them. The Dutch WNT covers texts up to 1921 and has 430000 entries. The second edition of the OED (1989) has 291000 entries, but the current on-line version claims to have 600000 words.

Just to problematize the question I would like to mention Arabic, a language considered to have a very large vocabulary. Arabic dictionaries are arranged according to roots. The numerous verbal derivatives of any given root are all counted as a single lemma, while the singular nouns derived from a given root are treated as separate lemmas. The largest dictionary of Classical Arabic, the Tāj al-ʻarūs, is said to contain 120000 “words”. This looks like a lot less than the WNT and the OED. However, the Tāj was compiled in the 18th century, the other two in the 20th; the vocabularies of Dutch and English have obviously increased enormously in the last two centuries. Moreover, the Tāj contains in principle only the words found in the Qur’an (not many) and in the early Arabic poets, so let’s say words attested in texts from before about 700 AD. I doubt whether there are many other languages that clock up such a big number by such an early date.

Answer 2 (score 1)

Svenska Akademiens ordbok is an ongoing work that started in the late 19th century. It lists Swedish words from around 1520 until today. That covers modern and current Swedish. On the Swedish Academy’s homepage, SAOB is said to be comparable to WNT, OED and DW (though I do not know if they are referring to size, quality or both). To date it has 36 volumes and covers A through UTSUDDA. I haven’t been able to find any statistics beyond what I’ve given here, but I have sent e-mail to the editor, requesting further information.

http://www.saob.se/ (This page is in Swedish)

90: What do WordNet::Similarity scores mean? (score 10122 in 2014)

Question

I am using WordNet Interface in NLTK, which facilitates computation of a number of similarity metrics:

  • Path similarity
  • Leacock-Chodorow Similarity
  • Wu-Palmer Similarity
  • Resnik Similarity
  • Jiang-Conrath Similarity
  • Lin Similarity

I tried computing all of them on the words w1 = ‘love’, w2 = ‘hate’, w3 = ‘romance’. On all the metrics, the similarity scores obtained were higher for the pair ‘love’ and ‘hate’ than for the pair ‘love’ and ‘romance’. Shouldn’t it have been the opposite?

Answer accepted (score 16)

The most accessible resource that explains the difference between each of these word similarity metrics would be Dan Jurafsky and James H. Martin’s ubiquitous Speech and Language Processing 2nd Edition. Specifically, pages 652-667 in chapter 20 (Computational Lexical Semantics) briefly and comprehensively cover each metric/algorithm in a way that anyone with just a basic understanding of math, language, and graphs can understand.

I will do my best to summarize each metric, using Jurafsky and Martin as my primary citation (attribute all my summaries to that book), annotated with my own understanding / insight where relevant / useful.

Broadly, we can group the metrics based on the parameters they operate on. Roughly there are two groups: (1) metrics which use only a thesaurus (e.g. WordNet) and (2) metrics which use a thesaurus plus probabilistic information from distributions in corpora.

These metrics belong to the thesaurus-based ones:

  • Path similarity
  • Leacock-Chodorow Similarity (Leacock and Chodorow 1998)
  • Wu-Palmer Similarity (Wu and Palmer 1994)

How do these work?

Path similarity computes the shortest number of edges from one word sense to another word sense, assuming a hierarchical structure like WordNet (essentially a graph). In general, word senses with a longer path distance between them are less similar than those with a very short path distance, e.g. man, dog versus man, tree (the expectation is that man is more similar to dog than it is to tree).

The path similarity can be defined as:

simpath(c1, c2) = 1 / (pathlen(c1, c2) + 1)

where c1, c2 are word senses, and pathlen(c1, c2) is the shortest number of edges between those two word senses in a given thesaurus like WordNet.

Leacock-Chodorow Similarity, or LCH, is based on the same path length, except that it takes its negative logarithm (NLTK’s implementation additionally scales the path length by the depth of the taxonomy):

simLCH(c1, c2) = −log pathlen(c1, c2)

The negative logarithm comes from information theory.

The Wu-Palmer metric (WUP) is very similar to LCH, except it weights the edges based on distance in the hierarchy. For example, jumping from inanimate to animate is a larger distance than jumping from, say, Felid to Canid. In some sense we can think of it as a sort of edit distance, assigning type-changing operations a higher cost the higher they are in the hierarchy.
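
As a concrete (toy) sketch of how these three thesaurus-based metrics behave, here is a miniature hand-built taxonomy and the formulas above. Everything here is invented for illustration: the taxonomy, node names, and depth convention are not real WordNet, and NLTK’s implementations differ in details such as root handling.

```python
import math

# Hypothetical miniature taxonomy: child -> parent (invented for illustration).
PARENT = {
    "man": "primate", "primate": "animate",
    "dog": "canid", "canid": "animate",
    "tree": "plant", "plant": "inanimate",
    "animate": "entity", "inanimate": "entity",
}

def ancestors(c):
    """Chain from a node up to the root, e.g. man -> primate -> animate -> entity."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain

def lcs(c1, c2):
    """Lowest common subsumer: first shared node walking up from c1."""
    a2 = ancestors(c2)
    return next(node for node in ancestors(c1) if node in a2)

def pathlen(c1, c2):
    """Shortest number of edges between two nodes, via their LCS."""
    common = lcs(c1, c2)
    return ancestors(c1).index(common) + ancestors(c2).index(common)

DEPTH = max(len(ancestors(c)) - 1 for c in PARENT)  # depth of the toy taxonomy

def sim_path(c1, c2):
    # NLTK-style scaling: 1 / (edges + 1), so identical senses score 1.0.
    return 1.0 / (pathlen(c1, c2) + 1)

def sim_lch(c1, c2):
    # Leacock-Chodorow: negative log of the path length, scaled by depth.
    return -math.log(pathlen(c1, c2) / (2.0 * DEPTH))

def sim_wup(c1, c2):
    # Wu-Palmer: how deep the LCS sits relative to the two senses themselves.
    depth = lambda c: len(ancestors(c)) - 1
    return 2.0 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))

# "man" comes out closer to "dog" than to "tree" on all three metrics:
for f in (sim_path, sim_lch, sim_wup):
    print(f.__name__, f("man", "dog"), f("man", "tree"))
```

The toy numbers only illustrate the shape of each formula; with real WordNet the same comparison would run through synsets via NLTK.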

These metrics belong to the thesaurus- and corpus-based ones (also called the Information Content metrics):

  • Resnik Similarity (Resnik 1995)
  • Lin Similarity (Lin 1998b)
  • Jiang-Conrath distance (Jiang and Conrath 1997)

Basically, each of these algorithms centers on the probability of the lowest common subsumer of two word senses c1 and c2, which is the lowest node in the hierarchy that is a parent of both c1 and c2. The probability comes from the distribution sampled, i.e. the corpora used. Resnik is the simplest implementation of this, while Lin expands it by considering similarity as both the information content shared between two senses and the difference between them. Jiang-Conrath is actually a distance function, best summarized by the chapter:

Jiang-Conrath distance (Jiang and Conrath, 1997), although derived in a completely different way from Lin and expressed as a distance rather than a similarity function, has been shown to work well or better than all the other thesaurus-based methods (p. 656)
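
The three information-content formulas can likewise be sketched with toy numbers. The probabilities below are hypothetical stand-ins for corpus counts, and “feeling” is assumed, for illustration, to be the lowest common subsumer of the two senses:

```python
import math

# Hypothetical sense probabilities, as if estimated from a corpus.
P = {"feeling": 0.20, "love": 0.01, "hate": 0.01}

def ic(c):
    # Information content IC(c) = -log P(c): rarer concepts are more informative.
    return -math.log(P[c])

def sim_resnik(c1, c2, lcs):
    # Resnik: similarity is the information content of the LCS alone.
    return ic(lcs)

def sim_lin(c1, c2, lcs):
    # Lin: shared information relative to the total information of both senses.
    return 2.0 * ic(lcs) / (ic(c1) + ic(c2))

def dist_jcn(c1, c2, lcs):
    # Jiang-Conrath is a *distance*: the information NOT shared by the two senses.
    return ic(c1) + ic(c2) - 2.0 * ic(lcs)

print(sim_resnik("love", "hate", "feeling"))
print(sim_lin("love", "hate", "feeling"))
print(dist_jcn("love", "hate", "feeling"))
```

Note how all three reduce to the information content of the LCS and of the senses themselves, which is why a common subsumer that is low in the hierarchy (rare, high-IC) yields high similarity.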


To address the question:

I tried computing all of them on the words w1 = ‘love’, w2 = ‘hate’, w3 = ‘romance’. On all the metrics, the similarity scores obtained were higher for the pair ‘love’ and ‘hate’ than for the pair ‘love’ and ‘romance’. Shouldn’t it have been the opposite?

Jurafsky and Martin answer this question directly in the chapter:

Word relatedness characterizes a larger set of potential relationships between words: antonyms…have a higher relatedness but low similarity…Word similarity is thus a sub case of word relatedness. In general, the five algorithms we describe in this section do not attempt to distinguish between similarity and semantic relatedness. (p. 653)

Using this (and the basic understanding of each similarity metric), the word senses of love and hate, while antonyms, are very related, since they essentially belong to the same semantic type (one’s feeling for something else). Thus, it would be expected that the metrics give a higher similarity to them, than say love and romance; romance is very similar to love, but its type is not as close as say hate or dislike. This hypothesis is quickly confirmed by testing the WUP metric:

>>> from nltk.corpus import wordnet as wn
>>> # dislike, love, romance are assumed to be Synset objects chosen earlier
>>> wn.wup_similarity(dislike, love)
0.7692307692307693
>>> wn.wup_similarity(romance, love)
0.25

I would leave it as an exercise to compute this for other metrics, but I am 100% certain you will get similar results for any other metric. What you could do is investigate metrics that consider similarity and not relatedness, i.e. would rank the similarity of love, romance higher than love, hate. Jurafsky and Martin don’t seem to give any references to papers about this, however.

Hope this helps.

91: Is a diphthong one phoneme or two, or does it depend? (score 10062 in 2017)

Question

In Mitch’s answer to “What is the difference between a diphthong and a glide?” and its comments it seems more than one of us is at least a bit confused as to how many phonemes a single diphthong represents:

  1. The two vowel sounds in a diphthong combine to make a single phoneme.
  2. A diphthong is a sequence of two phonemes of a certain kind in a certain relationship.
  3. Sometimes the two sounds of a diphthong are considered to be one phoneme, but sometimes they are each considered separate phonemes, depending on context or something else.

So which of these possibilities is true? And if 3. is true, what are the things that make it one or two phonemes?

Answer accepted (score 3)

  1. A diphthong is one sound segment created by a smooth transition between two targets within the same syllable. As a phonetic definition, this makes no theoretical claims about which phoneme(s) represent the articulation in the mind of the speaker.
  2. This statement is a bit vague, but is, as far as I can tell, true in some situations; see below.
  3. The number and character of the underlying phonemes that a diphthong corresponds to vary by language and by which researcher you talk to. Different kinds of evidence for a particular interpretation of field data are evaluated differently by different specialists, and mainstream phonology has yet to produce a theoretical framework that gives one clear answer to this question for each language.

    In English, for example, the diphthong [e͡ɪ] could be represented underlyingly as either the single phoneme /e͡ɪ/ or a sequence of two phonemes /ej/, where /j/ is the same segment that appears in /jɛs/ “yes”. The latter analysis is tempting, since it would reduce the size of the phonemic inventory (all possible underlying segments) of English, but would have to explain the phonetic differences between the /j/ in “yes” and the /j/ in “made”, which is much more like an [ɪ]. Of course, the addition of a transformation rule would be standard practice when faced with this situation, but we are then left with the basic question of which is simpler (and thus, more likely to be adopted as a strategy by native speakers): fewer phonemes, or fewer rules?

Answer 2 (score 2)

http://en.wikipedia.org/wiki/Diphthong

Here is what I found the last time I searched for the same thing, before this website existed. Diphthongs are considered one phoneme with two targets, meaning the tongue moves during the pronunciation of the vowel.

Answer 3 (score 1)

I’d go with no.1 because you can find the following minimal pairs in English (RP):

could - cowed /ʊ/ versus /aʊ/

cheese - cheers /i/ versus /ɪə/

dead-dared /e/ versus /eə/

bell-bail /e/ versus /eɪ/

buy - boy /aɪ/ versus /ɔɪ/ etc.

For a comprehensive list of minimal pairs in English (RP) compiled by John Higgins, see http://myweb.tiscali.co.uk/wordscape/wordlist/index.html

Thus, a single diphthong in English represents one phoneme.

This is traditional phonology, not generative.

Naturally, other analyses have been proposed, too. For example, a diphthong is treated as a vowel plus a semivowel or another vowel (McCarthy 1952). Thus, you end up having a diphthong consisting of two phonemes under those approaches. Peter Roach (2009) mentions that this view was “almost universally accepted by American (and some British) writers from the 1940s to the 1960s, and still pervades contemporary American descriptions” (p. 104). Roger Lass (1984) argues that analyzing a diphthong as a vowel plus a semivowel /j/ or /w/ is a “shaky position” (p. 138). He also convincingly dismisses the approach proposed in SPE (Chomsky and Halle), characterizing it as “seem[ing] close to lunacy” (p. 138). Bruce Hayes (2008) concludes that the “fewer-phonemes-the-better” analysis is not an iron-clad argument and that the segment/sequence problem is still “an unsettled one in phonology” (p. 57).

93: diphthong vs. digraph (English) (score 9766 in 2013)

Question

I want to check my understanding of these 2 terms:

diphthong (concerned with sound; 1 sound; represented by 2 letters; not long or short)
digraph (concerned with graphemes; 2 letters; can be long or short)

Is this accurate? If not, what are the similarities/differences? And can a diphthong be represented by a digraph, and is it always represented by one?

Answer accepted (score 4)

You are right about their being concerned with sound and graphemes, respectively.

A diphthong is concerned in particular with vowels. The term refers to a combination of two vowels characterized by a sort of glide from one vowel to the other. English is full of them, and particularly in English it is not true that a diphthong is always represented by two letters. Here are some examples:

a in “gate” /eɪ/

i in “bite” /aɪ/

o in “alone” /əʊ/ (BE) or /oʊ/ (AE)

ow in “cow” /aʊ/

A digraph is any two-letter representation of a single sound. Examples:

th in “another” /ð/ or in “thanks” /θ/

sh in “English” /ʃ/

oo in “root” /u/

The diphthong /aʊ/ in “cow” is represented by the digraph “ow”. This is not unusual but, as you can see in the examples, not always the case.

Answer 2 (score 1)

A diphthong is a single vowel sound which is not a single ‘pure’ vowel quality, but is composed of a transition from one vowel target to another. Some examples (using RP English) are the vowel sounds in the following:

lied [laɪ̯d]

low [ləʊ̯w]

A digraph is a combination of two graphemes to represent a single sound. Thus the digraph <ph> is composed of two graphemes, <p> and <h>, but as <ph> it represents a distinct sound, the labiodental fricative [f]. Digraphs may be used to represent diphthongs.

94: Functionalism vs Formalism? (score 9702 in )

Question

I’m trying to really understand the difference between linguistic functionalism and formalism, but I can’t find any good concrete examples of either.

From what I can tell, functionalism is a sort of lower-level analysis of language, while formalism takes a high-level, idealistic approach. But if I were asked the question: “Is theory X an example of formalism or functionalism?”, I wouldn’t really know how to answer.

Is there a simple example of a functional theory vs. a formal one?

Answer accepted (score 3)

It’s hard to nail down a scientific difference between functionalist and formalist approaches, because the goals and domains of investigation are usually disjoint. If you want some opposite ends of the spectrum, you could compare David Stampe’s dissertation on Natural Phonology with this paper. The main question is whether there is an autonomous computational “thing” that we call a grammar. A formalist will say yes, and studies the nature of that computational system. While generativists additionally claim that this computational object is an aspect of the mind, there are non-generativist formalists (certain HPSG practitioners, for example) who make no such claim about the mind, i.e. they just look at the system as a Platonic abstraction. A functionalist, on the other hand, cannot be a Platonist (of course, I may now learn that somehow that has actually happened).

A functionalist focuses on why language behavior is the way it is, attempting to reduce language facts to being a result of more general cognitive properties. Some functionalists don’t care if there is a small autonomous faculty for grammatical computation, they are just uninterested – others (e.g. Robert Port, see his Language paper “Against Formal Phonology”) are opposed to the concept. Formalists are less interested in functional (non-grammatical) aspects – they don’t deny that there are non-grammatical aspects to language, they are just focused on understanding the grammar part of language. So good formalists have to know how to weed out the functional chaff, and unfortunately, sometimes that doesn’t happen and you end up with “formal” theories that basically reify functional expectations (for instance, SPE introduced a formal mechanism of “markedness” into phonology, which reifies various phonetically-based functional tendencies).

Fritz Newmeyer is well-known for his investigations into formalism vs. functionalism, and one would be well-served by reading most of what he has written.

Answer 2 (score 0)

Perhaps a better way to approach the difference between functionalist and formalist approaches would be to explore the history and personalities involved.

In principle, there’s no opposition between functionalism and formalism. Units of language are used for certain functions and these functions can be described using formal methods. The question really is what are those functions and what are the formalisms used.

Historically, functionalism was closely associated with the Prague school of structuralist linguistics (the Prague Linguistic Circle - full disclosure, I’m a member). The functionalist theme developed in the 1930s (and was later put in opposition to the functionalism of Hjelmslev’s Copenhagen school, which understood function more in the mathematical sense). The Prague school was revived in the 1960s and tried to blend functionalist methods into the then increasingly popular formalist approach (inspired, for instance, by the formal semantics of people like Montague).

Formalism, on the other hand, is associated with Chomsky (although there were others and even earlier schools), who developed mathematical theories of the combinatorics of grammar, combined with the claim that these are facts of language without any reference to their function. But other formal approaches to language were also being developed at about the same time. I already mentioned Montague, but there was also Bar-Hillel with his categorial grammar. Both of these were much more compatible with functionalist approaches in that they are concerned with formalizing language units that are put to real uses.

Today, functionalism is most closely associated with the British Firthian school of linguistics, its most famous proponent being M.A.K. Halliday (it is now particularly popular in Australia and New Zealand). This school has had the most impact on wide areas of language description, from text and discourse analysis to pedagogic grammar (most foreign language textbooks and grammars will be beholden to some version of functionalism). Most of the work on language corpora has come out of the functionalist tradition (in the broadest sense). I think Halliday’s three metafunctions of language - 1. ideational, 2. interpersonal and 3. textual - are still the best delineation of what dimensions a linguistic theory needs to account for.

Ultimately, the difference between formalist and functionalist linguistics is not the irreconcilable rift between linguistic theories that it is often made out to be. I’d suggest that any linguistic theory needs to be judged on a range of issues, and whether it is labelled (or labels itself) functionalist or formalist is probably not that important.

95: What is a mora? (score 9571 in 2017)

Question

What is a mora? I tried to read the Wikipedia article that answers this question, but found it difficult to understand.
Ditto with the related LSE question: Is the concept of syllables pronunciation-relevant in languages with mora-based pronunciation?

That’s why I decided to ask the question here.

Answer accepted (score 5)

I recently took a phonology course where Mora Theory was briefly mentioned, but we didn’t go too in-depth, so your question piqued my interest. I did some light Googling and found this set of lecture notes, which I think does a pretty good job of explaining it:

Ladefoged (1982: 226 in Vance 1987) states that “A mora is a unit of timing. Each mora takes about the same length of time to say” (p.62)

So, if I am understanding the issue correctly, onsets are not assigned a mora because they do not significantly add to the pronunciation time. That is, the time required to pronounce “ah” and “bah” are not significantly different. That amount of time is defined as one mora.

I’m sitting at my computer trying various onset/vowel combinations to see if this is true, and anecdotally it seems to check out. (Incidentally, it would be a good idea to make sure no one else is home before trying this yourself.)

Now, some things take extra time to pronounce. Long vowels are longer than short vowels. Codas (post-vowel consonants) take more time to pronounce. Thus, such syllables are 2 moras because the syllable can be divided into two basic units, each taking time to pronounce:

  1. the onset/vowel
  2. the vowel continuation or the coda

I hope this helps clear up your issue.

Answer 2 (score 6)

I’m not sure what aspect of the Wikipedia explanation you find unclear, but here’s an analogy:

Imagine the government asks a research group to figure out “how much the Lincoln Tunnel gets used by people to enter Manhattan from New Jersey in a 24-hour period”. There are two obvious ways to obtain this measurement; one could count the number of vehicles that enter the tunnel in a 24-hour period, or one could count the number of people riding in the vehicles that enter the tunnel during a 24-hour period. Either metric could be relevant, depending on what the government is trying to focus on. If they care about factors that contribute to street traffic, then the vehicle-counting metric might be more appropriate. If they are more concerned with the actual population density of Manhattan during the day and how much of it is affected by out-of-towners, the person-counting metric might be more appropriate.

In phonology, syllables are like cars and morae are like people. Every car on the road must contain at least one person (the driver), but a car can also contain multiple people. Likewise, every syllable contains minimally one mora, but it may contain two or even three. Certain phonological processes are dependent on syllable count and some are dependent on mora count; some are dependent on both.

Studying Japanese is a great way to become familiar with the concept of a mora. As the section on Japanese in the Wikipedia article mentions, most hiragana symbols correspond to a single mora (an exception is when one symbol is followed by one of the “little” gliding kana ゃ, ゅ and ょ–in this case the two symbols together correspond to a single mora), so that Nippon, which is written with the four hiragana symbols にっぽん, contains four morae. Basically, because the hiragana system gives syllable-final consonants their own symbols (ん for a syllable-final nasal and っ for a geminate consonant that anticipates the initial consonant of a following syllable) but syllable-initial consonants are just encoded as part of a CV syllable represented by a single symbol (な for na, か for ka, etc.), the writing system encodes an asymmetry that is quite common cross-linguistically: nuclear vowels and coda consonants contribute to syllable weight in Japanese but not onset consonants. So na, a, and ka are “light”/monomoraic syllables, and nai, an, kan, and kat (as in the geminated katta) are “heavy”/bimoraic syllables. (At the risk of stretching the analogy too far, onset consonants, when they are present, are kind of like pets that are riding in the car. They don’t get counted as people entering the tunnel!)
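To make the counting rule concrete, here is a minimal sketch (my own illustration, not from the answer) of mora counting for hiragana strings, under the simplification described above: every full-size kana, including ん and っ, is one mora, while the “little” gliding kana ゃ, ゅ and ょ merge with the preceding symbol. (Other small kana and the long-vowel mark are ignored for brevity.)

```python
# Toy mora counter for hiragana, under the simplified rule described above.
# Every full-size kana (including ん and っ) counts as one mora; the small
# glide kana ゃ, ゅ, ょ attach to the preceding kana instead of adding one.

SMALL_GLIDE_KANA = set("ゃゅょ")

def count_morae(hiragana: str) -> int:
    """Count morae: one per kana, skipping small glide kana."""
    return sum(1 for ch in hiragana if ch not in SMALL_GLIDE_KANA)

print(count_morae("にっぽん"))  # Nippon -> 4 morae
print(count_morae("きょうと"))  # Kyouto -> 3 morae (きょ + う + と)
```

Note how this mirrors the onset/coda asymmetry: なか (naka, two light syllables) and かん (kan, one heavy syllable) both come out as two morae.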

For an example of a process that is sensitive to both the syllable and the mora in Japanese, consider pitch accent assignment. In the “standard” Japanese dialect, a pitch accent is associated with the initial mora of a syllable. So with three-syllable words, there are three possible accent patterns for accented words–initial, medial, and final accent (of course words can be unaccented as well). It doesn’t matter if the syllables are light or heavy. But if the accented syllable is heavy, the high “tone” (trying to stay pre-theoretical here) of the accent is constrained to associate with the initial mora of the accented syllable.

I hope at least part of this explanation clears things up for you a bit! Let me know if there is anything in particular that is still unclear and I will try to improve my answer.

UPDATE TO ORIGINAL ANSWER:

Just a point of clarification–the system described above for Japanese is a language-specific one. For example, in some languages that count morae, coda consonants do not get counted as being moraic. And, as @jlawler points out in his comment under @acattle’s answer, the phonetic realizations of syllable weight contrasts are not the same in all languages (i.e. a heavier syllable–one with more morae–may not take longer to pronounce than a lighter one), so the definition of a mora cannot be dependent on its phonetic behavior. Below is a list of possible moraic segments (in the rime of the syllable) and some example languages that use those criteria for moraicity (taken from Morén’s 1999 dissertation, p.16):

Vowels (Khalkha Mongolian, Yidiɲ)

Vowels + Glides (Gumbaynggir)

Vowels + Non-glottal Sonorants (Kwakwala)

Vowels + All Sonorants (Lithuanian, Tiv)

Vowels + All Consonants except Plain Stops (Metropolitan New York English)

Vowels + All Consonants except Aspirated Stops (Icelandic)

All Segments (Latin, Arabic dialects, Aklan, Koya, Imdlawn Tashlhiyt Berber)
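The list above can be read as a language-specific parameter: which segment classes count as moraic varies from language to language. Here is a toy sketch of syllable weight as a count of moraic segments in the rime; the ASCII segment classes and example rimes are invented stand-ins, not a real feature system.

```python
# Toy illustration: syllable weight = number of moraic segments in the rime,
# where the set of moraic segment classes is a language-specific setting.

VOWELS = set("aeiou")
GLIDES = set("jw")
SONORANTS = set("mnlr") | GLIDES
OBSTRUENTS = set("ptkbdgsfz")

def syllable_weight(rime: str, moraic: set) -> int:
    """One mora per rime segment belonging to a moraic class."""
    return sum(1 for seg in rime if seg in moraic)

# "Vowels only" (the Khalkha Mongolian pattern): a coda nasal is weightless.
print(syllable_weight("an", VOWELS))              # -> 1 (light)
# "Vowels + all sonorants" (the Lithuanian pattern): the coda adds a mora.
print(syllable_weight("an", VOWELS | SONORANTS))  # -> 2 (heavy)
# "All segments" (the Latin pattern): even a stop coda adds a mora.
print(syllable_weight("at", VOWELS | SONORANTS | OBSTRUENTS))  # -> 2 (heavy)
```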

96: What are the arguments against Chomsky’s Universal Grammar? (score 9565 in 2012)

Question

What are the most convincing and most popular arguments against the Innateness Hypothesis of Universal Grammar or Universal Grammar as described by Chomsky?

Answer accepted (score 9)

Obviously there is much written on this topic. A good place to start reading might be Evans & Levinson’s (2009) article in Behavioral and Brain Sciences which is accompanied by responses.

See The myth of language universals: Language diversity and its importance for cognitive science by Nicholas Evans and Stephen C. Levinson.

Answer 2 (score 3)

You may be interested in A Thousand Plateaus, a book Felix Guattari co-authored with Gilles Deleuze, in which they discuss Chomskyian linguistics in some depth and with a great deal of care.

Some caution may be warranted: they may seem to take Chomsky rather lightly, and their position may appear diametrically opposed to his in many ways – you will find them, for instance, arguing against ‘tree-based’ sentence analysis, promoting in its stead a kind of ‘an-hierarchical’ or ‘rhizomatic’ language analysis emphasizing the pragmatic and collective aspects of discourse. However, a careful reading of this work reveals a unique encounter with Chomsky’s linguistics that is well worth the time to unpack.

They also offer a detailed reading of the Labov-Chomsky debate, which may be of some interest in this context.

Answer 3 (score 2)

The UG proposed by Chomsky has to be one that is consistent with transformational grammar, because Chomsky proposed TG. Chomsky does not like to be wrong.

But TG is wrong – there are no transformations. Once we give up on transformations, we can see that the answer is at hand. There is a UG, which was originally proposed by Chomsky himself, and that is CFG (context-free phrase structure grammar). The basic problem with Chomsky’s latter-day attempts to find a satisfactory UG theory is that he is looking for one that is consistent with some form of TG. Such a UG theory will never be found, not because the UG hypothesis is wrong, but because there are no transformations.

Well, then, if the UG problem was solved over 50 years ago, why are we still suffering over whether there is a UG theory? It’s because there are some problems with CFG to get past. I think they’re all soluble, but it takes some work. Here’s a list:

  1. Describing constructions by appealing to movement of constituents is part of traditional grammar. How can there not be movement?
  2. It seems evident that sometimes word choice is not independent of context, so CFG is doomed from the outset, because it is context free.
  3. In his profoundly influential classic Syntactic Structures, Chomsky introduced CFG only to knock it down. He offered there several empirical arguments that natural languages cannot be described by CFGs. Other such evidence has been found since, notably the cross-serial constructions investigated by Stuart Shieber.
  4. Once Gerald Gazdar had shown how it was possible to get past the difficulty of describing apparent movement in a CFG, the descriptive grammar of English that emerged from applying the theory, GPSG (Generalized Phrase Structure Grammar), had too many rules and categories. It seemed obvious that mere humans could not cope with the tremendous complexity of detail that seemed necessary to describe a natural language within the confines of CFG.
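For readers unfamiliar with the formalism being debated, here is a toy context-free grammar with a naive recognizer. The grammar and lexicon are invented for illustration; real CFG analyses of natural language (such as GPSG’s) are vastly larger.

```python
# A toy CFG and a naive top-down recognizer, just to make concrete what
# "context-free phrase structure grammar" refers to in the answer above.

GRAMMAR = {
    "S":  [["NP", "VP"]],        # sentence = noun phrase + verb phrase
    "NP": [["Det", "N"]],        # noun phrase = determiner + noun
    "VP": [["V", "NP"], ["V"]],  # transitive or intransitive verb phrase
}
LEXICON = {
    "Det": {"the", "a"},
    "N":   {"cat", "dog"},
    "V":   {"saw", "slept"},
}

def parses(symbol, words):
    """True if the word list can be derived from `symbol`."""
    if symbol in LEXICON:  # preterminal: must match exactly one word
        return len(words) == 1 and words[0] in LEXICON[symbol]
    return any(derives(rhs, words) for rhs in GRAMMAR.get(symbol, []))

def derives(symbols, words):
    """True if the word list can be split to match the symbol sequence."""
    if not symbols:
        return not words
    return any(
        parses(symbols[0], words[:i]) and derives(symbols[1:], words[i:])
        for i in range(1, len(words) + 1)
    )

print(parses("S", "the cat saw a dog".split()))  # True
print(parses("S", "cat the saw".split()))        # False
```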

97: What is the difference between assertive and non-assertive words? (score 9545 in )

Question

What is the difference between assertive and non-assertive words?

I haven’t been able to find an answer in my online linguistics sources such as the SIL Glossary of Linguistics Terms.

The only source I could find was an English Grammar site, namely http://www.englishgrammar.org/nonassertive-words/#mjsDTl7shKvqoAB0.99

This source states that assertive words are generally used in declarative sentences.

Examples of assertive words include “some, once, already, somebody, something, sometimes, somewhere, someone.”

The source also explains that non-assertive words are used “in questions and negatives. They are also used in if-clauses and with adverbs, adjectives, verbs, prepositions and determiners that have a negative meaning.”

Examples of non-assertive words include “any, anything, anybody, ever, & yet.”

However, as I read the source, I wasn’t able to determine exactly what the semantic or pragmatic difference between assertive and non-assertive words is. I’m hoping that someone can spell this out for me, and/or recommend a source that explains the difference.

Also, the link gives an example of an assertive word used in a question, namely “Did you want something?” (suggesting “I think you want something”).

Answer accepted (score 3)

Other terms for assertive and non-assertive in this context are realis and irrealis as well as positive and negative polarity.

When using some, somebody etc. the existence of the entity in question is asserted.

There is somebody hiding behind the box means that the speaker assumes that there is a person hiding there, although they are not aware who it is. The existence of the person is real/realis. By contrast, Is there anybody behind the box? does not make the assertion, but simply raises the possibility. The existence of the person is hypothetical/irrealis. The same is true of negation: There isn’t anybody behind the box.

A complication arises from the fact that, as you mentioned, assertive words can sometimes be used in questions or negated contexts. We can, for example, contrast:

  1. Is there anybody behind the box?
  2. Is there somebody behind the box?

Of these two, 1) is the more usual, or unmarked, way of phrasing the question. 2) is special, in that the speaker assumes the answer to the question might be yes, and is perhaps troubled by that fact (such as in a horror movie).

Non-assertive words can also be used in assertive contexts, but this is again marked and used for a particular effect:

There must be something we can do. I would do anything to help him!

Assertive something is the default choice and asserts that there is an action that might be appropriate. Anything is marked in this assertive context and stresses the emotional involvement of the speaker and the force of the statement.

98: Why do most words for “mother”, across languages, start with an [m], and for “father” with [p]/[b], but not vice versa? (score 9543 in 2012)

Question

It has been observed that in general, a word for “mother” tends to be based on a bilabial nasal [m] or similar consonant, and for father it tends to be [b] or [p]. This is found in many language families, so they can’t be considered cognates. This is usually explained by the easiness of babies making this sound early in their babbling, so the parents just use those sounds to refer to themselves. However why is it that few languages have used the reverse? (i.e. [p]/[b] for mother and [m] for father)? Wikipedia lists only Georgian, where “father” is /mama/ and “mother” is /deda/.

Answer accepted (score 65)

This issue is discussed in some detail in “Where do mama/papa words come from?” by the late, great linguist Larry Trask. The paper also gives a very nice introduction to argumentation in historical linguistics.

The answer is that these terms are based on the earliest ‘intelligible’ babble of babies. The most common first syllable produced by babies is [ma], with [pa/ba/ta/da] following soon after. These earliest articulations are probably just play for the child, but are interpreted by parents as attempts by the baby to address them. As mothers tend to be the main early caregiver the earliest-occurring syllable, typically [ma], is interpreted as referring to them, while the next-occurring syllables are very commonly interpreted as referring to the father. This gives rise to the strong, but not invariant, association of [ma] with ‘mother’ and [pa/ba/ta/da] with ‘father’.

Answer 2 (score 16)

Although it is a strong tendency it is still only a tendency. The prevalence of these two sounds in the names of parents is not surprising, given that they’re two of the easiest sounds to make regardless of the sound system of your language. Think about English - we have ‘mother’ and ‘father’ but we’ll accept ‘mama’ or ‘dada’ as a first word, because they’re easy to make - open CV structure and basic stop sounds.

Basically, across the world caregivers have an incentive to hear what they want to hear - and they want to hear their children say their names. Given that cross-linguistically mothers are generally the caregivers, the easiest sound ‘ma’ is usually used first and the harder sound ‘pa’ or ‘da’ comes somewhere around second. Children are already very aware of who their caregivers are before they can articulate it, so as soon as they can articulate something, even something as simple as ‘ma’, this gets attributed as a name.

So it’s less that these words are a strong cross-linguistic tendency and more that parents are anxious to hear meaning in a child’s early babbling-like word production!

Answer 3 (score 1)

Let me add to these already great answers something I read in a paper (I cannot recall which): m is a nasal sound that can be produced while suckling at the breast, and suckling involves both lips (hence m is a bilabial nasal). I am not sure if the argument goes “They can utter the word for ‘mom’ while suckling” (unlikely) or “They associate the movement and the sound they produce while suckling with the person”. Has anybody come across this paper?

99: In Turkish, how exactly does “ğ” affect the vowel it follows? (score 9535 in 2013)

Question

In Standard Turkish, “ğ” is explained as having no sound of its own but instead lengthens the previous vowel.

So would “aa” and “ağ” sound alike? What about “â” and “ağa”? Can there sometimes be three vowel length distinctions in Turkish?

(This is a reworded version of an example question I put up on the Turkish Language & Usage proposal but I’ve wondered about it since before that.)

Answer accepted (score 7)

ğ is a symbol used in writing Turkish. When word or syllable final, it indicates a preceding back vowel is lengthened and is typically silent otherwise. In some dialects it may be realized as a velar (or uvular) approximant, fricative or plosive. A velar approximant is an acceptable pronunciation in standard Istanbul dialect too, but it’s becoming increasingly rare. Following a front vowel it may manifest as a palatal glide.

So “aa” and “ağ” are identical. When ğ is intervocalic, the preceding and following vowels belong to different syllables, meaning that “ağa” is a long syllable-final vowel followed by a syllable-initial vowel (so Turkish does not have a three-way length contrast). I’m not sure what “â” is meant to represent, though.

There are some other complexities, discussed in the pdf here (pp 7-8).

Answer 3 (score 2)

When ğ is between back vowels, it “provides a smooth transition between vowels, since they do not occur consecutively in native Turkish words” (Wikipedia).

As for â and ağa - they reflect different historical evolution rather than different lengths. The soft g existed, at some point, as a separate sound between vowels. For example, in Turkmen (another language of the same group) the word for “onion” is “sogan”. Loanwords in other languages, like Bulgarian or Bulgarian dialects, also have the g - “sugan” for “onion”. On the other hand, â is used only in loanwords.

100: What is word order used for in “free word order” languages? (score 9505 in 2016)

Question

Consider languages whose case-systems allow the order of arguments to be changed without changing the arguments’ grammatical relations. (Note the 189 languages noted as having “no dominant word-order” at The World Atlas of Language Structure Online: Feature 81A: Order of Subject, Object and Verb .) In such languages, what information is word order typically used to convey?

Answer accepted (score 26)

As a native speaker of Polish (and a linguist), I can say this: in Polish, word order is almost absolutely free; the only constraint I can think of is that prepositions (e.g. w ‘in’, z ‘from’) can’t be placed after the word they refer to. Apart from that, pretty much any ordering is always understandable. It is true, however, that in some sentences some orders will sound more natural, while others will be considered emotional or emphatic, as Aspinea explained above for Latin. An example:

Jaś poszedł do kina.
Johnny went to cinema.
The first association: natural, affirmative, simply stating a fact.
But could be a question if so intoned, or perhaps anything else, too.

Poszedł Jaś do kina.
The first association: half a sentence, expecting the second part (e.g. ... and he saw a ghost there.)
But could as well be a confirmation that he did indeed go, a question, or perhaps anything else, too.

Jaś do kina poszedł.
The first association: answer to the question "Where did John go?".
But again, could be used in plenty of other contexts, too.

Poszedł do kina Jaś.
A tiny bit odd. The first association is that "Jaś" was added as an afterthought (could have been without the tiniest pause, though).

Do kina Jaś poszedł.
The first association: answer to the question "Who went to the cinema?"
But again...

Do kina poszedł Jaś.
The first association: expect an enumeration who else went to the cinema.
But again...

That’s all the possibilities, right? You might think SVO would be the most frequent one. Well, that’s the first thing that comes to one’s mind when they hear the English sentence “Johnny went to the cinema”, but I really don’t think that in real life it is so much more frequent.

At any rate, take a look at the above examples and imagine that essentially any sentence in Polish can be reordered in an equally free way, with next to no consequences at all. In more complex sentences with subordinate clauses and such, certain orders will be more natural as easier to understand, but that’s it: more natural, definitely not the only ones. I think this goes a long way towards explaining why the majority of linguists who work with languages with more inflection than English do not share the Anglo-Saxon obsession with syntax, and object to using ‘syntax’ and ‘morphology’ interchangeably. This includes generativism. I’m quite convinced the only reason it won any following at all in continental Europe is the power of the US and anything coming from there being automatically fashionable.

PS. A very distant afterthought: a mediaeval Polish shibboleth to distinguish Germans was miele młyn [ḿjεlε mwɨn] lit. ‘grinds mill’. In theory, it could be SV just as well but somehow this sounds just fine in modern Polish, too.

Answer 2 (score 21)

As Cerberus hints in his comment on Aspinea’s answer, it’s quite common to use word order to convey information structure. (Sadly, Wikipedia doesn’t have an article on the relevant sense of the phrase “information structure,” and I can’t find a good nontechnical introduction elsewhere. The article Basic Notions of Information Structure by Manfred Krifka is a good survey of the subject for linguists.)

Information structure is a pretty broad subject, but there are two basic ideas that are especially important when you’re talking about word order: topic and focus. The topic of a sentence is, roughly speaking, “what the sentence is about”. Usually, a sequence of sentences will have the same topic, or at least closely related topics. But sometimes you’ll want to change topics between sentences, or to contrast two topics with one another — and speakers will tend to mark the new topic in some way when they do this.

English actually does mark topic changes using word order, at least some of the time. In passages like these…

Mary likes beans. Me, I can’t stand them.

Mary likes beans. Lentils, she doesn’t like so much.

…the left-dislocated word in the second sentence (“me” or “lentils”) is marking a change in topic: “Okay, we were talking about beans before, but now we’re gonna switch gears and talk about lentils for a while.”

The other important idea is focus. Focus actually has a bunch of different uses (see the Krifka article I linked up above for the gory details) but one of the biggest uses is to mark the answers to questions. In English we use prosody to mark focus, like so:

“Who likes beans?”
“MARY likes beans.”

“How does Mary feel about beans?”
“Mary LIKES beans.”

“What does Mary like to eat?”
“Mary likes BEANS.”

But some languages use word order to mark focus too.

So let me give some examples from K’ichee’, which is a “free word order” language that I’ve studied for a while. Default word order in K’ichee’ is VOS, so the standard way of saying “Mary likes beans” would look like this:

Karaj kinaq’ ra’l Mari’y “Mary likes beans” (literally “Likes beans miss Mary”)

In K’ichee’, both topics and foci can move to the left of the clause. After a topic, you get a pause; there’s no pause after a focus; and if both the topic and focus move, the topic goes first. You can also move words to the end of the clause, which makes them sound a bit like an afterthought. So the “real” word order in K’ichee’ is more like “Topic (pause) focus V normal-O normal-S (pause) afterthought,” though of course you won’t get all those slots filled in a single clause.

Anyway, this lets you say things like these:

Kinaq’ karaj ra’l Mari’y “Mary likes BEANS”

(This answers the question “what does Mary like?”)

Ra’l Mari’y (pause) karaj kinaq’ “As for Mary, she likes beans.”

(This changes the topic: we were talking about someone else, now we’re talking about Mary)

Ra’l Mari’y (pause) kinaq’ karaj “As for Mary, she likes BEANS.”

(This changes the topic and answers a question. For instance, maybe you asked a minute ago about what someone else likes, but now you’re asking what Mary likes, and I’m answering the second question)

Karaj ra’l Mari’y (pause) kinaq’ “Mary likes ’em…. likes beans, that is.”

(This makes “beans” an afterthought. You might use it when you’ve been talking on the topic of beans for a while, but you want to remind your listener that that’s what you’re talking about.)

And so on. The upshot is, all six permutations of subject, verb and object are possible in K’ichee’, given the right combination of topic, focus and afterthought.
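The template can be sketched mechanically. Below is a rough toy encoding (my own, not from the answer) of the linearization described: a topic or focus argument moves left of the verb, an afterthought moves right, and everything else keeps the default VOS order. (Straight apostrophes stand in for the glottalization marks.)

```python
# Toy linearizer for the template "topic (pause) focus V O S (pause)
# afterthought". Roles name which argument ("obj" or "subj") fills a slot;
# a moved argument vacates its default postverbal position.

def linearize(verb, obj, subj, topic=None, focus=None, afterthought=None):
    """Arrange a clause per the K'ichee'-style template sketched above."""
    args = {"obj": obj, "subj": subj}
    moved = {role for role in (topic, focus, afterthought) if role}
    # Unmoved arguments stay in their default VOS slots after the verb.
    core = [verb] + [args[r] for r in ("obj", "subj") if r not in moved]
    out = []
    if topic:
        out += [args[topic], "(pause)"]  # topics are followed by a pause
    if focus:
        out.append(args[focus])          # foci are not
    out += core
    if afterthought:
        out += ["(pause)", args[afterthought]]
    return " ".join(out)

# Object focus: "Mary likes BEANS"
print(linearize("karaj", "kinaq'", "ra'l Mari'y", focus="obj"))
# Subject topic: "As for Mary, she likes beans"
print(linearize("karaj", "kinaq'", "ra'l Mari'y", topic="subj"))
```

With no roles assigned it yields the default order karaj kinaq' ra'l Mari'y, and combining a subject topic with an object focus yields the topic-then-focus order from the third example.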

For what it’s worth too, I don’t speak Polish but it looks like something similar is going on in Kamil Stachowski’s Polish examples. Note that sometimes he says that a particular word order answers a specific question. This suggests to me that those word orders are being used for focus-marking in Polish — and a little Googling turns up this article on Polish focus-marking claiming that word order is indeed relevant.

Answer 3 (score 19)

In Slavic languages (I am a native speaker of Russian) the relatively free words order is often used to convey the information that otherwise would have been rendered using articles (which we don’t have).

Here are some examples:

Мальчик вошел в комнату.
(literal) Boy entered room.
(meaning) The boy entered a room.

В комнату вошел мальчик.
(literal) Into room entered boy (subject).
(meaning) A boy entered the room.

So, as you can see, we use the word order as a sort of mild emphasis that is rendered in other languages using definite and indefinite articles.