Matsés - Are languages that mark the epistemic certainty of statements not of interest to the EA community?

By Miquel Banchs-Piqué (prev. mikbp) @ 2021-06-08T19:25 (+15)

Hi, I just read that Matsés, a native language of Peru, obliges the speakers to indicate the certainty of information provided in a sentence. It seems that Tuyuca and other indigenous South American languages (mostly Amazonian languages) do that to a certain degree as well. I thought this would be a big topic on the EA Forum, but when I searched for the term "Matsés" on the forum I only found this. Are such languages not an interesting phenomenon, worth studying to explore ways to do the same in other languages? I thought the EA community would be fascinated by such languages and brimming with ideas on how to apply something similar in English and other languages.

Linch @ 2021-06-09T09:05 (+20)

Hi mikbp! Thank you for bringing this up. I think this is a really cool idea/concept.

To answer why EAs aren't currently interested in Matsés, I think the real answer is that most/almost all EAs, myself included, just haven't heard of it! I think people from outside our movement vastly overestimate the size of the EA movement and amount of knowledge we have access to. The world is big, and there is much more knowledge "out there" than within our community.

In particular, I think it's fairly likely that zero EAs have done academic or advanced amateur studies into linguistics of South American native languages, for example.

I would certainly be interested/excited to see more people investigate the Matsés language further to see if there are practical things we can learn from them.

That said, my personal view (as a novice/amateur to linguistics) is that I'm generally skeptical of claims on the rough contours of "high-level language study/use being an important bottleneck in understanding/communication of concepts."

Evidence vaguely gesturing at this skepticism:

- Strong Sapir-Whorf (linguistic determinism) is the hypothesis that language determines our thoughts and constrains our ability to think about concepts for which we lack the vocabulary, etc. This view seemed intuitively plausible but is soundly rejected by empirical data.
  - My impression is that a weaker version, that language "influences" thought, has some empirical backing.
- It was ex ante plausible that different languages can convey information at very different rates. This also seems empirically untrue, across very different language families and surface features such as syllables uttered per minute.

So at a high-level, I'm moderately pessimistic that there are "technology"-like advances of natural spoken language that speakers/designers of other languages can learn from, rather than understanding natural language development as a combination of random factors + "fit to context/purpose."

Inside views aside, I'd still be excited to see somebody dive for 5 - 40 hours into Matsés + brainstorm ways EAs can learn from the ways they speak.

mikbp @ 2021-06-09T12:29 (+1)

I thought the point would be that right now English and other "western" languages are not very fit for the purposes of the EA community in terms of conveying the certainty of our statements: it can be done, but it sounds artificial, and it is therefore quite tiring to always point out how sure you are about what you say. So we may learn something.

But you know much more about this than me! :-)

Linch @ 2021-06-09T17:42 (+2)

But you know much more about this than me! :-)

Probably not by much! :) I'm certainly not an expert in linguistics, and I would not consider myself a talented amateur here either.

mikbp @ 2021-06-09T12:37 (+1)

Ah, and I've read that these people put a lot of importance on giving true statements. So I guess this is one reason why this feature evolved (although it might be the other way around: because it is easy to be accurate, they give more importance to accuracy).

Linch @ 2021-06-09T17:42 (+3)

Ah, and I've read that these people put a lot of importance on giving true statements

All else equal, this does seem like a very useful trait to emulate.

michaelchen @ 2021-06-11T21:50 (+12)

I'm pretty interested in linguistics so after reading gavintaylor's comment which you linked to, I decided to read part of David Fleck's doctoral thesis from 2003 on the Matsés language. For distinguishing levels of certainty, Matsés has a particle ada/-da and a verb suffix -chit which mean something like "perhaps" and another particle, ba, that means something like "I doubt that...", and English speakers already naturally express these distinctions of certainty. Something distinctive that Matsés speakers do though is that they always mark statements about the past with an evidentiality suffix to show whether the proposition is known by direct experience, inference, or conjecture. By the way, it is not true that Matsés "obliges the speakers to indicate the certainty of information provided in a sentence".

The constructed language Lojban has a richer set of evidentials, derived from a variety of American Indian languages. It might be useful to think about including phrases like these in one's writing.

ja'o [jalge] I conclude
ca'e I define
ba'a [balvi] I expect / I experience / I remember
su'a [sucta] I generalize / I particularize
ti'e [tirna] I hear (hearsay)
ka'u [kulnu] I know by cultural means
se'o [senva] I know by internal experience
za'a [zgana] I observe
pe'i [pensi] I opine
ru'a [sruma] I postulate
ju'a [jufra] I state

DataPacRat from Less Wrong proposed Lojban particles to express subjective probabilities:

ni'uci'ibei'e: 0%
ni'upabei'e: 44.3%
ni'ubei'e: <50%
nobei'e: 50%
ma'ubei'e: >50%
pabei'e: 55.7%
etc.
ci'ibei'e: 100%
xobei'e: asking listener their level of belief

These words are based on logarithmic probability, which I don't really understand. I personally would find it a lot easier to just write a linear probability or percentage such as 80% confidence. It might be useful to include subjective probabilities for various sentences in a document, like Technicalities used to do according to their comment on this post, although they later quit.
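On the logarithmic scale: the listed values are consistent with log-odds measured in decibans (ten times the base-10 logarithm of the odds), with the number inside each particle read as the deciban count. That reading is an assumption inferred from the 44.3% / 50% / 55.7% entries above, not something stated in the proposal as quoted here. A minimal Python sketch of the conversion under that assumption (the function names are made up for illustration):

```python
import math


def decibans_to_probability(db: float) -> float:
    """Convert log-odds in decibans to a probability (assumed scale)."""
    # Handle the infinite endpoints explicitly to avoid inf / inf = nan.
    if db == math.inf:
        return 1.0
    if db == -math.inf:
        return 0.0
    odds = 10 ** (db / 10)
    return odds / (1 + odds)


def probability_to_decibans(p: float) -> float:
    """Convert a probability to log-odds in decibans (assumed scale)."""
    if p <= 0:
        return -math.inf
    if p >= 1:
        return math.inf
    return 10 * math.log10(p / (1 - p))


# -1, 0 and +1 decibans come out at about 44.3%, 50% and 55.7%.
for db in (-math.inf, -1, 0, 1, math.inf):
    print(db, round(decibans_to_probability(db) * 100, 1))
```

Under this reading, 80% confidence corresponds to about 6 decibans, which illustrates why a plain percentage may be easier to read at a glance.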

Matses language evidentials and epistemic modality

Here, I've summarized and quoted key sections from Fleck's thesis that are relevant to our discussion of epistemic certainty. It's possible that there are important parts of Fleck's thesis that I've missed.

There's also a more recent paper by Fleck from 2007 specifically focusing on evidentials: https://www.jstor.org/stable/40070903. I haven't read this.

The remainder of this post is fairly long, so if you're not interested in learning more about Matses, don't feel like you have to read this.

5.5.10 Derivational Suffix -chit 'Uncertainty' (page 388, page 410 in the PDF):

This suffix expresses uncertainty, and cannot be used with the uncertainty particle ada/-da. Some example sentences:

It is possibly in tree cavities that the porcupine gives birth.
But others probably drink mashed fruit drink.
Perhaps he died.

5.6.1 Past tenses and evidentiality - page 397 (page 419 in the PDF):

Evidential devices in languages often code multiple types of information with respect to the reliability of the information being related by the speaker. These include: i) source of knowledge (e.g., hearsay); ii) degree of precision or truth, iii) probability of the information being true, and iv) the speaker's expectations concerning the probability of the statement.

What is different about Matses is that it codes evidentiality (source of knowledge) in one set of verbal inflectional suffixes, and epistemic modality (certainty/validation/commitment to truth of statement) by various other means, such as a verbal suffix that comprises a different position class, an uncertainty particle, and a dubitative particle (inflectional suffixes only code uncertainty in nonpast and future tenses). Thus, there is no need for Matses speakers to extend the function of evidentials to code epistemic modality because there are other markers that do this, which are outside the Matses evidential system. (page 398, 420 in the PDF)

Evidentiality is coded only in verbal suffixes that also mark past tense. Nine different tense-evidentiality inflectional suffixes mark a combination of one of three evidential distinctions (direct experience, inference, and conjecture) and one of three past tense distinctions (recent past, distant past and remote past). There are no past tense inflections besides these nine suffixes, so every time a speaker reports a past event, he must reveal the source of knowledge. This contrasts with epistemic modality, which is not an obligatory category in any tense (e.g., the absence of -chit 'Uncertainty' does not necessarily imply certainty; section 5.5.10). There is no morphological means of marking hearsay--it must be reported via direct quotation with the quotative verb inflected with one of the nine tense-evidentiality suffixes. Mythical and historical past is marked with the recent past inferential suffix, often reported in a quotative sentence.

Experiential suffixes

The three experiential suffixes are:

-o 'Recent Past: Experiential': immediate past to about one month ago
-onda 'Distant Past: Experiential': about one month to about 50 years ago
-denne 'Remote Past: Experiential': about 50 to about 100 years ago (approximate maximum human life span)

experiential refers to a situation where the speaker detects the occurrence of an event at the time that it transpires (or a state at the time that it holds true).

Inferential suffixes

The four inferential suffixes are (page 406, page 428 in the PDF):

-ac 'Recent Past: Inferential': immediate past to about one month ago
-nëdac 'Distant Past: Inferential': about one month ago to speaker's infancy
-ampic 'Remote Past: Inferential': before speaker's infancy
-nëdampic 'Remote Past: Inferential': before speaker's infancy. It's unclear if this differs from -ampic or is used for events even further in the past.

Some examples of inferential statements (page 406, 428 in the PDF):

Davy (evidently) left to go to sleep.
Davy (evidently) left, probably because he got bored.

with first-person forms, narrative and mythical past, ancient habitual activities, etc., where evidential distinctions are difficult or not relevant, inferential suffixes are used as a default. (page 424, page 446 in the PDF)

Conjecture suffixes

The two conjecture suffixes are (page 417, page 439 in the PDF):

-ash 'Recent Past Conjecture': immediate past to about one month ago
-nëdash 'Distant Past Conjecture': more than about one month ago

What is meant by "conjecture" here is that the speaker wishes to report the occurrence of an event or state that he did not witness, did not hear about from somebody else, and for which there is no resulting evidence. For example, if a dog is missing, the owner might conjecture that a snake bit it or a jaguar ate it.

Examples of conjectural statements:

Davy probably slept.
I suppose Davy got bored.
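The suffix lists above can be collected into a small lookup table. The sketch below only restates this summary in Python: the dictionary layout and the helper name `suffixes_for` are illustrative choices, and the cells are filled in from the lists above rather than checked against Fleck's thesis.

```python
# Past-tense suffixes as listed in the summary above, keyed by
# (evidential category, tense). The layout is illustrative only.
TENSE_EVIDENTIAL_SUFFIXES = {
    ("experiential", "recent"): ["-o"],
    ("experiential", "distant"): ["-onda"],
    ("experiential", "remote"): ["-denne"],
    ("inferential", "recent"): ["-ac"],
    ("inferential", "distant"): ["-nëdac"],
    ("inferential", "remote"): ["-ampic", "-nëdampic"],
    ("conjecture", "recent"): ["-ash"],
    ("conjecture", "distant"): ["-nëdash"],
    # No remote-past conjecture suffix appears in the summary.
}


def suffixes_for(evidence: str, tense: str) -> list[str]:
    """Return the listed suffixes for a category/tense pair (may be empty)."""
    return TENSE_EVIDENTIAL_SUFFIXES.get((evidence, tense), [])


# Count every suffix across all cells of the table.
total = sum(len(forms) for forms in TENSE_EVIDENTIAL_SUFFIXES.values())
print(total, suffixes_for("inferential", "remote"))
```

Counting the entries gives nine suffixes, matching the figure in the quoted passage, although as listed here they are not spread evenly over the three-by-three grid.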

History/mythology

Statements about history/mythology from the "inaccessible past" use the suffix (-pa)-ac (cadennec) (page 424, page 446 in the PDF).

Hearsay

Hearsay is marked by a direct quote + quotative (ca or que) + any experiential or conjecture suffix (page 424, page 446 in the PDF).

9.4 Clause-level particles - page 724 (page 745 in the PDF):

ada/-da: a clause-level particle to express uncertainty. E.g., "I doubt if you're going to come here".
ba: a clause-level particle to express doubt. E.g., "ada debi cho-pa-ash ba" "I'm not sure if Davy has arrived."

mikbp @ 2021-06-14T12:35 (+1)

You blew my mind, thanks!

technicalities @ 2021-06-09T09:39 (+2)

That's cool! I wonder if they suffer from the same ambiguity as epistemic adjectives in English though* (which would suggest that we should skip straight to numerical assignments: probabilities or belief functions).

Anecdotally, it's quite tiring to put credence levels on everything. When I started my blog I began by putting a probability on all major claims (and even wrote a script to hide this behind a popup to minimise aesthetic damage). But I soon stopped.

For important things (like Forum posts?) it's probably worth the effort, but even a document-level confidence statement is a norm with only spotty adoption on here.

* https://hbr.org/2018/07/if-you-say-something-is-likely-how-likely-do-people-think-it-is

michaelchen @ 2021-06-23T16:04 (+2)

Something similar to explicit credence levels on claims is how Arbital has inline images of probability distributions. Users can vote on a certain probability and contribute to the probability distribution.

michaelchen @ 2021-06-11T23:29 (+2)

Anecdotally, it's quite tiring to put credence levels on everything. When I started my blog I began by putting a probability on all major claims (and even wrote a script to hide this behind a popup to minimise aesthetic damage). But I soon stopped.

Interesting! Could you provide links to some of these blog posts?

technicalities @ 2021-06-12T18:53 (+2)

Seems I did this in exactly 3 posts before getting annoyed.

mikbp @ 2021-06-09T12:36 (+2)

I think one very cool feature of having something like this embedded in the language is that you learn to do it automatically. I can think of a couple of examples now:
- Cases: in English, Catalan or Spanish, one does not mark the case of a noun, but in German one does. This makes learning German more difficult, but for native speakers, or after a lot of practice, it becomes automatic.

- Directions: I recall having read about a language that does not give relative directions (right, left) but absolute ones (East, West). That sounds like a very difficult thing to do for us, but for the people who speak that language it comes naturally.

My guess is that if we spoke Matsés fluently, it would be just as natural for us to indicate the degree of certainty. And that would be very cool :-)