Wikipedia and the “Dialects of Italy”

The Italian Wikipedia has reached 1,215,574 articles, with 1,247,172 registered users (source: Italian Wikipedia Statistics). That’s more or less one user per article. The Italian encyclopaedia averages 358,814 views per hour (source: Wikipedia Statistics).

Nevertheless, its linguistics coverage suffers from a variety of cyber-diseases. According to the figures reported on the page of the Italian Linguistics WikiProject, out of 890 articles, 250 don’t cite proper sources for their information and about 270 are stubs that need to be expanded.

Dialects of Italy (source: Wikimedia Commons)

But one topic in particular has recently been brought to our attention (I’m a Wikipedia editor myself): the “dialects of Italy.”

The problem is twofold: on the one hand, “dialects of Italy” refers to heterogeneous entities and, on the other, there is no agreement on how to define a “dialect” (or a “language,” for that matter; see Cysouw and Good’s article about this). This is further complicated by the deep-rooted misunderstanding of linguistic facts and concepts among the general public.

All of this, together with the “poor sourcing” problem, led to chaos: see the lengthy discussion about what to call a dialect and what to call a language (the title Un argomento cruciale, ‘A critical topic,’ should be enough to convey the seriousness of the matter).

Some time ago, a decision was made to use the term “language” for any linguistic entity that has an ISO 639-3 code (to be precise, the codes come from Ethnologue, published by SIL), and “dialect” for entities without a code. This created an inconvenient precedent: since no Italian scientific source uses labels like “Piedmontese language” (lingua piemontese) or “Calabrian language” (lingua calabrese), Wikipedia became a primary source for this usage, thus contradicting one of the main tenets of the open encyclopaedia, namely no original research (NOR).

The reaction of laypeople is: “Wikipedia calls it a language, so it must be one.” This is backed up by the (mistaken but widespread) notion of a dialect as a “lower and/or corrupted variety of Italian,” further strengthened by the idea that Wikipedia is authoritative with no exceptions. But let me stress it again: no academic source labels the linguistic entities of Italy mentioned above as “languages.” Sorry folks.

But why is that? Well, tradition. The dialects of Italy have been called that for a long time, and it would be extremely difficult to ask Italian linguists to stop calling them that. However, no (sensible) linguist would consider them corruptions of the Italian language.

As an emblematic example, Loporcaro (2009:4–5) succinctly gives us an interesting perspective: “Derivando indipendentemente dal latino, i dialetti come il padovano, il napoletano ecc. sono lingue sorelle dell’italiano.” (translation: “Deriving independently from Latin, dialects such as Paduan, Neapolitan, etc. are sister languages of Italian.”). So Loporcaro is aware (no doubt he would be) that what are normally labelled “dialects” are languages. What else could they be? Technically speaking, any linguistic system is a “language.” However, he goes on calling them “dialects.” The point is, no one has ever referred to the dialects of Italy with the apposition “language”: so no “Paduan language,” and no “Neapolitan language.” Hence, this usage in Wikipedia is totally unjustified.

Luckily, the trend in the encyclopaedia of misusing ISO 639 is being counteracted, and some Wikipedians are making an effort to bring the situation back under control. In the meantime, read Cysouw and Good’s article mentioned earlier, if you haven’t already.


Cysouw, Michael & Jeff Good. 2013. Languoid, doculect, and glossonym: Formalizing the notion ‘language’. Language Documentation and Conservation 7. 331–359.

Loporcaro, Michele. 2009. Profilo linguistico dei dialetti italiani. Bari: Laterza.