The Pangloss Collection: an archive of the world's languages
The Pangloss Collection[1] (https://pangloss.cnrs.fr) is an online multimedia archive of texts (audio or video recordings and their transcriptions) in 170 languages from around the world. Most of these languages are spoken by small communities, are often at risk of being abandoned by their speakers, and have no other surviving record. Today, almost 3,000 documents can be consulted freely online. In 2020, the interface is to be overhauled to make the archive accessible to the widest possible audience.
These records are set to play a major role in safeguarding the world's linguistic heritage. Less spectacular and less publicized than the loss of biodiversity, the loss of linguistic diversity threatens to drastically reduce the range of linguistic forms on which any linguistic generalization can be built: it is estimated that 50% of the languages spoken today will have disappeared by the end of this century, as their speakers adopt more widely spoken languages. Linguistic diversity is also one of the richest and most complex testimonies to human cognition. The Pangloss Collection therefore plays a major role in the race against time to document endangered languages before they disappear, and it relies on the work of numerous linguists who document living languages through fieldwork in the communities that speak them.
Created in 1995 and renamed the Pangloss Collection[2] in 2012, the archive has grown vigorously in recent years: between 2012 and 2017, the number of resources (audio or video files) more than doubled. In 2016, the Pangloss site was redesigned for ease of use, with an interactive map, a new search engine, an English version of the interface, and the archiving of video files. The archive now also offers electronic dictionaries alongside its text corpora (Lexica project[3]). From early on, the Pangloss Collection relied on technologies, such as the XML ecosystem, that are now at the heart of the digital humanities.
Initially, the collection held data from researchers at LACITO (UMR 7107, CNRS/Sorbonne Nouvelle-Paris 3/Inalco), where it was founded, but it now welcomes (and encourages) deposits from researchers of all affiliations. Efforts are under way to anchor the collection more firmly in all the laboratories in France that contribute to documenting linguistic diversity. Other large archives abroad also aggregate the results of linguistic research. What sets the Pangloss Collection apart is that it is freely accessible, without any restriction, and offers both multimedia recordings (audio and video) and interlinear, morpheme-by-morpheme transcriptions of entire texts. Because transcribed data are so easy to access, the Pangloss Collection is used in numerous scientific publications.
Sylvain Loiseau
Lecturer at the University of Paris XIII
Member of Lacito (Langues et civilisations à traditions orales) - UMR 7107
https://lacito.vjf.cnrs.fr/membres/loiseau.htm
Publications
Michailovsky, Boyd, Martine Mazaudon, Alexis Michaud, Séverine Guillaume, Alexandre François & Evangelia Adamou. 2014. "Documenting and Researching Endangered Languages: The Pangloss Collection". Language Documentation & Conservation, 8, 119-135. [http://hdl.handle.net/10125/4621]
Michaud, Alexis, Séverine Guillaume, Guillaume Jacques, Dang-Khoa Mac, Michel Jacobson et al. 2016. "Contributing to the united progress of research and documentation: the Pangloss Collection and the AuCo Collection". In Proceedings of the JEP-TALN-RECITAL 2016 joint conference, Volume 1: Journées d'Etude de la Parole, Paris, France, July 2016, 155-163. [https://halshs.archives-ouvertes.fr/halshs-01341631/document]
[1]https://pangloss.cnrs.fr/
[2]The name of Pangloss, the character in Voltaire's Candide, is composed of two Greek words meaning "all" and "language".
[3]http://lacito.vjf.cnrs.fr/pangloss/dictionaries