Next session of the Linguistic Theories and Data seminar

5 October 2022

Building the first digitalised learner corpus for Romanian (LECOR). Difficulties and challenges.
Preparing a critical edition of a 19th-century Romanian-Romani dictionary.

Building the first digitalised learner corpus for Romanian (LECOR).
Difficulties and challenges

Mihaela Cristescu, University of Bucharest
Carmen Mîrzea Vasile, University of Bucharest/"Iorgu Iordan - Al. Rosetti" Institute of Linguistics, Romanian Academy

A learner corpus for Romanian is about to be built at the University of Bucharest through the project Learner Corpus of Romanian (LECOR). Collection, Annotation and Applications (PN-III-P1-1.1-TE-2019-1066, funded by UEFISCDI, 2022-2024). The main goal of the LECOR project is to build and exploit the first digitalised learner corpus for Romanian, scalable and available in open-access format. The presentation will cover general administrative and scientific information about the project and the current status of its activities. Special attention will be paid to the various types of difficulties encountered: difficulties in gathering the learners' samples (written texts and spoken samples) and in recording learner and task variables (sociolinguistic information about the learner, the type of text and the circumstances in which it was produced); difficulties related to the morpho-syntactic and syntactic annotation (unclear text segments, errors falling into several categories, etc.); and ethical and motivational issues. The presentation will include various examples and will also try to show how valuable a corpus like LECOR is for research and, ultimately, for improving the teaching and learning of Romanian as a foreign language.

Preparing the critical edition of a nineteenth-century Romanian-Romani dictionary

Julieta Rotaru, INALCO
Aurore Tirard, INALCO

We present the work that led to the publication of a monograph on the beginnings of Romani lexicography in Romania (Rotaru, Tirard and Shapoval 2022). This is a critical edition of a Romanian-Romani dictionary written in the 1870s by Vasile Pogor (1833-1906), a descendant of an ancient Moldavian aristocratic family. Our edition includes a grammatical description of the linguistic material, biographical notes on the author and an extensive bibliography of his works. All entries in the Romanian-Romani dictionary have been translated into English. Two reverse dictionaries (Romani-English and English-Romani) have been added, covering the entries under review. The volume's authors have chosen to reject many entries that the Moldavian author copied from other, less credible sources, and to retain certain dubious entries, marking them with a question mark. From a linguistic point of view, preparing the critical edition posed a number of graphematic and phonological problems. It was very difficult to determine the dialect described by the dictionary because of the diversity of the author's sources: first-hand field data, but also material copied from other authors who had worked on very diverse dialects, such as Grellmann (1783), de Réart (1835), de Rochas (1876) and Vaillant (1844, 1868).