A Structured Review of the Validity of BLEU – Ehud Reiter

Posted Online September 21, 2018
https://doi.org/10.1162/coli_a_00322
© 2018 Association for Computational Linguistics. Published under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.

Abstract: The BLEU metric has been widely used in NLP for over 15 years to evaluate NLP systems, especially in machine translation and natural language generation. I present a structured review of the evidence on whether BLEU is a valid evaluation technique (in other words, whether BLEU scores correlate with real-world utility and user satisfaction of NLP systems); this review covers 284 correlations reported in 34 papers. Overall,…

A free encyclopedia of linguistics modeled on Wikipedia

Glottopedia is a freely editable encyclopedia for linguists by linguists that is currently being built up. It will contain dictionary articles on all technical terms of linguistics and is multilingual. In addition, there are survey articles, biographical articles and language articles, potentially on all linguists and all languages. Glottopedia articles also exist in German, Spanish, Italian, French, Russian, Danish, Swedish, Chinese, Japanese, Norwegian (Nynorsk), and in the future hopefully also in many other languages.

Lexicon Enhancement via the GOLD Ontology

Lexicon Enhancement via the GOLD Ontology (LEGO) is a project funded by the National Science Foundation (BCS-0753321) to establish tools and standards to facilitate the sharing and interoperation of lexical data. It is implemented jointly by The LINGUIST List (currently at the Department of Linguistics at Indiana University, formerly at Eastern Michigan University) and The University at Buffalo.

By default, all the lexicons added to the database after February 2015 are covered by the following Creative Commons license: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0, see the full license).…

ODIN

ODIN, the Online Database of Interlinear text, is a repository of Interlinear Glossed Text (IGT) automatically extracted from scholarly linguistic papers. The repository is both broad-coverage, in that it contains data for a variety of the world’s languages (limited only by what data is available and what has been discovered), and rich, in that all data contained in the repository has been subject to linguistic analysis. IGT is a standard method of presenting language data, generally including at least a phonetic transcription of the language in question (line 1), a…
