TuDeT: Tupían Dependency Treebank

TuDeT (Tupían Dependency Treebank) is a collection of treebanks of Tupían languages that were originally part of the Universal Dependencies (UD) project. This database displays each sentence together with a visualization of its dependency structure, its glosses, and translations into English and Portuguese. The main goals include using the treebanks to develop pedagogical materials, including digital ones, together with Indigenous teachers; supporting the documentation, maintenance, and revitalization of the languages; and using the treebanks for linguistic analysis.

The treebanks are at an early stage of development, in which sentences are being annotated manually. Tools are being developed to speed up part-of-speech (POS) tagging and to allow partial automatic annotation. For detailed information about each treebank, its annotations, and its sources, see the GitHub repositories linked in the table below.

Language    Treebank in TuDeT   Number of sentences  Repository on GitHub
Akuntsu     Akuntsu Treebank    243                  https://github.com/UniversalDependencies/UD_Akuntsu-TuDeT/
Guajajara   Guajajara Treebank  1126                 https://github.com/UniversalDependencies/UD_Guajajara-TuDeT/
Ka'apor     Ka'apor Treebank    83                   https://github.com/UniversalDependencies/UD_Kaapor-TuDeT/
Karo        Karo Treebank       674                  https://github.com/UniversalDependencies/UD_Karo-TuDeT/
Makurap     Makurap Treebank    31                   https://github.com/UniversalDependencies/UD_Makurap-TuDeT/
Mundurukú   Mundurukú Treebank  158                  https://github.com/UniversalDependencies/UD_Munduruku-TuDeT/
Teko        Teko Treebank       100                  https://github.com/UniversalDependencies/UD_Teko-TuDeT/
Tupinambá   Tupinambá Treebank  546                  https://github.com/UniversalDependencies/UD_Tupinamba-TuDeT/
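
Universal Dependencies treebanks are distributed in the CoNLL-U format: one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with "#", and a blank line after each sentence. The Python sketch below illustrates one way to work with such files. It splits CoNLL-U text into sentences, which is enough to count them, and builds a most-frequent-tag lookup that could pre-fill UPOS tags for later manual correction. This is a generic baseline for partial automatic annotation, not the TuDeT project's own tooling. The function names parse_conllu, train_most_frequent_upos and pretag, as well as the SAMPLE data with placeholder forms w1, w2, w3, are hypothetical and are not taken from any TuDeT file.

```python
from collections import Counter, defaultdict

# Hypothetical placeholder data in CoNLL-U layout (not taken from any TuDeT file).
SAMPLE = """# sent_id = 1
# text = w1 w2
1\tw1\t_\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tw2\t_\tVERB\t_\t_\t0\troot\t_\t_

# sent_id = 2
# text = w1 w3
1\tw1\t_\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tw3\t_\tVERB\t_\t_\t0\troot\t_\t_
"""


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments, e.g. "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        # HEAD is kept as a string so unannotated "_" values do not raise.
        current.append({"id": int(cols[0]), "form": cols[1], "lemma": cols[2],
                        "upos": cols[3], "head": cols[6], "deprel": cols[7]})
    if current:  # the text may not end with a blank line
        sentences.append(current)
    return sentences


def train_most_frequent_upos(sentences):
    """Map each lowercased word form to the UPOS tag it carries most often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for token in sentence:
            counts[token["form"].lower()][token["upos"]] += 1
    return {form: tags.most_common(1)[0][0] for form, tags in counts.items()}


def pretag(forms, lexicon, unknown="_"):
    """Suggest a UPOS tag per form; unseen forms get `unknown` for manual review."""
    return [lexicon.get(form.lower(), unknown) for form in forms]


if __name__ == "__main__":
    sentences = parse_conllu(SAMPLE)
    print(len(sentences))  # 2
    lexicon = train_most_frequent_upos(sentences)
    print(pretag(["w1", "w4"], lexicon))  # ['PRON', '_']
```

To try it on a real treebank, one could pass the contents of a downloaded .conllu file, for example open(path, encoding="utf-8").read() with path as a placeholder, to parse_conllu. Lowercasing forms is a design choice that merges capitalized and lowercase occurrences of the same word. Forms never seen during training receive "_", which leaves them for an annotator to tag by hand.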