Welcome

Welcome to FormosanBank, a large-scale, data-driven project dedicated to the preservation and revitalization of the Indigenous Formosan languages of Taiwan. These languages, which form a significant part of the Austronesian language family, are endangered, and some face the risk of extinction. Our mission is to create a comprehensive, machine-readable corpus of these languages to support linguistic research, language education, and revitalization efforts.

Here, you'll find a description of the corpus collected and processed across the 16 official Formosan languages, which includes over 8 million tokens and over 730 hours of audio (a detailed breakdown can be found here). You'll also find a detailed description of how the data is structured and the various ways to access it. The GitHub repository with all the work and data related to FormosanBank is available here, and the Drive folder with all the audio files is available here.

The large-scale nature of FormosanBank would not have been possible without the collaborative efforts of numerous individuals and organizations.

Principal Investigators

Advisory Board

And our many contributors.
