October 2024
Welcome to the October 2024 edition of the FormosanBank newsletter! We’ve made significant strides in our mission to preserve and revitalize the Indigenous Formosan languages. The FormosanBank project officially began on September 1, 2023. Here’s a quick summary of our latest developments:
Project Progress
We’re excited to report that the FormosanBank corpus now contains over 6 million words and 360 hours of audio, covering a wide range of linguistic data. We’re in the final stages of quality control and are preparing for the first official release.
Advances in Machine Translation
Our initial machine translation efforts between English, Amis, and Paiwan languages have yielded promising results. BLEU scores indicate usable translations, and we’re working on acquiring more training materials to improve accuracy further.
Automatic Speech Recognition (ASR)
ASR technology continues to progress, with preliminary models achieving an error rate of under 50% for several languages, a notable milestone in endangered language transcription. We’re working with additional data to refine and enhance these results.
Historical Resource Digitization
We’ve made strides in digitizing historical records, including OCR adaptations for Siraya. These efforts are part of our mission to make archival materials accessible for both research and education.
Computational Linguistics Research
We’ve started computational studies on sentence structures and grammatical voice systems across Formosan languages, with initial findings expected soon.
For a more detailed overview, please find the complete October 2024 newsletter PDFs below:
Stay tuned for more updates as we continue expanding FormosanBank and advancing language preservation through innovative technology!