# ePark

### Overview

[The ePark corpus](https://web.klokah.tw/) is an interactive resource for the preservation, learning, and revitalization of the Indigenous languages of Taiwan. It was developed by the Indigenous Languages Research and Development Foundation ([ILRDF](https://www.ilrdf.org.tw/)). The platform serves a wide audience, including preschoolers, students, adults, and language teachers, with resources and tools for a range of learning levels and linguistic goals. The corpus covers all 42 officially recognized dialects of the 16 Formosan languages. It includes text, audio recordings, and translations, which makes it invaluable for documenting linguistic diversity and for research, education, and language revitalization.

***

### Corpus Statistics

|                           | <p>Amis<br>Coastal</p> | <p>Amis<br>Hengchun</p> | <p>Amis<br>Malan</p> | <p>Amis<br>Southern</p> | <p>Amis<br>Xiuguluan</p> | <p>Bunun<br>Junqun</p> | <p>Bunun<br>Kaqun</p> | <p>Bunun<br>Luanqun</p> | <p>Bunun<br>Tanqun</p> | <p>Bunun<br>Zhuoqun</p> | Kavalan | <p>Rukai<br>Dawu</p> | <p>Rukai<br>Dona</p> | <p>Rukai<br>Eastern</p> | <p>Rukai<br>Maolin</p> | <p>Rukai<br>Wanshan</p> | <p>Rukai<br>Wutai</p> | <p>Paiwan<br>Central</p> | <p>Paiwan<br>Eastern</p> | <p>Paiwan<br>Northern</p> | <p>Paiwan<br>Southern</p> | <p>Puyuma<br>Jianhe</p> | <p>Puyuma<br>Nanwang</p> | <p>Puyuma<br>Xiqun</p> | <p>Puyuma<br>Zhiben</p> | Thao   | Saaroa | Sakizaya | Yami   | <p>Atayal<br>FourSeasons</p> | <p>Atayal<br>Sekolik</p> | <p>Atayal<br>Wanda</p> | <p>Atayal<br>Wenshui</p> | <p>Atayal<br>YilanZeaol</p> | <p>Atayal<br>Zeaol</p> | <p>Seediq<br>DeluValley</p> | <p>Seediq<br>Duda</p> | <p>Seediq<br>Tegudaya</p> | Truku  | Tsou   | Kanakanabu | Saisiyat |
| ------------------------- | ---------------------- | ----------------------- | -------------------- | ----------------------- | ------------------------ | ---------------------- | --------------------- | ----------------------- | ---------------------- | ----------------------- | ------- | -------------------- | -------------------- | ----------------------- | ---------------------- | ----------------------- | --------------------- | ------------------------ | ------------------------ | ------------------------- | ------------------------- | ----------------------- | ------------------------ | ---------------------- | ----------------------- | ------ | ------ | -------- | ------ | ---------------------------- | ------------------------ | ---------------------- | ------------------------ | --------------------------- | ---------------------- | --------------------------- | --------------------- | ------------------------- | ------ | ------ | ---------- | -------- |
| Word count                | 58,782                 | 36,750                  | 36,137               | 36,328                  | 37,157                   | 52,951                 | 29,654                | 30,595                  | 29,721                 | 28,834                  | 55,427  | 29,992               | 28,565               | 30,151                  | 27,693                 | 24,350                  | 54,316                | 34,761                   | 35,999                   | 59,628                    | 38,127                    | 35,383                  | 65,251                   | 37,157                 | 39,643                  | 55,061 | 42,794 | 53,556   | 63,454 | 38,172                       | 59,304                   | 31,056                 | 37,004                   | 40,229                      | 39,479                 | 39,861                      | 41,788                | 57,748                    | 56,374 | 56,625 | 48,937     | 50,597   |
| Total audio               | 13.4h                  | 10.2h                   | 8.4h                 | 9.2h                    | 9.9h                     | 16.2h                  | 8.0h                  | 7.9h                    | 8.6h                   | 8.1h                    | 13.1h   | 9.5h                 | 8.9h                 | 9.4h                    | 8.8h                   | 8.4h                    | 15.9h                 | 9.4h                     | 10.0h                    | 13.9h                     | 10.5h                     | 9.4h                    | 19.2h                    | 9.9h                   | 10.8h                   | 11.5h  | 16.0h  | 14.0h    | 13.9h  | 9.2h                         | 14.0h                    | 8.8h                   | 9.5h                     | 10.3h                       | 9.9h                   | 11.7h                       | 10.1h                 | 12.4h                     | 13.7h  | 15.2h  | 16.1h      | 12.5h    |
| Transcribed               | 13.4h                  | 10.2h                   | 8.4h                 | 9.2h                    | 9.9h                     | 16.2h                  | 8.0h                  | 7.9h                    | 8.6h                   | 8.1h                    | 13.1h   | 9.5h                 | 8.9h                 | 9.4h                    | 8.8h                   | 8.4h                    | 15.9h                 | 9.4h                     | 10.0h                    | 13.9h                     | 10.5h                     | 9.4h                    | 19.2h                    | 9.9h                   | 10.8h                   | 11.5h  | 16.0h  | 14.0h    | 13.9h  | 9.2h                         | 14.0h                    | 8.8h                   | 9.5h                     | 10.3h                       | 9.9h                   | 11.7h                       | 10.1h                 | 12.4h                     | 13.7h  | 15.2h  | 16.1h      | 12.5h    |
| Untranscribed             | 0                      | 0                       | 0                    | 0                       | 0                        | 0                      | 0                     | 0                       | 0                      | 0                       | 0       | 0                    | 0                    | 0                       | 0                      | 0                       | 0                     | 0                        | 0                        | 0                         | 0                         | 0                       | 0                        | 0                      | 0                       | 0      | 0      | 0        | 0      | 0                            | 0                        | 0                      | 0                        | 0                           | 0                      | 0                           | 0                     | 0                         | 0      | 0      | 0          | 0        |
| Translated words          |                        |                         |                      |                         |                          |                        |                       |                         |                        |                         |         |                      |                      |                         |                        |                         |                       |                          |                          |                           |                           |                         |                          |                        |                         |        |        |          |        |                              |                          |                        |                          |                             |                        |                             |                       |                           |        |        |            |          |
| English                   | 9,729                  | 9,440                   | 9,306                | 10,325                  | 9,341                    | 8,074                  | 8,113                 | 8,824                   | 8,121                  | 8,540                   | 8,867   | 7,573                | 7,949                | 8,185                   | 7,318                  | 6,526                   | 7,547                 | 10,307                   | 9,304                    | 10,004                    | 9,833                     | 9,224                   | 9,436                    | 9,889                  | 10,142                  | 9,302  | 7,646  | 9,095    | 9,653  | 9,457                        | 10,216                   | 9,144                  | 10,613                   | 9,497                       | 9,717                  | 9,343                       | 10,020                | 8,909                     | 9,392  | 9,166  | 7,695      | 8,959    |
| Mandarin                  | 58,782                 | 36,750                  | 36,137               | 35,995                  | 36,815                   | 52,951                 | 29,654                | 30,595                  | 29,721                 | 28,812                  | 55,414  | 29,992               | 28,247               | 30,115                  | 27,529                 | 24,299                  | 54,302                | 34,761                   | 35,822                   | 59,417                    | 38,073                    | 35,365                  | 65,175                   | 37,149                 | 39,643                  | 55,061 | 42,794 | 53,356   | 63,454 | 38,138                       | 58,972                   | 31,008                 | 37,004                   | 40,229                      | 39,479                 | 39,740                      | 41,723                | 57,748                    | 56,374 | 56,621 | 48,931     | 50,581   |
| Morphologically segmented | 0                      | 0                       | 0                    | 0                       | 0                        | 0                      | 0                     | 0                       | 0                      | 0                       | 0       | 0                    | 0                    | 0                       | 0                      | 0                       | 0                     | 0                        | 0                        | 0                         | 0                         | 0                       | 0                        | 0                      | 0                       | 0      | 0      | 0        | 0      | 0                            | 0                        | 0                      | 0                        | 0                           | 0                      | 0                           | 0                     | 0                         | 0      | 0      | 0          | 0        |
| Glossed words             | 0                      | 0                       | 0                    | 0                       | 0                        | 0                      | 0                     | 0                       | 0                      | 0                       | 0       | 0                    | 0                    | 0                       | 0                      | 0                       | 0                     | 0                        | 0                        | 0                         | 0                         | 0                       | 0                        | 0                      | 0                       | 0      | 0      | 0        | 0      | 0                            | 0                        | 0                      | 0                        | 0                           | 0                      | 0                           | 0                     | 0                         | 0      | 0      | 0          | 0        |
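
The table reports figures per dialect only. If you need corpus-wide totals, you can sum the cells of a row. The sketch below hard-codes the "Word count" and "Total audio" rows exactly as they appear in the table. The names `parse_cell`, `row_total`, `WORD_COUNT`, and `TOTAL_AUDIO` are illustrative and are not part of the FormosanBank code.

```python
# Illustrative helper (not FormosanBank code): sum a row of the statistics table.

# Cells copied from the "Word count" row, in table column order (42 dialects).
WORD_COUNT = """
58,782 36,750 36,137 36,328 37,157 52,951 29,654 30,595 29,721 28,834
55,427 29,992 28,565 30,151 27,693 24,350 54,316 34,761 35,999 59,628
38,127 35,383 65,251 37,157 39,643 55,061 42,794 53,556 63,454 38,172
59,304 31,056 37,004 40,229 39,479 39,861 41,788 57,748 56,374 56,625
48,937 50,597
""".split()

# Cells copied from the "Total audio" row, in the same column order.
TOTAL_AUDIO = """
13.4h 10.2h 8.4h 9.2h 9.9h 16.2h 8.0h 7.9h 8.6h 8.1h
13.1h 9.5h 8.9h 9.4h 8.8h 8.4h 15.9h 9.4h 10.0h 13.9h
10.5h 9.4h 19.2h 9.9h 10.8h 11.5h 16.0h 14.0h 13.9h 9.2h
14.0h 8.8h 9.5h 10.3h 9.9h 11.7h 10.1h 12.4h 13.7h 15.2h
16.1h 12.5h
""".split()


def parse_cell(cell: str) -> float:
    """Convert a table cell such as "58,782" or "13.4h" to a number."""
    cell = cell.strip()
    if cell.endswith("h"):  # audio durations are given in hours
        return float(cell[:-1])
    return float(cell.replace(",", ""))  # drop thousands separators


def row_total(cells: list[str]) -> float:
    """Sum every cell of one table row."""
    return sum(parse_cell(c) for c in cells)


total_words = int(row_total(WORD_COUNT))
total_hours = round(row_total(TOTAL_AUDIO), 1)  # round away float noise
print(f"{total_words:,} words, {total_hours} h of audio across {len(WORD_COUNT)} dialects")
# prints: 1,785,391 words, 475.8 h of audio across 42 dialects
```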

***

### Access Details

* Visit the ePark corpus online platform at <https://web.klokah.tw/>
* The FormosanBank version of the ePark corpus, along with the code used to reconstruct it, is available [here](https://github.com/FormosanBank/FormosanBank/tree/main/Corpora/ePark).

***

### Acknowledgments

The ePark corpus was developed through a collaboration between ILRDF, educators, linguists, and Indigenous communities. Without their tremendous effort, FormosanBank could not include such a valuable resource.

***

### Corpus Notes

The corpus appears to use the standard orthography ("Ortho113" in FormosanBank nomenclature), with some exceptions:

* Ortho113 specifies that only the Nanshi dialect uses `u`, while all other dialects use `o`. However, both `o` and `u` occur frequently throughout the corpus, irrespective of dialect. The `standard` tier normalizes these spellings to match the standard orthography. For the IPA transcriptions of the `original` tier, we treat `o` and `u` as representing the same phoneme.
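
The o/u handling above can be sketched as a simple character substitution. This is an assumption. The actual FormosanBank normalization may handle more cases. The dialect labels `"Nanshi"` and `"Coastal"`, the function names, and the strings `kumaen`/`komaen` are hypothetical test inputs chosen for illustration.

```python
# Sketch only (assumption): o/u normalization as a plain character substitution.
# The real FormosanBank code may differ; all names here are hypothetical.
NANSHI = "Nanshi"  # hypothetical dialect label

# Translation tables built once with str.maketrans.
U_TO_O = str.maketrans({"u": "o", "U": "O"})
O_TO_U = str.maketrans({"o": "u", "O": "U"})


def normalize_o_u(text: str, dialect: str) -> str:
    """Rewrite o/u as the letter Ortho113 prescribes for `dialect`."""
    # Ortho113: Nanshi writes `u`; every other dialect writes `o`.
    table = O_TO_U if dialect == NANSHI else U_TO_O
    return text.translate(table)


def same_phoneme(a: str, b: str) -> bool:
    """Compare two spellings while treating `o` and `u` as one phoneme."""
    # Fold both letters to `o` so that only the phoneme matters.
    return a.translate(U_TO_O) == b.translate(U_TO_O)


print(normalize_o_u("kumaen", "Coastal"))  # komaen
print(normalize_o_u("komaen", NANSHI))     # kumaen
print(same_phoneme("kumaen", "komaen"))    # True
```

A translation table keeps the substitution to a single pass over each string. It only holds if `o` and `u` never appear inside multi-letter graphemes, which this sketch assumes.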

***

### Copyright

The content of the ePark corpus is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license (CC BY-NC-SA 4.0). Further information is available at <https://web.klokah.tw/creativeCommons/>.

***

### Citation

In accordance with our [Terms of Use](https://ai4commsci.gitbook.io/formosanbank/additional-resources/terms-of-use), if you use this corpus or any product derived from this corpus in any publication, you must cite both FormosanBank and:

* Indigenous Languages Research and Development Foundation. (2020). *族語E樂園*. <https://web.klokah.tw/>
