
Paiwan Stories

Overview

The Paiwan Stories corpus is a small collection of children’s stories told in Eastern Paiwan. Drawn from three storybooks, these narratives provide valuable cultural and linguistic insights into the Paiwan language as it is used in accessible, community-oriented contexts. Their inclusion in FormosanBank broadens the range of available materials, offering a glimpse into traditional knowledge, everyday life, and childhood learning in a Paiwan-speaking environment.

Source Materials

  • giling, gesi & giling tjaiwan (2020) vuvu 的寶物 / kavatjes ni vuvu

  • giling, gesi & giling tjaiwan (2021) dingding蝸牛

  • giling, gesi & giling tjaiwan (2022) maljialjian a qaciljay


Corpus Processing

The Paiwan Stories were integrated into FormosanBank’s standardized XML format to ensure consistency and ease of access.

Processing Notes

  • Manual Conversion: Because the texts are short and narratively complex, and because the source material is in PDF, which is difficult to parse reliably, the XML was created by hand rather than through automated scripts.

  • Cleaning and Standardization: No separate pass was necessary, because the text was standardized during manual conversion.

  • Minimal Editing: Orthographic and formatting adjustments were kept to a minimum, preserving the authors’ original representations of the language. Light cleaning, such as removing empty elements and standardizing punctuation, was applied to maintain data consistency.
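The light cleaning described above (removing empty elements and standardizing punctuation) could look roughly like the following minimal sketch. It is not the project's actual code: the element names `TEXT`, `S`, `FORM`, and `TRANSL`, the helper names, and the punctuation map (curly double quotes to straight quotes) are all assumptions made for illustration, since the page does not state the exact rules that were applied.

```python
import xml.etree.ElementTree as ET

# Hypothetical punctuation map: only curly double quotes are normalized here.
# The real standardization rules are not given in the documentation.
PUNCT_MAP = str.maketrans({"\u201c": '"', "\u201d": '"'})


def is_empty(elem):
    """True when an element has no children, no attributes, and no visible text."""
    return len(elem) == 0 and not elem.attrib and not (elem.text or "").strip()


def prune_empty(parent):
    """Remove empty descendants bottom-up and return how many were removed.

    Note: ElementTree drops a removed element's tail text along with it.
    """
    removed = 0
    for child in list(parent):  # copy the child list so removal is safe
        removed += prune_empty(child)
        if is_empty(child):
            parent.remove(child)
            removed += 1
    return removed


def normalize_punctuation(root):
    """Apply the (assumed) punctuation map to the text of every element."""
    for elem in root.iter():
        if elem.text:
            elem.text = elem.text.translate(PUNCT_MAP)


# Assumed sample document; the tag names are illustrative only.
sample = (
    '<TEXT><S id="1"><FORM>\u201ckavatjes ni vuvu\u201d</FORM>'
    "<TRANSL></TRANSL></S><S></S></TEXT>"
)
root = ET.fromstring(sample)
removed = prune_empty(root)
normalize_punctuation(root)
print(removed, ET.tostring(root, encoding="unicode"))
```

Pruning bottom-up means a sentence element that contains only empty children is itself removed in the same pass, while elements that carry attributes are kept even when they have no text.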


Access Details

The repo containing the original stories, the XML corpus in FormosanBank, and the code to reconstruct the corpus can be found here.


Copyrights

The stories can be used under the Creative Commons CC BY-NC license.


Citation

In accordance with our Terms of Use, if you use this corpus or any product derived from this corpus in any publication, you must cite both FormosanBank and:

  • Juan, T. F., & Ruan, X. (2024). Corpus of Paiwan stories [Electronic resource].

Last updated 4 months ago