# Yedda Palemeq Blog

### Overview

Yedda Palemeq, a Paiwan speaker and linguist, maintained a [blog](https://yeddapalemeq.blogspot.com/) for several years on which she recorded and glossed short Paiwan texts, usually just one or two sentences each. In aggregate, this is a substantial amount of material. She drew many of the examples from texts that are already included in FormosanBank, so deduplication will remove some of this text. The recordings, however, are all novel and do not appear elsewhere.

***

### Corpus Statistics

|                           | <p>Paiwan<br>Southern</p> |
| ------------------------- | ------------------------- |
| Word count                | 5,878                     |
| Total audio               | 1.2h                      |
| Transcribed               | 1.2h                      |
| Untranscribed             | 0h                        |
| Translated words          |                           |
| English                   | 5,863                     |
| Mandarin                  | 0                         |
| Morphologically segmented | 5,641                     |
| Glossed words             | 0                         |

***

### **Access Details**

* The repo containing the Yedda Palemeq Blog corpus in FormosanBank, as well as the code to reconstruct the corpus, can be found [here](https://github.com/FormosanBank/FormosanBank/tree/main/Corpora/YeddaPalemeqBlog).

***

### **Corpus Notes**

* The scrape was not completely successful, so a few blog posts are not included.
* Sometimes the same word is spelled differently in the main text and in the glosses. In those cases, we assume that the main text is correct and the gloss is not, and the gloss is ignored.
* Examples are not necessarily glossed word by word; repeated words and some common function words are not always listed. We attempted to match glosses automatically within each specific example; we do not reuse glosses from other examples, since we cannot be sure that they are contextually correct. If no gloss can be found, the word is simply copied from the example and no gloss is provided (there is no gloss element).
* Words are segmented in the glosses, and these segments are preserved in the corresponding elements. Note that the author did not provide morpheme-by-morpheme glosses, so individual morphemes are not glossed. Note also that if we could not find a gloss for a word, it is assumed to be monomorphemic. This is almost certainly incorrect in some cases.
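The matching rules in these notes can be sketched roughly as follows. This is a minimal illustration, not the FormosanBank reconstruction code: it assumes each example arrives as a plain sentence plus a list of (hyphen-segmented form, English gloss) pairs taken from that same example, and the names `align_example`, `WordEntry`, and `normalize`, as well as the hyphen and punctuation conventions, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical punctuation set stripped from word edges before matching.
PUNCT = ".,?!;:\"'"


@dataclass
class WordEntry:
    form: str               # surface form, always taken from the main text
    morphemes: list[str]    # segments from the gloss, or [form] if unglossed
    gloss: Optional[str]    # English gloss, or None when no match was found


def normalize(token: str) -> str:
    """Lowercase and drop segmentation hyphens so 'aa-bb' matches 'aabb'."""
    return token.lower().replace("-", "").strip(PUNCT)


def align_example(sentence: str, glosses: list[tuple[str, str]]) -> list[WordEntry]:
    """Match each word of one example against glosses from that example only."""
    # Index this example's glosses by normalized form. Glosses from other
    # examples are never consulted, since they may not fit this context.
    lookup: dict[str, tuple[str, str]] = {}
    for segmented, english in glosses:
        lookup.setdefault(normalize(segmented), (segmented, english))

    entries: list[WordEntry] = []
    for raw in sentence.split():
        surface = raw.strip(PUNCT)
        if not surface:
            continue  # skip tokens that were pure punctuation
        match = lookup.get(normalize(surface))
        if match is None:
            # No gloss, or a gloss spelled differently from the main text
            # (the main text wins): copy the word, leave it unglossed,
            # and treat it as a single morpheme.
            entries.append(WordEntry(form=surface, morphemes=[surface], gloss=None))
        else:
            segmented, english = match
            entries.append(
                WordEntry(form=surface, morphemes=segmented.split("-"), gloss=english)
            )
    return entries


if __name__ == "__main__":
    # Placeholder tokens, not real Paiwan data. "dx" stands in for a gloss
    # whose spelling disagrees with the main-text word "dd".
    for entry in align_example(
        "aabb cc dd.", [("aa-bb", "first"), ("cc", "second"), ("dx", "third")]
    ):
        print(entry)
```

Keying the lookup on a normalized form rather than on word position means that repeated words and glosses listed out of order can still match. A gloss spelled differently from the main text simply fails to match and is dropped, and the word falls back to being unglossed and monomorphemic.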

***

### Copyright

CC BY-NC 4.0

***

### Citation

In accordance with our [Terms of Use](/formosanbank/additional-resources/terms-of-use.md), if you use this corpus or any product derived from this corpus in any publication, you must cite both FormosanBank and:

* Palemeq, Y. (2021). Yedda Palemeq. Retrieved May 19, 2026, from <https://yeddapalemeq.blogspot.com/>