# Whitehorn Collection

## **Overview**

These are audio recordings contributed by John Whitehorn to the World Oral Literature Project: <https://www.repository.cam.ac.uk/collections/1240188b-2f4c-401e-9372-58177440d9c6>. Most of the recordings were collected in the 1960s. Recording dates are available in the PDFs (found in the repository) but have not yet been incorporated into the XMLs.

The recordings are not transcribed, and many are songs.

Most of the recordings are in Paiwan, with some in Atayal, Seediq, and Amis.

***

## Corpus Statistics

|                           | <p>Amis<br>unknown</p> | <p>Paiwan<br>North Western</p> | <p>Paiwan<br>Southern</p> | <p>Paiwan<br>unknown</p> | <p>Atayal<br>unknown</p> | <p>Seediq<br>unknown</p> |
| ------------------------- | ---------------------- | ------------------------------ | ------------------------- | ------------------------ | ------------------------ | ------------------------ |
| Word count                | 0                      | 0                              | 0                         | 0                        | 0                        | 0                        |
| Total audio               | 0.2h                   | 7.9h                           | 0.1h                      | 4.5h                     | 0.5h                     | 0.1h                     |
| Transcribed               | 0                      | 0                              | 0                         | 0                        | 0                        | 0                        |
| Untranscribed             | 0.2h                   | 7.9h                           | 0.1h                      | 4.5h                     | 0.5h                     | 0.1h                     |
| Translated words          |                        |                                |                           |                          |                          |                          |
| English                   | 0                      | 0                              | 0                         | 0                        | 0                        | 0                        |
| Mandarin                  | 0                      | 0                              | 0                         | 0                        | 0                        | 0                        |
| Morphologically segmented | 0                      | 0                              | 0                         | 0                        | 0                        | 0                        |
| Glossed words             | 0                      | 0                              | 0                         | 0                        | 0                        | 0                        |
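
As a quick check on the "Total audio" row, the per-column hours can be summed by language. The sketch below is only illustrative: the dictionary is copied by hand from the table above, the helper name `hours_by_language` is hypothetical, and because the table values are rounded to 0.1h the totals are approximate.

```python
# Audio hours per (language, dialect) column, copied by hand from the
# "Total audio" row of the table above. Values are rounded to 0.1h there,
# so the sums below are approximate.
audio_hours = {
    ("Amis", "unknown"): 0.2,
    ("Paiwan", "North Western"): 7.9,
    ("Paiwan", "Southern"): 0.1,
    ("Paiwan", "unknown"): 4.5,
    ("Atayal", "unknown"): 0.5,
    ("Seediq", "unknown"): 0.1,
}


def hours_by_language(table):
    """Sum hours across dialect columns for each language (hypothetical helper)."""
    totals = {}
    for (language, _dialect), hours in table.items():
        totals[language] = totals.get(language, 0.0) + hours
    # Round to one decimal place to match the precision of the table.
    return {language: round(hours, 1) for language, hours in totals.items()}


totals = hours_by_language(audio_hours)
total = round(sum(audio_hours.values()), 1)
paiwan_share = totals["Paiwan"] / total

print(totals)
print(f"Total audio: {total}h, Paiwan share: {paiwan_share:.0%}")
```

Under these figures, Paiwan accounts for roughly 12.5 of about 13.3 hours of audio, all of it untranscribed.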

***

## **Corpus Processing**

N/A

***

## **Corpus Notes**

* Recording quality is about what you would expect for recordings of this age (most date from the 1960s).
* Whitehorn sometimes specifies dialects, but his labels do not follow the modern dialect distinctions. These labels will need to be standardized where possible.

***

## **Access Details**

* The Whitehorn Collection in FormosanBank, along with PDFs of Whitehorn's notes, can be found in the [FormosanBank GitHub repository](https://github.com/FormosanBank/FormosanBank/tree/main/Corpora/Whitehorn_Collection).

***

## **Copyright**

CC BY-NC 4.0

***

## Citation

In accordance with our [Terms of Use](/formosanbank/additional-resources/terms-of-use.md), if you use this corpus or any product derived from this corpus in any publication, you must cite both FormosanBank and:

* Whitehorn, J. (n.d.). Whitehorn: Paiwan collection. Retrieved February 24, 2025, from <https://www.repository.cam.ac.uk/collections/1240188b-2f4c-401e-9372-58177440d9c6> and <https://www.repository.cam.ac.uk/collections/dc1f2cd8-36da-48e1-979b-fe478eff6a91>
