Abstract
There is a small but growing literature on large-scale statistical modeling of Chinese-language texts. Ouyang analyzed a corpus of over 40,000 ancient documents downloaded from multiple sources, using it to plot the temporal distributions of word frequencies and the geographic distributions of authors. Huang and Yu modeled the SongCi poetry corpus, first converting it to tonally marked pinyin to conserve poetically important pronunciation information. Nichols and colleagues reported initial modeling of the Chinese Text Project corpus¹ in a conference paper. (Further below, we describe differences between this corpus and the Handian.) With additional collaborators, this group has now conducted two studies that are currently unpublished but under review. In the first, they apply topic models to address scholarly questions about the relationships among important texts of ancient Chinese philosophy. In the second, they use topic modeling to investigate the concepts of mind and body in ancient Chinese philosophy. Although we share similar scholarly objectives with these researchers, our approach in this paper is unique in that, for the first time anywhere, we bring the benefits of computational modeling of ancient Chinese texts to a robust public platform that is mirrored on both sides of the Pacific. Beyond being a useful portal to the texts, our approach foregrounds the interpretive issues surrounding topic models, and makes more sophisticated exploration and analysis of interpretive questions possible for experts and novices alike.
How to Cite:
Allen, C., Luo, H., Murdock, J., Pu, J., Wang, X., Zhai, Y. & Zhao, K., (2017) “Topic Modeling the Hàn diăn Ancient Classics (汉典古籍)”, Journal of Cultural Analytics 2(1). https://doi.org/10.22148/001c.11882