Title page for etd-0624118-155432



URN etd-0624118-155432
Author Tou-Hsiang Hsu
Author's Email Address Not public.
Statistics This thesis has been viewed 5392 times and downloaded 498 times.
Department Information Management
Year 2017
Semester 2
Degree Master
Type of Document
Language English
Title A Research On Cross-Lingual Topic Analysis
Date of Defense 2018-07-23
Page Count 63
Keyword
  • Cross-lingual topic model
  • Topic modeling
  • Polylingual topic model
  • Parallel corpus
  • Word vector space
  • LDA
Abstract Most cross-lingual topic models in previous work rely on parallel or comparable corpora. The polylingual topic model (PLTM) proposed by Mimno et al. (2009) is the most representative one. However, parallel or comparable corpora such as Europarl and Wikipedia are not available in many cases. In this thesis, we propose a method that combines cross-lingual word vector space mapping with topic modeling (LDA). The cross-lingual word vector mapping lets us map the word vector space of one language onto that of another, and LDA groups words into topics. Combining the two techniques yields a cross-lingual topic model.
    In contrast to PLTM, our proposed approach does not need a comparable or parallel corpus to construct the cross-lingual topic model, and it can identify topics that are discussed in only one of the languages.
    We compare the performance of PLTM and our approach on UM-corpus (Tian et al., 2014), an English-Chinese bilingual corpus. The evaluation results show that our proposed approach aligns topics across languages properly and that its performance is comparable to that of PLTM.
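    The word vector mapping step described in the abstract can be illustrated with a toy sketch. The abstract does not say which mapping technique the thesis uses, so everything below is an assumption made for illustration: the mapping is taken to be a rotation learned from a small seed dictionary (a 2-D orthogonal Procrustes fit), and the embeddings, the seed pairs, and the helper names `fit_rotation`, `rotate`, `cosine`, and `translate` are all hypothetical. The LDA step is not shown.

```python
import math

def fit_rotation(src, tgt):
    """Angle of the 2-D rotation that best maps src points onto tgt points.

    Closed-form least-squares solution (2-D orthogonal Procrustes, rotation only).
    """
    s = sum(x1 * y2 - x2 * y1 for (x1, x2), (y1, y2) in zip(src, tgt))
    c = sum(x1 * y1 + x2 * y2 for (x1, x2), (y1, y2) in zip(src, tgt))
    return math.atan2(s, c)

def rotate(v, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    x, y = v
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def cosine(a, b):
    """Cosine similarity between two non-zero 2-D vectors."""
    dot = a[0] * b[0] + a[1] * b[1]
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical toy embeddings; real word vectors have hundreds of dimensions.
en = {"bank": (1.0, 0.0), "river": (0.0, 1.0), "money": (0.8, 0.6)}
zh = {"银行": (0.0, 1.0), "河": (-1.0, 0.0), "钱": (-0.6, 0.8)}

# Hypothetical seed dictionary of known translation pairs.
seed = [("bank", "银行"), ("river", "河")]

theta = fit_rotation([en[e] for e, _ in seed], [zh[z] for _, z in seed])

def translate(word):
    """Map an English vector into the Chinese space and return the nearest word."""
    mapped = rotate(en[word], theta)
    return max(zh, key=lambda z: cosine(mapped, zh[z]))

print(translate("money"))
```

    Once both vocabularies live in one space, topic words found in one language can be compared with words from the other by vector similarity, which is one plausible way the mapping and the topic model could be combined.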
Advisory Committee
  • Chih-Ping Wei - chair
  • Wen-Chun Ni - co-chair
  • San-Yih Hwang - advisor
Files
  • etd-0624118-155432.pdf
  • Access: worldwide
Date of Submission 2018-07-24
