Title page for etd-0118121-163425
Title
Topic Diffusion Discovery based on Online Deep Non-negative Variational Autoencoder (基於在線式深度非負變分自編碼的主題演進探索)
Department
Year, semester
Language
Degree
Number of pages
55
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2021-01-28
Date of Submission
2021-02-18
Keywords
Network Analysis, Topic Evolution, Topic Modeling, Topic Diffusion, Deep Learning, Variational Autoencoder
Statistics
This thesis/dissertation has been browsed 470 times and downloaded 171 times.
Chinese Abstract
Information technology has changed the way people live. The ubiquity of computers and handheld mobile devices lets us transmit and absorb large amounts of information over the network at any time. This change in behavior, however, also means that people must digest an unmanageable volume of online data every day and cannot possibly understand all of it. Classification and keyword search can filter out the data a user wants, but as data volumes keep growing and content is updated day after day, clustering and classifying data purely by hand becomes not only more difficult but infeasible, so machine learning methods are increasingly used to assist with this work. For text, topic modeling is a well-known approach: using approximate document distributions or matrix factorization, it converts large collections into topics and has matured into an effective tool for assigning topics to document content. In practice, however, data and topics appear, change, and disappear as time advances. How to fully explain the process by which topics change is the topic modeling problem this thesis investigates.
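To make the matrix-factorization view of topic modeling mentioned above concrete, here is a minimal sketch using scikit-learn's NMF on a toy corpus. The documents, vocabulary, and topic count are illustrative assumptions only, not the data or code used in this thesis.

```python
# Illustrative only: classic NMF topic extraction, not the thesis's DNVAE.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [  # toy corpus; the thesis actually uses machine-learning papers
    "deep learning neural network training",
    "topic model latent dirichlet allocation",
    "variational autoencoder latent representation",
    "online learning streaming data updates",
]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)            # documents x terms

nmf = NMF(n_components=2, init="nndsvda", random_state=0)
W = nmf.fit_transform(X)                 # documents x topics (non-negative)
H = nmf.components_                      # topics x terms (non-negative)

terms = tfidf.get_feature_names_out()
for k, row in enumerate(H):
    top = row.argsort()[::-1][:3]        # three highest-weight terms per topic
    print(f"topic {k}:", [terms[i] for i in top])
```

Reading the rows of H as topic-term weights is exactly the "large collections into topics" conversion the paragraph describes.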
This thesis proposes the Deep Non-negative Variational Autoencoder (DNVAE) algorithm, combined with an online model, to discover topics that change over time. The corpus consists of papers on machine learning. The experimental results show that our method quickly finds the topics at each time point, and that topic network diagrams, heat maps, and distance measures then make it possible to explain and explore how topics evolve.
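The abstract does not spell out the DNVAE architecture, so the following PyTorch sketch is only a rough illustration of the general idea: a VAE-style topic model whose decoder weights are constrained non-negative so they can be read as topic-term loadings. The layer sizes, the softplus non-negativity device, and the Poisson-style reconstruction loss are assumptions for illustration, not the thesis's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonNegVAE(nn.Module):
    """Illustrative VAE-style topic model with a non-negative decoder.

    NOT the thesis's DNVAE; it only sketches the idea of reading a
    decoder weight matrix as non-negative topic-term loadings.
    """
    def __init__(self, vocab_size, n_topics):
        super().__init__()
        self.enc = nn.Linear(vocab_size, 64)
        self.mu = nn.Linear(64, n_topics)
        self.logvar = nn.Linear(64, n_topics)
        # Unconstrained parameter; softplus makes the effective decoder non-negative.
        self.dec_raw = nn.Parameter(torch.randn(n_topics, vocab_size) * 0.01)

    def topic_term(self):
        return F.softplus(self.dec_raw)    # topics x terms, entrywise >= 0

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        theta = F.softmax(z, dim=-1)       # document-topic proportions
        recon = theta @ self.topic_term()  # expected term counts
        return recon, mu, logvar

def loss_fn(recon, x, mu, logvar):
    # Poisson-style reconstruction term plus the standard Gaussian KL term.
    rec = (recon - x * torch.log(recon + 1e-8)).sum()
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```

The weight sharing across time points and the online updates mentioned in the abstracts would sit on top of a core like this, for example by initializing the model at time t+1 from the weights learned at time t.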
Abstract
Today, books, newspapers, and magazines are stored as digital documents rather than on paper. With so many documents stored digitally, classifying them manually is time-consuming, so topic modeling techniques are commonly used to address this problem. Topics, however, change over time, and how to properly classify documents in the presence of topic diffusion has become an important issue in recent years.
In this thesis, we propose a topic diffusion discovery approach that can handle the evolution of topics. Because exact inference of the posterior probability is overly complicated, we use, for simplicity, a variational autoencoder variant with weights shared across time points to build the topic model, called the Deep Non-negative Variational Autoencoder (DNVAE). Its multi-layer structure allows the model to capture the evolution of topics. The generalized Jensen-Shannon divergence is used to measure the magnitude of topic diffusion, and we present topic network diagrams to help interpret the evolution of topics.
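The generalized Jensen-Shannon divergence referred to here is the standard definition (see Grosse et al. in the references): for distributions P_1, ..., P_n with weights π_i, JS_π = H(Σ_i π_i P_i) − Σ_i π_i H(P_i), where H is Shannon entropy. A minimal numpy version, assuming uniform weights and base-2 logarithms, might look like this:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]                          # 0 * log 0 is taken as 0
    return -np.sum(p * np.log2(p))

def generalized_jsd(dists, weights=None):
    """Generalized JS divergence of several distributions (rows of `dists`)."""
    dists = np.asarray(dists, dtype=float)
    dists = dists / dists.sum(axis=1, keepdims=True)   # normalize rows
    n = len(dists)
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights)
    mixture = w @ dists                   # weighted average distribution
    return entropy(mixture) - sum(wi * entropy(p) for wi, p in zip(w, dists))

# Example: divergence between one topic's term distribution at two time points;
# 0 means identical, up to 1 bit for two disjoint distributions.
p_t1 = [0.5, 0.3, 0.2, 0.0]
p_t2 = [0.1, 0.2, 0.3, 0.4]
print(generalized_jsd([p_t1, p_t2]))
```

A larger value between consecutive time points signals a topic whose term distribution is diffusing more strongly.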
Table of Contents
Thesis Approval Certificate i
Acknowledgments ii
Chinese Abstract iii
Abstract iv
List of Figures vii
List of Tables viii
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 1
1.3 Research Objectives 2
Chapter 2 Literature Review 3
2.1 Topic Model 3
2.1.1 Time Series Topic Model 3
2.1.2 Non-negative Matrix Factorization (NMF) 4
2.1.3 Multi-layer Topic Model 5
2.2 Deep Learning 5
2.3 Online Learning 7
Chapter 3 Research Methods and Procedures 8
3.1 Research Methods 8
3.1.1 Topic Model Based on Variational Autoencoder 8
3.1.2 Online Deep Non-negative Variational Autoencoder (DNVAE) 11
3.2 Evaluation Criteria 12
3.2.1 Measuring the Degree of Term Diffusion 12
3.2.2 Visualization of Topic Relationships 13
3.3 Research Framework 14
Chapter 4 Experimental Results and Discussion 17
4.1 Data Preparation 17
4.2 Research Workflow 18
4.3 Research Procedure 18
4.3.1 Raw Data: Predicting Topics and Terms 19
4.3.2 Visualization of Topic Relationships and Evolution 21
4.3.3 Term Evolution with DNVAE 23
4.4 Analysis 25
Chapter 5 Conclusions and Suggestions 28
5.1 Conclusions 28
Chapter 6 References 29
References
Berthelot, D., Raffel, C., Roy, A., & Goodfellow, I. (2018). Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer. ArXiv:1807.07543 [Cs, Stat]. http://arxiv.org/abs/1807.07543
Blei, D. M. (2011). Introduction to Probabilistic Topic Models.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.
Blei, D. M., & Lafferty, J. D. (2006). Dynamic topic models. Proceedings of the 23rd International Conference on Machine Learning - ICML ’06, 113–120. https://doi.org/10.1145/1143844.1143859
Falbel, D., et al. (2019). keras: R Interface to 'Keras'.
Doersch, C. (2016). Tutorial on Variational Autoencoders. ArXiv:1606.05908 [Cs, Stat]. http://arxiv.org/abs/1606.05908
Dubey, A., Hefny, A., Williamson, S., & Xing, E. P. (2012). A non-parametric mixture model for topic modeling over time. ArXiv:1208.4411 [Stat]. http://arxiv.org/abs/1208.4411
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
Greene, D., O’Callaghan, D., & Cunningham, P. (2014). How Many Topics? Stability Analysis for Topic Models. ArXiv:1404.4606 [Cs]. http://arxiv.org/abs/1404.4606
Griffiths, T. L., Jordan, M. I., Tenenbaum, J. B., & Blei, D. M. (2003). Hierarchical Topic Models and the Nested Chinese Restaurant Process. Advances in Neural Information Processing Systems 16.
Grosse, I., Bernaola-Galván, P., Carpena, P., Román-Roldán, R., Oliver, J., & Stanley, H. E. (2002). Analysis of symbolic sequences using the Jensen-Shannon divergence. Physical Review E, 65(4), 041905.
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag.
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
Hoi, S. C. H., Sahoo, D., Lu, J., & Zhao, P. (2018). Online Learning: A Comprehensive Survey. ArXiv:1802.02871 [Cs]. http://arxiv.org/abs/1802.02871
Hung, S. (2020). Topic Evolution and Diffusion Discovery based on Online Deep Non-negative Autoencoder.
Ram, K., & Broman, K. (2019). aRxiv: Interface to the arXiv API.
Kang, Y., Cheng, I.-L., Mao, W., Kuo, B., & Lee, P.-J. (2019). Towards Interpretable Deep Extreme Multi-label Learning. ArXiv:1907.01723 [Cs, Stat]. http://arxiv.org/abs/1907.01723
Kang, Y., Lin, K.-P., & Cheng, I.-L. (2018). Topic Diffusion Discovery based on Sparseness-constrained Non-negative Matrix Factorization. ArXiv:1807.04386 [Cs, Stat]. http://arxiv.org/abs/1807.04386
Kang, Y., & Zadorozhny, V. (2016). Process Monitoring Using Maximum Sequence Divergence. Knowledge and Information Systems, 48(1), 81–109. https://doi.org/10.1007/s10115-015-0858-z
Kingma, D. P., & Welling, M. (2014). Auto-Encoding Variational Bayes. ArXiv:1312.6114 [Cs, Stat]. http://arxiv.org/abs/1312.6114
Landauer, T. K. (Ed.). (2007). Handbook of latent semantic analysis. Erlbaum.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791. https://doi.org/10.1038/44565
McCloskey, M., & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. Psychology of Learning and Motivation, 24, 109–165.
Ognyanova, K. (n.d.). Network visualization with R.
Oring, A., Yakhini, Z., & Hel-Or, Y. (2020). Autoencoder Image Interpolation by Shaping the Latent Space. ArXiv:2008.01487 [Cs, Stat]. http://arxiv.org/abs/2008.01487
Paatero, P., & Tapper, U. (1994). Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2), 111–126. https://doi.org/10.1002/env.3170050203
Qin, Z., Yu, F., Liu, C., & Chen, X. (2018). How convolutional neural network see the world—A survey of convolutional neural network visualization methods. ArXiv:1804.11191 [Cs]. http://arxiv.org/abs/1804.11191
Roger, V., Farinas, J., & Pinquier, J. (2020). Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data. ArXiv:2003.04241 [Cs, Eess, Stat]. http://arxiv.org/abs/2003.04241
Silge, J., & Robinson, D. (2017). Text mining with R: A tidy approach (First edition). O’Reilly.
Song, H. A., & Lee, S.-Y. (2013). Hierarchical Representation Using NMF. In M. Lee, A. Hirose, Z.-G. Hou, & R. M. Kil (Eds.), Neural Information Processing (Vol. 8226, pp. 466–473). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-42054-2_58
Srivastava, A., & Sutton, C. (2017). Autoencoding Variational Inference For Topic Models. ArXiv:1703.01488 [Stat]. http://arxiv.org/abs/1703.01488
Stevens, K., Kegelmeyer, P., Andrzejewski, D., & Buttler, D. (2012). Exploring topic coherence over many models and many topics. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 952–961.
R Core Team (2019). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
Theis, L., Oord, A. van den, & Bethge, M. (2016). A note on the evaluation of generative models. ArXiv:1511.01844 [Cs, Stat]. http://arxiv.org/abs/1511.01844
Torfi, A., Shirvani, R. A., Keneshloo, Y., Tavaf, N., & Fox, E. A. (2020). Natural Language Processing Advancements By Deep Learning: A Survey. ArXiv:2003.01200 [Cs]. http://arxiv.org/abs/2003.01200
Tu, D., Chen, L., Lv, M., Shi, H., & Chen, G. (2018). Hierarchical online NMF for detecting and tracking topic hierarchies in a text stream. Pattern Recognition, 76, 203–214. https://doi.org/10.1016/j.patcog.2017.11.002
Wang, C., Blei, D., & Heckerman, D. (2015). Continuous Time Dynamic Topic Models. ArXiv:1206.3298 [Cs, Stat]. http://arxiv.org/abs/1206.3298
Wang, W., Gan, Z., Xu, H., Zhang, R., Wang, G., Shen, D., Chen, C., & Carin, L. (2019). Topic-Guided Variational Autoencoders for Text Generation. ArXiv:1903.07137 [Cs]. http://arxiv.org/abs/1903.07137
Wang, X., & McCallum, A. (2006). Topics over time: A non-Markov continuous-time model of topical trends. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’06, 424. https://doi.org/10.1145/1150402.1150450
Fulltext
The electronic full text is licensed only for personal, non-profit searching, reading, and printing for academic research purposes. Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it, to avoid infringing the law.
Thesis access permission: fully open on and off campus (unrestricted)
Available:
Campus: available
Off-campus: available


Printed copies
Public-access information for printed theses is relatively complete from the 102 academic year (2013–2014) onward. To look up public-access information for printed theses from the 101 academic year or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
