Master's/Doctoral Thesis Record etd-0619117-145750: Detailed Information



Name: Che-wen Ku (顧哲文)    E-mail: not publicly available
Department: Information Management (資訊管理學系研究所)
Degree: Master    Graduation term: 2nd semester, academic year 105 (2016–2017)
Title (Chinese): 基於矩陣分解的主題推薦與發現
Title (English): Topic Recommendation and Discovery based on Matrix Factorization
Files
  • etd-0619117-145750.pdf
  • This electronic full text is licensed only for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
    Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.

Access permissions
    Print copy: public after 2 years (released 2019-07-19)
    Electronic copy: user-defined access; on campus after 1 year, off campus after 2 years

Language / number of pages: English / 49
Statistics: this thesis has been viewed 5573 times and downloaded 44 times
Abstract (Chinese): With the development of the Internet, there are more and more documents online, since much information is closely tied to text, such as news and articles. Many researchers therefore use these documents for text analysis. Non-negative matrix factorization is a non-probabilistic method for decomposing a document collection. In this thesis, we propose using sparsity-constrained non-negative matrix factorization to build a topic model with k topics. In addition, we incorporate an author-related matrix into the sparsity-constrained NMF and uncover the hidden parts of the topics; this provides more information and helps make the discovered topics more focused. Determining the number of topics k is a difficult but unavoidable problem, so we use mutual information and stability to evaluate k, which offers a reference for choosing it. Furthermore, we use the Jensen-Shannon divergence to track how the terms of each topic change across time periods: it measures the distance between topics, and the Hungarian algorithm then matches corresponding topics across different times.
Abstract (English): Nowadays, with the development of the Internet, there are more and more text documents online, because much information is related to text. Researchers have therefore used these text documents for text analysis. Non-negative Matrix Factorization (NMF) is a non-probabilistic method for factorizing a matrix. In this thesis, we propose using sparsity-constrained NMF for topic modeling with k topics. Moreover, we incorporate author information into nsNMF so as to find hidden parts within the topics; this offers more information and makes the topics more focused. How many topics k to use is a critical but difficult issue, and here we use mutual information and stability to determine k. In addition, we find the changes of terms within topics over different time periods using the Jensen-Shannon divergence, and we use the Hungarian algorithm to match topics across times.
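The abstract describes a pipeline of TF-IDF weighting, sparsity-constrained NMF topic extraction, selection of the topic number k, and matching of topics across time periods with the Jensen-Shannon divergence and the Hungarian algorithm. The following is a minimal sketch of that general flow, assuming plain scikit-learn NMF in place of the nsNMF-with-author-constraint method the thesis proposes; the toy documents, the fixed k = 3, and the helper names are illustrative assumptions, not the thesis's actual data or implementation.

```python
# Sketch of the pipeline outlined in the abstract:
# TF-IDF document-term matrix -> NMF topic model -> match topics across
# two time periods via Jensen-Shannon distance + Hungarian assignment.
# NOTE: plain scikit-learn NMF stands in for the thesis's nsNMF with an
# author-related constraint; documents and parameters are toy assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import jensenshannon
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

k = 3  # assumed topic count; the thesis chooses k via mutual information / stability

docs_period1 = [
    "matrix factorization for topic modeling of news articles",
    "sparse nonnegative matrix factorization finds interpretable topics",
    "recommendation of articles based on latent topic vectors",
    "singular value decomposition versus nonnegative factorization",
    "author information as an extra constraint in factorization",
    "tfidf weighting of the document term matrix before factorization",
]
docs_period2 = [
    "news recommendation driven by topic vectors from factorization",
    "stability analysis to choose the number of topics",
    "nonnegative matrix factorization with sparsity constraints",
    "tracking how topic terms drift between time periods",
    "jensen shannon divergence between topic word distributions",
    "matching topics over time with the hungarian algorithm",
]

def fit_topics(docs, vectorizer):
    """Fit NMF on a TF-IDF matrix; return topic-term matrix H (k x V) and W."""
    X = vectorizer.transform(docs)               # documents x terms
    model = NMF(n_components=k, init="nndsvd", max_iter=500, random_state=0)
    W = model.fit_transform(X)                   # documents x topics
    return model.components_, W                  # H: topics x terms

# Shared vocabulary across both periods so topic-term rows are comparable.
vectorizer = TfidfVectorizer(stop_words="english")
vectorizer.fit(docs_period1 + docs_period2)
terms = vectorizer.get_feature_names_out()

H1, _ = fit_topics(docs_period1, vectorizer)
H2, _ = fit_topics(docs_period2, vectorizer)

# Show the top terms of each period-1 topic.
for t, row in enumerate(H1):
    top = [terms[i] for i in row.argsort()[::-1][:5]]
    print(f"period-1 topic {t}: {', '.join(top)}")

# Normalize topic-term rows into probability distributions, then build a
# k x k cost matrix of Jensen-Shannon distances between the two periods.
P1 = (H1 + 1e-12) / (H1 + 1e-12).sum(axis=1, keepdims=True)
P2 = (H2 + 1e-12) / (H2 + 1e-12).sum(axis=1, keepdims=True)
cost = np.array([[jensenshannon(p, q, base=2) for q in P2] for p in P1])

# Hungarian algorithm: minimum-cost one-to-one matching of topics over time.
rows, cols = linear_sum_assignment(cost)
for r, c in zip(rows, cols):
    print(f"period-1 topic {r} <-> period-2 topic {c} (JS distance {cost[r, c]:.3f})")
```

The Hungarian step (`linear_sum_assignment`) returns the one-to-one topic pairing that minimizes the total Jensen-Shannon distance, which is the role the abstract assigns to it for aligning topics across time periods.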
Keywords (Chinese)
  • 主題發現 (topic discovery)
  • 非負矩陣分解 (non-negative matrix factorization)
  • 推薦 (recommendation)
  • 主題模型 (topic modeling)
Keywords (English)
  • Non-negative Matrix Factorization
  • Topic Discovery
  • Recommendation
  • Topic Modeling
Table of Contents
    1. Introduction 1
    2. Background and Related works 4
    2.1 LDA 4
    2.2 SVD 5
    2.3 NMF 6
    3. Method 9
    3.1 How many topics k? 9
    3.2 Nonsmooth Non-negative Matrix Factorization (nsNMF) 12
    3.3 nsNMF with constraint 14
    3.4 Topic Discovery 15
    4. Experiment & Result 17
    4.1 Data and Preprocessing 17
    4.2 COOL3C news 19
    4.2.1 Document-Term Matrix 19
    4.2.2 TF-IDF 20
    4.2.3 SVD 21
    4.2.4 nsNMF 22
    4.2.5 How many topics k? 24
    4.2.6 Topic Modeling 25
    4.2.7 Article Recommendation 27
    4.3 arXiv.ML papers 29
    4.3.1 How many topics k? 29
    4.3.2 Topic Modeling 31
    4.3.3 nsNMF with constraint 32
    4.3.4 Topic Discovery 33
    5. Conclusion 35
    6. Reference 37
References
    Aggarwal, C. C., & Zhai, C. (2012). Mining text data. Springer Science & Business Media. Retrieved from https://www.google.com/books?hl=zh-TW&lr=&id=vFHOx8wfSU0C&oi=fnd&pg=PR3&dq=mutual+information+topic+modeling&ots=obag_JmIVy&sig=fQ_MXiuGSe8t_-QXuxA_1deQRg0
    Arora, S., Ge, R., & Moitra, A. (2012). Learning Topic Models - Going beyond SVD. arXiv:1204.1956 [Cs]. Retrieved from http://arxiv.org/abs/1204.1956
    Baker, L. D., & McCallum, A. K. (1998). Distributional clustering of words for text classification. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 96–103). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=290970
    Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(Jan), 993–1022.
    Cai, D., He, X., Han, J., & Huang, T. S. (2011). Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8), 1548–1560.
    Carmel, D., Yom-Tov, E., Darlow, A., & Pelleg, D. (2006). What makes a query difficult? In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 390–397). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=1148238
    Choo, J., Lee, C., Reddy, C. K., & Park, H. (2013). Utopian: User-driven topic modeling based on interactive nonnegative matrix factorization. IEEE Transactions on Visualization and Computer Graphics, 19(12), 1992–2001.
    Gillis, N. (2014). The why and how of nonnegative matrix factorization. Regularization, Optimization, Kernels, and Support Vector Machines, 12(257). Retrieved from https://www.google.com/books?hl=zh-TW&lr=&id=5Y_SBQAAQBAJ&oi=fnd&pg=PA257&dq=The+Why+and+How+of+Nonnegative+Matrix+Factorization&ots=nwGtxapMBn&sig=TnywuixkEgkwtbnH5t0n5wrj58Y
    Gong, L., & Nandi, A. K. (2013). An enhanced initialization method for non-negative matrix factorization. In 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP) (pp. 1–6). https://doi.org/10.1109/MLSP.2013.6661949
    Greene, D., & Cross, J. P. (2016). Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach. arXiv:1607.03055 [Cs]. Retrieved from http://arxiv.org/abs/1607.03055
    Greene, D., O’Callaghan, D., & Cunningham, P. (2014). How Many Topics? Stability Analysis for Topic Models. arXiv:1404.4606 [Cs]. Retrieved from http://arxiv.org/abs/1404.4606
    Grosse, I., Bernaola-Galván, P., Carpena, P., Román-Roldán, R., Oliver, J., & Stanley, H. E. (2002). Analysis of symbolic sequences using the Jensen-Shannon divergence. Physical Review E, 65(4), 41905.
    Langville, A. N., Meyer, C. D., Albright, R., Cox, J., & Duling, D. (2014). Algorithms, initializations, and convergence for the nonnegative matrix factorization. arXiv Preprint arXiv:1407.7299. Retrieved from https://arxiv.org/abs/1407.7299
    Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791.
    Li, Z., Tang, Z., & Ding, S. (2013). Dictionary learning by nonnegative matrix factorization with 1/2-norm sparsity constraint. In Cybernetics (CYBCONF), 2013 IEEE International Conference on (pp. 63–67). IEEE. Retrieved from http://ieeexplore.ieee.org/abstract/document/6617435/
    Liu, J., Wang, C., Gao, J., & Han, J. (2013). Multi-view clustering via joint nonnegative matrix factorization. In Proceedings of the 2013 SIAM International Conference on Data Mining (pp. 252–260). SIAM. Retrieved from http://epubs.siam.org/doi/abs/10.1137/1.9781611972832.28
    Pascual-Montano, A., Carazo, J. M., Kochi, K., Lehmann, D., & Pascual-Marqui, R. D. (2006). Nonsmooth Nonnegative Matrix Factorization (nsNMF). IEEE Trans. Pattern Anal. Mach. Intell., 28(3), 403–415. https://doi.org/10.1109/TPAMI.2006.60
    Stevens, K., Kegelmeyer, P., Andrzejewski, D., & Buttler, D. (2012). Exploring topic coherence over many models and many topics. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (pp. 952–961). Association for Computational Linguistics. Retrieved from http://dl.acm.org/citation.cfm?id=2391052
Xu, W., Liu, X., & Gong, Y. (2003). Document clustering based on non-negative matrix factorization. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 267–273). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=860485
    Zou, H., Zhou, G., & Xi, Y. (2011). Research on Modeling Microblog Posts Scale Based on Nonhomogeneous Poisson Process. In G. Zhiguo, X. Luo, J. Chen, F. L. Wang, & J. Lei (Eds.), Emerging Research in Web Information Systems and Mining (pp. 99–112). Springer Berlin Heidelberg. Retrieved from http://link.springer.com/chapter/10.1007/978-3-642-24273-1_14
機器學習中的數學(5)-強大的矩陣奇異值分解(SVD)及其應用 [Mathematics in machine learning (5): The powerful singular value decomposition (SVD) and its applications]. LeftNotEasy - 博客園. (n.d.). Retrieved November 18, 2016, from http://www.cnblogs.com/LeftNotEasy/archive/2011/01/19/svd-and-applications.html
Thesis defense committee
  • 林耕霈 - Convener
  • 李珮如 - Member
  • 康藝晃 - Advisor
Date of oral defense: 2017-07-13    Date of submission: 2017-07-19


