Thesis/Dissertation etd-0213112-123624: Detailed Information


Name: 洪崇洋 (Chung-yang Hung)  Email: hosoyu@gmail.com
Department: Information Management (資訊管理學系研究所)
Degree: Master  Graduation term: Academic year 100 (ROC calendar), 1st semester
Title (Chinese): 以LDA和使用紀錄為基礎的線上電子書主題趨勢發掘方法
Title (English): An Approach to eBook Topics Trend Discovery Based on LDA and Usage Log
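Files
  • etd-0213112-123624.pdf
  • This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
    Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.
    Thesis access permission

    Electronic thesis: user-defined permission: open on campus after 1 year, open off campus after 1 year

    Language/Pages: Chinese/61
    Statistics: This thesis has been viewed 5356 times and downloaded 5096 times.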
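    Abstract (Chinese): The development of the Internet and advances in technology have made the digital content industry increasingly prosperous. Publishers have begun to offer online ebook search, reading, and download services, so users are no longer bound by place or time and can read digital content on a computer whenever and wherever they like. At the same time, the share of ebooks that libraries purchase for their collections is rising year by year. Users can reach these electronic resources by connecting to an ebook search platform, or by searching the library automation system and following a link from the catalog directly to the ebook platform. Compared with physical holdings, this approach places no limit on the number of copies in circulation and also raises the utilization of library resources.
    Many publishers and system integrators offer ebook search services, and the books they carry cover every kind of subject. Given limited budgets, libraries purchasing ebooks must, in addition to considering reader recommendations, evaluate how heavily electronic resources are used so that they can invest as efficiently as possible. The most common way to do this today is through usage statistics reports, which are usually provided by the publisher.
    This study uses Latent Dirichlet Allocation (LDA) to build a topic model from the content of the books. It then combines the model with the usage statistics reports of an ebook search platform and weights the topic model to discover how the topics that ebook readers read change, thereby providing information of reference value. In the experiments we also compare two other approaches: Library of Congress Classification and subject headings. The results confirm that the topic model produced by the topic weighting method differs significantly from the other two approaches and can provide useful information from another perspective.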
    Abstract (English): With the growth of the digital content industry, publishers have started to provide online services for ebook search, reading, and downloading. Users can access online resources anywhere and at any time with laptops or mobile devices. More and more libraries now purchase ebooks as an important part of their collections. To access these online resources, users can link directly to a publisher's ebook portal or go through the OPAC system. Compared with the library circulation process, ebooks are more convenient for patrons and improve the utilization of library online resources.
    Because many kinds of ebooks are available on the market, libraries have to focus their investment on the most valuable online resources. Usage statistics reports play an important role in providing valuable information to libraries. These reports are usually generated according to the COUNTER standard. Although they show when and where users access specific ebooks, they fail to show the general topics and how those topics change.
    In this study, we introduce a post-processing method that weights the LDA topic model with the usage statistics report to emphasize changes in topics, and we compare it with the classification method and the subject heading method used in bibliographic records, namely LCC and LCSH respectively. The results show that the weighted topic model significantly affects the ranking of topics, and that the topic model is independent of the classification method and the subject heading method in the bibliographic record.
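The weighting idea in the abstract can be illustrated with a minimal sketch. It assumes that each book already has a vector of LDA topic proportions and that a usage report gives one access count per book for a period. The function names, the data, and the simple "usage count times topic proportion" rule are illustrative assumptions and are not taken from the thesis, whose exact weighting formula is not given in this record.

```python
from typing import Dict, List


def weighted_topic_scores(doc_topic: Dict[str, List[float]],
                          usage: Dict[str, int]) -> List[float]:
    """Return the usage-weighted share of each topic across all books.

    doc_topic maps a book id to its topic proportions (hypothetical LDA output).
    usage maps a book id to its access count for one reporting period.
    """
    num_topics = len(next(iter(doc_topic.values())))
    totals = [0.0] * num_topics
    for book_id, proportions in doc_topic.items():
        weight = usage.get(book_id, 0)  # books absent from the report count as 0
        for k, p in enumerate(proportions):
            totals[k] += weight * p
    grand_total = sum(totals)
    if grand_total == 0:
        return totals  # no usage recorded: all scores stay at 0.0
    return [t / grand_total for t in totals]


def rank_topics(scores: List[float]) -> List[int]:
    """Return topic indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)


# Hypothetical data: three books, two topics, one month of usage counts.
doc_topic = {
    "book_a": [0.9, 0.1],
    "book_b": [0.2, 0.8],
    "book_c": [0.5, 0.5],
}
usage_month = {"book_a": 1, "book_b": 10, "book_c": 0}

# Giving every book a weight of 1 reproduces the unweighted topic ranking.
unweighted = weighted_topic_scores(doc_topic, {b: 1 for b in doc_topic})
weighted = weighted_topic_scores(doc_topic, usage_month)

print(rank_topics(unweighted))  # [0, 1]
print(rank_topics(weighted))    # [1, 0]
```

In this toy data the heavily used book_b pushes topic 1 ahead of topic 0, which is the kind of ranking change the abstract reports. Repeating the calculation for each reporting period would give a series of rankings that can be compared over time.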
    Keywords (Chinese):
  • LCSH
  • LCC
  • LDA
  • Topic (主題)
  • Usage log (使用記錄)
  • Ebook (電子書)
  • Topic model (主題模型)
    Keywords (English):
  • LDA
  • Topic Model
  • Topic
  • Usage Log
  • Ebook
  • LCC
  • LCSH
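    Table of contents: Chapter 1 Introduction 1
    1.1 Research Background 1
    1.2 Research Motivation 1
    1.3 Research Objectives 2
    1.4 Thesis Organization 3
    Chapter 2 Literature Review 4
    2.1 LDA Topic Model 4
    2.2 Choice of LDA Parameters 6
    2.3 Collapsed Gibbs Sampler 7
    2.4 COUNTER Usage Reports 9
    2.5 Library of Congress Classification 10
    2.6 Library of Congress Subject Headings 11
    Chapter 3 Method for Building the Topic Model 13
    3.1 System Architecture 13
    3.2 Text Data Preprocessing 15
    3.2.1 Data Sources 15
    3.2.2 Data Processing 18
    3.3 Usage Log Preprocessing 19
    3.3.1 Data Sources 19
    3.3.2 Data Processing 21
    3.4 LDA Parameter Selection 23
    3.5 Topic Model Construction 24
    3.5.1 Choice of Tools 24
    3.5.2 Input Data Format 25
    3.5.3 Output Data Format 26
    3.5.4 Building the Topic Model 27
    3.5.5 LDA Database Design 28
    3.6 LDA Topic Weighting 30
    Chapter 4 Experimental Results 34
    4.1 Preface 34
    4.2 Observations on the Topic Weighting Results 34
    4.3 Association Between LCC and the Topic Model 39
    4.4 Association Between LCSH and the Topic Model 41
    4.5 Observations on the Correlation Among LCC, LCSH, and Topics 43
    Chapter 5 Conclusions and Suggestions for Future Research 47
    5.1 Conclusions 47
    5.2 Suggestions for Future Research 47
    Chapter 6 References 49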
    References: AlSumait, L., Barbara, D., & Domeniconi, C. (2008, December 15-19). On-line LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking. Paper presented at the Eighth IEEE International Conference on Data Mining (ICDM '08).
    Anthes, G. (2010). Topic models vs. unstructured data. Commun. ACM, 53(12), 16-18. doi: 10.1145/1859204.1859210
    Blei, D. M., & Lafferty, J. D. (2006). Dynamic topic models. Paper presented at the Proceedings of the 23rd international conference on Machine learning, Pittsburgh, Pennsylvania.
    Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. J. Mach. Learn. Res., 3, 993-1022. doi: 10.1162/jmlr.2003.3.4-5.993
    Chang, J., Boyd-Graber, J., Gerrish, S., Wang, C., & Blei, D. M. (2010). Reading Tea Leaves: How Humans Interpret Topic Models. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.154.992
    Columbia University Press. Retrieved from http://cup.columbia.edu/
    COUNTER - Counting Online Usage of Networked Electronic Resources. Retrieved from http://www.projectcounter.org/
    COUNTER - Counting Online Usage of Networked Electronic Resources Home. Retrieved from http://www.projectcounter.org/
    Darling, W. M. (2011). A Theoretical and Practical Implementation Tutorial on Topic Modeling and Gibbs Sampling.
    Gibbs sampling. Retrieved from http://en.wikipedia.org/wiki/Gibbs_sampling
    Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, 101(Suppl 1), 5228-5235. doi: 10.1073/pnas.0307752101
    Hall, D., Jurafsky, D., & Manning, C. D. (2008). Studying the history of ideas using topic models. Paper presented at the Proceedings of the Conference on Empirical Methods in Natural Language Processing, Honolulu, Hawaii.
    Khosh-khui, S. A. (1987). Relationship Between LCSH and LCC Notations in Different Classes of LCC. Staff Publications-Library, Texas State University.
    Library of Congress Classification. Retrieved from http://www.loc.gov/catdir/cpso/lcc.html
    Magdy, W., & Darwish, K. (2008). Book search: indexing the valuable parts. Paper presented at the Proceeding of the 2008 ACM workshop on Research advances in large digital book repositories, Napa Valley, California, USA. Retrieved from http://dl.acm.org/citation.cfm?doid=1458412.1458429
    Maskeri, G., Sarkar, S., & Heafield, K. (2008). Mining business topics in source code using latent dirichlet allocation. Paper presented at the Proceedings of the 1st India software engineering conference, Hyderabad, India.
    Newman, D., Hagedorn, K., Chemudugunta, C., & Smyth, P. (2007). Subject metadata enrichment using statistical topic models. Paper presented at the Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries, Vancouver, BC, Canada.
    Noh, Y., Hagedorn, K., & Newman, D. (2011). Are learned topics more useful than subject headings. Paper presented at the Proceeding of the 11th annual international ACM/IEEE joint conference on Digital libraries, Ottawa, Ontario, Canada.
    Shepherd, P. T. COUNTER: towards reliable vendor usage statistics. [Conceptual Paper]. VINE, 34(4). doi: 10.1108/03055720410570975
    Sun, Y., Han, J., Gao, J., & Yu, Y. (2009). itopicmodel: Information network-integrated topic modeling.
    Wang, X., & McCallum, A. (2006). Topics over time: a non-Markov continuous-time model of topical trends. Paper presented at the Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, Philadelphia, PA, USA.
    國家圖書館編目園地全球資訊網 (National Central Library cataloging website). Retrieved from http://catweb.ncl.edu.tw/
    Oral defense committee:
  • 陳嘉玫 - Committee chair
  • 張德民 - Committee member
  • 黃三益 - Advisor
    Defense date: 2012-01-12  Submission date: 2012-02-13
