論文使用權限 Thesis access permission: 自定論文開放時間 user-defined
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available
論文名稱 Title |
利用主題模型擷取跨語言專利和趨勢分析之研究 Applying Topic Model for Cross-lingual Patent Retrieval and Trend Analysis |
||
系所名稱 Department |
|||
畢業學年期 Year, semester |
語文別 Language |
||
學位類別 Degree |
頁數 Number of pages |
59 |
|
研究生 Author |
|||
指導教授 Advisor |
|||
召集委員 Convenor |
|||
口試委員 Advisory Committee |
|||
口試日期 Date of Exam |
2020-07-21 |
繳交日期 Date of Submission |
2020-08-02 |
關鍵字 Keywords |
專利分析、專利檢索、跨語言空間投影、跨語言主題模型、主題模型 Cross-Lingual Mapping, Cross-Lingual Topic Model, Patent Analysis, Topic Model, Patent Retrieval |
||
統計 Statistics |
本論文已被瀏覽 6164 次,被下載 129 次 The thesis/dissertation has been browsed 6164 times, has been downloaded 129 times. |
中文摘要 |
技術趨勢的檢視與預測協助企業做出在研發相關活動上的決策,而專利是技術能力的代理指標, 提供可靠的訊息來揭露技術資訊與發展。由於專利是屬地保護主義,在一個國家授予專利時,該專利只在該國家有效,在其他國家沒有被保護。對於跨國企業(MNEs),在許多國家申請專利對於保護其全球發明很重要。隨著跨國專利的快速增長,跨語言專利檢索和趨勢檢測對發明者和專利檢索者至關重要。跨語言主題模型使跨國企業能夠預測和比較不同國家的主題趨勢。我們收集來自美國專利商標局(USPTO)和中華民國智慧財產局(TIPO)的專利資料,應用一種將跨語言詞嵌入結合到隱含狄利克雷分佈(LDA)中的方法,該方法為Post-matching LDA(PMLDA)。我們使用模型的產出來檢視跨國企業的跨語言技術主題趨勢並比較使用跨語言主題、跨語言詞嵌入與專利局分類的跨語言專利檢索效果。 |
Abstract |
Technology trend detection and forecasting help companies make decisions about further R&D-related activities. Patents are a proxy measure of technological capability and provide reliable information that reveals technological developments. Since patents are territorial rights, a patent granted in one country is valid only in that country and has no protection in other countries. For multinational enterprises (MNEs), filing patents in many countries is important for protecting their inventions globally. With the rapid increase in global patents, cross-lingual patent retrieval and trend detection are important for inventors and patent searchers. Cross-lingual topic modeling enables MNEs to forecast and compare topic trends in different countries. We apply a method that incorporates cross-lingual word embeddings into Latent Dirichlet Allocation (LDA), called Post-matching LDA (PMLDA), to patent data collected from the United States Patent and Trademark Office (USPTO) and the Taiwan Intellectual Property Office (TIPO), and use the model's output to forecast cross-lingual topic trends of MNEs. We further compare the performance of cross-lingual patent retrieval based on the cross-lingual topic model, cross-lingual embeddings, and patent classification. |
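The retrieval comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a cross-lingual topic model yields, for every patent, a distribution over a shared set of topics, and that similar patents are found by cosine similarity between those distributions. The patent IDs, the three-topic vectors, and the helper names `cosine` and `retrieve` are hypothetical; this is not the thesis's PMLDA implementation or its evaluation setup.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


def retrieve(query_topics, candidates, top_k=2):
    """Rank candidate patents by topic similarity to the query patent."""
    scored = [(pid, cosine(query_topics, topics))
              for pid, topics in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical topic distributions over three shared cross-lingual topics.
uspto_query = [0.7, 0.2, 0.1]          # an English-language patent
tipo_patents = {                        # Chinese-language candidates
    "TIPO-A": [0.6, 0.3, 0.1],
    "TIPO-B": [0.1, 0.1, 0.8],
    "TIPO-C": [0.2, 0.7, 0.1],
}

ranking = retrieve(uspto_query, tipo_patents)
print(ranking)
```

Because both languages share one topic space in this sketch, no translation step is needed at query time; the same ranking function could be applied to cross-lingual word-embedding averages or classification-code vectors to compare the three representations.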
目次 Table of Contents |
論文審訂書 Thesis Approval Form i
誌謝 Acknowledgements ii
摘要 Chinese Abstract iii
ABSTRACT iv
CHAPTER 1 – Introduction 1
CHAPTER 2 – Related Work 5
2.1 Patent Retrieval 5
2.2 Topic modeling on patent data 6
2.3 Cross-lingual topic model 9
CHAPTER 3 – Patent Data Description 11
CHAPTER 4 – Methodology 13
4.1 Latent Dirichlet Allocation (LDA) 14
4.2 Post-matching LDA (PMLDA) 17
CHAPTER 5 – Experiments and discussion 21
5.1 Monolingual word representation 21
5.2 Cross-lingual word representation 21
5.3 Topic number setting 21
5.4 Representative words of cross-lingual topics 26
5.5 Heatmap reflecting relationships between cross-lingual topics and CPC 31
5.6 Similar patent retrieval across languages 32
5.7 Discussion of technology trend 34
CHAPTER 6 – Conclusion 38
Reference 39
Appendix 44 |
電子全文 Fulltext |
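This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as to avoid violating the law. |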
紙本論文 Printed copies |
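Availability information for printed theses is relatively complete from academic year 102 onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk at the Office of Library and Information Services. We apologize for any inconvenience. 開放時間 Available: 已公開 available |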