論文使用權限 Thesis access permission: 自定論文開放時間 user-defined
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available
論文名稱 Title |
文字探勘工作流程設計平台之研究 The research on designing text mining workflow platform |
||
系所名稱 Department |
|||
畢業學年期 Year, semester |
語文別 Language |
||
學位類別 Degree |
頁數 Number of pages |
55 |
|
研究生 Author |
|||
指導教授 Advisor |
|||
召集委員 Convenor |
|||
口試委員 Advisory Committee |
|||
口試日期 Date of Exam |
2019-07-22 |
繳交日期 Date of Submission |
2019-09-03 |
關鍵字 Keywords |
使用者研究、自然語言處理、工作流程驗證、文字探勘、服務工程、科學工作流程 Service Engineering, Scientific workflows, User Study, Natural Language Processing, Workflow Validation, Text Mining |
||
統計 Statistics |
本論文已被瀏覽 6111 次,被下載 39 次 This thesis has been viewed 6111 times and downloaded 39 times. |
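中文摘要 Chinese Abstract (English translation) |
In recent years, advances in information technology have produced large volumes of text data such as documents, e-mails, online news, and online forum posts. The desire to uncover the value contained in this data has made text mining one of the most in-demand fields. This study aims to design and implement a text mining workflow platform (TMWP) that can quickly build and execute text analysis workflows and validate their legality and executability. We propose a text mining process model that defines the tasks in the system. We also build an ontology for workflow validation so that the legality of a workflow can be checked more quickly and simply. Finally, we evaluate the system's performance and accuracy through a user study. The experiments show that the system received better user feedback than traditional text analysis in R. |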
Abstract |
In recent years, advances in information technology have produced a huge volume of documents, e-mails, online news, and social media content. As a way to discover the information hidden in this textual data, text mining has become increasingly important. In this thesis, we design and implement a Text Mining Workflow Platform (TMWP), which can rapidly create and execute a text analysis workflow and can validate and verify that workflow. We propose a text mining process model that defines the tasks in the system. We also build an ontology for workflow validation, which makes verifying the validity of workflows faster and easier. Finally, we conduct a user study to evaluate our system. The experiments show that, compared with text analysis in the R language, our system achieves better performance and user feedback. |
目次 Table of Contents |
Thesis Approval Form (論文審定書)
Abstract in Chinese (摘要)
Abstract
Table of Contents
Table of Figures
List of Tables
Chapter 1 - Introduction
1.1 Background and Motivation
1.2 Thesis Organization
Chapter 2 - Related Work
2.1 Scientific and Data Analytics workflow
2.2 Text mining workflow
2.3 Text mining process model
2.4 Workflow Validation
Chapter 3 - Text Mining Process Model Design
3.1 A Process Model for Text Mining Workflow
3.2 Validation Ontology
Chapter 4 - Platform Development
4.1 Requirement engineering
4.2 The text mining workflow platform
4.3 Workflow engine
4.4 Execution engine
4.5 DataObject Retrieval Engine
4.6 Workflow validation Engine
Chapter 5 - Platform Evaluation
5.1 Subject Description
5.2 Experiment Design
5.3 Result Evaluation
Chapter 6 - Conclusion
REFERENCES
Appendix 1: Experiment Task Detail
Appendix 2: Questionnaire Content |
電子全文 Fulltext |
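This electronic full text is licensed to users solely for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law. |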
紙本論文 Printed copies |
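Public availability information for printed theses is relatively complete from academic year 102 onward. To look up public availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience. 開放時間 Available: 已公開 available |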