Thesis details for etd-0803119-022850
Title
文字探勘工作流程設計平台之研究
The research on designing text mining workflow platform
Department
Year, semester
Language
Degree
Number of pages
55
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2019-07-22
Date of Submission
2019-09-03
Keywords
Service Engineering, Scientific Workflows, User Study, Natural Language Processing, Workflow Validation, Text Mining
Statistics
This thesis has been browsed 6111 times and downloaded 39 times.
Abstract
In recent years, advances in information technology have produced a huge volume of documents, e-mails, online news, online forum posts, and other textual data. Because people want to discover the value hidden in this data, text mining has become one of the most in-demand fields. In this thesis, we design and implement a Text Mining Workflow Platform (TMWP), which supports rapid creation and execution of text analysis workflows and validates both their legality and their executability. We propose a text mining process model that defines the tasks in the system. We also build an ontology for workflow validation, which makes checking the validity of workflows faster and easier. Finally, we conduct a user study to evaluate the system. The results show that, compared with performing text analysis in R, our system achieves better performance and user feedback.
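As an illustration of the kind of check a workflow validation step might perform, the following minimal sketch validates a small text analysis workflow in Python. It is not the TMWP implementation: the task names, the data type labels, and the `validate` function are all hypothetical, and a plain dictionary stands in for the validation ontology described in the thesis. The sketch checks that each task receives the data type it expects from its upstream task (legality) and that the workflow contains no cycles (executability).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task signatures: task name -> (input type, output type).
# A plain dict stands in for the validation ontology; all names are assumptions.
TASK_TYPES = {
    "load_corpus": (None, "Corpus"),
    "tokenize": ("Corpus", "TokenizedCorpus"),
    "remove_stopwords": ("TokenizedCorpus", "TokenizedCorpus"),
    "tfidf": ("TokenizedCorpus", "DocumentTermMatrix"),
    "cluster": ("DocumentTermMatrix", "ClusterLabels"),
}


def validate(edges):
    """Return error messages for a workflow given as (upstream, downstream) edges."""
    errors = []

    # Legality: every task must be known, and each edge must connect compatible types.
    for up, down in edges:
        unknown = [t for t in (up, down) if t not in TASK_TYPES]
        if unknown:
            errors.append(f"unknown task(s): {', '.join(unknown)}")
            continue
        produced = TASK_TYPES[up][1]
        expected = TASK_TYPES[down][0]
        if produced != expected:
            errors.append(f"{up} -> {down}: produces {produced}, but {expected} is expected")

    # Executability: the workflow must be a directed acyclic graph.
    graph = {}
    for up, down in edges:
        graph.setdefault(down, set()).add(up)
    try:
        list(TopologicalSorter(graph).static_order())
    except CycleError:
        errors.append("workflow contains a cycle and cannot be executed")

    return errors


valid_workflow = [("load_corpus", "tokenize"), ("tokenize", "tfidf"), ("tfidf", "cluster")]
invalid_workflow = [("load_corpus", "tfidf")]  # skips tokenization

print(validate(valid_workflow))
print(validate(invalid_workflow))
```

An empty list means the workflow passed both checks. A real ontology could additionally express subtype relations between data types, which this flat dictionary cannot.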
Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract
Table of Contents
Table of Figures
List of Tables
Chapter 1 - Introduction
1.1 Background and Motivation
1.2 Thesis Organization
Chapter 2 - Related Work
2.1 Scientific and Data Analytics workflow
2.2 Text mining workflow
2.3 Text mining process model
2.4 Workflow Validation
Chapter 3 - Text Mining Process Model Design
3.1 A Process Model for Text Mining Workflow
3.2 Validation Ontology
Chapter 4 - Platform Development
4.1 Requirement engineering
4.2 The text mining workflow platform
4.3 Workflow engine
4.4 Execution engine
4.5 DataObject Retrieval Engine
4.6 Workflow validation Engine
Chapter 5 - Platform Evaluation
5.1 Subject Description
5.2 Experiment Design
5.3 Result Evaluation
Chapter 6 - Conclusion
References
Appendix 1: Experiment Task Detail
Appendix 2: Questionnaire Content
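Fulltext
The electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast the text without authorization.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available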


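Printed copies
Public information on printed theses is relatively complete from academic year 102 onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk at the Office of Library and Information Services. We apologize for any inconvenience.
Available: available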
