National Sun Yat-sen University (國立中山大學), thesis/dissertation: An Extension to the Composite Rule Induction System (綜合法則歸納系統之延伸研究)

Title	An Extension to the Composite Rule Induction System (綜合法則歸納系統之延伸研究)
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 95	Language	Chinese
Degree	Master	Number of pages	85
Author	Yuan-chi Yang (楊元琪)
Advisor	Ting-Peng Liang (梁定澎)
Convenor	Bing-Chiang Jeng (鄭炳強)
Advisory Committee	Deng-Neng Chen (陳灯能)
Date of Exam	2007-07-19	Date of Submission	2007-07-30
Keywords	Rule Induction, Knowledge Management, Knowledge-based Systems, Data Mining (Chinese keywords: Rule Induction, Data Mining, Knowledge Acquisition, Expert Systems)
Statistics	This thesis/dissertation has been browsed 5860 times and downloaded 3461 times.

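Abstract (translated from the Chinese)
Knowledge acquisition has long been the bottleneck in designing expert systems (ES). Because the knowledge units of an expert system are built from the acquired knowledge, the quality of the knowledge model directly determines how well the expert system performs. One way to overcome this bottleneck is to infer expert rules from data samples, and many researchers have therefore recommended using effective analysis techniques to extract expert knowledge rules (D. Michie, 1983). Among the many techniques for extracting knowledge models (patterns), the commonly used classification and prediction techniques include ID3, C4.5, and artificial neural networks (ANN). These techniques, however, usually apply the same criteria to nominal and non-nominal (for example, numeric) attributes, so bias introduced by data conversion can cause errors in the resulting classification model. So that a data mining technique could handle nominal and non-nominal attributes in different ways at the same time, Liang proposed the Composite Rule Induction System (CRIS) in 1992. CRIS uses a tabular approach and statistical elaboration to analyze qualitative and quantitative attributes respectively, producing more accurate classification rules, and uses probability calculations (Bayes' theorem) to show clearly the relationship between classification rules and event probabilities. That method, however, can handle only binary classification, and the rules it induces cannot clearly show the discriminating power of the independent attributes. This study therefore proposes three methods, "multi-class attribute determination," "attribute power testing," and "hypothesis rule generation constraints," to remove the binary-class limitation of CRIS and to show the discriminating power of the classification rules. To verify the feasibility of the improved CRIS method, this study builds a simple CRIS system and uses XLMiner3, developed by Cytel Software Corporation, as the benchmark comparison, testing and comparing the techniques on execution efficiency, classification model error rate, and classification model prediction accuracy.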
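The abstract states that CRIS uses Bayes' theorem to relate a classification rule to the probability of an event, but this record does not give the thesis's exact formulation. As a rough illustration only, the sketch below computes the posterior probability of each class given that a rule's condition holds, for any number of classes. The function name `rule_posterior` and all numbers are hypothetical.

```python
from typing import Dict


def rule_posterior(priors: Dict[str, float],
                   likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Return P(class | rule condition holds) for every class via Bayes' theorem.

    priors[c]      = P(c), the class prior
    likelihoods[c] = P(condition | c), how often the condition holds within class c
    """
    # Joint probability P(condition and c) for each class.
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    # The evidence P(condition) is the sum of the joints over all classes.
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("the condition never holds under the given probabilities")
    return {c: joint[c] / evidence for c in joint}


# Hypothetical three-class example (values invented for illustration).
priors = {"A": 0.5, "B": 0.3, "C": 0.2}
likelihoods = {"A": 0.1, "B": 0.6, "C": 0.3}
posterior = rule_posterior(priors, likelihoods)
best_class = max(posterior, key=posterior.get)
```

With these invented inputs the condition is most indicative of class B, even though class A has the largest prior, because the condition rarely holds within A.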
Abstract
Discovering knowledge from data, known as knowledge acquisition or data mining, is an important task in knowledge management and in the development of intelligent systems. Many techniques have been developed for this purpose. For example, ID3 and C4.5 (tree induction techniques) and artificial neural networks are among the popular techniques in the "classification and prediction" area. However, these methods often use the same criteria to analyze nominal and non-nominal attributes, which is likely to produce biased knowledge because of the mismatch between the data types and the algorithms. Liang (1992) proposed a composite approach to inducing knowledge, called CRIS, that combines statistical concepts with data mining heuristics, and found that it outperformed other methods, including tree induction, discriminant analysis, and neural networks. However, that paper focuses on binary classification and does not describe how the approach can be applied to problems with more than two classes in the dependent variable. In this research, we extend the approach to handle problems with more than two classes. We also enhance it by adding steps that prioritize attributes by their identification power and that control the growth of generated hypotheses. To evaluate the extended CRIS method, a prototype system, eCRIS, was developed and compared with a commercial data mining package, XLMiner3 (developed by Cytel Software Corporation), using three existing datasets from data mining research. The results indicate that the extended CRIS outperforms tree induction and backpropagation neural networks on datasets that include both nominal and non-nominal data, and otherwise performs as well as they do.
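To make the abstract's central point concrete, the sketch below treats the two attribute types differently. A nominal attribute is summarized with a value-by-class frequency table, while a numeric attribute is summarized with per-class means and standard deviations, and a candidate cut point is placed midway between adjacent class means. This is a simplified illustration under assumptions of my own, not the CRIS or eCRIS algorithm; all function names and data are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean, stdev
from typing import Dict, List, Tuple


def nominal_table(values: List[str], labels: List[str]) -> Dict[str, Counter]:
    """Count how often each nominal value co-occurs with each class label."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for v, y in zip(values, labels):
        table[v][y] += 1
    return dict(table)


def numeric_summary(values: List[float],
                    labels: List[str]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, sample standard deviation) of a numeric attribute per class.

    Each class needs at least two observations, otherwise stdev() raises.
    """
    by_class: Dict[str, List[float]] = defaultdict(list)
    for v, y in zip(values, labels):
        by_class[y].append(v)
    return {y: (mean(xs), stdev(xs)) for y, xs in by_class.items()}


def candidate_cuts(summary: Dict[str, Tuple[float, float]]) -> List[float]:
    """Place one candidate cut midway between each pair of adjacent class means."""
    means = sorted(m for m, _ in summary.values())
    return [(a + b) / 2 for a, b in zip(means, means[1:])]


# Hypothetical three-class toy data (invented for illustration).
labels = ["x", "x", "y", "y", "z", "z"]
colour = ["red", "red", "blue", "red", "blue", "blue"]   # nominal attribute
length = [1.0, 1.2, 3.0, 3.4, 5.0, 5.2]                  # numeric attribute

table = nominal_table(colour, labels)
summary = numeric_summary(length, labels)
cuts = candidate_cuts(summary)
```

With three classes the numeric attribute yields two candidate cut points rather than one, which is the kind of situation a method limited to binary classes does not cover.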

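Table of Contents
Chapter 1. Introduction
  Section 1. Research Background and Motivation
  Section 2. Research Objectives
  Section 3. Research Steps
  Section 4. Structure of the Thesis
Chapter 2. Literature Review
  Section 1. Introduction to Data Mining
  Section 2. Knowledge Mining Techniques for Classification Models
  Section 3. Introduction to Classification Decision Trees
  Section 4. Introduction to Bayesian Classification
  Section 5. Introduction to Artificial Neural Networks
  Section 6. Introduction to K-Nearest Neighbor Classification
  Section 7. Introduction to the Composite Rule Induction System
Chapter 3. The Composite Rule Induction System
  Section 1. Multi-class Attribute Determination
  Section 2. Attribute Power Testing
  Section 3. Hypothesis Rule Generation Constraints
Chapter 4. Design and Implementation of the Composite Rule Induction System
  Section 1. System Requirements Analysis
  Section 2. Overview of the Composite Rule Induction System
  Section 3. Performance Evaluation of the Composite Rule Induction System
Chapter 5. Conclusions and Suggestions
  Section 1. Research Contributions
  Section 2. Research Limitations
  Section 3. Future Research
References

List of Tables
  Table 1. Model accuracy statistics
  Table 2. Golf game decision dataset
  Table 3. Notebook computer purchase intention
  Table 4. Category association frequency analysis table
  Table 5. Statistics table for non-nominal attributes
  Table 6. Description of the Iris dataset
  Table 7. Part of the Iris dataset
  Table 8. Statistics table for non-nominal attributes of the Iris dataset
  Table 9. Regular hypothesis rules for the Iris dataset
  Table 10. Cut hypothesis rules for the Iris dataset
  Table 11. Regular hypothesis rules with attribute power
  Table 12. Regular classification rules for the wine dataset
  Table 13. Description of the test datasets
  Table 14. Comparison of performance test results
  Table 15. Best prediction technique and method for each dataset

List of Figures
  Figure 1. Research flowchart
  Figure 2. Knowledge discovery process
  Figure 3. Classification knowledge model for notebook computer purchase intention
  Figure 4. The ID3 algorithm
  Figure 5. Golf game decision tree
  Figure 6. Artificial neuron model
  Figure 7. Three-layer architecture of an artificial neural network
  Figure 8. K-nearest neighbor classification
  Figure 9. Attribute classification diagram
  Figure 10. Rule filter algorithm
  Figure 11. System flowchart of the Composite Rule Induction System
  Figure 12. Multi-class attribute classification diagram
  Figure 13. Multi-class attribute analysis method
  Figure 14. Classification diagram of a non-nominal attribute with insufficient discriminating power
  Figure 15. Flowchart of the Composite Rule Induction System
  Figure 16. Architecture of the Composite Rule Induction System
  Figure 17. Training data interface of the Composite Rule Induction System
  Figure 18. Attribute definition interface of the Composite Rule Induction System
  Figure 19. Cut hypothesis rules (Candidate Cut Rules)
  Figure 20. Regular hypothesis rules (Candidate Regular Rule)
  Figure 21. Saliency values of the candidate cut rules
  Figure 22. Saliency values of the candidate regular rules
  Figure 23. Knowledge model built by CRIS
  Figure 24. Case error rate of the CRIS knowledge model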

References
1. Bessembinder, Hendrik and Chan, Kalok (1995), "The Profitability of Technical Trading Rules in the Asian Stock Markets," Pacific-Basin Finance Journal, Vol. 3, pp. 257-284.
2. Bingchiang Jeng, Ting-Peng Liang and MinYang Hong (1996), "Interactive Induction of Expert Knowledge," Expert Systems with Applications, Vol. 10, Issue 3-4, pp. 393-401.
3. Bingchiang Jeng, Yung-Mo Jeng and Ting-Peng Liang (1997), "FILM: A Fuzzy Inductive Learning Method for Automated Knowledge Acquisition," Decision Support Systems, Vol. 21, pp. 61-73.
4. Breiman, L., J. H. Friedman, R. A. Olshen and C. J. Stone (1984), "Classification and Regression Trees," Wadsworth & Brooks, Monterey, CA.
5. Chandler, J. C. and T. P. Liang (1990), "Developing Expert Systems for Business Applications," Merrill Publishing Co., Columbus, OH.
6. Feigenbaum, E. A. (1981), "Expert Systems in the 1980s," State of the Art Report on Machine Intelligence (A. Bond, Ed.).
7. Fernando Fernández-Rodríguez, Christian Gonzalez-Martel and Simon Sosvilla-Rivero (2000), "On the Profitability of Technical Trading Rules Based on Artificial Neural Networks: Evidence from the Madrid Stock Market," Economics Letters, Vol. 69, Issue 1, pp. 89-94.
8. Fisher, R. A. (1936), "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics, Vol. 7, pp. 179-188.
9. Gencay, Ramazan (1996), "Non-linear Prediction of Security Returns with Moving Average Rules," Journal of Forecasting, Vol. 15, pp. 165-174.
10. Gencay, Ramazan (1998), "The Predictability of Security Returns with Simple Technical Trading Rules," Journal of Empirical Finance, Vol. 5, Issue 4, pp. 347-359.
11. Grudnitski, G. and Osburn, L. (1993), "Forecasting S&P and Gold Futures Prices: An Application of Neural Networks," Journal of Futures Markets, Vol. 13, Issue 6, pp. 631-643.
12. Hung, Shin-Yuan, Liang, Ting-Peng and Liu, Victor Wei-Chi (1996), "Integrating Arbitrage Pricing Theory and Artificial Neural Networks to Support Portfolio Management," Decision Support Systems, Vol. 18, Issue 3-4, pp. 301-316.
13. Hyafil, L. and Rivest, R. L. (1976), "Constructing Optimal Binary Decision Trees is NP-complete," Information Processing Letters, Vol. 5, Issue 1, pp. 15-17.
14. Jiawei Han and Micheline Kamber (2006), "Data Mining: Concepts and Techniques," Morgan Kaufmann.
15. Kimoto, T. and Asakawa, K. (1990), "Stock Market Prediction System with Modular Networks," IEEE International Joint Conference on Neural Networks, Vol. 1, pp. 1-6.
16. Liang, Ting-Peng (1992), "A Composite Approach to Inducing Knowledge for Expert Systems Design," Management Science, Vol. 38, Issue 1.
17. Matsatsinis, Nikolaos F. (2002), "CCAS: An Intelligent Decision Support System for Credit Card Assessment," Journal of Multi-Criteria Decision Analysis, Vol. 11, pp. 213-235.
18. Michie, D. (1983), "Inductive Rule Generation in the Context of the Fifth Generation," Proceedings of the Second International Machine Learning Workshop.
19. Mizuno, Hirotaka, Kosaka, Michitaka and Yajima, Hiroshi (1998), "Application of Neural Network to Technical Analysis of Stock Market Prediction," Studies in Informatics and Control, Vol. 7, Issue 2, pp. 111-120.
20. Quinlan, J. R. (1986), "Induction of Decision Trees," Machine Learning, Vol. 1, Issue 1.
21. Quinlan, J. R. (1989), "Unknown Attribute Values in Induction," Machine Learning, Vol. 4, pp. 89-116.
22. Quinlan, J. R. and Rivest, R. L. (1989), "Inferring Decision Trees Using the Minimum Description Length Principle," Information and Computation, Vol. 80, pp. 227-248.
23. Quinlan, J. R. (1993), "C4.5: Programs for Machine Learning," Morgan Kaufmann Publishers.
24. Sullivan, Ryan, Timmermann, Allan and White, Halbert (Oct. 1999), "Data-Snooping, Technical Trading Rule Performance and the Bootstrap," The Journal of Finance, Vol. 54, Issue 5, pp. 1647-1691.
25. W. Brock, J. Lakonishok and B. LeBaron (Dec. 1992), "Simple Technical Trading Rules and the Stochastic Properties of Stock Returns," Journal of Finance, Vol. 47, pp. 1731-1764.
26. Y. J. Ko and Y. J. Seo (2002), "Text Categorization Using Feature Projections," Proceedings of the Nineteenth International Conference on Computational Linguistics, Vol. 1, pp. 1-7.

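Fulltext
This electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it, so as to avoid breaking the law.
Thesis access permission: public both on and off campus after one year (withheld)
Available: Campus: available. Off-campus: available.
etd-0730107-180923.pdf
Printed copies
Public information on printed theses is relatively complete from academic year 102 onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available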
