Title page for etd-0728119-132453
Title: Interpretable Logic Representation Learning based on Deep Rule Forest (基於深度規則森林的可解釋邏輯特徵學習)
Department: Department of Information Management
Year, semester: Academic Year 107, second (spring) semester
Language: English
Degree: Master
Number of pages: 44
Author: Sheng-Tai Huang (黃升泰)
Advisor:
Convenor:
Advisory Committee:
Date of Exam: 2019-07-22
Date of Submission: 2019-08-28
Keywords: Deep Rule Forest, Deep Model Architecture, Interpretability, Random Forest, Logic Optimization
Statistics: The thesis/dissertation has been browsed 6579 times and downloaded 75 times.
Abstract
Compared with traditional machine learning algorithms, most contemporary algorithms achieve markedly higher accuracy, but their model architectures have also grown more complex, which prevents humans from understanding how predictions are generated. As a result, discrimination latent in the data is difficult for humans to detect, and legislation has emerged that requires models to be interpretable. However, existing interpretable models (e.g., decision trees and linear models) are too simple to produce sufficiently accurate predictions on large, complex datasets. We therefore extract rules from the decision trees that make up a random forest. This makes the random forest, usually regarded as a black-box model, interpretable, while ensemble learning improves its accuracy. Moreover, inspired by the concept of representation learning in deep learning, we add a multilayer structure that enables the random forest to learn more complex representations. In this thesis, we propose the Deep Rule Forest (DRF), which combines interpretability with a deep model architecture and, in our experiments, outperforms several complex models such as random forest in accuracy. This structure, however, makes the rules too complicated for humans to understand, so interpretability is lost. Finally, we propose a logic optimization algorithm that simplifies the extracted rules into a form that humans can easily read and understand, thereby retaining interpretability.
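The rule-simplification idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes each extracted rule is a conjunction over the same ordered list of binary conditions, encoded as a string of `1` (condition holds), `0` (condition does not hold), and `-` (condition is irrelevant), and it applies only the merging step of Quine-McCluskey-style minimization to obtain prime implicants, without selecting a minimal cover. The function names and the example conditions are hypothetical.

```python
from itertools import combinations


def merge(a, b):
    """Merge two rules that differ in exactly one fixed bit; otherwise return None."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None


def prime_implicants(rules):
    """Repeatedly merge rules until no further merge applies."""
    current = set(rules)
    primes = set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            m = merge(a, b)
            if m is not None:
                merged.add(m)
                used.update((a, b))
        # Rules that could not be merged any further are prime implicants.
        primes |= current - used
        current = merged
    return primes


def describe(rule, conditions):
    """Render an encoded rule as a readable conjunction."""
    parts = [c if bit == "1" else f"NOT ({c})"
             for bit, c in zip(rule, conditions) if bit != "-"]
    return " AND ".join(parts) or "TRUE"


# Hypothetical conditions and rules, for illustration only.
conditions = ["age > 30", "income > 50k", "has_loan"]
rules = ["110", "111", "011"]

simplified = sorted(prime_implicants(rules))
for rule in simplified:
    print(describe(rule, conditions))
```

In this sketch the three original rules collapse into two shorter ones, because a condition that appears both as true and as false in otherwise identical rules cannot affect the outcome and can be dropped.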
Table of Contents
Abstract (Chinese)
Abstract
List of Figures
List of Table
1. Introduction
2. Background and Related Work
  2.1. Tree-based Algorithms
  2.2. Representation Learning
  2.3. Deep Architecture
  2.4. Deep Architecture Models
  2.5. Explainable AI (XAI)
  2.6. Logic Optimization (Logic Minimization)
3. Methodology
  3.1. Building DRF
  3.2. Interpretability of DRF
  3.3. New Encoding for Regression Data
4. Experiment and Discussion
  4.1. Experiment Setup
  4.2. Classification with DRF
  4.3. Regression with DRF
5. Conclusion
6. Reference