Thesis/Dissertation etd-0728119-132453: Detailed Information
Title page for etd-0728119-132453
Interpretable Logic Representation Learning based on Deep Rule Forest
Year, semester
Number of pages
Advisory Committee
Date of Exam
Date of Submission
Keywords: Deep Rule Forest, Deep Model Architecture, Interpretability, Random Forest, Logic Optimization
The thesis/dissertation has been viewed 6,579 times and downloaded 75 times.
Compared to traditional machine learning algorithms, most contemporary algorithms achieve markedly higher accuracy, but at the cost of more complex model architectures, which prevent humans from understanding how predictions are generated. This makes latent discrimination in the data difficult to discover, and legislation has consequently been enacted requiring that models be interpretable. However, existing interpretable models (e.g., decision trees and linear models) are too simple to produce sufficiently accurate predictions on large and complex datasets. We therefore extract rules from the decision tree components of a random forest, which not only makes the random forest, usually regarded as a black-box model, interpretable, but also exploits ensemble learning to boost accuracy. Moreover, inspired by the concept of representation learning in deep learning, we add a multilayer structure that enables the random forest to learn more complicated representations. In this thesis we propose the Deep Rule Forest (DRF), which combines interpretability with a deep model architecture and outperforms several complex models, such as random forests, in accuracy. However, the multilayer structure makes the extracted rules too complicated for humans to understand, and interpretability is lost. Finally, via logic optimization, we recover interpretability by simplifying the rules into a form that is readable and understandable to humans.
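The thesis implements DRF in R (the references include randomForest, caret, and rpart), but its core first step, reading the trees of a trained forest out as if-then rules, can be illustrated in any language. The following scikit-learn sketch is purely illustrative and is not the author's code; the helper extract_rules and all variable names are hypothetical.

```python
# Minimal sketch (hypothetical names, not the thesis code): read the
# root-to-leaf paths of a fitted scikit-learn random forest out as
# human-readable if-then rules, the first step of a DRF-style pipeline.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def extract_rules(tree, feature_names, class_names):
    """Return every root-to-leaf path of one decision tree as a
    (conditions, predicted class) pair, i.e. an if-then rule."""
    t = tree.tree_
    rules = []

    def recurse(node, conditions):
        if t.children_left[node] == -1:            # -1 marks a leaf
            label = class_names[t.value[node][0].argmax()]
            rules.append((" AND ".join(conditions) or "TRUE", label))
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        recurse(t.children_left[node],  conditions + [f"{name} <= {thr:.2f}"])
        recurse(t.children_right[node], conditions + [f"{name} > {thr:.2f}"])

    recurse(0, [])
    return rules

iris = load_iris()
rf = RandomForestClassifier(n_estimators=10, max_depth=3,
                            random_state=0).fit(iris.data, iris.target)

# Print the rules of the first tree; a DRF would collect rules from all
# trees and use their 0/1 firing pattern as features for the next layer.
for conds, label in extract_rules(rf.estimators_[0],
                                  iris.feature_names, iris.target_names):
    print(f"IF {conds} THEN class = {label}")
```

In a full DRF, as the abstract describes, the firing pattern of such rules becomes the learned representation passed to the next forest layer, and a Quine-McCluskey-style logic minimizer simplifies the final rule set to restore readability.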
Table of Contents
Abstract (in Chinese)
Abstract
List of Figures
List of Tables
1. Introduction
2. Background and Related Work
2.1. Tree-based Algorithms
2.2. Representation Learning
2.3. Deep Architecture
2.4. Deep Architecture Models
2.5. Explainable AI (XAI)
2.6. Logic Optimization (Logic Minimization)
3. Methodology
3.1. Building DRF
3.2. Interpretability of DRF
3.3. New Encoding for Regression Data
4. Experiment and Discussion
4.1. Experiment Setup
4.2. Classification with DRF
4.3. Regression with DRF
5. Conclusion
6. References
References
Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning, 2(1), 1–127.
Bengio, Y. (2013). Deep Learning of Representations: Looking Forward. arXiv:1305.0445 [cs].
Bengio, Y., Courville, A., & Vincent, P. (2012). Representation Learning: A Review and New Perspectives. arXiv:1206.5538 [cs].
Bengio, Y., Delalleau, O., & Simard, C. (2010). Decision trees do not generalize to new variations. Computational Intelligence, 26(4), 449–467.
Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy Layer-Wise Training of Deep Networks. In B. Schölkopf, J. C. Platt, & T. Hoffman (Eds.), Advances in Neural Information Processing Systems 19 (pp. 153–160).
Bergmeir, C., & Benítez, J. M. (2012). Neural Networks in R Using the Stuttgart Neural Network Simulator: RSNNS. Journal of Statistical Software, 46(7).
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 144–152.
Bourlard, H., & Kamp, Y. (1988). Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 59(4), 291–294.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Breiman, L. (2017). Classification and Regression Trees.
Bunn, A., & Korpela, M. (n.d.). An introduction to dplR.
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, 785–794.
Chen, T., & He, T. (n.d.). xgboost: eXtreme Gradient Boosting.
Cichocki, A., Zdunek, R., & Amari, S. (2007). Hierarchical ALS Algorithms for Nonnegative Matrix and 3D Tensor Factorization. In M. E. Davies, C. J. James, S. A. Abdallah, & M. D. Plumbley (Eds.), Independent Component Analysis and Signal Separation (pp. 169–176). Springer Berlin Heidelberg.
Cun, Y. L. (1987). Modèles connexionnistes de l’apprentissage.
Doshi-Velez, F., & Kim, B. (2017). Towards A Rigorous Science of Interpretable Machine Learning. arXiv:1702.08608 [cs, stat].
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7(1), 1–26.
Freund, Y., & Schapire, R. E. (1995). A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1).
Pearson, K. (1901). LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11), 559–572.
Goodman, B., & Flaxman, S. (2017). European Union regulations on algorithmic decision-making and a “right to explanation.” AI Magazine, 38(3), 50.
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A Survey of Methods for Explaining Black Box Models. ACM Computing Surveys, 51(5), 1–42.
Hahnloser, R. H. R., Sarpeshkar, R., Mahowald, M. A., Douglas, R. J., & Seung, H. S. (2000). Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 405(6789), 947.
Harrison, D., & Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1), 81–102.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv:1512.03385 [cs].
Hinton, G. E., McClelland, J. L., & Rumelhart, D. E. (n.d.). Distributed Representations.
Hinton, G. E. (2009). Deep belief networks. Scholarpedia, 4(5), 5947.
Hinton, G. E., & Zemel, R. S. (1994). Autoencoders, Minimum Description Length and Helmholtz Free Energy. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in Neural Information Processing Systems 6 (pp. 3–10).
Hyvärinen, A., Karhunen, J., & Oja, E. (2004). Independent Component Analysis. John Wiley & Sons.
Karatzoglou, A., Smola, A., Hornik, K., & Zeileis, A. (2004). kernlab—An S4 Package for Kernel Methods in R. Journal of Statistical Software, 11(9).
Kuhn, M. (2008). Building Predictive Models in R Using the caret Package. Journal of Statistical Software, 28(5).
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Liaw, A., & Wiener, M. (2002). Classification and Regression by randomForest. R News, 2(3), 18–22.
McCluskey, E. J. (1956). Minimization of Boolean functions. The Bell System Technical Journal, 35(6), 1417–1444.
Miller, K., Hettinger, C., Humpherys, J., Jarvis, T., & Kartchner, D. (2017). Forward Thinking: Building Deep Random Forests. arXiv:1705.07366 [cs, stat].
Peer to Peer Lending & Alternative Investing | Save with LendingClub. (n.d.). Retrieved November 15, 2018.
Quine, W. V. (1952). The Problem of Simplifying Truth Functions. The American Mathematical Monthly, 59(8), 521–531.
Quine, W. V. (1955). A Way to Simplify Truth Functions. The American Mathematical Monthly, 62(9), 627–631.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Quinlan, J. Ross. (1993). C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Ragin, C. C. (2014). The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies. Univ of California Press.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. arXiv:1602.04938 [cs, stat].
RStudio Team. (2015). RStudio: Integrated Development for R.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1 (D. E. Rumelhart, J. L. McClelland, & the PDP Research Group, Eds.).
Santosa, F., & Symes, W. (1986). Linear Inversion of Band-Limited Reflection Seismograms. SIAM Journal on Scientific and Statistical Computing, 7(4), 1307–1330.
Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556 [cs].
Sirikulviriya, N., & Sinthupinyo, S. (n.d.). Integration of Rules from a Random Forest.
Su, G., Wei, D., Varshney, K. R., & Malioutov, D. M. (2015). Interpretable Two-level Boolean Rule Learning for Classification. arXiv:1511.07361 [cs].
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. (2016). Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. arXiv:1602.07261 [cs].
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the Inception Architecture for Computer Vision. 2818–2826.
Therneau, T. M., & Atkinson, E. J. (n.d.). An Introduction to Recursive Partitioning Using the RPART Routines. Mayo Foundation.
UCI Machine Learning Repository: Abalone Data Set. (n.d.). Retrieved June 28, 2019.
UCI Machine Learning Repository: Iris Data Set. (n.d.). Retrieved November 15, 2018.
UCI Machine Learning Repository: Poker Hand Data Set. (n.d.). Retrieved July 14, 2019.
Wickham, H. (2009). Ggplot2: Elegant graphics for data analysis. New York: Springer.
Zhou, Z.-H., & Feng, J. (2017). Deep Forest. arXiv:1702.08835 [cs, stat].
Fulltext
Thesis access permission: user-defined availability period
Available:
Campus: available
Off-campus: available

Printed copies
Available: available
