National Sun Yat-sen University, thesis/dissertation, Exploration of Multimodal Machine Learning Model

Title	Exploration of Multimodal Machine Learning Model - Findings from Real Estate Valuation
Department	Department of Information Management
Year, semester	The fall semester of Academic Year 109	Language	English
Degree	Master	Number of pages	46
Author	Hung-Chi Hsu (許紘齊)
Advisor	Yihuang Kang (康藝晃)
Convenor	Lin Keng-Pei (林耕霈)
Advisory Committee	Shih-Yi Chien (簡士鎰)
Date of Exam	2020-08-28	Date of Submission	2020-09-04
Keywords	Convolutional neural network, Real estate value evaluation, Multi-modal model, Model interpretability
Statistics	The thesis/dissertation has been browsed 6355 times and downloaded 107 times.

Abstract
Machine learning and deep learning have been widely used in various fields in recent years. However, many models sacrifice interpretability in pursuit of predictive performance, which makes them as difficult to understand as a black box. In this thesis, we propose using the concept of multi-modal models so that a model can have both predictive performance and interpretability. The interpretability of a model refers to whether we can understand how the model produces its predictions, or in other words, which features the model relies on to produce them. We take the real estate value evaluation task as an example and apply our proposed multi-modal model architecture to verify our ideas, so that the model achieves good predictive performance while retaining a certain explanatory power. As for interpretability, we further use the features learned by the model to give global explanations and a local explainer to give local explanations.
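The idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy `black_box` model, and all numeric values are hypothetical, and the local explainer here is a hand-rolled, LIME-style weighted linear surrogate rather than the LIME package itself. It assumes structured housing features and image-embedding features are simply concatenated before being passed to a predictive model.

```python
import numpy as np

def fuse(tabular, image_embedding):
    """Combine two modalities by concatenating their feature columns."""
    return np.concatenate([tabular, image_embedding], axis=1)

def black_box(Z):
    """Hypothetical stand-in for a trained price model (not from the thesis)."""
    return 3.0 * Z[:, 0] + 0.5 * Z[:, 1] ** 2 - 2.0 * Z[:, 2]

def local_surrogate(predict, x, n_samples=500, scale=0.1, kernel_width=0.75, seed=0):
    """LIME-style local explanation: fit a proximity-weighted linear model around x."""
    rng = np.random.default_rng(seed)
    # Sample the neighbourhood of x by adding Gaussian noise.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    y = predict(Z)
    # Weight each sample by an exponential kernel on its distance to x.
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)
    # Weighted least squares with an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # one local effect per fused feature

# Illustrative values only: two structured features and a 2-d image embedding.
tabular = np.array([[1.0, 2.0]])
embedding = np.array([[0.5, -0.5]])
x = fuse(tabular, embedding)[0]
print(local_surrogate(black_box, x))
```

The returned coefficients approximate how the prediction changes with each fused feature near this one instance, which is the sense of "local explanation" used in the abstract; a global explanation would instead summarize the learned features across the whole dataset.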

Table of Contents
Thesis Approval Form
Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
2. Background & Related Work
  2.1. Explainable AI
  2.2. Housing Price Estimation
  2.3. Representation Learning
  2.4. Multi-modal Model
  2.5. LIME explainer
3. Methodology
  3.1. Real estate transaction dataset
  3.2. Boosting model's predictive performance by image features
  3.3. Interpretability of model
    3.3.1. Explaining the models by sets of labels
    3.3.2. Explaining the models by LIME explainer
4. Experimental Results
  4.1. Experiment environment
  4.2. Data pre-processing
  4.3. The base real estate value evaluation model
  4.4. The effect of images' embeddings
  4.5. Model interpretability
    4.5.1. Explain the models by images' labels
    4.5.2. Explain the models by LIME explainer
5. Conclusion
6. Reference

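Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.
Thesis access permission: unrestricted (fully public on campus and off campus)
Available: Campus: available; Off-campus: available
etd-0804120-103028.pdf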
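Printed copies
Public information on printed theses is relatively complete from Academic Year 102 onward. To look up public information on printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available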

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2453 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS