論文使用權限 Thesis access permission:校內校外完全公開 unrestricted
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available
論文名稱 Title
探索性多模態機器學習模型—以房產鑑價為例 Exploration of Multimodal Machine Learning Model - Findings from Real Estate Valuation
系所名稱 Department
畢業學年期 Year, semester
語文別 Language
學位類別 Degree
頁數 Number of pages
46
研究生 Author
指導教授 Advisor
召集委員 Convenor
口試委員 Advisory Committee
口試日期 Date of Exam
2020-08-28
繳交日期 Date of Submission
2020-09-04
關鍵字 Keywords
房產鑑價、模型可解釋性、多模態模型、卷積神經網路 Convolutional neural network, Real estate value evaluation, Multi-modal model, Model interpretability
統計 Statistics
本論文已被瀏覽 6211 次,被下載 106 次 The thesis/dissertation has been browsed 6211 times and downloaded 106 times.
中文摘要 (Chinese Abstract)
Machine learning and deep learning have been widely applied across many fields in recent years, yet many models sacrifice interpretability in pursuit of predictive performance, leaving them as opaque as black boxes. In this thesis, we propose using the concept of a multi-modal model so that a model can achieve both predictive accuracy and interpretability. Model interpretability refers to whether we can understand how a model produces its predictions, or in other words, which features the predictions are based on. We validate this idea with real estate valuation as an example task, paired with our proposed multi-modal architecture, so that the model retains good predictive performance while also offering a degree of explanatory power. For interpretability, we further provide global explanations through the features the model has learned and local explanations through a local explainer.
Abstract |
Machine learning and deep learning have been widely used in various fields in recent years. However, many models sacrifice interpretability while pursuing predictive performance, which makes them as difficult to understand as a black box. In this article, we propose the concept of multi-modal models to enable a model to have both predictive performance and interpretability. The interpretability of a model refers to whether we can understand how the model produces its predictions. We take the real estate value evaluation task as an example, together with our proposed method, to verify our ideas, so that the model achieves good predictive performance while also having a certain explanatory power. As for the interpretability of the model, we further use the features learned by the model to produce global explanations, and a local explainer to produce local explanations.
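The abstract's core idea of fusing modalities can be sketched in a few lines. The following is a minimal, hypothetical illustration only: the variable names, the synthetic data, and the linear least-squares regressor are stand-ins chosen for brevity, not the thesis's actual model (which uses CNN-derived image embeddings alongside real transaction records).

```python
import numpy as np

# Illustrative early-fusion sketch: tabular transaction features are
# concatenated with an image-embedding vector before fitting a regressor.
# All data here is synthetic; the embeddings stand in for CNN features.
rng = np.random.default_rng(0)

n = 200
tabular = rng.normal(size=(n, 3))     # e.g. floor area, age, distance to transit
image_emb = rng.normal(size=(n, 8))   # stand-in for learned image embeddings

X = np.hstack([tabular, image_emb])   # early fusion: concatenate modalities
true_w = rng.normal(size=X.shape[1])  # synthetic ground-truth coefficients
y = X @ true_w + rng.normal(scale=0.1, size=n)

# Fit a linear model on the fused feature matrix via least squares.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# A simple global view of interpretability: coefficient magnitudes show
# how much each fused feature (tabular vs. image-derived) contributes.
pred = X @ w_hat
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print("RMSE:", round(rmse, 3))
```

A local explainer such as LIME, cited in the references, would instead perturb a single sample's fused features and fit a small surrogate model around it to explain that one prediction.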
目次 Table of Contents |
論文審定書 i
誌謝 ii
摘要 iii
Abstract iv
目錄 v
List of Figures vi
List of Tables vii
1. Introduction 1
2. Background & Related Work 3
2.1. Explainable AI 3
2.2. Housing Price Estimation 6
2.3. Representation Learning 8
2.4. Multi-modal Model 10
2.5. LIME explainer 11
3. Methodology 13
3.1. Real estate transaction dataset 14
3.2. Boosting model's predictive performance by image features 14
3.3. Interpretability of model 17
3.3.1. Explaining the models by sets of labels 18
3.3.2. Explaining the models by LIME explainer 19
4. Experimental Results 22
4.1. Experiment environment 22
4.2. Data pre-processing 23
4.3. The base real estate value evaluation model 25
4.4. The effect of images' embeddings 26
4.5. Model interpretability 28
4.5.1. Explain the models by images' labels 28
4.5.2. Explain the models by LIME explainer 30
5. Conclusion 35
6. Reference 35
參考文獻 References |
Baltrušaitis, T., Ahuja, C., & Morency, L.-P. (2019). Multimodal Machine Learning: A Survey and Taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 423–443. https://doi.org/10.1109/TPAMI.2018.2798607
Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1–127. https://doi.org/10.1561/2200000006
Bengio, Y., Courville, A., & Vincent, P. (2014). Representation Learning: A Review and New Perspectives. arXiv:1206.5538 [cs]. http://arxiv.org/abs/1206.5538
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2016). Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. arXiv:1412.7062 [cs]. http://arxiv.org/abs/1412.7062
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255. https://doi.org/10.1109/CVPR.2009.5206848
Detect Labels | Cloud Vision API. (n.d.). Google Cloud. Retrieved August 17, 2020, from https://cloud.google.com/vision/docs/labels?hl=zh-tw
D'mello, S. K., & Kory, J. (2015). A Review and Meta-Analysis of Multimodal Affect Detection Systems. ACM Computing Surveys, 47(3), 1–36. https://doi.org/10.1145/2682899
Dubey, A., Naik, N., Parikh, D., Raskar, R., & Hidalgo, C. A. (2016). Deep Learning the City: Quantifying Urban Perception at a Global Scale. arXiv:1608.01769 [cs]. http://arxiv.org/abs/1608.01769
Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01
Fu, X., Jia, T., Zhang, X., Li, S., & Zhang, Y. (2019). Do street-level scene perceptions affect housing prices in Chinese megacities? An analysis using open access datasets and deep learning. PLOS ONE, 14(5), e0217505. https://doi.org/10.1371/journal.pone.0217505
Girshick, R. (2015). Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV), 1440–1448. https://doi.org/10.1109/ICCV.2015.169
Goodman, B., & Flaxman, S. (2017). European Union Regulations on Algorithmic Decision-Making and a "Right to Explanation." AI Magazine, 38(3), 50–57. https://doi.org/10.1609/aimag.v38i3.2741
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv:1512.03385 [cs]. http://arxiv.org/abs/1512.03385
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., & Kingsbury, B. (2012). Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82–97. https://doi.org/10.1109/MSP.2012.2205597
Hodosh, M., Young, P., & Hockenmaier, J. (n.d.). Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics (Extended Abstract).
K-means clustering. (2020). In Wikipedia. https://en.wikipedia.org/w/index.php?title=K-means_clustering&oldid=973148926
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25 (pp. 1097–1105). Curran Associates, Inc. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Law, S., Paige, B., & Russell, C. (2019). Take a Look Around: Using Street View and Satellite Images to Estimate House Prices. ACM Transactions on Intelligent Systems and Technology, 10(5), 1–19. https://doi.org/10.1145/3342240
Lowe, D. G. (2004). Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2), 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 26 (pp. 3111–3119). Curran Associates, Inc. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
Restricted Boltzmann machine. (2019). In Wikipedia, the free encyclopedia (Chinese edition). https://zh.wikipedia.org/w/index.php?title=%E5%8F%97%E9%99%90%E7%8E%BB%E5%B0%94%E5%85%B9%E6%9B%BC%E6%9C%BA&oldid=57289227
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. (2016). Generative Adversarial Text to Image Synthesis. arXiv:1605.05396 [cs]. http://arxiv.org/abs/1605.05396
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. arXiv:1602.04938 [cs, stat]. http://arxiv.org/abs/1602.04938
RStudio | Open source & professional software for data science teams. (n.d.). Retrieved July 15, 2020, from https://rstudio.com/
Samek, W., Wiegand, T., & Müller, K.-R. (2017). Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models. arXiv:1708.08296 [cs, stat]. http://arxiv.org/abs/1708.08296
Seresinhe, C. I., Preis, T., & Moat, H. S. (n.d.). Using deep learning to quantify the beauty of outdoor places. Royal Society Open Science, 4(7), 170170. https://doi.org/10.1098/rsos.170170
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489. https://doi.org/10.1038/nature16961
Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556 [cs]. http://arxiv.org/abs/1409.1556
Srivastava, N., & Salakhutdinov, R. (n.d.). Multimodal Learning with Deep Boltzmann Machines.
Therneau, T. M., & Atkinson, E. J. (n.d.). An Introduction to Recursive Partitioning Using the RPART Routines. Mayo Foundation.
Wright, M. N., & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1). https://doi.org/10.18637/jss.v077.i01
Yuhas, B. P., Goldstein, M. H., & Sejnowski, T. J. (1989). Integration of acoustic and visual speech signals using neural networks. IEEE Communications Magazine, 27(11), 65–71. https://doi.org/10.1109/35.41402
電子全文 Fulltext |
This electronic full text is licensed only for academic research: personal, non-commercial retrieval, reading, and printing. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
紙本論文 Printed copies |
Public-access information for printed copies is relatively complete from the 102nd academic year (ROC calendar) onward. To inquire about printed copies from the 101st academic year or earlier, please contact the printed-thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience. 開放時間 Available: 已公開 available