博碩士論文 etd-0804120-103028 詳細資訊
Title page for etd-0804120-103028
論文名稱
Title
探索性多模態機器學習模型—以房產鑑價為例
Exploration of Multimodal Machine Learning Model - Findings from Real Estate Valuation
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
46
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2020-08-28
繳交日期
Date of Submission
2020-09-04
關鍵字
Keywords
房產鑑價、模型可解釋性、多模態模型、卷積神經網路
Convolutional neural network, Real estate value evaluation, Multi-modal model, Model interpretability
統計
Statistics
本論文已被瀏覽 6113 次,被下載 105 次。
The thesis/dissertation has been browsed 6113 times, has been downloaded 105 times.
中文摘要
機器學習以及深度學習近年來被廣泛應用在各個領域,然而許多模型在追求預測表現的同時卻犧牲了模型的可解釋性,使得模型像是黑盒子一樣讓人難以理解。在本文中,我們提出透過多模態模型的概念來使得模型同時擁有預測準確度以及模型的可解釋性。模型的可解釋性指的是我們能不能去了解模型是如何產生預測結果的,或者是說模型是根據哪些特徵來產生預測。我們透過房地產鑑價為例子並搭配我們所提出的多模態模型架構來驗證我們的想法,使得模型在擁有好的預測表現的同時,也具有一定的解釋能力。而模型的可解釋性我們更進一步透過模型所學到的特徵來做全局解釋以及透過局部解釋器來做局部解釋。
Abstract
Machine learning and deep learning have been widely used in various fields in recent years. However, many models sacrifice interpretability while pursuing predictive performance, which makes them as difficult to understand as a black box. In this article, we propose the concept of multi-modal models to enable a model to have both good predictive performance and interpretability. The interpretability of a model refers to whether we can understand how the model produces its predictions, that is, which features the model relies on. We take the real estate value evaluation task as an example and verify our ideas with our proposed multi-modal architecture, so that the model achieves good predictive performance while also having a certain explanatory power. As for interpretability, we further use the features learned by the model to make global explanations, and a local explainer to make local explanations.
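The fusion-and-explanation pipeline the abstract describes can be sketched roughly as follows. This is a minimal illustration with synthetic data: the dimensions, the closed-form least-squares regressor, and the hand-rolled LIME-style surrogate are illustrative stand-ins for the thesis's CNN-based multi-modal model and the LIME explainer, not its actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two modalities in the thesis:
n = 200
tabular = rng.normal(size=(n, 5))      # e.g. floor area, age, location features
image_emb = rng.normal(size=(n, 8))    # e.g. a CNN embedding of a property photo
price = (tabular @ rng.normal(size=5)
         + image_emb @ rng.normal(size=8)
         + rng.normal(scale=0.1, size=n))

# Early fusion: concatenate both modalities into one feature vector.
X = np.hstack([tabular, image_emb])

# A linear least-squares model stands in for the learned valuation model.
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(n)], price, rcond=None)

def predict(X_new):
    """Predicted price from the fused (tabular + image-embedding) features."""
    return np.c_[X_new, np.ones(len(X_new))] @ w

# LIME-style local explanation: perturb one instance, weight the samples by
# proximity to it, and fit a weighted linear surrogate around that point.
x0 = X[0]
Z = x0 + rng.normal(scale=0.3, size=(500, X.shape[1]))   # local perturbations
y_z = predict(Z)                                         # black-box outputs
weights = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 2.0)   # proximity kernel
sw = np.sqrt(weights)[:, None]
coef, *_ = np.linalg.lstsq(sw * np.c_[Z, np.ones(len(Z))],
                           sw[:, 0] * y_z, rcond=None)
# coef[:X.shape[1]] approximates each feature's local effect on the prediction.
```

Because the stand-in model is itself linear, the local surrogate recovers its coefficients exactly; with a nonlinear model (as in the thesis), the surrogate instead approximates the model's behavior in the neighborhood of the explained instance.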
目次 Table of Contents
論文審定書 i
誌謝 ii
摘要 iii
Abstract iv
目錄 v
List of Figures vi
List of Tables vii
1. Introduction 1
2. Background & Related Work 3
2.1. Explainable AI 3
2.2. Housing Price Estimation 6
2.3. Representation Learning 8
2.4. Multi-modal Model 10
2.5. LIME explainer 11
3. Methodology 13
3.1. Real estate transaction dataset 14
3.2. Boosting model’s predictive performance by image features 14
3.3. Interpretability of model 17
3.3.1. Explaining the models by sets of labels 18
3.3.2. Explaining the models by LIME explainer 19
4. Experimental Results 22
4.1. Experiment environment 22
4.2. Data pre-processing 23
4.3. The base real estate value evaluation model 25
4.4. The effect of images’ embeddings 26
4.5. Model interpretability 28
4.5.1. Explain the models by images’ labels 28
4.5.2. Explain the models by LIME explainer 30
5. Conclusion 35
6. References 35
參考文獻 References
Baltrušaitis, T., Ahuja, C., & Morency, L.-P. (2019). Multimodal Machine Learning: A Survey and Taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 423–443. https://doi.org/10.1109/TPAMI.2018.2798607
Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning, 2(1), 1–127. https://doi.org/10.1561/2200000006
Bengio, Y., Courville, A., & Vincent, P. (2014). Representation Learning: A Review and New Perspectives. ArXiv:1206.5538 [Cs]. http://arxiv.org/abs/1206.5538
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2016). Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. ArXiv:1412.7062 [Cs]. http://arxiv.org/abs/1412.7062
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255. https://doi.org/10.1109/CVPR.2009.5206848
Detect Labels | Cloud Vision API. (n.d.). Google Cloud. Retrieved August 17, 2020, from https://cloud.google.com/vision/docs/labels?hl=zh-tw
D’mello, S. K., & Kory, J. (2015). A Review and Meta-Analysis of Multimodal Affect Detection Systems. ACM Computing Surveys, 47(3), 1–36. https://doi.org/10.1145/2682899
Dubey, A., Naik, N., Parikh, D., Raskar, R., & Hidalgo, C. A. (2016). Deep Learning the City: Quantifying Urban Perception At A Global Scale. ArXiv:1608.01769 [Cs]. http://arxiv.org/abs/1608.01769
Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01
Fu, X., Jia, T., Zhang, X., Li, S., & Zhang, Y. (2019). Do street-level scene perceptions affect housing prices in Chinese megacities? An analysis using open access datasets and deep learning. PLOS ONE, 14(5), e0217505. https://doi.org/10.1371/journal.pone.0217505
Girshick, R. (2015). Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV), 1440–1448. https://doi.org/10.1109/ICCV.2015.169
Goodman, B., & Flaxman, S. (2017). European Union Regulations on Algorithmic Decision-Making and a “Right to Explanation.” AI Magazine, 38(3), 50–57. https://doi.org/10.1609/aimag.v38i3.2741
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. ArXiv:1512.03385 [Cs]. http://arxiv.org/abs/1512.03385
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., & Kingsbury, B. (2012). Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82–97. https://doi.org/10.1109/MSP.2012.2205597
Hodosh, M., Young, P., & Hockenmaier, J. (n.d.). Framing Image Description as a Ranking Task Data, Models and Evaluation Metrics Extended Abstract. 5.
K-means clustering. (2020). In Wikipedia. https://en.wikipedia.org/w/index.php?title=K-means_clustering&oldid=973148926
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25 (pp. 1097–1105). Curran Associates, Inc. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Law, S., Paige, B., & Russell, C. (2019). Take a Look Around: Using Street View and Satellite Images to Estimate House Prices. ACM Transactions on Intelligent Systems and Technology, 10(5), 1–19. https://doi.org/10.1145/3342240
Lowe, D. G. (2004). Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2), 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 26 (pp. 3111–3119). Curran Associates, Inc. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
Restricted Boltzmann machine. (2019). In 維基百科,自由的百科全書. https://zh.wikipedia.org/w/index.php?title=%E5%8F%97%E9%99%90%E7%8E%BB%E5%B0%94%E5%85%B9%E6%9B%BC%E6%9C%BA&oldid=57289227
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. (2016). Generative Adversarial Text to Image Synthesis. ArXiv:1605.05396 [Cs]. http://arxiv.org/abs/1605.05396
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. ArXiv:1602.04938 [Cs, Stat]. http://arxiv.org/abs/1602.04938
RStudio | Open source & professional software for data science teams. (n.d.). Retrieved July 15, 2020, from https://rstudio.com/
Samek, W., Wiegand, T., & Müller, K.-R. (2017). Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models. ArXiv:1708.08296 [Cs, Stat]. http://arxiv.org/abs/1708.08296
Seresinhe, C. I., Preis, T., & Moat, H. S. (n.d.). Using deep learning to quantify the beauty of outdoor places. Royal Society Open Science, 4(7), 170170. https://doi.org/10.1098/rsos.170170
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489. https://doi.org/10.1038/nature16961
Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. ArXiv:1409.1556 [Cs]. http://arxiv.org/abs/1409.1556
Srivastava, N., & Salakhutdinov, R. (n.d.). Multimodal Learning with Deep Boltzmann Machines. 32.
Therneau, T. M., Atkinson, E. J., & Mayo Foundation. (n.d.). An Introduction to Recursive Partitioning Using the RPART Routines. 60.
Wright, M. N., & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1). https://doi.org/10.18637/jss.v077.i01
Yuhas, B. P., Goldstein, M. H., & Sejnowski, T. J. (1989). Integration of acoustic and visual speech signals using neural networks. IEEE Communications Magazine, 27(11), 65–71. https://doi.org/10.1109/35.41402
電子全文 Fulltext
本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。
This electronic fulltext is licensed only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please observe the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission:校內校外完全公開 unrestricted
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。
Access information for printed theses is relatively complete from academic year 102 (2013–14) onward. To inquire about printed theses from academic year 101 (2012–13) or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available
