National Sun Yat-sen University (國立中山大學), thesis/dissertation: Food Ingredients Recognition via Interpretable Multi-label Learning with Auxiliary tasks

Title	Food Ingredients Recognition via Interpretable Multi-label Learning with Auxiliary tasks (基於輔助性多工與可解釋多標籤學習的食材辨識系統)
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 108	Language	English
Degree	Master	Number of pages	41
Author	Hsun-Yueh Chang (張薰月)
Advisor	Yihuang Kang (康藝晃)
Convenor	Hwang San-Yih (黃三益)
Advisory Committee	Lee, Pei-Ju (李珮如)
Date of Exam	2020-07-30	Date of Submission	2020-08-23
Keywords	Word embedding, Multi-label learning, Convolutional neural network, Multi-task learning, Machine learning interpretability

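Chinese Abstract (English translation)
With the rise of big data, deep learning has been widely used to solve various classification problems. In food-related fields, ingredient recognition is a popular and challenging application. One challenge is that ingredients are hard to recognize after cooking; another is multi-label learning. In this study, we apply multi-label learning to recipe data from the BBC Food website and try to find the corresponding ingredients in food images. We propose a multi-task learning method to address the multi-label problem. The text of the cooking steps is first converted into a vector, which serves as one output of multi-task learning, while the other output is the ingredients. Through multi-task learning, the two tasks share learned information with each other and can learn information that single-task learning cannot, thereby improving the accuracy of ingredient prediction and providing an understandable explanation for the model.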
Abstract
With the rise of big data in recent years, deep learning has been used extensively to solve various classification problems. In food-related fields, ingredient recognition is a popular and challenging application. One challenge is that ingredients are difficult to recognize after cooking; another is that the task is a multi-label learning problem. In this thesis, we try to find the ingredient set corresponding to a food image, using recipe data from the BBC Food website, by proposing a deep multi-task learning algorithm to solve this multi-label problem. The method first converts the cooking instruction text into a vector and uses it as one output of multi-task learning; the other output is the ingredient set. With multi-task learning, the two tasks share learned information with each other and learn patterns that single-task learning may not, thereby improving the accuracy of ingredient prediction and providing an understandable explanation for the model.
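The two-output setup described in the abstract can be illustrated with a minimal sketch of a joint training objective. This is not the thesis's implementation: the shared image network is omitted, and the loss choices (binary cross-entropy for the ingredient labels, mean squared error for the instruction-text vector), the weight `aux_weight`, the 0.5 decision threshold, and all function names are assumptions made only for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over all ingredient labels (main task)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def mean_squared_error(pred, target):
    """Mean squared error between two vectors (assumed auxiliary loss)."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(target)


def multitask_loss(ingredient_logits, ingredient_labels,
                   text_vec_pred, text_vec_target, aux_weight=0.5):
    """Combine the main and auxiliary losses into one objective.

    aux_weight is a hypothetical hyperparameter, not taken from the thesis.
    Returns (total, main, aux) so each part can be inspected.
    """
    probs = [sigmoid(z) for z in ingredient_logits]
    main = binary_cross_entropy(probs, ingredient_labels)
    aux = mean_squared_error(text_vec_pred, text_vec_target)
    return main + aux_weight * aux, main, aux


def predict_ingredients(ingredient_logits, vocab, threshold=0.5):
    """Multi-label decision: keep every ingredient whose probability passes the threshold."""
    return [name for name, z in zip(vocab, ingredient_logits)
            if sigmoid(z) >= threshold]


if __name__ == "__main__":
    vocab = ["egg", "flour", "butter", "garlic"]  # toy label set
    logits = [2.0, 1.0, -1.5, -3.0]               # toy ingredient-head outputs
    labels = [1, 1, 0, 0]
    total, main, aux = multitask_loss(logits, labels, [0.1, 0.2], [0.1, 0.4])
    print(predict_ingredients(logits, vocab))     # ['egg', 'flour']
    print(round(total, 4), round(main, 4), round(aux, 4))
```

Because both heads would be trained from the same shared representation, gradients from the auxiliary text-vector term also shape the features used for ingredient prediction, which is the information sharing the abstract refers to.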

Table of Contents
Thesis Approval Form
Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
2. Background and Related Work
2.1 Convolutional neural network
2.2 Food Understanding
2.3 Multi-label classification
2.4 Multi-task Learning
2.5 Word embedding
2.6 Explainable AI
3. Proposed approach
4. Experiments
4.1 Dataset
4.2 Evaluation metrics
4.3 Comparison Methods
4.4 Experimental Results
5. Conclusion
6. References


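Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law. Thesis access permission: unrestricted (fully open on and off campus). Available: Campus: available; Off-campus: available. etd-0723120-225123.pdf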