國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,公司經營疑慮之評估與預測：一套植基於結構與非結構資料的模式,The Evaluation and Prediction of the Going-Concern Status for Companies: A Model Based on Structured and Un-Structured Data

論文名稱 Title	公司經營疑慮之評估與預測：一套植基於結構與非結構資料的模式 The Evaluation and Prediction of the Going-Concern Status for Companies: A Model Based on Structured and Un-Structured Data
系所名稱 Department	資訊管理學系 Department of Information Management
畢業學年期 Year, semester	105 學年度第 1 學期 The fall semester of Academic Year 105	語文別 Language	英文 English
學位類別 Degree	博士 Ph.D.	頁數 Number of pages	187
研究生 Author	許育峰 Yu-Feng Hsu
指導教授 Advisor	李偉柏, 鄭炳強 Wei-Po Lee; Bing-Chiang Jeng
召集委員 Convenor	陳嘉玫 Chia-Mei Chen
口試委員 Advisory Committee	王萬成, 林耕霈 Wann-cherng Wang; Keng-Pei Lin
口試日期 Date of Exam	2017-01-06	繳交日期 Date of Submission	2017-01-24
關鍵字 Keywords	持續經營預測、集成方法、隨機森林、文字探勘、財務新聞 Going-concern prediction, Ensemble framework, Text mining, Random forest, Financial news articles
統計 Statistics	本論文已被瀏覽 6115 次，被下載 52 次 The thesis/dissertation has been browsed 6115 times, has been downloaded 52 times.

中文摘要
確認公司是否能持續經營對於投資者和股東來說是一個重要的議題。在會計和審計領域，持續經營是一個眾所周知的概念，其用來衡量公司是否有足夠的資源得以永續經營。然而，在當今複雜的商業環境中，難以評估公司的財務狀況。為了改善這個問題，有一些研究人員提出了新的方法來協助審計過程。大多數這些研究都是提出單一模型，並應用從財務報表收集的數據來驗證其方法。然而，仍有改進的空間，例如缺乏靈活性，普遍性和時間效率。為了解決這些問題，在本研究中，我們引進一個稱為集成方法的框架，並採用財務新聞作為分析數據的來源。集成框架的特徵之一是，如果新的方法效能比較好的話，較差的方法可以很容易的被替換。此外，財務新聞是一個重要的訊息來源，特別是考量到初上市公司缺乏年度報告的問題。本研究應用文字探勘技術來取出隱藏在財務新聞中的訊息，並將文件內容轉換為可於實驗中使用的數字格式。在研究一中，應用隨機森林方法來實現集成方法的概念。由實驗結果可得知，隨機森林方法在準確率，ROC面積，Kappa值，型II誤差，精確度和回憶率方面皆優於基準方法。此外，在研究二中獲得的實驗結果顯示，文字探勘技術對持續經營之預測表現良好。財務新聞是一個有用的參考來源，過去尚未有研究應用其於分析非新上市公司或是新上市公司在發布年度報告之前的持續經營狀況。
Abstract
Ascertaining the going-concern status of a company is a critical issue for investors and stockholders. In accounting and auditing, going concern is a well-known concept used to assess whether a company has the resources to continue operating indefinitely. However, a company's financial condition is difficult to evaluate in today's complicated business environment. To make this easier, researchers have proposed new methods to assist the auditing process. Most of these studies propose a single model and verify it on numerical data gathered from financial statements. Shortcomings remain, however, such as a lack of flexibility, generalizability, and time efficiency. To address these issues, this study introduces an ensemble framework and adopts financial news as a data source. One characteristic of the ensemble framework is that a weaker algorithm can easily be replaced when a better-performing one becomes available. In addition, financial news is an important source of information, especially because a newly listed company has not yet issued annual reports. Text mining techniques are applied to capture the messages hidden in financial news and to convert the textual data into a numerical format for use in the experiments. In study one, the random forest method is applied to implement the ensemble concept. The experimental results show that the random forest method outperforms the baseline methods in terms of accuracy, ROC area, kappa value, type II error, precision, and recall. The experimental results of study two indicate that text mining techniques perform well for going-concern prediction. Financial news is a useful data source for analyzing the going-concern status of a company before its annual report is issued, or of a newly listed company for which such reports do not yet exist.
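The abstract names accuracy, kappa value, type II error, precision, and recall as evaluation measures. As a rough illustration only, and not the thesis's own code, the sketch below computes them from a 2x2 confusion matrix. It assumes that the positive class is the non-going-concern (distressed) firm and that a type II error means predicting such a firm to be a going concern; the thesis may define these differently. The function name `evaluate_confusion` and the counts are hypothetical.

```python
def evaluate_confusion(tp, fn, fp, tn):
    """Compute common classification measures from confusion-matrix counts.

    Assumption: "positive" = non-going-concern firm.
      tp: distressed firms predicted as distressed
      fn: distressed firms predicted as going concerns
      fp: healthy firms predicted as distressed
      tn: healthy firms predicted as going concerns
    All row and column totals are assumed to be non-zero.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Assumed definition: share of distressed firms that were missed.
    type_ii_error = fn / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: agreement beyond what chance alone would produce.
    p_observed = accuracy
    p_expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "type_ii_error": type_ii_error,
        "f_measure": f_measure,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Invented counts for illustration; not results from the thesis.
    metrics = evaluate_confusion(tp=40, fn=10, fp=20, tn=130)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Under the assumed definitions, the type II error rate is simply one minus recall, which is why a model can score high accuracy on an imbalanced dataset while still missing many distressed firms.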

目次 Table of Contents
Thesis Approval Form
Chinese Abstract
Abstract
Chapter 1 Introduction
  1.1 Research background
  1.2 Research motivation
  1.3 Research objectives
Chapter 2 Literature Review
  2.1 Going-concern prediction literature
  2.2 Literature comparison
Chapter 3 Ensemble Framework based on Structured Data
  3.1 Datasets
  3.2 Variables
  3.3 Ensemble method
  3.4 Random forest
    3.4.1 Decision Tree
  3.5 Prediction methods
    3.5.1 Random forest
    3.5.2 Baseline methods
    3.5.3 Evaluation method
      3.5.3.1 Prediction accuracy and Type II error rate
      3.5.3.2 Kappa statistic value
      3.5.3.3 Receiver Operating Characteristic Curve
      3.5.3.4 Precision and recall rates
      3.5.3.5 F-measure
  3.6 Imbalanced data
Chapter 4 Ensemble Framework based on Un-structured Data
  4.1 Financial news articles
  4.2 Text mining
  4.3 Textual representation
    4.3.1 Bag-of-words Method
    4.3.2 Term weighting
    4.3.3 Latent Topic Analysis
  4.4 Clustering algorithms
    4.4.1 K-nearest neighbor
    4.4.2 K-means
    4.4.3 Self-organizing map
  4.5 Configuration of the un-structured data framework
    4.5.1 Data source
    4.5.2 Model Construction and Evaluation Methods
Chapter 5 Performance Evaluation of the Structured Model
  5.1 Experimental results for pre-subprime mortgage crisis
    5.1.1 Experimental results for the original dataset
    5.1.2 Performance comparison using datasets with different proportion
  5.2 Experimental results for the post-subprime mortgage crisis
    5.2.1 Experimental results for the original dataset
    5.2.2 Performance comparison using datasets with different proportions
  5.3 Experimental results for the full dataset
    5.3.1 Experiment results with the original dataset
    5.3.2 Performance comparison on datasets with different proportions
  5.4 Discussion of Type II error risk and summary
  5.5 Verification of generalizability on a Taiwan dataset
  5.6 Performance evaluation from the viewpoint of a time series
    5.6.1 Performance comparison based on a one pair one combination
    5.6.2 Performance comparison based on the two pair one combination
Chapter 6 Performance Evaluation of the Un-structured Model
  6.1 Experimental results for scenario one
    6.1.1 Prediction performance of the TFIDF method
    6.1.2 Prediction performance of LDA method
    6.1.3 Performance comparison of the two methods
    6.1.4 Prediction performance of the TFIDF method after feature extraction
  6.2 Experimental results of scenario two
    6.2.1 Prediction performance of the LDA method
Chapter 7 Conclusions
References


電子全文 Fulltext
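This electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
論文使用權限 Thesis access permission：user-defined release date
開放時間 Available：校內 Campus：available；校外 Off-campus：available
etd-0023117-230904.pdf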
紙本論文 Printed copies
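Availability information for printed copies is relatively complete from Academic Year 102 onward. To look up availability information for printed copies from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available：available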



Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS