Title page for etd-1117115-161646



URN etd-1117115-161646
Author Cing-lin Han
Author's Email Address Not public.
Statistics This thesis has been viewed 5351 times and downloaded 0 times.
Department Electrical Engineering
Year 2015
Semester 2
Degree Master
Type of Document
Language zh-TW.Big5 Chinese
Title A Design of Multimedia Search System
Date of Defense 2016-07-27
Page Count 88
Keyword
  • Color model
  • Image recognition system
  • Local binary pattern
  • Gaussian mixture model
  • Speaker recognition system
  • Mel-frequency cepstral coefficients
    Abstract Multimedia plays an important role in our lives. It communicates messages and ideas through a combination of text, audio, images, animation, and video, and it delivers material that is easier to understand and far more engaging than pure text. As multimedia and internet technologies advance, applications of multimedia to product presentation and to education and training are becoming popular. Multimedia education makes effective interaction, active engagement, and convenient learning possible, and raises education to a higher level.
      In this thesis, a multimedia search system for images and speech is developed. A mobile phone and a camera are used to capture painting images, and a microphone is used to record speech. After correct image or speech recognition, the system returns the complete information about the painting or the speaker.
      For painting recognition, the image captured by the mobile phone or camera is first preprocessed, segmented, and scale-normalized. Features are then extracted using the YCbCr color model, projection features, and local binary patterns. The similarity between the test image and each training image is computed, and the closest match is taken as the final answer. With the mobile phone and the camera as the test-image capture devices, recognition rates of 92.06% and 90.07%, respectively, are reached on a system of 20,160 paintings running on an Intel Core i5 2.6 GHz laptop under Windows 7.
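      The feature-extraction and matching steps described for painting recognition can be illustrated with a minimal sketch. This is not the thesis implementation: the BT.601 full-range conversion constants, the basic 3x3 local binary pattern, the L1 distance, and the function names `rgb_to_ycbcr`, `lbp_histogram`, and `nearest_label` are all assumptions chosen for illustration, and the projection features and preprocessing are omitted.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (assumed BT.601 full-range form)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# Clockwise 8-neighbour offsets (row, col), starting at the top-left pixel.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(gray):
    """Normalized 256-bin histogram of basic 3x3 LBP codes over interior pixels.

    `gray` is a list of equal-length rows of intensity values.
    """
    rows, cols = len(gray), len(gray[0])
    hist = [0] * 256
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            center = gray[i][j]
            code = 0
            for bit, (di, dj) in enumerate(_OFFSETS):
                # A neighbour at least as bright as the center sets its bit.
                if gray[i + di][j + dj] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def nearest_label(query, gallery):
    """Return the gallery label whose feature vector is closest in L1 distance.

    The L1 distance is an assumed similarity measure, not the one in the thesis.
    """
    def distance(label):
        return sum(abs(a - b) for a, b in zip(query, gallery[label]))
    return min(gallery, key=distance)
```

      In use, each training painting would contribute one feature vector to `gallery`, and a captured test image would be labeled with `nearest_label(lbp_histogram(test_gray), gallery)`.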
      For speaker recognition, Mel-frequency cepstral coefficients are used as the feature parameters, and a Gaussian mixture model is built for each speaker from 40 seconds of training material. With 10 seconds of test material recorded at different times and locations, a correct speaker recognition rate of 83.4% is obtained on a system of 2,000 speakers running on an Intel Core i5-3337U 1.8 GHz personal computer under Ubuntu 14.04.
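      The scoring step of such a speaker recognizer can be sketched as follows. This is a hedged illustration rather than the thesis code: it assumes diagonal-covariance mixtures whose parameters have already been trained, treats each frame as a precomputed feature vector (for example, MFCCs), and uses the common mel-scale formula with constants 2595 and 700. The names `hz_to_mel`, `gmm_log_likelihood`, and `identify_speaker` are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the mel scale (assumed 2595 * log10(1 + f/700) form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of feature vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        lp = math.log(w)
        for xi, m, v in zip(x, mu, var):
            lp += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        component_logs.append(lp)
    # Log-sum-exp keeps the mixture sum numerically stable.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(l - peak) for l in component_logs))


def identify_speaker(frames, models):
    """Return the speaker whose GMM gives the highest mean per-frame log-likelihood.

    `models` maps a speaker name to a (weights, means, variances) tuple.
    """
    def score(name):
        weights, means, variances = models[name]
        total = sum(gmm_log_likelihood(f, weights, means, variances) for f in frames)
        return total / len(frames)
    return max(models, key=score)
```

      Averaging the log-likelihood over frames makes scores comparable across test utterances of different lengths; fitting the mixture parameters (typically by expectation-maximization) is outside this sketch.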
    Advisory Committee
  • Sheau-Shong Bor - chair
  • Chii-Maw Wang - co-chair
  • Tsung Lee - advisor
  • Chih-Chien Thomas Chen - advisor
    Files
  • etd-1117115-161646.pdf
  • Access: available on campus after 5 years and off campus after 5 years.
    Date of Submission 2016-07-11
