Title page for etd-0801114-040509



URN etd-0801114-040509
Author Feng-Quan Li
Author's Email Address Not public.
Statistics This thesis has been viewed 5564 times and downloaded 30 times.
Department Electrical Engineering
Year 2013
Semester 2
Degree Master
Type of Document
Language English
Title A Study on Multi-Agent Cooperation by Shaped-Q Learning
Date of Defense 2014-07-25
Page Count 60
Keyword
  • Reinforcement learning
  • Multi-agent system
  • Cooperation
Abstract In this thesis, we primarily discuss how multiple agents can learn to cooperate in an environment without communication, that is, one in which the agents cannot communicate with one another. For this reason, the agents cannot reach a consensus through coordination while learning to cooperate. We propose that, in order to cooperate, each agent use its own past experience to infer the other agents' actions before making a decision. With this concept, agents can cooperate with one another and successfully complete their tasks despite the lack of communication.
    Reinforcement learning is a trial-and-error method; in other words, agents learn how to achieve a goal through repeated interaction with the environment. When agents must choose actions but cannot reach a consensus in a communication-free environment, stagnation occurs. An important issue is therefore how to design a policy that reduces the occurrence of stagnation and improves learning efficiency when no communication is available.
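    As a point of reference, the following is a minimal sketch of the tabular Q-learning update that this trial-and-error learning refers to; the learning rate alpha, the discount gamma, and the (state, action) encoding are illustrative assumptions, not values taken from the thesis.

        from collections import defaultdict

        # Illustrative parameters (not taken from the thesis).
        ALPHA = 0.1   # learning rate
        GAMMA = 0.9   # discount factor

        q_table = defaultdict(float)  # maps (state, action) -> Q-value

        def q_update(state, action, reward, next_state, actions):
            """One trial-and-error step: move Q(s, a) toward reward + discounted best next Q."""
            best_next = max(q_table[(next_state, a)] for a in actions)
            td_target = reward + GAMMA * best_next
            q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])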
    In the process of learning to cooperate, this thesis proposes that each agent maintain a Cooperative Tendency Table (CTT) that records a cooperative tendency value for each action; the CTT is updated during learning. The Shaped-Q value of an action is its cooperative tendency value multiplied by its Q-value, and the agent selects its current action according to these Shaped-Q values. With this policy, agents can quickly reach a consensus with one another, which improves learning efficiency and reduces the occurrence of stagnation.
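    Below is a minimal sketch of the Shaped-Q selection rule described above: each action's Shaped-Q value is its cooperative tendency value from the CTT multiplied by its Q-value, and the agent acts greedily on those values. The abstract does not give the exact CTT update rule, so the simple reinforce/decay update shown here is an assumption made only for illustration, as is the epsilon-greedy exploration.

        import random
        from collections import defaultdict

        class ShapedQAgent:
            """Selects actions by Shaped-Q = CTT value * Q-value (illustrative sketch)."""

            def __init__(self, actions, epsilon=0.1):
                self.actions = actions
                self.epsilon = epsilon               # assumed exploration rate
                self.q = defaultdict(float)          # Q-values: (state, action) -> value
                self.ctt = defaultdict(lambda: 1.0)  # Cooperative Tendency Table: (state, action) -> tendency

            def choose_action(self, state):
                """Pick the action with the highest Shaped-Q value (epsilon-greedy)."""
                if random.random() < self.epsilon:
                    return random.choice(self.actions)
                shaped = {a: self.ctt[(state, a)] * self.q[(state, a)] for a in self.actions}
                return max(shaped, key=shaped.get)

            def update_ctt(self, state, action, cooperated):
                """Assumed update: reinforce actions that led to cooperation, decay otherwise."""
                if cooperated:
                    self.ctt[(state, action)] += 0.1
                else:
                    self.ctt[(state, action)] = max(0.1, self.ctt[(state, action)] - 0.05)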
    In addition, the proposed method not only requires less memory than Win or Learn Fast policy hill-climbing (WoLF-PHC) but also performs better. In other words, the proposed method lets agents use less memory and complete their tasks more efficiently in a communication-free environment. The research results are demonstrated in a video on YouTube: http://youtu.be/CFS-KzOtMOg
Advisory Committee
  • Jin-Ling Lin - chair
  • Ming-Yi Ju - co-chair
  • Yu-Jen Chen - co-chair
  • Kao-Shing Huang - advisor
Files
  • etd-0801114-040509.pdf
In-campus access: 5 years; off-campus access: 5 years.
Date of Submission 2014-09-03
