Abstract
This thesis addresses the problem of how multiple agents can learn to cooperate in an environment without communication, that is, an environment in which agents cannot exchange messages and therefore cannot coordinate to reach a consensus while learning to cooperate. We propose that, before making a decision, each agent uses its own past experience to infer the likely actions of the other agents; with this idea, agents can achieve cooperation and complete tasks successfully without communication. Reinforcement learning is a trial-and-error method: agents learn how to achieve a goal through repeated interaction with the environment. When agents must choose actions but cannot reach a consensus in a communication-free environment, learning can stagnate. An important issue is therefore how to design a policy that reduces the occurrence of stagnation and improves learning efficiency without communication. To this end, this thesis proposes a method in which each agent maintains a Cooperative Tendency Table (CTT) that records a cooperative tendency value for each action; the CTT is updated throughout the learning process. Under this policy, the cooperative tendency value of each action is multiplied by the Q-value of that action to obtain its Shaped-Q value, which determines the action to be taken. In this way, agents can quickly reach a consensus, improving learning efficiency and reducing stagnation. In addition, the proposed method requires less memory than Win or Learn Fast Policy Hill-Climbing (WoLF-PHC) while achieving better performance. In other words, the proposed method allows agents to use less memory and complete tasks more efficiently in a communication-free environment. The experimental results are presented in a video on YouTube: http://youtu.be/CFS-KzOtMOg
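As a minimal sketch of the action-selection rule summarized above, the Shaped-Q value of each action can be formed as the product of its cooperative tendency value and its Q-value; the table names, the tabular representation, and the epsilon-greedy selection below are illustrative assumptions, since the abstract does not specify the CTT update rule or the exploration scheme.

```python
import numpy as np

# Hypothetical tabular setting; sizes are arbitrary for illustration.
n_states, n_actions = 16, 4
q_table = np.zeros((n_states, n_actions))  # ordinary Q-values
ctt = np.ones((n_states, n_actions))       # cooperative tendency values (CTT)

def select_action(state: int, epsilon: float = 0.1) -> int:
    """Choose an action using Shaped-Q = cooperative tendency * Q-value."""
    if np.random.rand() < epsilon:            # occasional exploration (assumed)
        return int(np.random.randint(n_actions))
    shaped_q = ctt[state] * q_table[state]    # Shaped-Q of each action
    return int(np.argmax(shaped_q))           # greedy with respect to Shaped-Q
```

In this sketch the CTT biases the greedy choice toward actions that have historically led to successful cooperation, while the Q-values continue to reflect expected return; how the CTT itself is updated is defined in the body of the thesis.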