Title page for etd-0223117-131536



URN etd-0223117-131536
Author Kun-da Lin
Author's Email Address darrendarren1231@gmail.com
Statistics This thesis has been viewed 5355 times and downloaded 7 times.
Department Electrical Engineering
Year 2016
Semester 2
Degree Master
Type of Document
Language zh-TW.Big5 Chinese
Title Deep Reinforcement Learning with a Gating Network
Date of Defense 2017-03-10
Page Count 63
Keyword
  • Reinforcement Learning
  • Deep Reinforcement Learning
  • Deep Learning
  • Gating network
  • Neural network
Abstract Reinforcement Learning (RL) is a good way to train a robot since it does not need an exact model of the environment. All that is needed is to let a learning agent interact with the environment under an appropriate reward function, which is associated with the goal of the task the agent is expected to accomplish. Unfortunately, it is hard to learn a difficult reward function for a complicated problem, such as a soccer player in a game where scoring is not directly related to the mission or role the player is asked to fulfill by the coach. Besides, the tabular method for approximating returns in RL is more suitable for environments with few states; in a huge state space, RL methods face the curse of dimensionality. To alleviate these difficulties, this thesis proposes a deep reinforcement learning method regulated by gating networks. By the merit of deep neural networks, even when raw image pixels are taken as states, latent features can be trained and implicitly extracted layer by layer from the raw data. In the proposed method, a composed policy is obtained by a gating network that regulates the outputs of several deep learning modules, each of which is trained for an individual policy. Two video games, Flappy Bird and Ping-Pong, are adopted as testbeds to examine the performance of the proposed method. In the proposed architecture, each deep learning policy module is first trained with a simple reward function. Through the gating networks, these simple policies can then be composed into a more sophisticated one so as to accommodate more complicated tasks, akin to a divide-and-conquer strategy. The proposed architecture has two kinds of arrangements: one is called the in-parallel gating network, and the other is called the in-serial gating network. The experimental results show that both arrangements can effectively shorten the training time.
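
To make the composed-policy idea concrete, below is a minimal sketch (in PyTorch, not the thesis code) of the in-parallel arrangement: a gating network produces state-dependent weights that mix the action distributions of several independently trained policy modules. All class names, layer sizes, and the softmax-mixture rule here are illustrative assumptions, not details taken from the thesis.

import torch
import torch.nn as nn

class PolicyModule(nn.Module):
    """One deep policy, trained beforehand on a simple reward function."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Action probabilities proposed by this sub-policy.
        return torch.softmax(self.net(state), dim=-1)

class InParallelGating(nn.Module):
    """Gating network that mixes the outputs of K policy modules."""
    def __init__(self, state_dim: int, n_actions: int, k: int):
        super().__init__()
        self.policies = nn.ModuleList(
            [PolicyModule(state_dim, n_actions) for _ in range(k)]
        )
        self.gate = nn.Linear(state_dim, k)  # one weight per module

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Gate weights sum to 1 over the K modules for each state.
        w = torch.softmax(self.gate(state), dim=-1)                    # (B, K)
        pis = torch.stack([p(state) for p in self.policies], dim=1)   # (B, K, A)
        # Composed policy: convex combination of the sub-policies.
        return (w.unsqueeze(-1) * pis).sum(dim=1)                      # (B, A)

if __name__ == "__main__":
    policy = InParallelGating(state_dim=8, n_actions=2, k=3)
    probs = policy(torch.randn(4, 8))  # batch of 4 states
    print(probs.sum(dim=-1))           # each row sums to 1

The in-serial arrangement described in the abstract would instead chain modules, with each stage conditioning on the previous stage's output; that variant is not sketched here.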
Advisory Committee
  • Jin-Ling Lin - chair
  • Ming-Yi Ju - co-chair
  • Yu-Jen Chen - co-chair
  • Kao-Shing Hwang - advisor
Files
  • etd-0223117-131536.pdf
  • In-campus access at 0 years; off-campus access at 3 years.
Date of Submission 2017-03-27
