Tuesday, January 29, 2019

AlphaStar



How AlphaStar is trained
AlphaStar’s behaviour is generated by a deep neural network that receives input data from the raw game interface (a list of units and their properties), and outputs a sequence of instructions that constitute an action within the game. More specifically, the neural network architecture applies a transformer torso to the units (similar to relational deep reinforcement learning), combined with a deep LSTM core, an auto-regressive policy head with a pointer network, and a centralised value baseline. We believe that this advanced model will help with many other challenges in machine learning research that involve long-term sequence modelling and large output spaces such as translation, language modelling and visual representations.
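To make the "auto-regressive policy head with a pointer network" idea more concrete, here is a minimal sketch in plain Python. It is not AlphaStar's implementation: the function names, the greedy decoding, the per-action query vectors, and the use of scaled dot-product scoring (the original pointer network paper uses additive attention) are all assumptions chosen for illustration. It only shows how a head can first pick an action type and then "point" at one entry of a variable-length unit list, conditioned on that choice.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pointer_distribution(query, unit_embeddings):
    """Probability of selecting each unit, from scaled dot-product scores.

    Assumption: dot-product scoring stands in for the additive attention
    used in the original pointer network formulation.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, u) / scale for u in unit_embeddings]
    return softmax(scores)


def autoregressive_action(action_logits, queries_by_action, unit_embeddings):
    """Greedy two-step decode (hypothetical helper).

    Step 1 picks an action type. Step 2 picks a target unit using a query
    that depends on the action chosen in step 1, which is what makes the
    head auto-regressive. Returns the joint probability of both choices.
    """
    action_probs = softmax(action_logits)
    action = max(range(len(action_probs)), key=action_probs.__getitem__)
    target_probs = pointer_distribution(queries_by_action[action], unit_embeddings)
    target = max(range(len(target_probs)), key=target_probs.__getitem__)
    return action, target, action_probs[action] * target_probs[target]


# Toy inputs: three units with 2-dimensional embeddings, two action types.
units = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
action, target, joint = autoregressive_action(
    action_logits=[0.1, 2.0],
    queries_by_action=[[1.0, 0.0], [-1.0, 3.0]],
    unit_embeddings=units,
)
```

Because the output is a distribution over the input list itself, the same head works whether the game state contains three units or three hundred, which is the property that makes pointer-style selection attractive for large, variable output spaces.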
AlphaStar also uses a novel multi-agent learning algorithm. The neural network was initially trained by supervised learning from anonymised human games released by Blizzard. This allowed AlphaStar to learn, by imitation, the basic micro and macro-strategies used by players on the StarCraft ladder. This initial agent defeated the built-in “Elite” level AI - around gold level for a human player - in 95% of games.
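The supervised, imitation-based stage described above is commonly framed as maximising the likelihood of the human player's action given the game state. The sketch below assumes that framing, which the source does not spell out; `imitation_loss` and its inputs are hypothetical names, and a real system would compute this over full action sequences with a deep learning framework rather than plain lists.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def imitation_loss(batch_logits, human_actions):
    """Mean negative log-likelihood of the human-chosen actions.

    batch_logits: one list of policy logits per recorded game step.
    human_actions: index of the action the human took at each step.
    """
    total = 0.0
    for logits, action in zip(batch_logits, human_actions):
        total -= log_softmax(logits)[action]
    return total / len(human_actions)


# A policy that already favours the human's choices scores a lower loss
# than one that is indifferent between all actions.
uninformed = imitation_loss([[0.0, 0.0, 0.0, 0.0]], [2])
informed = imitation_loss([[0.0, 0.0, 5.0, 0.0]], [2])
```

Minimising this loss pushes the policy towards the human action distribution, which is one way an agent could pick up basic strategies before any further training.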
transformer
https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

relational deep reinforcement learning
https://openreview.net/forum?id=HkxaFoC9KQ

deep LSTM core
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.676.4320&rep=rep1&type=pdf

auto-regressive policy head
https://arxiv.org/abs/1708.04782

pointer network
https://papers.nips.cc/paper/5866-pointer-networks.pdf

centralised value baseline
https://www.cs.ox.ac.uk/people/shimon.whiteson/pubs/foersteraaai18.pdf

