Statements in which the resource appears as the subject.
Predicate: Object
rdf:type
lifeskim:mentions
pubmed:issue: 8
pubmed:dateCreated: 2009-10-15
pubmed:abstractText: Direct reciprocity is a chief mechanism of mutual cooperation in social dilemmas. Agents cooperate if future interactions with the same opponents are highly likely. Direct reciprocity has been explored mostly through evolutionary game theory based on natural selection. Everyday experience suggests, however, that real social agents, including humans, learn to cooperate from experience. In this paper, we analyze a reinforcement learning model called temporal difference learning and study its performance in the iterated Prisoner's Dilemma game. Temporal difference learning is unique among learning models in that it inherently aims at increasing future payoffs rather than immediate ones. It also has a neural basis. We show analytically and numerically that learners with only two internal states properly learn to cooperate with retaliatory players and to defect against unconditional cooperators and defectors. Four-state learners are more capable of achieving a high payoff against various opponents. Moreover, we show numerically that four-state learners can learn to establish mutual cooperation for sufficiently small learning rates.
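The abstract describes two-state temporal difference learners in the iterated Prisoner's Dilemma. A minimal sketch of the idea follows, using tabular Q-learning (a temporal difference method) with the learner's internal state taken to be its own previous move, which determines a tit-for-tat opponent's current move. The payoff values, parameters, and state encoding here are illustrative assumptions, not necessarily the paper's exact formulation:

```python
import random

# Payoffs to the learner (own move, opponent's move).
# T=5 > R=3 > P=1 > S=0 is the standard Prisoner's Dilemma ordering;
# the concrete numbers are illustrative, not taken from the paper.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
ACTIONS = ('C', 'D')

def train(opponent, steps=30000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning for a two-state learner.

    The learner's state is its own previous move; `opponent` maps that
    move to the opponent's current move (e.g. tit-for-tat copies it).
    """
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in ACTIONS for a in ACTIONS}
    state = 'C'  # start as if mutual cooperation occurred last round
    for _ in range(steps):
        # epsilon-greedy exploration over the two actions
        if rng.random() < eps:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        reward = PAYOFF[(action, opponent(state))]
        nxt = action  # the learner's own move becomes the next state
        # temporal difference update: bootstrap from the next state's value
        target = reward + gamma * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = nxt
    return Q

def greedy(Q, state):
    """The learned policy: the highest-valued action in a state."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])

tft = lambda last: last    # tit-for-tat: repeat the learner's last move
allc = lambda last: 'C'    # unconditional cooperator
alld = lambda last: 'D'    # unconditional defector
```

With a discount factor of 0.9, perpetual mutual cooperation (3 per round) outvalues a one-shot defection payoff of 5 followed by retaliation, so this learner converges to cooperating with tit-for-tat while defecting against both unconditional strategies, in line with the abstract's claim about two-state learners.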
pubmed:language: eng
pubmed:journal
pubmed:citationSubset: IM
pubmed:status: MEDLINE
pubmed:month: Nov
pubmed:issn: 1522-9602
pubmed:author
pubmed:issnType: Electronic
pubmed:volume: 71
pubmed:owner: NLM
pubmed:authorsComplete: Y
pubmed:pagination: 1818-50
pubmed:meshHeading
pubmed:year: 2009
pubmed:articleTitle: A theoretical analysis of temporal difference learning in the iterated prisoner's dilemma game.
pubmed:affiliation: Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo, 113-8656, Japan. masuda@mist.i.u-tokyo.ac.jp
pubmed:publicationType: Journal Article, Research Support, Non-U.S. Gov't