A Programmer Walks You Step by Step Through How AI Plays Flappy Bird
 
Source: 码农网 · Published: 2017-4-25
 

The following content comes from an internal team sharing session aimed mainly at AI beginners; it covers CNNs, Deep Q Networks, and the TensorFlow platform. Since the author is not a deep-learning algorithm researcher, the system is described mostly from an application perspective, without detailed formula derivations.

This article describes how to get an AI (artificial intelligence) to play the Flappy Bird game, in four parts:

1. Flappy Bird game demo

2. Model: convolutional neural network

3. Algorithm: Deep Q Network

4. Code: TensorFlow implementation

I. Flappy Bird Game Demo

Before introducing the model and the algorithm, let's look at the results directly. The upper image shows the start of training, when the bird flies around aimlessly like a headless fly. The lower image shows the situation after training on the local machine (configuration given below) for more than 10 hours, i.e., more than 2,000,000 training steps: its best score already exceeds 200 points, which a human player can essentially no longer beat.

ѵÁ·ÊýСÓÚ10000²½£¨¸Õ¿ªÊ¼ÑµÁ·£©

ѵÁ·²½Êý´óÓÚ2000000²½£¨10Сʱºó£©

Since the local machine is configured with CUDA and cuDNN and uses an NVIDIA GPU for parallel computation, the runtime log output is shown here first.

Configuring CUDA and cuDNN has a few pitfalls, such as a login loop after installing CUDA and the screen resolution no longer being adjustable. These are all NVIDIA driver installation issues; they are not the focus of this article, and readers can Google them on their own.

Loading the CUDA libraries

TensorFlow running device: /gpu:0

/gpu:0 is TensorFlow's default device setting and refers to the first GPU in the system.

Local hardware and software configuration:

System: Ubuntu 16.04

GPU: NVIDIA GeForce GTX 745, 4 GB

Version: TensorFlow 1.0

Packages: OpenCV 3.2.0, Pygame, NumPy, ...

ϸÐĵÄÅóÓÑ¿ÉÄÜ·¢ÏÖ£¬±ÊÕßµÄÏÔ¿¨ÅäÖò¢²»¸ß£¬GeForce GTX 745£¬ÏÔ´æ3.94G£¬¿ÉÓÃ3.77G£¨×ÀÃæÕ¼ÓÃÁËÒ»²¿·Ö£©£¬ÊôÓÚÈëÃÅÖеÄÈëÃÅ¡£¶ÔÓÚרҵ×öÉî¶ÈѧϰËã·¨µÄÅóÓÑ£¬Õâ¸öÏÔ¿¨±ØÈ»ÊDz»¹»µÄ¡£ÖªºõÉÏÓÐÌû×ӽ̴ó¼ÒÔõôÅäÖøüרҵµÄÏÔ¿¨£¬ÓÐÐËȤµÄ¿ÉÒÔÒÆ²½¡£

II. Model: Convolutional Neural Network

A neural network is built from many neurons joined by adjustable connection weights, and features large-scale parallel processing, distributed information storage, and good self-organization and self-learning ability. An artificial neuron is structurally similar to a biological neuron; the comparison is shown in the figure below.

The inputs of an artificial neuron (x1, x2, ..., xm) correspond to the dendrites of a biological neuron. The inputs are weighted by different weights (wk1, wk2, ..., wkm), a bias is added, the result passes through an activation function to produce the output, and the output is passed on to the next layer of neurons.
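Written out in the usual notation (a standard formulation of the neuron described above):

u_k = sum_{j=1..m} w_{kj} x_j + b_k,    y_k = φ(u_k)

where b_k is the bias and φ is the activation function.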

The activation function introduces nonlinearity into the network, which is why a neural network has stronger fitting capacity than regression and similar algorithms. Commonly used activation functions include sigmoid and tanh, whose expressions are as follows:
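In standard notation:

sigmoid(x) = 1 / (1 + e^(-x))

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))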

From these it can be seen that the range of the sigmoid function is (0, 1) and the range of tanh is (-1, 1).

Convolutional neural networks originate from the visual system of animals. The main techniques they involve are:

1. Local receptive fields (sparse connectivity);

2. Parameter sharing;

3. Multiple convolution kernels;

4. Pooling.

1. Local receptive fields (sparse connectivity)

The problems with a fully connected network are:

Too many parameters to train, which easily leads to non-convergence (vanishing gradients) and makes training extremely hard;

In practice, a neuron covering a local region is mostly sensitive to inputs within a small range; in other words, far-away inputs have low correlation with it, so their weights end up very small.

The human visual system works from local to global when we observe the world.

For example, when we see a beautiful woman, we probably notice certain parts of her first (you know what I mean).

Convolutional neural networks therefore mimic human vision and use local perception: low-level neurons only perceive local information, and as information propagates forward, higher-level neurons combine the local pieces into global information.

Full connectivity vs. local connectivity (image from the internet)

´ÓÉÏͼÖпÉÒÔ¿´³ö£¬²ÉÓþֲ¿Á¬½ÓÖ®ºó£¬¿ÉÒÔ´ó´óµÄ½µµÍѵÁ·²ÎÊýµÄÁ¿¼¶¡£

2. Parameter sharing

Although local perception lowers the number of trainable parameters, the network as a whole still has a great many parameters to train.

Parameter sharing sets multiple parameters with the same statistical characteristics to the same value, on the grounds that the statistics of one part of an image are the same as those of the other parts. It is implemented by convolving the image, which is where the name "convolutional neural network" comes from.

You can think of it as extracting some feature from a local patch of the image (the size of the convolution kernel), then using that feature as a detector applied to the whole image, convolving over the image in order to obtain the different feature responses.

The convolution process (image from the internet)

Each convolution is a feature extractor, like a sieve that filters out the parts of the image that match its pattern (the larger the activation, the better the match). This kind of convolution further reduces the number of trainable parameters.

3. Multiple convolution kernels

As above, each convolution is one way of extracting features, so for a whole image the features extracted by a single kernel are certainly not enough. Applying several different kernels to the same image yields several feature maps.

Different kernels extract different features (image from the internet)

The multiple feature maps can be seen as different channels of the same image, a notion that will come in handy in the code later.

4. Pooling

Once the feature maps are obtained, the extracted features could be used to train a classifier directly, but the feature dimensionality is still too high, which makes computation hard and risks overfitting. From an image recognition point of view, an image may be shifted or rotated while its subject stays the same; that is, different feature vectors may correspond to the same result. Pooling is what solves this problem.

The pooling process (image from the internet)

Pooling replaces the values within each pooling window (for example, a 2×2 region) by their average (average pooling) or their maximum (max pooling).
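A minimal NumPy sketch of 2×2 max pooling on a small example array (the input values are made up purely for illustration):

import numpy as np

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 3, 2],
              [6, 2, 0, 4]], dtype=float)

# Split the 4x4 image into non-overlapping 2x2 blocks and take the max of each block
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
# pooled == [[4., 5.],
#            [6., 4.]]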

ÖÕÓÚµ½ÁËչʾģÐ͵Äʱºò£¬ÏÂÃæÕâ·ùͼÊDZÊÕßÊÖ»­µÄ£¨ÓõçÄÔ»­Ì«·Ñʱ£¬½«¾Í¿´°É£©£¬Õâ·ùͼչʾÁ˱¾ÎÄÖÐÓÃÓÚѵÁ·ÓÎÏ·ËùÓõľí»ýÉñ¾­ÍøÂçÄ£ÐÍ¡£

The convolutional neural network model

The image processing pipeline:

1. The initial input is four images of size 80×80×4 (4 is the number of input channels; at the start the four images are identical). Convolving with an 8×8×4×32 kernel (4 input channels, 32 output channels) at stride 4 (the kernel moves 4 pixels per step) gives 32 feature maps of size 20×20;

2. The 20×20 maps are pooled with a 2×2 pooling kernel, giving maps of size 10×10;

3. Another convolution with a 4×4×32×64 kernel at stride 2 gives a 5×5×64 volume;

4. Another convolution with a 3×3×64×64 kernel at stride 1 gives a 5×5×64 volume; although the size is the same as in the previous step, the information after this convolution is more abstract and closer to global information;

5. Reshape: the multi-dimensional feature maps are flattened into a 1600-dimensional feature vector;

6. A 1600×512 fully connected layer gives a 512-dimensional feature vector;

7. A final 512×2 fully connected layer gives the 2-dimensional output, [0,1] or [1,0], which represents whether or not to tap the game screen.

As you can see, the model is trained end to end: the input is a screenshot of the game screen (processed with OpenCV in the code), and the output is the game action, i.e., whether to tap the screen. The strength of deep learning lies in its data-fitting capacity: it does not need the elaborate feature engineering of traditional machine learning, but relies on the model to discover the relationships inside the data.

This also brings a problem from the other direction: deep learning is highly dependent on large amounts of labeled data, which are extremely costly to obtain.

III. Algorithm: Deep Q Network

Given the convolutional neural network model, how do we train it so that it converges and can drive the game actions? Machine learning is divided into supervised learning, unsupervised learning, and reinforcement learning; the Q Network introduced here belongs to reinforcement learning. Before formally introducing the Q Network, a few words about its glorious history.

You may have heard the story of Google buying DeepMind for 400 million dollars in 2014. So how did DeepMind catch Google's eye? It ultimately comes down to this paper:

Playing Atari with Deep Reinforcement Learning

DeepMindÍŶÓͨ¹ýÇ¿»¯Ñ§Ï°£¬Íê³ÉÁË20¶àÖÖÓÎÏ·£¬ÊµÏÖÁ˶˵½¶ËµÄѧϰ¡£ÆäÓõ½µÄËã·¨¾ÍÊÇQ Network¡£2015Ä꣬DeepMindÍŶÓÔÚ¡¶Nature¡·ÉÏ·¢±íÁËһƪÉý¼¶°æ£º

Human-level control through deep reinforcement learning

From then on, humans could no longer beat machines in this class of games. Later came AlphaGo and Master, but that is another story. This article, too, falls within the scope of the papers above; it merely implements them on the TensorFlow platform, with some of the author's own understanding mixed in.

Back to the topic: the Q Network belongs to reinforcement learning, so let's introduce reinforcement learning first.

The reinforcement learning model

This figure is copied from the UCL course; the course link (YouTube) is:

https://www.youtube.com/watch?v=2pWv7GOvuf0

The reinforcement learning process has two components:

1. The agent (the learning system)

2. The environment

As shown in the figure, in each iteration the agent (the learning system) first receives the environment's state st, then produces an action at that acts on the environment; the environment receives at, evaluates it, and feeds back a reward rt to the agent. Repeating this loop produces a state/action/reward sequence (s1, a1, r1, s2, a2, r2, ..., sn, an, rn), and this sequence naturally brings to mind:

the Markov decision process.

MDP: the Markov decision process

What the Markov decision process shares with the well-known HMM (hidden Markov model) is the Markov property. What is the Markov property? Simply put, the future state depends only on the current state and is independent of past states.

HMMs (Markov models) are widely used in machine learning areas such as speech recognition and activity recognition, while conditional random fields (Conditional Random Field) are used in natural language processing. The two models are cornerstones of speech recognition and natural language processing.

The figure above can be illustrated with a vivid example. Say you graduate and join a company; your initial level is T1 (s1 in the figure). You work hard and push yourself (a1 in the figure), your boss thinks you are doing well and prepares to promote you (r1 in the figure), and you move up to T2; you keep working hard, keep getting promoted, and finally reach sn. Of course, you might also choose not to push yourself; that is also an action, i.e., it also belongs to the action set A, and the resulting feedback r is simply no promotion and no raise.

Note that we naturally want to be promoted as often as possible, so the problem becomes: given the current state s (with s in the state set S), how do we choose an action a from A to apply to the environment so as to collect the most reward, i.e., to maximize the sum r1 + r2 + ... + rn? Here we have to introduce a mathematical object: the state value function.

The state value function
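In the usual notation (with π the policy and γ the discount factor discussed next), the state value function is:

V^π(s) = E_π[ r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ... | s_t = s ] = E_π[ sum_{k=0..∞} γ^k r_{t+k+1} | s_t = s ]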

¹«Ê½ÖÐÓиöÕÛºÏÒò×Ӧã¬Æäȡֵ·¶Î§Îª[0,1]£¬µ±ÆäΪ0ʱ£¬±íʾֻ¿¼Âǵ±Ç°¶¯×÷¶Ôµ±Ç°µÄÓ°Ï죬²»¿¼ÂǶԺóÐø²½ÖèµÄÓ°Ï죬µ±ÆäΪ1ʱ£¬±íʾµ±Ç°¶¯×÷¶ÔºóÐøÃ¿²½¶¼ÓоùµÈµÄÓ°Ïì¡£µ±È»£¬Êµ¼ÊÇé¿öͨ³£Êǵ±Ç°¶¯×÷¶ÔºóÐøµÃ·ÖÓÐÒ»¶¨µÄÓ°Ï죬µ«Ëæ×Ų½ÊýÔö¼Ó£¬ÆäÓ°Ïì¼õС¡£

´Ó¹«Ê½ÖпÉÒÔ¿´³ö£¬×´Ì¬Öµº¯Êý¿ÉÒÔͨ¹ýµü´úµÄ·½Ê½À´Çó½â¡£ÔöǿѧϰµÄÄ¿µÄ¾ÍÊÇÇó½âÂí¶û¿É·ò¾ö²ß¹ý³Ì£¨MDP£©µÄ×îÓŲßÂÔ¡£

A policy is the rule by which actions are chosen given the environment. Policies can be stationary or non-stationary: a stationary policy always outputs the same action in the same situation, and a non-stationary one does not. Here we mainly discuss stationary policies.

Solving the state value function above requires dynamic programming, and as soon as we write down the formula, we cannot avoid:

The Bellman equation
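In standard form, the Bellman expectation equations for the state value function and the action value function are:

V^π(s) = sum_a π(a|s) sum_{s'} P(s'|s, a) [ R(s, a, s') + γ V^π(s') ]

Q^π(s, a) = sum_{s'} P(s'|s, a) [ R(s, a, s') + γ sum_{a'} π(a'|s') Q^π(s', a') ]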

Here π is the policy mentioned above. Compared with Vπ(s), Qπ(s, a) also takes the action into account and is called the action value function. Solving the Bellman equation for the optimum yields the Bellman optimality equation.

There are two methods for solving this equation: policy iteration and value iteration.

Policy iteration

Policy iteration has two steps: policy evaluation and policy improvement. First the policy is evaluated to obtain the state value function; then the policy is improved, and if the new policy is better than the old one, it replaces it.

Value iteration

As seen above, policy iteration contains a policy evaluation step, and policy evaluation needs to sweep over all states several times; this huge amount of computation directly limits the efficiency of policy iteration. Value iteration, in contrast, sweeps the states only once per iteration, with the following update:
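In standard form, the value iteration update is:

V_{k+1}(s) = max_a sum_{s'} P(s'|s, a) [ R(s, a, s') + γ V_k(s') ]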

That is, at the (k+1)-th iteration of value iteration, the largest attainable value is assigned directly to Vk+1(s).

Q-Learning

Q-LearningÊǸù¾ÝÖµµü´úµÄ˼·À´½øÐÐѧϰµÄ¡£¸ÃËã·¨ÖУ¬QÖµ¸üÐµķ½·¨ÈçÏ£º

Although the target Q value is computed following the value iteration idea, it (being an estimate) is not assigned to the new Q directly; instead the update moves toward the target gradually, similar to gradient descent, by a small step controlled by α. This reduces the impact of estimation error and, much like stochastic gradient descent, eventually converges to the optimal Q values. The full algorithm is as follows:

Èç¹ûûÓнӴ¥¹ý¶¯Ì¬¹æ»®µÄͯЬ¿´ÉÏÊö¹«Ê½¿ÉÄÜÓеãÍ·´ó£¬ÏÂÃæÍ¨¹ý±í¸ñÀ´ÑÝʾÏÂQÖµ¸üеĹý³Ì£¬´ó¼Ò¾ÍÃ÷°×ÁË¡£

Running the Q-Learning algorithm amounts to filling in the table of Q values. In the table above, the rows are the states s and the columns are the actions a; together s and a determine a Q value in the table.

Step 1: Initialize, setting all the Q values in the table to 0.

Step 2: According to the policy and the current state s, choose an action a to execute. Suppose the current state is s1; since all initial values are 0, any a may be chosen. Suppose a2 is executed, a reward of 1 is obtained, and the system enters state s3. The Q value is then updated with the update formula above.

Here we assume α = 1 and γ = 1, i.e., the target Q value is assigned to Q directly each time, so the formula reduces to:

Q(s1, a2) = R(s1, a2) + max_a Q(s3, a)

Since the largest Q value in state s3 is still 0, Q(s1, a2) = 1 + 0 = 1, and that entry of the Q table is filled in accordingly. Then the current state s is set to s3.

Step 3: Continue the loop with the next action. The current state is s3; suppose action a3 is chosen, a reward of 2 is obtained, and the state changes back to s1. The same update gives Q(s3, a3) = 2 + max_a Q(s1, a) = 2 + 1 = 3, and the Q table is updated again.

Step 4: Keep looping; the Q values are updated over and over as trials continue, until they converge.
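The same walkthrough as a minimal NumPy sketch (the 4-state/4-action sizes, α = 1, and γ = 1 mirror the table above; the helper name q_update is made up for illustration):

import numpy as np

alpha, gamma = 1.0, 1.0
Q = np.zeros((4, 4))          # rows: states s1..s4, columns: actions a1..a4

def q_update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

q_update(0, 1, 1, 2)          # step 2: in s1 take a2, reward 1, reach s3 -> Q[s1,a2] = 1
q_update(2, 2, 2, 0)          # step 3: in s3 take a3, reward 2, reach s1 -> Q[s3,a3] = 3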

The table above demonstrates a system with 4 states and 4 actions. In practice, however, taking the Flappy Bird game in this article as an example, the screen is 80×80 pixels and each pixel can take 256 color values, so the actual number of states is 256 to the power of 80×80. That is an astronomically large number and makes the tabular approach infeasible.

Therefore, to reduce the dimensionality, a value function approximation is introduced: the value function is approximated by a parameterized function:
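In its simplest (for example, linear) form, this approximation can be written as:

Q(s, a) ≈ f(s, a; ω, b) = ω^T φ(s, a) + b

where φ(s, a) is some feature representation of the state-action pair.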

Here ω and b are the parameters. At this point we can finally tie back to the neural network described earlier: isn't the expression above exactly the function computed by a neuron?

Q-network

The figure below comes from the paper "Human-level Control through Deep Reinforcement Learning", which describes in detail this process of turning the Q value into a neural network (if interested, follow the link above to the original paper).

In this article's case, the input is 4 consecutive preprocessed 80×80 images; they pass through three convolutional layers, one pooling layer, and two fully connected layers, and the output is a vector containing the Q value of every action.

Q-learning has now been turned into a Q-network; the next question is how to train this network. Training a neural network is essentially an optimization problem: define the system's loss function, then minimize it.

ѵÁ·¹ý³ÌÒÀÀµÓÚÉÏÊöÌáµ½µÄDQNËã·¨£¬ÒÔÄ¿±êQÖµ×÷Ϊ±êÇ©£¬Òò´Ë£¬Ëðʧº¯Êý¿ÉÒÔ¶¨ÒåΪ£º

In the formula, s' and a' are the next state and action. With the loss function fixed and a way to collect samples fixed, the whole DQN algorithm takes shape.

Worth noting here is D, the Experience Replay buffer, i.e., the question of how samples are stored and drawn.

Because the samples collected while playing Flappy Bird form a time series, consecutive samples are strongly correlated. If the Q value were updated with every sample as it arrives, the result would be poor because the updates are biased by the sample distribution. A very natural idea is therefore: store the samples first, then draw them at random. That is the idea of Experience Replay.

In the implementation, the agent first plays repeatedly and stores the experience data in D; once enough has accumulated, data is drawn from it at random and gradient descent is performed on the loss function.

ËÄ¡¢´úÂ룺TensorFlowʵÏÖ

Finally, time to look at the code. A disclaimer first: when the author started from DeepMind's papers and tried to implement Flappy Bird with TensorFlow, he found that a demo had already been published on GitHub. The approach is the same, so the public code is used directly as the example for this analysis.

If you need the source code, see GitHub: Using Deep Q-Network to Learn How To Play Flappy Bird.

Structurally, the code consists of the following parts:

1. The GameState game class, whose frame_step method drives the game

2. CNN model construction

3. OpenCV-Python image preprocessing

4. The model training loop

1. The GameState class and the frame_step method

Implementing a game in Python naturally means using the pygame library, which provides clocks, basic display control, game widgets, event handling, and so on; if interested, read up on pygame. The frame_step method takes an ndarray of shape (2,) as input, with values [1,0] meaning do nothing and [0,1] meaning flap the bird. The implementation looks like this:

if input_actions[1] == 1:
    if self.playery > -2 * PLAYER_HEIGHT:
        self.playerVelY = self.playerFlapAcc
        self.playerFlapped = True
        # SOUNDS['wing'].play()

The subsequent operations include checking the score, updating the display, checking for collisions, and so on; they are not expanded here.

The return value of the frame_step method is:

return image_data, reward, terminal

i.e., the screen image data, the reward, and whether the game is over. Mapped onto the reinforcement learning model above, the screen image is the environment state s, and the reward is the feedback r the environment gives the learning system.

2. CNN model construction

¸ÃDemoÖаüº¬Èý¸ö¾í»ý²ã£¬Ò»¸ö³Ø»¯²ã£¬Á½¸öÈ«Á¬½Ó²ã£¬×îºóÊä³ö°üº¬Ã¿Ò»¸ö¶¯×÷QÖµµÄÏòÁ¿¡£Òò´Ë£¬Ê×Ïȶ¨ÒåÈ¨ÖØ¡¢Æ«Öᢾí»ýºÍ³Ø»¯º¯Êý£º

# Weights
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.01)
    return tf.Variable(initial)

# Bias
def bias_variable(shape):
    initial = tf.constant(0.01, shape=shape)
    return tf.Variable(initial)

# Convolution
def conv2d(x, W, stride):
    return tf.nn.conv2d(x, W, strides=[1, stride, stride, 1], padding="SAME")

# Pooling
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

Then build the convolutional neural network model from these functions (if any parameter in the code is unclear, flip back to the hand-drawn figure above).

def createNetwork():
    # First convolutional layer
    W_conv1 = weight_variable([8, 8, 4, 32])
    b_conv1 = bias_variable([32])
    # Second convolutional layer
    W_conv2 = weight_variable([4, 4, 32, 64])
    b_conv2 = bias_variable([64])
    # Third convolutional layer
    W_conv3 = weight_variable([3, 3, 64, 64])
    b_conv3 = bias_variable([64])
    # First fully connected layer
    W_fc1 = weight_variable([1600, 512])
    b_fc1 = bias_variable([512])
    # Second fully connected layer
    W_fc2 = weight_variable([512, ACTIONS])
    b_fc2 = bias_variable([ACTIONS])

    # Input layer
    s = tf.placeholder("float", [None, 80, 80, 4])

    # First hidden layer + pooling layer
    h_conv1 = tf.nn.relu(conv2d(s, W_conv1, 4) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)
    # Second hidden layer (only one pooling layer is used in the network)
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2, 2) + b_conv2)
    # h_pool2 = max_pool_2x2(h_conv2)
    # Third hidden layer
    h_conv3 = tf.nn.relu(conv2d(h_conv2, W_conv3, 1) + b_conv3)
    # h_pool3 = max_pool_2x2(h_conv3)

    # Reshape
    # h_pool3_flat = tf.reshape(h_pool3, [-1, 256])
    h_conv3_flat = tf.reshape(h_conv3, [-1, 1600])

    # Fully connected layer
    h_fc1 = tf.nn.relu(tf.matmul(h_conv3_flat, W_fc1) + b_fc1)

    # Output (readout) layer
    readout = tf.matmul(h_fc1, W_fc2) + b_fc2

    return s, readout, h_fc1

3. OpenCV-Python image preprocessing

ÔÚUbuntuÖа²×°opencvµÄ²½Öè±È½ÏÂé·³£¬µ±Ê±Ò²²ÈÁ˲»ÉÙ¿Ó£¬¸÷ÖÖGoogle½â¾ö¡£½¨Òé°²×°opencv3¡£

Õⲿ·ÖÖ÷Òª¶Ôframe_step·½·¨·µ»ØµÄÊý¾Ý½øÐÐÁ˻ҶȻ¯ºÍ¶þÖµ»¯£¬Ò²¾ÍÊÇ×î»ù±¾µÄͼÏñÔ¤´¦Àí·½·¨¡£

x_t, r_0, terminal = game_state.frame_step(do_nothing)

# First resize the image to 80x80, then convert it to grayscale
x_t = cv2.cvtColor(cv2.resize(x_t, (80, 80)), cv2.COLOR_BGR2GRAY)

# Binarize the grayscale image
ret, x_t = cv2.threshold(x_t, 1, 255, cv2.THRESH_BINARY)

# Stack four copies to form the four-channel input image
s_t = np.stack((x_t, x_t, x_t, x_t), axis=2)

4. DQNѵÁ·¹ý³Ì

ÕâÊÇ´úÂ벿·ÖÒª½²µÄÖØµã£¬Ò²ÊÇÉÏÊöQ-learningËã·¨µÄ´úÂ뻯¡£

i. Before training starts, create some variables:

# define the cost function
a = tf.placeholder("float", [None, ACTIONS])
y = tf.placeholder("float", [None])
readout_action = tf.reduce_sum(tf.multiply(readout, a), axis=1)
cost = tf.reduce_mean(tf.square(y - readout_action))
train_step = tf.train.AdamOptimizer(1e-6).minimize(cost)

# open up a game state to communicate with emulator
game_state = game.GameState()

# store the previous observations in replay memory
D = deque()

ÔÚTensorFlowÖУ¬Í¨³£ÓÐÈýÖÖ¶ÁÈ¡Êý¾ÝµÄ·½Ê½£ºFeeding¡¢Reading from filesºÍPreloaded data¡£FeedingÊÇ×î³£ÓÃÒ²×îÓÐЧµÄ·½·¨¡£¼´ÔÚÄ£ÐÍ£¨Graph£©¹¹½¨Ö®Ç°£¬ÏÈʹÓÃplaceholder½øÐÐռ룬µ«´Ëʱ²¢Ã»ÓÐѵÁ·Êý¾Ý£¬ÑµÁ·ÊÇͨ¹ýfeed_dict´«ÈëÊý¾Ý¡£

ÕâÀïµÄa±íʾÊä³öµÄ¶¯×÷£¬¼´Ç¿»¯Ñ§Ï°Ä£ÐÍÖеÄAction£¬y±íʾ±êǩֵ£¬readout_action±íʾģÐÍÊä³öÓëaÏà³Ëºó£¬ÔÚһάÇóºÍ£¬Ëðʧº¯Êý¶Ô±êǩֵÓëÊä³öÖµµÄ²î½øÐÐÆ½·½£¬train_step±íʾ¶ÔËðʧº¯Êý½øÐÐAdamÓÅ»¯¡£

Feeding the values looks like this:

# perform gradient step
train_step.run(feed_dict={
    y: y_batch,
    a: a_batch,
    s: s_j_batch}
)

ii. Create the game and the experience replay buffer D

# open up a game state to communicate with emulator
game_state = game.GameState()

# store the previous observations in replay memory
D = deque()

The experience replay buffer D is a queue (Python's collections.deque); transitions are pushed in with append() and the oldest ones can be removed with popleft(). D stores the data collected while playing, and the training loop later draws random batches from it.
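A minimal sketch of such a buffer with a size cap (the constant name REPLAY_MEMORY, its value, and the helper remember are assumptions for illustration, not taken verbatim from the demo):

from collections import deque

REPLAY_MEMORY = 50000          # assumed maximum number of stored transitions

D = deque()

def remember(transition):
    # transition is a (s, a, r, s_next, terminal) tuple
    D.append(transition)
    if len(D) > REPLAY_MEMORY:
        D.popleft()            # drop the oldest transition once the buffer is full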

After the variables are created, the TensorFlow method tf.global_variables_initializer() must be called to add an operation that initializes them. It runs once the model has been built, right after the Session is created. For example:

# Create two variables.
weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35),
                      name="weights")
biases = tf.Variable(tf.zeros([200]), name="biases")
...
# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()

# Later, when launching the model
with tf.Session() as sess:
    # Run the init operation.
    sess.run(init_op)
    ...
    # Use the model
    ...

iii. Saving and loading parameters

When training a model with TensorFlow, the learned parameters need to be saved, otherwise one shutdown sends you back to square one. TensorFlow uses a Saver for this; the Saver instance is usually obtained with tf.train.Saver() before the Session() is created.

saver = tf.train.Saver()

Variables are restored with the saver's restore method:

# Create some variables.
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
    # Restore variables from disk.
    saver.restore(sess, "/tmp/model.ckpt")
    print("Model restored.")
    # Do some work with the model
    ...

ÔÚ¸ÃDemoѵÁ·Ê±£¬Ò²²ÉÓÃÁËSaver½øÐвÎÊý±£´æ¡£

# saving and loading networks
saver = tf.train.Saver()
checkpoint = tf.train.get_checkpoint_state("saved_networks")
if checkpoint and checkpoint.model_checkpoint_path:
    saver.restore(sess, checkpoint.model_checkpoint_path)
    print("Successfully loaded:", checkpoint.model_checkpoint_path)
else:
    print("Could not find old network weights")

It first loads the CheckpointState file and then restores any existing parameters with saver.restore.

In the demo, the parameters are saved every 10,000 steps:

# save progress every 10000 iterations
if t % 10000 == 0:
    saver.save(sess, 'saved_networks/' + GAME + '-dqn', global_step=t)

iv. ʵÑé¼°Ñù±¾´æ´¢

Ê×ÏÈ£¬¸ù¾Ý¦Å ¸ÅÂÊÑ¡ÔñÒ»¸öAction¡£

# choose an action epsilon greedily
readout_t = readout.eval(feed_dict={s: [s_t]})[0]
a_t = np.zeros([ACTIONS])
action_index = 0
if t % FRAME_PER_ACTION == 0:
    if random.random() <= epsilon:
        print("----------Random Action----------")
        action_index = random.randrange(ACTIONS)
        a_t[action_index] = 1   # reuse action_index so the recorded index matches the executed action
    else:
        action_index = np.argmax(readout_t)
        a_t[action_index] = 1
else:
    a_t[0] = 1  # do nothing

ÕâÀreadout_tÊÇѵÁ·Êý¾ÝΪ֮ǰÌáµ½µÄËÄͨµÀͼÏñµÄÄ£ÐÍÊä³ö¡£a_tÊǸù¾Ý¦Å ¸ÅÂÊÑ¡ÔñµÄAction¡£

Æä´Î£¬Ö´ÐÐÑ¡ÔñµÄ¶¯×÷£¬²¢±£´æ·µ»ØµÄ״̬¡¢µÃ·Ö¡£

# run the selected action and observe next state and reward
x_t1_colored, r_t, terminal = game_state.frame_step(a_t)
x_t1 = cv2.cvtColor(cv2.resize(x_t1_colored, (80, 80)), cv2.COLOR_BGR2GRAY)
ret, x_t1 = cv2.threshold(x_t1, 1, 255, cv2.THRESH_BINARY)
x_t1 = np.reshape(x_t1, (80, 80, 1))
# s_t1 = np.append(x_t1, s_t[:,:,1:], axis = 2)
s_t1 = np.append(x_t1, s_t[:, :, :3], axis=2)

# store the transition in D
D.append((s_t, a_t, r_t, s_t1, terminal))

The experience buffer D stores a Markov sequence. In each transition (s_t, a_t, r_t, s_t1, terminal), s_t is the state at time t, a_t the action taken, r_t the reward received, s_t1 the resulting next state, and terminal the flag indicating whether the game ended.

In the next iteration, the current state and the step counter are updated:

# update the old values
s_t = s_t1
t += 1

Repeating this process gives the repeated play and sample storage described above.

v. Training the model by gradient descent

After playing for a while, once the experience buffer D holds enough samples, random minibatches can be drawn from it to train the model. Here the observation phase is set to OBSERVE = 100000 steps, and the minibatch size to BATCH = 32.

if t > OBSERVE:
    # sample a minibatch to train on
    minibatch = random.sample(D, BATCH)

    # get the batch variables
    s_j_batch = [d[0] for d in minibatch]
    a_batch = [d[1] for d in minibatch]
    r_batch = [d[2] for d in minibatch]
    s_j1_batch = [d[3] for d in minibatch]

    y_batch = []
    readout_j1_batch = readout.eval(feed_dict={s: s_j1_batch})
    for i in range(0, len(minibatch)):
        terminal = minibatch[i][4]
        # if terminal, only equals reward
        if terminal:
            y_batch.append(r_batch[i])
        else:
            y_batch.append(r_batch[i] + GAMMA * np.max(readout_j1_batch[i]))

    # perform gradient step
    train_step.run(feed_dict={
        y: y_batch,
        a: a_batch,
        s: s_j_batch}
    )

s_j_batch, a_batch, r_batch, and s_j1_batch are the transitions drawn from the experience buffer D (Java folks will envy Python's list comprehensions). y_batch holds the labels: if the game has ended there is no Q value for a next state (recall the Q update), so r_batch[i] is appended directly; otherwise r_batch[i] plus the discount factor (0.99) times the maximum Q value of the next state is appended to y_batch.

Finally, the gradient descent step is run; train_step takes s_j_batch, a_batch, and y_batch as inputs. After roughly 2,000,000 steps of training (about 10 hours on this machine), you get the results shown in the animations at the beginning of this article.

   