Using Pandas
 
 2019-7-17
 
Editor's note:
This article comes from jikexueyuan. It introduces Pandas, a very handy library built on NumPy, and covers its basic usage. We hope you find it helpful.

Pandas is a very handy library built on NumPy and, true to its name, loved by everyone who meets it. The reason is simple: whether you are reading data or processing it, Pandas makes the job easy.

Basic data structures

Pandas has two basic data structures of its own. Keep in mind that, although it brings these two structures, it is still a Python library, so every data type Python offers still applies here, and you can still define your own data types with classes. Pandas simply adds two more types, Series and DataFrame, which make working with data easier.

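All of the operations below assume that pandas has been imported as pd and that Series and DataFrame have been imported from pandas.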
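Written out, that setup would look roughly like this; the two lines match the imports used at the start of the DataFrame section:

```python
# Assumed setup for every example in this article.
import pandas as pd
from pandas import Series, DataFrame
```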

To save space, the imports are not shown again from here on. If, like me, you use ipython notebook, you only need to import the modules once at the start.

Series

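A Series is much like a list: a sequence of values, each paired with an index value. Take the list [9, 3, 8]. Written together with its index values, it pairs index 0 with 9, index 1 with 3, and index 2 with 8, laid out in a row.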

We are already familiar with this layout, but sometimes it needs to be written vertically, with one index and value per line.

The two differ only in presentation.

A Series is simply a list "stood on end".
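A minimal sketch of that idea, reusing the list [9, 3, 8] from above (the variable name s is only for illustration):

```python
from pandas import Series

# Build a Series from an ordinary list; the default index is 0, 1, 2.
s = Series([9, 3, 8])

# Printing shows one index/value pair per line: the list stood on end.
print(s)
```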

Another similarity with lists is that the element types are whatever you choose (or rather, whatever the task requires).
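For instance, a Series can mix integers and strings. The values below are made up for illustration:

```python
from pandas import Series

# Mixed element types are allowed; pandas stores them with dtype object.
s2 = Series([100, "PYTHON", "Soochow", "Qiwsir"])
print(s2)
```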

What we have created here is a Series object, and like any object it has attributes and methods. For example, two attributes, values and index, show the data values and the index of a Series object respectively.
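A short sketch of the two attributes, on an illustrative Series with made-up values:

```python
from pandas import Series

s2 = Series([100, "PYTHON", "Soochow", "Qiwsir"])

# .values holds the data as a NumPy array; .index holds the labels.
print(s2.values)
print(s2.index)
```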

A list can only be indexed by integers starting from 0, and by default a Series is indexed the same way. Unlike a list, however, a Series can be given a custom index.
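A sketch of a custom index, passed as a second list through the index parameter. Both the values and the labels here are invented for the example:

```python
from pandas import Series

# The second list supplies one label per value, in the same order.
s2 = Series([100, "PYTHON", "Soochow", "Qiwsir"],
            index=["mark", "title", "university", "name"])
print(s2)
```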

Custom indexes are genuinely interesting, and that alone makes Series a must-have.

Once every element has an index, you can operate on elements through it. Remember the operations on a list? A Series has similar ones. Start with the simple case: reading a value by its index and modifying it.
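Reading and then changing a value by its label might look like this (the Series and the replacement value "AOI" are illustrative):

```python
from pandas import Series

s2 = Series([100, "PYTHON", "Soochow", "Qiwsir"],
            index=["mark", "title", "university", "name"])

# Look a value up by its index label.
before = s2["name"]
print(before)

# Assign through the same label to modify the value in place.
s2["name"] = "AOI"
print(s2["name"])
```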

Doesn't this look a bit like a dict as well? It does, and what follows shows why.

You may have noticed that the Series objects so far were defined from lists: the first argument of Series() is a list holding the data values, and if an index is needed it comes afterwards, again as a list. A Series object can also be defined from a dict.
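A sketch of the dict-based definition, using the sd dictionary that the discussion below refers to (the name s3 is an assumption):

```python
from pandas import Series

# The dict keys become the index labels, the dict values become the data.
sd = {"python": 8000, "c++": 8100, "c#": 4000}
s3 = Series(sd)
print(s3)
```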

Now do you see why the earlier example resembled a dict? Because a Series can be defined from one in the first place.

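The index can still be customized in this case, and here an advantage of Pandas shows. When you supply a custom index, its labels are matched against the original keys, and wherever a label matches, the value for that key is taken. This can be called "automatic alignment" for short.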
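A sketch of automatic alignment, assuming the custom index lists "java" alongside the three keys of sd (the label order and the name s4 are assumptions):

```python
from pandas import Series

sd = {"python": 8000, "c++": 8100, "c#": 4000}

# Labels found in sd keep their values; "java" has no match and becomes NaN.
s4 = Series(sd, index=["java", "python", "c++", "c#"])
print(s4)
```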

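sd contains only 'python':8000, 'c++':8100 and 'c#':4000; it has no "java", yet "java" appears in the index argument. The labels that can be aligned carry their original values over, while "java" still exists in the index of the new Series object and is automatically assigned NaN. In Pandas, anything without a value is filled with NaN. Consider a more extreme case, where none of the index labels is a key of sd.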
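A sketch of that extreme case. The three labels are hypothetical; any labels that are not keys of sd would behave the same way:

```python
from pandas import Series

sd = {"python": 8000, "c++": 8100, "c#": 4000}

# None of these labels is a key of sd, so nothing can be aligned.
s5 = Series(sd, index=["java", "c", "go"])
print(s5)
```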

No label in the index of the new Series object corresponds to anything in sd, so every value is NaN.

Pandas has dedicated functions for checking whether values are missing.
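I assume the functions meant here are pd.isnull and pd.notnull, both of which pandas provides. A sketch on an illustrative Series with one missing value:

```python
import pandas as pd
from pandas import Series

sd = {"python": 8000, "c++": 8100, "c#": 4000}
s4 = Series(sd, index=["java", "python", "c++", "c#"])

# Each call returns a boolean Series with the same index as s4.
missing = pd.isnull(s4)   # True where the value is NaN
present = pd.notnull(s4)  # True where a value exists
print(missing)
print(present)
```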

Series objects have methods of the same kind as well.
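The method form, sketched on the same illustrative Series (isnull and notnull are real Series methods; the data is assumed):

```python
from pandas import Series

sd = {"python": 8000, "c++": 8100, "c#": 4000}
s4 = Series(sd, index=["java", "python", "c++", "c#"])

# Same checks as the module-level functions, called on the object itself.
missing = s4.isnull()
present = s4.notnull()
print(missing)
```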

In fact, the index labels can be redefined.
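One way to do this is to assign a new list of labels to the index attribute. The new labels below are hypothetical, and the list must have the same length as the Series:

```python
from pandas import Series

sd = {"python": 8000, "c++": 8100, "c#": 4000}
s4 = Series(sd, index=["java", "python", "c++", "c#"])

# Replace all labels at once; the values stay in their positions.
s4.index = ["p1", "p2", "p3", "p4"]
print(s4)
```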

Calculations can also be carried out on Series data (operations are covered in more detail later).
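Two illustrative operations, chosen here as assumptions about the kind of calculation meant: element-wise arithmetic and filtering by a condition.

```python
from pandas import Series

sd = {"python": 8000, "c++": 8100, "c#": 4000}
s4 = Series(sd, index=["java", "python", "c++", "c#"])

# Arithmetic applies to every element; NaN stays NaN.
doubled = s4 * 2

# A boolean condition selects the matching elements; NaN never matches.
above = s4[s4 > 8000]

print(doubled)
print(above)
```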

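All the demonstrations above were done in ipython notebook, which is why they were shown as screenshots. While learning the Series data type you have also become acquainted with ipython notebook. Every operation from here on can be carried out in ipython notebook too, although my explanations may use the Python interactive mode.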

DataFrame

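A DataFrame is a two-dimensional data structure, very close to a spreadsheet or a table in a database such as MySQL. Its vertical columns are called columns, and its horizontal rows, as in a Series, are called index; in other words, columns and index together pin down the position of a piece of data. (Some people translate DataFrame into Chinese as 数据框, "data frame"; could it also be called a 筐, a "basket"? It is, after all, something you load data into.)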

The demonstrations below were run in the Python interactive mode; you can still try them in an ipython notebook environment.

>>> import pandas as pd
>>> from pandas import Series, DataFrame

>>> data = {"name":["yahoo","google","facebook"], "marks":[200,400,800], "price":[9, 3, 7]}
>>> f1 = DataFrame(data)
>>> f1
   marks      name  price
0    200     yahoo      9
1    400    google      3
2    800  facebook      7

This is the usual way to define a DataFrame object: from a dict. The dict's keys ("name", "marks", "price") become the values (names) of the DataFrame's columns, and the value under each key is a list holding the data that fills that column. No index was specified in the definition above, so by convention (the one already established for Series) it consists of integers starting from 0. The result makes it plain that this is a two-dimensional data structure (similar to what you see in Excel or MySQL).

In the output above the order of the columns was not specified, just like the order of keys in a dict. Unlike dict keys, however, the columns of a DataFrame can be given an explicit order, like this:

>>> f2 = DataFrame(data, columns=['name','price','marks'])
>>> f2
       name  price  marks
0     yahoo      9    200
1    google      3    400
2  facebook      7    800

As with Series, the index of a DataFrame can also be customized.

>>> f3 = DataFrame(data, columns=['name', 'price', 'marks', 'debt'], index=['a','b','c','d'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/pandas/core/frame.py", line 283, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/usr/lib/pymodules/python2.7/pandas/core/frame.py", line 368, in _init_dict
    mgr = BlockManager(blocks, axes)
  File "/usr/lib/pymodules/python2.7/pandas/core/internals.py", line 285, in __init__
    self._verify_integrity()
  File "/usr/lib/pymodules/python2.7/pandas/core/internals.py", line 367, in _verify_integrity
    assert(block.values.shape[1:] == mgr_shape[1:])
AssertionError

An error. The message is thoroughly unfriendly and offers no clue, which is a drawback of the interactive mode. To fix it: the error lies in the value of index, a list with one item too many. data has three rows, but four items (['a','b','c','d']) were given here.
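For comparison, more recent pandas releases reject the same call with a ValueError whose message, as far as I can tell, names the mismatched lengths. A minimal sketch that catches either exception type, so it should behave the same on old and new versions:

```python
from pandas import DataFrame

data = {"name": ["yahoo", "google", "facebook"],
        "marks": [200, 400, 800],
        "price": [9, 3, 7]}

# Four index labels for three rows of data: construction must fail.
try:
    DataFrame(data, columns=["name", "price", "marks", "debt"],
              index=["a", "b", "c", "d"])
    raised = False
except (ValueError, AssertionError) as err:
    raised = True
    print(type(err).__name__, err)
```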

>>> f3 = DataFrame(data, columns=['name', 'price', 'marks', 'debt'], index=['a','b','c'])
>>> f3
       name  price  marks debt
a     yahoo      9    200  NaN
b    google      3    400  NaN
c  facebook      7    800  NaN

Look closely at the output above as well. When f3 was defined, the columns argument contained one more item than before ('debt'), but the data dict has no such key, so every value in the debt column is empty, and in Pandas an empty value is represented by NaN.

Besides the approach above, a DataFrame can also be defined from a "dict of dicts".

>>> newdata = {"lang":{"firstline":"python","secondline":"java"}, "price":{"firstline":8000}}
>>> f4 = DataFrame(newdata)
>>> f4
              lang  price
firstline   python   8000
secondline    java    NaN

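The dict fixes the column names (the first-level keys), the index of each row (the keys of the second-level dicts) and the corresponding data (the values of the second-level dicts). In other words, the dict specifies the data in every cell, and any cell left unspecified is empty.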

>>> DataFrame(newdata, index=["firstline","secondline","thirdline"])
              lang  price
firstline   python   8000
secondline    java    NaN
thirdline      NaN    NaN

If an index is specified in addition, as shown above, every entry is NaN unless the dict has content for that index label.

We have now defined DataFrame data (in two possible ways). It is an object type too: the variable f3, for instance, refers to an object whose type is DataFrame. Following the same line of thinking as before, objects have attributes and methods.

>>> f3.columns
Index(['name', 'price', 'marks', 'debt'], dtype=object)

The columns attribute of a DataFrame object shows all of its column names. In addition, the dict-like syntax below returns the entire contents of a column (including the index, of course):

>>> f3['name']
a       yahoo
b      google
c    facebook
Name: name

What is this? It is in fact a Series; put another way, a DataFrame can be understood as being made up of Series, one per column.
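A quick way to confirm this is to check the type of a selected column. The sketch rebuilds f3 from the definition above so that it runs on its own; the name col is only for illustration:

```python
from pandas import Series, DataFrame

data = {"name": ["yahoo", "google", "facebook"],
        "marks": [200, 400, 800],
        "price": [9, 3, 7]}
f3 = DataFrame(data, columns=["name", "price", "marks", "debt"],
               index=["a", "b", "c"])

# Selecting one column returns a Series that shares the DataFrame's index.
col = f3["name"]
print(type(col))
print(isinstance(col, Series))
```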

That column without any values has been nagging at me all along. The operation below assigns one value to the whole column:

>>> f3['debt'] = 89.2
>>> f3
       name  price  marks  debt
a     yahoo      9    200  89.2
b    google      3    400  89.2
c  facebook      7    800  89.2

Besides assigning one value to the whole column, values can also be added "point to point". Recall Series: since every column of a DataFrame object is a Series object, you can first define a Series object and then put it into the DataFrame object, as follows:

>>> sdebt = Series([2.2, 3.3], index=["a","c"])    # note the index
>>> f3['debt'] = sdebt

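The Series object (referred to by the variable sdebt) is assigned to the f3['debt'] column, and an important feature of Pandas, automatic alignment, comes into play. The Series has only two index labels ("a", "c"), and they are aligned automatically with the index of the DataFrame. The result: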

>>> f3
       name  price  marks  debt
a     yahoo      9    200   2.2
b    google      3    400   NaN
c  facebook      7    800   3.3

After automatic alignment, the entries that were not assigned remain NaN.

Can data be modified even more precisely? Of course, in exactly the same way as with a dict:

>>> f3["price"]["c"] = 300
>>> f3
       name  price  marks  debt
a     yahoo    9      200   2.2
b    google    3      400   NaN
c  facebook  300      800   3.3
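One caution: the chained form f3["price"]["c"] = 300 worked in the pandas version used here, but newer releases warn about it and, with copy-on-write enabled, it may leave f3 unchanged. A sketch of the same update through the .loc indexer, which addresses the row label and column name in one step (f3 is rebuilt here so the example runs on its own):

```python
from pandas import DataFrame

data = {"name": ["yahoo", "google", "facebook"],
        "marks": [200, 400, 800],
        "price": [9, 3, 7]}
f3 = DataFrame(data, columns=["name", "price", "marks", "debt"],
               index=["a", "b", "c"])

# .loc[row_label, column_name] writes directly into the DataFrame.
f3.loc["c", "price"] = 300
print(f3)
```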

None of these operations should feel unfamiliar. These are the two kinds of data objects in Pandas.

 

 
   