Editor's note:
This article comes from jikexueyuan. It introduces Pandas, a very handy library built on NumPy, and its basic usage. We hope you find it helpful.
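Pandas is a very handy library built on NumPy and, true to its name, everyone who meets it loves it. The reason is that reading data and processing data are both very simple with it.
Basic data structures
Pandas has two basic data structures of its own. Readers should note that, although it has these two structures, it is still a Python library, so every data type available in Python still applies here, and you can also define your own data types with classes. Pandas simply defines two more data types, Series and DataFrame, which make working with data easier.
All of the operations below assume that pandas has been imported as pd and that Series and DataFrame have been imported from pandas.
To save space, the imports are not shown again later. If you use ipython notebook as I do, you only need to import the modules once at the start.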
Series
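A Series is much like a list: a sequence of data items, each with a corresponding index value. Take the list [9, 3, 8]. Written together with its index values, it has 9 at index 0, 3 at index 1, and 8 at index 2, laid out side by side.
We are already used to that layout, but sometimes it needs to be written vertically, with one index and its value per line.
The two differ only in presentation.
A Series is just a list "stood upright", created by passing the list to Series().
It resembles a list in another way too: the types of its elements are entirely up to you (or rather, up to what you need).
Creating one gives us a Series object, and that object naturally has attributes and methods. For example, its values and index attributes show the data values and the index of the Series object, in that order.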
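As a minimal sketch of the points above, a Series can be built from a list and inspected through its values and index attributes. The element values are made up for illustration and are not the ones from the original screenshots.

```python
from pandas import Series

# Build a Series from a plain list; pandas supplies the default index 0, 1, 2.
s = Series([9, 3, 8])
print(s)               # one index/value pair per line

# Elements may be of mixed types, just as in a list (illustrative values).
mixed = Series([100, "python", 3.14])
print(mixed)

# values holds the data, index holds the labels.
print(s.values)
print(list(s.index))   # [0, 1, 2]
```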

The index of a list can only be integers starting from 0, and by default the index of a Series is the same. Unlike a list, however, a Series can be given a custom index by passing a second list as its index argument.
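A short sketch of a custom index follows. The labels are hypothetical and chosen only for illustration.

```python
from pandas import Series

# The index argument replaces the default 0, 1, 2 labels.
# These labels are illustrative, not taken from the original article.
s2 = Series([9, 3, 8], index=["first", "second", "third"])
print(s2)
print(list(s2.index))   # ['first', 'second', 'third']
```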


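A custom index really is rather interesting, and for that reason alone it is worth having.
Once every element has an index, elements can be manipulated through it. Remember the operations on a list? A Series has similar ones. Start with the simple case: reading a value and changing a value by its index, with the index label in square brackets.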
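Reading and changing a value by its label might look like the sketch below, which reuses the same hypothetical labels.

```python
from pandas import Series

s2 = Series([9, 3, 8], index=["first", "second", "third"])

# Read a value by its label, much like looking up a key.
print(s2["second"])     # 3

# Assign through the same label to change the stored value.
s2["second"] = 30
print(s2["second"])     # 30
```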

Doesn't this look a bit like a dict as well? It does, and what follows explains why.
You may have noticed that the Series objects above were defined from lists: among the arguments of Series(), the first list supplies the data values, and if an index is to be defined it comes after, again as a list. Besides this approach, a Series object can also be defined from a dict.
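A sketch of the dict-based definition, using the sd data that the article discusses next (the name s4 is introduced here only for illustration):

```python
from pandas import Series

# The keys become the index and the values become the data.
sd = {"python": 8000, "c++": 8100, "c#": 4000}
s4 = Series(sd)
print(s4)
```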

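Now is it clear why the earlier usage looked like a dict? Because a Series can be defined that way in the first place.
Even then, the index can still be customized. This is where an advantage of Pandas shows: if a custom index is supplied, each custom label is looked up among the original labels, and where the two match, the value belonging to the original label is used. This can be called "automatic alignment" for short.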
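Automatic alignment can be sketched as follows. The order of the labels in the index list is an assumption, since the original screenshot is not available.

```python
import pandas as pd
from pandas import Series

sd = {"python": 8000, "c++": 8100, "c#": 4000}

# "java" is not a key of sd, so it has no value to align with.
s5 = Series(sd, index=["java", "python", "c++", "c#"])
print(s5)   # java is NaN; the other labels keep their values from sd
```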

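sd contains only 'python': 8000, 'c++': 8100 and 'c#': 4000. There is no "java" in it, but "java" does appear in the index argument. The labels that can be "automatically aligned" carry over their original values, while "java", which has none, still appears in the index of the new Series object and is automatically assigned NaN. In Pandas, anything without a value is filled with NaN after alignment.
A more extreme case is an index in which none of the labels appears in sd: the index of the resulting Series object matches nothing in sd, so every value is NaN.
Pandas has dedicated functions, such as pd.isnull() and pd.notnull(), for checking whether a value is missing.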
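The all-NaN case and the null checks can be sketched like this. The labels "php", "ruby" and "go" are hypothetical, picked only because none of them is a key of sd.

```python
import pandas as pd
from pandas import Series

sd = {"python": 8000, "c++": 8100, "c#": 4000}

# None of these illustrative labels is a key of sd, so every value is NaN.
s6 = Series(sd, index=["php", "ruby", "go"])
print(s6)

# Only "java" is missing from sd, so only that entry is NaN.
s5 = Series(sd, index=["java", "python", "c++", "c#"])

# The module-level functions return a boolean Series with the same index.
print(pd.isnull(s5))    # True only for "java"
print(pd.notnull(s5))   # False only for "java"

# The same check called as a method on the Series itself.
print(s5.isnull())
```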

In addition, a Series object has the same methods of its own.

In fact, the labels of the index can be redefined, for example by assigning a new list to the index attribute.
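One way to redefine the labels is to assign a new list of the same length to the index attribute. This is a sketch with made-up labels, and it is only an assumption that the original screenshot used this approach.

```python
from pandas import Series

s7 = Series([9, 3, 8], index=["first", "second", "third"])

# Assigning a list of the same length replaces the labels in place.
s7.index = ["a", "b", "c"]
print(s7)
```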

Series data also supports operations that are applied to all of its elements at once (operations are covered in more detail later).
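Two operations of this kind are sketched below: multiplying every element and filtering with a condition. The data and the choice of operations are illustrative assumptions, not a reconstruction of the original screenshots.

```python
from pandas import Series

s8 = Series([9, 3, 8], index=["a", "b", "c"])

# Arithmetic is applied to every element and the index is kept.
doubled = s8 * 2
print(doubled)

# A boolean condition selects only the matching elements.
large = s8[s8 > 5]
print(large)
```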


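The demonstrations above were all run in ipython notebook, which is why they were shown as screenshots. Learning the Series data type was also a chance to get to know ipython notebook. All of the operations that follow can be run in ipython notebook as well, although my explanations may use the Python interactive mode.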
DataFrame
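A DataFrame is a two-dimensional data structure, very close to a spreadsheet or a table in a database such as mysql. Its vertical lines are called columns and its horizontal lines, as in a Series, are called the index, so a piece of data can be located by its columns and index values. (Some people translate DataFrame into Chinese as "数据框" (data frame); could it just as well be called a "筐" (basket)? After all, you put data into it.)
The demonstrations below are run in the Python interactive mode; readers can still try them in an ipython notebook environment.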
>>> import pandas as pd
>>> from pandas import Series, DataFrame
>>> data = {"name": ["yahoo", "google", "facebook"],
...         "marks": [200, 400, 800],
...         "price": [9, 3, 7]}
>>> f1 = DataFrame(data)
>>> f1
   marks      name  price
0    200     yahoo      9
1    400    google      3
2    800  facebook      7
This is a common way to define a DataFrame object: with a dict. The keys of the dict ("name", "marks", "price") become the values (names) of the DataFrame's columns, and the value under each key is a list holding the data that fills that column. No index was specified in the definition above, so by convention (the one already established for Series) it is the integers starting from 0. The result makes it plain that this is a two-dimensional data structure, similar to what you see in excel or mysql.
In the output above the order of the columns was not specified, so it behaves like the order of keys in a dict (this output comes from an old pandas release that sorted them; recent releases on Python 3.7 or later keep the order in which the keys were written). Unlike dict keys, though, the columns of a DataFrame can be given an explicit order, like this:
>>> f2 = DataFrame(data, columns=['name', 'price', 'marks'])
>>> f2
       name  price  marks
0     yahoo      9    200
1    google      3    400
2  facebook      7    800
As with Series, the index of a DataFrame can be customized too.
>>> f3 = DataFrame(data, columns=['name', 'price', 'marks', 'debt'],
...                index=['a', 'b', 'c', 'd'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/pandas/core/frame.py", line 283, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/usr/lib/pymodules/python2.7/pandas/core/frame.py", line 368, in _init_dict
    mgr = BlockManager(blocks, axes)
  File "/usr/lib/pymodules/python2.7/pandas/core/internals.py", line 285, in __init__
    self._verify_integrity()
  File "/usr/lib/pymodules/python2.7/pandas/core/internals.py", line 367, in _verify_integrity
    assert(block.values.shape[1:] == mgr_shape[1:])
AssertionError
That raised an error. The message is thoroughly unfriendly and offers no clue, which is one of the drawbacks of interactive mode. To fix it, note that the mistake is in the value of index: the list has one item too many. data has three rows, but four items (['a','b','c','d']) were given here.
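The traceback above comes from an old pandas release on Python 2.7. With a recent pandas release the same mistake is expected to surface as a ValueError with a more descriptive message. The sketch below assumes a current installation, and the variable name mismatch is introduced only for illustration.

```python
from pandas import DataFrame

data = {"name": ["yahoo", "google", "facebook"],
        "marks": [200, 400, 800],
        "price": [9, 3, 7]}

mismatch = None
try:
    # data holds three rows, so a four-label index cannot fit.
    DataFrame(data, index=["a", "b", "c", "d"])
except ValueError as err:
    # Assumption: current pandas raises ValueError here, not AssertionError.
    mismatch = str(err)
    print("ValueError:", mismatch)
```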
>>> f3 = DataFrame(data, columns=['name', 'price', 'marks', 'debt'],
...                index=['a', 'b', 'c'])
>>> f3
       name  price  marks  debt
a     yahoo      9    200   NaN
b    google      3    400   NaN
c  facebook      7    800   NaN
Look closely at the output above. When f3 was defined, the columns argument contained one more item ('debt') than before, but data, the dict, has no such key, so every value in the debt column is empty. In Pandas, empty is represented by NaN.
Besides the approach above, a DataFrame can also be defined with a "dict of dicts".
>>> newdata = {"lang": {"firstline": "python", "secondline": "java"},
...            "price": {"firstline": 8000}}
>>> f4 = DataFrame(newdata)
>>> f4
              lang  price
firstline   python   8000
secondline    java    NaN
The dict fixes the column names (the first-level keys), the index of each row (the keys of the second-level dicts) and the corresponding data (the values of the second-level dicts). In other words, the dict specifies the content of every cell, and any cell left unspecified is empty.
>>> DataFrame(newdata, index=["firstline", "secondline", "thirdline"])
              lang  price
firstline   python   8000
secondline    java    NaN
thirdline      NaN    NaN
If an index is specified as well, the result is as shown above: every cell is NaN unless the dict has content for that index label.
We have now defined DataFrame data in two ways. It is an object type as well: the variable f3, for example, refers to an object whose type is DataFrame. Following the earlier line of thought, objects have attributes and methods.
>>> f3.columns
Index(['name', 'price', 'marks', 'debt'], dtype=object)
The columns attribute of a DataFrame object shows all of the column names. A dict-like notation also retrieves the full contents of a column (including the index, of course):
>>> f3['name']
a       yahoo
b      google
c    facebook
Name: name
What is this? It is in fact a Series. Put another way, a DataFrame can be understood as being made up of one Series after another.
That column with no values has been nagging at me, so the next operation assigns a single value to the whole column:
>>> f3['debt'] = 89.2
>>> f3
       name  price  marks  debt
a     yahoo      9    200  89.2
b    google      3    400  89.2
c  facebook      7    800  89.2
Besides assigning one value to the whole column, values can also be added "point to point". Recall Series: since every column of a DataFrame object is a Series object, you can first define a Series object and then put it into the DataFrame object, as follows:
>>> sdebt = Series([2.2, 3.3], index=["a", "c"])    # note the index
>>> f3['debt'] = sdebt
The Series object (referred to by the variable sdebt) is assigned to the f3['debt'] column, and an important feature of Pandas, automatic alignment, comes into play. The Series has only two index labels ("a" and "c"), and they are aligned automatically with the index of the DataFrame. The result:
>>> f3
       name  price  marks  debt
a     yahoo      9    200   2.2
b    google      3    400   NaN
c  facebook      7    800   3.3
After automatic alignment, the entry that was not assigned a value remains NaN.
Can the data be modified even more precisely? Certainly, in exactly the way a dict is handled:
>>> f3["price"]["c"] = 300
>>> f3
       name  price  marks  debt
a     yahoo      9    200   2.2
b    google      3    400   NaN
c  facebook    300    800   3.3
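The chained form f3["price"]["c"] = 300 relies on f3["price"] handing back the column itself. Newer pandas releases may warn about it or leave f3 unchanged, so the same edit is sketched here with .loc, which names the row label and the column label in one step. The sketch assumes a current pandas installation.

```python
from pandas import DataFrame

data = {"name": ["yahoo", "google", "facebook"],
        "marks": [200, 400, 800],
        "price": [9, 3, 7]}
f3 = DataFrame(data, columns=["name", "price", "marks", "debt"],
               index=["a", "b", "c"])

# Give the row label and the column label in a single .loc call,
# so the assignment is applied to f3 itself rather than to a temporary.
f3.loc["c", "price"] = 300
print(f3)
```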
None of these operations should feel unfamiliar. These are the two kinds of data objects in Pandas.