6 Common Methods for Cleaning Data with Pandas & NumPy
 
 2019-9-17 
 
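Editor's note:
This article comes from the WeChat official account Python Data Science (Python数据科学). It explains how to drop columns from a DataFrame, how to change a DataFrame's index, and how to clean data, from simple fields up to an entire dataset.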

Data scientists spend a large amount of time cleaning datasets and getting them into a form they can work with. In fact, many data scientists say that the initial steps of obtaining and cleaning data make up 80% of the job.

So if you are just stepping into this field or plan to, it is important to be able to deal with messy data, whether that means missing values, inconsistent formatting, malformed records, or nonsensical outliers.

In this tutorial, we'll use Python's Pandas and NumPy packages to clean data.

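We'll cover the following:

Dropping unnecessary columns in a DataFrame

Changing the index of a DataFrame

Using .str() methods to clean columns

Using the DataFrame.applymap() function to clean the entire dataset, element by element

Renaming columns to a more recognizable set of labels

Skipping unnecessary rows in a CSV file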

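Here are the datasets we'll use:

BL-Flickr-Images-Book.csv - a CSV file containing information about books from the British Library

university_towns.txt - a text file containing the names of college towns in every US state

olympics.csv - a CSV file summarizing the participation of all countries in the Summer and Winter Olympics

You can download the datasets from Real Python's GitHub repository in order to follow the examples here.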

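Note: Jupyter Notebooks are recommended for following along.

This tutorial assumes a basic understanding of the Pandas and NumPy libraries, including Pandas' workhorse Series and DataFrame objects, the common methods that can be applied to these objects, and familiarity with NumPy's NaN values.

Let's import the required modules and get started.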

>>> import pandas as pd
>>> import numpy as np

Dropping Columns in a DataFrame

Often, you'll find that not all of the fields in a dataset are useful to you. For example, you might have a dataset of student information (name, grade, standard, parents' names, address) but only want to analyze student grades.

In this case, the address and the parents' names are not important to you. Keeping these unneeded fields takes up unnecessary space and can slow down runtime.

Pandas provides a handy way of removing unwanted rows or columns from a DataFrame with the drop() function. Let's look at a simple example where we drop a number of columns from a DataFrame.

First, let's create a DataFrame out of the file BL-Flickr-Images-Book.csv. In the examples below, we pass a relative path to pd.read_csv, meaning that all of the datasets are in a folder named Datasets in our current working directory.
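A minimal sketch of this loading step. To stay self-contained it reads a tiny in-memory CSV instead of the real file; the rows are invented placeholders, and only the column names are borrowed from the article. With the real dataset, and assuming the folder layout described above, the call would be pd.read_csv('Datasets/BL-Flickr-Images-Book.csv').

```python
import io
import pandas as pd

# Placeholder stand-in for Datasets/BL-Flickr-Images-Book.csv (values are invented).
sample_csv = io.StringIO(
    "Identifier,Edition Statement,Place of Publication,Date of Publication\n"
    "1,,London,1879 [1878]\n"
    "2,,London; Virtue & Yorston,1868\n"
)

# With the real file: df = pd.read_csv('Datasets/BL-Flickr-Images-Book.csv')
df = pd.read_csv(sample_csv)

# head() returns the first five rows by default (fewer if the frame is shorter).
first_rows = df.head()
```

Empty fields such as Edition Statement are read as NaN, which is one reason a column like that is a candidate for dropping.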

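When we look at the first five rows using the head() method, we can see that a handful of columns provide ancillary information that would be helpful to the library but isn't very descriptive of the books themselves: Edition Statement, Corporate Author, Corporate Contributors, Former owner, Engraver, Issuance type and Shelfmarks.

We can drop these columns in the following way: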

>>> to_drop = ['Edition Statement',
... 'Corporate Author',
... 'Corporate Contributors',
... 'Former owner',
... 'Engraver',
... 'Contributors',
... 'Issuance type',
... 'Shelfmarks']

>>> df.drop(to_drop, inplace=True, axis=1)

 

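Above, we defined a list that contains the names of all the columns we want to drop. Next, we call the drop() function on our object, passing in the inplace parameter as True and the axis parameter as 1. This tells Pandas that we want the changes to be made directly in our object and that it should look for the values to be dropped in the columns of the object.

When we inspect the DataFrame again, we'll see that the unwanted columns have been removed.

Alternatively, we could remove the columns by passing them to the columns parameter directly, instead of separately specifying the labels to be removed and the axis where Pandas should look for them.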

>>> df.drop(columns=to_drop, inplace=True)

This syntax is more intuitive and readable. What we're trying to do here is directly apparent.
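The two forms can be compared on a toy DataFrame. This is only a sketch: the column names are borrowed from the article, the values are placeholders, and inplace is left at its default so the original frame stays untouched.

```python
import pandas as pd

# Toy frame; column names follow the article, values are placeholders.
df = pd.DataFrame({
    'Identifier': [1, 2],
    'Engraver': [None, None],
    'Shelfmarks': ['A 1', 'B 2'],
    'Title': ['First book', 'Second book'],
})

to_drop = ['Engraver', 'Shelfmarks']

# Positional form: labels plus axis=1 (columns). Returns a new frame by default.
by_axis = df.drop(to_drop, axis=1)

# Keyword form: same result, no axis argument needed.
by_columns = df.drop(columns=to_drop)
```

Both calls return the same frame; passing inplace=True instead would modify df and return None.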

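Changing the Index of a DataFrame

A Pandas Index extends the functionality of NumPy arrays to allow for more versatile slicing and labeling. In many cases, it is helpful to use a uniquely valued identifying field of the data as its index.

For example, in the dataset used in the previous section, we can expect that when a librarian searches for a record, they may enter the unique identifier of a book (a value in the Identifier column):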

>>> df['Identifier'].is_unique
True

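Let's replace the existing index with this column using set_index.

Technical detail: Unlike primary keys in SQL, a Pandas Index is not guaranteed to be unique, although many indexing and merging operations run faster when it is.

We can access each record in a straightforward way with loc[]. Although the name loc[] may not look intuitive, it allows label-based indexing, which means selecting a row or record by its label regardless of its position.

In other words, 206 is the first label of the index. To access that record by position instead, we could use df.iloc[0], which does position-based indexing.

Previously, our index was a RangeIndex: integers starting from 0, analogous to Python's built-in range. By passing a column name to set_index, we changed the index to the values in Identifier.

You may have noticed that we reassigned the variable to the object returned by the method, with df = df.set_index(...). This is because, by default, the method returns a modified copy of the object and does not change the original object directly. We can avoid the reassignment by setting the inplace parameter: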

df.set_index('Identifier', inplace=True)
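The difference between label-based and position-based access can be sketched on a small frame. The identifier 206 comes from the text; the other identifiers and the titles are placeholders.

```python
import pandas as pd

# Placeholder records; only the identifier 206 is taken from the article.
df = pd.DataFrame({
    'Identifier': [206, 216, 218],
    'Title': ['Book A', 'Book B', 'Book C'],
})

# set_index returns a modified copy, so the result is assigned back to df.
df = df.set_index('Identifier')

by_label = df.loc[206]      # label-based: the row whose index label is 206
by_position = df.iloc[0]    # position-based: the first row, whatever its label
```

Here both lookups return the same row, because 206 happens to be the first label in the index.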

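Tidying Up Fields in the Data

So far, we have removed unnecessary columns and changed the index of our DataFrame to something more meaningful. In this section, we will clean specific columns and get them into a uniform format, both to understand the dataset better and to enforce consistency. In particular, we will clean Date of Publication and Place of Publication.

Upon inspection, all of the columns currently have the object dtype, which is roughly analogous to str in native Python.

It covers any field that can't be neatly fit as numerical or categorical data. This makes sense, since we're working with data that starts out as a bunch of messy strings: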

>>> # get_dtype_counts() was removed in pandas 1.0; this is the equivalent call
>>> df.dtypes.value_counts()
object 6

 

One field where it makes sense to enforce a numeric value is the date of publication, so that we can do calculations on it later:

>>> df.loc[1905:, 'Date of Publication'].head(10)
Identifier
1905   1888
1929  1839, 38-54
2836  [1897?]
2854  1865
2956  1860-63
2957  1873
3017   1866
3131  1899
4598  1814
4884  1820
Name: Date of Publication, dtype: object

 

 

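A particular book can have only one date of publication. Therefore, we need to do the following:

Remove the extra dates in square brackets, wherever present: 1879 [1878]

Convert date ranges to their start date, wherever present: 1860-63; 1839, 38-54

Completely remove the dates we are not certain about and replace them with NumPy's NaN: [1879?]

Convert the string nan to NumPy's NaN value

Putting these patterns together, we can use a single regular expression to extract the publication year: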

regex = r'^(\d{4})'

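The regular expression above finds any four digits at the beginning of a string, which is enough for our case.

The \d represents any digit, and {4} repeats this rule four times. The ^ character matches the start of a string, and the parentheses denote a capturing group, which signals to Pandas that we want to extract that part of the regex.

Let's see what happens when we run this regex across our dataset: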

>>> extr = df['Date of Publication'].str.extract(r'^(\d{4})', expand=False)
>>> extr.head()
Identifier
206 1879
216 1868
218 1869
472 1851
480 1857
Name: Date of Publication, dtype: object

This column still has the object dtype, but we can easily get its numerical version with pd.to_numeric:

>>> df['Date of Publication'] = pd.to_numeric(extr)
>>> df['Date of Publication'].dtype
dtype('float64')

As a result, about one in every ten values is missing, which is a small price to pay for now being able to do computations on the remaining valid values:

>>> df['Date of Publication'].isnull().sum() / len(df)
0.11717147339205986
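To see how the regex treats each of the messy patterns listed earlier, here is a small sketch on a hand-built Series. The strings are taken from the sample output and the list of cases above; the variable names are illustrative.

```python
import pandas as pd

# One example per pattern: plain year, list of ranges, uncertain date,
# date range, and a year followed by a bracketed alternative.
raw = pd.Series(['1888', '1839, 38-54', '[1897?]', '1860-63', '1879 [1878]'])

# Capture four digits only when they start the string.
years = raw.str.extract(r'^(\d{4})', expand=False)

# Rows that did not match are NaN, so the numeric result holds floats.
numeric_years = pd.to_numeric(years)

# Share of rows for which no year could be extracted.
missing_share = numeric_years.isnull().sum() / len(numeric_years)
```

Only '[1897?]' fails to match, because it starts with a bracket rather than a digit, so that row becomes NaN while ranges and bracketed alternatives collapse to their leading year.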

Combining str Methods with NumPy to Clean Columns

Above, you may have noticed the use of df['Date of Publication'].str. This attribute is a way to access fast string operations in Pandas that largely mimic operations on native Python strings or compiled regular expressions, such as .split(), .replace(), and .capitalize().

To clean the Place of Publication field, we can combine Pandas' str methods with NumPy's np.where function.

It has the following syntax:

>>> np.where(condition, then, else)

Here, condition is either an array-like object or a Boolean mask. If condition evaluates to True, then is used; otherwise else is used.

It can also be nested, allowing us to compute values based on multiple conditions:

>>> np.where(condition1, x1,
...          np.where(condition2, x2,
...                   np.where(condition3, x3, ...)))
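A concrete, runnable version of both forms may help. The scores and labels below are made up for illustration and have nothing to do with the book dataset.

```python
import numpy as np

scores = np.array([95, 72, 40, 58])

# Single condition: element-wise choice between two values.
passed = np.where(scores >= 60, 'pass', 'fail')

# Nested conditions behave like an if / elif / else chain per element.
grades = np.where(scores >= 90, 'A',
                  np.where(scores >= 60, 'B', 'C'))
```

Each element is tested against the outer condition first; only elements that fail it fall through to the inner np.where.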

 

ÎÒÃǽ«Ê¹ÓÃÕâÁ½¸ö·½³ÌÀ´ÇåÏ´Place of PublicationÓÉÓÚÕâÁÐÓÐ×Ö·û´®¶ÔÏó¡£ÒÔÏÂÊÇÕâ¸öÁеÄÄÚÈÝ£º

ÎÒÃÇ¿´µ½£¬¶ÔÓÚһЩÐУ¬place of publication»¹±»Ò»Ð©ÆäËüûÓÐÓõÄÐÅÏ¢Î§ÈÆ×Å¡£Èç¹ûÎÒÃÇ¿´¸ü¶àµÄÖµ£¬ÎÒÃÇ·¢ÏÖÕâÖÖÇé¿öÖÐÓÐЩÐÐ

ÈÃÎÒÃÇ¿´¿´Á½¸öÌØÊâµÄ£º

ÕâÁ½±¾ÊéÔÚͬһ¸öµØ·½³ö°æ£¬µ«ÊÇÒ»¸öÓÐÁ¬×Ö·û£¬ÁíÒ»¸öûÓС£

ΪÁËÒ»´ÎÐÔÇåÏ´Õâ¸öÁУ¬ÎÒÃÇʹÓÃstr.contains()À´»ñȡһ¸ö²¼¶ûÖµ¡£

ÎÒÃÇÇåÏ´µÄÁÐÈçÏ£º

>>> pub = df['Place of Publication']
>>> london = pub.str.contains('London')
>>> london[:5]
Identifier
206 True
216 True
218 True
472 True
480 True
Name: Place of Publication, dtype: bool

>>> oxford = pub.str.contains('Oxford')

 

ÎÒÃǽ«ËüÓënp.where½áºÏ¡£

>>> df['Place of Publication'] = np.where(london, 'London',
...                                       np.where(oxford, 'Oxford',
...                                                pub.str.replace('-', ' ')))

>>> df['Place of Publication'].head()
Identifier
206 London
216 London
218 London
472 London
480 London
Name: Place of Publication, dtype: object

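Here, the np.where function is called in a nested structure, with condition being a Series of Booleans obtained with str.contains(). The contains() method works like Python's built-in in keyword, which is used to check whether an item occurs in an iterable (or a substring in a string).

The replacement value is a string representing the place of publication we want. We also replace hyphens with spaces using str.replace() and reassign the result to the column in our DataFrame.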
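The same combination can be tried on a handful of values. The strings below are placeholders chosen to mimic the kinds of noise described above (extra publisher text, hyphenated versus unhyphenated place names); they are not copied from the dataset.

```python
import numpy as np
import pandas as pd

# Placeholder values: noisy London and Oxford entries plus a hyphenated pair.
pub = pd.Series(['London', 'London; Virtue & Yorston', 'Oxford [printed]',
                 'Newcastle-upon-Tyne', 'Newcastle upon Tyne'])

# Boolean masks: True where the substring occurs anywhere in the value.
london = pub.str.contains('London')
oxford = pub.str.contains('Oxford')

# London rows -> 'London', Oxford rows -> 'Oxford',
# everything else keeps its text with hyphens turned into spaces.
cleaned = np.where(london, 'London',
                   np.where(oxford, 'Oxford',
                            pub.str.replace('-', ' ', regex=False)))
```

Passing regex=False makes it explicit that the hyphen is a literal character. The result is a NumPy array, which can be assigned straight back to a DataFrame column.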

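Although there is more dirty data in this dataset, we will discuss only these two columns for now.

Let's have a look at the first five rows, which look a lot crisper than when we started out.

At this point, Place of Publication is a good candidate for conversion to a categorical dtype, because its fairly small set of unique cities can be encoded with integers. (The memory used by a categorical is proportional to the number of categories plus the length of the data.)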
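A brief sketch of what that conversion looks like, using a made-up Series of city names rather than the real column:

```python
import pandas as pd

places = pd.Series(['London', 'Oxford', 'London', 'London', 'Oxford'])

# Convert the strings to a categorical dtype.
as_category = places.astype('category')

# Each value is stored as a small integer code pointing into the category list.
codes = as_category.cat.codes
categories = as_category.cat.categories
```

The unique strings are stored once in categories, and each row only holds its integer code.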

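Cleaning the Entire Dataset Using the applymap Function

In certain situations, you will see that the "dirt" is not confined to one column but is more spread out.

There are cases where it is helpful to apply a customized function to each element of a DataFrame. The Pandas applymap() method is similar to the built-in map() function and simply applies a function to all the elements in a DataFrame.

Let's look at an example. We will create a DataFrame out of the "university_towns.txt" file: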

$ head Datasets/university_towns.txt
Alabama[edit]
Auburn (Auburn University)[1]
Florence (University of North Alabama)
Jacksonville (Jacksonville State University)[2]
Livingston (University of West Alabama)[2]
Montevallo (University of Montevallo)[2]
Troy (Troy University)[2]
Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3][4]
Tuskegee (Tuskegee University)[5]
Alaska[edit]

We see that each state name is followed by the university towns in that state: StateA TownA1 TownA2 StateB TownB1 TownB2... If we look at the way state names are written in the file, we'll see that all of them contain the "[edit]" substring.

We can take advantage of this pattern by creating a list of (state, city) tuples and wrapping that list in a DataFrame:

>>> university_towns = []
>>> with open('Datasets/university_towns.txt') as file:
...     for line in file:
...         if '[edit]' in line:
...             # Remember this `state` until the next is found
...             state = line
...         else:
...             # Otherwise, we have a city; keep `state` as last-seen
...             university_towns.append((state, line))

>>> university_towns[:5]
[('Alabama[edit]\n', 'Auburn (Auburn University)[1]\n'),
('Alabama[edit]\n', 'Florence (University of North Alabama)\n'),
('Alabama[edit]\n', 'Jacksonville (Jacksonville State University)[2]\n'),
('Alabama[edit]\n', 'Livingston (University of West Alabama)[2]\n'),
('Alabama[edit]\n', 'Montevallo (University of Montevallo)[2]\n')]

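We can wrap this list in a DataFrame and set the column names to "State" and "RegionName". Pandas will take each element in the list and put the state in the left column and the region name in the right column.

The result is a two-column DataFrame with one row per town.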
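A minimal sketch of this wrapping step, using only the first two tuples from the list shown above so that it runs without the text file:

```python
import pandas as pd

# First two tuples from the university_towns list built above.
university_towns = [
    ('Alabama[edit]\n', 'Auburn (Auburn University)[1]\n'),
    ('Alabama[edit]\n', 'Florence (University of North Alabama)\n'),
]

# Each tuple becomes one row; the tuple positions map to the two columns.
towns_df = pd.DataFrame(university_towns, columns=['State', 'RegionName'])
```

At this stage the values still carry the "[edit]" markers, footnote numbers, and trailing newlines, which is what the cleaning step addresses.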

We could have cleaned these strings in the for loop above, but Pandas offers an easier way. We only need the state name and the town name and can remove everything else. We could use Pandas' .str() methods again here, but we can also use applymap() to map a Python callable to each element of the DataFrame.

We have been using the term "element", but what exactly do we mean by it? Consider a small "toy" DataFrame.
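One way such a toy DataFrame might be built is sketched below. The four cell values are the ones named in the text; the 2x2 layout is an assumption made for illustration.

```python
import pandas as pd

# Cell values come from the text; the 2x2 arrangement is assumed.
toy = pd.DataFrame([['Mock', 'Dataset'],
                    ['Python', 'Pandas']])

# Flatten the frame row by row to list every individual element.
elements = [cell for row in toy.values.tolist() for cell in row]
```

Every entry in elements corresponds to exactly one cell of the frame.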

In this example, each cell ('Mock', 'Dataset', 'Python', 'Pandas', etc.) is an element. Therefore, applymap() will apply a function to each of these elements independently. Let's define that function.
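A minimal sketch of such a function, assuming it should cut each string at the first " (" or, failing that, at the first "[", which matches the checks described in the text. The elementwise helper is not part of the article: it picks DataFrame.map where available (pandas 2.1 and later, where applymap is deprecated) and falls back to applymap on older versions.

```python
import pandas as pd

def get_citystate(item):
    """Cut a raw string at the first ' (' or, failing that, the first '['."""
    if ' (' in item:
        return item[:item.find(' (')]
    elif '[' in item:
        return item[:item.find('[')]
    return item

# Two sample rows in the same shape as the university towns data.
towns_df = pd.DataFrame(
    [('Alabama[edit]\n', 'Auburn (Auburn University)[1]\n'),
     ('Alabama[edit]\n', 'Florence (University of North Alabama)\n')],
    columns=['State', 'RegionName'],
)

# Hypothetical helper: prefer DataFrame.map, fall back to applymap on older pandas.
elementwise = getattr(towns_df, 'map', None) or towns_df.applymap
cleaned = elementwise(get_citystate)
```

Strings that contain neither marker are returned unchanged, so the function is safe to apply to every cell.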

Pandas' applymap() takes only one parameter: the function (callable) that should be applied to each element.

>>> towns_df = towns_df.applymap(get_citystate)

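First, we define a function that takes an element from the DataFrame as its parameter. Inside the function, we check whether the element contains a ( or a [.

Depending on the check, the function returns the corresponding value. Finally, the applymap() function is called on our object. Now the DataFrame looks much cleaner.

The applymap() method takes each element from the DataFrame, passes it to the function, and replaces the original value with the returned one. It's that simple!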

Technical detail: While .applymap is a convenient and versatile method, it can take a long time to run on larger datasets, because it maps a Python callable to each individual element. In some cases, it is more efficient to use vectorized operations built on Cython or NumPy.
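As one column-wise alternative, the same trimming can be written with the .str accessor instead of a per-element callable. This is a sketch of the approach rather than a benchmark; the pattern is an assumption chosen to mirror the cleaning rule above, and any speed difference should be measured on real data.

```python
import pandas as pd

towns_df = pd.DataFrame(
    [('Alabama[edit]\n', 'Auburn (Auburn University)[1]\n'),
     ('Alabama[edit]\n', 'Florence (University of North Alabama)\n')],
    columns=['State', 'RegionName'],
)

# Capture everything before the first '(' or '[', then trim whitespace.
pattern = r'^([^(\[]*)'
vectorized = towns_df.apply(
    lambda col: col.str.extract(pattern, expand=False).str.strip()
)
```

Here apply() runs once per column, and the string work inside each column is delegated to the pandas string methods.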

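Renaming Columns and Skipping Rows

Often, the datasets you work with will have column names that are not easy to understand, or unimportant information in the first few or last few rows, such as definitions of terms or footnotes.

In that case, we want to rename columns and skip certain rows so that we keep only correct and meaningful information.

To demonstrate how to go about this, let's first look at the first five rows of the "olympics.csv" dataset: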

$ head -n 5 Datasets/olympics.csv
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
,? Summer,01 !,02 !,03 !,Total,? Winter,01 !,02 !,03 !,Total,? Games,01 !,02 !,03 !,Combined total
Afghanistan (AFG),13,0,0,2,2,0,0,0,0,0,13,0,0,2,2
Algeria (ALG),12,5,2,8,15,3,0,0,0,0,15,5,2,8,15
Argentina (ARG),23,18,24,28,70,18,0,0,0,0,41,18,24,28,70

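Now we'll read it into a Pandas DataFrame.

This is messy indeed! The column names are the string form of integers, starting at 0. The row that should have been our header is at olympics_df.iloc[0]. This happened because our CSV file starts with 0, 1, 2, ..., 15.

Also, if we go to the source of this dataset, we see that the NaN above should really be something like "Country", ? Summer is supposed to represent "Summer Games", 01 ! should be "Gold", and so on.

Therefore, we need to do two things:

Skip the first row and use the next one as the header

Rename the columns

We can skip rows and set the header while reading the CSV file by passing some parameters to the read_csv function.

This function takes a lot of optional parameters, but in this case we only need one (header) to remove the 0th row.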

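We now have the correct row set as the header, and all unnecessary rows have been removed. Note how Pandas has changed the name of the column containing the country names from NaN to Unnamed: 0.

To rename the columns, we will use a DataFrame's rename() method, which allows you to relabel an axis based on a mapping (in this case, a dict).

Let's start by defining a dictionary that maps the current column names (the keys) to more usable ones (the dictionary's values).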
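A minimal sketch of such a mapping, applied to a one-row frame that carries the first six column labels. "Country", "Summer Games", and "Gold" follow the names suggested in the text; "Silver" and "Bronze" are assumptions, and the mapping is deliberately partial.

```python
import pandas as pd

# One-row stand-in carrying the first six column labels produced by header=1.
olympics_df = pd.DataFrame(
    [['Afghanistan (AFG)', 13, 0, 0, 2, 2]],
    columns=['Unnamed: 0', '? Summer', '01 !', '02 !', '03 !', 'Total'],
)

# Illustrative mapping: 'Silver' and 'Bronze' are assumed names.
new_names = {
    'Unnamed: 0': 'Country',
    '? Summer': 'Summer Games',
    '01 !': 'Gold',
    '02 !': 'Silver',
    '03 !': 'Bronze',
}

# Columns missing from the mapping (here 'Total') keep their current name.
renamed = olympics_df.rename(columns=new_names)
```

Because rename() only touches the labels found in the mapping, the dictionary can be built up gradually.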

We call the rename() function on our object:

>>> olympics_df.rename(columns=new_names, inplace=True)

Setting inplace to True makes our changes directly on the object. Inspecting the DataFrame again confirms that the columns now carry the new names.

 

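Python Data Cleaning: Recap

In this tutorial, you learned how to remove unnecessary information from a dataset using the drop() function, as well as how to set an index for your dataset so that items in it can be found easily.

Moreover, you learned how to clean object fields with the .str() accessor and how to clean the entire dataset using the applymap() method. Lastly, we explored how to skip rows in a CSV file and rename columns using the rename() method.

Knowing how to clean data is very important, because it is a big part of data science. You should now have a basic understanding of how Pandas and NumPy can be used to clean datasets.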

 
   