Data Preprocessing with Python Pandas: Data Cleaning
 
 2019-9-23
 
Editor's note:
This article comes from CSDN. It mainly explains how to preprocess data with Python Pandas by handling missing data, detecting and filtering outliers, and removing duplicate data. We hope it helps with your studies.

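Missing Data

Missing data is common in most data analysis applications. Pandas uses the floating-point value NaN to represent missing data in both floating-point and non-floating-point arrays; it is simply a sentinel that is easy to detect.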

from pandas import Series,DataFrame
string_data=Series(['abcd','efgh','ijkl','mnop'])
print(string_data)
print("...........\n")
# No element is missing yet, so isnull() reports False for every entry
print(string_data.isnull())

Python's built-in None value is also treated as NA:

from pandas import Series,DataFrame
string_data=Series(['abcd','efgh','ijkl','mnop'])
print(string_data)
print("...........\n")
# Assign None to the first element; isnull() now reports True for it
string_data[0]=None
print(string_data.isnull())

There are four methods for handling NA values: dropna, fillna, isnull, notnull

isnull and notnull: this pair of methods is applied element-wise to an object and returns a Boolean array, which is typically used for Boolean indexing.
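As a minimal sketch of the Boolean indexing mentioned above, the following uses notnull to build a mask and then selects with it. The Series `values` and its contents are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two of the five values are missing.
values = pd.Series([1.0, np.nan, 3.5, None, 7.0])

mask = values.notnull()   # element-wise test: True where a value is present
present = values[mask]    # Boolean indexing keeps only the True positions

print(mask)
print(present)
```

Selecting with the notnull mask gives the same result as calling dropna on the Series.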

dropna: for a Series, dropna returns a Series containing only the non-null data and their index values.

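The complication lies with DataFrame, because dropping there means discarding at least an entire row (or column). The solution is similar to before, using extra parameters: dropna(axis=0,how='any',thresh=None). The how parameter accepts any or all; with all, a row (column) is dropped only when every element in it is NA. thresh is an integer: with thresh=3, for example, a row is kept only if it contains at least three non-NA values.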
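The difference between how='any' and how='all' is easiest to see on a small frame. This is only a sketch; the frame `frame` and its values are invented for the purpose.

```python
import numpy as np
import pandas as pd

NA = np.nan
# Hypothetical 4x3 frame: row 1 is entirely NA, rows 2 and 3 are partly NA.
frame = pd.DataFrame([[1.0, 6.5, 3.0],
                      [NA, NA, NA],
                      [1.0, NA, NA],
                      [NA, 6.5, 3.0]])

any_rows = frame.dropna()           # how='any' (default): drop rows with any NA
all_rows = frame.dropna(how='all')  # drop only the rows that are entirely NA

print(any_rows)
print(all_rows)
```

With the default, only the fully populated row 0 survives; with how='all', only row 1 is removed.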

fillna: in fillna(value=None,method=None,axis=0), value can be a dict as well as a basic type, which makes it possible to fill different columns with different values.

Filtering data:

For a Series, dropna returns a Series containing only the non-null data and their index values:

from pandas import Series,DataFrame
from numpy import nan as NA
data=Series([1,NA,3.5,NA,7])
print(data.dropna())

Another way of filtering DataFrame rows mainly concerns time series data. Suppose you only want to keep rows that contain a certain number of observations; the thresh parameter does this:

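import numpy as np
from pandas import Series,DataFrame
from numpy import nan as NA
data=DataFrame(np.random.randn(7,3))
# .ix has been removed from pandas; .loc selects by label and includes the end label
data.loc[:4,1]=NA
data.loc[:2,2]=NA
print(data)
print("...........")
# Keep only rows that have at least 2 non-NA values
print(data.dropna(thresh=2))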
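To make the meaning of thresh concrete, the sketch below counts the non-NA values in each row and compares that with the result of dropna(thresh=2). It uses the same missing-value pattern as the example above, but with fixed numbers instead of random ones so that the outcome is reproducible; the name `non_na_per_row` is chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Fixed values 0..20 in a 7x3 frame, then blank out parts of columns 1 and 2
data = pd.DataFrame(np.arange(21, dtype=float).reshape(7, 3))
data.loc[:4, 1] = np.nan   # rows 0-4 of column 1 (.loc includes the end label)
data.loc[:2, 2] = np.nan   # rows 0-2 of column 2

non_na_per_row = data.notnull().sum(axis=1)  # number of present values per row
kept = data.dropna(thresh=2)                 # rows with at least 2 present values

print(non_na_per_row)
print(kept)
```

Rows 0 to 2 have only one non-NA value each and are dropped; rows 3 to 6 have two or three and are kept.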

If you do not want to filter out missing data but would rather fill the "holes" in some other way, fillna is the main function to use.

Calling fillna with a constant replaces the missing values with that constant:

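import numpy as np
from pandas import Series,DataFrame
from numpy import nan as NA
data=DataFrame(np.random.randn(7,3))
# .ix has been removed from pandas; .loc selects by label and includes the end label
data.loc[:4,1]=NA
data.loc[:2,2]=NA
print(data)
print("...........")
print(data.fillna(0))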

Calling fillna with a dict makes it possible to fill different columns with different values:

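import numpy as np
from pandas import Series,DataFrame
from numpy import nan as NA
data=DataFrame(np.random.randn(7,3))
# .ix has been removed from pandas; .loc selects by label and includes the end label
data.loc[:4,1]=NA
data.loc[:2,2]=NA
print(data)
print("...........")
# Dict keys are column labels: fill column 1 with 111 and column 2 with 222
print(data.fillna({1:111,2:222}))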

fillna can be used for many other things as well; for example, you can pass in the mean or median of a Series:

from pandas import Series,DataFrame
from numpy import nan as NA
data=Series([1.0,NA,3.5,NA,7])
print(data)
print("...........\n")
# mean() skips NA values, so the fill value is the mean of 1.0, 3.5 and 7
print(data.fillna(data.mean()))
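The text mentions the median as well, so here is a short sketch of that variant, together with forward filling. Using ffill() for the latter is an assumption about the pandas version in use: recent releases deprecate the method argument of fillna in favor of ffill() and bfill().

```python
import numpy as np
import pandas as pd

data = pd.Series([1.0, np.nan, 3.5, np.nan, 7.0])

by_median = data.fillna(data.median())  # median of 1.0, 3.5, 7.0 is 3.5
by_ffill = data.ffill()                 # carry the last valid value forward

print(by_median)
print(by_ffill)
```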

Detecting and Filtering Outliers

Filtering or transforming outliers is largely a matter of array operations. Consider a (1000,4) array of standard normally distributed data:

import numpy as np
from pandas import Series,DataFrame
from numpy import nan as NA
data=DataFrame(np.random.randn(1000,4))
print(data.describe())
print("\n....Entries in one column whose absolute value exceeds 3...\n")
col=data[3]
print(col[np.abs(col) > 3] )
print("\n....All rows containing a value whose absolute value exceeds 3...\n")
# Index the whole frame (not just col) so that complete rows are returned
print(data[(np.abs(data) > 3).any(axis=1)] )
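The example above covers filtering. For the transforming side, one common approach is to cap values to the interval [-3, 3]. The sketch below is one way to do that under assumed names (`outlier_mask`, `capped`) and an arbitrarily chosen random seed; it is not taken from the original article.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice for repeatability
data = pd.DataFrame(rng.standard_normal((1000, 4)))

outlier_mask = np.abs(data) > 3
# Where the mask is True, replace the value with -3 or 3 according to its sign
capped = data.mask(outlier_mask, np.sign(data) * 3)

print(capped.describe())
```

For a symmetric bound like this, data.clip(-3, 3) should give the same result more briefly; the mask form is shown because it extends to other replacement rules.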

Removing Duplicate Data

The duplicated method of DataFrame returns a Boolean Series indicating whether each row is a duplicate:

import pandas as pd
data=pd.DataFrame({'k1':['one']*3+['two']*4, 'k2':[1,1,2,2,3,3,4]})
print(data)
print("........\n")
# True marks a row that repeats an earlier row
print(data.duplicated())

A related method is drop_duplicates, which returns a DataFrame with the duplicate rows removed:

import pandas as pd
data=pd.DataFrame({'k1':['one']*3+['two']*4, 'k2':[1,1,2,2,3,3,4]})
print(data)
print("........\n")
print(data.drop_duplicates())
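The two methods are closely connected: negating the Boolean Series from duplicated and indexing with it should yield the same rows as drop_duplicates. A small sketch on the same data (the names `is_dup` and `deduped` are chosen here for illustration):

```python
import pandas as pd

data = pd.DataFrame({'k1': ['one'] * 3 + ['two'] * 4,
                     'k2': [1, 1, 2, 2, 3, 3, 4]})

is_dup = data.duplicated()  # True for rows that repeat an earlier row
deduped = data[~is_dup]     # keep only the rows that are not repeats

print(is_dup)
print(deduped)
```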

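By default both methods consider all columns. You can also specify a subset of columns for detecting duplicates. Suppose there is an additional column of values and we only want to filter duplicates based on the k1 column: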

import pandas as pd
data=pd.DataFrame({'k1':['one']*3+['two']*4, 'k2':[1,1,2,2,3,3,4]})
data['v1']=range(7)
print(data)
print("........\n")
# Only k1 is compared, so one row per distinct k1 value remains
print(data.drop_duplicates(['k1']))

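duplicated and drop_duplicates keep the first occurrence of each value combination by default. Passing keep='last' keeps the last one instead (older pandas versions spelled this take_last=True, which has since been removed):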

import pandas as pd
data=pd.DataFrame({'k1':['one']*3+['two']*4, 'k2':[1,1,2,2,3,3,4]})
data['v1']=range(7)
print(data)
print("........\n")
# keep='last' replaces the removed take_last=True argument
print(data.drop_duplicates(['k1','k2'],keep='last'))
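To compare the available keep options side by side, the following sketch runs drop_duplicates three times on the same frame. The variable names are illustrative; keep=False, which discards every row that has a duplicate, is not mentioned in the text above and is included only for completeness.

```python
import pandas as pd

data = pd.DataFrame({'k1': ['one'] * 3 + ['two'] * 4,
                     'k2': [1, 1, 2, 2, 3, 3, 4]})
data['v1'] = range(7)

first = data.drop_duplicates(['k1', 'k2'])                     # keep='first' is the default
last = data.drop_duplicates(['k1', 'k2'], keep='last')         # keep the last occurrence
unique_only = data.drop_duplicates(['k1', 'k2'], keep=False)   # drop all repeated combinations

print(first.index.tolist())
print(last.index.tolist())
print(unique_only.index.tolist())
```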

 
   