Pandas Essentials in Python
 
 2019-7-18
 
Editor's note:
This article comes from cnblogs. It covers the differences between NumPy, SciPy, and pandas, and how pandas reads and writes data. We hope you find it helpful.

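Many people have trouble telling the NumPy, SciPy, and pandas libraries apart.

Here is a brief summary of each:

NumPy: a numerical computing library built around arrays and matrices. It covers basic arithmetic, solving equations, and other purely mathematical computation.

SciPy: a scientific computing library. It adds a layer of higher-level abstractions and physical models on top of NumPy, is less purely mathematical, and provides routines that compute results directly.

For example:

To compute a Fourier transform, which is pure mathematics, use NumPy;

To build a filter, which is a signal-processing model, use SciPy.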

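pandas: provides a data structure called DataFrame that closely matches the tables used in statistical analysis. It is built for data analysis, mainly working with and presenting tabular data.

pandas is built on top of NumPy, and as pandas has evolved, much of NumPy's commonly used functionality has become available directly through pandas objects.

If you are not a pure mathematics specialist and want to do data analysis, pandas is a good place to start.

The rest of this article covers pandas.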

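1 Data structures

Series: a one-dimensional labeled array, similar to a one-dimensional array in NumPy.

Time Series: a Series indexed by time.

DataFrame: a two-dimensional tabular data structure. A DataFrame can be thought of as a container of Series.

Panel: a three-dimensional array that can be thought of as a container of DataFrames. Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so use a DataFrame with a MultiIndex instead.

# Import pandas under its conventional alias
import pandas as pd
pd.Series([1, 2, 3, 4])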
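The relationship between these structures can be sketched as follows. This is a minimal illustration only; the labels, dates, and values are made up for the example.

```python
import pandas as pd

# A Series is a one-dimensional array with an index of labels.
s = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# A time series is a Series whose index holds timestamps.
dates = pd.date_range("2019-07-01", periods=4, freq="D")
ts = pd.Series([10.0, 11.5, 9.8, 12.1], index=dates)

# A DataFrame holds several Series that share one index; each column is a Series.
df = pd.DataFrame({"x": s, "y": s * 2})

print(type(df["x"]).__name__)   # Series
print(df.shape)                 # (4, 2)
print(type(ts.index).__name__)  # DatetimeIndex
```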

2 Reading data

2.1 Reading CSV files

read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=False, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, skip_footer=0, doublequote=True, delim_whitespace=False, as_recarray=False, compact_ints=False, use_unsigned=False, low_memory=True, buffer_lines=None, memory_map=False, float_precision=None)

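filepath_or_buffer: path to the file; a relative path is recommended.

header: by default ('infer') the first row is detected as the column names (feature names). If the data has no column names, set header=None. Another row can also be chosen: for example, header=5 uses the row at index position 5 as the column names, and the data starts on the row after it.

sep: the field separator of the CSV file; defaults to ','.

names: the column names to assign; defaults to None.

index_col: the column or columns to use as the index; defaults to None, which produces an integer index running from 0 to the number of rows minus 1.

engine: the parser engine, for example 'c' or 'python'.

encoding: the encoding of the file. The default is None, which is treated as 'utf-8'.

nrows: the number of rows to read; by default all rows are read.

# Read a CSV file; every argument except the path is optional
data = pd.read_csv('./data/test.csv', sep=',', header='infer', names=range(5, 18), index_col=[0, 2], engine='python', encoding='gbk', nrows=100)

# read_table accepts the same arguments; its default separator is a tab, so sep=',' is given explicitly
data = pd.read_table('./data/test.csv', sep=',', header='infer', names=range(5, 18), index_col=[0, 2], engine='python', encoding='gbk', nrows=100)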
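To try the header, names, index_col, and nrows parameters without a file on disk, a small sketch like the one below can help. The CSV text and the column names c0, c1, c2 are hypothetical and exist only for this example.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "id,city,temp\n1,Beijing,30.5\n2,Shanghai,28.0\n3,Guangzhou,33.1\n"

# Default header='infer': the first row supplies the column names.
df = pd.read_csv(io.StringIO(csv_text), sep=",", index_col=0)

# header=None with names=...: the first row is treated as data and the
# supplied names are used instead. nrows limits how many rows are read.
raw = pd.read_csv(io.StringIO(csv_text), header=None,
                  names=["c0", "c1", "c2"], nrows=2)

print(df.columns.tolist())   # ['city', 'temp']
print(raw.iloc[0].tolist())  # ['id', 'city', 'temp']
print(raw.shape)             # (2, 3)
```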

2.2 Reading Excel data

read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0, index_col=None, names=None, parse_cols=None, parse_dates=False, date_parser=None, na_values=None, thousands=None, convert_float=True, has_index_names=None, converters=None, dtype=None, true_values=None, false_values=None, engine=None, squeeze=False, **kwds)

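io: the file path including the full file name; it has no default.

sheetname: the name or position of the worksheet to read; defaults to 0. pandas 0.21 renamed this parameter to sheet_name.

header: the row to use as the column names (feature names); defaults to 0, the first row. If the data has no column names, set header=None. Another row can also be chosen: for example, header=5 uses the row at index position 5 as the column names.

names: the column names to assign; defaults to None.

index_col: the column or columns to use as the index; defaults to None, which produces an integer index running from 0 to the number of rows minus 1.

engine: the library used to read the Excel file, for example 'xlrd' or 'openpyxl'.

# sheet_name replaces the sheetname keyword used by older pandas releases
data = pd.read_excel('./data/test.xls', sheet_name='raw data', header=0, index_col=[5, 6])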

 

2.3 Reading data from a database

read_sql_query(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None)

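sql: the SQL statement that selects the data, for example 'select * from table_name'.

con: the database connection, for example a SQLAlchemy engine.

index_col: the column or columns to use as the index; defaults to None, which produces an integer index running from 0 to the number of rows minus 1.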

read_sql_table(table_name, con, schema=None, index_col=None, coerce_float=True, parse_dates=None, columns=None, chunksize=None)

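table_name: the name of the table to read.

con: the database connection, for example a SQLAlchemy engine.

index_col: the column or columns to use as the index; defaults to None, which produces an integer index running from 0 to the number of rows minus 1.

columns: the names of the columns to select from the table.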

read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None)

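sql: either the name of a table or a SQL statement that selects the data, for example 'select * from table_name'.

con: the database connection, for example a SQLAlchemy engine.

index_col: the column or columns to use as the index; defaults to None, which produces an integer index running from 0 to the number of rows minus 1.

columns: the names of the columns to select; used only when sql is a table name.

Recommendation: prefer the first two functions, read_sql_query and read_sql_table.

# Read from a database
from sqlalchemy import create_engine
# The character set comes from the URL; SQLAlchemy 2.0 no longer accepts an encoding argument
conn = create_engine('mysql+pymysql://root:root@127.0.0.1/test?charset=utf8', echo=True)
# data1 = pd.read_sql_query('select * from data', con=conn)
# print(data1.head())

data2 = pd.read_sql_table('data', con=conn)
print(data2.tail())
print(data2['X'][1])

Parts of the database connection string

'mysql+pymysql://root:root@127.0.0.1/test?charset=utf8'

connector://username:password@database host IP/database name?character set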
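For readers without a MySQL server at hand, the same read pattern can be tried against an in-memory SQLite database. This is only a sketch: SQLite is swapped in for MySQL, the table contents are invented, and the connection is a plain sqlite3 connection rather than a SQLAlchemy engine. read_sql_table is left out because it requires SQLAlchemy.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database used as a stand-in for a real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, X REAL)")
conn.executemany("INSERT INTO data VALUES (?, ?)",
                 [(1, 7.0), (2, 8.5), (3, 6.2)])
conn.commit()

# sql is the query text, con is the open connection,
# and index_col picks the column that becomes the index.
df = pd.read_sql_query("SELECT * FROM data WHERE X > 6.5 ORDER BY id",
                       con=conn, index_col="id")
print(df)

conn.close()
```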

3 Writing data

3.1 Writing data to CSV

DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"', line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, doublequote=True, escapechar=None, decimal='.')

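path_or_buf: where to store the data, including the full file name, for example './data.csv'.

sep: the separator used when the data is written.

header: whether to write the column names; True writes them, False does not.

index: whether to write the index; True writes it, False does not.

mode: the file write mode; 'w' overwrites the file.

encoding: the encoding of the output file.

import pandas as pd
# Write without the index column
data.to_csv('data.csv', index=False)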
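The effect of index and sep is easiest to see on a tiny frame. The sketch below uses made-up data and writes to a string instead of a file, relying on to_csv returning the CSV text when no path is given.

```python
import io
import pandas as pd

df = pd.DataFrame({"X": [1, 2], "Y": [3.5, 4.5]}, index=["r1", "r2"])

# With path_or_buf left as None, to_csv returns the CSV text as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False, sep=";")

print(with_index)
print(without_index)

# Reading the text back with the same separator recovers the columns.
restored = pd.read_csv(io.StringIO(without_index), sep=";")
print(restored)
```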

3.2 Writing data to Excel

DataFrame.to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None)

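excel_writer: where to store the data, including the full file name, for example './data.xlsx'.

sheet_name: the name of the worksheet the data is written to.

header: whether to write the column names; True writes them, False does not.

index: whether to write the index; True writes it, False does not.

encoding: the encoding of the output file.

# Write without the index column
data.to_excel('data.xlsx', index=False)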

3.3 Writing data to a database

DataFrame.to_sql(name, con, flavor=None, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)

name: the name of the table to write to.

con: the database connection.

if_exists: what to do if the table already exists: 'fail' raises an error, 'replace' overwrites the table, and 'append' adds the rows to the end.

index: whether to write the index; True writes it, False does not.

from sqlalchemy import create_engine
# The connection URL must not contain spaces
conn = create_engine('mysql+pymysql://root:root@127.0.0.1/data?charset=utf8', echo=True)
data.to_sql('data', con=conn)
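The three if_exists modes can be compared with a short experiment. As an assumption for illustration, the sketch uses an in-memory SQLite database in place of MySQL, and the frame contents are invented.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"X": [1, 2], "Y": [3, 4]})

# The first write creates the table; index=False leaves the index out.
df.to_sql("data", con=conn, index=False)

# 'append' adds rows to the existing table.
df.to_sql("data", con=conn, index=False, if_exists="append")
rows_after_append = pd.read_sql_query(
    "SELECT COUNT(*) AS n FROM data", conn)["n"][0]

# 'replace' drops the table and writes it again.
df.to_sql("data", con=conn, index=False, if_exists="replace")
rows_after_replace = pd.read_sql_query(
    "SELECT COUNT(*) AS n FROM data", conn)["n"][0]

# The default, 'fail', raises ValueError because the table exists.
try:
    df.to_sql("data", con=conn, index=False)
    failed = False
except ValueError:
    failed = True

print(rows_after_append, rows_after_replace, failed)  # 4 2 True
conn.close()
```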

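4 Data processing

4.1 Inspecting data

# Show the first rows; the argument is a row count, not an index, and defaults to 5
data.head()
# Show the last 6 rows; 6 is a row count, not an index, and the default is 5
data.tail(6)
# Show the shape of the data as (rows, columns)
data.shape
# Number of columns: shape[0] is the row count, shape[1] the column count
data.shape[1]
# All column names
data.columns
# The index
data.index
# The data type of each column
data.dtypes
# The number of dimensions
data.ndim

## Summary statistics of the data
data.describe()
'''
count: number of non-null values
mean: mean of the numeric data
std: standard deviation of the numeric data
min: minimum of the numeric data
25%: lower quartile of the numeric data
50%: median of the numeric data
75%: upper quartile of the numeric data
max: maximum of the numeric data
'''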
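A quick way to check what these attributes and describe() report is to run them on a small frame. The data below is hypothetical: one numeric column with a missing value and one text column.

```python
import pandas as pd

# Hypothetical data: one numeric column with a missing value, one text column.
data = pd.DataFrame({"X": [1.0, 2.0, None, 4.0],
                     "name": ["a", "b", "c", "d"]})

print(data.shape)   # (4, 2)
print(data.ndim)    # 2
print(data.dtypes)

# describe() summarises only the numeric columns by default,
# and count ignores the missing value.
summary = data.describe()
print(summary.loc["count", "X"])   # 3.0
print(summary.loc["50%", "X"])     # 2.0
```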

4.2 Indexing data

# Select a single column
X = data['X']
# Select several columns
XY = data[['X', 'Y']]
# Select one row of a column
data['X'][1]
# Select several rows of a column
data['X'][:10]
# Select several rows of several columns
data[['X', 'Y']][:10]

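# Indexing with loc
'''
DataFrame.loc[row labels, column labels]
'''
# Select one row of several columns
data.loc[1, ['X', 'month']]
# Select several rows of several columns (contiguous)
data.loc[1:5, ['X', 'month']]
# Select several rows of several columns (non-contiguous)
data.loc[[1, 3, 5], ['X', 'month']]
# Select the rows with even index labels from 0 to 20 for the X, FFMC, and DC columns
data.loc[range(0, 21, 2), ['X', 'FFMC', 'DC']]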

# Indexing with iloc
'''
DataFrame.iloc[row positions, column positions]
'''
# Select one row of several columns
data.iloc[1, [1, 4]]
# Select the even column positions and the even row positions from 0 to 20
data.iloc[0:21:2, 0:data.shape[1]:2]

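# Indexing with ix (removed in pandas 1.0; shown for reference only)
'''
DataFrame.ix[row positions or labels, column positions or labels]
'''
## Select several rows of several columns
data.ix[1:4, [1, 4]]
data.ix[1:4, 1:4]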

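Differences between loc, iloc, and ix

loc indexes by label, and its slices are closed intervals that include both endpoints.

iloc indexes by position, and its slices are half-open intervals that include the start and exclude the end.

ix indexes by label or by position, trying labels first, so whether a slice is closed or half-open depends on which of the two it resolves to.

In short, ix is not recommended: it is easy to confuse, and it runs more slowly than loc and iloc. It was deprecated in pandas 0.20 and removed in pandas 1.0.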
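The closed versus half-open behavior is clearest when the index labels differ from the row positions. The frame below is a hypothetical example built for that purpose.

```python
import pandas as pd

# Hypothetical frame whose integer labels differ from the row positions.
df = pd.DataFrame({"X": range(6), "Y": range(10, 16)},
                  index=[10, 11, 12, 13, 14, 15])

# loc slices by label and includes both endpoints.
by_label = df.loc[11:13, "X"]

# iloc slices by position and excludes the end position.
by_position = df.iloc[1:3, 0]

print(by_label.tolist())     # [1, 2, 3]
print(by_position.tolist())  # [1, 2]
```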

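4.3 Modifying data

# Rename a column
list1 = list(data.columns)
list1[0] = 'first_column'
data.columns = list1

# Add a column filled with a single value
data['new_column'] = True

# Add a row labeled 'new_row'
data.loc['new_row', :] = True

# Drop the column (axis=1) in place
data.drop('new_column', axis=1, inplace=True)

# Drop the row (axis=0) in place
data.drop('new_row', axis=0, inplace=True)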
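As an alternative sketch of the same edits, rename and drop can also be used without inplace=True, in which case they return new objects. The column names here are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# rename changes only the listed columns and returns a new DataFrame.
df = df.rename(columns={"a": "first_column"})

# Assigning a scalar to a new column name broadcasts it to every row.
df["flag"] = True

# drop without inplace=True returns a copy and leaves df unchanged.
trimmed = df.drop(columns=["flag"])

print(df.columns.tolist())       # ['first_column', 'b', 'flag']
print(trimmed.columns.tolist())  # ['first_column', 'b']
```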

import pandas as pd
data = pd.read_excel('./data/test.xls')

# Convert a column to datetime
data['event_time'] = pd.to_datetime(data['event_time'], format='%Y%m%d%H%M%S')

# Extract the day from a single value
data.loc[1, 'event_time'].day
# Extract the day of the month into a new column
data['day'] = data['event_time'].dt.day

# Flag whether each timestamp falls in a leap year
year_data = data['event_time'].dt.is_leap_year.tolist()
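The conversion can be tried on a couple of hand-written timestamps. The strings below are hypothetical and follow the '%Y%m%d%H%M%S' layout used above.

```python
import pandas as pd

# Hypothetical timestamps stored as compact strings.
df = pd.DataFrame({"event_time": ["20190718093000", "20200229120000"]})

# format describes the layout: year, month, day, hour, minute, second.
df["event_time"] = pd.to_datetime(df["event_time"], format="%Y%m%d%H%M%S")

# The .dt accessor exposes datetime fields for the whole column at once.
df["day"] = df["event_time"].dt.day
df["leap_year"] = df["event_time"].dt.is_leap_year

print(df["day"].tolist())        # [18, 29]
print(df["leap_year"].tolist())  # [False, True]
```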

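4.4 Grouping and aggregation

4.4.1 Grouping

# Group by one column, then by two columns
group1 = data.groupby('gender')
group2 = data.groupby(['hire_date', 'gender'])

# Number of rows in each group; group1.ngroups gives the number of groups
group1.size()

Note:

The result of groupby cannot be viewed directly: printing it shows only the object and its memory address. The grouped data is a GroupBy object, which, like Series and DataFrame, is a type provided by pandas.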
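A small sketch shows what a GroupBy object looks like and how to see inside it. The staff table is hypothetical; its column names are chosen only for this example.

```python
import pandas as pd

# Hypothetical staff table.
data = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F"],
    "age": [25, 32, 41, 28, 36],
    "salary": [5000, 6500, 9000, 5800, 7200],
})

group1 = data.groupby("gender")

# Printing shows only a DataFrameGroupBy object, not the rows.
print(group1)

# ngroups is the number of groups; size() counts the rows in each one.
print(group1.ngroups)           # 2
print(group1.size().to_dict())  # {'F': 3, 'M': 2}

# Iterating yields (key, sub-DataFrame) pairs, which makes the groups visible.
for key, frame in group1:
    print(key, frame.shape)
```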

4.4.2 Common methods of GroupBy objects
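A few frequently used GroupBy methods are sketched below. The selection (count, mean, max, first) is an assumption made for illustration, and the staff table is hypothetical.

```python
import pandas as pd

# Hypothetical staff table.
data = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F"],
    "age": [25, 32, 41, 28, 36],
    "salary": [5000, 6500, 9000, 5800, 7200],
})
group1 = data.groupby("gender")

counts = group1["age"].count()    # non-null values per group
means = group1["salary"].mean()   # mean per group
oldest = group1["age"].max()      # maximum per group
firsts = group1.first()           # first row of each group

print(counts.to_dict())  # {'F': 3, 'M': 2}
print(oldest.to_dict())  # {'F': 41, 'M': 32}
print(means)
print(firsts)
```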

4.4.3 The agg method of grouped objects

Grouped.agg(a function, or a dict that maps column names to functions)

# Case 1: apply the same function to the selected columns
group1[['age', 'salary']].agg(min)

# Apply a different aggregation to each column
group1.agg({'age': max, 'salary': sum})

# The functions used above are Python built-ins; to call the pandas methods instead, wrap them in a lambda
group1.agg({'age': lambda x: x.max(), 'salary': lambda x: x.sum()})
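The agg patterns above can be checked on a small table. In this sketch the aggregations are passed as string names such as "min", which pandas also accepts; the data is hypothetical.

```python
import pandas as pd

# Hypothetical staff table.
data = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F"],
    "age": [25, 32, 41, 28, 36],
    "salary": [5000, 6500, 9000, 5800, 7200],
})
group1 = data.groupby("gender")

# One function name applied to every selected column.
lowest = group1[["age", "salary"]].agg("min")

# A dict applies a different aggregation to each column.
mixed = group1.agg({"age": "max", "salary": "sum"})

# A lambda receives each group's column as a Series.
spread = group1.agg({"salary": lambda s: s.max() - s.min()})

print(lowest)
print(mixed)
print(spread)
```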

4.4.4 The apply method of grouped objects

Grouped.apply(function)

apply runs the same operation on every column; it cannot apply different functions to different columns.

group1.apply(lambda x: x.max())
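A minimal sketch of apply on a hypothetical table follows. The value columns are selected explicitly before calling apply, which is a choice made here so that the function only sees numeric data.

```python
import pandas as pd

# Hypothetical staff table.
data = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F"],
    "age": [25, 32, 41, 28, 36],
    "salary": [5000, 6500, 9000, 5800, 7200],
})
group1 = data.groupby("gender")

# The function receives each group's sub-DataFrame and is applied
# the same way to every selected column.
ranges = group1[["age", "salary"]].apply(lambda g: g.max() - g.min())

print(ranges)
```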

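4.4.5 The transform method of grouped objects

Grouped.transform(function)

transform does not collapse each group to a single value: it returns one result per element, so the output has the same length as the input.

# Append a suffix to every value
group1['age'].transform(lambda x: x.astype(str) + ' years').head()

# Min-max normalization within each group
group1['salary'].transform(lambda x: (x - x.min()) / (x.max() - x.min())).head()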
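The sketch below illustrates that transform keeps the original length, using within-group min-max scaling and a broadcast group mean. The table is hypothetical.

```python
import pandas as pd

# Hypothetical staff table.
data = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F"],
    "salary": [5000, 6500, 9000, 5800, 7200],
})
group1 = data.groupby("gender")

# Each value is scaled against the minimum and maximum of its own group.
scaled = group1["salary"].transform(
    lambda s: (s - s.min()) / (s.max() - s.min()))

# A reducing function is broadcast back to every row of the group.
group_mean = group1["salary"].transform("mean")

print(len(scaled) == len(data))  # True
print(scaled.tolist())
print(group_mean.tolist())
```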

 

 
   