Python Data Analysis: Introductory Study Notes
 
 2018-8-3 
 
Editor's recommendation:
This article comes from cnblogs. It covers data import and export, extracting and filtering the data you need, descriptive statistics, data processing, and more.

Preface: an introduction to the Python libraries used in data analysis

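1. NumPy:

NumPy is the foundational package for scientific computing in Python. It provides, among other things:

(1) ndarray, a fast and efficient multidimensional array object

(2) Functions for element-wise computation on arrays and for mathematical operations directly on whole arrays

(3) Tools for reading and writing array-based datasets on disk

(4) Linear algebra operations, Fourier transforms, and random number generation

(5) Tools for integrating C, C++, and Fortran code with Python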
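As a quick illustration of points (1), (2) and (4), here is a minimal sketch. The array values and the seed are made up for the example, and it assumes a reasonably recent NumPy (for `default_rng`) on Python 3.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])   # a 2x2 ndarray

doubled = a * 2                  # element-wise arithmetic on the whole array
roots = np.sqrt(a)               # element-wise math function
inverse = np.linalg.inv(a)       # linear algebra: matrix inverse
identity = a @ inverse           # matrix product, close to the identity matrix

rng = np.random.default_rng(0)   # random number generation (seed chosen arbitrarily)
noise = rng.standard_normal(3)

print(doubled)
print(np.allclose(identity, np.eye(2)))
```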

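2. pandas

pandas provides a large set of data structures and functions for working with structured data quickly and conveniently. It combines NumPy's high-performance array computation with the flexible data manipulation of spreadsheets and relational databases (such as SQL). Its sophisticated indexing makes it easy to reshape, slice and dice, aggregate, and select subsets of data.

For users in the financial industry, pandas offers many high-performance time series features and tools suited to financial data.

DataFrame is a pandas object: a column-oriented, two-dimensional table with both row labels and column labels.

P.S. Here is a passage from the web that illustrates how powerful DataFrame is:

Excel 2007 and later versions allow at most 1,048,576 rows and 16,384 columns. Load anything larger and Excel pops up a dialog saying "This text contains multiple lines of text and cannot be placed in a single worksheet". For pandas, handling tens of millions of rows is easy. As we will see later, it is also more expressive than SQL: it can carry out many complex operations with less code. That is a long list of benefits, but to really appreciate them you have to write some code yourself.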
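To make "row labels and column labels" concrete, here is a small sketch. The table `df_demo` and its row labels are hypothetical, made up only for illustration.

```python
import pandas as pd

# Hypothetical mini-table, made up for illustration
df_demo = pd.DataFrame(
    {"total_bill": [16.99, 10.34, 21.01], "tip": [1.01, 1.66, 3.50]},
    index=["a", "b", "c"],           # row labels
)

print(df_demo.index.tolist())    # row labels
print(df_demo.columns.tolist())  # column labels
print(df_demo["tip"].mean())     # operate on a whole column at once
```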

3. matplotlib

matplotlib is the most popular Python library for plotting charts from data.

4. SciPy

SciPy is a collection of packages, each addressing a standard problem domain in scientific computing.

5. statsmodels: https://github.com/statsmodels/statsmodels

6. scikit-learn: http://scikit-learn.org/stable/

I. Data import and export

(1) Reading a CSV file

1. Reading from a local file

import pandas as pd
df = pd.read_csv('E:\\tips.csv')  # Use the path where your own data file is saved (P.S. in a Python path string, use either / or \\)
# Output:
total_bill tip sex smoker day time size
16.99 1.01 Female No Sun Dinner 2
10.34 1.66 Male No Sun Dinner 3
21.01 3.50 Male No Sun Dinner 3
23.68 3.31 Male No Sun Dinner 2
24.59 3.61 Female No Sun Dinner 4
25.29 4.71 Male No Sun Dinner 4
.. ... ... ... ... ... ... ...
27.18 2.00 Female Yes Sat Dinner 2
22.67 2.00 Male Yes Sat Dinner 2
17.82 1.75 Male No Sat Dinner 2
18.78 3.00 Female No Thur Dinner 2
[244 rows x 7 columns]

2. Reading from the network

import pandas as pd
data_url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"  # URL to read from
df = pd.read_csv(data_url)
# Output: same as above

3. read_csv in detail

Purpose: Read CSV (comma-separated) file into DataFrame

read_csv(filepath_or_buffer, sep=',', dialect=None, compression='infer', doublequote=True, escapechar=None, quotechar='"', quoting=0, skipinitialspace=False, lineterminator=None, header='infer', index_col=None, names=None, prefix=None, skiprows=None, skipfooter=None, skip_footer=0, na_values=None, true_values=None, false_values=None, delimiter=None, converters=None, dtype=None, usecols=None, engine=None, delim_whitespace=False, as_recarray=False, na_filter=True, compact_ints=False, use_unsigned=False, low_memory=True, buffer_lines=None, warn_bad_lines=True, error_bad_lines=True, keep_default_na=True, thousands=None, comment=None, decimal='.', parse_dates=False, keep_date_col=False, dayfirst=False, date_parser=None, memory_map=False, float_precision=None, nrows=None, iterator=False, chunksize=None, verbose=False, encoding=None, squeeze=False, mangle_dupe_cols=True, tupleize_cols=False, infer_datetime_format=False, skip_blank_lines=True)

Parameter details (this signature is from an older pandas release; later releases have removed or renamed several of these arguments):

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
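Most of these parameters are rarely needed. The sketch below shows a few common ones (`sep`, `usecols`, `nrows`, `dtype`). It reads from an in-memory string so that no file is required; the three data rows are copied from the tips output shown earlier.

```python
import io
import pandas as pd

# A few rows copied from the tips data, held in memory so that no file is needed
csv_text = """total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
"""

subset = pd.read_csv(
    io.StringIO(csv_text),
    sep=",",                         # field separator
    usecols=["total_bill", "tip"],   # keep only these columns
    nrows=2,                         # read only the first two data rows
    dtype={"tip": float},            # force a column type
)
print(subset)
```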

(2) Reading MySQL data

Suppose the database is installed locally, the username is myusername, the password is mypassword, and we want to read data from the mydb database.

import pandas as pd
import MySQLdb
mysql_cn = MySQLdb.connect(host='localhost', port=3306, user='myusername', passwd='mypassword', db='mydb')
df = pd.read_sql('select * from test;', con=mysql_cn)
mysql_cn.close()

The code above reads all rows of the test table into df, which is a DataFrame.

P.S. MySQL tutorial: http://www.runoob.com/mysql/mysql-tutorial.html

(3) Reading an Excel file

Reading Excel files also requires the xlrd module; pip install xlrd is enough.

df = pd.read_excel('E:\\tips.xls')

(4) Exporting data to a CSV file

df.to_csv('E:\\demo.csv', encoding='utf-8', index=False)
# index=False drops the row labels on export; if the data contains Chinese text, encoding is usually set to 'utf-8'
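To see what index=False changes, the sketch below writes the same small table both ways. It relies on `to_csv` returning the CSV text when no path is given; the table `small` is made up for illustration.

```python
import io
import pandas as pd

small = pd.DataFrame({"day": ["Sun", "Sat"], "tip": [1.01, 2.00]})

with_index = small.to_csv()                # row labels written as a first, unnamed column
without_index = small.to_csv(index=False)  # only the data columns

print(with_index)
print(without_index)

# Reading the index-free text back gives the same values
restored = pd.read_csv(io.StringIO(without_index))
```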

(5) Reading and writing SQL databases

import pandas as pd
import sqlite3
con = sqlite3.connect('...')
sql = '...'
df = pd.read_sql(sql, con)
# Help text
help(sqlite3.connect)
# Output
Help on built-in function connect in module _sqlite3:
connect(...)
connect(database[, timeout, isolation_level, detect_types, factory])
Opens a connection to the SQLite database file *database*. You can use
":memory:" to open a database connection to a database that resides in
RAM instead of on disk.
#############
help(pd.read_sql)
# Output
Help on function read_sql in module pandas.io.sql:
read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None)
Read SQL query or database table into a DataFrame.

P.S. I pasted the database code straight from the web and have not tested whether it works; I'm posting it here for now.

I'm still feeling my way around databases, so feel free to share your own study notes and tips.
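For anyone who wants something runnable, here is a minimal sketch of a write and read round trip using an in-memory SQLite database, so it needs no server and no file. The table name "test" mirrors the MySQL example; the columns and values are made up.

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")          # in-memory database, nothing touches disk

# Write a small, made-up table (the name "test" mirrors the MySQL example)
pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 2, 3]}).to_sql("test", con, index=False)

# Read it back with a query
result = pd.read_sql("select * from test where score >= 2", con)
con.close()

print(result)
```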

II. Extracting and filtering the data you need

(1) Extracting and viewing data (using tips.csv; data source: https://github.com/mwaskom/seaborn-data)

print(df.head())  # Print the first five rows
# Output
total_bill tip sex smoker day time size
16.99 1.01 Female No Sun Dinner 2
10.34 1.66 Male No Sun Dinner 3
21.01 3.50 Male No Sun Dinner 3
23.68 3.31 Male No Sun Dinner 2
24.59 3.61 Female No Sun Dinner 4

print(df.tail())  # Print the last five rows
# Output
total_bill tip sex smoker day time size
29.03 5.92 Male No Sat Dinner 3
27.18 2.00 Female Yes Sat Dinner 2
22.67 2.00 Male Yes Sat Dinner 2
17.82 1.75 Male No Sat Dinner 2
18.78 3.00 Female No Thur Dinner 2

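print(df.columns)  # Print the column names
# Output
Index([u'total_bill', u'tip', u'sex', u'smoker', u'day', u'time', u'size'], dtype='object')

print(df.index)  # Print the row labels
# Output
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
...
234, 235, 236, 237, 238, 239, 240, 241, 242, 243],
dtype='int64', length=244)

print(df.iloc[10:21, 0:3])  # Print rows 10 to 20, first three columns (equivalent to df.ix[10:20, 0:3], which newer pandas no longer supports; iloc excludes the end position, hence 21)
# Output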
total_bill tip sex
10.27 1.71 Male
35.26 5.00 Female
15.42 1.57 Male
18.43 3.00 Male
14.83 3.02 Female
21.58 3.92 Male
10.33 1.67 Female
16.29 3.71 Male
16.97 3.50 Female
20.65 3.35 Male
17.92 4.08 Male

# Extract non-contiguous rows and columns; this example takes the rows at positions 1, 3, 5 and the columns at positions 2, 4
df.iloc[[1,3,5],[2,4]]
# Output
sex day
Male Sun
Male Sun
Male Sun

# Extract a single value; this example takes row 3, column 2 (counting from 0)
df.iat[3,2]
# Output
'Male'

print(df.drop(df.columns[[1, 2]], axis=1))  # Drop the columns at positions 1 and 2 (tip and sex)
print(df.drop(df.index[[1, 2]], axis=0))  # Drop the rows at positions 1 and 2
# Results omitted to save space

print(df.shape)  # Print the dimensions
# Output
(244, 7)

df.iloc[3]  # Select row 3
# Output 1
total_bill 23.68
tip 3.31
sex Male
smoker No
day Sun
time Dinner
size 2
Name: 3, dtype: object
df.iloc[2:4]  # Select rows 2 through 3
# Output 2
total_bill tip sex smoker day time size
21.01 3.50 Male No Sun Dinner 3
23.68 3.31 Male No Sun Dinner 2


df.iloc[0,1]  # Select the element at row 0, column 1
# Output 3
1.01
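The difference between position-based and label-based selection is easy to miss when the row labels happen to be 0, 1, 2 and so on. The sketch below uses a made-up frame whose integer labels deliberately differ from the positions.

```python
import pandas as pd

demo = pd.DataFrame(
    {"total_bill": [16.99, 10.34, 21.01, 23.68], "tip": [1.01, 1.66, 3.50, 3.31]},
    index=[10, 11, 12, 13],      # integer labels that differ from the positions 0..3
)

by_position = demo.iloc[1:3]     # positions 1 and 2; the end position is excluded
by_label = demo.loc[11:12]       # labels 11 and 12; the end label is included
single = demo.iat[0, 1]          # one value by position: row 0, column 1

print(by_position)
print(by_label)
print(single)
```

Both slices return the same two rows here, which is why the removed `.ix` accessor, which mixed the two behaviours, was a common source of off-by-one surprises.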

(2) Filtering for the data you need (using tips.csv; data source: https://github.com/mwaskom/seaborn-data)

# Example: suppose we want the rows where the tip is greater than $8
df[df.tip>8]
# Output
total_bill tip sex smoker day time size
50.81 10 Male Yes Sat Dinner 3
48.33 9 Male No Sat Dinner 4

# Filters can also combine conditions with "or" and "and", for example
#1
df[(df.tip>7)|(df.total_bill>50)]  # Rows where the tip is greater than $7 or the total bill is greater than $50
# Output
total_bill tip sex smoker day time size
39.42 7.58 Male No Sat Dinner 4
50.81 10.00 Male Yes Sat Dinner 3
48.33 9.00 Male No Sat Dinner 4
#2
df[(df.tip>7)&(df.total_bill>50)]  # Rows where the tip is greater than $7 and the total bill is greater than $50
# Output
total_bill tip sex smoker day time size
50.81 10 Male Yes Sat Dinner 3

# Continuing from above
# Suppose that after filtering we only care about day and time
df[['day','time']][(df.tip>7)|(df.total_bill>50)]
# Output
day time
Sat Dinner
Sat Dinner
Sat Dinner
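The conditions above are just boolean Series, so they can be stored, combined and negated. Here is a short sketch on a made-up four-row frame (three rows borrowed from the output above); `.loc` is used to pick rows and columns in one step.

```python
import pandas as pd

bills = pd.DataFrame({
    "total_bill": [39.42, 50.81, 48.33, 16.99],
    "tip": [7.58, 10.00, 9.00, 1.01],
    "day": ["Sat", "Sat", "Sat", "Sun"],
})

mask = (bills.tip > 7) & (bills.total_bill > 50)   # element-wise "and"; parentheses are required
print(mask.tolist())

# .loc takes the row mask and the wanted columns in a single step
picked = bills.loc[mask, ["day", "tip"]]
print(picked)

# ~ negates a mask, isin tests membership
not_sat = bills[~bills.day.isin(["Sat"])]
print(not_sat)
```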

III. Descriptive statistics (using tips.csv; data source: https://github.com/mwaskom/seaborn-data)

print(df.describe())  # Descriptive statistics
# Output (the statistics are straightforward, so no explanation here)
total_bill tip size
count 244.000000 244.000000 244.000000
mean 19.785943 2.998279 2.569672
std 8.902412 1.383638 0.951100
min 3.070000 1.000000 1.000000
25% 13.347500 2.000000 2.000000
50% 17.795000 2.900000 2.000000
75% 24.127500 3.562500 3.000000
max 50.810000 10.000000 6.000000

IV. Data processing

(1) Transposing data (using tips.csv; data source: https://github.com/mwaskom/seaborn-data)

print(df.T)
# Output
0 1 2 3 4 5 6 7 \
total_bill 16.99 10.34 21.01 23.68 24.59 25.29 8.77 26.88
tip 1.01 1.66 3.5 3.31 3.61 4.71 2 3.12
sex Female Male Male Male Female Male Male Male
smoker No No No No No No No No
day Sun Sun Sun Sun Sun Sun Sun Sun
time Dinner Dinner Dinner Dinner Dinner Dinner Dinner Dinner
size 2 3 3 2 4 4 2 4
8 9 ... 234 235 236 237 238 \
total_bill 15.04 14.78 ... 15.53 10.07 12.6 32.83 35.83
tip 1.96 3.23 ... 3 1.25 1 1.17 4.67
sex Male Male ... Male Male Male Male Female
smoker No No ... Yes No Yes Yes No
day Sun Sun ... Sat Sat Sat Sat Sat
time Dinner Dinner ... Dinner Dinner Dinner Dinner Dinner
size 2 2 ... 2 2 2 2 3
239 240 241 242 243
total_bill 29.03 27.18 22.67 17.82 18.78
tip 5.92 2 2 1.75 3
sex Male Female Male Male Female
smoker No Yes Yes No No
day Sat Sat Sat Sat Thur
time Dinner Dinner Dinner Dinner Dinner
size 3 2 2 2 2
[7 rows x 244 columns]

(2) Sorting data (using tips.csv; data source: https://github.com/mwaskom/seaborn-data)

df.sort_values(by='tip')  # Sort by the tip column in ascending order
# Output (abridged to save space)
total_bill tip sex smoker day time size
3.07 1.00 Female Yes Sat Dinner 1
12.60 1.00 Male Yes Sat Dinner 2
5.75 1.00 Female Yes Fri Dinner 2
7.25 1.00 Female No Sat Dinner 1
16.99 1.01 Female No Sun Dinner 2
.. ... ... ... ... ... ... ...
28.17 6.50 Female Yes Sat Dinner 3
34.30 6.70 Male No Thur Lunch 6
48.27 6.73 Male No Sat Dinner 4
39.42 7.58 Male No Sat Dinner 4
48.33 9.00 Male No Sat Dinner 4
50.81 10.00 Male Yes Sat Dinner 3
[244 rows x 7 columns]
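Sorting in descending order or by several columns uses the same function. A minimal sketch on four rows taken from the output above (the frame name `rows` is made up):

```python
import pandas as pd

rows = pd.DataFrame({
    "total_bill": [3.07, 12.60, 50.81, 48.33],
    "tip": [1.00, 1.00, 10.00, 9.00],
})

largest_first = rows.sort_values(by="tip", ascending=False)

# Ties on tip are broken by total_bill, largest bill first
two_keys = rows.sort_values(by=["tip", "total_bill"], ascending=[True, False])

print(largest_first)
print(two_keys)
```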

(3) Handling missing values

1. Filling missing values (data from Chapter 2 of "Python for Data Analysis", usagov_bitly_data2012-03-16-1331923249.txt; ask me if you need a copy)

import json  # Python has many built-in and third-party modules that can convert a JSON string into a Python dict
import pandas as pd
import numpy as np
from pandas import DataFrame
path = r'F:\PycharmProjects\pydata-book-master\ch02\usagov_bitly_data2012-03-16-1331923249.txt'  # Use your own path
records = [json.loads(line) for line in open(path)]
frame = DataFrame(records)
frame['tz']
# Output (part of the output removed to save space)
America/New_York
America/Denver
America/New_York
America/Sao_Paulo
America/New_York
America/New_York
Europe/Warsaw
America/Los_Angeles
America/New_York
America/New_York
NaN
...
Name: tz, dtype: object

The output above shows that the data contains unknown or missing values, so next we deal with them.

print(frame['tz'].fillna(1111111111111))  # Replace missing values with a number
# Output (part of the output removed to save space)
America/New_York
America/Denver
America/New_York
America/Sao_Paulo
America/New_York
America/New_York
Europe/Warsaw
America/Los_Angeles
America/New_York
America/New_York
1111111111111
Name: tz, dtype: object

print(frame['tz'].fillna('YuJie2333333333333'))  # Replace missing values with a string
# Output (part of the output removed to save space)
America/New_York
America/Denver
America/New_York
America/Sao_Paulo
America/New_York
America/New_York
Europe/Warsaw
America/Los_Angeles
America/New_York
America/New_York
YuJie2333333333333
Name: tz, dtype: object

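There is also:

print(frame['tz'].ffill())  # Replace each missing value with the previous value (same as fillna(method='pad'))
print(frame['tz'].bfill())  # Replace each missing value with the next value (same as fillna(method='bfill'))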

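2. Dropping missing values (same data as above)

print(frame['tz'].dropna())  # Drop the missing entries of the tz column
print(frame.dropna(axis=0))  # Drop rows that contain a missing value
print(frame.dropna(axis=1))  # Drop columns that contain a missing value (a single column such as frame['tz'] has no axis=1)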
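Since the bitly file may not be at hand, the sketch below reproduces the fill and drop operations on a tiny made-up Series and DataFrame (the "Unknown" placeholder and the column `n` are assumptions for illustration).

```python
import numpy as np
import pandas as pd

tz = pd.Series(["America/New_York", np.nan, "Europe/Warsaw", np.nan])

print(tz.isnull().sum())          # how many values are missing
print(tz.fillna("Unknown").tolist())
print(tz.ffill().tolist())        # carry the previous value forward
print(tz.bfill().tolist())        # pull the next value back; the trailing NaN has no successor and stays
print(tz.dropna().tolist())

table = pd.DataFrame({"tz": tz, "n": [1, 2, 3, 4]})
print(table.dropna(axis=0).shape)  # rows with a NaN removed
print(table.dropna(axis=1).shape)  # columns with a NaN removed
```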

3. Filling missing values by interpolation

Since there is no data at hand, here is a quick aside: creating a DataFrame of random numbers.

import pandas as pd
import numpy as np
# Create a 6x4 DataFrame; randn generates the random numbers
czf_data = pd.DataFrame(np.random.randn(6,4), columns=list('ABCD'))
czf_data
# Output
A B C D
0.355690 1.165004 0.810392 -0.818982
0.496757 -0.490954 -0.407960 -0.493502
-0.202123 -0.842278 -0.948464 0.223771
0.969445 1.357910 -0.479598 -1.199428
0.125290 0.943056 -0.082404 -0.363640
-1.762905 -1.471447 0.351570 -1.546152

There we go, the data is ready. Next we overwrite some values with NaN to create a DataFrame that contains missing values.

# Set the row with label 2 to missing values
czf_data.loc[2, :] = np.nan
czf_data
# Output
A B C D
0.355690 1.165004 0.810392 -0.818982
0.496757 -0.490954 -0.407960 -0.493502
NaN NaN NaN NaN
0.969445 1.357910 -0.479598 -1.199428
0.125290 0.943056 -0.082404 -0.363640
-1.762905 -1.471447 0.351570 -1.546152

# Now the gaps can be filled by interpolation
print(czf_data.interpolate())
# Output
A B C D
0.355690 1.165004 0.810392 -0.818982
0.496757 -0.490954 -0.407960 -0.493502
0.733101 0.433478 -0.443779 -0.846465
0.969445 1.357910 -0.479598 -1.199428
0.125290 0.943056 -0.082404 -0.363640
-1.762905 -1.471447 0.351570 -1.546152
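With the default linear method, a single gap is filled with the midpoint of its two neighbours. The sketch below checks this against column A of the output above (rows 1 to 3).

```python
import numpy as np
import pandas as pd

s = pd.Series([0.496757, np.nan, 0.969445])   # column A, rows 1 to 3, from the output above

filled = s.interpolate()                      # default method is 'linear'
midpoint = (s[0] + s[2]) / 2

print(filled[1], midpoint)
```

Both values come out as 0.733101, matching the interpolated row in the printed table.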

(4) Grouping data (using tips.csv; data source: https://github.com/mwaskom/seaborn-data)

group = df.groupby('day')  # Group by the day column
#1
print(group.first())  # Print the first row of each group
# Output
total_bill tip sex smoker time size
day
Fri 28.97 3.00 Male Yes Dinner 2
Sat 20.65 3.35 Male No Dinner 3
Sun 16.99 1.01 Female No Dinner 2
Thur 27.20 4.00 Male No Lunch 4
#2
print(group.last())  # Print the last row of each group
# Output
total_bill tip sex smoker time size
day
Fri 10.09 2.00 Female Yes Lunch 2
Sat 17.82 1.75 Male No Dinner 2
Sun 15.69 1.50 Male Yes Dinner 2
Thur 18.78 3.00 Female No Dinner 2
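Grouping becomes more useful once each group is aggregated. A minimal sketch on a made-up five-row frame (values borrowed from the tips output; the name `meals` is an assumption):

```python
import pandas as pd

meals = pd.DataFrame({
    "day": ["Sat", "Sat", "Sun", "Sun", "Sun"],
    "total_bill": [20.65, 17.82, 16.99, 10.34, 15.69],
    "tip": [3.35, 1.75, 1.01, 1.66, 1.50],
})

by_day = meals.groupby("day")

print(by_day.size())                          # number of rows per group
print(by_day["tip"].mean())                   # one statistic for one column
print(by_day.agg({"total_bill": "sum", "tip": "max"}))   # a different statistic per column
```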

(5) Replacing values

import pandas as pd
import numpy as np
# First create a Series (handy when you have no data at hand)
Series = pd.Series([0,1,2,3,4,5])
# Output
Series
0
1
2
3
4
5
dtype: int64

# Replace a value, for example 0 with 10000000000000
print(Series.replace(0,10000000000000))
# Output
10000000000000
1
2
3
4
5
dtype: int64

# Replacing a list of values with another list works the same way
print(Series.replace([0,1,2,3,4,5],[11111,222222,3333333,44444,55555,666666]))
# Output
11111
222222
3333333
44444
55555
666666
dtype: int64
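replace also accepts a dict, which is often easier to read than two parallel lists. A short sketch with made-up codes and labels:

```python
import pandas as pd

codes = pd.Series([0, 1, 2, 1, 0])

# A dict maps each old value to its new value
labels = codes.replace({0: "none", 1: "low", 2: "high"})
print(labels.tolist())

# On a DataFrame, a nested dict limits the replacement to one column
frame_demo = pd.DataFrame({"a": [0, 1], "b": [0, 1]})
replaced = frame_demo.replace({"a": {0: 100}})
print(replaced)
```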

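V. Statistical analysis

(1) t-tests

1. Independent-samples t-test

The two-independent-samples t-test uses sample data to infer whether the means of the two independent populations the samples come from differ significantly. Its preconditions are that the two populations are independent of each other and normally distributed.

At first I couldn't find suitable data, so I copied an example dataset from an SPSS independent-samples t-test tutorial I found online. Please make do with it for now; I'll swap in a better example when I find one.

The data is below; I saved it locally in xlsx format: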

group data
1 34
1 37
1 28
1 36
1 30
2 43
2 45
2 47
2 49
2 39

import pandas as pd
from scipy.stats import ttest_ind
IS_t_test = pd.read_excel('E:\\IS_t_test.xlsx')
Group1 = IS_t_test[IS_t_test['group']==1]['data']
Group2 = IS_t_test[IS_t_test['group']==2]['data']
print(ttest_ind(Group1,Group2))
# Output
(-4.7515451390104353, 0.0014423819408438474)

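The first element of the output is the t statistic and the second is the p-value.

ttest_ind assumes by default that the two groups have equal variances; if you do not want to assume equal variances, set equal_var=False (Welch's t-test).

print(ttest_ind(Group1,Group2,equal_var=True))
print(ttest_ind(Group1,Group2,equal_var=False))
# Output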
(-4.7515451390104353, 0.0014423819408438474)
(-4.7515451390104353, 0.0014425608643614844)
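The same test can be run without the Excel file by typing the ten values in directly. The sketch below also recomputes the equal-variance t statistic by hand from the pooled variance, as a check on what ttest_ind does; the variable names are made up.

```python
import numpy as np
from scipy.stats import ttest_ind

group1 = np.array([34, 37, 28, 36, 30])   # group 1 from the table above
group2 = np.array([43, 45, 47, 49, 39])   # group 2 from the table above

t_stat, p_value = ttest_ind(group1, group2)   # equal variances assumed by default

# The same statistic by hand: pooled variance, then the standard error of the mean difference
n1, n2 = len(group1), len(group2)
pooled_var = ((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (group1.mean() - group2.mean()) / np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

print(t_stat, p_value)
print(t_manual)
```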

2. Paired-samples t-test

Again I couldn't find data, so let's pretend for now that the independent samples above are paired and reuse the same data.

import pandas as pd
from scipy.stats import ttest_rel
IS_t_test = pd.read_excel('E:\\IS_t_test.xlsx')
Group1 = IS_t_test[IS_t_test['group']==1]['data']
Group2 = IS_t_test[IS_t_test['group']==2]['data']
print(ttest_rel(Group1,Group2))
# Output
(-5.6873679190073361, 0.00471961872448184)

Likewise, the first element of the output is the t statistic and the second is the p-value.
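A paired t-test is equivalent to a one-sample t-test on the pairwise differences. The sketch below shows this with the same ten values; the names `before` and `after` are assumptions made only to give the pairs a meaning.

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

before = np.array([34, 37, 28, 36, 30])   # treated as the first measurement of each pair
after = np.array([43, 45, 47, 49, 39])    # treated as the second measurement of each pair

t_paired, p_paired = ttest_rel(before, after)

# A paired test is a one-sample test on the pairwise differences against a mean of 0
t_diff, p_diff = ttest_1samp(before - after, 0)

print(t_paired, p_paired)
print(t_diff, p_diff)
```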

(2) Analysis of variance

1. One-way ANOVA

This again reuses the t-test data.

import pandas as pd
from scipy import stats
IS_t_test = pd.read_excel('E:\\IS_t_test.xlsx')
Group1 = IS_t_test[IS_t_test['group']==1]['data']
Group2 = IS_t_test[IS_t_test['group']==2]['data']
args = [Group1, Group2]  # the groups to compare
w,p = stats.levene(*args)
# Levene's test for homogeneity of variance. levene(*args, **kwds): Perform Levene test for equal variances. If p<0.05, the variances are unequal
print(w,p)
# Run the ANOVA
f,p = stats.f_oneway(*args)
print(f,p)
# Output
(0.019607843137254936, 0.89209916055865535)
22.5771812081 0.00144238194084
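With exactly two groups, one-way ANOVA and the equal-variance t-test are the same test: F equals t squared and the p-values agree. A self-contained sketch with the values typed in directly:

```python
import numpy as np
from scipy import stats

group1 = np.array([34, 37, 28, 36, 30])
group2 = np.array([43, 45, 47, 49, 39])

w, p_levene = stats.levene(group1, group2)    # p >= 0.05: no evidence of unequal variances
f, p_anova = stats.f_oneway(group1, group2)

# With exactly two groups, the F statistic equals the square of the pooled-variance t statistic
t, p_t = stats.ttest_ind(group1, group2)

print(w, p_levene)
print(f, p_anova)
print(t ** 2)
```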

2. Multi-factor ANOVA

The data is a multi-factor ANOVA example I found online that studies the effect of block (id) and nutrient on body weight. I put it into an Excel file; ask me if you need it. Multi-factor ANOVA requires the statsmodels module; if it isn't installed, pip install it.

# Import the data
import pandas as pd
MANOVA=pd.read_excel('E:\\MANOVA.xlsx')
MANOVA
# Output (middle part removed to save space)
id nutrient weight
1 1 50.1
2 1 47.8
3 1 53.1
4 1 63.5
5 1 71.2
6 1 41.4
.......................
6 3 38.5
7 3 51.2
8 3 46.2

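# Multi-factor ANOVA
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
formula = 'weight ~ C(id) + C(nutrient) + C(id):C(nutrient)'
anova_results = anova_lm(ols(formula, MANOVA).fit())
print(anova_results)
# Output
df sum_sq mean_sq F PR(>F)
C(id) 7 2.373613e+03 339.087619 0 NaN
C(nutrient) 2 1.456133e+02 72.806667 0 NaN
C(id):C(nutrient) 14 3.391667e+02 24.226190 0 NaN
Residual 0 8.077936e-27 inf NaN NaN

Perhaps the data was poorly chosen: the p-values are all NaN. The table shows why. With the interaction term the model uses 7 + 2 + 14 = 23 degrees of freedom, which is every degree of freedom that 24 observations (one per id/nutrient combination) can offer, so the residual has 0 degrees of freedom and no F-test can be computed. I'll redo the multi-factor ANOVA once I find better data.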

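3. Repeated-measures ANOVA (one factor) ******** to be completed

A repeated-measures design measures the same dependent variable repeatedly. The measurements can be repeated under the same condition or taken under different conditions.

The code is the same as for multi-factor ANOVA; only the reasoning differs. I still haven't found suitable data for multi-factor ANOVA, so I'll leave this unwritten for now.

4. Mixed-design ANOVA ******** to be completed

######### If you're good at statistics, please teach me.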

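(3) Chi-square test

The chi-square test measures how far the observed values in a sample deviate from the theoretically expected values. That deviation determines the size of the chi-square statistic: the larger the statistic, the worse the agreement; the smaller the statistic, the smaller the deviation and the better the agreement. If the observed and expected values are exactly equal, the statistic is 0, meaning the data match the theory perfectly. (From Baidu Baike.)

1. One-way chi-square test

The data comes from the web: the expected and observed numbers of men and women who do and do not wear makeup.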

import numpy as np
from scipy import stats
from scipy.stats import chisquare
observed = np.array([15,95])
# Observed: among the 110 students who wear makeup, 95 are female and 15 are male (array order: male, female)
expected = np.array([55,55])
# Expected: among the 110 students who wear makeup, 55 are female and 55 are male
chisquare(observed,expected)
#output
(58.18181818181818, 2.389775628860044e-14)
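The statistic can be checked by hand: it is the sum over categories of (observed - expected)^2 / expected. A minimal sketch using the same two arrays (the cast to float is only there so the division behaves the same on Python 2 and 3):

```python
import numpy as np
from scipy.stats import chisquare

observed = np.array([15, 95])
expected = np.array([55, 55])

# Chi-square statistic: sum over categories of (observed - expected)^2 / expected
chi2_manual = ((observed - expected) ** 2 / expected.astype(float)).sum()

chi2, p = chisquare(observed, expected)
print(chi2_manual, chi2, p)
```

Here each category contributes 40^2 / 55, so the total is 3200 / 55, about 58.18.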

2. Multi-factor chi-square test ***** still studying this; I'll fill in this part once I've learned it

(4) Counting values (using tips.csv)

# Example: count by sex
count = df['sex'].value_counts()
# Output
print(count)
Male 157
Female 87
Name: sex, dtype: int64
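value_counts can also return proportions instead of counts. A short sketch on a made-up five-element Series:

```python
import pandas as pd

sex = pd.Series(["Male", "Female", "Male", "Male", "Female"])

counts = sex.value_counts()                 # absolute counts, most frequent first
shares = sex.value_counts(normalize=True)   # fractions that sum to 1

print(counts)
print(shares)
```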

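(5) Regression analysis ***** still to learn: data fitting, generalized linear regression, and so on

VI. Visualization

You may wonder why we'd bother with something this complicated when Excel can already do it. Excel is certainly general and convenient, but it probably can't cope with very large datasets. Learning to plot in Python is worthwhile, and honestly, some of the results look really good.

(1) Seaborn

I got into data visualization by learning Seaborn, a Python visualization library built on matplotlib. Starting directly with matplotlib can be hard going, since it has many parameters that are difficult to understand, but you get there with time. Another key point: the plots seaborn produces look great.

# Basic imports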
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt

# The tips data is really nice, so it serves as the example here
tips = sns.load_dataset('tips')  # Load the tips dataset over the network

1. The lmplot function

lmplot(x, y, data, hue=None, col=None, row=None, palette=None, col_wrap=None, size=5, aspect=1, markers='o', sharex=True, sharey=True, hue_order=None, col_order=None, row_order=None, legend=True, legend_out=True, x_estimator=None, x_bins=None, x_ci='ci', scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None, order=1, logistic=False, lowess=False, robust=False, logx=False, x_partial=None, y_partial=None, truncate=False, x_jitter=None, y_jitter=None, scatter_kws=None, line_kws=None)

Purpose: Plot data and regression model fits across a FacetGrid.

The examples below explain lmplot's parameters one by one.

Example 1. Plot the regression relationship between total bill and tip

This uses lmplot(x, y, data, scatter_kws).

x, y, and data are self-explanatory, so I won't dwell on them. The official description of scatter_kws and line_kws is:

{scatter,line}_kws : dictionaries

Additional keyword arguments to pass to plt.scatter and plt.plot.

scatter refers to the points and line to the fitted line. The dictionaries simply set properties of the points and the line. In this example the points are slate gray and the line is Indian red, so the two are clearly distinguishable and the plot reads well. You can substitute whatever plot properties you like.

sns.lmplot(x="total_bill", y="tip", data=tips,
scatter_kws={"marker": ".", "color": "slategray"},
line_kws={"linewidth": 1, "color": "indianred"}).savefig('picture2')

Also, colors can be given as RGB hex codes. This site has a reference table you can use to put together your own colors:

http://www.114la.com/other/rgb.htm

marker can also take many styles, as follows:

. Point marker

, Pixel marker

o Circle marker

v Triangle down marker

^ Triangle up marker

< Triangle left marker

> Triangle right marker

1 Tripod down marker

2 Tripod up marker

3 Tripod left marker

4 Tripod right marker

s Square marker

p Pentagon marker

* Star marker

h Hexagon marker

H Rotated hexagon marker

D Diamond marker

d Thin diamond marker

| Vertical line (vline symbol) marker

_ Horizontal line (hline symbol) marker

+ Plus marker

x Cross (x) marker

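sns.lmplot(x="total_bill", y="tip", data=tips,
scatter_kws={"marker": ".", "color": "#FF7F00"},
line_kws={"linewidth": 1, "color": "#BF3EFF"}).savefig('s1')
P.S. My attempt to change the marker property didn't work and I don't know why; any help is appreciated. (A likely cause: lmplot has its own markers parameter, which appears to take precedence over a "marker" key in scatter_kws, so markers="." may be worth trying instead.)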

Example 2. The relationship between party size (size) and tip (tip)

Official description:

x_estimator : callable that maps vector -> scalar, optional

Apply this function to each unique value of x and plot the resulting estimate. This is useful when x is a discrete variable. If x_ci is not None, this estimate will be bootstrapped and a confidence interval will be drawn.

Roughly speaking: the y values that share the same x level are collapsed into a single estimate.

plt.figure()
sns.lmplot(x='size', y='tip', data=tips, x_estimator=np.mean).savefig('picture3')
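What x_estimator=np.mean computes for the plotted points can be reproduced with a plain groupby. This is only a sketch of the idea on made-up data, not seaborn's internal code: one mean tip per distinct party size.

```python
import pandas as pd

meals = pd.DataFrame({
    "size": [2, 2, 3, 3, 4],
    "tip": [1.01, 2.00, 1.66, 3.50, 3.61],
})

# One estimate per unique x value: the mean tip for each party size
estimates = meals.groupby("size")["tip"].mean()
print(estimates)
```

These per-level means are the points the plot draws in place of the raw scatter; the confidence bars come from bootstrapping, which this sketch leaves out.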

{x,y}_jitter : floats, optional

Add uniform random noise of this size to either the x or y variables. The noise is added to a copy of the data after fitting the regression, and only influences the look of the scatterplot. This can be helpful when plotting variables that take discrete values.

jitter is an interesting parameter, especially when the data points overlap heavily. Adding a certain amount of noise spreads the points apart, so that what were a few stacked clusters become clouds of closely spaced points. Some people also like to combine rug with jitter; that is a matter of taste. When the horizontal axis takes discrete levels, x_jitter perturbs the points horizontally, but the perturbation should not be too large.

sns.lmplot(x='size', y='tip', data=tips, x_jitter=.15).savefig('picture4')
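The effect of x_jitter=.15 can be imitated with NumPy. The sketch below follows the official description (uniform noise of the given size added to a copy of the data used for drawing); the values and the seed are made up, and it is not seaborn's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)              # seed chosen arbitrarily
size = np.array([2, 2, 2, 3, 3, 4])         # discrete x values, many of them identical

jitter = 0.15
# Uniform noise in [-jitter, +jitter], applied only to the coordinates used for drawing
x_drawn = size + rng.uniform(-jitter, jitter, size=len(size))

print(x_drawn)
print(np.abs(x_drawn - size).max() <= jitter)
```

Because the noise is bounded by the jitter value, a setting well below half the spacing between levels (here 1) keeps neighbouring levels visually separate.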

seaborn can also produce xkcd-style plots, which is rather fun

with plt.xkcd():
    sns.color_palette('husl', 8)
    sns.set_context('paper')
    sns.lmplot(x='total_bill', y='tip', data=tips, ci=65).savefig('picture1')

with plt.xkcd():
    sns.lmplot(x='total_bill', y='tip', data=tips, hue='day')
    plt.xlabel('hue = day')
    plt.savefig('picture5')

with plt.xkcd():
    sns.lmplot(x='total_bill', y='tip', data=tips, hue='smoker')
    plt.xlabel('hue = smoker')
    plt.savefig('picture6')

sns.set_style('dark')
sns.set_context('talk')
sns.lmplot(x='size', y='total_bill', data=tips, order=2)
plt.title('# poly order = 2')
plt.savefig('picture7')
plt.figure()
sns.lmplot(x='size', y='total_bill', data=tips, order=3)
plt.title('# poly order = 3')
plt.savefig('picture8')

sns.jointplot(x="total_bill", y="tip", data=tips).savefig('picture9')

(2) matplotlib ******** to be completed

VII. Miscellaneous

(1) Calling R

To call R functions directly from Python, just download and install the rpy2 module.

Detailed steps: http://www.geome.cn/posts/python-%E9%80%9A%E8%BF%87rpy2%E8%B0%83%E7%94%A8-r%E8%AF%AD%E8%A8%80/

   