Introduction and Data

Welcome to the regression section of the Python machine learning tutorial series. By this point, you should have Scikit-Learn installed. If not, install it, along with Pandas and Matplotlib:

pip install numpy
pip install scipy
pip install scikit-learn
pip install matplotlib
pip install pandas

Along with these tutorial-wide imports, we will also be making use of Quandl here.

First, what is regression, as far as our use of it for machine learning goes? The goal is to take continuous data, find the equation that best fits that data, and be able to make predictions for specific values. With simple linear regression, you achieve this just by creating a best-fit line.

From there, we can use the equation of that line to forecast prices into the future, where the date is the x axis.

A popular use of regression is predicting stock prices. We can do this because we are considering how prices flow over time, working with a continuous dataset, and trying to predict the next point in that flow.

Regression is a form of supervised machine learning: the scientist teaches the machine by showing it features and then showing it the correct answer. Once the machine is taught, the scientist can test it on some unseen data, where the scientist knows the correct answers but the machine does not. The machine's answers are compared to the known answers, and the machine's accuracy is measured. If the accuracy is high enough, the scientist may consider actually applying the algorithm in the real world.

Since regression is widely used with stock prices, we can start with an example there. To begin, we need data. Sometimes data is easy to acquire, and sometimes you have to go out and collect it yourself. In our case, we can at least start with simple stock price and volume information from Quandl. We will grab the stock prices for Google, whose ticker is GOOGL:
import pandas as pd
import quandl

df = quandl.get("WIKI/GOOGL")
print(df.head())

Note: when this article was written, the Quandl module was referenced with an upper-case Q, but it is now lower-case, hence import quandl.

At this point, we have:
                Open    High     Low   Close    Volume  Ex-Dividend
Date
2004-08-19    100.00  104.06   95.96  100.34  44659000            0
2004-08-20    101.01  109.08  100.50  108.31  22834300            0
2004-08-23    110.75  113.48  109.05  109.40  18256100            0
2004-08-24    111.24  111.60  103.57  104.87  15247300            0
2004-08-25    104.96  108.00  103.88  106.00   9188600            0

            Split Ratio  Adj. Open  Adj. High  Adj. Low  Adj. Close
Date
2004-08-19            1     50.000      52.03    47.980      50.170
2004-08-20            1     50.505      54.54    50.250      54.155
2004-08-23            1     55.375      56.74    54.525      54.700
2004-08-24            1     55.620      55.80    51.785      52.435
2004-08-25            1     52.480      54.00    51.940      53.000

            Adj. Volume
Date
2004-08-19     44659000
2004-08-20     22834300
2004-08-23     18256100
2004-08-24     15247300
2004-08-25      9188600
This is an excellent start — we have the data, but it's a bit much.

We have a lot of columns here: many are redundant, and a few don't change much. We can see that the regular and the adjusted (Adj.) columns are duplicates, and the adjusted columns look more ideal. The regular columns are the prices on the day, but stocks have something called a split, where suddenly one share becomes two, halving the price of a share without changing the value of the company. The adjusted columns are adjusted for stock splits, which makes them more reliable for analysis.
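Split adjustment can be made concrete with a small, entirely hypothetical price series. The numbers and the one-liner below are only a sketch of how such an adjustment can be computed, not how Quandl actually produces its columns:

```python
import pandas as pd

# Hypothetical prices around a 2-for-1 split: the raw close halves,
# but dividing earlier prices by the cumulative ratio of later splits
# makes the series continuous again.
df = pd.DataFrame({
    "Close":       [100.0, 102.0, 51.5, 52.0],
    "Split Ratio": [1.0,   1.0,   2.0,  1.0],
})
# Cumulative ratio of all splits that happen *after* each row.
future_splits = df["Split Ratio"][::-1].cumprod()[::-1].shift(-1, fill_value=1.0)
df["Adj. Close"] = df["Close"] / future_splits
print(df["Adj. Close"].tolist())  # [50.0, 51.0, 51.5, 52.0]
```

The adjusted series rises smoothly from 50 to 52, while the raw close appears to crash in half on the split day — exactly the artifact the Adj. columns remove.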
So let's go ahead and pare down the original DataFrame:

df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]
Now we have just the adjusted columns and the volume. A few things to note here: many people talk about, or hear about, machine learning as if it were black magic out of nowhere. Machine learning can highlight value in data if the value is there, but the data has to exist first — and it has to be meaningful. How do you know whether it's meaningful? My best suggestion is simply to use your brain. Think about it: do historical prices determine future prices? Some people think so, but over time this has been shown to be false. What about historical patterns? Patterns, when highlighted, can carry meaning (and machine learning can help there), but they are still too weak on their own. What about the relationship between price change and volume over time, combined with historical patterns? Probably a bit better. So, as you can already see, it isn't the case that more data is always better — we need the data that is actually useful. At the same time, raw data usually deserves some transformation.

Consider daily volatility, such as the percent difference between the high and the low. What about the daily percent change? Which do you think makes the better feature set: Open, High, Low, Close, or Close, Spread/Volatility, %change daily? I would expect the latter to be better. The former are all very similar data points, while the latter are created from that same underlying data yet carry more valuable information.

So, not all of the data you have is useful, and sometimes you need to perform further operations on your data to make it more valuable before feeding it to a machine learning algorithm. Let's go ahead and transform our data:
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0

This creates a new column, the percent spread based on the closing price, which is our crude measure of volatility. Next, we calculate the daily percent change:

df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0

Now we will define a new DataFrame:

df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
print(df.head())
            Adj. Close    HL_PCT  PCT_change  Adj. Volume
Date
2004-08-19      50.170  8.072553    0.340000     44659000
2004-08-20      54.155  7.921706    7.227007     22834300
2004-08-23      54.700  4.049360   -1.218962     18256100
2004-08-24      52.435  7.657099   -5.726357     15247300
2004-08-25      53.000  3.886792    0.990854      9188600
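As a sanity check, the first row of this output can be reproduced by hand from the adjusted values shown earlier:

```python
# HL_PCT for 2004-08-19, computed from that day's adjusted
# high, low, and close (values taken from the table above).
adj_high, adj_low, adj_close = 52.03, 47.980, 50.170
hl_pct = (adj_high - adj_low) / adj_close * 100.0
print(round(hl_pct, 6))  # 8.072553
```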
Features and Labels

Building on the previous machine learning regression tutorial, we will be performing regression on our stock price data. The code so far:

import quandl
import pandas as pd

df = quandl.get("WIKI/GOOGL")
df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0
df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
print(df.head())
Here we've already acquired the data, decided what is valuable, and created some new columns through operations on it. Now we're ready to actually begin the machine learning process with regression. First, we need a few more imports. All of the imports are:

import quandl, math
import numpy as np
import pandas as pd
from sklearn import preprocessing, cross_validation, svm
from sklearn.linear_model import LinearRegression
We'll use the numpy module to convert data to NumPy arrays, which is what Sklearn expects. We'll get deeper into preprocessing and cross_validation when we use them; briefly, preprocessing is the module used for cleaning and scaling data before machine learning, and cross_validation is used in the testing stage. Finally, we also import the LinearRegression algorithm from Sklearn, as well as svm — these serve as our machine learning algorithms for producing results.

At this point we have the data that we think is useful. How does real machine learning work? With supervised learning, you need features and labels. The features are the descriptive attributes, and the label is the outcome you are attempting to predict. Another common regression example is trying to predict someone's insurance premium. An insurance company would collect your age, driving record, public criminal record, and credit score. The company uses past customers, takes that data, and feeds in the "ideal premium" that they think should have been charged to each customer — or the amount they actually used, if they believed it was profitable.

So, for training a machine learning classifier, the features are the customer attributes and the label is the premium associated with those attributes.

What are our features and labels here? We are trying to predict the price, so is the price the label? If so, what are the features? When it comes to forecasting our price, the label — the thing we're predicting — is actually the future price. That means our features are really the current price, the HL percent, and the percent change, and the label is the price at some point in the future. Let's go ahead and add some new lines of code:
forecast_col = 'Adj. Close'
df.fillna(value=-99999, inplace=True)
forecast_out = int(math.ceil(0.01 * len(df)))
Here we define the forecast column, then we fill any NaN data with -99999. You have a few options for handling missing data; you can't just pass a NaN (Not a Number) data point to a machine learning classifier — you have to handle it somehow. One popular option is to fill missing values with -99999; with many machine learning classifiers, this is simply treated as an outlier. You could instead drop all features or labels that contain missing data, but then you might be throwing away a lot of data.

In the real world, many datasets are messy. Most stock price/volume data is fairly clean, rarely missing values, but plenty of datasets have a lot of missing data. I've seen datasets where the majority of rows contain missing fields. You don't necessarily want to lose all that otherwise-decent data, and if your sample data has gaps, your real-world use cases will probably have gaps too. You need to train, test, and rely on the same kind of data, with the same characteristics.

Finally, we define what we want to forecast. In many cases, such as trying to predict a client's premium, you just want a single number, but with forecasting you want to predict a certain number of data points out. We'll say we want to forecast out 1% of the entire length of the dataset. So if our data were 100 days of stock prices, we would want to be able to predict the price one day out into the future. Choose whatever you like. If you're only trying to predict tomorrow's price, you would take the data from one day ago — and only one day ago. If you forecast 10 days out, you can generate a forecast for each of those days.
Here we've decided that our features are a batch of current values, and our label is the price at a point in the future, where that point is 1% of the dataset's length out. We'll assume all of the current columns are our features, so we add a new column with a simple Pandas operation:

df['label'] = df[forecast_col].shift(-forecast_out)
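What shift with a negative period does is easiest to see on a tiny series. With a shift of -2, each row's label is the value two rows into its future, and the last two rows get NaN because their future lies outside the data:

```python
import pandas as pd

s = pd.Series([10, 11, 12, 13, 14])
labels = s.shift(-2)
print(labels.tolist())  # [12.0, 13.0, 14.0, nan, nan]
```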
Now we have the data, comprising features and labels. Before we actually run anything, there are a few preprocessing and final steps to take, which we'll focus on in the next tutorial.

Training and Testing

Welcome to part four of the Python machine learning tutorial series. In the previous tutorial, we acquired the initial data, manipulated and transformed it to our liking, and then defined our features. Scikit doesn't need to work with Pandas DataFrames — I use Pandas purely out of preference, because it's fast and efficient. What Sklearn actually wants is NumPy arrays. A Pandas DataFrame converts easily to a NumPy array, so that's how things work out.
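The conversion is a single call; a tiny, self-contained illustration (the column names here are arbitrary):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
arr = np.array(df)  # rows x columns, upcast to one common dtype
print(arr.shape, arr.dtype)  # (3, 2) float64
```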
ĿǰΪֹÎÒÃǵĴúÂ룺
import
quandl, math
import numpy as np
import pandas as pd
from sklearn import preprocessing, cross_validation,
svm
from sklearn.linear_model import LinearRegression
df = quandl.get("WIKI/GOOGL")
print(df.head())
#print(df.tail())
df = df[['Adj. Open', 'Adj. High', 'Adj. Low',
'Adj. Close', 'Adj. Volume']]
df['HL_PCT'] = (df['Adj. High'] - df['Adj.
Low']) / df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj.
Open']) / df['Adj. Open'] * 100.0
df = df[['Adj. Close', 'HL_PCT', 'PCT_change',
'Adj. Volume']]
print(df.head())
forecast_col = 'Adj. Close'
df.fillna(value=-99999, inplace=True)
forecast_out = int(math.ceil(0.01 * len(df)))
df['label'] = df[forecast_col].shift(-forecast_out) |
We then want to drop any rows that still contain NaN information.

It is typical in machine learning to define X (capital) as the features and y (lowercase) as the label corresponding to the features. So we can define our features and labels like this:

X = np.array(df.drop(['label'], 1))
y = np.array(df['label'])
What we've done above is define X (the features) as our entire DataFrame except the label column, converted to a NumPy array. We use the drop method, which can be applied to a DataFrame and returns a new DataFrame. Next, we define our y variable, our label, as simply the label column of the DataFrame, also converted to a NumPy array.

We could call it a day here and move on to training and testing, but we're going to do some preprocessing first. Generally, you want your features to be in a range of -1 to 1. This may do nothing, but it usually speeds up processing and can help with accuracy. Because this range is so widely used, it's included in Sklearn's preprocessing module. To use it, call preprocessing.scale on your X variable:
X = preprocessing.scale(X)
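For intuition, here is roughly what preprocessing.scale does to each column, sketched with plain NumPy — standardization to zero mean and unit variance (the real function has more options):

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])
# Subtract each column's mean and divide by its standard deviation.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled.mean(axis=0))  # ~[0. 0.]
print(X_scaled.std(axis=0))   # [1. 1.]
```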
Next, create the label, y:

y = np.array(df['label'])
Now it's time for training and testing. The approach is to take, say, 75% of your data and use it to train the machine learning classifier, then take the remaining 25% and use it to test the classifier. Since this is your sample data, you have both the features and the known labels, so by testing on that last 25% you get a measure of accuracy and reliability, often called confidence. There are many ways to do this, but probably the best way is to use the built-in cross_validation, since it also shuffles the data for you. The code:

X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.2)
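Conceptually, train_test_split just shuffles the rows and slices them. A NumPy-only sketch of the idea (not Sklearn's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

idx = rng.permutation(len(X))   # shuffled row indices
cut = int(len(X) * 0.8)         # 80% train / 20% test
X_train, X_test = X[idx[:cut]], X[idx[cut:]]
y_train, y_test = y[idx[:cut]], y[idx[cut:]]
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```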
The return here is the training set of features, the testing set of features, the training set of labels, and the testing set of labels. Now we're ready to define our classifier. Sklearn ships with many general-purpose classifiers, and some of them can be used for regression. We'll show a couple in this example, but for now, let's use the Support Vector Regression from the svm package:

clf = svm.SVR()

We're just using the defaults here to keep things simple, but you can learn more in the sklearn.svm.SVR documentation.

Once you've defined the classifier, you can train it. In Sklearn, training is done with fit:

clf.fit(X_train, y_train)
Here we fit our training features and training labels.

Our classifier is now trained. That was remarkably easy. Now we can test it:

confidence = clf.score(X_test, y_test)

Run the test, and then:

print(confidence)
# 0.960075071072
So here we see an accuracy of almost 96%. Nothing to write home about. Let's try another classifier, this time LinearRegression:

clf = LinearRegression()
# 0.963311624499
A bit better, but basically the same. So how do we know, as scientists, which algorithm to choose? After a while, you'll get used to what works and what doesn't in most situations. You can also check out the "choosing the right estimator" page on Scikit's site, which helps you walk through some basic options. If you ask people who do machine learning, though, it's really trial and error — you try a handful of algorithms and simply go with the one that works best. Another thing to note is that some algorithms must run linearly, while others can be threaded; don't confuse linear regression with running linearly. So what does all this mean? Some machine learning algorithms process one step at a time, with no threading, while others can thread and make use of all the cores on your machine. You could dig deep into each algorithm to figure out which ones can be threaded, or you can read the documentation and look for an n_jobs parameter. If n_jobs is there, you can have the algorithm thread for higher performance; if not, tough luck! So, if you're processing huge amounts of data, or need to process medium-sized data at high speed, threading is probably what you want. Let's check our two algorithms.

Visit the sklearn.svm.SVR documentation and look at the parameters — do you see n_jobs? I don't, so no threading there. As you may see, on our small dataset it makes little difference, but with even a 20MB dataset the difference becomes significant. Next, check the LinearRegression algorithm: n_jobs is there. You can specify exactly how many threads you want, and if you pass -1, the algorithm will use all available threads.
Like so:

clf = LinearRegression(n_jobs=-1)

That's all. While I rarely have you do so little (just look at the documentation), let me note a fact: just because machine learning algorithms work with their defaults doesn't mean you can ignore their parameters. For example, let's revisit svm.SVR. SVR is Support Vector Regression, a kind of architecture for performing machine learning. I strongly encourage anyone interested in learning more to research the topic and to learn the fundamentals from people more educated than me — I'll do my best to explain things simply, but I'm no expert. Back on topic: svm.SVR has a parameter called kernel. What's that? A kernel is basically a transform of your data, and it can make processing much faster. In the case of svm.SVR, the default is rbf, which is one type of kernel, and you have several choices. Check the documentation: you can choose 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed', or a callable. Just like trying out different ML algorithms, you can do whatever you like — try out the different kernels:
for k in ['linear','poly','rbf','sigmoid']:
    clf = svm.SVR(kernel=k)
    clf.fit(X_train, y_train)
    confidence = clf.score(X_test, y_test)
    print(k,confidence)

linear 0.960075071072
poly 0.63712232551
rbf 0.802831714511
sigmoid -0.125347960903

As we can see, the linear kernel performed the best, followed by rbf, then poly; sigmoid is clearly a dud and should be removed.

So we've trained and tested the data, and we're at a point we're, say, 71% satisfied with. What next? Now we need to take it a step further and make some forecasts, which the next chapter covers.
Forecasting and Predicting

Welcome to chapter five of the machine learning tutorial series, currently covering regression. Up to this point, we've collected and modified the data, and trained and tested a classifier. In this chapter, we'll use our classifier to actually make some predictions. The code we currently have:
import quandl, math
import numpy as np
import pandas as pd
from sklearn import preprocessing, cross_validation, svm
from sklearn.linear_model import LinearRegression

df = quandl.get("WIKI/GOOGL")
df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0
df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]

forecast_col = 'Adj. Close'
df.fillna(value=-99999, inplace=True)
forecast_out = int(math.ceil(0.01 * len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)

X = np.array(df.drop(['label'], 1))
X = preprocessing.scale(X)
X = X[:-forecast_out]
df.dropna(inplace=True)
y = np.array(df['label'])

X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.2)
clf = LinearRegression(n_jobs=-1)
clf.fit(X_train, y_train)
confidence = clf.score(X_test, y_test)
print(confidence)
I'll stress that a linear model with accuracy above 95% isn't actually that great — I certainly wouldn't trade stocks with it. There are still things to consider; in particular, different companies have different price trajectories. Google's is very linear, moving up and to the right; many companies' aren't, so keep that in mind. Now, to make forecasts, we need some data. We decided to forecast out 1% of the data, so we want to — or at least can — forecast against the last 1% of the dataset. When can we identify that data? We could do it right now, but note that the data we're trying to forecast against wasn't scaled the way the training set was. Okay, so what do we do? Do we just call preprocessing.scale() on that last 1%? The scale method scales based on all of the known data that is fed to it. Ideally, you would scale the training set, the testing set, and the data you forecast against all together. Is that always possible or reasonable? No. If you can do it, you should; and in our case, we can. Our data is small enough, and processing time low enough, that we'll preprocess and scale all of the data at once.

In many cases you won't be able to do this. Imagine using gigabytes of data to train a classifier: it might take days to train, and you can't do that every time you want to make a prediction. So you might need to either not scale anything, or scale the new data separately. As usual, you'll probably want to test both options and see which works better for your particular case.
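One common way to scale new data separately but consistently is to keep the training set's mean and standard deviation and reuse them later. A hypothetical sketch (the helper name is made up for illustration):

```python
import numpy as np

X_train = np.array([[1.0], [2.0], [3.0]])
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

def scale_like_training(X_new):
    # Apply the exact transform learned from the training data,
    # rather than re-estimating statistics from the new data.
    return (X_new - mu) / sigma

print(scale_like_training(np.array([[4.0]])))
```

This is the same idea scikit-learn packages up as a fit-then-transform scaler object that can be saved alongside the model.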
With that in mind, let's handle all of the rows from the point where we define X:

X = np.array(df.drop(['label'], 1))
X = preprocessing.scale(X)
X_lately = X[-forecast_out:]
X = X[:-forecast_out]
df.dropna(inplace=True)
y = np.array(df['label'])

X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.2)
clf = LinearRegression(n_jobs=-1)
clf.fit(X_train, y_train)
confidence = clf.score(X_test, y_test)
print(confidence)
Note that we first take all of the data, preprocess it, and then split it up. Our X_lately variable contains the most recent features, the ones we want to make predictions against. As you can see by now, defining the classifier, training it, and testing it were all remarkably simple. Predicting is also super easy:

forecast_set = clf.predict(X_lately)
The forecast_set is an array of forecasts, showing that not only could you make a single prediction, you can predict many values at once. Here's what we have at this point:
[ 745.67829395  737.55633261  736.32921413  717.03929303  718.59047951
  731.26376715  737.84381394  751.28161162  756.31775293  756.76751056
  763.20185946  764.52651181  760.91320031  768.0072636   766.67038016
  763.83749414  761.36173409  760.08514166  770.61581391  774.13939706
  768.78733341  775.04458624  771.10782342  765.13955723  773.93369548
  766.05507556  765.4984563   763.59630529  770.0057166   777.60915879] 0.956987938167 30
So these are our forecasts — now what? We're basically done, but we can visualize the result. Stock prices are daily, five days a week, with no weekends. I know that, but we're going to simplify things and treat each forecast as one consecutive day. If you want to work around the weekend gaps (don't forget holidays), go for it, but I'll keep it simple here. To start, we add a few new imports:

import datetime
import matplotlib.pyplot as plt
from matplotlib import style

I imported datetime to work with datetime objects, Matplotlib's pyplot for graphing, and style to make our graphs look decent. Let's set a style:

style.use('ggplot')

Next, we add a new column, the Forecast column:

df['Forecast'] = np.nan

We set the values to NaN first, but we'll populate them shortly.
The labels of the forecast set start from tomorrow. Because we forecast m = 0.01 * len(df) days of future data, the labels were generated by shifting the close price forward m days. That means the last m rows of the dataset can't serve as training or testing data, because they have no label — so those last m rows become our forecast set. The first row of the forecast set is row n - m of the dataset, and its label would be the close on day n - m + m = n. Since today is day n - 1 in df, that label is tomorrow's close.

We first need to grab the last day in the DataFrame, and begin assigning each new forecast to a new day. We'll start like this:

last_date = df.iloc[-1].name
last_unix = last_date.timestamp()
one_day = 86400
next_unix = last_unix + one_day

Now we have the starting date for the forecast set, and one day is 86,400 seconds. Now we add the forecasts to the existing DataFrame:
for i in forecast_set:
    next_date = datetime.datetime.fromtimestamp(next_unix)
    next_unix += 86400
    df.loc[next_date] = [np.nan for _ in range(len(df.columns)-1)]+[i]

What we're doing here is iterating through the forecast set, taking each forecast and its date, and setting those values in the DataFrame (making the forecast rows' features NaN). The last line of code creates a row in the DataFrame where all of the elements are set to NaN except the last one, which is set to i (the forecast). I chose this one-liner for loop so the code will still work even if we change the DataFrame and the features. All done? Let's plot it:
df['Adj. Close'].plot()
df['Forecast'].plot()
plt.legend(loc=4)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()

The full code:
import quandl, math
import numpy as np
import pandas as pd
from sklearn import preprocessing, cross_validation, svm
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from matplotlib import style
import datetime

style.use('ggplot')

df = quandl.get("WIKI/GOOGL")
df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0
df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]

forecast_col = 'Adj. Close'
df.fillna(value=-99999, inplace=True)
forecast_out = int(math.ceil(0.01 * len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)

X = np.array(df.drop(['label'], 1))
X = preprocessing.scale(X)
X_lately = X[-forecast_out:]
X = X[:-forecast_out]
df.dropna(inplace=True)
y = np.array(df['label'])

X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.2)
clf = LinearRegression(n_jobs=-1)
clf.fit(X_train, y_train)
confidence = clf.score(X_test, y_test)

forecast_set = clf.predict(X_lately)
df['Forecast'] = np.nan

last_date = df.iloc[-1].name
last_unix = last_date.timestamp()
one_day = 86400
next_unix = last_unix + one_day

for i in forecast_set:
    next_date = datetime.datetime.fromtimestamp(next_unix)
    next_unix += 86400
    df.loc[next_date] = [np.nan for _ in range(len(df.columns)-1)]+[i]

df['Adj. Close'].plot()
df['Forecast'].plot()
plt.legend(loc=4)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()

The result:

Saving and Scaling

In the previous tutorial, we finished forecasting stock prices with regression and visualized the result with Matplotlib. In this tutorial, we'll discuss some of the next steps.

I remember when I first tried to learn machine learning, most examples only covered the training and testing portion, completely skipping the prediction part. And among the tutorials that did include training, testing, and prediction, I never found a single one that explained saving the algorithm. In those examples, the data is usually so small that training, testing, and predicting are all fast. In the real world, data tends to be much larger and takes much longer to process. Since nobody really talks about this important stage, I want to include some information on processing time and on saving your algorithm.

While our machine learning classifier takes only a few seconds to train, there are cases where training a classifier takes hours or even days. Imagine needing to do that every single day you wanted to forecast prices. It isn't necessary, because we can save the classifier using the Pickle module. First make sure you've imported it:

import pickle

With Pickle, you can save any Python object, like our classifier. After defining, training, and testing your classifier, add:

with open('linearregression.pickle','wb') as f:
    pickle.dump(clf, f)

Now, run the script again, and you should have linearregression.pickle, which is the serialized data of the classifier. All you need to do now is load the pickle file, save it to clf, and use it as usual. For example:

pickle_in = open('linearregression.pickle','rb')
clf = pickle.load(pickle_in)
´úÂëÖУº
import
Quandl, math
import numpy as np
import pandas as pd
from sklearn import preprocessing, cross_validation,
svm
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from matplotlib import style
import datetime
import pickle
style.use('ggplot')
df = Quandl.get("WIKI/GOOGL")
df = df[['Adj. Open', 'Adj. High', 'Adj. Low',
'Adj. Close', 'Adj. Volume']]
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low'])
/ df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj.
Open']) / df['Adj. Open'] * 100.0
df = df[['Adj. Close', 'HL_PCT', 'PCT_change',
'Adj. Volume']]
forecast_col = 'Adj. Close'
df.fillna(value=-99999, inplace=True)
forecast_out = int(math.ceil(0.1 * len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)
X = np.array(df.drop(['label'], 1))
X = preprocessing.scale(X)
X_lately = X[-forecast_out:]
X = X[:-forecast_out]
df.dropna(inplace=True)
y = np.array(df['label'])
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X,
y, test_size=0.2)
#COMMENTED OUT:
##clf = svm.SVR(kernel='linear')
##clf.fit(X_train, y_train)
##confidence = clf.score(X_test, y_test)
##print(confidence)
pickle_in = open('linearregression.pickle','rb')
clf = pickle.load(pickle_in)
forecast_set = clf.predict(X_lately)
df['Forecast'] = np.nan
last_date = df.iloc[-1].name
last_unix = last_date.timestamp()
one_day = 86400
next_unix = last_unix + one_day
for i in forecast_set:
next_date = datetime.datetime.fromtimestamp(next_unix)
next_unix += 86400
df.loc[next_date] = [np.nan for _ in range(len(df.columns)-1)]+[i]
df['Adj. Close'].plot()
df['Forecast'].plot()
plt.legend(loc=4)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show() |

Note that we commented out the original definition of the classifier and replaced it with loading the one we saved. It's that simple.

Finally, we want to discuss efficiency and saving time. A while back, I presented a relatively low-key paradigm: the temporary super computer. Seriously — with the rise of on-demand hosting services like AWS, Digital Ocean, and Linode, you can buy hosting by the hour. A virtual server can be set up in about 60 seconds, and the required modules can be installed in about 15 minutes, so the effort involved is quite limited. You could write a shell script or the like to speed it up further. Consider that you need a lot of processing but don't own a top-of-the-line machine, or that you work on a laptop: no problem, just spin up a server.

My final note on this methodology: with any of these hosts, you can usually set up a very minimal server, load everything you need onto it, and then scale the server up. I like to start with a small server, and then, when I'm ready, I resize it and upgrade. When you're done, don't forget to destroy or downgrade your server.
Regression: Theory and How It Works

Welcome to the seventh tutorial. So far, you've seen the value of linear regression and how to apply it with Sklearn; now we're going to dig into how it is calculated. While I don't think it's necessary to go deep into the mathematics of every machine learning algorithm (have you ever gone into the source code of your favorite module to see how it's implemented?), linear algebra is the essence of machine learning, and it's extremely useful for understanding the building blocks machine learning is built on.

The goal of linear algebra is to compute the relationships of points in vector space. That's used for all sorts of things, but one day, somebody had the wild idea of doing it with the features of a dataset — and we can too! Remember earlier, when we defined the type of data linear regression works on: continuous data? That isn't because of the people who use it, but because of the math that makes it up. Simple linear regression is used to find the best-fit line of a dataset. If the data isn't continuous, there really isn't a best-fit line. Let's look at some examples.
Covariance

The image above clearly shows good covariance. If you estimated and drew a best-fit line by eye, you should be able to do it easily:

What if the graph looked like this?

It isn't quite like before, but there is a clear negative correlation. You might be able to draw the best-fit line, but it's much more likely that you couldn't.

Finally, what about this one?

What?! There is indeed a best-fit line there, but good luck drawing it out!

Treat the images above as graphs of features, where the X coordinate is a feature and the Y coordinate is the associated label. Do X and Y have any kind of structured relationship? While we can calculate the relationship exactly for the data we have, we're unlikely to have that many values in the future.

In the other graphs' cases, there is clearly a relationship between X and Y. We can actually explore that relationship, and then plot along any point we wish — take Y and predict X, or take X and predict Y, for any point we can think of. We can also predict how much error our model carries, even when the model is just a single line. How do we perform all this magic? Linear algebra, of course.
First, let's go back to grade school and review the definition of a straight line: y = mx + b, where m is the slope and b is the y-intercept. This equation is solved for y; we could rearrange it to solve for x instead, using basic algebra: x = (y-b)/m.

Okay, so our goal is to find the best-fit line. Not just a decently fitting line — the best one. The definition of that line is y = mx + b. y is the answer (our other coordinate, or even our feature), so we still need m (the slope) and b (the y-intercept); x is known, since it is any point along the x axis.

The slope m of the best-fit line is defined as:

m = ( mean(x) * mean(y) - mean(x*y) ) / ( mean(x)^2 - mean(x^2) )

Note: this can be written more compactly as m = cov(x, y) / var(x).
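The covariance/variance shorthand for the slope is easy to verify numerically with NumPy, using the toy points from the upcoming sections (bias=True gives the population covariance, which matches the means-based formula):

```python
import numpy as np

xs = np.array([1, 2, 3, 4, 5], dtype=np.float64)
ys = np.array([5, 4, 6, 5, 6], dtype=np.float64)

# Means-based formula for the best-fit slope.
m_means = ((xs.mean() * ys.mean() - (xs * ys).mean()) /
           (xs.mean() ** 2 - (xs ** 2).mean()))
# Covariance over variance, population (biased) versions.
m_cov = np.cov(xs, ys, bias=True)[0, 1] / np.var(xs)
print(m_means, m_cov)  # both ~0.3
```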
A bar over a symbol means the mean, and two symbols next to each other mean you multiply them. The xs and ys are all of the known coordinates. So we now have m (the slope) from the y=mx+b definition of the best-fit line, and all we still need is b (the y-intercept). Here's the formula:

b = mean(y) - m * mean(x)

Alright. This whole section isn't meant as a math tutorial — it's a programming tutorial. In the next tutorial, we'll do exactly this, and I'll explain why I'm going to have you program it yourself rather than just using a module.
Programming the Best-Fit Slope

Welcome to the eighth tutorial, where we've just realized that we need to reproduce some fairly significant algorithms in Python, to try to calculate the best-fit line for a given dataset.

Before we begin, why the minor fuss? Linear regression is a building block of machine learning. It is used in almost every single mainstream machine learning algorithm, so understanding it will help you master most major machine learning algorithms. For the truly enthusiastic, understanding linear regression and linear algebra is the first step toward writing your own machine learning algorithms, and branching out into the bleeding edge of machine learning using the current best processes. As processing improves and hardware architectures change, the methodologies used for machine learning change as well. The most recent rise of neural networks makes heavy use of GPUs to get the work done. Want to know what sits at the core of a neural network? You guessed it: linear algebra.

If you can recall, the slope m of the best-fit line is:

m = ( mean(x) * mean(y) - mean(x*y) ) / ( mean(x)^2 - mean(x^2) )

Yes, we'll break it into pieces. First, some imports:
from statistics import mean
import numpy as np

We import mean from statistics so we can easily take the mean of a list. Next, we bring in numpy as np so we can create NumPy arrays. We could do a lot with plain lists, but we need to be able to perform some simple matrix operations, which aren't available with simple lists, so we'll use NumPy. We won't use anything too complex from NumPy at this stage, but later on, NumPy will become your best friend. Next, let's define some starting data points:

xs = [1,2,3,4,5]
ys = [5,4,6,5,6]

So these are the data points we're going to use, xs and ys. You can think of the xs as features and the ys as labels, or maybe they're both features and we want to establish their relationship. As mentioned before, we actually want to turn these into NumPy arrays so we can perform matrix operations. Let's modify those two lines:
xs = np.array([1,2,3,4,5], dtype=np.float64)
ys = np.array([5,4,6,5,6], dtype=np.float64)

Now they're both NumPy arrays, and we've explicitly declared their data type. Briefly, a data type has properties that determine how the data itself is stored and manipulated. It's not an issue right now, but it becomes one if we perform massive operations and want them running on a GPU rather than a CPU.

Plotted, they look like this:

Now we're ready to build a function to calculate m, the slope of our line:
def best_fit_slope(xs,ys):
    return m

m = best_fit_slope(xs,ys)

Done! Just kidding — that's our skeleton, and now we'll fill it in.

Our first order of business: calculate the mean of the xs and multiply it by the mean of the ys. Continuing to fill out the skeleton:
def best_fit_slope(xs,ys):
    m = (mean(xs) * mean(ys))
    return m

Simple enough so far. You can use the mean function on lists, tuples, or arrays. Note my use of parentheses here: Python honors the mathematical order of operations for its operators, so if you want to guarantee order, be explicit with parentheses. Remember your rules of operations!

Next, we need to subtract the mean of x*y from this, which is going to be our matrix operation: mean(xs*ys). The code is now:
def best_fit_slope(xs,ys):
    m = ( (mean(xs)*mean(ys)) - mean(xs*ys) )
    return m

We're done with the numerator of the formula. Now we move on to the denominator, starting with the squared mean of x: (mean(xs)*mean(xs)). Python supports ** 2, which can handle our NumPy array's float64 type. Adding that in:
def best_fit_slope(xs,ys):
    m = ( ((mean(xs)*mean(ys)) - mean(xs*ys)) /
          (mean(xs)**2))
    return m

While wrapping the whole expression in parentheses isn't necessary by order of operations, I'm doing it here so I can break onto a new line after the division, making the whole thing easier to read and follow. Without the parentheses, we'd get a syntax error on that new line. We're almost done — now we just need to subtract the mean of the squared xs, mean(xs*xs), from the squared mean of x. The full function:
def best_fit_slope(xs,ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)**2) - mean(xs*xs)))
    return m

Alright, our full script is now:
from statistics import mean
import numpy as np

xs = np.array([1,2,3,4,5], dtype=np.float64)
ys = np.array([5,4,6,5,6], dtype=np.float64)

def best_fit_slope(xs,ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)**2) - mean(xs**2)))
    return m

m = best_fit_slope(xs,ys)
print(m)
# 0.3

What's next? We need to calculate the y-intercept, b. We'll handle that in the next tutorial and complete the full best-fit line calculation. It's even easier to calculate than the slope, so try to write your own function for it. If you manage it, don't skip the next tutorial either — we'll be doing more than just that.
Calculating the Y-Intercept

Welcome to the ninth tutorial. We're currently working on calculating the regression, or best-fit, line for a given dataset in Python. Previously, we wrote a function to calculate the slope; now we need to calculate the y-intercept. Our code so far:
from statistics import mean
import numpy as np

xs = np.array([1,2,3,4,5], dtype=np.float64)
ys = np.array([5,4,6,5,6], dtype=np.float64)

def best_fit_slope(xs,ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)*mean(xs)) - mean(xs*xs)))
    return m

m = best_fit_slope(xs,ys)
print(m)
Çë»ØÒ䣬×î¼ÑÄâºÏÖ±ÏßµÄ×ݽؾàÊÇ£º

Õâ¸ö±ÈбÂʼòµ¥¶àÁË¡£ÎÒÃÇ¿ÉÒÔ½«Æäдµ½Í¬Ò»¸öº¯ÊýÀ´½ÚÊ¡¼¸ÐдúÂë¡£ÎÒÃǽ«º¯ÊýÖØÃüÃûΪbest_fit_slope_and_intercept¡£
ÏÂÃæ£¬ÎÒÃÇ¿ÉÒÔÌî³äb = mean(ys) - (m*mean(xs))£¬²¢·µ»Øm, b£º
def best_fit_slope_and_intercept(xs,ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)*mean(xs)) - mean(xs*xs)))
    b = mean(ys) - m*mean(xs)
    return m, b

Now we can call it:

best_fit_slope_and_intercept(xs,ys)

Our code up to this point:
from statistics import mean
import numpy as np

xs = np.array([1,2,3,4,5], dtype=np.float64)
ys = np.array([5,4,6,5,6], dtype=np.float64)

def best_fit_slope_and_intercept(xs,ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)*mean(xs)) - mean(xs*xs)))
    b = mean(ys) - m*mean(xs)
    return m, b

m, b = best_fit_slope_and_intercept(xs,ys)
print(m,b)
# 0.3, 4.3

Now we just need to create a line for the data:

Recall that y = mx + b. We could write a function for this, or just use a one-line for loop:
regression_line = [(m*x)+b for x in xs]

The above one-liner for loop is the same as:

regression_line = []
for x in xs:
    regression_line.append((m*x)+b)

Great — let's reap the fruits of our labor. Add the following imports:
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')

This lets us make plots that won't be particularly hideous. Now:

plt.scatter(xs,ys,color='#003F72')
plt.plot(xs, regression_line)
plt.show()

First we plot a scatter plot of the existing data, then we graph our regression line, then we show it. If you're not familiar with these, check out the Matplotlib tutorial series.

Output:

Congratulations! So, how do we make actual predictions based on this model? Simple: you have the model, and you just fill in x. For example, let's predict a point:

predict_x = 7

We've put in the input data — our "feature". What about the label?
predict_y = (m*predict_x)+b
print(predict_y)
# 6.4

We can also plot it:
predict_x = 7
predict_y = (m*predict_x)+b

plt.scatter(xs,ys,color='#003F72',label='data')
plt.plot(xs, regression_line, label='regression line')
plt.legend(loc=4)
plt.show()

Output:

We now know how to create our own model, which is great, but we're still missing something: how accurate is our model? That's the topic for the next tutorial.
R Squared and the Theory of the Coefficient of Determination

Welcome to the tenth tutorial. We've just finished creating and working with a linear model, and now we're wondering what comes next. Right now, we can easily look at the data and decide how "accurate" the regression line is. What happens, however, when your linear regression model is built from 20 layers of a neural network? Not only that, but your model works in steps, or windows, of say 100 data points at a time, out of 5 million in total? You need some automated way of judging how good your best-fit line is.

Recall from earlier, when we showed several plots, that you could already tell whether a best-fit line would be good or bad. Like this one:

Compared with this one:

In the second image, there is indeed a best-fit line, but nobody cares — even the best of best-fit lines is useless there. What's more, we'd like to know that before spending a lot of computing power on it.

The standard way to check for error is to use squared error. You may have heard of this before, since the method built on it is called R squared, or the coefficient of determination. So what is squared error?

The distance between a data point's y value and the regression line is the error, and we square it. The line's squared error is either the average or the sum of the squares — we'll just sum them.

We've actually already been working under the squared-error assumption: our best-fit line equations, used to calculate the best-fit regression line, are a proof of it, where the regression line is the line that has the least squared error (which is why it's called least squares). You can search for "regression proof" or "best fit line proof" to understand it; it's fairly dense going, and it takes some skill with algebraic manipulation to work out the result.

Why squared error? Why not just sum the errors up? First, we want a way to normalize the error as a distance, so an error of -5 becomes positive once squared. Another reason is to further punish outliers — "further" meaning they affect the total error to a greater degree. This is simply the standard approach. You could instead use the 4th, 6th, or 8th powers, or something else, or just use the absolute value of the errors. If your particular challenge is that some outliers exist but you don't intend to care about them, you might consider absolute value; if you care a lot about outliers, you could use a higher-order exponent. We'll use squares, because that's what most people use.
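The outlier-punishing effect is visible with a toy list of errors: a single large error dominates the squared sum far more than the absolute sum.

```python
errors = [1, 1, 1, 10]

# Absolute error: the outlier contributes 10 out of 13.
print(sum(abs(e) for e in errors))   # 13
# Squared error: the outlier contributes 100 out of 103.
print(sum(e ** 2 for e in errors))   # 103
```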
Okay, so we compute the squared error of the regression line — then what? What does it mean? The squared error is entirely relative to the dataset itself, so we need something more. That's where r squared comes in, also called the coefficient of determination. The equation is:

r_squared = 1 - SE(y_hat) / SE(mean(y))

or, in NumPy terms:

y_hat = x * m + b
r_sq = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
Õâ¸ö·½³ÌµÄµÄ±¾ÖʾÍÊÇ£¬1 ¼õÈ¥»Ø¹éÖ±Ïߵį½·½Îó²î£¬±ÈÉÏ y ƽ¾ùÖ±Ïߵį½·½Îó²î¡£ y ƽ¾ùÖ±Ïß¾ÍÊÇÊý¾Ý¼¯ÖÐËùÓÐ
y ÖµµÄ¾ùÖµ£¬Èç¹ûÄ㽫Æä»³öÀ´£¬ËüÊÇÒ»¸öˮƽµÄÖ±Ïß¡£ËùÒÔ£¬ÎÒÃǼÆËã y ƽ¾ùÖ±Ïߣ¬ºÍ»Ø¹éÖ±Ïߵį½·½Îó²î¡£ÕâÀïµÄÄ¿±êÊÇʶ±ð£¬ÓëÇ·ÄâºÏµÄÖ±ÏßÏà±È£¬Êý¾ÝÌØÕ÷µÄ±ä»¯²úÉúÁ˶àÉÙÎó²î¡£
So the coefficient of determination is the equation above, but how do we judge whether a given value is good or bad? We saw that it is 1 minus something; usually in math, when you see that, what comes back is like a percentage, a value between 0 and 1. What do you think makes a good R squared, or coefficient of determination? Suppose our R squared here is 0.8: is that good or bad? Is it better or worse than 0.3? For an R squared of 0.8, the ratio of the regression line's squared error to the mean-y line's squared error is 2 to 10, which says the regression line's error is much smaller than the mean line's. That sounds good, so 0.8 is quite good. How does that compare with a coefficient of determination of 0.3? There, the ratio of the regression line's squared error to the mean line's squared error is 7 to 10, and 7-to-10 is worse than 2-to-10 (the 7 and the 2 both being the regression line's squared error). The goal, then, is to get the R squared value, or coefficient of determination, as close to 1 as possible.
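Plugging those hypothetical 2-to-10 and 7-to-10 error ratios into the formula makes the comparison concrete:

```python
# Hypothetical ratios of the regression line's squared error
# to the mean-y line's squared error, from the discussion above.
r_sq_good = 1 - (2 / 10)  # regression error is small relative to the mean line
r_sq_poor = 1 - (7 / 10)  # regression error is almost as large as the mean line's

print(r_sq_good)  # 0.8
print(r_sq_poor)  # ~0.3
```

The smaller the regression line's error relative to the mean line's, the closer the result is to 1.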
Computing R Squared in Code
Welcome to the eleventh tutorial. Now that we know what we are looking for, let's actually calculate it in Python. The first step is to calculate the squared error. The function might look like this:
def squared_error(ys_orig, ys_line):
    return sum((ys_line - ys_orig) * (ys_line - ys_orig))
With the function above, we can calculate the squared error of any line against the data points, so we can apply the same logic to both the regression line and the mean-y line. That said, the squared error is only one part of the coefficient of determination, so let's build that function too. Since the squared-error function is a one-liner, you could choose to embed it inside the coefficient-of-determination function, but squared error is something you might want to compute outside that function as well, so I chose to keep it as its own function. For R squared:
def coefficient_of_determination(ys_orig, ys_line):
    y_mean_line = [mean(ys_orig) for y in ys_orig]
    squared_error_regr = squared_error(ys_orig, ys_line)
    squared_error_y_mean = squared_error(ys_orig, y_mean_line)
    return 1 - (squared_error_regr / squared_error_y_mean)
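As a quick aside, a sanity check (my own sketch, not part of the tutorial): the function should return exactly 1 for a perfect fit, and exactly 0 when the "line" is just the mean itself:

```python
import numpy as np
from statistics import mean

def squared_error(ys_orig, ys_line):
    return sum((ys_line - ys_orig) * (ys_line - ys_orig))

def coefficient_of_determination(ys_orig, ys_line):
    y_mean_line = [mean(ys_orig) for y in ys_orig]
    squared_error_regr = squared_error(ys_orig, ys_line)
    squared_error_y_mean = squared_error(ys_orig, y_mean_line)
    return 1 - (squared_error_regr / squared_error_y_mean)

ys = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float64)

# A perfect "fit" (the data itself) has zero regression error.
print(coefficient_of_determination(ys, ys))                    # 1.0

# The mean line has exactly the same error as the baseline.
print(coefficient_of_determination(ys, np.full(4, mean(ys))))  # 0.0
```

Any real regression line should land somewhere between these two extremes.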
What we did here was calculate the mean-y line using a one-line for loop (which is, admittedly, unnecessary). Then we calculated the squared error of the mean line and of the regression line, using the function above. Now all that's left is to compute the R squared value, which is just 1 minus the regression line's squared error divided by the mean-y line's squared error. We return that value, and we're done. Putting it all together, and skipping the plotting part, the code is:
from statistics import mean
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style

style.use('ggplot')

xs = np.array([1,2,3,4,5], dtype=np.float64)
ys = np.array([5,4,6,5,6], dtype=np.float64)

def best_fit_slope_and_intercept(xs, ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)*mean(xs)) - mean(xs*xs)))
    b = mean(ys) - m*mean(xs)
    return m, b

def squared_error(ys_orig, ys_line):
    return sum((ys_line - ys_orig) * (ys_line - ys_orig))

def coefficient_of_determination(ys_orig, ys_line):
    y_mean_line = [mean(ys_orig) for y in ys_orig]
    squared_error_regr = squared_error(ys_orig, ys_line)
    squared_error_y_mean = squared_error(ys_orig, y_mean_line)
    return 1 - (squared_error_regr / squared_error_y_mean)

m, b = best_fit_slope_and_intercept(xs, ys)
regression_line = [(m*x)+b for x in xs]

r_squared = coefficient_of_determination(ys, regression_line)
print(r_squared)
# 0.321428571429

##plt.scatter(xs,ys,color='#003F72',label='data')
##plt.plot(xs, regression_line, label='regression line')
##plt.legend(loc=4)
##plt.show()
That's a pretty low value, so by this measure our best-fit line isn't great. Is R squared a good measure here? That probably depends on our goals. In most cases, if we care about accurately predicting future values, R squared is indeed very useful. If we're only interested in predicting motion or direction, though, our best-fit line is actually already pretty good, and R squared shouldn't carry so much weight. Look at the actual dataset we're stuck with and its low score: the change from value to value is 20% to 50% at some points, which is very high. We really shouldn't be surprised that, on this simple dataset, our best-fit line isn't a great description of the real data.
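As an independent cross-check (not from the original tutorial): for an ordinary least-squares line, the coefficient of determination equals the squared Pearson correlation between x and y, so np.corrcoef should reproduce the value our own implementation printed:

```python
import numpy as np

xs = np.array([1, 2, 3, 4, 5], dtype=np.float64)
ys = np.array([5, 4, 6, 5, 6], dtype=np.float64)

# Pearson r between xs and ys; for a least-squares fit, r ** 2
# is the same quantity as the coefficient of determination.
r = np.corrcoef(xs, ys)[0, 1]
print(r ** 2)  # ~0.3214285714, matching our own implementation
```

Agreement between the two routes is good evidence that neither the best-fit-line code nor the R squared code has a sign or ordering bug.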
µ«ÊÇ£¬ÎÒÃǸղÅ˵µÄÊÇÒ»¸ö¼ÙÉè¡£ËäÈ»ÎÒÃÇÂß¼ÉÏͳһÕâ¸ö¼ÙÉ裬ÎÒÃÇÐèÒªÌá³öÒ»¸öÐµķ½·¨£¬À´ÑéÖ¤¼ÙÉè¡£µ½Ä¿Ç°ÎªÖ¹µÄËã·¨·Ç³£»ù´¡£¬ÎÒÃÇÏÖÔÚÖ»ÄÜ×öºÜÉÙµÄÊÂÇ飬ËùÒÔûÓÐʲô¿Õ¼äÀ´¸Ä½øÎó²îÁË£¬µ«ÊÇÖ®ºó£¬Äã»áÔÚ¿Õ¼äÖ®ÉÏ·¢Ïֿռ䡣²»½ö½öÒª¿¼ÂÇËã·¨±¾ÉíµÄ²ã´Î¿Õ¼ä£¬»¹ÓÐÓɺܶàËã·¨²ã´Î×éºÏ¶ø³ÉµÄËã·¨¡£ÆäÖУ¬ÎÒÃÇÐèÒª²âÊÔËüÃÇÀ´È·±£ÎÒÃǵļÙÉ裬¹ØÓÚËã·¨ÊǸÉʲôÓõģ¬ÊÇÕýÈ·µÄ¡£¿¼ÂǰѲÙ×÷×é³É³Éº¯ÊýÓɶàô¼òµ¥£¬Ö®ºó£¬´ÓÕâÀ↑ʼ£¬½«Õû¸öÑéÖ¤·Ö½â³ÉÊýǧÐдúÂë¡£
ÎÒÃÇÔÚÏÂһƪ½Ì³ÌËù×öµÄÊÇ£¬¹¹½¨Ò»¸öÏà¶Ô¼òµ¥µÄÊý¾Ý¼¯Éú³ÉÆ÷£¬¸ù¾ÝÎÒÃǵIJÎÊýÀ´Éú³ÉÊý¾Ý¡£ÎÒÃÇ¿ÉÒÔʹÓÃËüÀ´°´ÕÕÒâÔ¸²Ù×÷Êý¾Ý£¬Ö®ºó¶ÔÕâЩÊý¾Ý¼¯²âÊÔÎÒÃǵÄËã·¨£¬¸ù¾ÝÎÒÃǵļÙÉèÐ޸IJÎÊý£¬Ó¦¸Ã»á²úÉúһЩӰÏì¡£ÎÒÃÇÖ®ºó¿ÉÒÔ½«ÎÒÃǵļÙÉèºÍÕæÊµÇé¿ö±È½Ï£¬²¢Ï£ÍûËûÃÇÆ¥Åä¡£ÕâÀïµÄÀý×ÓÖУ¬¼ÙÉèÊÇÎÒÃÇÕýÈ·±àдÕâЩËã·¨£¬²¢ÇÒÅж¨ÏµÊýµÍµÄÔÒòÊÇ£¬y
ÖµµÄ·½²îÌ«´óÁË¡£ÎÒÃÇ»áÔÚÏÂÒ»¸ö½Ì³ÌÖÐÑéÖ¤Õâ¸ö¼ÙÉè¡£
Creating Sample Datasets for Testing
Welcome to the twelfth tutorial. We've learned about regression and even written our own simple linear regression algorithm. We've also built a coefficient-of-determination algorithm to check the accuracy and reliability of the best-fit line. We've discussed and shown before that the best-fit line may not be a great fit, and explained why our example was right in direction even if it wasn't accurate. Now, though, we're working with two top-level algorithms that are themselves composed of several smaller algorithms. As we keep building up this kind of algorithm hierarchy, a small mistake in any one of them will get us into trouble, so we intend to verify our assumptions.
In the programming world, systematic testing of a program is usually called "unit testing". This is how large programs are built: each small subsystem is checked over and over. As a large program gets upgraded and updated, pieces that conflict with earlier systems can be easily removed. With machine learning this is also a concern, but our main focus here is just testing our own assumptions. Eventually you should be wise enough to create unit tests for your entire machine learning system, but for now, we want to keep things as simple as possible.
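To illustrate what a unit test for one of our pieces could look like, here is a minimal sketch using Python's unittest module (the test cases and values are my own illustrations, not from the original code); it asserts the two mathematical edge cases of the coefficient of determination:

```python
import unittest
import numpy as np
from statistics import mean

def coefficient_of_determination(ys_orig, ys_line):
    y_mean_line = [mean(ys_orig) for y in ys_orig]
    se_regr = sum((ys_line - ys_orig) * (ys_line - ys_orig))
    se_mean = sum((y_mean_line - ys_orig) * (y_mean_line - ys_orig))
    return 1 - (se_regr / se_mean)

class TestRSquared(unittest.TestCase):
    def test_perfect_fit_scores_one(self):
        ys = np.array([2.0, 4.0, 6.0], dtype=np.float64)
        self.assertEqual(coefficient_of_determination(ys, ys), 1.0)

    def test_mean_line_scores_zero(self):
        ys = np.array([2.0, 4.0, 6.0], dtype=np.float64)
        mean_line = np.array([4.0, 4.0, 4.0], dtype=np.float64)
        self.assertEqual(coefficient_of_determination(ys, mean_line), 0.0)

# Run the tests in-process rather than via the command line.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestRSquared)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.wasSuccessful())  # True
```

If a later refactor of the R squared code broke a sign or swapped the numerator and denominator, a test like this would catch it immediately.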
Our assumption is that we've created a best-fit line and then measured it with the coefficient of determination. We know (mathematically) that the lower the R squared value, the worse the best-fit line, and the higher it is (the closer to 1), the better. Our assumption is that we've built a system that behaves this way; our system has many parts, and even one small operational error would cause big trouble. So how do we test the algorithms' behavior and make sure everything works as expected?
The idea is to create a sample dataset that we ourselves define: if we make a positively correlated dataset, the correlation should be very strong; if we make the correlation weak, the points shouldn't be very tight. We can easily eyeball such a line ourselves, but the machine should do better. Let's build a system that generates example data, with parameters we can adjust.
To start, we'll build a skeleton function that reflects our end goal:
def create_dataset(hm, variance, step=2, correlation=False):
    # xs and ys don't exist yet -- the body gets filled in below
    return np.array(xs, dtype=np.float64), np.array(ys, dtype=np.float64)
Looking at the start of the function, it takes the following parameters:
1. hm (how much): how many data points to generate. We could choose 10, for example, or ten million.
2. variance: how much each data point can vary from the previous point. The more variance, the less tight the data.
3. step: how far to step the value per point, on average; defaults to 2.
4. correlation: can be False, 'pos', or 'neg' for no correlation, positive correlation, or negative correlation.
Note that we'll also import random, which will help us generate a (pseudo)random dataset.
Now let's start filling in the function:
def create_dataset(hm, variance, step=2, correlation=False):
    val = 1
    ys = []
    for i in range(hm):
        y = val + random.randrange(-variance, variance)
        ys.append(y)
Very simple: we just use the hm variable to iterate over the range we chose, appending the current value plus a random number between negative variance and positive variance. This gives us data, but so far no correlation even if we want one. Let's add that:
def create_dataset(hm, variance, step=2, correlation=False):
    val = 1
    ys = []
    for i in range(hm):
        y = val + random.randrange(-variance, variance)
        ys.append(y)
        if correlation and correlation == 'pos':
            val += step
        elif correlation and correlation == 'neg':
            val -= step
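One detail of random.randrange worth knowing (my note, not from the original text): the upper bound is exclusive, so randrange(-variance, variance) can return -variance but never variance itself, and a variance of 0 makes the range empty:

```python
import random

random.seed(0)  # fixed seed just so the demonstration is reproducible

# randrange(-5, 5) yields integers from -5 up to 4, never 5.
samples = [random.randrange(-5, 5) for _ in range(1000)]
print(min(samples), max(samples))  # stays within -5 .. 4

# variance = 0 would raise: randrange(0, 0) is an empty range.
try:
    random.randrange(0, 0)
except ValueError:
    print("empty range for variance = 0")
```

Neither quirk matters for the experiments below, but it's worth keeping in mind when choosing variance values.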
Great, now the y values are defined. Next, let's create the xs, which is even easier: they just count up, and then we return everything.
def create_dataset(hm, variance, step=2, correlation=False):
    val = 1
    ys = []
    for i in range(hm):
        y = val + random.randrange(-variance, variance)
        ys.append(y)
        if correlation and correlation == 'pos':
            val += step
        elif correlation and correlation == 'neg':
            val -= step
    xs = [i for i in range(len(ys))]
    return np.array(xs, dtype=np.float64), np.array(ys, dtype=np.float64)
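Before wiring the generator into the regression code, we can quickly check its output shapes and types (a small sketch of my own, with a fixed seed for reproducibility):

```python
import random
import numpy as np

def create_dataset(hm, variance, step=2, correlation=False):
    val = 1
    ys = []
    for i in range(hm):
        y = val + random.randrange(-variance, variance)
        ys.append(y)
        if correlation and correlation == 'pos':
            val += step
        elif correlation and correlation == 'neg':
            val -= step
    xs = [i for i in range(len(ys))]
    return np.array(xs, dtype=np.float64), np.array(ys, dtype=np.float64)

random.seed(42)
xs, ys = create_dataset(10, 5, 2, correlation='pos')

print(len(xs), len(ys))  # 10 10
print(xs[0], xs[-1])     # 0.0 9.0
print(ys.dtype)          # float64
```

The xs always count up from 0, so only the ys carry the randomness and the correlation.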
And we're ready. To create a sample dataset, all we need is:
xs, ys = create_dataset(40, 40, 2, correlation='pos')
Let's put this together with the code from the previous linear regression tutorials:
from statistics import mean
import numpy as np
import random
import matplotlib.pyplot as plt
from matplotlib import style

style.use('ggplot')

def create_dataset(hm, variance, step=2, correlation=False):
    val = 1
    ys = []
    for i in range(hm):
        y = val + random.randrange(-variance, variance)
        ys.append(y)
        if correlation and correlation == 'pos':
            val += step
        elif correlation and correlation == 'neg':
            val -= step
    xs = [i for i in range(len(ys))]
    return np.array(xs, dtype=np.float64), np.array(ys, dtype=np.float64)

def best_fit_slope_and_intercept(xs, ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)*mean(xs)) - mean(xs*xs)))
    b = mean(ys) - m*mean(xs)
    return m, b

def coefficient_of_determination(ys_orig, ys_line):
    y_mean_line = [mean(ys_orig) for y in ys_orig]
    squared_error_regr = sum((ys_line - ys_orig) * (ys_line - ys_orig))
    squared_error_y_mean = sum((y_mean_line - ys_orig) * (y_mean_line - ys_orig))
    print(squared_error_regr)
    print(squared_error_y_mean)
    r_squared = 1 - (squared_error_regr / squared_error_y_mean)
    return r_squared

xs, ys = create_dataset(40, 40, 2, correlation='pos')
m, b = best_fit_slope_and_intercept(xs, ys)
regression_line = [(m*x)+b for x in xs]

r_squared = coefficient_of_determination(ys, regression_line)
print(r_squared)

plt.scatter(xs, ys, color='#003F72', label='data')
plt.plot(xs, regression_line, label='regression line')
plt.legend(loc=4)
plt.show()
Run the code and you'll see something like:

The coefficient of determination is 0.516508576011 (note that your result won't be identical, since we're drawing from a random range of numbers).
Not bad. So our hypothesis is: if we generate a more tightly correlated dataset, our R squared, or coefficient of determination, should be better. How do we achieve that? Simple: turn the variance down.
xs, ys = create_dataset(40, 10, 2, correlation='pos')

Now our R squared value is 0.939865240568, which is quite good, just as expected. Let's test negative correlation:
xs, ys = create_dataset(40, 10, 2, correlation='neg')

The R squared value is 0.930242442156, just as good as before, since the parameters are the same and only the direction differs.
ÕâÀÎÒÃǵļÙÉè֤ʵÁË£º±ä»¯Ô½Ð¡ R ÖµºÍÅж¨ÏµÊýÔ½¸ß£¬±ä»¯Ô½´ó
R ÖµÔ½µÍ¡£Èç¹ûÊDz»Ïà¹ØÄØ£¿Ó¦¸ÃºÜµÍ£¬½Ó½üÓÚ 0£¬³ý·ÇÎÒÃǵÄËæ»úÊýÅÅÁÐʵ¼ÊÉÏÓÐÏà¹ØÐÔ¡£ÈÃÎÒÃDzâÊÔ£º
xs, ys = create_dataset(40, 10, 2, correlation=False)
The coefficient of determination is 0.0152650900427.
At this point, I think we should be feeling confident, since everything has behaved as we expected.
¼ÈÈ»ÎÒÃÇÒѾ¶Ô¼òµ¥µÄÏßÐԻعéºÜÊìϤÁË£¬Ï¸ö½Ì³ÌÖÐÎÒÃÇ¿ªÊ¼½²½â·ÖÀà¡£