Python Machine Learning Tutorial: Regression

 
Source: blog, published 2017-7-10
 

Introduction and Data

»¶Ó­ÔĶÁ Python »úÆ÷ѧϰϵÁн̵̳Ļع鲿·Ö¡£ÕâÀÄãÓ¦¸ÃÒѾ­°²×°ÁË Scikit-Learn¡£Èç¹ûûÓУ¬°²×°Ëü£¬ÒÔ¼° Pandas ºÍ Matplotlib¡£

pip install numpy

pip install scipy

pip install scikit-learn

pip install matplotlib

pip install pandas

³ýÁËÕâЩ½Ì³Ì·¶Î§µÄµ¼ÈëÖ®Í⣬ÎÒÃÇ»¹ÒªÔÚÕâÀïʹÓà Quandl£º

pip install quandl

First, what is regression as we'll use it in machine learning? Its goal is to take continuous data, find the equation that best fits the data, and make predictions for specific values. With simple linear regression, you do this by creating a best-fit line.

ÕâÀÎÒÃÇ¿ÉÒÔʹÓÃÕâÌõÖ±Ïߵķ½³Ì£¬À´Ô¤²âδÀ´µÄ¼Û¸ñ£¬ÆäÖÐÈÕÆÚÊÇ x Öá¡£

A popular use of regression is predicting stock prices. We can do this because we consider how prices flow over time, and use a continuous dataset to try to predict the next price in that flow.

»Ø¹éÊǼලµÄ»úÆ÷ѧϰµÄÒ»ÖÖ£¬Ò²¾ÍÊÇ˵£¬¿ÆÑ§¼ÒÏòÆäÕ¹Ê¾ÌØÕ÷£¬Ö®ºóÏòÆäչʾÕýÈ·´ð°¸À´½Ì»á»úÆ÷¡£Ò»µ©½Ì»áÁË»úÆ÷£¬¿ÆÑ§¼Ò¾ÍÄܹ»Ê¹ÓÃһЩ²»¿É¼ûµÄÊý¾ÝÀ´²âÊÔ»úÆ÷£¬ÆäÖпÆÑ§¼ÒÖªµÀÕýÈ·´ð°¸£¬µ«ÊÇ»úÆ÷²»ÖªµÀ¡£»úÆ÷µÄ´ð°¸»áÓëÒÑÖª´ð°¸¶Ô±È£¬²¢ÇÒ¶ÈÁ¿»úÆ÷µÄ׼ȷÂÊ¡£Èç¹û׼ȷÂÊ×ã¹»¸ß£¬¿ÆÑ§¼Ò¾Í»á¿¼Âǽ«ÆäËã·¨ÓÃÓÚÕæÊµÊÀ½ç¡£

ÓÉÓڻعé¹ã·ºÓÃÓÚ¹ÉÆ±¼Û¸ñ£¬ÎÒÃÇ¿ÉÒÔʹÓÃÒ»¸öʾÀý´ÓÕâÀ↑ʼ¡£×ʼ£¬ÎÒÃÇÐèÒªÊý¾Ý¡£ÓÐʱºòÊý¾ÝÒ×ÓÚ»ñÈ¡£¬ÓÐʱÄãÐèÒª³öÈ¥²¢Ç××ÔÊÕ¼¯¡£ÎÒÃÇÕâÀÎÒÃÇÖÁÉÙÄܹ»ÒÔ¼òµ¥µÄ¹ÉƱ¼Û¸ñºÍ³É½»Á¿ÐÅÏ¢¿ªÊ¼£¬ËüÃÇÀ´×Ô Quandl¡£ÎÒÃÇ»áץȡ Google µÄ¹ÉƱ¼Û¸ñ£¬ËüµÄ´úÂëÊÇGOOGL£º

import pandas as pd
import quandl

df = quandl.get("WIKI/GOOGL")
print(df.head())

 

×¢Ò⣺дÕâÆªÎÄÕµÄʱºò£¬Quandl µÄÄ£¿éʹÓôóд Q ÒýÓ㬵«ÏÖÔÚÊÇСд q£¬ËùÒÔimport quandl¡£

µ½ÕâÀÎÒÃÇÓµÓУº

Open High Low Close Volume Ex-Dividend \
Date
2004-08-19 100.00 104.06 95.96 100.34 44659000 0
2004-08-20 101.01 109.08 100.50 108.31 22834300 0
2004-08-23 110.75 113.48 109.05 109.40 18256100 0
2004-08-24 111.24 111.60 103.57 104.87 15247300 0
2004-08-25 104.96 108.00 103.88 106.00 9188600 0

Split Ratio Adj. Open Adj. High Adj. Low Adj. Close \
Date
2004-08-19 1 50.000 52.03 47.980 50.170
2004-08-20 1 50.505 54.54 50.250 54.155
2004-08-23 1 55.375 56.74 54.525 54.700
2004-08-24 1 55.620 55.80 51.785 52.435
2004-08-25 1 52.480 54.00 51.940 53.000

Adj. Volume
Date
2004-08-19 44659000
2004-08-20 22834300
2004-08-23 18256100
2004-08-24 15247300
2004-08-25 9188600

 

ÕâÊǸö·Ç³£ºÃµÄ¿ªÊ¼£¬ÎÒÃÇÓµÓÐÁËÊý¾Ý£¬µ«ÊÇÓеã¶àÁË¡£

ÕâÀÎÒÃÇÓкܶàÁУ¬Ðí¶à¶¼ÊǶàÓàµÄ£¬»¹ÓÐЩ²»Ôõô±ä»¯¡£ÎÒÃÇ¿ÉÒÔ¿´µ½£¬³£¹æºÍÐÞÕý£¨Adj£©µÄÁÐÊÇÖØ¸´µÄ¡£ÐÞÕýµÄÁп´ÆðÀ´¸ü¼ÓÀíÏë¡£³£¹æµÄÁÐÊǵ±ÌìµÄ¼Û¸ñ£¬µ«ÊÇ¹ÉÆ±Óиö½Ð×ö·Ö²ðµÄ¶«Î÷£¬ÆäÖÐÒ»¹ÉͻȻ¾Í±ä³ÉÁËÁ½¹É£¬ËùÒÔÒ»¹ÉµÄ¼Û¸ñÒª¼õ°ë£¬µ«Êǹ«Ë¾µÄ¼ÛÖµ²»±ä¡£ÐÞÕýµÄÁÐΪ¹ÉƱ·Ö²ð¶øµ÷Õû£¬ÕâʹµÃËüÃǶÔÓÚ·ÖÎö¸ü¼Ó¿É¿¿¡£

ËùÒÔ£¬ÈÃÎÒÃǼÌÐø£¬Ï÷¼õԭʼµÄ DataFrame¡£

df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]

Now we have the adjusted columns plus volume. A few things to note: many people talk about, or hear about, machine learning as if it were black magic out of nothing. Machine learning can highlight patterns in existing data, but the data has to exist first, and it needs to be meaningful. How do you know whether it's meaningful? My best advice is to simply use your brain. Think about it: do historical prices determine future prices? Some people think so, but over time this has been proven wrong. What about historical patterns? They carry some meaning when they stand out (and machine learning can help here), but they're still weak. What about the relationship between price change and volume over time, combined with historical patterns? Probably a bit better. So, as you can already see, more data isn't automatically better; we need data that's actually useful. At the same time, the raw data should undergo some transformations.

¿¼ÂÇÿÈÕ²¨¶¯£¬ÀýÈç×î¸ß¼Û¼õ×îµÍ¼ÛµÄ°Ù·Ö±È²îÖµÈçºÎ£¿Ã¿Èյİٷֱȱ仯ÓÖÈçºÎÄØ£¿Äã¾õµÃOpen, High, Low, CloseÕâÖÖ¼òµ¥Êý¾Ý£¬»¹ÊÇClose, Spread/Volatility, %change daily¸üºÃ£¿ÎÒ¾õµÃºóÕ߸üºÃÒ»µã¡£Ç°Õß¶¼ÊǷdz£ÏàËÆµÄÊý¾Ýµã£¬ºóÕß»ùÓÚǰÕßµÄͳһÊý¾Ý´´½¨£¬µ«ÊÇ´øÓиü¼ÓÓмÛÖµµÄÐÅÏ¢¡£

So, not all of the data you have is useful, and sometimes you need to do further processing on your data to make it more valuable before feeding it to a machine learning algorithm. Let's go ahead and transform our data:

df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0

This creates a new column: the high-minus-low range as a percentage of the closing price, which is our rough measure of volatility. Next, we calculate the daily percent change:

df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0

ÏÖÔÚÎÒÃǻᶨÒåÒ»¸öÐ嵀 DataFrame£º

df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
print(df.head())

Adj. Close HL_PCT PCT_change Adj. Volume
Date
2004-08-19 50.170 8.072553 0.340000 44659000
2004-08-20 54.155 7.921706 7.227007 22834300
2004-08-23 54.700 4.049360 -1.218962 18256100
2004-08-24 52.435 7.657099 -5.726357 15247300
2004-08-25 53.000 3.886792 0.990854 9188600

Features and Labels

»ùÓÚÉÏһƪ»úÆ÷ѧϰ»Ø¹é½Ì³Ì£¬ÎÒÃǽ«Òª¶ÔÎÒÃÇµÄ¹ÉÆ±¼Û¸ñÊý¾ÝÖ´Ðлع顣ĿǰµÄ´úÂ룺

import quandl
import pandas as pd

df = quandl.get("WIKI/GOOGL")
df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0
df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
print(df.head())

At this point we've acquired the data, decided what's valuable, and created some new features through transformations. We're now ready to begin the machine learning process with regression. First, we need a few more imports. All of the imports are:

import quandl, math
import numpy as np
import pandas as pd
from sklearn import preprocessing, cross_validation, svm
from sklearn.linear_model import LinearRegression

ÎÒÃÇ»áʹÓÃnumpyÄ£¿éÀ´½«Êý¾Ýת»»Îª NumPy Êý×飬ËüÊÇ Sklearn µÄÔ¤ÆÚ¡£ÎÒÃÇÔÚÓõ½preprocessingºÍcross_validationʱ£¬»áÉîÈë̸ÂÛËûÃÇ£¬µ«ÊÇÔ¤´¦ÀíÊÇÓÃÓÚÔÚ»úÆ÷ѧϰ֮ǰ£¬¶ÔÊý¾ÝÇåÏ´ºÍËõ·ÅµÄÄ£¿é¡£½»²æÑéÖ¤ÔÚ²âÊÔ½×¶ÎʹÓá£×îºó£¬ÎÒÃÇÒ²´Ó Sklearn µ¼ÈëÁËLinearRegressionËã·¨£¬ÒÔ¼°svm¡£ËüÃÇÓÃ×÷ÎÒÃǵĻúÆ÷ѧϰËã·¨À´Õ¹Ê¾½á¹û¡£

ÕâÀÎÒÃÇÒѾ­»ñÈ¡ÁËÎÒÃÇÈÏΪÓÐÓõÄÊý¾Ý¡£ÕæÊµµÄ»úÆ÷ѧϰÈçºÎ¹¤×÷ÄØ£¿Ê¹Óüලʽѧϰ£¬ÄãÐèÒªÌØÕ÷ºÍ±êÇ©¡£ÌØÕ÷¾ÍÊÇÃèÊöÐÔÊôÐÔ£¬±êÇ©¾ÍÊÇÄã³¢ÊÔÔ¤²âµÄ½á¹û¡£ÁíÒ»¸ö³£¼ûµÄ»Ø¹éʾÀý¾ÍÊdz¢ÊÔΪij¸öÈËÔ¤²â±£Ïյı£·Ñ¡£±£ÏÕ¹«Ë¾»áÊÕ¼¯ÄãµÄÄêÁä¡¢¼ÝʻΥ¹æÐÐΪ¡¢¹«¹²·¸×ï¼Ç¼£¬ÒÔ¼°ÄãµÄÐÅÓÃÆÀ·Ö¡£¹«Ë¾»áʹÓÃÀϿͻ§£¬»ñÈ¡Êý¾Ý£¬²¢µÃ³öÓ¦¸Ã¸ø¿Í»§µÄ¡°ÀíÏë±£·Ñ¡±£¬»òÕßÈç¹ûËûÃǾõµÃÓÐÀû¿ÉͼµÄ»°£¬ËûÃÇ»áʹÓÃʵ¼ÊʹÓõĿͻ§¡£

So, for training a machine learning classifier, the features are the customer attributes, and the label is the premium associated with those attributes.

ÎÒÃÇÕâÀʲôÊÇÌØÕ÷ºÍ±êÇ©ÄØ£¿ÎÒÃdz¢ÊÔÔ¤²â¼Û¸ñ£¬ËùÒÔ¼Û¸ñ¾ÍÊDZêÇ©£¿Èç¹ûÕâÑù£¬Ê²Ã´ÊÇÌØÕ÷ÄØ£¿¶ÔÓÚÔ¤²âÎÒÃǵļ۸ñÀ´Ëµ£¬ÎÒÃǵıêÇ©£¬¾ÍÊÇÎÒÃÇ´òËãÔ¤²âµÄ¶«Î÷£¬Êµ¼ÊÉÏÊÇδÀ´¼Û¸ñ¡£ÕâÑù£¬ÎÒÃǵÄÌØÕ÷ʵ¼ÊÉÏÊÇ£ºµ±Ç°¼Û¸ñ¡¢HL °Ù·Ö±ÈºÍ°Ù·Ö±È±ä»¯¡£±êÇ©¼Û¸ñÊÇδÀ´Ä³¸öµãµÄ¼Û¸ñ¡£ÈÃÎÒÃǼÌÐøÌí¼ÓеÄÐУº

forecast_col = 'Adj. Close'
df.fillna(value=-99999, inplace=True)
forecast_out = int(math.ceil(0.01 * len(df)))

ÕâÀÎÒÃǶ¨ÒåÁËÔ¤²âÁУ¬Ö®ºóÎÒÃǽ«ÈκΠNaN Êý¾ÝÌî³äΪ -99999¡£¶ÔÓÚÈçºÎ´¦ÀíȱʧÊý¾Ý£¬ÄãÓÐһЩѡÔñ£¬Äã²»Äܽö½ö½« NaN£¨²»ÊÇÊýÖµ£©Êý¾Ýµã´«¸ø»úÆ÷ѧϰ·ÖÀàÎ÷£¬ÄãÐèÒª´¦ÀíËü¡£Ò»¸öÖ÷Á÷Ñ¡Ïî¾ÍÊǽ«È±Ê§ÖµÌî³äΪ -99999¡£ÔÚÐí¶à»úÆ÷ѧϰ·ÖÀàÆ÷ÖУ¬»á½«ÆäÊDZ»ÎªÀëȺµã¡£ÄãÒ²¿ÉÒÔ½ö½ö¶ªÆú°üº¬È±Ê§ÖµµÄËùÓÐÌØÕ÷»ò±êÇ©£¬µ«ÊÇÕâÑùÄã¿ÉÄܻᶪµô´óÁ¿µÄÊý¾Ý¡£

ÕæÊµÊÀ½çÖУ¬Ðí¶àÊý¾Ý¼¯¶¼ºÜ»ìÂÒ¡£¶àÊý¹É¼Û»ò³É½»Á¿Êý¾Ý¶¼ºÜ¸É¾»£¬ºÜÉÙÓÐȱʧÊý¾Ý£¬µ«ÊÇÐí¶àÊý¾Ý¼¯»áÓдóÁ¿È±Ê§Êý¾Ý¡£ÎÒ¼û¹ýһЩÊý¾Ý¼¯£¬´óÁ¿µÄÐк¬ÓÐȱʧÊý¾Ý¡£Äã²¢²»Ò»¶¨ÏëҪʧȥËùÓв»´íµÄÊý¾Ý£¬Èç¹ûÄãµÄÑùÀýÊý¾ÝÓÐһЩȱʧ£¬Äã¿ÉÄÜ»á²Â²âÕæÊµÊÀ½çµÄÓÃÀýÒ²ÓÐһЩȱʧ¡£ÄãÐèҪѵÁ·¡¢²âÊÔ²¢ÒÀÀµÏàͬÊý¾Ý£¬ÒÔ¼°Êý¾ÝµÄÌØÕ÷¡£

×îºó£¬ÎÒÃǶ¨ÒåÎÒÃÇÐèÒªÔ¤²âµÄ¶«Î÷¡£Ðí¶àÇé¿öÏ£¬¾ÍÏñ³¢ÊÔÔ¤²â¿Í»§µÄ±£·ÑµÄ°¸ÀýÖУ¬Äã½ö½öÐèÒªÒ»¸öÊý×Ö£¬µ«ÊǶÔÓÚÔ¤²âÀ´Ëµ£¬ÄãÐèÒªÔ¤²âÖ¸¶¨ÊýÁ¿µÄÊý¾Ýµã¡£ÎÒÃǼÙÉèÎÒÃÇ´òËãÔ¤²âÊý¾Ý¼¯Õû¸ö³¤¶ÈµÄ 1%¡£Òò´Ë£¬Èç¹ûÎÒÃǵÄÊý¾ÝÊÇ 100 ÌìµÄ¹ÉƱ¼Û¸ñ£¬ÎÒÃÇÐèÒªÄܹ»Ô¤²âδÀ´Ò»ÌìµÄ¼Û¸ñ¡£Ñ¡ÔñÄãÏëÒªµÄÄǸö¡£Èç¹ûÄãÖ»Êdz¢ÊÔÔ¤²âÃ÷ÌìµÄ¼Û¸ñ£¬ÄãÓ¦¸ÃѡȡһÌìÖ®ºóµÄÊý¾Ý£¬¶øÇÒÒ²Ö»ÄÜÒ»ÌìÖ®ºóµÄÊý¾Ý¡£Èç¹ûÄã´òËãÔ¤²â 10 Ì죬ÎÒÃÇ¿ÉÒÔΪÿһÌìÉú³ÉÒ»¸öÔ¤²â¡£

ÎÒÃÇÕâÀÎÒÃǾö¶¨ÁË£¬ÌØÕ÷ÊÇһϵÁе±Ç°Öµ£¬±êÇ©ÊÇδÀ´µÄ¼Û¸ñ£¬ÆäÖÐδÀ´ÊÇÊý¾Ý¼¯Õû¸ö³¤¶ÈµÄ 1%¡£ÎÒÃǼÙÉèËùÓе±Ç°Áж¼ÊÇÎÒÃǵÄÌØÕ÷£¬ËùÒÔÎÒÃÇʹÓÃÒ»¸ö¼òµ¥µÄ Pnadas ²Ù×÷Ìí¼ÓÒ»¸öеÄÁУº

df['label'] = df[forecast_col].shift(-forecast_out)
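To see what the shift is doing, here's a toy sketch (illustrative values only): shift(-n) pulls the value from n rows ahead into each row, so each row's label becomes the price n days in the future, and the final n rows are left with NaN labels:

```python
import pandas as pd

# Five toy "closing prices"; shift(-2) makes each row's label the
# price from 2 rows ahead. The last 2 rows get NaN labels.
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
labels = prices.shift(-2)
print(labels.tolist())  # [12.0, 13.0, 14.0, nan, nan]
```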

Now we have the data, comprising features and labels. Before we actually run anything, there are a few preprocessing and final steps, which we'll focus on in the next tutorial.

ѵÁ·ºÍ²âÊÔ

»¶Ó­ÔĶÁ Python »úÆ÷ѧϰϵÁн̵̳ĵÚËIJ¿·Ö¡£ÔÚÉÏÒ»¸ö½Ì³ÌÖУ¬ÎÒÃÇ»ñÈ¡Á˳õʼÊý¾Ý£¬°´ÕÕÎÒÃǵÄϲºÃ²Ù×÷ºÍת»»Êý¾Ý£¬Ö®ºóÎÒÃǶ¨ÒåÁËÎÒÃǵÄÌØÕ÷¡£Scikit ²»ÐèÒª´¦Àí Pandas ºÍ DataFrame£¬ÎÒ³öÓÚ×Ô¼ºµÄϲºÃ¶ø´¦ÀíËü£¬ÒòΪËü¿ì²¢ÇÒ¸ßЧ¡£·´Ö®£¬Sklearn ʵ¼ÊÉÏÐèÒª NumPy Êý×é¡£Pandas µÄ DataFrame ¿ÉÒÔÇáÒ×ת»»Îª NumPy Êý×飬ËùÒÔÊÂÇé¾ÍÊÇÕâÑùµÄ¡£

ĿǰΪֹÎÒÃǵĴúÂ룺

import quandl, math
import numpy as np
import pandas as pd
from sklearn import preprocessing, cross_validation, svm
from sklearn.linear_model import LinearRegression

df = quandl.get("WIKI/GOOGL")

print(df.head())
#print(df.tail())

df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]

df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0

df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
print(df.head())

forecast_col = 'Adj. Close'
df.fillna(value=-99999, inplace=True)
forecast_out = int(math.ceil(0.01 * len(df)))

df['label'] = df[forecast_col].shift(-forecast_out)

Next, we drop any rows that still contain NaN information:

df.dropna(inplace=True)

¶ÔÓÚ»úÆ÷ѧϰÀ´Ëµ£¬Í¨³£Òª¶¨ÒåX£¨´óд£©×÷ÎªÌØÕ÷£¬ºÍy£¨Ð¡Ð´£©×÷Ϊ¶ÔÓÚÌØÕ÷µÄ±êÇ©¡£ÕâÑù£¬ÎÒÃÇ¿ÉÒÔ¶¨ÒåÎÒÃǵÄÌØÕ÷ºÍ±êÇ©£¬ÏñÕâÑù£º

X = np.array(df.drop(['label'], 1))
y = np.array(df['label'])

Above, what we've done is define X (the features) as our entire DataFrame except the label column, converted to a NumPy array. We use the drop method, which can be applied to a DataFrame and returns a new DataFrame. Next, we define our y variable, our label, as simply the label column of the DataFrame, converted to a NumPy array.

ÏÖÔÚÎÒÃǾÍÄܸæÒ»¶ÎÂ䣬תÏòѵÁ·ºÍ²âÊÔÁË£¬µ«ÊÇÎÒÃÇ´òËã×öһЩԤ´¦Àí¡£Í¨³££¬ÄãÏ£ÍûÄãµÄÌØÕ÷ÔÚ -1 µ½ 1 µÄ·¶Î§ÄÚ¡£Õâ¿ÉÄܲ»Æð×÷Ó㬵«ÊÇͨ³£»á¼ÓËÙ´¦Àí¹ý³Ì£¬²¢ÓÐÖúÓÚ׼ȷÐÔ¡£ÒòΪ´ó¼Ò¶¼Ê¹ÓÃÕâ¸ö·¶Î§£¬Ëü°üº¬ÔÚÁË Sklearn µÄpreprocessingÄ£¿éÖС£ÎªÁËʹÓÃËü£¬ÄãÐèÒª¶ÔÄãµÄX±äÁ¿µ÷ÓÃpreprocessing.scale¡£

X = preprocessing.scale(X)

Next, create the label, y:

y = np.array(df['label'])

ÏÖÔÚ¾ÍÊÇѵÁ·ºÍ²âÊÔµÄʱºòÁË¡£·½Ê½¾ÍÊÇѡȡ 75% µÄÊý¾ÝÓÃÓÚѵÁ·»úÆ÷ѧϰ·ÖÀàÆ÷¡£Ö®ºóѡȡʣÏ嵀 25% µÄÊý¾ÝÓÃÓÚ²âÊÔ·ÖÀàÆ÷¡£ÓÉÓÚÕâÊÇÄãµÄÑùÀýÊý¾Ý£¬ÄãÓ¦¸ÃÓµÓÐÌØÕ÷ºÍÒ»Ö±±êÇ©¡£Òò´Ë£¬Èç¹ûÄã²âÊÔºó 25% µÄÊý¾Ý£¬Äã¾Í»áµÃµ½Ò»ÖÖ׼ȷ¶ÈºÍ¿É¿¿ÐÔ£¬½Ð×öÖÃÐŶȡ£ÓÐÐí¶à·½Ê½¿ÉÒÔʵÏÖËü£¬µ«ÊÇ£¬×îºÃµÄ·½Ê½¿ÉÄܾÍÊÇʹÓÃÄÚ½¨µÄcross_validation£¬ÒòΪËüÒ²»áΪÄã´òÂÒÊý¾Ý¡£´úÂëÊÇÕâÑù£º

X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.2)

The return values are the training set of features, the test set of features, the training labels, and the test labels. Now we're ready to define our classifier. Sklearn provides many common classifiers, some of which can be used for regression. We'll show a few in this example, but for now, let's use Support Vector Regression from the svm package:

clf = svm.SVR()

We're just using the defaults here to keep things simple, but you can learn more in the sklearn.svm.SVR documentation.

Ò»µ©Ä㶨ÒåÁË·ÖÀàÆ÷£¬Äã¾Í¿ÉÒÔѵÁ·ËüÁË¡£ÔÚ Sklearn ÖУ¬Ê¹ÓÃfitÀ´ÑµÁ·¡£

clf.fit(X_train, y_train)

ÕâÀÎÒÃÇÄâºÏÁËÎÒÃǵÄѵÁ·ÌØÕ÷ºÍѵÁ·±êÇ©¡£

ÎÒÃǵķÖÀàÆ÷ÏÖÔÚѵÁ·Íê±Ï¡£Õâ·Ç³£¼òµ¥£¬ÏÖÔÚÎÒÃÇ¿ÉÒÔ²âÊÔÁË¡£

confidence = clf.score(X_test, y_test)

This scores the classifier against the test data. Then:

print(confidence)
# 0.960075071072

ËùÒÔÕâÀÎÒÃÇ¿ÉÒÔ¿´µ½×¼È·Âʼ¸ºõÊÇ 96%¡£Ã»ÓÐʲô¿É˵µÄ£¬ÈÃÎÒÃdz¢ÊÔÁíÒ»¸ö·ÖÀàÆ÷£¬ÕâÒ»´ÎʹÓÃLinearRegression£º

clf = LinearRegression()
# 0.963311624499

A bit better, but basically the same. So as scientists, how do we know which algorithm to choose? Before long, you'll get a feel for what works and what doesn't in most situations. You can also check out the "choosing the right estimator" page on Scikit's website, which walks you through some basic options. If you ask people who do machine learning, though, it's really trial and error: you try a bunch of algorithms and just pick the best one. Another thing to note is that some algorithms must run linearly, and others can't. Don't confuse linear regression with running linearly. What does that mean? Some machine learning algorithms process one step at a time, with no threading; others can use threading and take advantage of the multiple cores on your machine. You could dig into each algorithm to figure out which ones can be threaded, or you can read the documentation and look for the n_jobs parameter. If it has n_jobs, the algorithm can be threaded for higher performance. If not, tough luck. So, if you're processing huge amounts of data, or a moderate amount at high speed, you'll probably want threading. Let's check our two algorithms.

Visit the sklearn.svm.SVR documentation and look through the parameters. Do you see n_jobs? I don't, so it can't use threading. As you'd find, on our small dataset the difference is minor, but with, say, a 20 MB dataset, the difference would be significant. Next, check the LinearRegression algorithm: do you see n_jobs? Sure enough, so here you can specify how many threads you want. If you pass in -1, the algorithm will use all available threads.

Like this:

clf = LinearRegression(n_jobs=-1)

That'll do it. While I've only asked you to do a small thing (check the documentation), let me point out a fact: just because machine learning algorithms work with default parameters doesn't mean you can ignore them. For example, let's revisit svm.SVR. SVR is Support Vector Regression, a kind of architecture for doing machine learning. I highly encourage anyone interested in learning more to research the topic and learn the fundamentals from people more educated than I am; I'll try to explain things simply, but I'm no expert. Back to the topic: svm.SVR has a parameter called kernel. What is this? A kernel is essentially a transformation of your data, used to make processing faster. In the case of svm.SVR, the default is rbf, which is one type of kernel; you have several choices. Check the documentation: you can choose 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed', or a callable. As with trying different ML algorithms, you can do whatever you like, so try out the different kernels:

for k in ['linear','poly','rbf','sigmoid']:
    clf = svm.SVR(kernel=k)
    clf.fit(X_train, y_train)
    confidence = clf.score(X_test, y_test)
    print(k, confidence)
linear 0.960075071072
poly 0.63712232551
rbf 0.802831714511
sigmoid -0.125347960903

We can see the linear kernel performed the best, followed by rbf, then poly; sigmoid was clearly useless and should be removed.

ËùÒÔÎÒÃÇѵÁ·²¢²âÊÔÁËÊý¾Ý¼¯¡£ÎÒÃÇÒѾ­ÓÐ 71% µÄÂúÒâ¶ÈÁË¡£ÏÂÃæÎÒÃÇ×öÊ²Ã´ÄØ£¿ÏÖÔÚÎÒÃÇÐèÒªÔÙ½øÒ»²½£¬×öһЩԤ²â£¬ÏÂÒ»Õ»áÉæ¼°Ëü¡£

Forecasting

»¶Ó­ÔĶÁ»úÆ÷ѧϰϵÁн̵̳ĵÚÎåÕ£¬µ±Ç°Éæ¼°µ½»Ø¹é¡£Ä¿Ç°ÎªÖ¹£¬ÎÒÃÇÊÕ¼¯²¢ÐÞ¸ÄÁËÊý¾Ý£¬ÑµÁ·²¢²âÊÔÁË·ÖÀàÆ÷¡£ÕâÒ»ÕÂÖУ¬ÎÒÃÇ´òËãʹÓÃÎÒÃǵķÖÀàÆ÷À´Êµ¼Ê×öһЩԤ²â¡£ÎÒÃÇĿǰËùʹÓõĴúÂëΪ£º

import quandl, math
import numpy as np
import pandas as pd
from sklearn import preprocessing, cross_validation, svm
from sklearn.linear_model import LinearRegression

df = quandl.get("WIKI/GOOGL")
df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0

df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
forecast_col = 'Adj. Close'
df.fillna(value=-99999, inplace=True)
forecast_out = int(math.ceil(0.01 * len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)

X = np.array(df.drop(['label'], 1))
X = preprocessing.scale(X)
X = X[:-forecast_out]
df.dropna(inplace=True)
y = np.array(df['label'])
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.2)

clf = LinearRegression(n_jobs=-1)
clf.fit(X_train, y_train)
confidence = clf.score(X_test, y_test)
print(confidence)

I'll stress that a linear model with better than 95% accuracy is not actually that great; I certainly wouldn't trade stocks with it. There are still issues to consider, especially that different companies have different price trajectories. Google is very linear, moving up and to the right; many companies are not, so keep that in mind. Now, to make forecasts, we need some data. We decided to forecast out 1% of the data, so we intend to, or at least can, predict on the final 1% of the dataset. When can we identify this data? We could now, but note that the data we're trying to predict against hasn't been scaled like the training set was. Okay, so what to do? Do we call preprocessing.scale() on just the last 1%? The scaling method is based on all of the known data fed into it. Ideally, you'd scale the training set, test set, and the data you forecast on all together. Is that always possible or reasonable? No. If you can do it, you should. In our case here, we can: our data is small enough, and processing time low enough, so we'll preprocess and scale all of the data in one go.

ÔÚÐí¶àÀý×ÓÖУ¬Äã²»ÄÜÕâô×ö¡£ÏëÏóÈç¹ûÄãʹÓü¸¸ö GB µÄÊý¾ÝÀ´ÑµÁ··ÖÀàÆ÷¡£ÑµÁ··ÖÀàÆ÷»á»¨·Ñ¼¸Ì죬²»ÄÜÔÚÿ´ÎÏëÒª×ö³öÔ¤²âµÄʱºò¶¼Õâô×ö¡£Òò´Ë£¬Äã¿ÉÄÜÐèÒª²»Ëõ·ÅÈκζ«Î÷£¬»òÕßµ¥¶ÀËõ·ÅÊý¾Ý¡£Í¨³££¬Äã¿ÉÄÜÏ£Íû²âÊÔÕâÁ½¸öÑ¡Ï²¢¿´¿´ÄǸö¶ÔÓÚÄãµÄÌØ¶¨°¸Àý¸üºÃ¡£

With that in mind, let's handle all of the rows at the point where we define X:

X = np.array(df.drop(['label'], 1))
X = preprocessing.scale(X)
X_lately = X[-forecast_out:]
X = X[:-forecast_out]

df.dropna(inplace=True)

y = np.array(df['label'])

X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.2)
clf = LinearRegression(n_jobs=-1)
clf.fit(X_train, y_train)
confidence = clf.score(X_test, y_test)
print(confidence)

ҪעÒâÎÒÃÇÊ×ÏÈ»ñÈ¡ËùÓÐÊý¾Ý£¬Ô¤´¦Àí£¬Ö®ºóÔٷָÎÒÃǵÄX_lately±äÁ¿°üº¬×î½üµÄÌØÕ÷£¬ÎÒÃÇÐèÒª¶ÔÆä½øÐÐÔ¤²â¡£Ä¿Ç°Äã¿ÉÒÔ¿´µ½£¬¶¨Òå·ÖÀàÆ÷¡¢ÑµÁ·¡¢ºÍ²âÊÔ¶¼·Ç³£¼òµ¥¡£Ô¤²â

forecast_set = clf.predict(X_lately)

Ò²·Ç³£¼òµ¥£º

forecast_setÊÇÔ¤²âÖµµÄÊý×飬±íÃ÷Äã²»½ö½ö¿ÉÒÔ×ö³öµ¥¸öÔ¤²â£¬»¹¿ÉÒÔÒ»´ÎÐÔÔ¤²â¶à¸öÖµ¡£¿´¿´ÎÒÃÇĿǰӵÓÐʲô£º

[ 745.67829395 737.55633261 736.32921413 717.03929303 718.59047951
731.26376715 737.84381394 751.28161162 756.31775293 756.76751056
763.20185946 764.52651181 760.91320031 768.0072636 766.67038016
763.83749414 761.36173409 760.08514166 770.61581391 774.13939706
768.78733341 775.04458624 771.10782342 765.13955723 773.93369548
766.05507556 765.4984563 763.59630529 770.0057166 777.60915879] 0.956987938167 30

ËùÒÔÕâЩ¾ÍÊÇÎÒÃǵÄÔ¤²â½á¹û£¬È»ºóÄØ£¿ÒѾ­»ù±¾Íê³ÉÁË£¬µ«ÊÇÎÒÃÇ¿ÉÒÔ½«Æä¿ÉÊÓ»¯¡£¹ÉƱ¼Û¸ñÊÇÿһÌìµÄ£¬Ò»ÖÜ 5 Ì죬ÖÜĩûÓС£ÎÒÖªµÀÕâ¸öÊÂʵ£¬µ«ÊÇÎÒÃÇ´òË㽫Æä¼ò»¯£¬°Ñÿ¸öÔ¤²âÖµµ±³ÉÿһÌìµÄ¡£Èç¹ûÄã´òËã´¦ÀíÖÜÄ©µÄ¼ä¸ô£¨²»ÒªÍüÁË¼ÙÆÚ£©£¬¾ÍÈ¥×ö°É£¬µ«ÊÇÎÒÕâÀï»á½«Æä¼ò»¯¡£×ʼ£¬ÎÒÃÇÌí¼ÓһЩеĵ¼È룺

import datetime
import matplotlib.pyplot as plt
from matplotlib import style

We import datetime to work with datetime objects, Matplotlib's pyplot package for graphing, and style to make our plots look decent. Let's set a style:

style.use('ggplot')

Next, we add a new column, the forecast column:

df['Forecast'] = np.nan

ÎÒÃÇÊ×ÏȽ«ÖµÉèÖÃΪ NaN£¬µ«ÊÇÎÒÃÇÖ®ºó»áÌî³äËû¡£

The labels of the forecast set begin exactly tomorrow. Since we forecast m = forecast_out days into the future, the label column is the closing price shifted forward by m days. That means the last m rows of the dataset can't be used for training or testing, because they have no label, so we use those last m rows as the forecast set. The first row of the forecast set is row n - m of the dataset, and its label would be the closing price of day n - m + m = n. Since today is day n - 1 in df, that label is tomorrow's price.

ÎÒÃÇÊ×ÏÈÐèҪץȡ DataFrame µÄ×îºóÒ»Ì죬½«Ã¿Ò»¸öеÄÔ¤²âÖµ¸³¸øÐµÄÈÕÆÚ¡£ÎÒÃÇ»áÕâÑù¿ªÊ¼¡£

last_date = df.iloc[-1].name
last_unix = last_date.timestamp()
one_day = 86400
next_unix = last_unix + one_day

Now we have the starting date of the forecast set, and one day is 86,400 seconds. Now we add the forecasts to the existing DataFrame:

for i in forecast_set:
    next_date = datetime.datetime.fromtimestamp(next_unix)
    next_unix += 86400
    df.loc[next_date] = [np.nan for _ in range(len(df.columns)-1)] + [i]

What we're doing here is iterating through the forecast set, taking each forecast and day, and setting those values in the DataFrame (making the forecast set's features NaN). The last line of code creates a row in the DataFrame where all elements are NaN except the last one, which is set to i (the forecast in this case). I chose this one-liner for loop so the code keeps working even if we change the DataFrame and features. All done? Let's plot it:

df['Adj. Close'].plot()
df['Forecast'].plot()
plt.legend(loc=4)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()

ÍêÕûµÄ´úÂ룺

import quandl, math
import numpy as np
import pandas as pd
from sklearn import preprocessing, cross_validation, svm
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from matplotlib import style
import datetime

style.use('ggplot')

df = quandl.get("WIKI/GOOGL")
df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0

df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
forecast_col = 'Adj. Close'
df.fillna(value=-99999, inplace=True)
forecast_out = int(math.ceil(0.01 * len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)

X = np.array(df.drop(['label'], 1))
X = preprocessing.scale(X)
X_lately = X[-forecast_out:]
X = X[:-forecast_out]

df.dropna(inplace=True)

y = np.array(df['label'])

X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.2)
clf = LinearRegression(n_jobs=-1)
clf.fit(X_train, y_train)
confidence = clf.score(X_test, y_test)

forecast_set = clf.predict(X_lately)
df['Forecast'] = np.nan

last_date = df.iloc[-1].name
last_unix = last_date.timestamp()
one_day = 86400
next_unix = last_unix + one_day

for i in forecast_set:
    next_date = datetime.datetime.fromtimestamp(next_unix)
    next_unix += 86400
    df.loc[next_date] = [np.nan for _ in range(len(df.columns)-1)] + [i]

df['Adj. Close'].plot()
df['Forecast'].plot()
plt.legend(loc=4)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()

The result:

Saving and Scaling

ÉÏһƪ½Ì³ÌÖУ¬ÎÒÃÇʹÓûعéÍê³ÉÁË¶Ô¹ÉÆ±¼Û¸ñµÄÔ¤²â£¬²¢Ê¹Óà Matplotlib ¿ÉÊÓ»¯¡£Õâ¸ö½Ì³ÌÖУ¬ÎÒÃÇ»áÌÖÂÛһЩ½ÓÏÂÀ´µÄ²½Öè¡£

ÎҼǵÃÎÒµÚÒ»´Î³¢ÊÔѧϰ»úÆ÷ѧϰµÄʱºò£¬¶àÊýʾÀý½ö½öÉæ¼°µ½ÑµÁ·ºÍ²âÊԵIJ¿·Ö£¬ÍêÈ«Ìø¹ýÁËÔ¤²â²¿·Ö¡£¶ÔÓÚÄÇЩ°üº¬ÑµÁ·¡¢²âÊÔºÍÔ¤²â²¿·ÖµÄ½Ì³ÌÀ´Ëµ£¬ÎÒûÓÐÕÒµ½Ò»Æª½âÊͱ£´æËã·¨µÄÎÄÕ¡£ÔÚÄÇЩÀý×ÓÖУ¬Êý¾Ýͨ³£·Ç³£Ð¡£¬ËùÒÔѵÁ·¡¢²âÊÔºÍÔ¤²â¹ý³Ì¶¼ºÜ¿ì¡£ÔÚÕæÊµÊÀ½çÖУ¬Êý¾Ý¶¼·Ç³£´ó£¬²¢ÇÒ»¨·Ñ¸ü³¤Ê±¼äÀ´´¦Àí¡£ÓÉÓÚûÓÐһƪ½Ì³ÌÕæÕý̸ÂÛµ½ÕâÒ»ÖØÒªµÄ¹ý³Ì£¬ÎÒ´òËã°üº¬Ò»Ð©´¦Àíʱ¼äºÍ±£´æËã·¨µÄÐÅÏ¢¡£

ËäÈ»ÎÒÃǵĻúÆ÷ѧϰ·ÖÀàÆ÷»¨·Ñ¼¸ÃëÀ´ÑµÁ·£¬ÔÚһЩÇé¿öÏ£¬ÑµÁ··ÖÀàÆ÷ÐèÒª¼¸¸öСʱÉõÖÁÊǼ¸Ìì¡£ÏëÏóÄãÏëÒªÔ¤²â¼Û¸ñµÄÿÌì¶¼ÐèÒªÕâô×ö¡£Õâ²»ÊDZØÒªµÄ£¬ÒòΪÎÒÃÇÄØ¿ÉÒÔʹÓà Pickle Ä£¿éÀ´±£´æ·ÖÀàÆ÷¡£Ê×ÏÈÈ·±£Äãµ¼ÈëÁËËü£º

import pickle

ʹÓà Pickle£¬Äã¿ÉÒÔ±£´æ Python ¶ÔÏ󣬾ÍÏñÎÒÃǵķÖÀàÆ÷ÄÇÑù¡£ÔÚ¶¨Ò塢ѵÁ·ºÍ²âÊÔÄãµÄ·ÖÀàÆ÷Ö®ºó£¬Ìí¼Ó£º

with open('linearregression.pickle','wb') as f:
    pickle.dump(clf, f)

Now run the script again, and you should have linearregression.pickle, the serialized data of the classifier. Now all you need to do is load the pickle file, assign it to clf, and use it as usual. For example:

pickle_in = open('linearregression.pickle','rb')
clf = pickle.load(pickle_in)

´úÂëÖУº

import quandl, math
import numpy as np
import pandas as pd
from sklearn import preprocessing, cross_validation, svm
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from matplotlib import style
import datetime
import pickle

style.use('ggplot')

df = quandl.get("WIKI/GOOGL")
df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0

df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
forecast_col = 'Adj. Close'
df.fillna(value=-99999, inplace=True)
forecast_out = int(math.ceil(0.1 * len(df)))

df['label'] = df[forecast_col].shift(-forecast_out)

X = np.array(df.drop(['label'], 1))
X = preprocessing.scale(X)
X_lately = X[-forecast_out:]
X = X[:-forecast_out]

df.dropna(inplace=True)

y = np.array(df['label'])

X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.2)
#COMMENTED OUT:
##clf = svm.SVR(kernel='linear')
##clf.fit(X_train, y_train)
##confidence = clf.score(X_test, y_test)
##print(confidence)
pickle_in = open('linearregression.pickle','rb')
clf = pickle.load(pickle_in)


forecast_set = clf.predict(X_lately)
df['Forecast'] = np.nan

last_date = df.iloc[-1].name
last_unix = last_date.timestamp()
one_day = 86400
next_unix = last_unix + one_day

for i in forecast_set:
    next_date = datetime.datetime.fromtimestamp(next_unix)
    next_unix += 86400
    df.loc[next_date] = [np.nan for _ in range(len(df.columns)-1)] + [i]

df['Adj. Close'].plot()
df['Forecast'].plot()
plt.legend(loc=4)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()

ҪעÒâÎÒÃÇ×¢Ê͵ôÁË·ÖÀàÆ÷µÄԭʼ¶¨Ò壬²¢Ì滻Ϊ¼ÓÔØÎÒÃDZ£´æµÄ·ÖÀàÆ÷¡£¾ÍÊÇÕâô¼òµ¥¡£

Finally, on the topic of efficiency and saving time, I want to bring up a relatively cheap paradigm: the temporary supercomputer. Seriously, with the rise of on-demand hosting services like AWS, DigitalOcean, and Linode, you can buy hosting by the hour. A virtual server can be spun up in about 60 seconds and the required modules installed in about 15 minutes, so the overhead is quite limited; you could write a shell script or something to speed it up further. Suppose you need a lot of processing power and don't have a top-of-the-line machine, or you work from a laptop. No problem, just spin up a server.

My last note on this approach: with any host, you can usually set up a very small server, load everything you need, then resize that server upward. I like to start with a small server, then, when I'm ready, resize it and give it an upgrade. When you're done, don't forget to destroy or downgrade your server.

Theory and How It Works

»¶Ó­ÔĶÁµÚÆßƪ½Ì³Ì¡£Ä¿Ç°ÎªÖ¹£¬ÄãÒѾ­¿´µ½ÁËÏßÐԻعéµÄ¼ÛÖµ£¬ÒÔ¼°ÈçºÎʹÓà Sklearn À´Ó¦ÓÃËü¡£ÏÖÔÚÎÒÃÇ´òËãÉîÈëÁ˽âËüÈçºÎ¼ÆËã¡£ËäÈ»ÎÒ¾õµÃ²»±ØÒªÉîÈ뵽ÿ¸ö»úÆ÷ѧϰËã·¨ÊýѧÖУ¨ÄãÓÐûÓнøÈëµ½Äã×îϲ»¶µÄÄ£¿éµÄÔ´ÂëÖУ¬¿´¿´ËüÊÇÈçºÎʵÏֵģ¿£©£¬ÏßÐÔ´úÊýÊÇ»úÆ÷ѧϰµÄ±¾ÖÊ£¬²¢ÇÒ¶ÔÓÚÀí½â»úÆ÷ѧϰµÄ¹¹½¨»ù´¡Ê®·ÖʵÓá£

The goal of linear algebra is to calculate relationships of points in vector space. That's used for many things, but one day somebody had the wild idea to apply it to the features of datasets. We can too. Remember when we defined data types earlier, and said linear regression works on continuous data? That's not because of the people who use linear regression; it's because of the math that makes it up. Simple linear regression is used to find the best-fit line of a dataset. If the data isn't continuous, there really isn't a best-fit line. Let's look at some examples.

Э·½²î

The image above clearly has good covariance. If you were to eyeball a best-fit line, you should be able to draw one easily:

What if the graph looked like this instead?

Not like before, but clearly negatively correlated. You might be able to draw a best-fit line here, but it's much less likely.

Finally, what about this one?

ɶ£¿µÄÈ·ÓÐ×î¼ÑÄâºÏÖ±Ïߣ¬µ«ÊÇÐèÒªÔËÆø½«Æä»­³öÀ´¡£

Treat the images above as graphs of features, with the X coordinate as the feature and the Y coordinate as the associated label. Do X and Y have any sort of structured relationship in that last one? While we could compute an exact relationship after the fact, it's unlikely to carry much meaning for future values.

In the other graphs, there's clearly a relationship between X and Y. We can actually explore that relationship and then plot along any points we like: predict X from Y, or Y from X, for any point we can think of. We can also estimate how much error our model has, even for a single point. How do we perform this magic? Linear algebra, of course.

Ê×ÏÈ£¬ÈÃÎÒÃǻص½ÖÐѧ£¬ÎÒÃÇÔÚÄÇÀ︴ϰֱÏߵ͍Ò壺y = mx + b£¬ÆäÖÐmÊÇбÂÊ£¬bÊÇ×ݽؾࡣÕâ¿ÉÒÔÊÇÓÃÓÚÇó½âyµÄ·½³Ì£¬ÎÒÃÇ¿ÉÒÔ½«Æä±äÐÎÀ´Çó½âx£¬Ê¹Óûù±¾µÄ´úÊýÔ­Ôò£ºx = (y-b)/m¡£

Okay, so our goal is to find the best-fit line. Not just a line that fits well, the best one. The definition of this line is y = mx + b. y is the answer (our other coordinate, or even our feature), so we still need m (the slope) and b (the y-intercept); x is known, since it can be any point along the x-axis.

The slope m of the best-fit line is defined as:

m = (mean(x) * mean(y) - mean(x*y)) / (mean(x)**2 - mean(x**2))

Note: this simplifies to m = cov(x, y) / var(x).

In the usual notation, a bar over a symbol denotes the mean, and two symbols next to each other means they're multiplied. The xs and ys are all the known coordinates. So we've now solved for m (the slope) of the y = mx + b best-fit line definition; we just need b (the y-intercept). Here's the formula:

b = mean(y) - m * mean(x)

Okay. This whole section isn't meant to be a math tutorial, it's a programming tutorial. In the next tutorial we'll program exactly this, and I'll explain why I'm implementing it ourselves rather than just using a module.

Programming the Best-Fit Slope

»¶Ó­ÔĶÁµÚ°Ëƪ½Ì³Ì£¬ÎÒÃǸոÕÒâʶµ½£¬ÎÒÃÇÐèҪʹÓà Python ÖØ¸´±àдһЩ±È½ÏÖØÒªµÄËã·¨£¬À´³¢ÊÔ¸ø¶¨Êý¾Ý¼¯µÄ¼ÆËã×î¼ÑÄâºÏÖ±Ïß¡£

ÔÚÎÒÃÇ¿ªÊ¼Ö®Ç°£¬ÎªÊ²Ã´ÎÒÃÇ»áÓÐһЩСÂ鷳Ĩ£¿ÏßÐԻعéÊÇ»úÆ÷ѧϰµÄ¹¹½¨»ù´¡¡£Ëü¼¸ºõÓÃÓÚÿ¸öµ¥¶ÀµÄÖ÷Á÷»úÆ÷ѧϰËã·¨Ö®ÖУ¬ËùÒÔ¶ÔËüµÄÀí½âÓÐÖúÓÚÄãÕÆÎÕ¶àÊýÖ÷Á÷»úÆ÷ѧϰËã·¨¡£³öÓÚÎÒÃǵÄÈÈÇ飬Àí½âÏßÐԻعéºÍÏßÐÔ´úÊý£¬ÊDZàдÄã×Ô¼ºµÄ»úÆ÷ѧϰËã·¨£¬ÒÔ¼°¿çÈë»úÆ÷Ñ§Ï°Ç°ÑØ£¬Ê¹Óõ±Ç°×î¼ÑµÄ´¦Àí¹ý³ÌµÄµÚÒ»²½¡£ÓÉÓÚ´¦Àí¹ý³ÌµÄÓÅ»¯ºÍÓ²¼þ¼Ü¹¹µÄ¸Ä±ä¡£ÓÃÓÚ»úÆ÷ѧϰµÄ·½·¨ÂÛÒ²»á¸Ä±ä¡£×î½ü³öÏÖµÄÉñ¾­ÍøÂ磬ʹÓôóÁ¿ GPU À´Íê³É¹¤×÷¡£ÄãÏëÖªµÀʲôÊÇÉñ¾­ÍøÂçµÄºËÐÄÂð£¿Äã²Â¶ÔÁË£¬ÏßÐÔ´úÊý¡£

Èç¹ûÄãÄܼǵã¬×î¼ÑÄâºÏÖ±ÏßµÄбÂÊm£º

Êǵģ¬ÎÒÃǻὫÆä²ð³ÉƬ¶Î¡£Ê×ÏÈ£¬½øÐÐһЩµ¼È룺

from statistics import mean
import numpy as np

We import mean from statistics so we can easily take the mean of a list. Next, we import numpy as np so we can create NumPy arrays with it. We could do a lot with plain lists, but we need some simple array arithmetic, which plain lists don't provide, so we use NumPy. We won't use anything too complex from NumPy at this stage, but later on NumPy will become your best friend. Next, let's define some starting data points:

xs = [1,2,3,4,5]
ys = [5,4,6,5,6]

So here are the data points we'll be working with, xs and ys. You can think of xs as the features and ys as the labels, or both as features between which we want to establish a relationship. As mentioned, we actually want these as NumPy arrays so we can perform array operations. Let's modify those two lines:

xs = np.array([1,2,3,4,5], dtype=np.float64)
ys = np.array([5,4,6,5,6], dtype=np.float64)

ÏÖÔÚËûÃǶ¼ÊÇ NumPy Êý×éÁË¡£ÎÒÃÇÒ²ÏÔʽÉùÃ÷ÁËÊý¾ÝÀàÐÍ¡£¼òµ¥½²Ò»Ï£¬Êý¾ÝÀàÐÍÓÐÌØÐÔÊÇÊôÐÔ£¬ÕâЩÊôÐÔ¾ö¶¨ÁËÊý¾Ý±¾ÉíÈçºÎ´¢´æºÍ²Ù×÷¡£ÏÖÔÚËü²»ÊÇʲôÎÊÌ⣬µ«ÊÇÈç¹ûÎÒÃÇÖ´ÐдóÁ¿ÔËË㣬²¢Ï£ÍûËûÃÇÅÜÔÚ GPU ¶ø²»ÊÇ CPU ÉϾÍÊÇÁË¡£

Plotted, they look like this:

Now we're ready to build a function to calculate m, the slope of our line:

def best_fit_slope(xs, ys):
    return m

m = best_fit_slope(xs, ys)

Done! Just kidding. That's our skeleton; now we'll fill it in.

Our first order of business: compute the mean of xs multiplied by the mean of ys. Continuing to fill in the skeleton:

def best_fit_slope(xs, ys):
    m = (mean(xs) * mean(ys))
    return m

ĿǰΪֹ»¹ºÜ¼òµ¥¡£Äã¿ÉÒÔ¶ÔÁÐ±í¡¢Ôª×é»òÕßÊý×éʹÓÃmeanº¯Êý¡£Òª×¢ÒâÎÒÕâÀïʹÓÃÁËÀ¨ºÅ¡£Python µÄ×ñÑ­ÔËËã·ûµÄÊýѧÓÅÏȼ¶¡£ËùÒÔÈç¹ûÄã´òË㱣֤˳Ðò£¬ÒªÏÔʽʹÓÃÀ¨ºÅ¡£Òª¼ÇסÄãµÄÔËËã¹æÔò¡£

Next we need to subtract the mean of x*y, which is our array operation mean(xs*ys). The code now:

def best_fit_slope(xs, ys):
    m = ( (mean(xs)*mean(ys)) - mean(xs*ys) )
    return m

ÎÒÃÇÍê³ÉÁ˹«Ê½µÄ·Ö×Ó²¿·Ö£¬ÏÖÔÚÎÒÃǼÌÐø´¦ÀíµÄ·Öĸ£¬ÒÔxµÄ¾ùֵƽ·½¿ªÊ¼£º(mean(xs)*mean(xs))¡£Python Ö§³Ö** 2£¬Äܹ»´¦ÀíÎÒÃÇµÄ NumPy Êý×éµÄfloat64ÀàÐÍ¡£Ìí¼ÓÕâЩ¶«Î÷£º

def best_fit_slope(xs, ys):
    m = ( ((mean(xs)*mean(ys)) - mean(xs*ys)) /
          (mean(xs)**2) )
    return m

According to operator precedence, wrapping the whole expression in parentheses isn't strictly necessary; I do it here so I can break the line after the division, making the expression easier to read and follow. Without it, we'd get a syntax error at the line break. We're almost done; now we just subtract the mean of the squares of x (mean(xs*xs)) from the square of the mean of x. The full code:

def best_fit_slope(xs, ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)**2) - mean(xs*xs)))
    return m

Okay, our full script is now:

from statistics import mean
import numpy as np

xs = np.array([1,2,3,4,5], dtype=np.float64)
ys = np.array([5,4,6,5,6], dtype=np.float64)

def best_fit_slope(xs, ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)**2) - mean(xs**2)))
    return m

m = best_fit_slope(xs, ys)
print(m)
# 0.3

What's next? We need to calculate the y-intercept, b. We'll handle that in the next tutorial and complete the full best-fit line calculation. It's even easier to compute than the slope; try writing your own function to calculate it. If you pull it off, don't skip the next tutorial, though; we'll be doing more than just that.

¼ÆËã×ݽؾà

»¶Ó­ÔĶÁµÚ¾Åƪ½Ì³Ì¡£ÎÒÃǵ±Ç°ÕýÔÚΪ¸ø¶¨µÄÊý¾Ý¼¯£¬Ê¹Óà Python ¼ÆËã»Ø¹é»òÕß×î¼ÑÄâºÏÖ±Ïß¡£Ö®Ç°£¬ÎÒÃDZàдÁËÒ»¸öº¯ÊýÀ´¼ÆËãбÂÊ£¬ÏÖÔÚÎÒÃÇÐèÒª¼ÆËã×ݽؾࡣÎÒÃÇĿǰµÄ´úÂëÊÇ£º

from statistics import mean
import numpy as np

xs = np.array([1,2,3,4,5], dtype=np.float64)
ys = np.array([5,4,6,5,6], dtype=np.float64)

def best_fit_slope(xs, ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)*mean(xs)) - mean(xs*xs)))
    return m

m = best_fit_slope(xs, ys)
print(m)

Çë»ØÒ䣬×î¼ÑÄâºÏÖ±ÏßµÄ×ݽؾàÊÇ£º

Õâ¸ö±ÈбÂʼòµ¥¶àÁË¡£ÎÒÃÇ¿ÉÒÔ½«Æäдµ½Í¬Ò»¸öº¯ÊýÀ´½ÚÊ¡¼¸ÐдúÂë¡£ÎÒÃǽ«º¯ÊýÖØÃüÃûΪbest_fit_slope_and_intercept¡£

Next, we fill in b = mean(ys) - (m*mean(xs)) and return m, b:

def best_fit_slope_and_intercept(xs, ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)*mean(xs)) - mean(xs*xs)))
    b = mean(ys) - m*mean(xs)
    return m, b

Now we can call it:

best_fit_slope_and_intercept(xs,ys)

ÎÒÃÇĿǰΪֹµÄ´úÂ룺

from statistics import mean
import numpy as np

xs = np.array([1,2,3,4,5], dtype=np.float64)
ys = np.array([5,4,6,5,6], dtype=np.float64)

def best_fit_slope_and_intercept(xs, ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)*mean(xs)) - mean(xs*xs)))
    b = mean(ys) - m*mean(xs)
    return m, b

m, b = best_fit_slope_and_intercept(xs, ys)

print(m, b)
# 0.3, 4.3

Now we just need to create a line for the data:

Remembering y = mx + b, we could write a function for this, or just use a one-line for loop (a list comprehension):

regression_line = [(m*x)+b for x in xs]

The one-liner above is the same as this:

regression_line = []
for x in xs:
    regression_line.append((m*x)+b)

Okay, let's reap the fruits of our labor. Add the following imports:

import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')

This lets us make graphs that aren't especially hideous. Now:

plt.scatter(xs,ys,color='#003F72')
plt.plot(xs, regression_line)
plt.show()

First we plot a scatter plot of the existing data, then plot our regression line, then show it. If you're not familiar with Matplotlib, check out the Matplotlib tutorial series.

Output:

Congratulations! So, how do we make actual predictions based on this model? Easy: you have the model, just plug in an x. For example, let's predict a point:

predict_x = 7

We have the input, our feature. What about the label?

predict_y = (m*predict_x)+b
print(predict_y)
# 6.4

We can plot it too:

predict_x = 7
predict_y = (m*predict_x)+b

plt.scatter(xs, ys, color='#003F72', label='data')
plt.plot(xs, regression_line, label='regression line')
plt.scatter(predict_x, predict_y, color='g', label='prediction')
plt.legend(loc=4)
plt.show()

Output:

We now know how to create our own model, which is great, but we're still missing something: how accurate is our model? That's the topic of the next tutorial.

R ƽ·½ºÍÅж¨ÏµÊýÔ­Àí

»¶Ó­ÔĶÁµÚʮƪ½Ì³Ì¡£ÎÒÃǸոÕÍê³ÉÁËÏßÐÔÄ£Ð͵Ĵ´½¨ºÍ´¦Àí£¬ÏÖÔÚÎÒÃÇºÃÆæ½ÓÏÂÀ´Òª¸Éʲô¡£ÏÖÔÚ£¬ÎÒÃÇ¿ÉÒÔÇáÒ×¹Û²ìÊý£¬²¢¾ö¶¨ÏßÐԻعéÄ£ÐÍÓжàô׼ȷ¡£µ«ÊÇ£¬Èç¹ûÄãµÄÏßÐԻعéÄ£ÐÍÊÇÄÃÉñ¾­ÍøÂçµÄ 20 ¸ö²ã¼¶×ö³öÀ´µÄÄØ£¿²»½ö½öÊÇÕâÑù£¬ÄãµÄÄ£ÐÍÒÔ²½Öè»òÕß´°¿Ú¹¤×÷£¬Ò²¾ÍÊÇÒ»¹² 5 °ÙÍò¸öÊý¾Ýµã£¬Ò»´ÎÖ»ÏÔʾ 100 ¸ö£¬»áÔõôÑù£¿ÄãÐèҪһЩ×Ô¶¯»¯µÄ·½Ê½À´ÅжÏÄãµÄ×î¼ÑÄâºÏÖ±ÏßÓжàºÃ¡£

Recall earlier, when we showed several plots, you could already see whether a best-fit line was good or not. Like this:

Compared with this:

In the second image, there is indeed a best-fit line, but nobody cares; even the best-fit line is useless. And we'd like to know that before spending a lot of computing power on it.

The standard way to check for error is to use squared error. You may have heard of the method before: it's part of what's called R squared, or the coefficient of determination. What is squared error?

The distance between a data point's y value and the regression line is the error, and we square it. The squared error of the line can be the average or the sum of these errors; we'll simply take the sum.

ÎÒÃÇʵ¼ÊÉÏÒѾ­½â³ýÁËÆ½·½Îó²î¼ÙÉè¡£ÎÒÃǵÄ×î¼ÑÄâºÏÖ±Ïß·½³Ì£¬ÓÃÓÚ¼ÆËã×î¼ÑÄâºÏ»Ø¹éÖ±Ïߣ¬¾ÍÊÇÖ¤Ã÷½á¹û¡£ÆäÖлعéÖ±Ïß¾ÍÊÇÓµÓÐ×îСƽ·½Îó²îµÄÖ±Ïߣ¨ËùÒÔËü²Å½Ð×ö×îС¶þ³Ë·¨£©¡£Äã¿ÉÒÔËÑË÷¡°»Ø¹éÖ¤Ã÷¡±£¬»òÕß¡°×î¼ÑÄâºÏÖ±ÏßÖ¤Ã÷¡±À´Àí½âËü¡£ËüºÜÒÖÓôÀí½â£¬µ«ÊÇÐèÒª´úÊý±äÐÎÄÜÁ¦À´µÃ³ö½á¹û¡£

Why squared error? Why not just sum the errors? First, we want a way to normalize the error to a distance: an error might be -5, but, squared, it becomes positive. Another reason is to further penalize outliers, "further" meaning they affect the total error to a greater degree. This is simply the standard that people use. You could also use powers of 4, 6, or 8, or whatever, or even just the absolute value of the errors. If your particular challenge is that there are some outliers you don't want to care much about, you might consider absolute value. If you do care about outliers, you could use a higher-order exponent. We'll use squares, because that's what most people use.
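To see how squaring punishes outliers harder than absolute value, compare the two on some made-up errors, one of which is an outlier:

```python
# Errors from a hypothetical fit; the -5.0 is an outlier.
errors = [0.5, -0.5, 1.0, -5.0]

abs_total = sum(abs(e) for e in errors)  # outlier contributes 5 units
sq_total = sum(e * e for e in errors)    # outlier contributes 25 units
print(abs_total, sq_total)  # 7.0 26.5
```

Under absolute value the outlier counts 5x a unit error; squared, it counts 25x, dominating the total.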

ºÃµÄ£¬ËùÒÔÎÒÃǼÆËã»Ø¹éÖ±Ïߵį½·½Îó²î£¬Ê²Ã´¼ÆËãÄØ£¿ÕâÊÇʲôÒâ˼£¿Æ½·½Îó²îÍêÈ«ºÍÊý¾Ý¼¯Ïà¹Ø£¬ËùÒÔÎÒÃDz»ÔÙÐèÒª±ðµÄ¶«Î÷ÁË¡£Õâ¾ÍÊÇ R ƽ·½ÒýÈëµÄʱºòÁË£¬Ò²½Ð×÷Åж¨ÏµÊý¡£·½³ÌÊÇ£º

y_hat = x * m + b
r_sq = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

The essence of this equation is: 1 minus the squared error of the regression line divided by the squared error of the mean-of-y line. The mean-of-y line is just the mean of all the y values in the dataset; if you plotted it, it would be a horizontal line. So we calculate the squared error of the mean-of-y line and of the regression line. The goal here is to discern how much of the error comes from the variation in the data itself, compared to an under-fitting line.
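Spelled out with the tutorial's own sample data, the computation looks like this (a sketch; the m and b values are the slope and intercept produced by the best-fit formula from the earlier part):

```python
import numpy as np

xs = np.array([1, 2, 3, 4, 5], dtype=np.float64)
ys = np.array([5, 4, 6, 5, 6], dtype=np.float64)

# slope and intercept from the tutorial's best-fit formula
m, b = 0.3, 4.3
y_hat = m * xs + b

se_line = ((ys - y_hat) ** 2).sum()      # squared error of the regression line
se_mean = ((ys - ys.mean()) ** 2).sum()  # squared error of the horizontal mean line

r_squared = 1 - se_line / se_mean
print(round(r_squared, 6))  # 0.321429
```

The regression line's error (1.9) is smaller than the mean line's error (2.8), but not dramatically so, hence an R squared around 0.32.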

ËùÒÔÅж¨ÏµÊý¾ÍÊÇÉÏÃæÄǸö·½³Ì£¬ÈçºÎÅж¨ËüÊǺÃÊÇ»µ£¿ÎÒÃÇ¿´µ½ÁËËüÊÇ 1 ¼õȥһЩ¶«Î÷¡£Í¨³££¬ÔÚÊýѧÖУ¬Äã¿´µ½ËûµÄʱºò£¬Ëü·µ»ØÁËÒ»¸ö°Ù·Ö±È£¬ËüÊÇ 0 ~ 1 Ö®¼äµÄÊýÖµ¡£ÄãÈÏΪʲôÊÇºÃµÄ R ƽ·½»òÕßÅж¨ÏµÊýÄØ£¿ÈÃÎÒÃǼÙÉèÕâÀïµÄ R ƽ·½ÊÇ 0.8£¬ËüÊǺÃÊÇ»µÄØ£¿Ëü±È 0.3 ÊǺû¹ÊÇ»µ£¿¶ÔÓÚ 0.8 µÄ R ƽ·½£¬Õâ¾ÍÒâζׯعéÖ±Ïߵį½·½Îó²î£¬±ÈÉÏ y ¾ùÖµµÄƽ·½Îó²îÊÇ 2 ±È 10¡£Õâ¾ÍÊÇ˵»Ø¹éÖ±ÏßµÄÎó²î·Ç³£Ð¡ÓÚ y ¾ùÖµµÄÎó²î¡£ÌýÆðÀ´²»´í¡£ËùÒÔ 0.8 ·Ç³£ºÃ¡£

What about a coefficient of determination of 0.3? Here, it means the ratio of the squared error of the regression line to the squared error of the mean-of-y line is 7 to 10, where 7 to 10 is worse than 2 to 10 (7 and 2 both being squared errors of the regression line). Thus, the goal is to get the R squared value, or coefficient of determination, as close to 1 as possible.

Programming R Squared

»¶Ó­ÔĶÁµÚʮһƪ½Ì³Ì¡£¼ÈÈ»ÎÒÃÇÖªµÀÁËÎÒÃÇѰÕҵĶ«Î÷£¬ÈÃÎÒÃÇʵ¼ÊÔÚ Python ÖмÆËãËü°É¡£µÚÒ»²½¾ÍÊǼÆËãÆ½·½Îó²î¡£º¯Êý¿ÉÄÜÊÇÕâÑù£º

def squared_error(ys_orig,ys_line):
    return sum((ys_line - ys_orig) * (ys_line - ys_orig))

Using the function above, we can calculate the squared error from any line to the data points. So we can use this syntax for both the regression line and the mean-of-y line. That said, the squared error is only part of the coefficient of determination, so let's build that function too. Since the squared error function is only one line, you could choose to embed it inside the coefficient of determination function, but squared error is something you might also calculate outside that function, so I chose to keep it separate. For R squared:

def coefficient_of_determination(ys_orig,ys_line):
    y_mean_line = [mean(ys_orig) for y in ys_orig]
    squared_error_regr = squared_error(ys_orig, ys_line)
    squared_error_y_mean = squared_error(ys_orig, y_mean_line)
    return 1 - (squared_error_regr/squared_error_y_mean)

What we do here is calculate the mean-of-y line, using a one-line for loop (which is actually unnecessary). Then we calculate the squared error of the mean-of-y line and of the regression line, using the function above. Now, all we need to do is calculate the R squared value, which is simply 1 minus the squared error of the regression line divided by the squared error of the mean-of-y line. We return that value, and we're done. Putting it all together and skipping the plotting part, the code is:

from statistics import mean
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')

xs = np.array([1,2,3,4,5], dtype=np.float64)
ys = np.array([5,4,6,5,6], dtype=np.float64)

def best_fit_slope_and_intercept(xs,ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)*mean(xs)) - mean(xs*xs)))
    b = mean(ys) - m*mean(xs)
    return m, b

def squared_error(ys_orig,ys_line):
    return sum((ys_line - ys_orig) * (ys_line - ys_orig))

def coefficient_of_determination(ys_orig,ys_line):
    y_mean_line = [mean(ys_orig) for y in ys_orig]
    squared_error_regr = squared_error(ys_orig, ys_line)
    squared_error_y_mean = squared_error(ys_orig, y_mean_line)
    return 1 - (squared_error_regr/squared_error_y_mean)

m, b = best_fit_slope_and_intercept(xs,ys)
regression_line = [(m*x)+b for x in xs]

r_squared = coefficient_of_determination(ys,regression_line)
print(r_squared)
# 0.321428571429

##plt.scatter(xs,ys,color='#003F72',label='data')
##plt.plot(xs, regression_line, label='regression line')
##plt.legend(loc=4)
##plt.show()

That's a pretty low value, so by this measure, our best-fit line isn't very good. Is R squared a good measure here? That probably depends on our goals. In most cases, if we care about accurately predicting future values, R squared is indeed useful. If you're interested in predicting motion or trends, our best-fit line is actually already pretty good, and R squared shouldn't carry so much weight. Look at our actual dataset: we're stuck with fairly low numbers. The variation from value to value is 20% to 50% at some points, which is very high. We shouldn't be at all surprised that, with this simple dataset, our best-fit line doesn't describe the data very well.

But what we just said is a hypothesis. Even though we logically agree with it, we need to come up with a new method to verify it. The algorithms so far are very basic, and there's little going on right now, so there isn't much room to improve the error; but later on, you'll find layers upon layers. Not only do you have to consider the layers within an algorithm itself, but also algorithms composed of many layers of algorithms. Throughout, we need to test them to make sure our assumptions about what each algorithm does are correct. Consider how simple it is to compose operations into functions, and how, from there, a full validation can break down across thousands of lines of code.

What we'll do in the next tutorial is build a relatively simple dataset generator that generates data according to our parameters. We can use it to shape the data however we like, then test our algorithms against those datasets; modifying the parameters according to our hypotheses should produce a corresponding effect. We can then compare our hypotheses against reality and hope they match. In this example, the hypothesis is that we coded these algorithms correctly, and that the reason the coefficient of determination is low is that the variance in the y values is too large. We'll verify this hypothesis in the next tutorial.

Creating Sample Data for Testing

»¶Ó­ÔĶÁµÚÊ®¶þƪ½Ì³Ì¡£ÎÒÃÇÒѾ­Á˽âÁ˻ع飬ÉõÖÁ±àдÁËÎÒÃÇ×Ô¼ºµÄ¼òµ¥ÏßÐԻعéËã·¨¡£²¢ÇÒ£¬ÎÒÃÇÒ²¹¹½¨ÁËÅж¨ÏµÊýËã·¨À´¼ì²é×î¼ÑÄâºÏÖ±ÏßµÄ׼ȷ¶ÈºÍ¿É¿¿ÐÔ¡£ÎÒÃÇ֮ǰÌÖÂÛºÍչʾ¹ý£¬×î¼ÑÄâºÏÖ±Ïß¿ÉÄܲ»ÊÇ×îºÃµÄÄâºÏ£¬Ò²½âÊÍÁËΪʲôÎÒÃǵÄʾÀý·½ÏòÉÏÊÇÕýÈ·µÄ£¬¼´Ê¹²¢²»×¼È·¡£µ«ÊÇÏÖÔÚ£¬ÎÒÃÇʹÓÃÁ½¸ö¶¥¼¶Ëã·¨£¬ËüÃÇÓÉһЩСÐÍËã·¨×é³É¡£Ëæ×ÅÎÒÃǼÌÐø¹¹ÔìÕâÖÖËã·¨²ã´Î£¬Èç¹ûËüÃÇÖ®ÖÐÓиöС´íÎó£¬ÎÒÃǾͻáÓöµ½Âé·³£¬ËùÒÔÎÒÃÇ´òËãÑéÖ¤ÎÒÃǵļÙÉè¡£

In the world of programming, systematic program testing is usually called "unit testing". This is how large programs are built: each small subsystem is checked over and over. As a large program gets upgraded and updated, tools that conflict with earlier systems can be easily removed. With machine learning, this is also an issue, but our main focus here is just testing our hypotheses. Eventually, you should be smart enough to create unit tests for your entire machine learning system, but for now, we need to keep things as simple as possible.
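As a tiny taste of that idea, here's a sketch of a unit-test-style check for the `coefficient_of_determination` function from the last part: a "line" that hits every data point exactly should score 1, and the mean line itself should score 0.

```python
from statistics import mean

import numpy as np

def squared_error(ys_orig, ys_line):
    return sum((ys_line - ys_orig) * (ys_line - ys_orig))

def coefficient_of_determination(ys_orig, ys_line):
    y_mean_line = [mean(ys_orig) for y in ys_orig]
    squared_error_regr = squared_error(ys_orig, ys_line)
    squared_error_y_mean = squared_error(ys_orig, y_mean_line)
    return 1 - (squared_error_regr / squared_error_y_mean)

ys = np.array([5, 4, 6, 5, 6], dtype=np.float64)

# a "line" matching the data exactly has zero squared error -> r squared = 1
assert coefficient_of_determination(ys, ys.copy()) == 1.0

# the mean line explains none of the variation -> r squared = 0
mean_line = np.full_like(ys, ys.mean())
assert abs(coefficient_of_determination(ys, mean_line)) < 1e-12

print('coefficient_of_determination behaves as expected')
```

These two edge cases pin down the endpoints of the 0-to-1 scale, which is exactly what a unit test should do.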

ÎÒÃǵļÙÉèÊÇ£¬ÎÒÃÇ´´½¨ÁË×î¼úheÖ±Ïߣ¬Ö®ºóʹÓÃÅж¨ÏµÊý·¨À´²âÁ¿¡£ÎÒÃÇÖªµÀ£¨ÊýѧÉÏ£©£¬R ƽ·½µÄÖµÔ½µÍ£¬×î¼ÑÄâºÏÖ±Ïß¾ÍÔ½²»ºÃ£¬²¢ÇÒÔ½¸ß£¨½Ó½ü 1£©¾ÍÔ½ºÃ¡£ÎÒÃǵļÙÉèÊÇ£¬ÎÒÃǹ¹½¨ÁËÒ»¸öÕâÑù¹¤×÷µÄϵͳ£¬ÎÒÃǵÄϵͳÓÐÐí¶à²¿·Ö£¬¼´Ê¹ÊÇÒ»¸öСµÄ²Ù×÷´íÎó¶¼»á²úÉúºÜ´óµÄÂé·³¡£ÎÒÃÇÈçºÎ²âÊÔËã·¨µÄÐÐΪ£¬±£Ö¤Èκζ«Î÷¶¼Ô¤ÆÚ¹¤×÷ÄØ£¿

The idea here is to create a sample dataset that we define ourselves: if we have a positively correlated dataset, the correlation is very strong; if the correlation is weak, the points aren't very tight. It's easy for us to eyeball such a line, but the machine should do better. Let's build a system that generates sample data, where we can tune these parameters.

To start, let's build a skeleton function that mimics our end goal:

def create_dataset(hm,variance,step=2,correlation=False):

    return np.array(xs, dtype=np.float64),np.array(ys,dtype=np.float64)

Looking at the head of the function, it accepts the following parameters:

1. hm (how much): how many data points to generate. For example, we could choose 10, or ten million.

2. variance: determines how much each data point can vary from the previous one. The more variance, the less tight the data.

3. step: how far to step the value per point on average, defaulting to 2.

4. correlation: can be False, 'pos', or 'neg', for no correlation, positive correlation, or negative correlation.

ҪעÒ⣬ÎÒÃÇÒ²µ¼ÈëÁËrandom£¬Õâ»á°ïÖúÎÒÃÇÉú³É£¨Î±£©Ëæ»úÊý¾Ý¼¯¡£

Now let's start filling in the function.

def create_dataset(hm,variance,step=2,correlation=False):
    val = 1
    ys = []
    for i in range(hm):
        y = val + random.randrange(-variance,variance)
        ys.append(y)

Very simple: we just iterate over the range chosen with the hm variable, appending the current value plus a random number in the range from negative variance to positive variance. This produces data, but so far with no correlation, even if we wanted one. Let's add that:

def create_dataset(hm,variance,step=2,correlation=False):
    val = 1
    ys = []
    for i in range(hm):
        y = val + random.randrange(-variance,variance)
        ys.append(y)
        if correlation and correlation == 'pos':
            val+=step
        elif correlation and correlation == 'neg':
            val-=step

Great, now we've got the y values well defined. Next, let's create the x values, which is much easier, and then just return everything.

def create_dataset(hm,variance,step=2,correlation=False):
    val = 1
    ys = []
    for i in range(hm):
        y = val + random.randrange(-variance,variance)
        ys.append(y)
        if correlation and correlation == 'pos':
            val+=step
        elif correlation and correlation == 'neg':
            val-=step

    xs = [i for i in range(len(ys))]

    return np.array(xs, dtype=np.float64),np.array(ys,dtype=np.float64)

We're ready. To create a sample dataset, all we need is:

xs, ys = create_dataset(40,40,2,correlation='pos')

ÈÃÎÒÃǽ«Ö®Ç°ÏßÐԻعé½Ì³ÌµÄ´úÂë·Åµ½Ò»Æð£º

from statistics import mean
import numpy as np
import random
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')


def create_dataset(hm,variance,step=2,correlation=False):
    val = 1
    ys = []
    for i in range(hm):
        y = val + random.randrange(-variance,variance)
        ys.append(y)
        if correlation and correlation == 'pos':
            val+=step
        elif correlation and correlation == 'neg':
            val-=step

    xs = [i for i in range(len(ys))]

    return np.array(xs, dtype=np.float64),np.array(ys,dtype=np.float64)

def best_fit_slope_and_intercept(xs,ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)*mean(xs)) - mean(xs*xs)))

    b = mean(ys) - m*mean(xs)

    return m, b


def coefficient_of_determination(ys_orig,ys_line):
    y_mean_line = [mean(ys_orig) for y in ys_orig]

    squared_error_regr = sum((ys_line - ys_orig) * (ys_line - ys_orig))
    squared_error_y_mean = sum((y_mean_line - ys_orig) * (y_mean_line - ys_orig))

    print(squared_error_regr)
    print(squared_error_y_mean)

    r_squared = 1 - (squared_error_regr/squared_error_y_mean)

    return r_squared


xs, ys = create_dataset(40,40,2,correlation='pos')
m, b = best_fit_slope_and_intercept(xs,ys)
regression_line = [(m*x)+b for x in xs]
r_squared = coefficient_of_determination(ys,regression_line)
print(r_squared)

plt.scatter(xs,ys,color='#003F72', label = 'data')
plt.plot(xs, regression_line, label = 'regression line')
plt.legend(loc=4)
plt.show()

Ö´ÐдúÂ룬Äã»á¿´µ½£º

The coefficient of determination is 0.516508576011 (note that your result won't be the same, since we're using a random number range).

Not bad. So our hypothesis is: if we generate a more tightly correlated dataset, our R squared, or coefficient of determination, should be better. How do we achieve that? Simple: turn the variance down.

xs, ys = create_dataset(40,10,2,correlation='pos')

Now our R squared value is 0.939865240568, pretty good, just as expected. Let's test negative correlation:

xs, ys = create_dataset(40,10,2,correlation='neg')

The R squared value is 0.930242442156, just as good as before, since the parameters are the same, only the direction differs.

Here, our hypothesis is confirmed: the smaller the variance, the higher the R value and coefficient of determination, and the larger the variance, the lower the R value. What about no correlation? It should be very low, close to 0, unless our random number sequence actually has some correlation. Let's test it:

xs, ys = create_dataset(40,10,2,correlation=False)

 

The coefficient of determination is 0.0152650900427.

At this point, I think we should feel confident, since everything is behaving as we expected.
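The whole experiment above can be condensed into a small sweep (a sketch restating the listing's functions, with a fixed seed for reproducibility): holding everything else fixed, increasing the variance should, on average, drive R squared down.

```python
import random
from statistics import mean

import numpy as np

random.seed(1)  # fixed seed so the sweep is reproducible


def create_dataset(hm, variance, step=2, correlation=False):
    val = 1
    ys = []
    for i in range(hm):
        ys.append(val + random.randrange(-variance, variance))
        if correlation and correlation == 'pos':
            val += step
        elif correlation and correlation == 'neg':
            val -= step
    xs = [i for i in range(len(ys))]
    return np.array(xs, dtype=np.float64), np.array(ys, dtype=np.float64)


def r_squared(xs, ys):
    m = ((mean(xs) * mean(ys) - mean(xs * ys)) /
         (mean(xs) ** 2 - mean(xs * xs)))
    b = mean(ys) - m * mean(xs)
    y_hat = m * xs + b
    return 1 - ((ys - y_hat) ** 2).sum() / ((ys - ys.mean()) ** 2).sum()


# larger variance -> noisier data -> lower r squared, on average
for variance in (5, 20, 80):
    xs, ys = create_dataset(40, variance, 2, correlation='pos')
    print(variance, round(r_squared(xs, ys), 3))
```

Because the noise is random, individual runs vary, but the downward trend across the variance values should be clearly visible.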

Now that we're comfortable with simple linear regression, in the next tutorial we'll move on to classification.

   