ÎÒÃÇÔÚʵÑé»úÆ÷ѧϰË㷨ʱ£¬³£³£Óöµ½Ò»ÖÖÇé¿ö£ºÏàͬµÄËã·¨£¬ÏàͬµÄÊý¾Ý£¬µ«Ã¿´Î¼ÆËãµÃµ½µÄ½á¹û¶¼²»Í¬¡£ÕâÊÇÒòΪËã·¨ÖдæÔÚËæ»úµÄÒòËØ£¬µ¼ÖÂ×îÖյĽá¹û²»Îȶ¨¡£Òò´Ë£¬ÎªÁ˱ȽÏËæ»úËã·¨µÄÓÅÁÓ»òÊǼìÑé²ÎÊýµÄ×îÓŽ⣬ÎÒÃÇÐèÒª¶à´ÎÖØ¸´ÊµÑ飬ȡƽ¾ùÖµÀ´ºâÁ¿Ëã·¨¡£
ÄÇôÎÊÌâÀ´ÁË£¬¼ÙÉ賡¾°²»±ä£¬Ëæ»úË㷨ʵÑéÐèÒªÖØ¸´¶àÉٴβÅ×ãÒԿ͹۹«ÕýµØ·´Ó³Ä£Ð͵ÄЧ¹ûÄØ£¿
ÓÐЩÅóÓѽ¨ÒéÖÁÉÙÖØ¸´30´Î£¬ÉõÖÁ100´Î£¬¸üÓÐÉõÕßÖØ¸´ÉÏǧ´ÎµÄʵÑé¡£
ÔÚ±¾ÎÄÖУ¬ÎÒÃǽ«»áÓÃͳ¼ÆÑ§µÄ·½·¨À´½ÌÄãÈçºÎÕýÈ·µØ¹À¼ÆËæ»úË㷨ʵÑéµÄÖØ¸´´ÎÊý¡£±¾ÎÄËùÓдúÂëµÄÖ´Ðл·¾³¿ÉÒÔÊÇPython
2»òÕß3£¬²¢ÇÒ°²×°ÁËNumPy¡¢PandasºÍMatplotlib¡£
×¼±¸Êý¾Ý
¼ÙÉèÎÒÃÇÔÚÒ»×éѵÁ·Êý¾ÝÉÏÖØ¸´ÑµÁ·ÁË1000´Î½á¹¹ÏàͬµÄÉñ¾ÍøÂçÄ£ÐÍ»òÊÇÆäËüËæ»úËã·¨£¬²¢ÇҼǼģÐÍÔÚ²âÊÔ¼¯µÄRMSE¡£ÁíÍ⣬ÎÒÃǼÙÉèÊý¾ÝÊÇÕý̬·Ö²¼µÄ£¬ÕâÊÇ¿ªÕ¹ºóÐø·ÖÎöµÄ±ØÒªÌõ¼þ¡£
¼ÇµÃÿ´Î²é¿´Ô¤²â½á¹ûµÄ·Ö²¼£¬ÍùÍùÒ²ÊdzÊÕý̬·Ö²¼¡£ÕâÀïÎÒÃÇËæ»úÉú³ÉÒ»×é¾ùֵΪ60¡¢±ê×¼²îΪ10µÄÕý̬·Ö²¼Êý¾Ý¡£Éú³ÉÊý¾ÝµÄ´úÂëÈçÏÂͼËùʾ£¬²¢½«½á¹û±£´æÎªCSV¸ñʽµÄÎļþ£¬ÃüÃûΪresults.csv¡£
ÎÒÃÇÓÃseedº¯Êý×÷ÎªËæ»úÊýÉú³ÉÆ÷£¬ÒÔ±£Ö¤Ã¿´ÎÔËÐÐÕâ¶Î´úÂëʱµÃµ½µÄÊý¾Ý¶¼Ò»Ö¡£ÓÃnormal()º¯ÊýÉú³ÉÕý̬·Ö²¼Ëæ»úÊý£¬savetxt()º¯Êý±£´æ½á¹û¡£
from numpy.random import seed from numpy.random import normal from numpy import savetxt # define underlying distribution of results mean = 60 stev = 10 # generate samples from ideal distribution seed(1) results = normal(mean, stev, 1000) # save to ASCII file savetxt('results.csv', results) |
ÔËÐÐÕâ¶Î´úÂ룬ÎÒÃÇ»áµÃµ½°üº¬1000¸öËæ»úÊýµÄÎļþ£¬Ä£ÄâËæ»úËã·¨ÖØ¸´ÔËÐеĽá¹û¡£ÏÂͼÊǸÃÎļþ×îºóÊ®ÐС£
...
6.160564991742511864e+01
5.879850024371251038e+01
6.385602292344325548e+01
6.718290735754342791e+01
7.291188902850875309e+01
5.883555851728335995e+01
3.722702003339634302e+01
5.930375460544870947e+01
6.353870426882840405e+01
5.813044983467250404e+01 |
»ù±¾·ÖÎö
Ê×ÏÈ£¬ÎÒÃǶÔÉÏÒ»²½µÃµ½µÄ½á¹û¼òµ¥µØ×öÒ»¸öͳ¼Æ·ÖÎö¡£
»ù±¾µÄͳ¼Æ·ÖÎöÓÐÈýÖÖ³£Ó÷½·¨£º
¼ÆËãͳ¼ÆÐÅÏ¢£¬±ÈÈç¾ùÖµ¡¢±ê×¼²î¡¢°Ù·ÖλµÈµÈ£»
¶ÔÊý¾Ý»æÖÆÏäÐÎͼ»òÕߣ»
»æÖÆÊý¾ÝµÄÖ±·½Í¼·Ö²¼¡£
ÏÂÃæµÄ´úÂëÓÃÀ´ÊµÏÖ»ù±¾·ÖÎöµÄ¹¦ÄÜ¡£Ê×ÏȼÓÔØresults.csvÎļþ£¬È»ºó¼ÆËãͳ¼ÆÐÅÏ¢ºÍ»æÖÆÍ¼ÐΡ£
from
pandas import DataFrame
from pandas import read_csv
from numpy import mean
from numpy import std
from matplotlib import pyplot
# load results file
results = read_csv('results.csv', header=None)
# descriptive stats
print(results.describe())
# box and whisker plot
results.boxplot()
pyplot.show()
# histogram
results.hist()
pyplot.show() |
ÉÏÊöÑù±¾µÄͳ¼ÆÁ¿ÈçÏÂͼËùʾ£¬Ëã·¨µÄƽ¾ùÐÔÄÜΪ60.3£¬±ê×¼²îΪ9.8¡£Èç¹ûÎÒÃǼÙÉèÕâ¸ö·ÖÖµ±íʾµÄÊÇijÖÖÎó²î£¬ÀýÈçRMSE£¬ÄÇô×î²îµÄÐÔÄÜ»á´ïµ½99.5£¬¶ø×îºÃµÄÇé¿öÊÇ29.4¡£
count
1000.000000
mean 60.388125
std 9.814950
min 29.462356
25% 53.998396
50% 60.412926
75% 67.039989
max 99.586027 |
ÏÂͼËùʾµÄÏäÐÎͼչʾÁËÊý¾ÝµÄ·Ö²¼£¬ÆäÖÐÏä×Ó²¿·ÖÊÇÖжÎ50%µÄÑù±¾£¬Ôµã±íʾÒì³£Öµ£¬ÂÌÏß±íʾÖÐλÊýµÄÖµ¡£

ÏÂͼÊÇÊý¾ÝµÄÖ±·½Í¼·Ö²¼£¬ÕûÌåÇ÷ÊÆ·ûºÏÕý̬·Ö²¼£¬¾ùÖµÂäÔÚ60¸½½ü¡£

ÖØ¸´´ÎÊýµÄÓ°Ïì
ÎÒÃÇ×ܹ²Î±ÔìÁË1000¸öÊý¾Ý¡£ÄÇô1000´Î¾¿¾¹ÊÇÒѾ×ã¹»ÎÒÃÇ×ö³ö׼ȷµÄ¾ö²ßÄØ£¬»¹ÊÇÔ¶²»×ãËùÐèµÄʵÑéÖØ¸´´ÎÊý£¿ÎÒÃǸÃÔõôÅжϣ¿
Ê×ÏÈ£¬ÎÒÃÇ¿ÉÒÔ»æÖÆÊµÑéÖØ¸´´ÎÊýÓë·ÖÖµ¾ùÖµµÄº¯Êý¡£ÆÚ³õ£¬¾ùÖµµÄ²¨¶¯·ù¶ÈÔ¤¼Æ½Ï´ó¡£Ëæ×ÅÖØ¸´´ÎÊýÔö³¤£¬ÎÒÃÇÔ¤ÆÚ¾ùÖµÒ²½«ºÜ¿ìÊÕÁ²µ½ÆÚÍûÖµ¸½½ü¡£
from
pandas import DataFrame
from pandas import read_csv
from numpy import mean
from matplotlib import pyplot
import numpy
# load results file
results = read_csv('results.csv', header=None)
values = results.values
# collect cumulative stats
means = list()
for i in range(1,len(values)+1):
data = values[0:i, 0]
mean_rmse = mean(data)
means.append(mean_rmse)
# line plot of cumulative values
pyplot.plot(means)
pyplot.show() |
Ö´ÐÐÉÏÃæÕâ¶Î´úÂ룬¿ÉÒԵõ½ÏÂͼ¡£ÈçͼËùʾ£¬Öظ´´ÎÊýÔÚ200´ÎÒÔÄÚʱ£¬ÇúÏß²¨¶¯½Ï´ó£»µ±ÊµÑ鳬¹ý600´ÎÖ®ºó£¬¾ùÖµ¼¸ºõÇ÷ÓÚÎȶ¨¡£

½ÓÏÂÀ´£¬ÎÒÃÇֻȡǰ500´ÎʵÑé½á¹û»æÖÆÍ¼ÐΣ¬²¢½«×îÖյį½¾ù½á¹ûÒ²ÓóÈÉ«Ïß»æÖƵ½Í¬Ò»ÕÅͼÉÏ¡£ÏÂÃæÊÇ´úÂëºÍչʾͼÐΡ£
from
pandas import DataFrame
from pandas import read_csv
from numpy import mean
from matplotlib import pyplot
import numpy
# load results file
results = read_csv('results.csv', header=None)
values = results.values
final_mean = mean(values)
# collect cumulative stats
means = list()
for i in range(1,501):
data = values[0:i, 0]
mean_rmse = mean(data)
means.append(mean_rmse)
# line plot of cumulative values
pyplot.plot(means)
pyplot.plot([final_mean for x in range(len(means))])
pyplot.show() |

¿É¼û£¬µ±Öظ´µ½100´Îʱ£¬½á¹ûÒѾ½Ó½üÆÚÍûÖµ¡£µ±Öظ´400´Îʱ£¬½á¹û¸ü¼Ó½Ó½üÆÚÍûÖµ£¬µ«ÊÇÌáÉýµÄ±ÈÀý²»¶à¡£
ÒÔÉÏÖ»ÊǶ¨ÐÔ·ÖÎöÁËʵÑéÖØ¸´´ÎÊý¶Ô¾ö²ßÅжϵÄÓ°Ï죬ÊÇ·ñÓиüºÏÀíµÄ·½·¨ÄØ£¿
¼ÆËã±ê×¼Îó²î
±ê×¼Îó²î£¨ standard error £©ÊÇÑù±¾Í³¼ÆÁ¿µÄ±ê×¼²î£¬ÌåÏÖÑù±¾¾ùÖµÓë×ÜÌå¾ùÖµµÄÆ«²î·¶Î§¡£±ê×¼Îó²îÓë±ê×¼²î²»Í¬¡£±ê×¼²îÊÇÀë¾ù²îƽ·½µÄËãÊõƽ¾ùÊýµÄƽ·½¸ù£¬·´Ó³Ò»¸öÊý¾Ý¼¯µÄÀëÉ¢³Ì¶È¡£
±ê×¼Îó²îÒ»°ãÓÃÀ´Åж¨¸Ã×é²âÁ¿Êý¾ÝµÄ¿É¿¿ÐÔ£¬ÔÚÊýѧÉÏËüµÄÖµµÈÓÚ²âÁ¿ÖµÎó²îµÄƽ·½ºÍµÄƽ¾ùÖµµÄƽ·½¸ù¡£ÓÉÓÚÔÚ²âÁ¿ÖеĴý²âÎïÌåµÄÕæÖµºÜÄѵõ½¡£Òò´ËÎÒÃÇÔÚʵ¼ÊµÄ¼ÆËãÖÐ,Óñê×¼Îó²î¹ÀËãÖµ´úÌæÊµ¼ÊÎó²î¡£
ÎÒÃÇÆÚÍûËæ×ÅʵÑé´ÎÊýµÄÔö¼Ó£¬±ê×¼Îó²îÖð½¥¼õС¡£
ÏÂÃæµÄ´úÂë¼ÆËãÁËÿ´ÎÖØ¸´ÊµÑéÖ®ºóµÄ±ê×¼Îó²î¡£
from
pandas import read_csv
from numpy import std
from numpy import mean
from matplotlib import pyplot
from math import sqrt
# load results file
results = read_csv('results.csv', header=None)
values = results.values
# collect cumulative stats
std_errors = list()
for i in range(1,len(values)+1):
data = values[0:i, 0]
stderr = std(data) / sqrt(len(data))
std_errors.append(stderr)
# line plot of cumulative values
pyplot.plot(std_errors)
pyplot.show() |
ºá×ø±êÊÇʵÑéÖØ¸´´ÎÊý£¬×Ý×ø±ê±íʾ±ê×¼Îó²î¡£ÈçÎÒÃÇÔ¤ÆÚ£¬Ëæ×ÅʵÑéÖØ¸´´ÎÊýÔö¼Ó£¬±ê×¼Îó²îÖð½¥¼õС¡£ÎÒÃÇ»¹ÄÜ·¢ÏÖ£¬±ê×¼Îó²îϽµµ½Ò»¶¨³Ì¶ÈÖ®ºó£¬Ï½µÇ÷ÊÆ±äµÃ·Ç³£»ºÂý£¬Õâ³Æ×÷¿É½ÓÊÜÎó²î£¬´óÔ¼ÔÚ1~2¸öµ¥Î»Á¿¡£

ÎÒÃÇÔÚÉÏͼÖÐÔÚÌí¼ÓÁ½Ìõ¸¨ÖúÏߣ¬·Ö±ð±êʶ±ê×¼Îó²îÔÚ0.5ºÍ1µÄÇé¿ö¡£´úÂëÈçÏÂͼËùʾ¡£
from
pandas import read_csv
from numpy import std
from numpy import mean
from matplotlib import pyplot
from math import sqrt
# load results file
results = read_csv('results.csv', header=None)
values = results.values
# collect cumulative stats
std_errors = list()
for i in range(1,len(values)+1):
data = values[0:i, 0]
stderr = std(data) / sqrt(len(data))
std_errors.append(stderr)
# line plot of cumulative values
pyplot.plot(std_errors)
pyplot.plot([0.5 for x in range(len(std_errors))],
color='red')
pyplot.plot([1 for x in range(len(std_errors))],
color='red')
pyplot.show() |
Èô±ê×¼Îó²îµÍÓÚ1ÔڿɽÓÊܵķ¶Î§£¬ÄÇô´óÔ¼ÖØ¸´100´ÎʵÑé¾Í¹»ÁË¡£Èô±ê×¼Îó²îµÍÓÚ0.5²ÅÄܽÓÊÜ£¬ÄÇô´óÔ¼ÐèÒªÖØ¸´300~350´ÎʵÑé¡£
ÔÙÇ¿µ÷Ò»±é£¬±ê×¼Îó²îÊǺâÁ¿ÔÚÄ£ÐÍÅäÖòÎÊýºÍËæ»ú³õʼÌõ¼þ²»±äµÄǰÌáÏ£¬Ä£ÐÍЧ¹ûµÄÑù±¾¾ùÖµÓëÕûÌå¾ùÖµµÄÆ«²î·¶Î§¡£

ÎÒÃÇÒ²¿ÉÒ԰ѱê×¼Îó²îµ±×öÄ£ÐÍÆ½¾ùЧ¹ûµÄÖÃÐÅÇø¼ä¡£±ÈÈ磬ÈôÖÃÐŶÈΪ95%£¬ÖÃÐÅÇø¼äµÄÉÏϽç¿ÉÒÔ±íʾΪ£º
Ñù±¾¾ùÖµ +/- £¨ ±ê×¼Îó²î * 1.96 £©
ÓÃÏÂÃæÕâ¶Î´úÂëÖØÐ»æÖÆ´øÓÐÖÃÐÅÇø¼äµÄÑù±¾¾ùÖµ¡£
from
pandas import read_csv
from numpy import std
from numpy import mean
from matplotlib import pyplot
from math import sqrt
# load results file
results = read_csv('results.csv', header=None)
values = results.values
# collect cumulative stats
means, confidence = list(), list()
n = len(values) + 1
for i in range(20,n):
data = values[0:i, 0]
mean_rmse = mean(data)
stderr = std(data) / sqrt(len(data))
conf = stderr * 1.96
means.append(mean_rmse)
confidence.append(conf)
# line plot of cumulative values
pyplot.errorbar(range(20, n), means, yerr=confidence)
pyplot.plot(range(20, n), [60 for x in range(len(means))],
color='red')
pyplot.show() |
½á¹ûÈçÏÂͼËùʾ¡£ÆäÖкìÏß±íʾ×ÜÌåµÄ¾ùÖµ¡£Í¨¹ý¹Û²ì¿ÉÒÔ·¢ÏÖ£¬¾¡¹ÜÑù±¾¾ùÖµ¸ß¹ÀÁË×ÜÌå¾ùÖµ£¬µ«ÊÇ×ÜÌå¾ùÖµ»¹ÊÇÂäÔÚÁËÖÃÐŶÈΪ95%µÄÖÃÐÅÇø¼äÖ®ÄÚ¡£95%ÖÃÐŶȵĺ¬ÒåÊÇÈôÑù±¾ÊýÄ¿²»±äµÄÇé¿öÏ£¬×ö100´ÎʵÑ飬ÓÐ95¸öÖÃÐÅÇø¼ä°üº¬ÁË×ÜÌå¾ùÖµµÄÕæÖµ£¬Ê£Óà5¸öÖÃÐÅÇø¼äûÓаüÀ¨¡£
ÈçͼËùʾ£¬Ëæ×ÅʵÑéµÄÖØ¸´´ÎÊýÔö¶à£¬ÖÃÐÅÇø¼äµÄ·¶Î§Öð½¥ËõС£¬µ±Öظ´´ÎÊý³¬¹ý500´ÎÖ®ºó£¬¼ÌÐøÖØ¸´ÊµÑé¶ÔЧ¹ûµÄÌáÉý²¢²»Ã÷ÏÔ¡£

Èô°Ñ20~200´ÎµÄÇø¼ä·Å´ó»æÖÆ£¬Ç÷ÊÆ»á¿´µÄ¸ü¼ÓÃ÷ÏÔ¡£
from
pandas import read_csv
from numpy import std
from numpy import mean
from matplotlib import pyplot
from math import sqrt
# load results file
results = read_csv('results.csv', header=None)
values = results.values
# collect cumulative stats
means, confidence = list(), list()
n = 200 + 1
for i in range(20,n):
data = values[0:i, 0]
mean_rmse = mean(data)
stderr = std(data) / sqrt(len(data))
conf = stderr * 1.96
means.append(mean_rmse)
confidence.append(conf)
# line plot of cumulative values
pyplot.errorbar(range(20, n), means, yerr=confidence)
pyplot.plot(range(20, n), [60 for x in range(len(means))],
color='red')
pyplot.show() |

С½á
ͨ¹ýÔĶÁ±¾ÎÄ£¬ÎÒÃÇÁоÙÁ˼¸ÖÖÑ¡ÔñËæ»úË㷨ʵÑéÖØ¸´´ÎÊýµÄ·½·¨¡£
¼òµ¥µØ³¢ÊÔÖØ¸´30´Î¡¢100´Î»òÕß1000´ÎµÈµÈ£»
»æÖÆÑù±¾¾ùÖµÓëÖØ¸´´ÎÊýµÄ¹ØÏµÍ¼£¬²¢¸ù¾Ý¹ÕµãÑ¡Ôñ£»
»æÖƱê×¼Îó²îÓëÖØ¸´´ÎÊýµÄ¹ØÏµÍ¼£¬²¢¸ù¾ÝÎó²îãÐֵѡÔñ£»
»æÖÆÖÃÐÅÇø¼äÓëÖØ¸´´ÎÊýµÄ¹ØÏµÍ¼£¬²¢¸ù¾ÝÎó²îµÄ·Ö²¼Ñ¡Ôñ¡£ |