Editor's note:
This article, from cnblogs, covers classifying with decision trees, predicting with a random forest,
fitting on the log scale, and restoring the values with the exp function.

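Key points:
Splitting up datetimes with the lubridate package | POSIXlt
Classifying with decision trees, predicting with a random forest
Fitting on the log scale and restoring with the exp function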
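The training set is the bike rental data from Kaggle's Washington, D.C. bike-sharing program; the aim is to analyze how shared-bike usage relates to weather, time, and other factors. The dataset has 11 variables and a little over 10,000 rows.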
https://www.kaggle.com/c/bike-sharing-demand
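First, look at the official data. There are two tables, both covering 2011-2012. The difference is that the Test file has every date of each month but no registered or casual user counts, while the Train file only has days 1-20 of each month but includes the counts for both user types.
Task: fill in the user counts for the 21st-30th that are missing from the Train file. Submissions are judged by comparing the predicted counts with the actual ones.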

1.png
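First, load the packages and the files
library(lubridate)
library(randomForest)
library(readr)
setwd("E:")
# read both files, since both train and test are used below
train <- read_csv("train.csv")
test <- read_csv("test.csv")
head(train)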
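This is where I hit a snag: R's default read.csv simply would not read the file in the right format, and converting to xlsx was even worse, with every datetime turning into an odd number like 43045. I had previously managed to convert such values with as.Date, but this time the values include hours, minutes, and seconds, so I had to use timestamps, and that did not work either.
In the end I installed the "readr" package and used read_csv, which read the file without trouble.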
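Because test has more complete dates than train but lacks the user counts, train and test need to be combined.
# add the missing count columns to test so both tables have the same columns
test$registered <- 0
test$casual <- 0
test$count <- 0
data <- rbind(train, test)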
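Extracting the time: a timestamp would work, but the time part here is simple, just the hour, so you can also slice the string directly.
# characters 12-13 of "YYYY-MM-DD HH:MM:SS" are the hour
data$hour1 <- substr(data$datetime, 12, 13)
table(data$hour1)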
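The key points mention POSIXlt as an alternative to slicing the string. A minimal base R sketch, assuming datetimes in the "YYYY-MM-DD HH:MM:SS" layout; the sample values are made up for illustration and the variable names are not from the post:

```r
# Made-up sample values in the same layout as the datetime column
stamps <- c("2011-01-01 00:00:00", "2011-01-01 13:00:00", "2012-12-19 23:00:00")

# Parse into POSIXlt, which stores each date-time component as a list field
lt <- as.POSIXlt(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

hour_lt  <- lt$hour                              # hour of day, 0-23
hour_sub <- as.integer(substr(stamps, 12, 13))   # string slicing, as in the post

year_lt  <- lt$year + 1900                       # POSIXlt counts years from 1900
month_lt <- lt$mon + 1                           # POSIXlt months run 0-11

print(data.frame(stamps, hour_lt, hour_sub, year_lt, month_lt))
```

Both routes give the same hour; the POSIXlt fields are already numeric, so no as.integer step is needed later.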
Tallying the records for each hour gives this (why is it so even?):

6-hour1.png
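Next, use box plots to see how usage relates to the hour, the day of the week, and so on. Why box plots rather than a hist histogram? Because box plots show the outliers, which is also why the fit below is done on the log scale.
The plots show that registered and non-registered users ride at very different times of day.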

5-hour-regestered.png

5-hour-casual.png

4-boxplot-day.png
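The plotting code is not shown in the post. A minimal sketch of how such box plots could be drawn with base R's boxplot, using simulated data in place of train.csv (the column names registered, casual, and hour1 follow the post; the numbers and the usage pattern are invented):

```r
set.seed(1)
# Simulated stand-in for the training data: 20 days x 24 hours
sim <- data.frame(hour1 = rep(0:23, times = 20))
# Invented pattern: registered riders peak at commute hours, casual riders at midday
sim$registered <- rpois(nrow(sim), lambda = 20 + 150 * (sim$hour1 %in% c(8, 17, 18)))
sim$casual     <- rpois(nrow(sim), lambda = 5 + 40 * (sim$hour1 >= 11 & sim$hour1 <= 16))

# One box per hour; points beyond the whiskers are drawn individually as outliers
bp_reg <- boxplot(registered ~ hour1, data = sim, xlab = "hour", ylab = "registered")
bp_cas <- boxplot(casual ~ hour1, data = sim, xlab = "hour", ylab = "casual")

# The same plot on the log scale compresses the large values
boxplot(log(registered + 1) ~ hour1, data = sim,
        xlab = "hour", ylab = "log(registered + 1)")
```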
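Next, use the correlation coefficient (cor) to examine the relationship between user counts and temperature, "feels like" temperature, humidity, and wind speed.
Correlation coefficient: a measure of the linear association between variables, used to check how strongly different data are related.
It takes values in [-1, 1]; the closer to 0, the weaker the correlation.
The results show that usage is negatively correlated with wind speed, which has an even larger effect than temperature.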

cor.png
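The cor call itself is not shown in the post. A minimal sketch of building such a correlation matrix, using invented data whose column names mirror the Kaggle file; the relationships built in here are assumptions for illustration and say nothing about the real data:

```r
set.seed(2)
n <- 200
# Invented weather readings; atemp is built to track temp closely
temp      <- runif(n, 0, 40)
atemp     <- temp + rnorm(n, sd = 2)
humidity  <- runif(n, 20, 100)
windspeed <- runif(n, 0, 40)
# Invented rental counts that rise with temp and fall with humidity
count <- 50 + 5 * temp - 1 * humidity + rnorm(n, sd = 20)

weather_df <- data.frame(count, temp, atemp, humidity, windspeed)

# Pearson correlation matrix; every entry lies in [-1, 1]
cm <- cor(weather_df)
print(round(cm, 2))
```

Reading across the count row shows the sign and strength of each linear relationship; on the real data the same call would be applied to the corresponding columns of train.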
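Next comes classifying the hour and other factors with decision trees, then predicting with a random forest. Random forests and decision trees sound fancy, but they are in common use these days, so they are well worth learning.
A decision tree is a simple, easy-to-use nonparametric classifier. It needs no prior assumptions about the data, is fast to compute, gives results that are easy to interpret, and is robust to noisy and missing data.
The basic steps of a decision tree are: pick one of the n predictors, find the best split point, and divide the data into two groups. Then repeat the same steps on each group until some stopping condition is met.
Building a decision tree means settling three important questions:
How to choose the predictor
How to choose the split point
When to stop splitting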
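The split-point step above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not how rpart is implemented: best_split is a hypothetical helper that scores each candidate threshold by the total squared error around the two group means, and the data are invented.

```r
# Find the split point on one numeric predictor x that minimizes the
# total squared error of y around the two group means.
best_split <- function(x, y) {
  xs <- sort(unique(x))
  if (length(xs) < 2) return(NULL)           # nothing to split: a stopping condition
  # Candidate thresholds: midpoints between consecutive distinct values
  cands <- (head(xs, -1) + tail(xs, -1)) / 2
  sse <- sapply(cands, function(s) {
    left  <- y[x < s]
    right <- y[x >= s]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  list(split = cands[which.min(sse)], sse = min(sse))
}

# Toy data: y jumps from about 10 to about 100 once x reaches 8
x <- 0:23
y <- ifelse(x >= 8, 100, 10) + rep(c(-1, 1), 12)
res <- best_split(x, y)
print(res$split)   # 7.5
```

Choosing the predictor then amounts to running the same search on every predictor and keeping the one with the lowest error, and a minimum group size is one common stopping condition.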
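Build the decision tree of registered users against the hour:
library(rpart)
library(rpart.plot)
# the hour has to be numeric for the tree to split on it
train$hour1 <- as.integer(substr(train$datetime, 12, 13))
d <- rpart(registered ~ hour1, data = train)
rpart.plot(d)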

3-raprt-hour1.png
Then classify by hand according to the tree's results, which takes up quite a bit of code...
data$hour1 <- as.integer(data$hour1)
# one bin per leaf of the tree; the boundaries are the tree's split points
data$dp_reg = 0
data$dp_reg[data$hour1 < 7.5] = 1
data$dp_reg[data$hour1 >= 22] = 2
data$dp_reg[data$hour1 >= 9.5 & data$hour1 < 18] = 3
data$dp_reg[data$hour1 >= 7.5 & data$hour1 < 8.5] = 4
data$dp_reg[data$hour1 >= 8.5 & data$hour1 < 9.5] = 5
data$dp_reg[data$hour1 >= 20 & data$hour1 < 22] = 6
data$dp_reg[data$hour1 >= 18 & data$hour1 < 20] = 7
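The same binning can be written as a single cut() call. A minimal sketch, assuming the intended bins do not overlap (below 7.5 is 1, 7.5-8.5 is 4, 8.5-9.5 is 5, 9.5-18 is 3, 18-20 is 7, 20-22 is 6, 22 and up is 2); the hours vector stands in for data$hour1:

```r
hours <- 0:23   # stand-in for data$hour1

# Breaks are the assumed split points; labels give the bin code of each interval
bins <- cut(hours,
            breaks = c(-Inf, 7.5, 8.5, 9.5, 18, 20, 22, Inf),
            labels = c("1", "4", "5", "3", "7", "6", "2"),
            right  = FALSE)                 # intervals are closed on the left

# cut() returns a factor; go through character to recover the numeric codes
dp_reg <- as.integer(as.character(bins))
print(setNames(dp_reg, hours))
```

Because every hour falls into exactly one interval, the order of the assignments no longer matters, which removes one source of mistakes in the hand-written version.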
In the same way, build decision trees for (hour | temperature) x (registered | casual users), and keep classifying by hand....

3-raprt-temp.png
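Classify by hand on year and month, weekends and holidays, and so on
# year and day are used below but never defined in the post; deriving them like this is an assumption
data$year <- year(data$datetime)
data$day <- weekdays(as.Date(data$datetime))   # weekday names depend on the locale
data$month <- month(data$datetime)
# quarter of the year, 2011 only
data$year_part = 0
data$year_part[data$year == '2011'] = 1
data$year_part[data$year == '2011' & data$month > 3] = 2
data$year_part[data$year == '2011' & data$month > 6] = 3
data$year_part[data$year == '2011' & data$month > 9] = 4
data$day_type = ""
data$day_type[data$holiday == 0 & data$workingday == 0] = "weekend"
data$day_type[data$holiday == 1] = "holiday"
data$day_type[data$holiday == 0 & data$workingday == 1] = "working day"
data$weekend = 0
data$weekend[data$day == "Sunday" | data$day == "Saturday"] = 1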
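Next, predict with the random forest
In machine learning, a random forest is a classifier made up of many decision trees, and its output class is the mode of the classes output by the individual trees (for regression, as in this post, the trees' predictions are averaged instead).
Each split in a tree of the forest does not consider all the candidate features; instead it draws a random subset of the candidates and picks the best feature among them. This makes the trees differ from one another, which increases the diversity of the ensemble and so improves classification performance.
ntree sets the number of decision trees in the forest; the default is 500, and larger is usually better as long as performance allows;
mtry sets the number of variables tried at each node split; the default is the square root of the number of variables in the dataset (classification) or one third of it (regression). It generally has to be tuned by hand, trying values one at a time to find the best m (from datacruiser's notes). I am mainly here to learn, so although the dataset has over 10,000 rows I only set 500. Even with 500, my little computer ran for ages.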
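To make the mtry defaults concrete, here is a small sketch of the rule as documented for the randomForest package, plus the random draw of candidate variables that happens at each split. default_mtry is a hypothetical helper and the predictor names are placeholders:

```r
# Default number of candidate variables per split, following the rule
# documented for randomForest: sqrt(p) for classification, p/3 for regression
default_mtry <- function(p, regression) {
  if (regression) max(floor(p / 3), 1) else floor(sqrt(p))
}

p <- 15   # number of predictors in the model formula used in this post
print(default_mtry(p, regression = TRUE))    # 5
print(default_mtry(p, regression = FALSE))   # 3

# At each split, a forest draws that many predictors at random
set.seed(1234)
predictors <- paste0("x", seq_len(p))        # placeholder names
candidates <- sample(predictors, default_mtry(p, regression = TRUE))
print(candidates)
```

Since the response here is a log count, the fit is a regression, so leaving mtry unset would try 5 of the 15 predictors at each split.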
train <- data
set.seed(1234)
# fit on log(x + 1) to dampen the effect of very large counts
train$logreg <- log(train$registered + 1)
train$logcas <- log(train$casual + 1)
fit1 <- randomForest(logreg ~ hour1 + workingday + day + holiday + day_type +
                       temp_reg + humidity + atemp + windspeed + season +
                       weather + dp_reg + weekend + year + year_part,
                     train, importance = TRUE, ntree = 250)
pred1 <- predict(fit1, train)
train$logreg <- pred1
For some reason I got an error whenever I added my day and day_part variables, so I could only run it with those two removed; this still needs looking into and fixing.
Then restore the values with the exp function
# undo log(x + 1)
train$registered <- exp(train$logreg) - 1
train$casual <- exp(train$logcas) - 1
train$count <- train$casual + train$registered
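As a quick check that the log transform and the exp step undo each other, here is a small sketch with invented counts. log1p and expm1 are base R's built-in equivalents of log(x + 1) and exp(x) - 1:

```r
counts <- c(0, 1, 9, 99, 977)      # invented rental counts, including zero

logged   <- log(counts + 1)        # the transform used before fitting
restored <- exp(logged) - 1        # the inverse applied to the predictions

# Built-in equivalents, which stay accurate for values near zero
logged2   <- log1p(counts)
restored2 <- expm1(logged2)

print(all.equal(restored, counts))
print(all.equal(restored2, counts))

# Differences on the log scale are relative: 10 vs 20 and 100 vs 200
# end up a similar distance apart after the transform
print(log(20 + 1) - log(10 + 1))
print(log(200 + 1) - log(100 + 1))
```

The + 1 keeps hours with zero rentals finite, since log(0) is -Inf.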
Finally, cut out the rows from the 20th onward and write them to a new csv file for upload.
# keep only the rows dated the 20th or later
train2 <- train[as.integer(day(train$datetime)) >= 20, ]
submit_final <- data.frame(datetime = train2$datetime, count = train2$count)
write.csv(submit_final, "submit_final.csv", row.names = F)