Editor's recommendation:
This article comes from IBM. It explains how to install and set up an MLflow environment, train and track machine learning models in R, package source code and data in an MLproject, and run the project with the mlflow run command.
In this article, I briefly introduce MLflow and how it works. MLflow currently provides APIs in Python that you can call from your machine learning source code to log the parameters, metrics, and artifacts that the MLflow tracking server should track.
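If you are familiar with machine learning operations and perform them in R, you might want to use MLflow to track your models and every run. There are several approaches you can take:
- Wait for MLflow to release APIs in R
- Wrap the MLflow RESTful API and log through curl commands
- Use an R package that can call into a Python interpreter to invoke the existing Python APIs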
The last approach is simple and practical, and it lets you interact with MLflow without waiting for an R API to become available. In this tutorial, I explain how to do this with the reticulate R package.
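For comparison, the second approach (wrapping the RESTful API) might look roughly like the sketch below, which only builds a curl command string in R and does not send it. The function name build_log_param_cmd, the run ID placeholder, and the server address are hypothetical. The endpoint path and JSON field names are assumptions that should be checked against the REST API documentation of your MLflow version.

```r
# Hypothetical helper: build (but do not run) a curl command that would
# log one parameter through the MLflow REST API.
build_log_param_cmd <- function(tracking_uri, run_id, key, value) {
  # ASSUMPTION: endpoint path and field names; verify for your MLflow version.
  endpoint <- paste0(tracking_uri, "/api/2.0/mlflow/runs/log-parameter")
  payload <- sprintf('{"run_id": "%s", "key": "%s", "value": "%s"}',
                     run_id, key, value)
  paste("curl -X POST", shQuote(endpoint, type = "sh"),
        "-H", shQuote("Content-Type: application/json", type = "sh"),
        "-d", shQuote(payload, type = "sh"))
}

cmd <- build_log_param_cmd("http://127.0.0.1:5000", "RUN_ID_PLACEHOLDER",
                           "family", "gaussian")
cat(cmd, "\n")
```

Every logged value needs its own request, and the run ID has to be managed by hand, which is part of why the reticulate route is less work.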
reticulate is an open source R package that lets you call Python from R by embedding a Python session within the R session. It provides seamless, high-performance interoperability between R and Python. The package is available from the CRAN repository.
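MLflow also comes with a Projects component, which packages data, source code with commands, parameters, and the execution environment setup together as a self-contained specification. After an MLproject is defined, you can run the project anywhere. Currently, an MLproject can run Python code or shell commands. It can also set up a Python environment for the project, as specified in a user-defined conda.yaml file.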
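R users typically import some packages in their R source code, and these packages must be installed before the R code can run. One possible future enhancement is to add functionality similar to conda.yaml to MLflow for setting up R package dependencies. This tutorial shows how to create an MLproject that contains R source code and how to run the project with the mlflow run command.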
Learning objectives
In this tutorial, you install and set up an MLflow environment, train and track machine learning models in R, package source code and data in an MLproject, and run the project with the mlflow run command.
Prerequisites
Before you start this tutorial, Python should be installed on the platform where R runs. I prefer to install miniconda. Because the machine learning training is done in R, R should also be installed on the platform.
Steps
Step 1: Install MLflow
Create a virtual environment for MLflow with conda and install the mlflow package as follows:
conda create -q -n mlflow python=3.6
source activate mlflow
pip install -U pip
pip install mlflow
Step 2: Install the reticulate R package
Install the reticulate package from R.
install.packages("reticulate")
reticulate allows R to call Python functions seamlessly. Python packages are loaded with the import statement, and functions are called with the $ operator.
> library(reticulate)
> path <- import("os.path")
> path$isdir("/tmp")
[1] TRUE
As you can see, calling Python functions in the os.path module from R is straightforward with this package. You can do the same with the mlflow package: import it, then call mlflow$log_param and mlflow$log_metric to log the parameters and metrics of your R script.
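As a minimal sketch of that pattern, the helper below loops over named lists and forwards each entry to log_param or log_metric. The names log_run_values and fake_mlflow are hypothetical. The fake object is only a stand-in so that the sketch runs without a Python installation; in a real session you would pass the module returned by import("mlflow").

```r
# Hypothetical helper: forward named lists of parameters and metrics to
# the log_param() and log_metric() functions of an mlflow module object.
log_run_values <- function(mlflow, params = list(), metrics = list()) {
  for (name in names(params)) {
    mlflow$log_param(name, params[[name]])
  }
  for (name in names(metrics)) {
    mlflow$log_metric(name, metrics[[name]])
  }
  invisible(NULL)
}

# Stand-in for the imported module so that the sketch runs without Python.
# With reticulate you would pass reticulate::import("mlflow") instead.
logged <- character(0)
fake_mlflow <- list(
  log_param  = function(key, value) logged <<- c(logged, paste("param", key, value)),
  log_metric = function(key, value) logged <<- c(logged, paste("metric", key, value))
)

log_run_values(fake_mlflow,
               params  = list(family = "gaussian"),
               metrics = list(rmse = 0.5))
print(logged)
```

Because both a plain R list and a reticulate module are accessed with the $ operator, the same helper works with either one.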
Step 3: Train a GLM model with SparkR
The following R script builds a linear regression model with SparkR. For this example, the SparkR package must be installed.
# load the reticulate package and import the mlflow Python module
library(reticulate)
mlflow <- import("mlflow")
# load the SparkR package and start a Spark session
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(master = "local[*]")
# convert the iris data.frame to a SparkDataFrame
df <- as.DataFrame(iris)
# parameter for the GLM
family <- c("gaussian")
# log the parameter
mlflow$log_param("family", family)
# fit the GLM model
model <- spark.glm(df, Species ~ ., family = family)
# examine the model
summary(model)
# path to save the model
model_path <- "/tmp/mlflow-GLM"
# save the model
write.ml(model, model_path)
# log the artifacts
mlflow$log_artifacts(model_path)
# stop the Spark session
sparkR.session.stop()
You can copy the script into R or RStudio and run it interactively, or save it to a file and run it with the Rscript command. Make sure that the PATH environment variable includes the path to the mlflow Python virtualenv.
Step 4: Launch the MLflow UI
Launch the MLflow UI by running the mlflow ui command from a shell. Then open a browser and go to http://127.0.0.1:5000. The page now shows your earlier GLM model training run, which you can track. The following figure shows a screenshot.

Step 5: Train a decision tree model
Download the wine-quality.csv data, which the model learns from, to your platform.
Install the rpart package in your R environment:
install.packages("rpart")
Prepare the tree model by following this example, rpart-example.R:
# Source the prep.R file to install the dependencies
source("prep.R")
# Import the mlflow Python package for tracking
library(reticulate)
mlflow <- import("mlflow")
# Load rpart to build a tree model
library(rpart)
# Read in the data
wine <- read.csv("wine-quality.csv")
# Build the model
fit <- rpart(quality ~ ., wine)
# Save the model so that it can be loaded later
saveRDS(fit, "fit.rpart")
# Log the saved model to the mlflow tracking server
mlflow$log_artifact("fit.rpart")
# Plot the tree
jpeg("rplot.jpg")
par(xpd = TRUE)
plot(fit)
text(fit, use.n = TRUE)
dev.off()
# Log the plot to the mlflow tracking server
mlflow$log_artifact("rplot.jpg")
The R code has three parts: model training, artifact logging through MLflow, and installation of the R package dependencies.
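The script logs only artifacts. If you also want a metric for the run, one option is to compute an error measure and pass it to mlflow$log_metric. The sketch below uses a root mean squared error; the function name rmse and the toy numbers are illustrative and are not taken from wine-quality.csv.

```r
# Root mean squared error between observed and predicted values.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy numbers for illustration only (not taken from wine-quality.csv).
actual    <- c(5, 6, 7, 5)
predicted <- c(5, 6, 6, 6)
print(rmse(actual, predicted))

# In rpart-example.R this might become (assuming `fit`, `wine`, and `mlflow`
# exist as in that script):
#   mlflow$log_metric("rmse", rmse(wine$quality, predict(fit, wine)))
```

An error computed on the training data is optimistic, so a held-out split would give a more honest number.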
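Step 6: Prepare the package dependencies for the MLproject
The previous example needs the reticulate and rpart R packages to run. To package the code as a self-contained project, a script should install these packages automatically if they are not already installed on the platform.
The following code in prep.R installs all the specific R packages that the project requires: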
# Accept parameters: the first trailing argument is the R package repo URL
args <- commandArgs(trailingOnly = TRUE)
# All installed packages
pkgs <- installed.packages()
# List of required packages for this project
reqs <- c("reticulate", "rpart")
# Try to install the dependencies if not installed
for (pkg in reqs) {
  if (!pkg %in% rownames(pkgs)) {
    install.packages(pkg, repos = args[1])
  }
}
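The core of prep.R is a set difference between the required and the installed packages. The sketch below isolates that step in a hypothetical helper, missing_packages, so that it can be checked without installing anything. The explicit installed set in the example is made up.

```r
# Return the required packages that are not in the installed set.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

# Example with an explicit, made-up installed set:
print(missing_packages(c("reticulate", "rpart"),
                       installed = c("rpart", "stats")))
```

Printing the result before calling install.packages gives a dry run of what the project would install on a given platform.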
Step 7: Test the code
Before packaging the code into an MLproject, test it by calling the Rscript command directly, as follows:
Rscript rpart-example.R https://cran.r-project.org/
In the MLflow UI, you should see that this run has been tracked, as shown in the following figure:

Step 8: Create the MLproject
Now write the specification and package this project into an MLproject that MLflow can recognize and run. All you need to do is create an MLproject file in the same directory.
name: r_example

entry_points:
  main:
    parameters:
      r-repo: {type: string, default: "https://cran.r-project.org/"}
    command: "Rscript rpart-example.R {r-repo}"
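This file defines the r_example project with a main entry point. The entry point specifies the command and parameters to be executed by mlflow run. For this project, Rscript is the shell command that invokes the R source code. The r-repo parameter provides the URL string from which dependent packages are installed, and a default value is set. The parameter is passed to the command that runs the R source code.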
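To make the placeholder mechanism concrete, the sketch below substitutes {name} placeholders in a command template. It is an illustration only: render_command is a hypothetical helper, not MLflow's implementation, and MLflow's own handling may differ (for example in how values are quoted).

```r
# Illustration only: substitute {name} placeholders in a command template.
render_command <- function(template, params) {
  for (name in names(params)) {
    placeholder <- paste0("{", name, "}")
    template <- gsub(placeholder, params[[name]], template, fixed = TRUE)
  }
  template
}

cmd <- render_command("Rscript rpart-example.R {r-repo}",
                      list("r-repo" = "https://cran.r-project.org/"))
print(cmd)
```

With the default value, the rendered command matches the one used to test the script directly with Rscript.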
You now have all the files needed to train this tree model. Create the MLproject by making a directory and copying the data and the R source code into it.
.
└── R
    ├── MLproject
    ├── prep.R
    ├── rpart-example.R
    └── wine-quality.csv
Step 9: Check in and test the MLproject
You can check in the MLproject and push it to a GitHub repository. Use the following command to test the project. The project can run on any platform that has R installed.
mlflow run https://github.com/adrian555/DocsDump#files/mlflow-projects/R
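The r-repo parameter can be overridden at run time with the -P name=value option of mlflow run. The sketch below builds the argument vector in R without launching anything. The helper name mlflow_run_args is hypothetical, and the mirror URL is only an example value.

```r
# Hypothetical helper: build the argument vector for `mlflow run` with
# -P name=value overrides for entry-point parameters.
mlflow_run_args <- function(uri, params = list()) {
  overrides <- unlist(lapply(names(params), function(name) {
    c("-P", paste0(name, "=", params[[name]]))
  }))
  c("run", uri, overrides)
}

run_args <- mlflow_run_args(
  "https://github.com/adrian555/DocsDump#files/mlflow-projects/R",
  list("r-repo" = "https://cloud.r-project.org/")  # example mirror only
)
print(run_args)

# To actually launch the run (requires mlflow on the PATH):
#   system2("mlflow", shQuote(run_args))
```

Any value passed this way replaces the default declared in the MLproject file for that run.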
You can also view the project in the MLflow tracking UI, as shown in the following figure:

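The differences between this view and the previous run (without the MLproject specification) are the Run Command, which captures the exact command used to run the project, and the Parameters, which automatically record any parameters passed to the entry point.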
Conclusion
In this tutorial, you created an MLproject in R, then tracked and ran it with MLflow. This approach lets R users take advantage of the MLflow Tracking component to quickly track their R models. It also demonstrates the purpose of the MLflow Projects component, which is to define a project and make it easy to rerun. R users can set up their projects quickly and use MLflow to track and run them with little effort.