JuliaCon 2024¶

End-to-End AI (E2EAI) with Julia, K0s, and Argo Workflow¶

Presenter: Paulito Palmes, IBM Research¶
Collaborators: SUNRISE-6G EU Partners¶
Date: July 11, 2024¶

OUTLINE¶

  • The Motivations Behind E2EAI (End-to-End AI)¶

  • Components of E2EAI¶

  • The Julia AI/ML Solution Use-case¶

  • The Future¶

The Motivations Behind E2EAI¶

  • current paradigms do not exploit a tight integration of IaC (Infrastructure as Code) and MLOps when deploying AI solutions
  • without that tight integration, it is:
    • difficult to identify the optimal infrastructure
    • difficult to predict resource viability and feasibility
    • difficult to infer the cost of deployment
    • difficult to identify performance bottlenecks and perform root-cause analysis

End-to-End AI (E2EAI)¶

  • E2EAI is a unified framework tightly integrating MLOps and IaC
    • a single yaml file describes both the IaC and the MLOps: infrastructure + ML pipeline + lifecycle management
    • reliance on yaml workflow templates implies zero to minimal coding
    • a collection of yamls can be used as inputs to an LLM for intent-driven E2EAI
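The single-file idea can be sketched as a hypothetical Argo Workflows manifest. This is illustrative only: the step and template names below (`provision-infra`, `train-pipeline`, `lifecycle`) are invented for the sketch, not the actual E2EAI or SUNRISE-6G schema.

```yaml
# Hypothetical sketch: one workflow covering IaC, the ML pipeline,
# and lifecycle management. Step/template names are illustrative.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: e2eai-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: provision-infra      # IaC step (e.g. bring up a k0s cluster)
            template: provision
        - - name: train-pipeline       # MLOps step (e.g. an AutoMLPipeline job)
            template: train
        - - name: lifecycle            # monitoring, retraining, teardown
            template: manage
```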

Components of E2EAI¶

  • SUNRISE-6G
    • SUstainable federatioN of Research Infrastructures for Scaling-up Experimentation in 6G
    • H2020 EU Project (3 years)

The Julia AI/ML Solution Use-case¶

  • AutoMLPipeline workflow
  • Integrating AutoMLPipeline in E2EAI

Load ML pipeline preprocessing components and models¶

In [24]:
using AutoMLPipeline;
import PythonCall; const PYC=PythonCall; warnings = PYC.pyimport("warnings"); warnings.filterwarnings("ignore")

#### Decomposition
pca = skoperator("PCA"); fa  = skoperator("FactorAnalysis"); ica = skoperator("FastICA")
#### Scaler 
rb   = skoperator("RobustScaler"); pt   = skoperator("PowerTransformer"); norm = skoperator("Normalizer")
mx   = skoperator("MinMaxScaler"); std  = skoperator("StandardScaler")
#### categorical preprocessing
ohe = OneHotEncoder()
#### Column selector
catf = CatFeatureSelector(); numf = NumFeatureSelector(); disc = CatNumDiscriminator()
#### Learners
rf = skoperator("RandomForestClassifier"); gb = skoperator("GradientBoostingClassifier"); lsvc = skoperator("LinearSVC")
svc = skoperator("SVC"); mlp = skoperator("MLPClassifier")
ada = skoperator("AdaBoostClassifier"); sgd = skoperator("SGDClassifier")
skrf_reg = skoperator("RandomForestRegressor"); skgb_reg = skoperator("GradientBoostingRegressor")
jrf = RandomForest(); tree = PrunedTree()
vote = VoteEnsemble(); stack = StackEnsemble(); best = BestLearner();

Prepare dataset for classification¶

In [25]:
# Make sure that the input feature is a dataframe and the target output is a 1-D vector.
using AutoMLPipeline
profbdata = getprofb()
X = profbdata[:,2:end] 
Y = profbdata[:,1] |> Vector;
head(x)=first(x,10)
head(profbdata)
Out[25]:
10×7 DataFrame
 Row │ Home.Away  Favorite_Points  Underdog_Points  Pointspread  Favorite_Name  Underdog_name  Year
     │ String7    Int64            Int64            Float64      String3        String3        Int64
─────┼─────────────────────────────────────────────────────────────────────────────────────────────
   1 │ away                    27               24          4.0  BUF            MIA               89
   2 │ at_home                 17               14          3.0  CHI            CIN               89
   3 │ away                     5               10          2.5  CLE            PIT               89
   4 │ at_home                 28                0          5.5  NO             DAL               89
   5 │ at_home                 38                7          5.5  MIN            HOU               89
   6 │ at_home                 34               20          6.0  DEN            KC                89
   7 │ away                    31               21          6.0  LAN            ATL               89
   8 │ at_home                 24               27          2.5  NYJ            NE                89
   9 │ away                    16               13          1.5  PHX            DET               89
  10 │ at_home                 40               14          3.5  LAA            SD                89

Pipeline to transform categorical features to one-hot encoding¶

In [26]:
pohe = catf |> ohe
tr = fit_transform!(pohe,X,Y)
head(tr)
Out[26]:
10×56 DataFrame
 Row │ x1       x2       x3       x4       x5      ⋯
     │ Float64  Float64  Float64  Float64  Float64 ⋯
─────┼───────────────────────────────────────────────
   1 │     1.0      0.0      0.0      0.0      0.0 ⋯
   2 │     0.0      1.0      0.0      0.0      0.0 ⋯
   3 │     0.0      0.0      1.0      0.0      0.0 ⋯
   4 │     0.0      0.0      0.0      1.0      0.0 ⋯
   5 │     0.0      0.0      0.0      0.0      1.0 ⋯
   6 │     0.0      0.0      0.0      0.0      0.0 ⋯
   7 │     0.0      0.0      0.0      0.0      0.0 ⋯
   8 │     0.0      0.0      0.0      0.0      0.0 ⋯
   9 │     0.0      0.0      0.0      0.0      0.0 ⋯
  10 │     0.0      0.0      0.0      0.0      0.0 ⋯
                               51 columns omitted
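Under the hood, one-hot encoding maps each category level to its own indicator column, which is why the categorical features above expand to 56 columns. A minimal pure-Julia sketch of the idea (illustrative, not the AMLPipelineBase implementation):

```julia
# One-hot sketch: each unique level of a categorical column becomes
# an indicator column; each row has a single 1.0 in its level's column.
function one_hot(col::AbstractVector)
    levels = unique(col)
    Float64[v == l for v in col, l in levels]  # length(col) × length(levels)
end

one_hot(["away", "at_home", "away"])
# 3×2 Matrix{Float64}:
#  1.0  0.0
#  0.0  1.0
#  1.0  0.0
```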

Pipeline to transform numerical features with PCA and ICA and concatenate the outputs¶

In [27]:
pdec = (numf |> pca) + (numf |> ica)
tr = fit_transform!(pdec,X,Y)
head(tr)
Out[27]:
10×8 DataFrame
 Row │ x1        x2         x3          x4        x1_1       x2_1        x3_1     x4_1
     │ Float64   Float64    Float64     Float64   Float64    Float64     Float64  Float64
─────┼─────────────────────────────────────────────────────────────────────────────────────
   1 │  2.47477    7.87074  -1.10495    0.902431   0.433184   0.0796294  1.21188  -0.696399
   2 │ -5.47113   -3.82946  -2.08342    1.00524   -0.851126  -0.566853   1.16879  -0.17776
   3 │ 30.4068   -10.8073   -6.12339    0.883938  -1.91215    2.99234    1.1004   -1.30033
   4 │  8.18372  -15.507    -1.43203    1.08255   -1.70283    0.955814   1.18439   0.499145
   5 │ 16.6176    -6.68636  -1.66597    0.978243  -0.88172    1.66475    1.19639  -0.180815
   6 │ 10.2588     5.22112   0.0731649  0.928496   0.44797    0.909697   1.24403  -0.328454
   7 │  7.13435    5.60902   0.368661   0.939797   0.510885   0.601575   1.24843  -0.232919
   8 │ -1.16369   10.3011   -2.15564    0.86957    0.449727  -0.33492    1.18641  -1.07025
   9 │ -6.38764   -4.92017  -3.57339    0.986345  -1.20905   -0.673366   1.13179  -0.498682
  10 │ 17.0567     0.672    -3.29448    0.879581  -0.486706   1.57433    1.16653  -1.04527
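The `|>` and `+` combinators used above overload Julia operators: `|>` feeds one transformer's output into the next, while `+` runs both branches on the same input and concatenates their outputs column-wise (x1..x4 from PCA, x1_1..x4_1 from ICA). A toy model of that algebra, using a hypothetical `Transformer` type rather than AutoMLPipeline's internals:

```julia
# Toy pipeline algebra (hypothetical Transformer type, not the
# AutoMLPipeline internals): `|>` chains, `+` concatenates columns.
struct Transformer
    f::Function
end

(t::Transformer)(X) = t.f(X)

# chaining: the output of `a` feeds into `b`
Base.:(|>)(a::Transformer, b::Transformer) = Transformer(X -> b(a(X)))
# concatenation: run both branches on the same input, hcat the results
Base.:(+)(a::Transformer, b::Transformer) = Transformer(X -> hcat(a(X), b(X)))

center = Transformer(X -> X .- sum(X; dims=1) ./ size(X, 1))
square = Transformer(X -> X .^ 2)

pipe = (center |> square) + square
pipe([1.0 2.0; 3.0 4.0])
# 2×4 Matrix{Float64}:
#  1.0  1.0  1.0   4.0
#  1.0  1.0  9.0  16.0
```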

More complex pipeline with robust scaling and power transform¶

In [28]:
ppt = (numf |> rb |> ica) + (numf |> pt |> pca)
tr = fit_transform!(ppt,X,Y)
head(tr)
Out[28]:
10×8 DataFrame
 Row │ x1         x2         x3         x4        x1_1        x2_1      x3_1         x4_1
     │ Float64    Float64    Float64    Float64   Float64     Float64   Float64      Float64
─────┼─────────────────────────────────────────────────────────────────────────────────────────
   1 │  0.0797428   0.433233   0.696339  -1.21189  -0.64552    1.40289  -0.0284468    0.111773
   2 │ -0.566905   -0.851057   0.177854  -1.1688   -0.832404   0.475629 -1.14881     -0.01702
   3 │  2.99223    -1.91236    1.30034   -1.10032   1.54491    1.65258  -1.35967     -2.57866
   4 │  0.955637   -1.70299   -0.49905   -1.18435   1.32065    0.563565 -2.05839     -0.74898
   5 │  1.6647     -0.881889   0.180796  -1.19634   1.1223     1.45555  -0.88864     -0.776195
   6 │  0.909794    0.447895   0.328349  -1.24401   0.277462   1.70936   0.00130938   0.0768767
   7 │  0.601674    0.510834   0.232825  -1.24842   0.0977821  1.58007  -0.0364638    0.258464
   8 │ -0.334787    0.449855   1.07021   -1.18643  -1.31815    1.27463   0.00789964  -0.0553192
   9 │ -0.67344    -1.20894    0.498816  -1.1318   -1.29056    0.326316 -1.31916     -0.511818
  10 │  1.57436    -0.486784   1.04522   -1.1665    0.318224   1.76616  -0.28608     -1.02674
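Robust scaling centers each feature on its median and scales by the interquartile range, so a few outliers barely move the statistics. A pure-Julia sketch of that transform (assuming the scikit-learn RobustScaler defaults of median centering and IQR scaling):

```julia
using Statistics

# RobustScaler sketch: center on the median, scale by the IQR.
# A single extreme value changes neither statistic much.
function robust_scale(x::AbstractVector)
    med = median(x)
    iqr = quantile(x, 0.75) - quantile(x, 0.25)
    (x .- med) ./ iqr
end

robust_scale([1.0, 2.0, 3.0, 4.0, 100.0])
# 5-element Vector{Float64}: [-1.0, -0.5, 0.0, 0.5, 48.5]
```

Note how the outlier 100.0 stretches the output but leaves the scaling of the inliers intact, which is exactly why `rb` is applied before ICA/PCA above.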

Evaluating complex pipeline with RandomForest learner¶

In [29]:
prf = (catf |> ohe) + (numf |> rb |> fa) + (numf |> pt |> pca) |> rf
crossvalidate(prf,X,Y,"accuracy_score")
fold: 1, 0.6716417910447762
fold: 2, 0.6567164179104478
fold: 3, 0.75
fold: 4, 0.6865671641791045
fold: 5, 0.6716417910447762
fold: 6, 0.5970149253731343
fold: 7, 0.5970149253731343
fold: 8, 0.7205882352941176
fold: 9, 0.6865671641791045
fold: 10, 0.6119402985074627
errors: 0
Out[29]:
(mean = 0.6649692712906058, std = 0.05105709742517328, folds = 10, errors = 0)
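`crossvalidate` reports each fold's score plus the mean/std summary above. The k-fold mechanics can be sketched in plain Julia (illustrative; `kfold_cv` and `evalfold` are invented names, not the AutoMLPipeline API):

```julia
using Random, Statistics

# k-fold CV sketch: partition shuffled indices into k folds; `evalfold`
# is a closure that trains on `train` and returns a score on `test`.
function kfold_cv(evalfold, n::Integer, k::Integer; rng=MersenneTwister(1))
    idx = shuffle(rng, collect(1:n))
    folds = [idx[i:k:end] for i in 1:k]                   # round-robin assignment
    scores = [evalfold(setdiff(idx, f), f) for f in folds]
    (mean=mean(scores), std=std(scores), folds=k)
end

# dummy scorer: fraction of test indices that are even; since half of
# 1:100 are even, the fold scores average to 0.5
res = kfold_cv((train, test) -> count(iseven, test) / length(test), 100, 10)
```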

Evaluating complex pipeline with Linear SVM learner¶

In [30]:
plsvc = ((numf |> rb |> pca)+(numf |> rb |> fa)+(numf |> rb |> ica)+(catf |> ohe )) |> lsvc
crossvalidate(plsvc,X,Y,"accuracy_score")
fold: 1, 0.746268656716418
fold: 2, 0.7313432835820896
fold: 3, 0.7647058823529411
fold: 4, 0.7611940298507462
fold: 5, 0.7761194029850746
fold: 6, 0.6567164179104478
fold: 7, 0.746268656716418
fold: 8, 0.7941176470588235
fold: 9, 0.7164179104477612
fold: 10, 0.7164179104477612
errors: 0
Out[30]:
(mean = 0.740956979806848, std = 0.03870925271242532, folds = 10, errors = 0)

Parallel search of the best ML pipeline¶

In [31]:
using Random, DataFrames, Distributed
nprocs() == 1 && addprocs()
@everywhere using DataFrames; @everywhere using AutoMLPipeline
@everywhere begin
    import PythonCall; const PYC=PythonCall; warnings = PYC.pyimport("warnings"); warnings.filterwarnings("ignore")
end
@everywhere begin
  profbdata = getprofb(); X = profbdata[:,2:end]; Y = profbdata[:,1] |> Vector;
end
@everywhere begin
  jrf  = RandomForest(); ohe  = OneHotEncoder(); catf = CatFeatureSelector(); numf = NumFeatureSelector()
  tree = PrunedTree(); ada  = skoperator("AdaBoostClassifier"); disc = CatNumDiscriminator()
  sgd  = skoperator("SGDClassifier"); std  = skoperator("StandardScaler"); lsvc = skoperator("LinearSVC")
end

learners = @sync @distributed (vcat) for learner in [jrf,ada,sgd,lsvc,tree]
   pcmc = disc |> ((catf |> ohe) + (numf |> std)) |> learner
   println(learner.name[1:end-4])
   mean,sd,_ = crossvalidate(pcmc,X,Y,"accuracy_score",3)
   DataFrame(name=learner.name[1:end-4],mean=mean,sd=sd)
end;
      From worker 4:	AdaBoostClassifier
      From worker 5:	SGDClassifier
      From worker 3:	rf
      From worker 7:	prunetree
      From worker 6:	LinearSVC
      From worker 5:	┌ Warning: Unseen value found in OneHotEncoder,
      From worker 5:	│                for entry (13, 2) = SF.
      From worker 5:	│                Patching value to PIT.
      From worker 5:	└ @ AMLPipelineBase.BaseFilters ~/.julia/packages/AMLPipelineBase/FFCPY/src/basefilters.jl:106
      From worker 5:	┌ Warning: Unseen value found in OneHotEncoder,
      From worker 5:	│                for entry (65, 2) = SF.
      From worker 5:	│                Patching value to PIT.
      From worker 5:	└ @ AMLPipelineBase.BaseFilters ~/.julia/packages/AMLPipelineBase/FFCPY/src/basefilters.jl:106
      From worker 5:	┌ Warning: Unseen value found in OneHotEncoder,
      From worker 5:	│                for entry (161, 2) = SF.
      From worker 5:	│                Patching value to PIT.
      From worker 5:	└ @ AMLPipelineBase.BaseFilters ~/.julia/packages/AMLPipelineBase/FFCPY/src/basefilters.jl:106
      From worker 5:	┌ Warning: Unseen value found in OneHotEncoder,
      From worker 5:	│                for entry (196, 2) = SF.
      From worker 5:	│                Patching value to PIT.
      From worker 5:	└ @ AMLPipelineBase.BaseFilters ~/.julia/packages/AMLPipelineBase/FFCPY/src/basefilters.jl:106
      From worker 7:	fold: 1, 0.6071428571428571
      From worker 3:	fold: 1, 0.7008928571428571
      From worker 6:	fold: 1, 0.7232142857142857
      From worker 5:	fold: 1, 0.6741071428571429
      From worker 7:	fold: 2, 0.6116071428571429
      From worker 3:	fold: 2, 0.6294642857142857
      From worker 6:	┌ Warning: Unseen value found in OneHotEncoder,
      From worker 6:	│                for entry (16, 2) = SF.
      From worker 6:	│                Patching value to MIA.
      From worker 6:	└ @ AMLPipelineBase.BaseFilters ~/.julia/packages/AMLPipelineBase/FFCPY/src/basefilters.jl:106
      From worker 6:	┌ Warning: Unseen value found in OneHotEncoder,
      From worker 6:	│                for entry (71, 2) = SF.
      From worker 6:	│                Patching value to MIA.
      From worker 6:	└ @ AMLPipelineBase.BaseFilters ~/.julia/packages/AMLPipelineBase/FFCPY/src/basefilters.jl:106
      From worker 6:	┌ Warning: Unseen value found in OneHotEncoder,
      From worker 6:	│                for entry (154, 2) = SF.
      From worker 6:	│                Patching value to MIA.
      From worker 6:	└ @ AMLPipelineBase.BaseFilters ~/.julia/packages/AMLPipelineBase/FFCPY/src/basefilters.jl:106
      From worker 6:	┌ Warning: Unseen value found in OneHotEncoder,
      From worker 6:	│                for entry (199, 2) = SF.
      From worker 6:	│                Patching value to MIA.
      From worker 6:	└ @ AMLPipelineBase.BaseFilters ~/.julia/packages/AMLPipelineBase/FFCPY/src/basefilters.jl:106
      From worker 4:	fold: 1, 0.6830357142857143
      From worker 5:	fold: 2, 0.7276785714285714
      From worker 7:	fold: 3, 0.5357142857142857
      From worker 7:	errors: 0
      From worker 6:	fold: 2, 0.7098214285714286
      From worker 3:	fold: 3, 0.6383928571428571
      From worker 3:	errors: 0
      From worker 5:	fold: 3, 0.7098214285714286
      From worker 5:	errors: 0
      From worker 6:	fold: 3, 0.7321428571428571
      From worker 6:	errors: 0
      From worker 4:	fold: 2, 0.6651785714285714
      From worker 4:	fold: 3, 0.6428571428571429
      From worker 4:	errors: 0
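The `@distributed (vcat)` pattern above evaluates each learner on its own worker and folds the one-row DataFrames into a single table. The reduction shape, minus the ML, looks like this (illustrative named tuples standing in for the DataFrame rows):

```julia
using Distributed
nprocs() == 1 && addprocs(2)  # spawn workers if running single-process

# each iteration yields a one-element Vector; the (vcat) reducer folds
# all workers' results into one Vector, in iteration order
rows = @distributed (vcat) for i in 1:4
    [(name="learner$i", score=i / 10)]
end
# rows is a 4-element Vector of named tuples, one per iteration
```

With a reducer supplied, `@distributed` blocks until all workers finish, which is why the DataFrame of results is complete when the loop returns.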

Best Pipeline¶

In [32]:
@show sort!(learners,:mean,rev=true);
sort!(learners, :mean, rev = true) = 5×3 DataFrame
 Row │ name                mean      sd
     │ String              Float64   Float64
─────┼─────────────────────────────────────────
   1 │ LinearSVC           0.721726  0.0112349
   2 │ SGDClassifier       0.703869  0.0272772
   3 │ AdaBoostClassifier  0.66369   0.0201306
   4 │ rf                  0.65625   0.0389187
   5 │ prunetree           0.584821  0.0425866

E2EAI Application¶

Infrastructure Creation Automation¶


AI as a Service: Zero Coding Using Workflow Template¶


Explicit ML Pipeline¶


Optimal Pipeline Discovery by AutoML¶


Low vs High Pipeline Complexity¶


Low Complexity Pipeline¶


High Complexity Pipeline¶


The Future¶

  • Unified Control Plane¶

  • Intent-Driven E2EAI¶

Acknowledgement¶


This work has been funded by the SUNRISE-6G project, Grant number 101139257, co-funded by the European Union and Smart Networks and Services Joint Undertaking (SNS JU).¶

DISCLAIMER¶

"Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union and Smart Networks and Services Joint Undertaking (SNS JU). Neither the European Union nor the granting authority can be held responsible for them."¶



THANK YOU!