Source: UCI Machine Learning repository (http://www.ics.uci.edu/~mlearn/MLSummary.html).
First, let's load the dataset and look at what features were collected for the CPUs:
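A minimal loading sketch, assuming the standard machine.data file from the UCI archive; the URL and column names follow the dataset documentation (the file has no header row):

```python
import pandas as pd

# A sketch of loading the UCI "Computer Hardware" data; the URL and
# column names are taken from the dataset documentation.
url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "cpu-performance/machine.data")
cols = ["Vendor", "Model", "MYCT", "MMIN", "MMAX",
        "CACH", "CHMIN", "CHMAX", "PRP", "ERP"]
df = pd.read_csv(url, header=None, names=cols)
df.head()
```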
 | Vendor | Model | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | PRP | ERP
---|---|---|---|---|---|---|---|---|---|---
0 | adviser | 32/60 | 125 | 256 | 6000 | 256 | 16 | 128 | 198 | 199 |
1 | amdahl | 470v/7 | 29 | 8000 | 32000 | 32 | 8 | 32 | 269 | 253 |
2 | amdahl | 470v/7a | 29 | 8000 | 32000 | 32 | 8 | 32 | 220 | 253 |
3 | amdahl | 470v/7b | 29 | 8000 | 32000 | 32 | 8 | 32 | 172 | 253 |
4 | amdahl | 470v/7c | 29 | 8000 | 16000 | 32 | 8 | 16 | 132 | 132 |
In the 1982 landscape, Amdahl delivered 9 CPUs in the premium class.
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | PRP
---|---|---|---|---|---|---|---
0 | 125 | 256 | 6000 | 256 | 16 | 128 | 198 |
1 | 29 | 8000 | 32000 | 32 | 8 | 32 | 269 |
2 | 29 | 8000 | 32000 | 32 | 8 | 32 | 220 |
3 | 29 | 8000 | 32000 | 32 | 8 | 32 | 172 |
4 | 29 | 8000 | 16000 | 32 | 8 | 16 | 132 |
The original ERP vs. BYTE's PRP metric, reported on UCI, was measured as a mean deviation percentage. This is not standard today, so later authors have chosen to use RMSE.
Apparently, the RMSE results of later performed linear regressions did not come close to the accuracy of the original linear regression, whose predictions are stored in the dataset field ERP (estimated relative performance).
Mean Absolute Error: 24.330143540669855
Mean Squared Error: 1737.3349282296651
Root Mean Squared Error: 41.68134988492653
Mean Absolute Percentage Error (MAPE): 33.91
Accuracy: 66.09
We isolate the ERP and PRP values before splitting (a sketch follows):
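A sketch of that isolation step, assuming the dataframe loaded above:

```python
# Separate the targets from the predictors before splitting.
y = df["PRP"]    # published relative performance (target)
erp = df["ERP"]  # the original study's estimate, kept aside for comparison
X = df.drop(columns=["Vendor", "Model", "PRP", "ERP"])
```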
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX
---|---|---|---|---|---|---
10 | 400 | 1000 | 3000 | 0 | 1 | 2 |
14 | 350 | 64 | 64 | 0 | 1 | 4 |
15 | 200 | 512 | 16000 | 0 | 4 | 32 |
17 | 143 | 512 | 5000 | 0 | 7 | 32 |
18 | 143 | 1000 | 2000 | 0 | 5 | 16 |
... | ... | ... | ... | ... | ... | ... |
202 | 180 | 262 | 4000 | 0 | 1 | 3 |
203 | 180 | 512 | 4000 | 0 | 1 | 3 |
204 | 124 | 1000 | 8000 | 0 | 1 | 8 |
206 | 125 | 2000 | 8000 | 0 | 2 | 14 |
208 | 480 | 1000 | 4000 | 0 | 0 | 0 |
69 rows × 6 columns
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX
---|---|---|---|---|---|---
122 | 1500 | 768 | 1000 | 0 | 0 | 0 |
123 | 1500 | 768 | 2000 | 0 | 0 | 0 |
124 | 800 | 768 | 2000 | 0 | 0 | 0 |
207 | 480 | 512 | 8000 | 32 | 0 | 0 |
208 | 480 | 1000 | 4000 | 0 | 0 | 0 |
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | PRP
---|---|---|---|---|---|---|---
0 | 125 | 256 | 6000 | 256 | 16 | 128 | 198 |
1 | 29 | 8000 | 32000 | 32 | 8 | 32 | 269 |
2 | 29 | 8000 | 32000 | 32 | 8 | 32 | 220 |
3 | 29 | 8000 | 32000 | 32 | 8 | 32 | 172 |
4 | 29 | 8000 | 16000 | 32 | 8 | 16 | 132 |
Feature engineering was no success, so I'll use the KFold method, sketched below, alongside the holdout split:
test_size=0.22, random_state=42
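A minimal sketch using the split parameters quoted above; the 5-fold setup is an assumption:

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Holdout split with the parameters quoted above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.22, random_state=42)

# KFold cross-validation as a sanity check; 5 folds is an assumption
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y, cv=kf, scoring="r2")
print(scores.mean())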
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX
---|---|---|---|---|---|---
count | 209.000000 | 209.000000 | 209.000000 | 209.000000 | 209.000000 | 209.000000 |
mean | 203.822967 | 2867.980861 | 11796.153110 | 25.205742 | 4.698565 | 18.267943 |
std | 260.262926 | 3878.742758 | 11726.564377 | 40.628722 | 6.816274 | 25.997318 |
min | 17.000000 | 64.000000 | 64.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 50.000000 | 768.000000 | 4000.000000 | 0.000000 | 1.000000 | 5.000000 |
50% | 110.000000 | 2000.000000 | 8000.000000 | 8.000000 | 2.000000 | 8.000000 |
75% | 225.000000 | 4000.000000 | 16000.000000 | 32.000000 | 6.000000 | 24.000000 |
max | 1500.000000 | 32000.000000 | 64000.000000 | 256.000000 | 52.000000 | 176.000000 |
We multiply pairs of features and rank the products by the cross-validation score of a model that includes each product (a sketch follows).
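A sketch of that ranking loop; the exact procedure used in the notebook is an assumption, but this reproduces the shape of the output below:

```python
from itertools import combinations
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Baseline score without any interaction term
baseline = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2").mean()

# Score each pairwise product added as one extra column
results = []
for a, b in combinations(X.columns, 2):
    Xi = X.copy()
    Xi[f"{a}*{b}"] = X[a] * X[b]
    score = cross_val_score(LinearRegression(), Xi, y, cv=5, scoring="r2").mean()
    results.append((a, b, round(score, 3)))

results.sort(key=lambda t: t[2], reverse=True)
print("Baseline R2:", round(baseline, 3))
print("Top 10 interactions:", results[:10])
```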
Baseline R2: 0.837
Top 10 interactions: [('MMIN', 'CHMAX', 0.963), ('MMAX', 'CHMAX', 0.935), ('MMIN', 'MMAX', 0.928), ('MMAX', 'CACH', 0.92), ('MMAX', 'CHMIN', 0.908), ('MMIN', 'CHMIN', 0.89), ('MYCT', 'CHMAX', 0.869), ('MMIN', 'CACH', 0.853), ('CHMIN', 'CACH', 0.844), ('MYCT', 'MMAX', 0.843)]
[('MYCT', 'MMAX', 0.843), ('MYCT', 'CACH', 0.842), ('MYCT', 'CHMAX', 0.869), ('MMIN', 'MMAX', 0.928), ('MMIN', 'CACH', 0.853), ('MMIN', 'CHMIN', 0.89), ('MMIN', 'CHMAX', 0.963), ('MMAX', 'CACH', 0.92), ('MMAX', 'CHMIN', 0.908), ('MMAX', 'CHMAX', 0.935), ('CHMIN', 'CACH', 0.844)]
The highest score is found for the feature pair 'MMIN' and 'CHMAX'.
Adding interactions and transformed variables leads to an extended linear regression model, a polynomial regression. Data scientists rely on testing and experimenting to validate an approach to solving a problem. We redefine the set of predictors in code using interactions and quadratic terms (squares of the variables), as sketched below:
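A sketch of generating those terms; using PolynomialFeatures is an assumption, since hand-built products would give the same 27 columns:

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# All degree-2 terms: the 6 originals, their 6 squares, and the 15
# pairwise products, giving 27 columns in total.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = pd.DataFrame(
    poly.fit_transform(X),
    columns=poly.get_feature_names_out(X.columns))  # get_feature_names in older sklearn
print(X_poly.shape)  # (209, 27)
```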
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | MYCT^2 | MYCT*MMIN | MYCT*MMAX | MYCT*CACH | ... | MMAX^2 | MMAX*CACH | MMAX*CHMIN | MMAX*CHMAX | CACH^2 | CHMIN^2 | CHMIN*CACH | CHMIN*CHMAX | CHMAX^2 | CHMAX*CACH
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 125 | 256 | 6000 | 256 | 16 | 128 | 15625 | 32000 | 750000 | 32000 | ... | 36000000 | 1536000 | 96000 | 768000 | 65536 | 256 | 4096 | 2048 | 16384 | 32768 |
1 | 29 | 8000 | 32000 | 32 | 8 | 32 | 841 | 232000 | 928000 | 928 | ... | 1024000000 | 1024000 | 256000 | 1024000 | 1024 | 64 | 256 | 256 | 1024 | 1024 |
2 | 29 | 8000 | 32000 | 32 | 8 | 32 | 841 | 232000 | 928000 | 928 | ... | 1024000000 | 1024000 | 256000 | 1024000 | 1024 | 64 | 256 | 256 | 1024 | 1024 |
3 | 29 | 8000 | 32000 | 32 | 8 | 32 | 841 | 232000 | 928000 | 928 | ... | 1024000000 | 1024000 | 256000 | 1024000 | 1024 | 64 | 256 | 256 | 1024 | 1024 |
4 | 29 | 8000 | 16000 | 32 | 8 | 16 | 841 | 232000 | 464000 | 928 | ... | 256000000 | 512000 | 128000 | 256000 | 1024 | 64 | 256 | 128 | 256 | 512 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
204 | 124 | 1000 | 8000 | 0 | 1 | 8 | 15376 | 124000 | 992000 | 0 | ... | 64000000 | 0 | 8000 | 64000 | 0 | 1 | 0 | 8 | 64 | 0 |
205 | 98 | 1000 | 8000 | 32 | 2 | 8 | 9604 | 98000 | 784000 | 3136 | ... | 64000000 | 256000 | 16000 | 64000 | 1024 | 4 | 64 | 16 | 64 | 256 |
206 | 125 | 2000 | 8000 | 0 | 2 | 14 | 15625 | 250000 | 1000000 | 0 | ... | 64000000 | 0 | 16000 | 112000 | 0 | 4 | 0 | 28 | 196 | 0 |
207 | 480 | 512 | 8000 | 32 | 0 | 0 | 230400 | 245760 | 3840000 | 15360 | ... | 64000000 | 256000 | 0 | 0 | 1024 | 0 | 0 | 0 | 0 | 0 |
208 | 480 | 1000 | 4000 | 0 | 0 | 0 | 230400 | 480000 | 1920000 | 0 | ... | 16000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
209 rows × 27 columns
 | 0 | dif
---|---|---
CHMAX | -3094.120727 | NaN |
MYCT^2 | 3023.603051 | 6117.723778 |
MYCT*MMIN | -2945.230796 | -5968.833846 |
MYCT*MMAX | -2607.183590 | 338.047206 |
MYCT*CACH | -2482.055014 | 125.128575 |
MYCT*CHMIN | -2444.508179 | 37.546835 |
MYCT*CHMAX | -2218.561650 | 225.946529 |
MMIN^2 | 1096.110347 | 3314.671997 |
MMIN*MMAX | -805.497952 | -1901.608299 |
MMIN*CACH | -599.578255 | 205.919697 |
MMIN*CHMIN | -609.165253 | -9.586999 |
MMIN*CHMAX | -372.011297 | 237.153956 |
MMAX^2 | 280.121013 | 652.132310 |
MMAX*CACH | -218.008912 | -498.129925 |
MMAX*CHMIN | -220.371612 | -2.362701 |
MMAX*CHMAX | -107.654781 | 112.716831 |
CACH^2 | 53.144633 | 160.799414 |
CHMIN^2 | 53.353533 | 0.208901 |
CHMIN*CACH | -66.345342 | -119.698876 |
CHMIN*CHMAX | -90.892193 | -24.546851 |
CHMAX^2 | 89.855308 | 180.747502 |
CHMAX*CACH | -84.194105 | -174.049413 |
I make a graph of the results that demonstrates some additions are great, because the squared error decreases, while other additions are terrible, because they increase the error instead. Adding the feature max channels (CHMAX) is OK, but adding the square of the cycle time (MYCT^2) does not help the model.
Let's multiply these more strongly correlated feature pairs: ('MMIN', 'MMAX', 0.928), ('MMIN', 'CHMIN', 0.89), ('MMIN', 'CHMAX', 0.963), ('MMAX', 'CACH', 0.92), ('MMAX', 'CHMIN', 0.908), ('MMAX', 'CHMAX', 0.935) # ('MMIN', 'CACH', 0.853),
(209, 27) Mean squared error 84.194
To decide on the importance of the features we are going to use the LassoCV estimator, which has a built-in cross-validator. The features with the highest absolute coef_ values are considered the most important. A sketch follows:
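A minimal sketch; standardizing first, so the coefficient magnitudes are comparable, is an assumption (cv=5 and random_state=100 are taken from the estimator repr shown further down):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Standardize, then let LassoCV pick the regularization strength by CV
X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=100).fit(X_std, y)
importance = np.abs(lasso.coef_)
print(importance)
```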
[9.41136681e+00 3.52736473e+01 5.27983382e+01 2.60609989e+01 1.08525978e+01 3.65186475e-02]
Prediction accuracy for the standardized test dataset
To decide on the importance of the 6 features plus CYCinv, we again use the LassoCV estimator. The features with the highest absolute coef_ values are considered the most important.
[0.00456536 0.01535931 0.00566305 0. 0. 0. ]
No success here with the new feature: its coefficient is 0.
Now we want to select the two most important features. SelectFromModel() allows setting a threshold: only the features with coef_ higher than the threshold will remain. Here, we set the threshold slightly above the third highest coef_ calculated by LassoCV() from our data (a sketch follows).
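A sketch of that selection step, reusing the fitted LassoCV from above; the +0.01 increment matches the value printed below:

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel

# Threshold slightly above the third highest |coef_|, so at most two
# features survive the selection.
threshold = np.sort(importance)[-3] + 0.01
sfm = SelectFromModel(lasso, threshold=threshold).fit(X_std, y)
X_transform = sfm.transform(X_std)
print("Selected features:", np.array(X.columns)[sfm.get_support()])
print(X_transform.shape)
```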
5 Selected features: ['MYCT' 'MMIN' 'MMAX'] X_transform: [[ 125 256] [ 29 8000] [ 29 8000]] (209, 2)
0.01
array([0.01896501, 0.01272007, 0.0088509 , 0. , 0. , 0. ])
array([3, 4, 5, 2, 1, 0], dtype=int64)
array([0, 1, 2], dtype=int64)
Finally we will plot the selected two features from the data.
array([1.00000000e-04, 2.44843675e-04, 5.99484250e-04, 1.46779927e-03, 3.59381366e-03, 8.79922544e-03, 2.15443469e-02, 5.27499706e-02, 1.29154967e-01, 3.16227766e-01])
0.9108691389420925
Use of makes a 4% better score!
LassoCV(alphas=array([1.00000000e-04, 2.44843675e-04, 5.99484250e-04, 1.46779927e-03, 3.59381366e-03, 8.79922544e-03, 2.15443469e-02, 5.27499706e-02, 1.29154967e-01, 3.16227766e-01]), cv=5, random_state=100)
0.31622776601683794
The eigenvectors and eigenvalues of a covariance (or correlation) matrix represent the “core” of a PCA: The eigenvectors (principal components) determine the directions of the new feature space, and the eigenvalues determine their magnitude. In other words, the eigenvalues explain the variance of the data along the new feature axes.
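A minimal sketch of extracting both with sklearn; note that whether X is standardized first determines which of the two variance breakdowns shown below you obtain (raw data gives roughly 0.96/0.04, standardized data roughly 0.56/0.14):

```python
from sklearn.decomposition import PCA

# PCA on the raw metrics; standardizing first changes the breakdown.
pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)  # eigenvalue shares
print(pca.components_)                # eigenvectors (principal directions)
```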
Proportion of variance explained by each component: 1st component - 0.96, 2nd component - 0.04
(209, 6)
Typically, one keeps enough components to preserve at least 90% of the data variance.
Proportion of variance explained by each component: 1st component - 0.56, 2nd component - 0.14
Directions of principal components:
1st component: [-0.28998081 0.42736541 0.46913674 0.42856007 0.43533454 0.37416675]
2nd component: [ 0.68218424 -0.33298195 -0.11408717 0.15163789 0.27463756 0.55884878]
3rd component - 0.12
(0.5594573887493584, 0.13822677663447377, 0.12320423687138933, 0.08272067788012558, 0.06740408477667023, 0.02898683508798275)
Meaning of the 4 components:
-0.290 x MYCT + 0.427 x MMIN + 0.469 x MMAX + 0.429 x CACH + 0.435 x CHMIN + 0.374 x CHMAX
0.682 x MYCT + -0.333 x MMIN + -0.114 x MMAX + 0.152 x CACH + 0.275 x CHMIN + 0.559 x CHMAX
0.669 x MYCT + 0.548 x MMIN + 0.264 x MMAX + 0.020 x CACH + -0.030 x CHMIN + -0.426 x CHMAX
0.027 x MYCT + 0.088 x MMIN + 0.477 x MMAX + -0.714 x CACH + -0.255 x CHMIN + 0.436 x CHMAX
Meaning of the 2 components:
-0.290 x MYCT + 0.427 x MMIN + 0.469 x MMAX + 0.429 x CACH + 0.435 x CHMIN + 0.374 x CHMAX
0.682 x MYCT + -0.333 x MMIN + -0.114 x MMAX + 0.152 x CACH + 0.275 x CHMIN + 0.559 x CHMAX
Let's compare how many unique rows we have for the technical metrics.
Total CPUs (amdahl): 9 (see the table below). Uniquely specified CPUs (amdahl): 9
 | Vendor | Model | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | PRP | ERP
---|---|---|---|---|---|---|---|---|---|---
1 | amdahl | 470v/7 | 29 | 8000 | 32000 | 32 | 8 | 32 | 269 | 253 |
2 | amdahl | 470v/7a | 29 | 8000 | 32000 | 32 | 8 | 32 | 220 | 253 |
3 | amdahl | 470v/7b | 29 | 8000 | 32000 | 32 | 8 | 32 | 172 | 253 |
4 | amdahl | 470v/7c | 29 | 8000 | 16000 | 32 | 8 | 16 | 132 | 132 |
5 | amdahl | 470v/b | 26 | 8000 | 32000 | 64 | 8 | 32 | 318 | 290 |
6 | amdahl | 580-5840 | 23 | 16000 | 32000 | 64 | 16 | 32 | 367 | 381 |
7 | amdahl | 580-5850 | 23 | 16000 | 32000 | 64 | 16 | 32 | 489 | 381 |
8 | amdahl | 580-5860 | 23 | 16000 | 64000 | 64 | 16 | 32 | 636 | 749 |
9 | amdahl | 580-5880 | 23 | 32000 | 64000 | 128 | 32 | 64 | 1144 | 1238 |
Let's take a look at Amdahl's competitors in the market and how many product offerings they have.
Vendor | Model
---|---
ibm | 32 |
nas | 19 |
ncr | 13 |
honeywell | 13 |
sperry | 13 |
siemens | 12 |
cdc | 9 |
amdahl | 9 |
burroughs | 8 |
dg | 7 |
harris | 7 |
hp | 7 |
c.r.d | 6 |
magnuson | 6 |
dec | 6 |
ipl | 6 |
prime | 5 |
formation | 5 |
cambex | 5 |
perkin-elmer | 3 |
nixdorf | 3 |
gould | 3 |
wang | 2 |
bti | 2 |
basf | 2 |
apollo | 2 |
microdata | 1 |
four-phase | 1 |
sratus | 1 |
adviser | 1 |
Next, we plot the individual technical metrics in the source dataset to see how Amdahl's products compare to those of the competition.
A log transformation was applied in this plot; it reveals that scaling is advisable.
(Plots of CHMIN, CHMAX and CACH.)
The green points are Amdahl's CPUs.
 | CHMIN | CHMAX | CACH | Vendor
---|---|---|---|---
1 | 8 | 32 | 32 | amdahl |
2 | 8 | 32 | 32 | amdahl |
3 | 8 | 32 | 32 | amdahl |
4 | 8 | 16 | 32 | amdahl |
5 | 8 | 32 | 64 | amdahl |
I want to cluster the CPUs into three main groups, which I call Budget, Mid-range and Premium (a sketch follows).
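A minimal KMeans sketch; mapping the arbitrary cluster ids to the three names is an assumption and has to be checked by inspecting the clusters first:

```python
from sklearn.cluster import KMeans

# Cluster the CPUs on their technical metrics
km = KMeans(n_clusters=3).fit(X)
print("number of estimated clusters :", len(set(km.labels_)))

# Hypothetical id-to-name mapping; KMeans labels are arbitrary
names = {0: "Budget", 1: "Mid", 2: "Premium"}
labeled = X.assign(PRP=y, Class=[names[c] for c in km.labels_])
```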
number of estimated clusters : 3
(209,)
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | PRP | Class
---|---|---|---|---|---|---|---|---
0 | 125 | 256 | 6000 | 256 | 16 | 128 | 198 | Budget |
1 | 29 | 8000 | 32000 | 32 | 8 | 32 | 269 | Mid |
2 | 29 | 8000 | 32000 | 32 | 8 | 32 | 220 | Mid |
3 | 29 | 8000 | 32000 | 32 | 8 | 32 | 172 | Mid |
4 | 29 | 8000 | 16000 | 32 | 8 | 16 | 132 | Budget |
... | ... | ... | ... | ... | ... | ... | ... | ... |
204 | 124 | 1000 | 8000 | 0 | 1 | 8 | 42 | Budget |
205 | 98 | 1000 | 8000 | 32 | 2 | 8 | 46 | Budget |
206 | 125 | 2000 | 8000 | 0 | 2 | 14 | 52 | Budget |
207 | 480 | 512 | 8000 | 32 | 0 | 0 | 67 | Budget |
208 | 480 | 1000 | 4000 | 0 | 0 | 0 | 45 | Budget |
209 rows × 8 columns
KMeans(n_clusters=3)
Ein-Dor is said to have bucketed the benchmark in 3 intervals: "6 to 33", "33 to 72", "72 to 1200".
Curiously, all 3 classes are found only in the highest PRP bin. In fact this is not a surprise, as the budget class was overcrowded.
I'll stop the EDA here and move on to experiment with some more recent algorithms.
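First, a t-SNE embedding; a minimal sketch, where perplexity=5 is an assumption inferred from the "16 nearest neighbors" line in the verbose log below (k = 3*perplexity + 1):

```python
from sklearn.manifold import TSNE

# Embed the 163-row training split into 2 dimensions
tsne = TSNE(n_components=2, perplexity=5, verbose=1, random_state=42)
X_embedded = tsne.fit_transform(X_train)
print(X_embedded.shape)  # (163, 2)
```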
[t-SNE] Computing 16 nearest neighbors...
[t-SNE] Indexed 163 samples in 0.000s...
[t-SNE] Computed neighbors for 163 samples in 0.000s...
[t-SNE] Computed conditional probabilities for sample 163 / 163
[t-SNE] Mean sigma: 56.061938
[t-SNE] KL divergence after 250 iterations with early exaggeration: 58.126873
[t-SNE] KL divergence after 500 iterations: 0.233176
(163, 2)
(163, 6)
(163, 2)
Shall I work with the raw values, the logarithms, or a combination (A or B)?
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX
---|---|---|---|---|---|---
189 | 59 | 8000 | 16000 | 64 | 12 | 24 |
190 | 26 | 8000 | 24000 | 32 | 8 | 16 |
191 | 26 | 8000 | 32000 | 64 | 12 | 16 |
192 | 26 | 8000 | 32000 | 128 | 24 | 32 |
193 | 116 | 2000 | 8000 | 32 | 5 | 28 |
194 | 50 | 2000 | 32000 | 24 | 6 | 26 |
195 | 50 | 2000 | 32000 | 48 | 26 | 52 |
196 | 50 | 2000 | 32000 | 112 | 52 | 104 |
197 | 50 | 4000 | 32000 | 112 | 52 | 104 |
198 | 30 | 8000 | 64000 | 96 | 12 | 176 |
199 | 30 | 8000 | 64000 | 128 | 12 | 176 |
200 | 180 | 262 | 4000 | 0 | 1 | 3 |
201 | 180 | 512 | 4000 | 0 | 1 | 3 |
202 | 180 | 262 | 4000 | 0 | 1 | 3 |
203 | 180 | 512 | 4000 | 0 | 1 | 3 |
204 | 124 | 1000 | 8000 | 0 | 1 | 8 |
205 | 98 | 1000 | 8000 | 32 | 2 | 8 |
206 | 125 | 2000 | 8000 | 0 | 2 | 14 |
207 | 480 | 512 | 8000 | 32 | 0 | 0 |
208 | 480 | 1000 | 4000 | 0 | 0 | 0 |
I don't like the 0s in rows 200-208. Those are the poorest performers.
RandomForestRegressor(bootstrap=False, max_features=2, n_estimators=1900, n_jobs=3, random_state=1100)
This returns the coefficient of determination of the prediction.
0.6598361173983542
Mean Absolute Error: 43.330152555301304
Mean Squared Error: 22331.653435564116
Root Mean Squared Error: 149.4377911893913
Mean Absolute Percentage Error (MAPE): 8.33
Accuracy: 91.67
The ERP values were calculated in 1988 with a linear regression method, on an Amdahl 470v/7 CPU.
array([ 0.38434077, -0.40992288, -0.55884732, -0.55884732, -0.21135697, -0.55884732])
DecisionTreeRegressor(max_features='auto', random_state=186422792)
(163, 6)
At the time of writing, this was an experimental regressor in sklearn.
Let's now fit a GradientBoostingRegressor and compute the partial dependence plots for one or two variables at a time.
Training GradientBoostingRegressor... done in 2.607s Test R2 score: 0.63
It appears that tree-based models are naturally robust to monotonic transformations of numerical features.
Note that on this tabular dataset, Gradient Boosting Machines are both significantly faster to train and more accurate than neural networks. It is also significantly cheaper to tune their hyperparameters (the defaults tend to work well, while this is often not the case for neural networks).
Finally, as we will see next, computing partial dependence plots for tree-based models is also orders of magnitude faster, making it cheap to compute partial dependence plots for pairs of interacting features (a sketch follows):
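A minimal sketch; the chosen feature pair is an illustrative assumption, and PartialDependenceDisplay.from_estimator requires sklearn >= 1.0 (older versions use plot_partial_dependence):

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

gbr = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Two one-way plots plus a two-way plot for a pair of interacting features
PartialDependenceDisplay.from_estimator(
    gbr, X_train, features=["MMIN", "MMAX", ("MMIN", "MMAX")])
```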
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | PRP
---|---|---|---|---|---|---|---
156 | 3.40 | 9.68 | 10.37 | 5.55 | 2.77 | 3.18 | 6.23 |
86 | 4.94 | 7.60 | 8.29 | 0.00 | 1.39 | 2.08 | 3.69 |
100 | 5.31 | 6.91 | 7.60 | 0.00 | 0.00 | 1.61 | 3.18 |
38 | 3.91 | 7.60 | 8.99 | 2.08 | 0.00 | 1.61 | 4.26 |
24 | 5.77 | 4.85 | 8.70 | 0.00 | 0.00 | 2.48 | 3.14 |
Training Gradient Boosting Regressor...
Iter  Train Loss  Remaining Time
1  11113.1373  49.40s
2  11073.4339  29.68s
3  11032.0866  19.79s
4  10991.3328  17.40s
5  10955.7716  15.91s
6  10917.2051  13.26s
7  10876.5504  12.79s
8  10841.3320  11.19s
9  10803.1316  11.05s
10  10763.6635  9.95s
20  10376.6993  6.46s
30  10015.3743  5.63s
40  9665.3099  4.96s
50  9322.4610  4.75s
60  8995.4775  4.62s
70  8678.8888  4.38s
80  8376.5406  4.32s
90  8082.9441  4.17s
100  7805.7378  4.14s
200  5519.5250  3.76s
300  3954.3375  3.48s
400  2881.5316  3.35s
500  2149.5985  3.26s
600  1648.4570  3.19s
700  1304.1813  3.14s
800  1068.9743  3.09s
900  906.0326  3.03s
1000  793.3602  2.97s
2000  496.1013  2.61s
3000  453.8973  2.32s
4000  435.4178  2.10s
5000  426.0387  1.78s
6000  421.1400  1.43s
7000  418.6041  1.08s
8000  417.2622  0.72s
9000  416.5002  0.36s
10000  416.0655  0.00s
done in 3.666s
Test R2 score: 0.63
Predict R2 score: 1.00
Print out the mean absolute error (MAE) with the initial log transform:
Mean Absolute Error: 49.85 s. Accuracy: 89.55 %.
Without log transform: Mean Absolute Error: 125.2 s. Accuracy: 49.3 %.
array([0.02, 0.13, 0.37, 0.05, 0.09, 0.01, 0.33])
GradientBoostingRegressor(learning_rate=0.002, max_depth=19, max_features=0.5, min_samples_leaf=4, n_estimators=10000, random_state=1000, verbose=1)
n_estimators : the number of boosting stages that will be performed. Later, we will plot deviance against boosting iterations.
max_depth : limits the number of nodes in the tree. The best value depends on the interaction of the input variables.
min_samples_split : the minimum number of samples required to split an internal node.
learning_rate : how much the contribution of each tree will shrink.
loss : loss function to optimize. The least squares function is used in this case; however, there are many other options (see GradientBoostingRegressor).
Next, we will split our dataset to use 90% for training and leave the rest for testing. We will also set the regression model parameters.
Here I just take my own parameters, without an extra train split...
Now we will initialize the gradient boosting regressor and fit it with our training data. Let's also look at the mean squared error on the test data.
Or I use my own regressor, which can be set up with extra parameters (a sketch follows).
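A sketch of that fit, using the parameters shown in the estimator repr above:

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Parameters taken from the estimator repr shown earlier
params = dict(n_estimators=10000, max_depth=19, max_features=0.5,
              min_samples_leaf=4, learning_rate=0.002,
              random_state=1000, verbose=1)
reg = GradientBoostingRegressor(**params).fit(X_train, y_train)
mse = mean_squared_error(y_test, reg.predict(X_test))
print("The mean squared error (MSE) on test set: {:.4f}".format(mse))
```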
The mean squared error (MSE) on test set: 30006.4254
Careful, impurity-based feature importances can be misleading for high cardinality features (many unique values). As an alternative, the permutation importances of reg can be computed on a held out test set. See Permutation feature importance for more details.
For this example, the impurity-based and permutation methods identify the same 2 strongly predictive features, but not in the same order. The third most predictive feature is also the same for the 2 methods. The remaining features are less predictive, and the error bars of the permutation plot show that they overlap with 0. A sketch follows:
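A minimal sketch of the permutation importances on the held-out test set; n_repeats=10 is an assumption:

```python
from sklearn.inspection import permutation_importance

# Permutation importances as an alternative to the impurity-based
# feature_importances_ of the fitted regressor.
result = permutation_importance(reg, X_test, y_test,
                                n_repeats=10, random_state=42)
print(result.importances_mean.argsort())
```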
array([5, 3, 4, 1, 0, 2, 6], dtype=int64)
With only the 6 features: minimum memory size and the number of channels come out on top.
With the 6 features plus PRP: minimum and maximum memory size are found to be good indicators of higher CPU performance ratings.
The decision tree is a simple machine learning model for getting started with regression tasks.
Background
A decision tree is a flow-chart-like structure, where each internal (non-leaf) node denotes a test on an attribute, each branch represents the outcome of a test, and each leaf (or terminal) node holds a class label. The topmost node in a tree is the root node. (see here for more details).
DecisionTreeRegressor(max_depth=5)
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | PRP
---|---|---|---|---|---|---|---
30 | 25 | 1310 | 2620 | 131 | 12 | 24 | 274 |
171 | 200 | 1000 | 4000 | 0 | 1 | 4 | 30 |
84 | 330 | 1000 | 4000 | 0 | 3 | 6 | 22 |
198 | 30 | 8000 | 64000 | 96 | 12 | 176 | 915 |
60 | 800 | 256 | 8000 | 0 | 1 | 4 | 16 |
DecisionTreeRegressor(max_depth=100)
We'll plot the predicted performance rate as a function of cycle time, maximum memory and minimum memory.
Let’s evaluate the regressor on a grid of points where cycle time and minimum memory are combined:
After that, these two lists are combined using meshgrid, which generates a grid of all combinations of the values. Finally, the result is passed to the regressor (a sketch follows).
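A sketch of that grid evaluation; tree_reg is an assumed name for the fitted DecisionTreeRegressor, and holding the remaining features at their medians is an assumption:

```python
import numpy as np
import pandas as pd

# Evaluate the fitted tree on a MYCT x MMIN grid
myct = np.linspace(X["MYCT"].min(), X["MYCT"].max(), 50)
mmin = np.linspace(X["MMIN"].min(), X["MMIN"].max(), 50)
gx, gy = np.meshgrid(myct, mmin)

# Other features held at their medians (an assumption)
grid = pd.DataFrame({c: np.full(gx.size, X[c].median()) for c in X.columns})
grid["MYCT"] = gx.ravel()
grid["MMIN"] = gy.ravel()
z = tree_reg.predict(grid).reshape(gx.shape)  # surface to plot
```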
The brown horizontal bar comes from a fast CPU with a low minimum memory, perhaps a transition model with broader hardware compatibility. This led me to review the selection of the 2 principal components:
Next, the predicted performance rate as a function of maximum memory and minimum memory.
This metric compares N predictions with the target values, returning the average. It is already implemented in sklearn:
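A minimal sketch, assuming the mean absolute error is the metric meant here:

```python
from sklearn.metrics import mean_absolute_error

# Average of |y_true - y_pred| over the N test points
print(mean_absolute_error(y_test, tree_reg.predict(X_test)))
```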
20.11374658061775
18.9739263803681
We can visualize the decision tree using sklearn's tree module:
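A sketch of that visualization; the figure size and drawing depth are illustrative assumptions:

```python
import matplotlib.pyplot as plt
from sklearn import tree

plt.figure(figsize=(20, 10))
tree.plot_tree(tree_reg, feature_names=list(X.columns),
               filled=True, max_depth=3)  # max_depth only limits the drawing
plt.show()
```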
2.2.0
As TensorFlow migrated to 2.0 and moved on to 2.1, 2.2 and 2.3, many code changes happened. The TF regression code I had could still run on v2.1, but no longer on 2.2. Thus I had to find a recent implementation, and I found a nice one written by Mr. Sina.
 | Vendor | Model | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | PRP | ERP
---|---|---|---|---|---|---|---|---|---|---
199 | sperry | 1100/94 | 30 | 8000 | 64000 | 128 | 12 | 176 | 1150 | 978 |
200 | sperry | 80/3 | 180 | 262 | 4000 | 0 | 1 | 3 | 12 | 24 |
201 | sperry | 80/4 | 180 | 512 | 4000 | 0 | 1 | 3 | 14 | 24 |
202 | sperry | 80/5 | 180 | 262 | 4000 | 0 | 1 | 3 | 18 | 24 |
203 | sperry | 80/6 | 180 | 512 | 4000 | 0 | 1 | 3 | 21 | 24 |
204 | sperry | 80/8 | 124 | 1000 | 8000 | 0 | 1 | 8 | 42 | 37 |
205 | sperry | 90/80-model-3 | 98 | 1000 | 8000 | 32 | 2 | 8 | 46 | 50 |
206 | sratus | 32 | 125 | 2000 | 8000 | 0 | 2 | 14 | 52 | 41 |
207 | wang | vs-100 | 480 | 512 | 8000 | 32 | 0 | 0 | 67 | 47 |
208 | wang | vs-90 | 480 | 1000 | 4000 | 0 | 0 | 0 | 45 | 25 |
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | PRP
---|---|---|---|---|---|---|---
0 | 125 | 256 | 6000 | 256 | 16 | 128 | 198 |
The last 9 CPUs perform too poorly and disturb the fitting at the high end.
Split data into train/test
Target = trainDataset['PRP']
Model compiling settings
Add a mechanism that stops training if the validation loss is not improving for more than n_idle_epochs.
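A sketch of that mechanism with a Keras EarlyStopping callback; the patience value and min_delta are assumptions:

```python
import tensorflow as tf

n_idle_epochs = 10  # patience; the exact value used is an assumption
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=n_idle_epochs, min_delta=0.001)
# later passed to training: model.fit(..., callbacks=[early_stop])
```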
Epoch 0, loss 20575.75, val_loss 4381.52, mae 71.57, val_mae 35.67, mse 20575.75, val_mse 4381.52
Epoch 100, loss 5661.70, val_loss 2375.28, mae 46.31, val_mae 36.11, mse 5661.70, val_mse 2375.28
Epoch 200, loss 4306.03, val_loss 1967.21, mae 35.08, val_mae 33.57, mse 4306.03, val_mse 1967.21
Epoch 300, loss 3211.29, val_loss 1569.37, mae 35.06, val_mae 29.72, mse 3211.29, val_mse 1569.37
Epoch 400, loss 3474.19, val_loss 2782.59, mae 34.49, val_mae 37.30, mse 3474.19, val_mse 2782.59
model.fit returns a History object (a callback) for each model. This object stores useful information that we want to extract and visualize. Let's explore what is inside history:
Among its keys are the training and validation losses. Let's visualize the MAE for training and validation with the code below:
Mean Absolute Error: 23.22691581447919
Mean Squared Error: 1201.2346470377404
Root Mean Squared Error: 34.65883216494376
Mean Absolute Percentage Error (MAPE): 97.39
Accuracy: 2.61
0.973850652990251
checkpoints: []
I could not reach a decent accuracy with the original features and ERP as target, so I added a layer with a logarithmic conversion.
 | Vendor | Model | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | PRP | ERP
---|---|---|---|---|---|---|---|---|---|---
199 | sperry | 1100/94 | 30 | 8000 | 64000 | 128 | 12 | 176 | 1150 | 978 |
200 | sperry | 80/3 | 180 | 262 | 4000 | 0 | 1 | 3 | 12 | 24 |
201 | sperry | 80/4 | 180 | 512 | 4000 | 0 | 1 | 3 | 14 | 24 |
202 | sperry | 80/5 | 180 | 262 | 4000 | 0 | 1 | 3 | 18 | 24 |
203 | sperry | 80/6 | 180 | 512 | 4000 | 0 | 1 | 3 | 21 | 24 |
204 | sperry | 80/8 | 124 | 1000 | 8000 | 0 | 1 | 8 | 42 | 37 |
205 | sperry | 90/80-model-3 | 98 | 1000 | 8000 | 32 | 2 | 8 | 46 | 50 |
206 | sratus | 32 | 125 | 2000 | 8000 | 0 | 2 | 14 | 52 | 41 |
207 | wang | vs-100 | 480 | 512 | 8000 | 32 | 0 | 0 | 67 | 47 |
208 | wang | vs-90 | 480 | 1000 | 4000 | 0 | 0 | 0 | 45 | 25 |
Let's take the logarithm of the dataframe (a sketch follows):
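A minimal sketch; whether the notebook used log1p or a masked plain log is an assumption, and `dataset` is an assumed name for the 6 metrics plus ERP:

```python
import numpy as np

# log1p = log(x + 1) keeps the zero entries (CACH, CHMIN, CHMAX) finite
logDataset = np.log1p(dataset).round(2)
logDataset.head()
```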
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | ERP
---|---|---|---|---|---|---|---
0 | 4.83 | 5.55 | 8.70 | 5.55 | 2.77 | 4.85 | 5.29 |
1 | 3.37 | 8.99 | 10.37 | 3.47 | 2.08 | 3.47 | 5.53 |
2 | 3.37 | 8.99 | 10.37 | 3.47 | 2.08 | 3.47 | 5.53 |
3 | 3.37 | 8.99 | 10.37 | 3.47 | 2.08 | 3.47 | 5.53 |
4 | 3.37 | 8.99 | 9.68 | 3.47 | 2.08 | 2.77 | 4.88 |
... | ... | ... | ... | ... | ... | ... | ... |
204 | 4.82 | 6.91 | 8.99 | 0.00 | 0.00 | 2.08 | 3.61 |
205 | 4.58 | 6.91 | 8.99 | 3.47 | 0.69 | 2.08 | 3.91 |
206 | 4.83 | 7.60 | 8.99 | 0.00 | 0.69 | 2.64 | 3.71 |
207 | 6.17 | 6.24 | 8.99 | 3.47 | 0.00 | 0.00 | 3.85 |
208 | 6.17 | 6.91 | 8.29 | 0.00 | 0.00 | 0.00 | 3.22 |
209 rows × 7 columns
Split data into train/test
Target = trainDataset['ERP']
Model: "sequential_5" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= batch_normalization_3 (Batch (None, 6) 24 _________________________________________________________________ dense_12 (Dense) (None, 32) 224 _________________________________________________________________ dense_13 (Dense) (None, 72) 2376 _________________________________________________________________ dropout_3 (Dropout) (None, 72) 0 _________________________________________________________________ dense_14 (Dense) (None, 32) 2336 _________________________________________________________________ dense_15 (Dense) (None, 8) 264 _________________________________________________________________ out (Dense) (None, 1) 9 ================================================================= Total params: 5,233 Trainable params: 5,221 Non-trainable params: 12 _________________________________________________________________
Model compiling settings
Add a mechanism that stops training if the validation loss is not improving for more than n_idle_epochs.
Epoch 0, loss 19.32, val_loss 11.10, mae 4.26, val_mae 3.25, mse 19.32, val_mse 11.10
Epoch 100, loss 0.10, val_loss 0.03, mae 0.26, val_mae 0.15, mse 0.10, val_mse 0.03
Epoch 200, loss 0.18, val_loss 0.01, mae 0.34, val_mae 0.07, mse 0.18, val_mse 0.01
Epoch 300, loss 0.05, val_loss 0.00, mae 0.17, val_mae 0.05, mse 0.05, val_mse 0.00
Epoch 400, loss 0.10, val_loss 0.01, mae 0.23, val_mae 0.05, mse 0.10, val_mse 0.01
model.fit returns a History object (a callback) for each model. This object stores useful information that we want to extract and visualize. Let's explore what is inside history:
keys: dict_keys(['loss', 'mae', 'mse', 'val_loss', 'val_mae', 'val_mse'])
These include the training and validation losses. Let's visualize the MAE for training and validation with the code below:
(42,)
array([5.53, 4.88, 6.62, ..., 5.42, 3.71, 3.85])
Mean Absolute Error: 0.060863911367997264
Mean Squared Error: 0.008955683565599743
Root Mean Squared Error: 0.09463447345232996
Mean Absolute Percentage Error (MAPE): 100.19
Accuracy: -0.19
Mean Absolute Percentage Error (MAPE): 108.92
Accuracy: -8.92
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | PRP | MMIN*MMAX | MMIN*CHMIN | MMIN*CHMAX | MMAX*CACH | MMAX*CHMIN | MMAX*CHMAX
---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 125 | 256 | 6000 | 256 | 16 | 128 | 198 | 1536000 | 4096 | 32768 | 1536000 | 96000 | 768000 |
1 | 29 | 8000 | 32000 | 32 | 8 | 32 | 269 | 256000000 | 64000 | 256000 | 1024000 | 256000 | 1024000 |
2 | 29 | 8000 | 32000 | 32 | 8 | 32 | 220 | 256000000 | 64000 | 256000 | 1024000 | 256000 | 1024000 |
3 | 29 | 8000 | 32000 | 32 | 8 | 32 | 172 | 256000000 | 64000 | 256000 | 1024000 | 256000 | 1024000 |
4 | 29 | 8000 | 16000 | 32 | 8 | 16 | 132 | 128000000 | 64000 | 128000 | 512000 | 128000 | 256000 |
5 | 26 | 8000 | 32000 | 64 | 8 | 32 | 318 | 256000000 | 64000 | 256000 | 2048000 | 256000 | 1024000 |
6 | 23 | 16000 | 32000 | 64 | 16 | 32 | 367 | 512000000 | 256000 | 512000 | 2048000 | 512000 | 1024000 |
7 | 23 | 16000 | 32000 | 64 | 16 | 32 | 489 | 512000000 | 256000 | 512000 | 2048000 | 512000 | 1024000 |
8 | 23 | 16000 | 64000 | 64 | 16 | 32 | 636 | 1024000000 | 256000 | 512000 | 4096000 | 1024000 | 2048000 |
9 | 23 | 32000 | 64000 | 128 | 32 | 64 | 1144 | 2048000000 | 1024000 | 2048000 | 8192000 | 2048000 | 4096000 |
10 | 400 | 1000 | 3000 | 0 | 1 | 2 | 38 | 3000000 | 1000 | 2000 | 0 | 3000 | 6000 |
11 | 400 | 512 | 3500 | 4 | 1 | 6 | 40 | 1792000 | 512 | 3072 | 14000 | 3500 | 21000 |
12 | 60 | 2000 | 8000 | 65 | 1 | 8 | 92 | 16000000 | 2000 | 16000 | 520000 | 8000 | 64000 |
13 | 50 | 4000 | 16000 | 65 | 1 | 8 | 138 | 64000000 | 4000 | 32000 | 1040000 | 16000 | 128000 |
15 | 200 | 512 | 16000 | 0 | 4 | 32 | 35 | 8192000 | 2048 | 16384 | 0 | 64000 | 512000 |
(207, 13)
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | PRP | MMIN*MMAX | MMIN*CHMIN | MMIN*CHMAX | MMAX*CACH | MMAX*CHMIN | MMAX*CHMAX
---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 125 | 256 | 6000 | 256 | 16 | 128 | 198 | 1536000 | 4096 | 32768 | 1536000 | 96000 | 768000 |
Why wasn't the StandardScaler used in the previous example?
Split data into train/test
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | PRP | MMIN*MMAX | MMIN*CHMIN | MMIN*CHMAX | MMAX*CACH | MMAX*CHMIN | MMAX*CHMAX
---|---|---|---|---|---|---|---|---|---|---|---|---|---
164 | 56 | 4000 | 16000 | 0 | 1 | 8 | 46 | 64000000 | 4000 | 32000 | 0 | 16000 | 128000 |
190 | 26 | 8000 | 24000 | 32 | 8 | 16 | 173 | 192000000 | 64000 | 128000 | 768000 | 192000 | 384000 |
60 | 800 | 256 | 8000 | 0 | 1 | 4 | 16 | 2048000 | 256 | 1024 | 0 | 8000 | 32000 |
82 | 300 | 1000 | 16000 | 8 | 2 | 112 | 38 | 16000000 | 2000 | 112000 | 128000 | 32000 | 1792000 |
191 | 26 | 8000 | 32000 | 64 | 12 | 16 | 248 | 256000000 | 96000 | 128000 | 2048000 | 384000 | 512000 |
65 | 75 | 2000 | 16000 | 128 | 1 | 38 | 259 | 32000000 | 2000 | 76000 | 2048000 | 16000 | 608000 |
163 | 56 | 4000 | 12000 | 0 | 1 | 8 | 42 | 48000000 | 4000 | 32000 | 0 | 12000 | 96000 |
125 | 50 | 2000 | 4000 | 0 | 3 | 6 | 27 | 8000000 | 6000 | 12000 | 0 | 12000 | 24000 |
176 | 160 | 512 | 4000 | 2 | 1 | 5 | 30 | 2048000 | 512 | 2560 | 8000 | 4000 | 20000 |
137 | 150 | 512 | 4000 | 0 | 8 | 128 | 30 | 2048000 | 4096 | 65536 | 0 | 32000 | 512000 |
89 | 140 | 2000 | 8000 | 32 | 1 | 54 | 66 | 16000000 | 2000 | 108000 | 256000 | 8000 | 432000 |
199 | 30 | 8000 | 64000 | 128 | 12 | 176 | 1150 | 512000000 | 96000 | 1408000 | 8192000 | 768000 | 11264000 |
113 | 225 | 2000 | 4000 | 8 | 3 | 6 | 34 | 8000000 | 6000 | 12000 | 32000 | 12000 | 24000 |
42 | 50 | 2000 | 16000 | 8 | 3 | 6 | 52 | 32000000 | 6000 | 12000 | 128000 | 48000 | 96000 |
75 | 300 | 768 | 12000 | 6 | 6 | 24 | 50 | 9216000 | 4608 | 18432 | 72000 | 72000 | 288000 |
Split off the target: trainDataset['PRP']
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | MMIN*MMAX | MMIN*CHMIN | MMIN*CHMAX | MMAX*CACH | MMAX*CHMIN | MMAX*CHMAX
---|---|---|---|---|---|---|---|---|---|---|---|---
164 | 56 | 4000 | 16000 | 0 | 1 | 8 | 64000000 | 4000 | 32000 | 0 | 16000 | 128000 |
190 | 26 | 8000 | 24000 | 32 | 8 | 16 | 192000000 | 64000 | 128000 | 768000 | 192000 | 384000 |
61 | 800 | 256 | 8000 | 0 | 1 | 4 | 2048000 | 256 | 1024 | 0 | 8000 | 32000 |
83 | 330 | 1000 | 2000 | 0 | 1 | 2 | 2000000 | 1000 | 2000 | 0 | 2000 | 4000 |
191 | 26 | 8000 | 32000 | 64 | 12 | 16 | 256000000 | 96000 | 128000 | 2048000 | 384000 | 512000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
137 | 150 | 512 | 4000 | 0 | 8 | 128 | 2048000 | 4096 | 65536 | 0 | 32000 | 512000 |
206 | 125 | 2000 | 8000 | 0 | 2 | 14 | 16000000 | 4000 | 28000 | 0 | 16000 | 112000 |
93 | 57 | 4000 | 16000 | 1 | 6 | 12 | 64000000 | 24000 | 48000 | 16000 | 96000 | 192000 |
4 | 29 | 8000 | 16000 | 32 | 8 | 16 | 128000000 | 64000 | 128000 | 512000 | 128000 | 256000 |
102 | 1100 | 512 | 1500 | 0 | 1 | 1 | 768000 | 512 | 512 | 0 | 1500 | 1500 |
176 rows × 12 columns
164 46 190 173 60 16 82 38 191 248 ... 136 65 206 52 92 22 4 132 101 45 Name: PRP, Length: 177, dtype: int64
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | PRP | MMIN*MMAX | MMIN*CHMIN | MMIN*CHMAX | MMAX*CACH | MMAX*CHMIN | MMAX*CHMAX
---|---|---|---|---|---|---|---|---|---|---|---|---|---
164 | 56 | 4000 | 16000 | 0 | 1 | 8 | 46 | 64000000 | 4000 | 32000 | 0 | 16000 | 128000 |
190 | 26 | 8000 | 24000 | 32 | 8 | 16 | 173 | 192000000 | 64000 | 128000 | 768000 | 192000 | 384000 |
61 | 800 | 256 | 8000 | 0 | 1 | 4 | 22 | 2048000 | 256 | 1024 | 0 | 8000 | 32000 |
83 | 330 | 1000 | 2000 | 0 | 1 | 2 | 16 | 2000000 | 1000 | 2000 | 0 | 2000 | 4000 |
191 | 26 | 8000 | 32000 | 64 | 12 | 16 | 248 | 256000000 | 96000 | 128000 | 2048000 | 384000 | 512000 |
8 636 15 35 25 69 35 208 54 26 ... 156 510 160 21 177 32 181 6 207 67 Name: PRP, Length: 31, dtype: int64
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | MMIN*MMAX | MMIN*CHMIN | MMIN*CHMAX | MMAX*CACH | MMAX*CHMIN | MMAX*CHMAX
---|---|---|---|---|---|---|---|---|---|---|---|---
9 | 23 | 32000 | 64000 | 128 | 32 | 64 | 2048000000 | 1024000 | 2048000 | 8192000 | 2048000 | 4096000 |
16 | 167 | 524 | 2000 | 8 | 4 | 15 | 1048000 | 2096 | 7860 | 16000 | 8000 | 30000 |
26 | 320 | 256 | 6000 | 0 | 1 | 6 | 1536000 | 256 | 1536 | 0 | 6000 | 36000 |
36 | 50 | 500 | 2000 | 8 | 1 | 4 | 1000000 | 500 | 2000 | 16000 | 2000 | 8000 |
55 | 110 | 1000 | 12000 | 16 | 1 | 2 | 12000000 | 1000 | 2000 | 192000 | 12000 | 24000 |
 | MYCT | MMIN | MMAX | CACH | CHMIN | CHMAX | MMIN*MMAX | MMIN*CHMIN | MMIN*CHMAX | MMAX*CACH | MMAX*CHMIN | MMAX*CHMAX
---|---|---|---|---|---|---|---|---|---|---|---|---
164 | 56 | 4000 | 16000 | 0 | 1 | 8 | 64000000 | 4000 | 32000 | 0 | 16000 | 128000 |
190 | 26 | 8000 | 24000 | 32 | 8 | 16 | 192000000 | 64000 | 128000 | 768000 | 192000 | 384000 |
61 | 800 | 256 | 8000 | 0 | 1 | 4 | 2048000 | 256 | 1024 | 0 | 8000 | 32000 |
83 | 330 | 1000 | 2000 | 0 | 1 | 2 | 2000000 | 1000 | 2000 | 0 | 2000 | 4000 |
191 | 26 | 8000 | 32000 | 64 | 12 | 16 | 256000000 | 96000 | 128000 | 2048000 | 384000 | 512000 |
(176, 12) (176,) (31, 12) (31,)
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense
Model: "sequential_23" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense_74 (Dense) (None, 24) 312 _________________________________________________________________ batch_normalization_4 (Batch (None, 24) 96 _________________________________________________________________ dense_75 (Dense) (None, 64) 1600 _________________________________________________________________ dropout_12 (Dropout) (None, 64) 0 _________________________________________________________________ dense_76 (Dense) (None, 64) 4160 _________________________________________________________________ dense_77 (Dense) (None, 15) 975 _________________________________________________________________ out (Dense) (None, 1) 16 ================================================================= Total params: 7,159 Trainable params: 7,111 Non-trainable params: 48 _________________________________________________________________
Model compiling settings
Add a mechanism that stops training if the validation loss is not improving for more than n_idle_epochs.
Epoch 0, loss 4486.89, val_loss 630812047310848.00, mae 41.13, val_mae 4833605.50, mse 4486.89, val_mse 630812047310848.00
Epoch 100, loss 9252.69, val_loss 506061802962944.00, mae 48.49, val_mae 4329354.50, mse 9252.69, val_mse 506061802962944.00
Epoch 200, loss 3186.86, val_loss 538540412764160.00, mae 30.34, val_mae 4466121.00, mse 3186.86, val_mse 538540412764160.00
Epoch 300, loss 3265.62, val_loss 461747202818048.00, mae 32.80, val_mae 4135459.00, mse 3265.62, val_mse 461747202818048.00
Epoch 400, loss 2510.48, val_loss 307521235451904.00, mae 30.61, val_mae 3374891.75, mse 2510.48, val_mse 307521235451904.00
Epoch 500, loss 4945.82, val_loss 120401573183488.00, mae 38.67, val_mae 2111740.50, mse 4945.82, val_mse 120401573183488.00
Epoch 600, loss 4094.86, val_loss 245210219020288.00, mae 35.97, val_mae 3013642.00, mse 4094.86, val_mse 245210219020288.00
Epoch 700, loss 3158.92, val_loss 243132964798464.00, mae 30.36, val_mae 3000852.25, mse 3158.92, val_mse 243132964798464.00
Epoch 800, loss 5938.23, val_loss 270668771885056.00, mae 38.92, val_mae 3166221.00, mse 5938.23, val_mse 270668771885056.00
Epoch 900, loss 3791.64, val_loss 333104912793600.00, mae 38.05, val_mae 3512465.00, mse 3791.64, val_mse 333104912793600.00
history.history
model.fit returns a History object (a callback) for each model. This object stores useful information that we want to extract and visualize. Let's explore what is inside history:
keys: dict_keys(['loss', 'mae', 'mse', 'val_loss', 'val_mae', 'val_mse'])
These include the training and validation losses. Let's visualize the MAE for training and validation with the code below:
9 1144 16 19 26 33 36 20 55 60 ... 157 8 161 24 178 38 182 11 207 67 Name: PRP, Length: 31, dtype: int64