Scikit - SGDRegressor doesn't fit

3

Hello I am trying to fit a small set of data using scilearn.

import numpy as np
from sklearn import linear_model, model_selection

X = np.array([[86.5999984741211,    9.10000038146973,   14.3000001907349,1],
            [66.9000015258789,  17.3999996185303,   11.5,1],
            [66.3000030517578,  20              ,   10.6999998092651,1],
            [78.6999969482422,  15.3999996185303,   12.1000003814697,1],
            [76.1999969482422,  18.2000007629395,   12.5,1],
            [84.4000015258789,  9.89999961853027,   12.1000003814697,1],
            [79.1999969482422,  8.5             ,   10.1000003814697,1],
            [77.5           ,   10.1999998092651,   11.3999996185303,1],
            [74.4000015258789,  17.7999992370605,   10.6000003814697,1],
            [870.9000015258789, 13.5            ,   13,1],
            [80.0999984741211,  8               ,   9.10000038146973,1],
            [80.0999984741211,  10.3000001907349,   9,1],
            [79.6999969482422,  13.1000003814697,   9.5,1],
            [76.1999969482422,  13.6000003814697,   11.5,1],
            [75.5999984741211,  12.1999998092651,   10.8000001907349,1],
            [81.3000030517578,  13.1000003814697,   9.89999961853027,1],
            [64.5999984741211,  20.3999996185303,   10.6000003814697,1],
            [68.3000030517578,  26.3999996185303,   14.8999996185303,1],
            [80             ,   10.6999998092651,   10.8999996185303,1],
            [78.4000015258789,  9.69999980926514,   12,1],
            [78.8000030517578,  10.6999998092651,   10.6000003814697,1],
            [76.8000030517578,  15.3999996185303,   13,1],
            [82.4000015258789,  11.6000003814697,   9.89999961853027,1],
            [73.9000015258789,  16.1000003814697,   10.8999996185303,1],
            [64.3000030517578,  24.7000007629395,   14.6999998092651,1],
            [81             ,   14.8999996185303,   10.8000001907349,1],
            [70             ,   14.3999996185303,   11.1000003814697,1],
            [76.6999969482422,  11.1999998092651,   8.39999961853027,1],
            [81.8000030517578,  10.3000001907349,   9.39999961853027,1],
            [82.1999969482422,  9.89999961853027,   9.19999980926514,1],
            [76.6999969482422,  10.8999996185303,   9.60000038146973,1],
            [75.0999984741211,  17.3999996185303,   13.8000001907349,1],
            [78.8000030517578,  9.80000019073486,   12.3999996185303,1],
            [74.8000030517578,  16.3999996185303,   12.6999998092651,1],
            [75.6999969482422,  13              ,   11.3999996185303,1],
            [74.5999984741211,  19.8999996185303,   11.1000003814697,1],
            [81.5           ,   11.8000001907349,   11.3000001907349,1],
            [74.6999969482422,  13.1999998092651,   9.60000038146973,1],
            [72             ,   11.1999998092651,   10.8000001907349,1],
            [68.3000030517578,  18.7000007629395,   12.3000001907349,1],
            [77.0999984741211,  14.1999998092651,   9.39999961853027,1],
            [67.0999984741211,  19.6000003814697,   11.1999998092651,1],
            [72.0999984741211,  17.3999996185303,   11.8000001907349,1],
            [85.0999984741211,  10.6999998092651,   10,1],
            [75.1999969482422,  9.69999980926514,   10.3000001907349,1],
            [80.8000030517578,  10              ,   11,1],
            [83.8000030517578,  12.1000003814697,   11.6999998092651,1],
            [78.5999984741211,  12.6000003814697,   10.3999996185303,1],
            [66             ,   22.2000007629395,   9.39999961853027,1],
            [83             ,   13.3000001907349,   10.8000001907349,1],
            [73.0999984741211,  26.3999996185303,   22.1000003814697,1]])

y = np.array([761,
            780,
            593,
            715,
            1078,
            567,
            456,
            686,
            1206,
            723,
            261,
            326,
            282,
            960,
            489,
            496,
            463,
            1062,
            805,
            998,
            126,
            792,
            327,
            744,
            434,
            178,
            679,
            82,
            339,
            138,
            627,
            930,
            875,
            1074,
            504,
            635,
            503,
            418,
            402,
            1023,
            208,
            766,
            762,
            301,
            372,
            114,
            515,
            264,
            208,
            286,
            2922])

model = linear_model.SGDRegressor(max_iter=0x7FFFFFFF, tol=1e-12, learning_rate="constant", eta0=.1, shuffle=False)
"""model = linear_model.Lasso(max_iter=0x7FFFFFF,tol=1e-12)"""

model.fit(X,y)
print(model.coef_)

print (model.score(X,y))
"""
for i in range(0,len(X)):
    print (np.dot(X[i],model.coef_))"""

Ridge/Lasso/ElasticNet fits to a certain degree(~0.7 score), but even though i set a ultra high iteration and ultra low tol values, SGDRegressor cant even come close fitting these values.

Tuning max_iter or tol doesn't have any effect on the outcome, I keep getting huge coefficients.

python
machine-learning
scikit-learn
linear-regression
asked on Stack Overflow May 27, 2018 by omerfirmak • edited Feb 24, 2019 by Cœur

1 Answer

2

Before applying gradient descent techniques you need to make sure your features are scaled. Looking at your X, this should solve the problem.

from sklearn.preprocessing import StandardScaler
X_scaled = StandardScaler().fit_transform(X)
answered on Stack Overflow May 27, 2018 by Jan K

User contributions licensed under CC BY-SA 3.0