Summary


The first model was a multi-class classifier (CONFIRMED, FALSE POSITIVE, or CANDIDATE) using logistic regression. The model yielded a predictive accuracy of approximately .664 on the testing data. After scaling the X values with MinMaxScaler, the model yielded a far-improved predictive accuracy of .837. A Support Vector Machine (SVM) with a linear kernel did better, with an accuracy score of .847. After tuning the hyperparameters C and gamma with GridSearchCV, the best model found had C=10 and gamma=.0001, with a cross-validated accuracy of .871 (gamma has no effect on a linear-kernel SVC, so the gain comes from C).


A deep-learning model, using the Adam optimizer and a categorical cross-entropy loss function, with a 100-unit input layer, a 100-unit hidden layer, and a 3-unit output layer, was trained for 100 epochs and achieved a predictive accuracy of .876 on the test data.

Solution


In [1]:
import pandas as pd

Read the CSV and Perform Basic Data Cleaning

In [2]:
df = pd.read_csv("cumulative.csv")
df = df.drop(columns=["rowid", "kepid", "kepoi_name", "kepler_name", "koi_pdisposition", "koi_score", "koi_tce_delivname"])
# Drop the null columns where all values are null
df = df.dropna(axis='columns', how='all')
# Drop the null rows
df = df.dropna()
df.head()
Out[2]:
koi_disposition koi_fpflag_nt koi_fpflag_ss koi_fpflag_co koi_fpflag_ec koi_period koi_period_err1 koi_period_err2 koi_time0bk koi_time0bk_err1 ... koi_steff_err2 koi_slogg koi_slogg_err1 koi_slogg_err2 koi_srad koi_srad_err1 koi_srad_err2 ra dec koi_kepmag
0 CONFIRMED 0 0 0 0 9.488036 2.775000e-05 -2.775000e-05 170.538750 0.002160 ... -81.0 4.467 0.064 -0.096 0.927 0.105 -0.061 291.93423 48.141651 15.347
1 CONFIRMED 0 0 0 0 54.418383 2.479000e-04 -2.479000e-04 162.513840 0.003520 ... -81.0 4.467 0.064 -0.096 0.927 0.105 -0.061 291.93423 48.141651 15.347
2 FALSE POSITIVE 0 1 0 0 19.899140 1.494000e-05 -1.494000e-05 175.850252 0.000581 ... -176.0 4.544 0.044 -0.176 0.868 0.233 -0.078 297.00482 48.134129 15.436
3 FALSE POSITIVE 0 1 0 0 1.736952 2.630000e-07 -2.630000e-07 170.307565 0.000115 ... -174.0 4.564 0.053 -0.168 0.791 0.201 -0.067 285.53461 48.285210 15.597
4 CONFIRMED 0 0 0 0 2.525592 3.761000e-06 -3.761000e-06 171.595550 0.001130 ... -211.0 4.438 0.070 -0.210 1.046 0.334 -0.133 288.75488 48.226200 15.509

5 rows × 41 columns

Create a Train Test Split

Use koi_disposition for the y values

In [3]:
y = df["koi_disposition"]
X = df.drop(columns=["koi_disposition"])
y.head()
Out[3]:
0         CONFIRMED
1         CONFIRMED
2    FALSE POSITIVE
3    FALSE POSITIVE
4         CONFIRMED
Name: koi_disposition, dtype: object
In [4]:
X.head()
Out[4]:
koi_fpflag_nt koi_fpflag_ss koi_fpflag_co koi_fpflag_ec koi_period koi_period_err1 koi_period_err2 koi_time0bk koi_time0bk_err1 koi_time0bk_err2 ... koi_steff_err2 koi_slogg koi_slogg_err1 koi_slogg_err2 koi_srad koi_srad_err1 koi_srad_err2 ra dec koi_kepmag
0 0 0 0 0 9.488036 2.775000e-05 -2.775000e-05 170.538750 0.002160 -0.002160 ... -81.0 4.467 0.064 -0.096 0.927 0.105 -0.061 291.93423 48.141651 15.347
1 0 0 0 0 54.418383 2.479000e-04 -2.479000e-04 162.513840 0.003520 -0.003520 ... -81.0 4.467 0.064 -0.096 0.927 0.105 -0.061 291.93423 48.141651 15.347
2 0 1 0 0 19.899140 1.494000e-05 -1.494000e-05 175.850252 0.000581 -0.000581 ... -176.0 4.544 0.044 -0.176 0.868 0.233 -0.078 297.00482 48.134129 15.436
3 0 1 0 0 1.736952 2.630000e-07 -2.630000e-07 170.307565 0.000115 -0.000115 ... -174.0 4.564 0.053 -0.168 0.791 0.201 -0.067 285.53461 48.285210 15.597
4 0 0 0 0 2.525592 3.761000e-06 -3.761000e-06 171.595550 0.001130 -0.001130 ... -211.0 4.438 0.070 -0.210 1.046 0.334 -0.133 288.75488 48.226200 15.509

5 rows × 40 columns

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)
In [6]:
X_train.head()
Out[6]:
koi_fpflag_nt koi_fpflag_ss koi_fpflag_co koi_fpflag_ec koi_period koi_period_err1 koi_period_err2 koi_time0bk koi_time0bk_err1 koi_time0bk_err2 ... koi_steff_err2 koi_slogg koi_slogg_err1 koi_slogg_err2 koi_srad koi_srad_err1 koi_srad_err2 ra dec koi_kepmag
3206 0 1 1 0 31.805143 0.000042 -0.000042 186.38905 0.00105 -0.00105 ... -161.0 4.545 0.044 -0.176 0.863 0.217 -0.072 298.04453 40.086361 14.517
3954 0 0 0 0 24.560711 0.000375 -0.000375 146.79380 0.01610 -0.01610 ... -135.0 4.192 0.137 -0.125 1.499 0.272 -0.245 289.92145 46.744560 12.805
1410 0 0 0 0 7.560522 0.000026 -0.000026 134.47889 0.00270 -0.00270 ... -211.0 4.503 0.052 -0.208 0.940 0.282 -0.094 283.84515 44.609089 15.986
5865 0 0 0 0 4.644901 0.000038 -0.000038 133.67436 0.00786 -0.00786 ... -85.0 4.540 0.052 -0.017 0.770 0.027 -0.046 282.34305 48.340778 14.480
340 0 0 1 1 2.037441 0.000010 -0.000010 133.59962 0.00390 -0.00390 ... -207.0 4.415 0.087 -0.203 1.015 0.312 -0.134 295.79526 47.663960 14.187

5 rows × 40 columns

Pre-processing

Scale the data using the MinMaxScaler

In [7]:
from sklearn.preprocessing import MinMaxScaler
# from sklearn.preprocessing import StandardScaler

X_scaler = MinMaxScaler().fit(X_train)

X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
C:\Users\donis\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py:334: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by MinMaxScaler.
return self.partial_fit(X, y)
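
The DataConversionWarning above is harmless: the integer flag columns are simply cast to float by the scaler. If desired, it could be avoided by casting the features to float before fitting; a minimal sketch (an assumption, not part of the original run):

# Cast the integer flag columns to float up front so MinMaxScaler has nothing to convert
X_scaler = MinMaxScaler().fit(X_train.astype(float))
X_train_scaled = X_scaler.transform(X_train.astype(float))
X_test_scaled = X_scaler.transform(X_test.astype(float))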

Test a Logistic Regression on the Scaled Values

In [8]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model
Out[8]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
        intercept_scaling=1, max_iter=100, multi_class='warn',
        n_jobs=None, penalty='l2', random_state=None, solver='warn',
        tol=0.0001, verbose=0, warm_start=False)
In [9]:
model.fit(X_train, y_train)
C:\Users\donis\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
FutureWarning)
C:\Users\donis\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:460: FutureWarning: Default multi_class will be changed to 'auto' in 0.22. Specify the multi_class option to silence this warning.
"this warning.", FutureWarning)
C:\Users\donis\Anaconda3\lib\site-packages\sklearn\svm\base.py:931: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
"the number of iterations.", ConvergenceWarning)
Out[9]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
        intercept_scaling=1, max_iter=100, multi_class='warn',
        n_jobs=None, penalty='l2', random_state=None, solver='warn',
        tol=0.0001, verbose=0, warm_start=False)
In [10]:
print(f"Training Data Score: {model.score(X_train, y_train)}")
print(f"Testing Data Score: {model.score(X_test, y_test)}")
Training Data Score: 0.6590423909728576
Testing Data Score: 0.6642268984446478
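
The ConvergenceWarning above indicates that liblinear stopped before converging on the unscaled features. One way this could be addressed (an assumption, not something tried here) is to specify a solver and raise the iteration limit:

# Hypothetical variant: explicit solver and higher max_iter to silence the warnings
model = LogisticRegression(solver='lbfgs', multi_class='auto', max_iter=1000)
model.fit(X_train, y_train)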
In [11]:
model.fit(X_train_scaled, y_train)
C:\Users\donis\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
FutureWarning)
C:\Users\donis\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:460: FutureWarning: Default multi_class will be changed to 'auto' in 0.22. Specify the multi_class option to silence this warning.
"this warning.", FutureWarning)
Out[11]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
        intercept_scaling=1, max_iter=100, multi_class='warn',
        n_jobs=None, penalty='l2', random_state=None, solver='warn',
        tol=0.0001, verbose=0, warm_start=False)
In [12]:
print(f"Training Data Score: {model.score(X_train_scaled, y_train)}")
print(f"Testing Data Score: {model.score(X_test_scaled, y_test)}")
Training Data Score: 0.845684659957304
Testing Data Score: 0.8371454711802379

Train the Support Vector Machine

In [13]:
from sklearn.svm import SVC 
model2 = SVC(kernel='linear')
model2.fit(X_train_scaled, y_train)
Out[13]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
kernel='linear', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False)
In [14]:
print(f"Training Data Score: {model2.score(X_train_scaled, y_train)}")
print(f"Testing Data Score: {model2.score(X_test_scaled, y_test)}")
Training Data Score: 0.8508691674290942
Testing Data Score: 0.8472095150960659

Hyperparameter Tuning

Use GridSearchCV to tune the C and gamma parameters

In [17]:
# Create the GridSearchCV model
from sklearn.model_selection import GridSearchCV
param_grid = {'C': [1, 5, 10],
            'gamma': [0.0001, 0.001, 0.01]}
grid = GridSearchCV(model2, param_grid, verbose=3)
In [18]:
# Train the model with GridSearch
grid.fit(X_train_scaled, y_train)
C:\Users\donis\Anaconda3\lib\site-packages\sklearn\model_selection\_split.py:2053: FutureWarning: You should specify a value for 'cv' instead of relying on the default value. The default value will change from 3 to 5 in version 0.22.
warnings.warn(CV_WARNING, FutureWarning)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Fitting 3 folds for each of 9 candidates, totalling 27 fits
[CV] C=1, gamma=0.0001 ...............................................
[CV] ...... C=1, gamma=0.0001, score=0.8395061728395061, total=   0.3s
[CV] C=1, gamma=0.0001 ...............................................
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.6s remaining:    0.0s
[CV] ...... C=1, gamma=0.0001, score=0.8394327538883806, total=   0.3s
[CV] C=1, gamma=0.0001 ...............................................
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    1.2s remaining:    0.0s
[CV] ....... C=1, gamma=0.0001, score=0.854462242562929, total=   0.3s
[CV] C=1, gamma=0.001 ................................................
[CV] ....... C=1, gamma=0.001, score=0.8395061728395061, total=   0.3s
[CV] C=1, gamma=0.001 ................................................
[CV] ....... C=1, gamma=0.001, score=0.8394327538883806, total=   0.3s
[CV] C=1, gamma=0.001 ................................................
[CV] ........ C=1, gamma=0.001, score=0.854462242562929, total=   0.3s
[CV] C=1, gamma=0.01 .................................................
[CV] ........ C=1, gamma=0.01, score=0.8395061728395061, total=   0.3s
[CV] C=1, gamma=0.01 .................................................
[CV] ........ C=1, gamma=0.01, score=0.8394327538883806, total=   0.3s
[CV] C=1, gamma=0.01 .................................................
[CV] ......... C=1, gamma=0.01, score=0.854462242562929, total=   0.3s
[CV] C=5, gamma=0.0001 ...............................................
[CV] ...... C=5, gamma=0.0001, score=0.8587105624142661, total=   0.4s
[CV] C=5, gamma=0.0001 ...............................................
[CV] ...... C=5, gamma=0.0001, score=0.8618481244281794, total=   0.3s
[CV] C=5, gamma=0.0001 ...............................................
[CV] ...... C=5, gamma=0.0001, score=0.8713958810068649, total=   0.3s
[CV] C=5, gamma=0.001 ................................................
[CV] ....... C=5, gamma=0.001, score=0.8587105624142661, total=   0.4s
[CV] C=5, gamma=0.001 ................................................
[CV] ....... C=5, gamma=0.001, score=0.8618481244281794, total=   0.3s
[CV] C=5, gamma=0.001 ................................................
[CV] ....... C=5, gamma=0.001, score=0.8713958810068649, total=   0.3s
[CV] C=5, gamma=0.01 .................................................
[CV] ........ C=5, gamma=0.01, score=0.8587105624142661, total=   0.4s
[CV] C=5, gamma=0.01 .................................................
[CV] ........ C=5, gamma=0.01, score=0.8618481244281794, total=   0.3s
[CV] C=5, gamma=0.01 .................................................
[CV] ........ C=5, gamma=0.01, score=0.8713958810068649, total=   0.3s
[CV] C=10, gamma=0.0001 ..............................................
[CV] ..... C=10, gamma=0.0001, score=0.8664837677183356, total=   0.4s
[CV] C=10, gamma=0.0001 ..............................................
[CV] ..... C=10, gamma=0.0001, score=0.8705397987191217, total=   0.3s
[CV] C=10, gamma=0.0001 ..............................................
[CV] ..... C=10, gamma=0.0001, score=0.8759725400457666, total=   0.3s
[CV] C=10, gamma=0.001 ...............................................
[CV] ...... C=10, gamma=0.001, score=0.8664837677183356, total=   0.4s
[CV] C=10, gamma=0.001 ...............................................
[CV] ...... C=10, gamma=0.001, score=0.8705397987191217, total=   0.3s
[CV] C=10, gamma=0.001 ...............................................
[CV] ...... C=10, gamma=0.001, score=0.8759725400457666, total=   0.3s
[CV] C=10, gamma=0.01 ................................................
[CV] ....... C=10, gamma=0.01, score=0.8664837677183356, total=   0.4s
[CV] C=10, gamma=0.01 ................................................
[CV] ....... C=10, gamma=0.01, score=0.8705397987191217, total=   0.3s
[CV] C=10, gamma=0.01 ................................................
[CV] ....... C=10, gamma=0.01, score=0.8759725400457666, total=   0.3s
[Parallel(n_jobs=1)]: Done  27 out of  27 | elapsed:   17.9s finished
Out[18]:
GridSearchCV(cv='warn', error_score='raise-deprecating',
     estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
kernel='linear', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False),
     fit_params=None, iid='warn', n_jobs=None,
     param_grid={'C': [1, 5, 10], 'gamma': [0.0001, 0.001, 0.01]},
     pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
     scoring=None, verbose=3)
In [20]:
print(grid.best_params_)
print(grid.best_score_)
{'C': 10, 'gamma': 0.0001}
0.8709972552607502
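
Note that gamma is ignored by a linear-kernel SVC, which is why the grid scores vary only with C, and the .871 above is a cross-validated training score rather than a test score. A sketch of how the refit best estimator could be scored on the held-out test data (not run in the original notebook):

# GridSearchCV refits the best estimator by default, so grid.score uses the tuned model
print(f"Tuned SVM Testing Data Score: {grid.score(X_test_scaled, y_test)}")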
 

Deep Learning Solution

In [1]:
import pandas as pd
df = pd.read_csv("cumulative.csv")
df = df.drop(columns=["rowid", "kepid", "kepoi_name", "kepler_name", "koi_pdisposition", "koi_score", "koi_tce_delivname"])
# Drop the null columns where all values are null
df = df.dropna(axis='columns', how='all')
# Drop the null rows
df = df.dropna()
df.head()
Out[1]:
koi_disposition koi_fpflag_nt koi_fpflag_ss koi_fpflag_co koi_fpflag_ec koi_period koi_period_err1 koi_period_err2 koi_time0bk koi_time0bk_err1 ... koi_steff_err2 koi_slogg koi_slogg_err1 koi_slogg_err2 koi_srad koi_srad_err1 koi_srad_err2 ra dec koi_kepmag
0 CONFIRMED 0 0 0 0 9.488036 2.775000e-05 -2.775000e-05 170.538750 0.002160 ... -81.0 4.467 0.064 -0.096 0.927 0.105 -0.061 291.93423 48.141651 15.347
1 CONFIRMED 0 0 0 0 54.418383 2.479000e-04 -2.479000e-04 162.513840 0.003520 ... -81.0 4.467 0.064 -0.096 0.927 0.105 -0.061 291.93423 48.141651 15.347
2 FALSE POSITIVE 0 1 0 0 19.899140 1.494000e-05 -1.494000e-05 175.850252 0.000581 ... -176.0 4.544 0.044 -0.176 0.868 0.233 -0.078 297.00482 48.134129 15.436
3 FALSE POSITIVE 0 1 0 0 1.736952 2.630000e-07 -2.630000e-07 170.307565 0.000115 ... -174.0 4.564 0.053 -0.168 0.791 0.201 -0.067 285.53461 48.285210 15.597
4 CONFIRMED 0 0 0 0 2.525592 3.761000e-06 -3.761000e-06 171.595550 0.001130 ... -211.0 4.438 0.070 -0.210 1.046 0.334 -0.133 288.75488 48.226200 15.509

5 rows × 41 columns

In [3]:
mask = df["koi_disposition"] == "FALSE POSITIVE"
df.loc[mask, "koi_disposition"] = "False_Positive"
df["koi_disposition"].head()
Out[3]:
0         CONFIRMED
1         CONFIRMED
2    False_Positive
3    False_Positive
4         CONFIRMED
Name: koi_disposition, dtype: object
In [4]:
y = df["koi_disposition"]
X = df.drop(columns=["koi_disposition"])
y.head()
Out[4]:
0         CONFIRMED
1         CONFIRMED
2    False_Positive
3    False_Positive
4         CONFIRMED
Name: koi_disposition, dtype: object
In [6]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)
from sklearn.preprocessing import MinMaxScaler
# from sklearn.preprocessing import StandardScaler

X_scaler = MinMaxScaler().fit(X_train)

X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
C:\Users\donis\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py:334: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by MinMaxScaler.
  return self.partial_fit(X, y)
In [7]:
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
X_scaler = MinMaxScaler().fit(X_train)
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
C:\Users\donis\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py:334: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by MinMaxScaler.
  return self.partial_fit(X, y)
In [8]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Dense
In [9]:
label_encoder = LabelEncoder()
label_encoder.fit(y_train)
encoded_y_train = label_encoder.transform(y_train)
encoded_y_test = label_encoder.transform(y_test)
y_train_categorical = to_categorical(encoded_y_train)
y_test_categorical = to_categorical(encoded_y_test)
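
LabelEncoder assigns integer codes in sorted label order, which determines which column of the one-hot vectors corresponds to which class. A quick check (not part of the original run):

# Expected order for this dataset's labels: ['CANDIDATE' 'CONFIRMED' 'False_Positive']
print(label_encoder.classes_)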
In [14]:
model = Sequential()
model.add(Dense(units=100, activation='relu', input_dim=40))
model.add(Dense(units=100, activation='relu'))
model.add(Dense(units=3, activation='softmax'))
WARNING:tensorflow:From C:\Users\donis\Anaconda3\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
In [15]:
model.compile(optimizer='adam',
             loss='categorical_crossentropy',
             metrics=['accuracy'])
In [16]:
model.fit(
    X_train_scaled,
    y_train_categorical,
    epochs=100,
    shuffle=True,
    verbose=2
)
WARNING:tensorflow:From C:\Users\donis\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/100
 - 1s - loss: 0.5329 - acc: 0.7386
Epoch 2/100
 - 0s - loss: 0.3686 - acc: 0.8097
Epoch 3/100
 - 0s - loss: 0.3516 - acc: 0.8263
Epoch 4/100
 - 0s - loss: 0.3417 - acc: 0.8339
Epoch 5/100
 - 0s - loss: 0.3317 - acc: 0.8472
Epoch 6/100
 - 0s - loss: 0.3262 - acc: 0.8468
Epoch 7/100
 - 0s - loss: 0.3238 - acc: 0.8464
Epoch 8/100
 - 0s - loss: 0.3179 - acc: 0.8556
Epoch 9/100
 - 0s - loss: 0.3139 - acc: 0.8530
Epoch 10/100
 - 0s - loss: 0.3134 - acc: 0.8559
Epoch 11/100
 - 0s - loss: 0.3007 - acc: 0.8644
Epoch 12/100
 - 0s - loss: 0.3000 - acc: 0.8690
Epoch 13/100
 - 0s - loss: 0.3016 - acc: 0.8620
Epoch 14/100
 - 0s - loss: 0.2944 - acc: 0.8683
Epoch 15/100
 - 1s - loss: 0.2880 - acc: 0.8734
Epoch 16/100
 - 0s - loss: 0.2936 - acc: 0.8687
Epoch 17/100
 - 0s - loss: 0.2860 - acc: 0.8768
Epoch 18/100
 - 0s - loss: 0.2845 - acc: 0.8762
Epoch 19/100
 - 0s - loss: 0.2835 - acc: 0.8759
Epoch 20/100
 - 0s - loss: 0.2802 - acc: 0.8747
Epoch 21/100
 - 0s - loss: 0.2723 - acc: 0.8824
Epoch 22/100
 - 1s - loss: 0.2808 - acc: 0.8762
Epoch 23/100
 - 0s - loss: 0.2805 - acc: 0.8803
Epoch 24/100
 - 0s - loss: 0.2735 - acc: 0.8829
Epoch 25/100
 - 0s - loss: 0.2747 - acc: 0.8769
Epoch 26/100
 - 0s - loss: 0.2691 - acc: 0.8829
Epoch 27/100
 - 0s - loss: 0.2686 - acc: 0.8844
Epoch 28/100
 - 0s - loss: 0.2693 - acc: 0.8838
Epoch 29/100
 - 0s - loss: 0.2674 - acc: 0.8823
Epoch 30/100
 - 0s - loss: 0.2641 - acc: 0.8858
Epoch 31/100
 - 0s - loss: 0.2645 - acc: 0.8858
Epoch 32/100
 - 0s - loss: 0.2645 - acc: 0.8809
Epoch 33/100
 - 0s - loss: 0.2626 - acc: 0.8843
Epoch 34/100
 - 0s - loss: 0.2630 - acc: 0.8870
Epoch 35/100
 - 0s - loss: 0.2637 - acc: 0.8823
Epoch 36/100
 - 0s - loss: 0.2567 - acc: 0.8888
Epoch 37/100
 - 1s - loss: 0.2622 - acc: 0.8833
Epoch 38/100
 - 0s - loss: 0.2578 - acc: 0.8873
Epoch 39/100
 - 0s - loss: 0.2592 - acc: 0.8846
Epoch 40/100
 - 0s - loss: 0.2571 - acc: 0.8838
Epoch 41/100
 - 0s - loss: 0.2546 - acc: 0.8878
Epoch 42/100
 - 0s - loss: 0.2508 - acc: 0.8896
Epoch 43/100
 - 0s - loss: 0.2480 - acc: 0.8939
Epoch 44/100
 - 0s - loss: 0.2514 - acc: 0.8875
Epoch 45/100
 - 0s - loss: 0.2529 - acc: 0.8907
Epoch 46/100
 - 0s - loss: 0.2497 - acc: 0.8940
Epoch 47/100
 - 0s - loss: 0.2475 - acc: 0.8949
Epoch 48/100
 - 0s - loss: 0.2490 - acc: 0.8930
Epoch 49/100
 - 0s - loss: 0.2486 - acc: 0.8902
Epoch 50/100
 - 0s - loss: 0.2444 - acc: 0.8898
Epoch 51/100
 - 0s - loss: 0.2471 - acc: 0.8908
Epoch 52/100
 - 0s - loss: 0.2444 - acc: 0.8927
Epoch 53/100
 - 0s - loss: 0.2494 - acc: 0.8902
Epoch 54/100
 - 0s - loss: 0.2430 - acc: 0.8948
Epoch 55/100
 - 0s - loss: 0.2483 - acc: 0.8867
Epoch 56/100
 - 0s - loss: 0.2500 - acc: 0.8893
Epoch 57/100
 - 0s - loss: 0.2418 - acc: 0.8949
Epoch 58/100
 - 0s - loss: 0.2366 - acc: 0.8983
Epoch 59/100
 - 0s - loss: 0.2458 - acc: 0.8905
Epoch 60/100
 - 0s - loss: 0.2397 - acc: 0.8963
Epoch 61/100
 - 0s - loss: 0.2416 - acc: 0.8930
Epoch 62/100
 - 0s - loss: 0.2381 - acc: 0.8954
Epoch 63/100
 - 0s - loss: 0.2354 - acc: 0.8989
Epoch 64/100
 - 0s - loss: 0.2379 - acc: 0.8960
Epoch 65/100
 - 0s - loss: 0.2449 - acc: 0.8917
Epoch 66/100
 - 0s - loss: 0.2440 - acc: 0.8928
Epoch 67/100
 - 0s - loss: 0.2378 - acc: 0.8954
Epoch 68/100
 - 0s - loss: 0.2390 - acc: 0.8933
Epoch 69/100
 - 0s - loss: 0.2342 - acc: 0.8959
Epoch 70/100
 - 1s - loss: 0.2300 - acc: 0.8998
Epoch 71/100
 - 0s - loss: 0.2321 - acc: 0.9007
Epoch 72/100
 - 0s - loss: 0.2368 - acc: 0.8942
Epoch 73/100
 - 0s - loss: 0.2332 - acc: 0.8971
Epoch 74/100
 - 0s - loss: 0.2310 - acc: 0.8968
Epoch 75/100
 - 0s - loss: 0.2301 - acc: 0.9009
Epoch 76/100
 - 0s - loss: 0.2301 - acc: 0.9006
Epoch 77/100
 - 0s - loss: 0.2320 - acc: 0.9004
Epoch 78/100
 - 0s - loss: 0.2348 - acc: 0.8971
Epoch 79/100
 - 0s - loss: 0.2326 - acc: 0.8983
Epoch 80/100
 - 0s - loss: 0.2290 - acc: 0.8974
Epoch 81/100
 - 0s - loss: 0.2267 - acc: 0.8992
Epoch 82/100
 - 0s - loss: 0.2302 - acc: 0.8992
Epoch 83/100
 - 0s - loss: 0.2264 - acc: 0.8992
Epoch 84/100
 - 0s - loss: 0.2267 - acc: 0.8980
Epoch 85/100
 - 0s - loss: 0.2287 - acc: 0.9001
Epoch 86/100
 - 0s - loss: 0.2256 - acc: 0.9020
Epoch 87/100
 - 0s - loss: 0.2287 - acc: 0.8989
Epoch 88/100
 - 0s - loss: 0.2300 - acc: 0.8987
Epoch 89/100
 - 0s - loss: 0.2251 - acc: 0.9004
Epoch 90/100
 - 0s - loss: 0.2236 - acc: 0.9023
Epoch 91/100
 - 0s - loss: 0.2272 - acc: 0.9010
Epoch 92/100
 - 0s - loss: 0.2295 - acc: 0.9007
Epoch 93/100
 - 0s - loss: 0.2226 - acc: 0.9026
Epoch 94/100
 - 0s - loss: 0.2245 - acc: 0.9038
Epoch 95/100
 - 0s - loss: 0.2338 - acc: 0.8936
Epoch 96/100
 - 0s - loss: 0.2245 - acc: 0.9024
Epoch 97/100
 - 0s - loss: 0.2226 - acc: 0.9010
Epoch 98/100
 - 0s - loss: 0.2216 - acc: 0.9030
Epoch 99/100
 - 0s - loss: 0.2233 - acc: 0.9053
Epoch 100/100
 - 0s - loss: 0.2206 - acc: 0.9021
Out[16]:
<tensorflow.python.keras.callbacks.History at 0x18e0f3c7630>
In [17]:
model_loss, model_accuracy = model.evaluate(
   X_test_scaled, y_test_categorical, verbose=2)
print(
   f"Normal Neural Network - Loss: {model_loss}, Accuracy: {model_accuracy}")
 - 0s - loss: 0.2821 - acc: 0.8760
Normal Neural Network - Loss: 0.2821025672246858, Accuracy: 0.8760292530059814
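
For a fuller picture than overall accuracy, per-class precision and recall could be computed on the whole test set; a minimal sketch (an addition, not part of the original run) reusing the predict_classes and inverse_transform calls shown in the next cells:

from sklearn.metrics import classification_report
# Decode the network's integer predictions back to string labels, then compare to y_test
all_predictions = label_encoder.inverse_transform(model.predict_classes(X_test_scaled))
print(classification_report(y_test, all_predictions))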
In [25]:
encoded_predictions = model.predict_classes(X_test_scaled[:10])
prediction_labels = label_encoder.inverse_transform(encoded_predictions)
In [26]:
encoded_predictions
Out[26]:
array([2, 1, 2, 2, 1, 2, 2, 1, 1, 0], dtype=int64)
In [27]:
prediction_labels
Out[27]:
array(['False_Positive', 'CONFIRMED', 'False_Positive', 'False_Positive',
       'CONFIRMED', 'False_Positive', 'False_Positive', 'CONFIRMED',
       'CONFIRMED', 'CANDIDATE'], dtype=object)
In [28]:
print(f"Predicted classes: {prediction_labels}")
print(f"Actual Labels: {list(y_test[:10])}")
Predicted classes: ['False_Positive' 'CONFIRMED' 'False_Positive' 'False_Positive'
 'CONFIRMED' 'False_Positive' 'False_Positive' 'CONFIRMED' 'CONFIRMED'
 'CANDIDATE']
Actual Labels: ['False_Positive', 'CONFIRMED', 'False_Positive', 'False_Positive', 'CONFIRMED', 'False_Positive', 'False_Positive', 'CONFIRMED', 'CONFIRMED', 'CANDIDATE']
In [ ]: