### Introduction

This is the hands-on portion of the machine learning section of the summer school.  You are tasked with developing a binary classifier to separate continuum events from signal B events, based primarily on event topology.

To do this we will use TensorFlow/Keras.  TensorFlow is a free and open-source (FOSS) machine learning library developed by Google and widely used in research.  The other popular library is PyTorch, developed by Meta and the Linux Foundation, but we will not use it here.

### Setup

To begin, we will import the libraries needed for the task.  If you have the `tensorflow-gpu` package installed, there is no special `import` for it.  Simply import `tensorflow` and the rest is automatic.  You do not need to do anything differently than when using regular `tensorflow`.

After that, we will load our data and put it into a usable format (e.g. a Pandas DataFrame), if needed.  How you do all of this is entirely up to you.  This skeleton code is provided to get you started easily.  You can use it as is and simply fill in the places where prompted, or rewrite things completely.

### Performing the Exercise

All you need to do is fill in the blanks where you see "?".

In [None]:
import pandas as pd

import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras

#from sklearn.preprocessing import MinMaxScaler

In [None]:
# Set the data path for the training and test CSV files
train_file = "?"
test_file = "?"

In [None]:
# Create Pandas DataFrames (df) from csv files
df_train = pd.read_csv(train_file)
df_test = pd.read_csv(test_file)

In [None]:
# Display for a quick check of the data
display(df_train)
display(df_test)

In [None]:
df_train_shuffled = df_train.sample(frac=1)

In [None]:
display(df_train_shuffled)

### Feature Selection

1. Select the appropriate variables for training; if you can plot them, separated by `B_isContinuumEvent`, you might get a better sense of what's useful.  The ones you select are stored in the `features` list.  The `target` list contains only `B_isContinuumEvent`, which provides the class label.

2. At the end of a physics analysis, a fit is typically performed to extract the signal; it is a good idea not to train your NN with variables that are used for fitting, such as $ M_{\textrm{bc}} $ or $ \Delta E $.

3. Neural networks $ \textit{sometimes} $ work better when the features are scaled to lie in the range [0, 1].
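
As a minimal sketch of point 3, min-max scaling maps each feature column to [0, 1] using the minimum and maximum found in the training data; the same training-derived values are then reused for the test data.  The toy arrays and the helper names `fit_min_max` and `apply_min_max` are made up for illustration; scikit-learn's `MinMaxScaler` does the equivalent job.

```python
import numpy as np

def fit_min_max(X):
    """Return the per-column minimum and range, computed from training features."""
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Avoid division by zero for columns that are constant
    col_range[col_range == 0] = 1.0
    return col_min, col_range

def apply_min_max(X, col_min, col_range):
    """Scale features using values that were fitted on the training data."""
    return (X - col_min) / col_range

# Toy data (hypothetical): 3 training events and 1 test event, 2 features each
X_train_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
X_test_toy = np.array([[2.5, 25.0]])

col_min, col_range = fit_min_max(X_train_toy)
X_train_scaled = apply_min_max(X_train_toy, col_min, col_range)
X_test_scaled = apply_min_max(X_test_toy, col_min, col_range)
print(X_train_scaled)
print(X_test_scaled)
```

Note that test values outside the training range would land outside [0, 1]; this is expected when the scaling is fitted on the training data only.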

In [None]:
features = ['?']
target = ['B_isContinuumEvent']

In [None]:
training = df_train_shuffled[features]
target = df_train_shuffled[target]

In [None]:
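# Scale training features, if needed
# If you scale here, apply the same fitted scaler to the test features later, using min_max_scaler.transform(...)
#min_max_scaler = MinMaxScaler()
#training[features] = min_max_scaler.fit_transform(training[features])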

### Creating the training set

We will now prepare the training data to be used in training the model.

NumPy arrays fit naturally into TensorFlow/Keras machine learning models, so we will convert the feature-selected dataframes to NumPy arrays.

In [None]:
training = training.to_numpy()
target = target.to_numpy()

In [None]:
X_train = training
y_train = target

In [None]:
# Look at the training variables; modify the ones you want to look at in the "features" list above
#%matplotlib inline
# The above line is only for Jupyter notebooks
#df_train_shuffled[features].hist(bins=50, figsize=(20,15))
#plt.show()

### The Model

This is the heart of the exercise.  The `build_model()` (you can call it whatever you want) function is what creates the neural network model.  Your task is to select the appropriate hyperparameters.

You must select or tune

1. The number of hidden layers, `n_hidden`

2. The number of neurons per layer, `n_neurons`

3. The learning rate, `learning_rate`

4. The activation function
    * Check "Available activations" at https://keras.io/api/layers/activations/

5. The optimizer (below we default to SGD, but do we stay with SGD or use something else?)
    * Check "Available Optimizers" at https://keras.io/api/optimizers/. 

6. The loss function 
    * You can select your loss function from https://keras.io/api/losses/.  Use the name of the given loss function when compiling your model

The metric is not used for learning; it only monitors progress after each $\textbf{epoch}$ (one pass over the training data).  You can change the metric from "accuracy" to something closer to your evaluation metric (e.g. ROC AUC) to monitor your training progress.
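
To make the ROC AUC concrete: it can be read as the probability that a randomly chosen positive event receives a higher score than a randomly chosen negative one, with ties counting one half.  The sketch below computes it by brute force on made-up labels and scores; the function name `roc_auc_pairwise` is hypothetical.  In practice you would more likely use `sklearn.metrics.roc_auc_score`, or pass `keras.metrics.AUC()` in the `metrics` list when compiling.

```python
import numpy as np

def roc_auc_pairwise(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score with every negative score
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

# Toy labels and scores, for illustration only
y_true_toy = [0, 0, 1, 1]
y_score_toy = [0.1, 0.4, 0.35, 0.8]
auc_toy = roc_auc_pairwise(y_true_toy, y_score_toy)
print(auc_toy)
```

The pairwise comparison needs memory proportional to the number of positives times the number of negatives, so it is only suitable for small illustrations.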

Note that two different activation functions are used: one for the hidden layers and one for the output layer.  Can you think why this is?  For binary classification, why might we use a sigmoid function for the output layer?
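
One way to explore the last question is a small NumPy experiment.  The sketch below, on made-up numbers, shows that a sigmoid squashes any real-valued output into (0, 1), so it can be read as a probability and compared with a 0/1 label through the binary cross-entropy.  The helper names are for illustration and are not part of the notebook.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    p = np.clip(p, eps, 1.0 - eps)  # keep log() finite
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

# Toy raw outputs of the last layer before the activation (hypothetical values)
logits = np.array([-4.0, 0.0, 4.0])
probs = sigmoid(logits)

labels = np.array([0.0, 1.0, 1.0])
loss_good = binary_cross_entropy(labels, probs)        # labels agree with the scores
loss_bad = binary_cross_entropy(1.0 - labels, probs)   # labels flipped
print(probs, loss_good, loss_bad)
```

The loss is small when confident predictions agree with the labels and large when they disagree.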

Also note the `input_shape` argument.  In this exercise, this corresponds to the length of a 1D feature vector.  So depending on how many features you use, the shape will change.

Finally, if you don't want to keep tweaking your model hyperparameters by hand and you are very ambitious, you can use the `GridSearchCV` [1] or `RandomizedSearchCV` [2] classes in scikit-learn.  These let you define a parameter space from which candidate settings are drawn and evaluated, finally giving you the best set found.  `GridSearchCV` does not scale well with the size of the parameter space and may take a long time to complete, as it searches the entire space.  `RandomizedSearchCV` samples the parameter space randomly and can thus be faster, although it may still take a while.  Keras provides similar functionality in `Keras Tuner` [3].  All three require you to modify the code in this notebook in a non-trivial way, as they are not implemented by default here.

You are not required to implement them but you should at least be aware of their existence.

[1] https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

[2] https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html

[3] https://keras.io/keras_tuner/
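
To illustrate the idea behind a randomized search, here is a pure-Python sketch that does not use scikit-learn or Keras Tuner.  The search space values are placeholders rather than recommendations, and `toy_score` is a made-up stand-in for "train a model with these hyperparameters and return its validation score".

```python
import math
import random

# Hypothetical search space; the values are placeholders, not recommendations
search_space = {
    "n_hidden": [1, 2, 3, 4],
    "n_neurons": [16, 32, 64, 128],
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
}

def toy_score(params):
    """Made-up score (higher is better) standing in for a real validation score."""
    return (
        -abs(params["n_hidden"] - 2)
        - abs(params["n_neurons"] - 64) / 64
        - abs(math.log10(params["learning_rate"]) + 2)
    )

def random_search(space, score_fn, n_trials, seed=0):
    """Sample n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search(search_space, toy_score, n_trials=20)
print(best_params, best_score)
```

A full grid over this space would need 4 x 4 x 4 = 64 evaluations; the random search above uses 20, at the price of possibly missing the best combination.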


### A note on model building

We use the `Sequential` model here, where layers are stacked on top of each other, one after the other.  Keras also provides a functional API.  Its advantage is greater flexibility: it allows for more complex models that are not simply a linear stack of layers.
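
As a hedged illustration of what the functional API makes possible, the sketch below builds a small model in which the raw inputs bypass the hidden layer and are concatenated with its output, which a purely sequential stack cannot express.  The number of features, the layer size, and the activations are arbitrary choices for the example, not suggested settings for the exercise.

```python
from tensorflow import keras

n_features = 4  # hypothetical number of input features

inputs = keras.Input(shape=(n_features,))
hidden = keras.layers.Dense(8, activation="relu")(inputs)
# Skip connection: join the raw inputs with the hidden representation
merged = keras.layers.Concatenate()([inputs, hidden])
outputs = keras.layers.Dense(1, activation="sigmoid")(merged)

functional_model = keras.Model(inputs=inputs, outputs=outputs)
functional_model.summary()
```

Such a model is compiled and trained with the same `compile` and `fit` calls as a `Sequential` one.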

In [None]:
# define neural network model
def build_model(n_hidden = ?, n_neurons = ?, learning_rate = ?, input_shape=[?]):
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    
    for layer in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation="?"))

    model.add(keras.layers.Dense(1, activation="?"))
    optimizer = keras.optimizers.SGD(learning_rate=learning_rate) # Change the optimizer here, if you don't want to use SGD
    model.compile(loss="?", optimizer=optimizer, metrics=["accuracy"])
    
    return model

In [None]:
# plot model
def model_describe_and_plot():
    model_to_plot = build_model()
        
    model_to_plot.summary()
    keras.utils.plot_model(model_to_plot, "cs_exercise_summer_school_2023.png", show_shapes=True)

In [None]:
# Get a description of the model
model_describe_and_plot()

In [None]:
# create the model instance that will be used for training.
model = build_model()

In [None]:
# plot a graphical representation of the model
keras.utils.plot_model(model, "cs_exercise_summer_school_2023_model_display.png", show_shapes=True)

### Training the model

Here is the step where we train the model.  There are a few things that need to be explained.  

Two "callbacks" are used below.  Callbacks allow you to control how the training proceeds.

1. `ReduceLROnPlateau` reduces the learning rate when the monitored quantity (in this case, the validation loss) has not improved for 5 epochs.  Each time, it multiplies the learning rate by a factor of 0.2, down to a minimum of 0.00001.  This can help with problems like getting stuck in a local minimum.

2. `EarlyStopping` stops the training when the monitored quantity (by default, the validation loss) has not improved for a number of epochs that you set (`patience`).  This helps reduce the chance of overfitting.  A rule of thumb is to set the patience to 10% of the total number of epochs, though this really depends on you and your project.

Callbacks are optional and you can remove or alter them here as you like.
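
The logic of the two callbacks can be sketched in plain Python.  This is a simplified illustration under stated assumptions, not the Keras implementation: it ignores options such as `min_delta` and `cooldown`, and the function names and the sequence of validation losses are made up.

```python
def simulate_reduce_lr(val_losses, initial_lr, factor=0.2, patience=5, min_lr=1e-5):
    """Return the learning rate in effect after each epoch (simplified plateau rule)."""
    lr = initial_lr
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: shrink the learning rate
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

def early_stopping_epoch(val_losses, patience=10):
    """Return the 0-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Toy validation losses: three improving epochs, then a plateau
val_losses_toy = [1.0, 0.8, 0.7] + [0.75] * 12
lr_history = simulate_reduce_lr(val_losses_toy, initial_lr=0.1)
stop_epoch = early_stopping_epoch(val_losses_toy, patience=10)
print(lr_history)
print(stop_epoch)
```

With these toy numbers the learning rate is reduced twice during the plateau before early stopping ends the training.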

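There is also the optional `validation_split` argument in the fit (train) function.  It holds out a user-defined fraction of the training data, which is not used for training; the model is evaluated on it at the end of each epoch.  Note that Keras takes this fraction from the end of the arrays you pass in, before any shuffling, so the held-out set is the same at every epoch.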
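
According to the Keras documentation, the validation data is taken from the last samples of the arrays passed to `fit`, before any shuffling.  The NumPy sketch below mimics that behaviour on toy arrays; the helper `split_last_fraction` is hypothetical and only illustrates the slicing.

```python
import numpy as np

def split_last_fraction(X, y, validation_split):
    """Hold out the last `validation_split` fraction of the samples, without shuffling."""
    split_at = int(len(X) * (1.0 - validation_split))
    return (X[:split_at], y[:split_at]), (X[split_at:], y[split_at:])

# Toy data: 10 events with one feature each
X_toy = np.arange(10).reshape(10, 1)
y_toy = np.arange(10) % 2

(train_X, train_y), (val_X, val_y) = split_last_fraction(X_toy, y_toy, 0.2)
print(len(train_X), len(val_X))
print(val_X.ravel())
```

This is one reason to shuffle the training dataframe before training, as the notebook does: otherwise the held-out tail may not be representative of the full sample.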

In [None]:
# train and save model
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=0.00001)
my_callbacks = [keras.callbacks.EarlyStopping(patience=10), reduce_lr]
# my_callbacks is already a list, so pass it directly
training_history = model.fit(X_train, y_train, epochs=100, validation_split=0.2, callbacks=my_callbacks, verbose=2)
# Note: recent Keras versions require a ".keras" or ".h5" extension on the file name
model.save("cs_exercise_summer_school_2023")

In [None]:
# plot training history
pd.DataFrame(training_history.history).plot(figsize=(20, 15))
plt.grid(True)
plt.ylim(bottom=0, top=1.2)
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.title("Training History", fontsize=16)
plt.xlabel('Epoch', fontsize=16)
fname = "cs_exercise_summer_school_2023_training_history.png"
plt.savefig(fname, dpi=None, facecolor='w', edgecolor='w', orientation='portrait', transparent=False, bbox_inches=None, pad_inches=0.1)
plt.show()

In [None]:
# save training history
df_training_hist = pd.DataFrame(training_history.history)

filename_csv = "continuum_suppressions_summer_school_2023_model_training_history.csv"
df_training_hist.to_csv(filename_csv)

### Saving the competition CSV submission file

Here we run the model on test.csv and save only the `Id` and `B_isContinuumEvent` columns, as that is what the Kaggle competition page requires.

In [None]:
# Prepare the test data for evaluation
test = df_test[features]
test = test.to_numpy()

X_test = test

In [None]:
# Run the model on the test data to get its predictions
y_predict_test = model.predict(X_test)

In [None]:
# Append the model predictions to the dataframe with the test data
# (model.predict returns an array of shape (N, 1), so flatten it to 1D)
df_test['B_isContinuumEvent'] = y_predict_test.ravel()

In [None]:
display(df_test[['Id', 'B_isContinuumEvent']])

In [None]:
vars_for_submission = ['Id', 'B_isContinuumEvent']

In [None]:
# Select only the columns you need for the competition
df_for_submission = df_test[vars_for_submission]

In [None]:
display(df_for_submission)

In [None]:
# Save to CSV for competition submission
competition_file_submission_csv = 'example_submission.csv'
df_for_submission.to_csv(competition_file_submission_csv, index=False)