#!/usr/bin/env python
# coding: utf-8
#
#
#
# # Exercises: Day 2
#
#
#
# This notebook contains exercises for Day 2. If you are taking this workshop for UCLA credit, you will need to submit this notebook and the one for tomorrow as your homework. Please submit BOTH the `.ipynb` file and the HTML (`.html`) file converted from this notebook (`File > Export Notebook As... > HTML`).
#
# **Requirement: Complete Exercise 2.**
# ## Exercise 1: Wisconsin Breast Cancer dataset
#
# Repeat what we did in class for the Wisconsin Breast Cancer dataset from Day 1, working through steps 0)–3) below. A sketch of one possible approach follows step 3).
# In[1]:
from sklearn.datasets import load_breast_cancer
bcancer = load_breast_cancer()
# In[2]:
bcancer.data.shape
# In[3]:
bcancer.target
# ### 0) Split out the test data first.
# ### 1) Fit a decision tree using this data.
# ### 2) Consider a set of candidate values for `max_depth`, and choose the hyperparameter `max_depth` using 4-fold cross-validation.
# ### 3) Check the final performance with the test set.
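# A minimal sketch of one way to work through steps 0)–3), using the `bcancer` object loaded above. The 80/20 split, `random_state=0`, and the candidate values for `max_depth` are illustrative assumptions, not prescribed by the exercise.
# In[ ]:
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# 0) Hold out a test set (assumed 80/20 split)
X_train, X_test, y_train, y_test = train_test_split(
    bcancer.data, bcancer.target, test_size=0.2, random_state=0)

# 1) Fit a decision tree on the training data
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)
print("Training accuracy of the unconstrained tree:", tree.score(X_train, y_train))

# 2) Choose max_depth by 4-fold cross-validation over a candidate set (assumed grid)
param_grid = {'max_depth': [1, 2, 3, 4, 5, 6, 7, 8]}
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=4)
grid.fit(X_train, y_train)
print("Best max_depth:", grid.best_params_)
print("Best CV accuracy:", grid.best_score_)

# 3) Check the final performance on the held-out test set
print("Test accuracy:", grid.score(X_test, y_test))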
# ## Exercise 2: Banana dataset
# The banana dataset is included in this directory as `banana_dataset.csv`. The first column contains the labels, and the remaining two columns are the features. This is a comma-separated file, so you will need to pass the keyword argument `delimiter=','` to `np.loadtxt()`. One possible approach is sketched in the cells below.
# ### 0) Read in the data, and split out the test data.
# In[4]:
import numpy as np
bdataset = np.loadtxt("banana_dataset.csv", delimiter=',')
print("Shape of the bdataset: ", bdataset.shape )
# In[5]:
# The first column holds the labels; the remaining two columns are the features
bfeat = bdataset[:, 1:]
blabl = bdataset[:, 0]
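# The cell above reads in the data; the rest of step 0), holding out a test set, is sketched below. The 80/20 split, `random_state=0`, and the variable names `bfeat_train`, `bfeat_test`, `blabl_train`, `blabl_test` are illustrative assumptions.
# In[ ]:
from sklearn.model_selection import train_test_split

# Hold out a test set (assumed 80/20 split)
bfeat_train, bfeat_test, blabl_train, blabl_test = train_test_split(
    bfeat, blabl, test_size=0.2, random_state=0)
print("Training set:", bfeat_train.shape, "Test set:", bfeat_test.shape)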
# ### 1) Fit a support vector classifier with this data.
# ### 2) Consider a set of candidate values for $\gamma$ and $C$, and choose the hyperparameters $\gamma$ and $C$ using 4-fold cross-validation.
# ### 3) Check the final performance with the test set.
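# A minimal sketch of one way to work through steps 1)–3), assuming the train/test split sketched above and an RBF-kernel `SVC`. The candidate grids for $\gamma$ and $C$ are illustrative assumptions, not prescribed by the exercise.
# In[ ]:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# 1) Fit a support vector classifier with default hyperparameters as a baseline
svc = SVC(kernel='rbf')
svc.fit(bfeat_train, blabl_train)
print("Training accuracy of the baseline SVC:", svc.score(bfeat_train, blabl_train))

# 2) Choose gamma and C by 4-fold cross-validation over a candidate grid (assumed values)
param_grid = {'gamma': [0.01, 0.1, 1, 10], 'C': [0.1, 1, 10, 100]}
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=4)
grid.fit(bfeat_train, blabl_train)
print("Best parameters:", grid.best_params_)
print("Best CV accuracy:", grid.best_score_)

# 3) Check the final performance on the held-out test set
print("Test accuracy:", grid.score(bfeat_test, blabl_test))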
# In[ ]: