Support Vector Machine (SVM)

The objective of this part of the experiment is to implement a Support Vector Machine (SVM) classifier on a real-world dataset to study linear class separability.

Part 1: Linear SVM on Wine Recognition Dataset

The Wine Recognition dataset is used, where selected chemical attributes are employed to classify three types of wines: Barolo, Grignolino, and Barbera. This part focuses on understanding linear decision boundaries, margin maximization, and the effect of feature selection on classification accuracy.

Step 1: Import numpy and pandas for numerical computation and data handling, matplotlib and seaborn for data visualization, and sklearn for pipelines, model training, and evaluation.
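The imports above could be sketched as follows; the specific sklearn submodules are inferred from the later steps of this experiment:

```python
# Numerical computation and data handling
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Dataset, preprocessing, model, and evaluation utilities
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)
```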

Step 2: Dataset Loading and Description:

  • Load the Wine Recognition Dataset using load_wine() from sklearn.datasets
  • The dataset consists of:
    • 178 instances
    • 13 numerical chemical attributes
    • 3 target classes representing wine cultivars:
      • Class 0: Barolo
      • Class 1: Grignolino
      • Class 2: Barbera
  • Select only two features, Flavanoids and Color Intensity; restricting the model to two features enables direct 2D visualization and gives a clear interpretation of class separability
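Loading the dataset and selecting the two features could look like this (the sklearn column names `flavanoids` and `color_intensity` correspond to the attributes named above):

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df["target"] = wine.target

# Keep only the two attributes used in this experiment
features = ["flavanoids", "color_intensity"]
X = df[features].values
y = df["target"].values

print(X.shape)  # (178, 2): 178 instances, 2 selected features
print(sorted(set(y)))  # [0, 1, 2]: the three wine cultivars
```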

Step 3: Exploratory Data Analysis (EDA):

  • Plot scatter plots of Flavanoids vs Color Intensity with class-wise colour coding.
  • Plot histograms for each selected feature across different classes.
  • Analyse overlap between classes to assess linear separability.
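A minimal EDA sketch for the scatter plot and per-class histograms; the non-interactive `Agg` backend is assumed so the script runs headless, and figures are saved to a file rather than shown:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine

wine = load_wine()
names = list(wine.feature_names)
flav = wine.data[:, names.index("flavanoids")]
color = wine.data[:, names.index("color_intensity")]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot with class-wise colour coding
for cls, label in enumerate(["Barolo", "Grignolino", "Barbera"]):
    mask = wine.target == cls
    ax1.scatter(flav[mask], color[mask], label=label, alpha=0.7)
ax1.set_xlabel("Flavanoids")
ax1.set_ylabel("Color Intensity")
ax1.legend()

# Histograms of one feature across classes to inspect overlap
for cls, label in enumerate(["Barolo", "Grignolino", "Barbera"]):
    ax2.hist(flav[wine.target == cls], bins=15, alpha=0.5, label=label)
ax2.set_xlabel("Flavanoids")
ax2.legend()

fig.savefig("wine_eda.png")
```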

Step 4: Data Preprocessing:

  • Define feature matrix X using the selected attributes and target vector y using class labels.
  • Split the dataset into training and testing sets using an 80:20 ratio.
  • Apply standardization using StandardScaler() to normalize feature distributions.
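The preprocessing steps above could be sketched as below; the stratified split and `random_state=42` are assumptions for reproducibility, not requirements from the procedure:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

wine = load_wine()
names = list(wine.feature_names)
idx = [names.index("flavanoids"), names.index("color_intensity")]
X, y = wine.data[:, idx], wine.target

# 80:20 split; stratify keeps class proportions similar in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Fit the scaler on training data only to avoid information leakage
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
```

Fitting the scaler on the training split alone matters: statistics computed from the test set would leak information into the model.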

Step 5: Model Training:

  • Train a Support Vector Machine classifier with a linear kernel on the training data.
  • The model attempts to find an optimal hyperplane that maximizes the margin between classes.
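A training sketch under the same assumed split; `C=1.0` (sklearn's default regularization strength) is an assumption:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

wine = load_wine()
names = list(wine.feature_names)
idx = [names.index("flavanoids"), names.index("color_intensity")]
X, y = wine.data[:, idx], wine.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train_s = StandardScaler().fit_transform(X_train)

# Linear kernel: the decision boundary is a straight hyperplane
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train_s, y_train)

# Points lying on or inside the margin become support vectors
print("support vectors per class:", clf.n_support_)
```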

Step 6: Model Evaluation (Linear SVM):

  • Evaluate the trained model on the test dataset using Accuracy, Precision, Recall, and F1-score, and visualize the results with a confusion matrix.
  • Analyse misclassifications and class-wise performance.
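The evaluation step could be sketched as follows (same assumed split and scaling as before; `classification_report` covers precision, recall, and F1 per class):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

wine = load_wine()
names = list(wine.feature_names)
idx = [names.index("flavanoids"), names.index("color_intensity")]
X, y = wine.data[:, idx], wine.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="linear", C=1.0).fit(scaler.transform(X_train), y_train)

y_pred = clf.predict(scaler.transform(X_test))
acc = accuracy_score(y_test, y_pred)
print("accuracy:", round(acc, 3))
print(classification_report(y_test, y_pred,
                            target_names=["Barolo", "Grignolino", "Barbera"]))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Off-diagonal entries of `cm` show which cultivars are confused with each other, which is the basis for the misclassification analysis.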

Step 7: Decision Boundary Visualization:

  • Plot the linear decision boundary along with support vectors.
  • Observe how a straight hyperplane separates the wine classes in feature space.
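One way to draw the decision regions is to classify a dense grid of points; this sketch reuses the assumed split and marks the support vectors with open circles:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

wine = load_wine()
names = list(wine.feature_names)
idx = [names.index("flavanoids"), names.index("color_intensity")]
X, y = wine.data[:, idx], wine.target
X_train, _, y_train, _ = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train_s = StandardScaler().fit_transform(X_train)
clf = SVC(kernel="linear", C=1.0).fit(X_train_s, y_train)

# Classify every point on a grid to paint the decision regions
xx, yy = np.meshgrid(
    np.linspace(X_train_s[:, 0].min() - 1, X_train_s[:, 0].max() + 1, 300),
    np.linspace(X_train_s[:, 1].min() - 1, X_train_s[:, 1].max() + 1, 300))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X_train_s[:, 0], X_train_s[:, 1], c=y_train, edgecolor="k")
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=120, facecolors="none", edgecolors="r", label="support vectors")
plt.xlabel("Flavanoids (scaled)")
plt.ylabel("Color Intensity (scaled)")
plt.legend()
plt.savefig("linear_boundary.png")
```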

Part 2: Non-Linear SVM on Two Moons Dataset

The Two Moons dataset is used to demonstrate the limitations of linear classifiers and the necessity of kernel-based transformations. By applying an RBF kernel, this part highlights how kernel selection enables flexible decision boundaries and improves classification performance on non-linearly structured data.

Step 1: Import numpy, matplotlib, and relevant modules from sklearn.

Step 2: Generate the Two Moons dataset using make_moons().
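A generation sketch; the sample size, noise level, and random seed below are assumptions, not values fixed by the procedure:

```python
from sklearn.datasets import make_moons

# Two interleaving half-circles; noise controls how much they overlap
X, y = make_moons(n_samples=300, noise=0.2, random_state=42)

print(X.shape)  # (300, 2): 2D coordinates
print(sorted(set(y)))  # [0, 1]: binary class labels
```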

Step 3: Exploratory Data Analysis (EDA):

  • Plot a scatter plot of the dataset to visualize the non-linear class distribution.
  • Observe the curved structure that motivates the use of kernel methods.

Step 4: Data Preprocessing:

  • Define X as the 2D coordinates of the points and y as the binary class labels.
  • Split the dataset into training and testing sets using an 80:20 ratio.
  • Apply StandardScaler() to improve kernel performance and convergence.

Step 5: Train an SVM classifier using the Radial Basis Function (RBF) kernel.

Step 6: Evaluate the model using Accuracy, Precision, Recall, and F1-score, and visualize the results with a confusion matrix.
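Steps 4 through 6 could be sketched as one pipeline; the `make_moons` parameters, stratified split, and `C=1.0` with `gamma="scale"` (sklearn's defaults) are assumptions:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = make_moons(n_samples=300, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# The pipeline standardizes features, then fits the RBF-kernel SVM
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("accuracy:", round(acc, 3))
print(confusion_matrix(y_test, y_pred))
```

Wrapping the scaler and classifier in a pipeline guarantees the scaler is fitted only on training data, even under cross-validation.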

Step 7: Decision Boundary Visualization:

  • Plot the non-linear decision boundary produced by the RBF SVM.
  • Observe the flexible, curved boundary adapting to the moon-shaped clusters.
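The non-linear boundary can be visualized with the same grid-classification approach as in Part 1 (same assumed `make_moons` parameters; here the model is fitted on the full dataset purely for plotting):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=42)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)

# Predict over a dense grid; the RBF kernel yields curved regions
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 300),
    np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, 300))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.xlabel("x1")
plt.ylabel("x2")
plt.savefig("rbf_boundary.png")
```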