Transfer Learning with Deep CNNs

Theory

Introduction to Transfer Learning

In most traditional supervised machine learning approaches, it is assumed that the training data and test data are represented using the same feature space and are drawn from the same probability distribution. Under this assumption, models can perform well only when sufficient training data is available. However, in many real-world scenarios, collecting high-quality data is expensive and time-consuming, which becomes a major bottleneck in practical applications.

Semi-supervised learning attempts to reduce labelling requirements by using unlabelled data alongside a small amount of labelled training data, but it still assumes that training and test data belong to the same domain and distribution. Active learning reduces manual labelling effort by selecting the most informative samples for annotation, but it is limited by the available resources and may still not provide enough data to train an accurate classifier.


Domain and Task Definitions (Formal Notation)

Domain (D)

A domain consists of:

  • A feature space X
  • A marginal distribution P(x)

So, a domain is:

D = {X, P(x)}

Task (T)

A task consists of:

  • A label space Y
  • A predictive function f, which maps the feature space X to the label space Y, i.e.,

f: X → Y

where X represents the feature space and Y represents the label space.

So, a task is defined as:

T = {Y, f(·)}


Transfer Learning Definition

Given:

  • Source domain D_S and task T_S
  • Target domain D_T and task T_T

Transfer learning aims to improve the target predictive function f_T(·) in D_T using the knowledge from D_S and T_S, where:

D_S ≠ D_T or T_S ≠ T_T

This means:

  • The domains may have different feature spaces or distributions
  • The tasks may have different label spaces or predictive distributions

When both domains and tasks are the same, the learning becomes a traditional machine learning problem.


Homogeneous vs Heterogeneous Transfer Learning

Homogeneous Transfer Learning: In homogeneous transfer learning, the source domain and target domain share the same, or a highly similar, feature representation. The input data from both domains lies in an overlapping feature space, meaning that the source and target datasets are represented in a comparable manner. This type of transfer learning is widely applied in practical scenarios where the domains involve similar kinds of data, such as transferring knowledge from one image dataset to another (image → image).

Heterogeneous Transfer Learning: Heterogeneous transfer learning is a more complex form of transfer learning in which the source and target domains use different feature representations. Because the datasets are described in different feature spaces, direct transfer is difficult, and feature mapping or transformation techniques are needed to build a relationship between the source and target domains. Due to this mismatch in feature spaces, heterogeneous transfer learning is considered more challenging than homogeneous transfer learning and requires additional processing steps to align or bridge the domains effectively.
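The feature-mapping idea can be illustrated with a toy linear alignment. This is only a hypothetical sketch (the data, dimensions, and least-squares choice are all assumptions, not a method from the text): given paired samples from a 5-D source feature space and a 3-D target feature space, a projection learned by least squares bridges the two representations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Source and target domains use different feature spaces (5-D vs 3-D).
Xs = rng.normal(size=(100, 5))      # source-domain features
W_true = rng.normal(size=(5, 3))    # unknown relationship (for the toy data)
Xt = Xs @ W_true                    # paired target-domain representations

# Learn a linear mapping source -> target by least squares.
W, *_ = np.linalg.lstsq(Xs, Xt, rcond=None)

# The mapped source features now live in the target feature space.
aligned = Xs @ W
print(np.allclose(aligned, Xt, atol=1e-8))  # prints True on this toy data
```

In practice the correspondence between domains is noisy and often nonlinear, so real heterogeneous transfer methods use more elaborate mappings; the point here is only that some explicit transformation must connect the two feature spaces before knowledge can flow between them.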


Deep CNNs

Convolutional neural networks, or CNNs, are a specialized kind of neural network for processing data that has a known, grid-like topology. Examples include time-series data, which can be thought of as a 1D grid taking samples at regular time intervals, and image data, which can be thought of as a 2D grid of pixels. Convolutional networks have been tremendously successful in practical applications. The name "convolutional neural network" indicates that the network employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation. Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.
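The convolution operation described above can be sketched in a few lines of NumPy. This is an illustrative toy (deep learning libraries implement it far more efficiently, and, strictly speaking, CNN layers compute cross-correlation, i.e. convolution without flipping the kernel):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the 'convolution' used in CNN layers."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Dot product of the kernel with the image patch under it.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A simple vertical-edge detector applied to a step image.
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
kernel = np.array([[1.0, -1.0]])
print(conv2d(image, kernel))  # every row is [0., -1., 0.]: the edge is detected
```

Sliding one small kernel over the whole grid is what gives convolutional layers their parameter sharing and translation equivariance, in contrast to the dense matrix multiplication of fully connected layers.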

Deep Convolutional Neural Networks (CNNs), also called ConvNets, have become one of the most important architectures for large-scale image recognition. Their success in tasks such as image classification and object recognition was made possible due to the availability of large public image repositories like ImageNet, along with high-performance computing resources such as GPUs and distributed computing systems. A major milestone in the progress of deep CNNs was the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), which provided a benchmark platform for evaluating and comparing image classification systems. Early approaches were based on shallow feature representations, but later deep CNN-based models achieved significantly better performance and became dominant in visual recognition tasks.

In deep learning, transfer learning is typically implemented by using a pretrained CNN backbone (trained on large datasets like ImageNet). The pretrained model learns rich hierarchical representations:

  • Early layers learn general patterns (edges, textures)
  • Middle layers learn shapes and structures
  • Deep layers learn high-level category-specific features

For a small dataset like Oxford Flowers, transfer learning improves performance by:

  • Reducing training time
  • Preventing overfitting
  • Reusing learned feature extraction ability

Hence, fine-tuning pretrained models such as VGG19 or MobileNetV2 is an effective strategy for building accurate classification models with limited training samples.
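The strategy above can be sketched in Keras (assuming TensorFlow is installed; the layer sizes and hyperparameters are illustrative choices, not prescribed by the text). The backbone is frozen for feature extraction and a small classification head is trained on top; `weights=None` is used here so the sketch runs without a download, whereas `weights="imagenet"` would load the pretrained ImageNet weights discussed above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_transfer_model(num_classes, input_shape=(224, 224, 3), weights=None):
    # Pretrained backbone; pass weights="imagenet" to reuse ImageNet features.
    base = tf.keras.applications.MobileNetV2(
        include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = False  # freeze the backbone: use it as a feature extractor

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.2),  # regularization against overfitting on small data
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_transfer_model(num_classes=5)
```

For fine-tuning proper, one would later set `base.trainable = True` (or unfreeze only the top few layers) and recompile with a much smaller learning rate, so the pretrained features are adjusted gently rather than destroyed.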


Merits of Transfer Learning with Deep CNNs

  • Reduced Training Time: Transfer learning reduces the training time because the model does not need to be trained from scratch.
  • Works Well on Small Datasets: It provides good accuracy even when the target dataset is small by using pretrained knowledge.
  • Better Feature Extraction: Pretrained CNNs already learn powerful features like edges, textures, and shapes, improving performance.
  • Reduced Overfitting: Since the model starts with pretrained weights, the chances of overfitting are reduced.
  • Faster Convergence: The model converges faster because weights are already optimized on large datasets like ImageNet.

Demerits of Transfer Learning with Deep CNNs

  • Domain Dependency: If the source and target datasets are very different, transfer learning may not give good results.
  • Complexity in Layer Freezing: Choosing which layers to freeze or unfreeze needs careful tuning and experimentation.
  • Risk of Overfitting during Fine-tuning: Fine-tuning on a very small dataset can cause overfitting if not controlled properly.
  • High Computation for Large Models: Large models like VGG19 require high memory and more computation, making training slow.
  • Hardware Requirement: Deep CNN training may still require GPU support for faster training and performance.