CNNs Demystified: From Image Recognition to ECG Analysis

Published on April 18, 2025 | 12 min read

As a Senior Chief AI Engineer at Samsung R&D, I’ve spent the last few years working at the intersection of machine learning and human health. From improving smartphone vision systems to decoding heartbeats, I’ve seen Convolutional Neural Networks (CNNs) evolve into powerful tools beyond just image recognition.

In this blog, I’ll unravel the “why” behind CNNs, show you how they preserve meaningful structure in data, and share how we’re adapting them to save lives—one heartbeat at a time.

The CNN Advantage: Preserving Spatial Relationships

Imagine trying to understand a sentence after randomly shuffling its words. That is roughly what a traditional fully connected network does to an image: flattening a 256×256 MRI scan into a 1D vector throws away the spatial relationships that matter most.

# Problem: flattening a 256×256 MRI scan
flattened = image.reshape(1, 256*256*3)  # 196,608 features!
# Loses all spatial context between pixels
Figure: Flattening destroys local relationships
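To make the cost of flattening concrete, here is a small framework-free sketch in plain Python. It shows where two vertically adjacent pixels land in a flattened vector, and compares the weight count of a dense layer on the flattened scan with a 3×3 convolution. The helper name `flat_index` and the layer width of 64 are illustrative assumptions, not taken from a real pipeline.

```python
# Row-major flattening of one 256-pixel-wide channel (illustrative values).
WIDTH = 256

def flat_index(row: int, col: int, width: int = WIDTH) -> int:
    """Position of pixel (row, col) after row-major flattening of one channel."""
    return row * width + col

# Two pixels that touch vertically in the image...
above = flat_index(10, 20)
below = flat_index(11, 20)
# ...end up a full image row apart in the flattened vector.
gap = below - above

# Weight count: dense layer on the flattened input vs. a 3x3 convolution.
n_inputs = 256 * 256 * 3                     # 196,608 features
units = 64                                   # hypothetical layer width / filter count
dense_params = n_inputs * units + units      # weights + biases
conv_params = 3 * 3 * 3 * units + units      # 3x3 kernel, 3 input channels, 64 filters

print(gap, dense_params, conv_params)        # 256 12582976 1792
```

The dense layer has to relearn "these two inputs are neighbours" from scratch for every position, while the convolution reuses the same small set of weights everywhere.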

That’s where CNNs shine:

Core CNN Operations (2D Example)

Let's break down the key operations that make CNNs so powerful for spatial data:

1. Convolution: The Pattern Detector

# Typical 2D convolution (Keras API)
from tensorflow.keras.layers import Conv2D
conv_layer = Conv2D(filters=32, kernel_size=(3,3), strides=1, padding='same')
Figure: Three-layer CNN architecture showing feature map evolution

  • What happens: A 3×3 kernel slides across the image, computing dot products at each position
  • Why it matters: Learns local features like edges, textures, or medical anomalies
  • Key parameters:
    • Filters: Number of feature detectors (32 here)
    • Kernel size: Receptive field (3×3 is standard)
    • Stride: How many pixels to shift (1=high resolution)
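The sliding dot product described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how a framework implements it: it uses stride 1 and no padding (so the output shrinks, unlike `padding='same'`), and the vertical-edge kernel is hand-written here, whereas a trained CNN would learn its weights. As in most deep learning libraries, the kernel is not flipped, so this is technically cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the image window under it.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Toy 5x5 image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-written vertical-edge detector (illustrative weights).
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)  # [[0, 3, 3], [0, 3, 3], [0, 3, 3]]
```

The response is zero over the flat dark region and strong wherever the window straddles the edge, which is exactly the "local feature" behaviour the bullets describe.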

2. Pooling: The Information Compressor

# Max pooling reduces spatial dimensions
from tensorflow.keras.layers import MaxPool2D
pool_layer = MaxPool2D(pool_size=(2,2), strides=2)
Figure: Max pooling operation preserving strongest activations

  • What happens: Downsamples feature maps by taking maximum values in 2×2 windows
  • Why it matters:
    • Makes network invariant to small translations
    • Reduces computational load
    • Expands receptive field without adding parameters
  • Medical imaging tip: Sometimes use average pooling for smoother feature maps
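Here is a minimal plain-Python sketch of 2×2 pooling with stride 2, showing both the max variant and the average variant mentioned in the tip. The function name `pool2d` and the toy feature map are assumptions made for illustration.

```python
def pool2d(feature_map, size=2, stride=2, reduce=max):
    """Downsample a 2D feature map by applying `reduce` to each size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(reduce(window))
        pooled.append(row)
    return pooled

def mean(values):
    """Arithmetic mean, used here for average pooling."""
    return sum(values) / len(values)

# Toy 4x4 feature map (illustrative values).
fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 8]]

max_pooled = pool2d(fmap)               # keeps the strongest activation per window
avg_pooled = pool2d(fmap, reduce=mean)  # smoother summary of each window

print(max_pooled)  # [[4, 2], [2, 8]]
print(avg_pooled)  # [[2.5, 1.0], [1.25, 6.5]]
```

Both variants turn the 4×4 map into a 2×2 map with no learnable parameters; they differ only in how each window is summarized.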

3. Activation: The Non-Linear Transformer

# ReLU activation (standard for CNNs)
from tensorflow.keras.layers import ReLU, LeakyReLU
activation = ReLU()  # or LeakyReLU(alpha=0.1) for medical data
  • Mathematically: max(0, x) - zeros out negative activations
  • Why ReLU?:
    • Mitigates vanishing gradients
    • Computationally efficient
    • Encourages sparse activations (helpful for localized medical features)
  • Medical variant: LeakyReLU often works better for ECG/EEG data
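The difference between the two activations is easy to see on a handful of numbers. The sketch below is plain Python rather than a framework call, and the slope of 0.1 simply mirrors the `alpha=0.1` used in the snippet above.

```python
def relu(x: float) -> float:
    """max(0, x): negative inputs are zeroed out."""
    return max(0.0, x)

def leaky_relu(x: float, alpha: float = 0.1) -> float:
    """Like ReLU, but negative inputs keep a small slope `alpha` instead of vanishing."""
    return x if x > 0 else alpha * x

activations = [-2.0, -0.5, 0.0, 1.5]

relu_out = [relu(x) for x in activations]
leaky_out = [leaky_relu(x) for x in activations]

print(relu_out)   # negatives become 0.0, positives pass through
print(leaky_out)  # negatives are scaled by 0.1, positives pass through
```

With LeakyReLU, a negative deflection in the input still contributes a small signal and a nonzero gradient, which is one plausible reason it can help on signals with meaningful negative excursions.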

Putting It All Together

A typical CNN block for medical imaging:

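from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, MaxPool2D, Dropout

# Keras' Sequential expects a list of layers
Sequential([
    Conv2D(64, (3,3), padding='same'),  # Feature extraction
    BatchNormalization(),               # Stabilizes training
    ReLU(),
    MaxPool2D((2,2)),                   # Dimensionality reduction
    Dropout(0.2),                       # Regularization
])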
💡 Pro Tip: For medical images, we often use smaller strides (1 instead of 2) in early layers to preserve fine diagnostic details that might indicate early-stage pathology.
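To see what the stride choice does to resolution, it helps to track the spatial size through the block by hand. The sketch below uses the standard output-size rules for 'same' and 'valid' padding; the 256-pixel input side is an assumption borrowed from the MRI example earlier, and `conv_output_size` is a hypothetical helper name.

```python
import math

def conv_output_size(n: int, kernel: int, stride: int = 1, padding: str = "same") -> int:
    """Spatial output size along one axis for a convolution or pooling layer."""
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return (n - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")

size = 256  # assumed input side length

# Block as written: 3x3 conv, stride 1, 'same' padding, then 2x2 max pooling.
after_conv = conv_output_size(size, kernel=3, stride=1, padding="same")
after_pool = conv_output_size(after_conv, kernel=2, stride=2, padding="valid")

# Same block with a stride-2 convolution in the first layer.
after_strided_conv = conv_output_size(size, kernel=3, stride=2, padding="same")
after_strided_pool = conv_output_size(after_strided_conv, kernel=2, stride=2, padding="valid")

print(after_conv, after_pool)                   # 256 128
print(after_strided_conv, after_strided_pool)   # 128 64
```

With stride 1 the convolution sees every pixel position before pooling discards anything; with stride 2 half the positions are skipped before the first feature map is even formed.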

From Pixels to Pulses: CNNs for 1D Signals

What if I told you the same magic behind facial recognition also helps decode heart rhythms? While CNNs were born in the image world, their DNA works just as well for 1D signals like ECGs.

Think about it: An ECG is like a one-dimensional image. Local waveform patterns like P-waves and QRS complexes hold crucial information—exactly the kind of features CNNs are good at extracting.

1D Convolution for ECG Analysis

Figure: CNN kernels sliding over ECG signals to detect QRS complexes, just like detecting edges in images

# 1D vs 2D convolution comparison
# 2D (images)            # 1D (ECG)
Conv2D(32, (3,3))        Conv1D(32, 3)
Input: (256,256,3)       Input: (1000,1)  # 1000 samples
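The 1D case is the same sliding dot product with one axis removed. Here is a minimal plain-Python sketch on a toy signal with a single sharp spike standing in for a QRS complex. The signal values and the hand-written peak kernel are assumptions for illustration; a trained network would learn its own kernels.

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` along `signal` (stride 1, no padding)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# Toy signal: flat baseline with one sharp spike (illustrative, not real ECG).
signal = [0, 0, 1, 5, 1, 0, 0]

# Hand-written kernel that responds to a sample higher than both neighbours.
peak_kernel = [-1, 2, -1]

response = conv1d_valid(signal, peak_kernel)

# Map the strongest response back to the sample at the centre of its window.
peak_position = response.index(max(response)) + len(peak_kernel) // 2

print(response)       # [-1, -3, 8, -3, -1]
print(peak_position)  # 3
```

The response peaks exactly where the spike sits, in the same way the 2D edge kernel lights up along an edge.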

Case Study: Cleaning Noisy ECGs with CNNs + Attention

In one of my recent research projects, we developed a hybrid deep learning model called the Attention-Based Convolutional Denoising Autoencoder (ACDAE). Our mission: Clean up highly noisy two-lead ECG signals and accurately detect arrhythmias, even under extreme real-world noise conditions.

The ACDAE architecture blends the strengths of convolutional autoencoders and attention mechanisms:

1. Noise Robustness

Figure: ECG denoising

The denoising autoencoder filters out movement artifacts and high-frequency noise, ensuring that clinical features (like P and T waves) remain intact for accurate diagnosis.
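A denoising autoencoder is trained on pairs of noisy input and clean target, so a common preparation step is mixing noise into a clean signal at a chosen SNR. The sketch below shows one generic way to do that in plain Python. It is not the ACDAE data pipeline: the sine wave standing in for an ECG, the Gaussian noise, the 360 Hz sampling rate, and all function names are assumptions made for illustration.

```python
import math
import random

def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)

def snr_db(clean, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(power(clean) / power(noise))

def add_noise_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean + noise has the requested SNR, then mix."""
    scale = math.sqrt(power(clean) / (power(noise) * 10 ** (target_snr_db / 10)))
    scaled_noise = [scale * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise

random.seed(0)
fs = 360  # assumed sampling rate in Hz
clean = [math.sin(2 * math.pi * 5 * t / fs) for t in range(fs)]   # stand-in signal
raw_noise = [random.gauss(0.0, 1.0) for _ in range(fs)]           # stand-in noise

# -20 dB means the noise carries 100x the power of the signal.
noisy, scaled_noise = add_noise_at_snr(clean, raw_noise, target_snr_db=-20)
achieved = snr_db(clean, scaled_noise)
print(round(achieved, 6))
```

The pair `(noisy, clean)` is what a denoising model would train on: the network receives `noisy` and is penalized for deviating from `clean`.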

2. Attention Mechanism

Figure: ECG attention

Attention gates learn to prioritize signal segments with clinical significance—such as an ST-segment elevation—enhancing interpretability and model trust.
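As a rough intuition for what a gate does, the sketch below multiplies each segment's feature by a weight between 0 and 1 derived from a relevance score. This is a generic sigmoid gate in plain Python, not the ACDAE attention module: in a real model the scores come from learned layers, whereas here both `features` and `scores` are hypothetical hand-picked values.

```python
import math

def sigmoid(x: float) -> float:
    """Squash a real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def attention_gate(features, scores):
    """Scale each feature by a gate weight computed from its relevance score."""
    weights = [sigmoid(s) for s in scores]
    gated = [w * f for w, f in zip(weights, features)]
    return gated, weights

# Hypothetical per-segment feature values along an ECG window.
features = [0.2, 0.1, 0.9, 0.8, 0.1]
# Hypothetical relevance scores: segments 2 and 3 are judged important.
scores = [-4.0, -4.0, 3.0, 3.0, -4.0]

gated, weights = attention_gate(features, scores)

for w, g in zip(weights, gated):
    print(f"weight={w:.3f}  gated={g:.3f}")
```

Because the weights are explicit numbers per segment, they can be plotted over the signal, which is where the interpretability benefit comes from.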

We evaluated ACDAE across four major ECG databases and under varying noise levels from -20 dB to +20 dB SNR. The model achieved impressive results:

Under the Hood: Read the full IEEE paper

Practical Tips: Images vs Signals

Medical signals behave differently from natural images. Here's a cheat sheet I always keep in mind:

Factor       | Natural Images      | Medical Signals (e.g. ECG)
Augmentation | Flip, rotate freely | Use time-warping, scaling only
Kernel Size  | Typically 3×3       | Depends on waveform duration (e.g., QRS ≈ 120 ms)
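The kernel-size row is really a unit conversion: a waveform duration in milliseconds becomes a kernel length in samples once the sampling rate is known. A minimal sketch follows, where `kernel_size_for` is a hypothetical helper, the three sampling rates are example values, and rounding up to an odd length is a common convention so that the kernel has a centre sample rather than a requirement.

```python
def kernel_size_for(duration_ms: float, fs_hz: float) -> int:
    """Kernel length in samples that spans `duration_ms` at sampling rate `fs_hz`."""
    samples = round(duration_ms * fs_hz / 1000)
    if samples % 2 == 0:
        samples += 1  # prefer an odd length so the kernel has a centre sample
    return max(samples, 1)

qrs_ms = 120  # QRS duration used in the table above

sizes = {fs: kernel_size_for(qrs_ms, fs) for fs in (250, 360, 500)}
print(sizes)  # {250: 31, 360: 43, 500: 61}
```

The same waveform needs a kernel twice as long at 500 Hz as at 250 Hz, which is why a kernel size that works on one ECG dataset does not automatically transfer to another.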

Takeaway: The CNN Mindset

Whether I’m analyzing X-rays or heartbeats, my approach remains the same:

  1. Look locally: Slide filters to extract small patterns
  2. Go deep: Stack layers to build abstraction
  3. Tune wisely: Adapt kernel shapes to the data’s domain
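"Go deep" has a measurable effect: every stacked layer widens the patch of input that a single output value can see. The sketch below applies the standard receptive-field recurrence to a list of `(kernel, stride)` pairs. The two layer stacks are illustrative assumptions, and `receptive_field` is a hypothetical helper name.

```python
def receptive_field(layers):
    """Receptive field (along one axis) of a stack of (kernel, stride) layers."""
    rf, jump = 1, 1  # jump = distance in input samples between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Three stacked 3-wide convolutions, stride 1.
three_convs = [(3, 1), (3, 1), (3, 1)]

# Convolutions interleaved with 2-wide, stride-2 pooling.
conv_pool_stack = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]

print(receptive_field(three_convs))      # 7
print(receptive_field(conv_pool_stack))  # 18
```

Adding two parameter-free pooling layers more than doubles the receptive field of the same three convolutions, which is how small kernels end up summarizing large structures.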

CNNs are no longer just for selfies or cat videos—they’re helping diagnose heart disease, detect tumors, and even enhance driver safety. And as we adapt them to new domains, their impact only grows.

Thanks for reading! Feel free to connect if you're working on AI in healthcare—I’d love to learn from your journey too.