Feed

Deep Learning

Track deep learning advances covering neural network architectures, training techniques, and framework updates. Our AI-summarized digest highlights PyTorch and TensorFlow developments and model research from developer communities.

Articles from the last 30 days

Show HN: I trained a 9M speech model to fix my Mandarin tones
01 · Saturday, January 31, 2026

The author developed a specialized deep learning-based Computer-Assisted Pronunciation Training (CAPT) system to improve their Mandarin pronunciation. Frustrated by the limitations of traditional pitch visualization and commercial APIs, the developer built a custom model using a Conformer encoder trained with CTC (Connectionist Temporal Classification) loss. They utilized approximately 300 hours of transcribed speech from datasets like AISHELL-1 and Primewords. By treating pinyin and tones as distinct tokens, the system avoids the auto-correction pitfalls of standard ASR models, providing frame-by-frame feedback. The final 9M-parameter model was quantized to 11MB, enabling it to run entirely on-device via onnxruntime-web without compromising accuracy. This project highlights the effectiveness of small, specialized models for language education.
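
The post does not publish code, but its central trick — emitting pinyin syllables and tone marks as separate CTC tokens, so the model must report the tone it heard rather than auto-correcting to the expected one — can be sketched in PyTorch. The vocabulary, shapes, and logits below are illustrative, not the project's:

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary: pinyin syllables plus explicit tone tokens.
# Keeping tones as separate symbols means CTC scores the heard tone directly.
vocab = ["<blank>", "ni", "hao", "ma", "tone1", "tone2", "tone3", "tone4", "tone5"]
ctc = nn.CTCLoss(blank=0)

T, N, C = 50, 1, len(vocab)  # frames, batch size, classes
logits = torch.randn(T, N, C, requires_grad=True)   # stand-in for Conformer output
log_probs = logits.log_softmax(dim=-1)

# Target "ni3 hao3": each syllable followed by its tone token.
target = torch.tensor([[vocab.index("ni"), vocab.index("tone3"),
                        vocab.index("hao"), vocab.index("tone3")]])
loss = ctc(log_probs, target,
           input_lengths=torch.tensor([T]),
           target_lengths=torch.tensor([4]))
loss.backward()   # gradients flow back to the (here random) encoder output
```

Frame-level tone feedback then falls out of the per-frame posteriors over the tone tokens, rather than from a separate pitch tracker.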

Sources: Hacker News (392 pts)

Rust implementation of Mistral's Voxtral Mini 4B Realtime runs in your browser
02 · Wednesday, February 4, 2026

This project presents a native Rust implementation of the Voxtral Mini 4B Realtime speech recognition model by Mistral, utilizing the Burn ML framework. A key achievement of this work is enabling high-performance streaming transcription directly within a browser tab via WASM and WebGPU. By leveraging a Q4 GGUF quantized version of the model, the memory footprint is reduced to approximately 2.5 GB, overcoming significant browser constraints such as the 4 GB address space and 2 GB allocation limits. The implementation includes custom WGSL shaders for fused dequantization and matrix multiplication. Technical improvements were made to the audio padding strategy to prevent transcription errors in quantized models, ensuring robust performance for real-time microphone input. The repository provides a full suite of tools including a CLI, local development server, and WASM bindings, demonstrating the potential for secure, client-side AI processing without server dependencies.
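
For context on the quantization side, here is a plain-Python sketch of Q4_0-style GGUF dequantization (assuming the common llama.cpp Q4_0 layout; the repo's exact quant variant may differ, and its WGSL shaders fuse this step into the matrix multiply rather than materializing the floats):

```python
import struct

def dequantize_q4_0(block: bytes) -> list[float]:
    """Dequantize one 18-byte Q4_0 block: f16 scale + 32 packed 4-bit weights.

    Per llama.cpp's Q4_0 layout, weight = scale * (nibble - 8); the low
    nibble of byte i holds element i, the high nibble holds element i + 16.
    """
    assert len(block) == 18
    (scale,) = struct.unpack("<e", block[:2])   # little-endian float16 scale
    out = [0.0] * 32
    for i, byte in enumerate(block[2:]):
        out[i] = scale * ((byte & 0x0F) - 8)        # low nibble
        out[i + 16] = scale * ((byte >> 4) - 8)     # high nibble
    return out

# A block with scale 2.0 whose first byte packs nibbles 0xF (low) and 0x9 (high).
blk = struct.pack("<e", 2.0) + bytes([0x9F]) + bytes([0x88] * 15)
weights = dequantize_q4_0(blk)
```

At 4.5 bits per weight (16 nibble-bytes plus a 2-byte scale per 32 weights), this is how a 4B-parameter model fits in roughly 2.5 GB.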

Sources: Hacker News (374 pts)

Visual Introduction to PyTorch
03 · Friday, February 13, 2026

This technical guide introduces PyTorch, a leading deep learning framework. It explains core concepts like Tensors, Autograd, and Gradient Descent while demonstrating how to build a complete machine learning pipeline. The tutorial includes data preprocessing, model architecture design, and training a neural network for tabular data regression.
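
The pipeline such a tutorial walks through reduces to a short loop; a minimal sketch on synthetic tabular data (not the guide's own dataset or architecture):

```python
import torch
import torch.nn as nn

# Toy tabular regression: learn y = 3*x1 - 2*x2 + 1 from synthetic samples.
torch.manual_seed(0)
X = torch.randn(256, 2)
y = X @ torch.tensor([3.0, -2.0]) + 1.0

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(500):
    opt.zero_grad()                 # clear gradients accumulated last step
    pred = model(X).squeeze(-1)     # forward pass through the network
    loss = loss_fn(pred, y)         # scalar training objective
    loss.backward()                 # autograd fills every parameter's .grad
    opt.step()                      # gradient-descent update
```

Tensors carry the data, autograd computes the gradients in `backward()`, and the optimizer applies the descent step — the three concepts the guide visualizes.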

Sources: Hacker News (356 pts)

Understanding Neural Network, Visually
04 · Tuesday, February 3, 2026

This interactive project provides an accessible introduction to the foundational principles of neural networks. By visualizing the process of handwriting recognition, the content explains how input data, such as pixel brightness, is transformed into numerical values for processing. It demystifies technical concepts including neurons, weights, and activation functions, illustrating how individual neurons identify simple patterns that coalesce into complex information across multiple layers. It shows how the mathematical operations at each stage determine the final output and prediction. While the focus is on the forward-pass mechanism, the project serves as a bridge for beginners to understand the structural logic of machine learning without getting lost in jargon. It emphasizes visual learning to explain how AI systems move from raw data to pattern recognition and decision-making.
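
The forward-pass mechanism the visualization teaches fits in a few lines of plain Python — a weighted sum of pixel brightnesses followed by a squashing activation. The pixel values and weights below are illustrative:

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(pixels, weights, bias):
    """One neuron: weighted sum of inputs plus bias, then an activation."""
    z = sum(w * x for w, x in zip(weights, pixels)) + bias
    return sigmoid(z)

# Brightness values in [0, 1] for a tiny 2x2 patch, flattened row by row.
pixels = [0.0, 0.9, 0.8, 0.1]
# This hypothetical neuron responds when the anti-diagonal pixels are bright.
weights = [-1.0, 2.0, 2.0, -1.0]
activation = neuron(pixels, weights, bias=-1.0)
```

Stacking layers of such neurons, each reading the activations of the previous layer, is the whole forward pass; the activation here fires strongly (about 0.91) because the input matches the pattern the weights encode.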

Sources: Hacker News (286 pts)

Audio is the one area small labs are winning
05 · Thursday, February 12, 2026

The article explores the rise of specialized startups like Gradium and Kyutai in the audio AI space. Despite limited funding compared to major labs, these small teams outperform giants through deep domain expertise, innovative full-duplex architectures for real-time conversation, and efficient neural codecs like Mimi, positioning audio as a critical future modality.

Sources: Hacker News (269 pts)

GLM-OCR: Accurate × Fast × Comprehensive
06 · Saturday, February 7, 2026

GLM-OCR is an open-source multimodal OCR model based on the GLM-V architecture. Featuring Multi-Token Prediction and 0.9B parameters, it leads benchmarks like OmniDocBench V1.5. It supports complex layouts, including tables and formulas, offering both cloud API and local deployment via vLLM, SGLang, or Ollama for efficient, high-performance document understanding.

Sources: Hacker News (263 pts)

Kanchipuram Saris and Thinking Machines
07 · Saturday, February 7, 2026

This analysis explores the existential crisis facing Kanchipuram silk weaving, where material degradation and underpaid labor threaten a millennium-old tradition. It proposes a digital transformation using Capsule Networks (CapsNets) for design integrity, precision fermentation for sustainable bio-dyes, and blockchain-based digital passports to restore consumer trust and ensure fair, immediate compensation for master artisans through smart contracts.

Sources: Hacker News (190 pts)

Chess engines do weird stuff
08 · Tuesday, February 17, 2026

The development of chess engines like lc0 reveals insights for LLMs, particularly regarding distillation from search versus reinforcement learning. Techniques like SPSA allow for weight optimization without gradients by evaluating win rates. Additionally, runtime adaptation and specialized transformer architectures demonstrate how search-based distillation and heuristic tuning significantly outperform traditional training objectives and model scaling.
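
SPSA's appeal is that it needs only two objective evaluations per step, however many parameters are being tuned. A toy sketch minimizing a deterministic quadratic (engine tuning instead maximizes a noisy measured win rate, and typically decays the step sizes):

```python
import random

def spsa_minimize(f, theta, steps=2000, a=0.1, c=0.1, seed=0):
    """Simultaneous Perturbation Stochastic Approximation.

    Per step: perturb ALL parameters at once by a random +-1 vector, evaluate
    f twice, and use the difference as a gradient estimate. Cost per step is
    independent of the number of parameters.
    """
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]   # Rademacher directions
        plus = f([t + c * d for t, d in zip(theta, delta)])
        minus = f([t - c * d for t, d in zip(theta, delta)])
        ghat = (plus - minus) / (2.0 * c)
        # Gradient estimate per coordinate is ghat / delta_i; since |delta_i| = 1,
        # dividing equals multiplying.
        theta = [t - a * ghat * d for t, d in zip(theta, delta)]
    return theta

# Toy objective: quadratic bowl with its minimum at (3, -2).
loss = lambda p: (p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2
theta = spsa_minimize(loss, [0.0, 0.0])
```

For engine tuning, `f` would be the win rate over a batch of self-play games at the perturbed parameter values — exactly the gradient-free setting the article describes.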

Sources: Hacker News (124 pts)

Learnings from 4 months of Image-Video VAE experiments
09 · Monday, February 23, 2026

Linum researchers detailed their 2024 journey building an Image-Video VAE for latent diffusion models. They addressed architectural stability, NaNs, and 'splotch' artifacts using techniques like Self-Modulating Convolutions. A key conclusion was that over-optimizing reconstruction quality can lead to overfitting on noise, potentially harming the downstream performance of generative models.
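
The reconstruction-versus-downstream tension comes down to how the VAE objective is weighted. A generic sketch of that objective (not Linum's code; their Self-Modulating Convolutions are an architectural change, separate from this loss):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    """Generic VAE objective: reconstruction term plus beta-weighted KL.

    Driving the reconstruction term ever lower (e.g. with a tiny beta) is
    where the 'overfitting on noise' risk lives: latents start encoding
    sensor noise that the downstream diffusion model must then reproduce.
    """
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

# Degenerate check: perfect reconstruction with standard-normal latents
# gives zero loss for both terms.
x = torch.randn(4, 3, 8, 8)
mu, logvar = torch.zeros(4, 16), torch.zeros(4, 16)
loss = vae_loss(x, x, mu, logvar)
```

The post's conclusion, in these terms: a lower `recon` on the training set is not automatically a better latent space for the generative model that consumes it.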

Sources: Hacker News (110 pts)