Feed

NLP

NLP news covering text processing, tokenization, transformer models, and language AI from developer communities.

Articles from the last 30 days

Voxtral Transcribe 2
01 · Wednesday, February 4, 2026

Mistral has announced Voxtral Transcribe 2, a family of speech-to-text models made up of Voxtral Mini Transcribe V2 and Voxtral Realtime. Mistral reports state-of-the-art accuracy across 13 languages, and the models add speaker diarization, word-level timestamps, and context biasing for technical terminology. Voxtral Realtime targets live use with sub-200 ms latency and ships as open weights under the Apache 2.0 license, which suits edge deployment and privacy-focused voice agents. Voxtral Mini Transcribe V2 is priced at $0.003 per minute, and Mistral claims it beats competitors such as Gemini and GPT-4o mini on both accuracy and speed. The release also adds a dedicated audio playground in Mistral Studio for instant testing.
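
Because the price is quoted per minute, budgeting is simple arithmetic. The sketch below assumes flat per-minute billing with no minimum charge, rounding, or volume discounts; those are assumptions, not Mistral's published billing rules, and `transcription_cost` is a hypothetical helper.

```python
from decimal import Decimal

# $0.003 per audio minute, the Voxtral Mini Transcribe V2 price quoted above.
# Assumption: flat per-minute billing with no rounding, minimums, or discounts.
PRICE_PER_MINUTE = Decimal("0.003")


def transcription_cost(hours: float) -> Decimal:
    """Estimated USD cost of transcribing `hours` of audio."""
    minutes = Decimal(str(hours)) * 60
    return minutes * PRICE_PER_MINUTE


print(transcription_cost(100))  # 100 h = 6,000 min -> 18.000 (USD)
```

Using `Decimal` rather than `float` keeps the cents exact when many small per-minute charges are summed.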

Sources: Hacker News (880 pts)

GLM-OCR: Accurate × Fast × Comprehensive
02 · Saturday, February 7, 2026

GLM-OCR is an open-source multimodal OCR model built on the GLM-V architecture. With only 0.9B parameters and Multi-Token Prediction, it leads benchmarks such as OmniDocBench V1.5. It handles complex layouts, including tables and formulas, and can be used through a cloud API or deployed locally with vLLM, SGLang, or Ollama for efficient document understanding.

Sources: Hacker News (263 pts)

Nanobot: Ultra-Lightweight Alternative to OpenClaw
03 · Sunday, February 1, 2026

Nanobot is an ultra-lightweight personal AI assistant that provides core agent functionality in roughly 4,000 lines of code, about 99% smaller than its inspiration, Clawdbot (since renamed OpenClaw). The small codebase makes it easier to study, modify, and extend. Nanobot supports multiple LLM providers, including OpenRouter, OpenAI, and DeepSeek, as well as local models served through vLLM. Users can talk to their agent from Telegram, WhatsApp, or Feishu, and built-in tools cover web search, scheduled cron tasks, and proactive heartbeats. With its emphasis on readability and efficiency, Nanobot gives developers a quick way to deploy a full-stack personal assistant via Docker, PyPI, or a direct source install.

Sources: Hacker News (227 pts)

Consistency diffusion language models: Up to 14x faster, no quality loss
04 · Thursday, February 19, 2026

Consistency diffusion language models (CDLM) speed up diffusion language models (DLMs) by combining consistency-based multi-token finalization with block-wise KV caching. The post-training method reaches up to 14.5x faster inference on math and coding tasks. It targets two sources of DLM inefficiency: bidirectional attention, which rules out ordinary KV caching, and the large number of refinement steps per output, while keeping generation quality high and accuracy competitive.
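
The summary gives no algorithmic detail beyond "multi-token finalization" and "block-wise KV caching". As a rough illustration only (not the CDLM authors' code, and omitting the consistency training that makes confident parallel predictions possible), here is a generic confidence-threshold decoding loop: the sequence is produced one block at a time, each model call may finalize several masked positions at once, and finished blocks become a fixed prefix that a real implementation would keep in a KV cache. The `Predictor` interface, `toy_predict`, and the 0.9 threshold are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # marks a position in the current block that is not finalized yet

# Hypothetical model interface: given the committed prefix and the partially
# filled block, return a (token, confidence) guess for every block position.
Predictor = Callable[[List[str], List[Optional[str]]], List[Tuple[str, float]]]


def decode_blockwise(predict: Predictor, length: int, block_size: int,
                     threshold: float) -> Tuple[List[str], int]:
    """Decode `length` tokens block by block, finalizing several per step.

    Returns the decoded tokens and the number of model calls (refinement steps).
    """
    committed: List[str] = []  # finished blocks; a block-wise KV cache would store these
    steps = 0
    while len(committed) < length:
        size = min(block_size, length - len(committed))
        block: List[Optional[str]] = [MASK] * size
        while any(tok is MASK for tok in block):
            guesses = predict(committed, block)
            steps += 1
            masked = [i for i, tok in enumerate(block) if tok is MASK]
            # Finalize every masked position whose confidence clears the threshold...
            ready = [i for i in masked if guesses[i][1] >= threshold]
            # ...but always finalize at least the most confident one so each step progresses.
            if not ready:
                ready = [max(masked, key=lambda i: guesses[i][1])]
            for i in ready:
                block[i] = guesses[i][0]
        committed.extend(tok for tok in block if tok is not None)
    return committed, steps


# Toy stand-in for a trained model (hypothetical): it always guesses the right
# word, and its confidence grows as more of the current block is filled in.
TARGET = "the cat sat on the mat and then it slept".split()


def toy_predict(prefix: List[str], block: List[Optional[str]]) -> List[Tuple[str, float]]:
    filled = sum(tok is not MASK for tok in block)
    start = len(prefix)
    return [(TARGET[start + i], min(1.0, 0.6 + 0.2 * filled)) for i in range(len(block))]


tokens, steps = decode_blockwise(toy_predict, len(TARGET), block_size=4, threshold=0.9)
print(steps)  # 8 model calls for 10 tokens; one token per step would need 10
```

Raising `threshold` above every possible confidence degenerates the loop to one token per call, which is the slow baseline that multi-token finalization is meant to avoid.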

Sources: Hacker News (191 pts)

Show HN: Respectify – A comment moderator that teaches people to argue better
05 · Wednesday, February 25, 2026

Respectify is an AI-powered moderation tool for fostering healthy online communities. Instead of relying on traditional blacklists, it analyzes context to identify disrespectful language, logical fallacies, dog whistles, and sophisticated spam. Real-time feedback and configurable settings help users rephrase their comments for clarity and relevance, keeping the discussion safe and engaging.

Sources: Hacker News (183 pts)

Half million 'Words with Spaces' missing from dictionaries
06 · Wednesday, February 25, 2026

This analysis examines multi-word expressions (MWEs): phrases that function as single conceptual units yet are left out of traditional dictionaries. Using computational linguistics and LLM probing, the study finds that phrases such as 'boiling water' or 'Saturday night' carry significant semantic weight, and it argues both for admitting them to word games and for rethinking what counts as vocabulary.
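
The summary does not say which statistics the study used. A classic computational-linguistics signal for MWE candidates is pointwise mutual information, PMI(x, y) = log2(P(x y) / (P(x) P(y))), which rewards word pairs that co-occur more often than their individual frequencies predict. The sketch below scores adjacent word pairs this way; the toy corpus and the `min_count` cutoff are illustrative assumptions, not the article's data or method.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bigram_pmi(tokens: List[str], min_count: int = 2) -> Dict[Tuple[str, str], float]:
    """PMI (in bits) for each adjacent word pair seen at least `min_count` times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue  # PMI is unreliable for rare pairs, so skip them
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy corpus.
corpus = "saturday night is fun . the night is long . saturday night was fun . the day is long .".split()
scores = bigram_pmi(corpus)
print(round(scores[("saturday", "night")], 2), round(scores[("night", "is")], 2))  # 2.81 2.23
```

Both pairs occur twice, but "saturday night" scores higher because "saturday" almost never appears without "night", whereas "is" follows many different words. Real MWE mining would run this over a large corpus and combine it with other signals.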

Sources: Hacker News (114 pts)