Feed

Big Data

Big data news covering Spark, Hadoop, distributed computing, and large-scale data processing insights from developer communities.

Articles from the last 30 days

Show HN: Algorithmically Finding the Longest Line of Sight on Earth
01 Monday, February 9, 2026

This project reports the world's longest lines of sight, found with a custom algorithm named CacheTVS. An exhaustive analysis of topographical data across the planet identified the longest possible direct view: a 530 km span from the Hindu Kush to Pik Dankova. Other notable lines include a 504 km view from Antioquia to Pico Cristobal in Colombia and a 483 km view from Mount Elbrus in Russia across the Black Sea to the Pontic Mountains in Turkey. An interactive map presents approximately 4.5 billion calculated lines of sight, illustrating how geography, atmospheric conditions, and computation combine to produce these records.

Sources: Hacker News, 374 pts

Apache Arrow is 10 years old
02 Thursday, February 12, 2026

Apache Arrow celebrates its 10th anniversary, reflecting on its 2016 launch as a stable, language-agnostic standard for columnar data. Since its inception, the project has maintained exceptional backward compatibility, expanded to include native libraries for 12 languages, and fostered a massive ecosystem of subprojects like DataFusion, ADBC, and GeoArrow for high-performance data exchange.

Sources: Hacker News, 234 pts

How we made geo joins 400× faster with H3 indexes
03 Thursday, February 5, 2026

Geospatial joins, which use spatial predicates like ST_Intersects, often perform poorly at scale because a naive evaluation compares every pair of rows, giving quadratic complexity. Conventional joins can use efficient hash partitioning, but spatial predicates lack a clean join key, forcing expensive row-by-row comparisons. Floe addresses this by automatically rewriting queries to use H3 indexes, a hexagonal hierarchical tiling system that converts complex geometries into big-integer cell IDs. By representing shapes as sets of H3 cells, the system performs a fast integer equi-join as a pre-filter, followed by an exact spatial recheck to eliminate false positives. This pre-filter reduces candidate pairs by up to 99 percent. Benchmark results show that tuning the H3 resolution can yield speedups of nearly 400× over baseline queries, turning expensive spatial operations into efficient parallel hash joins.
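
The pre-filter-then-recheck pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Floe's implementation: a plain square grid stands in for H3 cells, axis-aligned rectangles stand in for geometries, and the names `cover`, `intersects`, and `grid_join` are hypothetical helpers invented for the example.

```python
from collections import defaultdict
from itertools import product

CELL = 1.0  # grid cell size; an assumed stand-in for an H3 resolution


def cover(rect, cell=CELL):
    """Return the set of grid-cell IDs overlapping rect = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    xs = range(int(xmin // cell), int(xmax // cell) + 1)
    ys = range(int(ymin // cell), int(ymax // cell) + 1)
    return set(product(xs, ys))


def intersects(a, b):
    """Exact predicate on rectangles (a stand-in for ST_Intersects)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def grid_join(left, right, cell=CELL):
    """Return (matching index pairs, number of candidate pairs rechecked)."""
    # Build side: hash table from cell ID to the left rows covering that cell.
    index = defaultdict(set)
    for i, rect in enumerate(left):
        for c in cover(rect, cell):
            index[c].add(i)

    # Probe side: an equi-join on cell ID yields deduplicated candidate pairs.
    candidates = set()
    for j, rect in enumerate(right):
        for c in cover(rect, cell):
            for i in index.get(c, ()):
                candidates.add((i, j))

    # Recheck: the exact predicate removes false positives that only share a cell.
    matches = sorted(p for p in candidates if intersects(left[p[0]], right[p[1]]))
    return matches, len(candidates)


left = [(0.1, 0.1, 0.4, 0.4), (5.0, 5.0, 6.0, 6.0)]
right = [(0.6, 0.6, 0.9, 0.9), (0.3, 0.3, 0.7, 0.7), (20.0, 20.0, 21.0, 21.0)]
matches, rechecked = grid_join(left, right)
print(matches, rechecked)
```

In this toy input, only pairs that share a grid cell reach the exact predicate, so fewer pairs are rechecked than the full cross product of both inputs. The cell size plays the role the summary attributes to H3 resolution: smaller cells produce fewer false-positive candidates but more cell IDs per shape.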

Sources: Hacker News, 154 pts