GeekPeak

Methodology

Built and maintained by @__yaso · geekpeak.dev@gmail.com · About
Last updated: April 2026 · Pipeline v1.1.0

Overview

GeekPeak exists because book-recommendation lists for developers are usually written by publishers, bestseller aggregators, or affiliate marketers. I wanted one that only counts books developers actually recommended to other developers in writing, with the source article visible for every mention. This page documents exactly how that count is built so you can judge whether to trust it.

GeekPeak discovers which programming books real developers recommend by scanning every public article on DEV.to. We don't rely on bestseller lists, publisher data, or expert panels — we listen to what working developers actually write about.

1.27M

Articles scanned

12,568

Book articles found

664

Books ranked

4,616

Mentions tracked

Data Pipeline

Our pipeline has five stages, each with measurable quality metrics.

1

Corpus Collection

We crawled all 2.42M article IDs on DEV.to’s public API and retrieved 1,271,389 existing articles (the rest are deleted/draft). Every public article was saved — 100% recovery rate.

2.42M IDs checked 1.27M articles saved 3.1 GB corpus
2

Book Article Detection

A multi-layer detector identifies articles that recommend books. It looks for Amazon links, ISBNs, publisher URLs, recommendation phrases, and known book titles.

Layer 1: Deterministic

Amazon ASINs, ISBNs, publisher links

Layer 2: Heuristic

Title patterns, recommendation phrases

Layer 3: Lexical

Known book title dictionary

12,568 book articles detected 0.99% of all articles
3

Book Extraction & Deduplication

From each detected article, we extract individual book references (via ASINs, ISBNs, Markdown links, and text patterns), then merge duplicates using fuzzy title matching and 100+ manual merge rules.

74,734 raw candidates 43,617 after noise removal 2,830 articles with valid mentions 664 unique books
4

Metadata Enrichment

We fill in authors, publication years, and cover images using the Google Books API and the Open Library API, with manual verification for the top books.

Authors known: 99.6% Pub year known: 99.1% Cover images: 99.6%
5

Scoring & Ranking

Each book gets a score based on how many articles mention it, how many different authors recommend it, and how recent the recommendations are.

Scoring Formula

score = (unique_article_mentions × 1.0)

+ (unique_authors × 1.5)

+ (recency_boost × 0.8)

- (duplicate_author_penalty)

A
Article mentions — The number of distinct articles that recommend this book. More articles = stronger community signal.
U
Unique authors (×1.5) — Weighted higher because diverse recommendations are more meaningful than one person mentioning a book repeatedly.
R
Recency boost (×0.8) — Recent mentions (last 90 days) receive a bonus, so trending books surface naturally.
D
Duplicate penalty (−0.5) — When the same author mentions a book in multiple articles, extra mentions are discounted to prevent gaming.

Accuracy Metrics

We measure three key quality metrics using random sampling and manual review.

99.7%

Book Precision

Of the 657-book snapshot audited in March, 99.7% were verified as real published books with correct metadata. 23 confirmed non-books have been removed since; the current 664-book set has equal or better precision.

99.0%

Article Recall

Of articles not flagged as book articles, only 1% actually contained book recommendations.

97.6%

Extraction Recall

Of books present in detected articles, 97.6% were successfully extracted and counted.

How we measured these numbers

Book Precision: We sampled 98 books (every 7th from score-sorted list) and manually verified each is a real published book with correct title and author. Found 2 non-books and 10 minor issues. After a full audit of all 684 candidate entries, we removed 27 non-book entries and corrected 345 title/author issues, leaving 657 published books in the March snapshot. Subsequent maintenance has removed a further 23 non-books and added 30 reclassified entries, yielding the current 664.

Article Recall: We sampled 100 articles from the 1.26M non-detected articles (stratified by engagement: 25 each from 0–4, 5–19, 20–99, 100+ reactions). Only 1 article was a clear miss — an article summarizing Fowler's PoEAA book without using typical recommendation language.

Extraction Recall: We sampled 20 detected articles and compared extractor output against all books actually present in the text. Of 41 total books, 40 were found. The one miss was a book title mentioned in prose without any link or formatting.

Quality Assurance

Full manual audit — All 684 initial candidate entries were individually reviewed. 27 non-book entries (courses, novels, duplicates) were removed, 345 title/author corrections were applied. Ongoing maintenance has further trimmed the set to the current 664 ranked books.
Non-book filtering — Physical products (keyboards, monitors), GitHub repositories, video courses, and spam are excluded using 70+ filter patterns.
Deduplication — 100+ manual merge rules handle common variants (e.g., "DDIA" and "Designing Data-Intensive Applications" are the same book).
Source transparency — Every book's detail page links to the actual articles that recommended it, so you can verify the data yourself.

Known Limitations

DEV.to only — We currently scan DEV.to articles. Hashnode, Medium, and personal blogs are not yet included. This means some books recommended on other platforms may be underrepresented.
Pattern-based detection — Our detector uses regular expressions and heuristics, not AI/LLM. Books mentioned without any structural signal (no link, no bold, no "I recommend") may be missed.
English articles only — Non-English articles may contain book recommendations that our patterns don't capture well.
Popularity bias — Widely-known books get mentioned more often. Excellent niche books with smaller audiences may rank lower than their quality deserves.

Changelog

A public log of pipeline updates, manual corrections, and methodology changes. We track changes here so you can see how the dataset evolves over time.

  1. v1.1.0 — Apr 2026

    • Added 5-axis taxonomy (Role / Topic / Level / Intent / Tag).
    • Inline filters on Topic and Role pages with URL hash sync.
    • Show more / pagination on rankings (now up to 100 books per ranking page).
    • Site-wide search with Pagefind (intent filter, taxonomy detection).
    • Provenance banners and per-page ItemList structured data.
  2. v1.0.0 — Mar 2026

    • Initial public release. 1.27M DEV articles scanned, 657 books published.
    • Full manual audit completed: 27 non-book entries removed, 345 title/author corrections applied.
    • Accuracy metrics measured: book precision 99.7%, article recall 99.0%, extraction recall 97.6%.

How We Improve

We continuously refine our pipeline:

  • New detection patterns are added as we discover missed book formats
  • Deduplication rules grow as new book aliases appear
  • Accuracy metrics are re-measured with each major pipeline update
  • Additional data sources (Hashnode, etc.) are planned

Found an issue with our data?

If you notice a wrong book, missing title, or data error, please let us know at geekpeak.dev@gmail.com

Affiliate disclosure. GeekPeak participates in the Amazon Associates affiliate program; book detail pages contain affiliate links. This does not affect ranking order — scores are computed from DEV.to mentions only, before any monetization layer (see Scoring Formula above).