Methodology
Built and maintained by
@__yaso
·
geekpeak.dev@gmail.com
·
About
Last updated: April 2026 · Pipeline v1.1.0
Overview
GeekPeak exists because book-recommendation lists for developers are usually written by publishers, bestseller aggregators, or affiliate marketers. I wanted one that only counts books developers actually recommended to other developers in writing, with the source article visible for every mention. This page documents exactly how that count is built so you can judge whether to trust it.
GeekPeak discovers which programming books real developers recommend by scanning every public article on DEV.to. We don't rely on bestseller lists, publisher data, or expert panels — we listen to what working developers actually write about.
1.27M
Articles scanned
12,568
Book articles found
664
Books ranked
4,616
Mentions tracked
Data Pipeline
Our pipeline has five stages, each with measurable quality metrics.
Corpus Collection
We crawled all 2.42M article IDs on DEV.to’s public API and retrieved 1,271,389 existing articles (the rest are deleted/draft). Every public article was saved — 100% recovery rate.
Book Article Detection
A multi-layer detector identifies articles that recommend books. It looks for Amazon links, ISBNs, publisher URLs, recommendation phrases, and known book titles.
Layer 1: Deterministic
Amazon ASINs, ISBNs, publisher links
Layer 2: Heuristic
Title patterns, recommendation phrases
Layer 3: Lexical
Known book title dictionary
Book Extraction & Deduplication
From each detected article, we extract individual book references (via ASINs, ISBNs, Markdown links, and text patterns), then merge duplicates using fuzzy title matching and 100+ manual merge rules.
Metadata Enrichment
We fill in authors, publication years, and cover images using the Google Books API and the Open Library API, with manual verification for the top books.
Scoring & Ranking
Each book gets a score based on how many articles mention it, how many different authors recommend it, and how recent the recommendations are.
Scoring Formula
score = (unique_article_mentions × 1.0)
+ (unique_authors × 1.5)
+ (recency_boost × 0.8)
- (duplicate_author_penalty)
Accuracy Metrics
We measure three key quality metrics using random sampling and manual review.
99.7%
Book Precision
Of the 657-book snapshot audited in March, 99.7% were verified as real published books with correct metadata. 23 confirmed non-books have been removed since; the current 664-book set has equal or better precision.
99.0%
Article Recall
Of articles not flagged as book articles, only 1% actually contained book recommendations.
97.6%
Extraction Recall
Of books present in detected articles, 97.6% were successfully extracted and counted.
How we measured these numbers
Book Precision: We sampled 98 books (every 7th from score-sorted list) and manually verified each is a real published book with correct title and author. Found 2 non-books and 10 minor issues. After a full audit of all 684 candidate entries, we removed 27 non-book entries and corrected 345 title/author issues, leaving 657 published books in the March snapshot. Subsequent maintenance has removed a further 23 non-books and added 30 reclassified entries, yielding the current 664.
Article Recall: We sampled 100 articles from the 1.26M non-detected articles (stratified by engagement: 25 each from 0–4, 5–19, 20–99, 100+ reactions). Only 1 article was a clear miss — an article summarizing Fowler's PoEAA book without using typical recommendation language.
Extraction Recall: We sampled 20 detected articles and compared extractor output against all books actually present in the text. Of 41 total books, 40 were found. The one miss was a book title mentioned in prose without any link or formatting.
Quality Assurance
Known Limitations
Changelog
A public log of pipeline updates, manual corrections, and methodology changes. We track changes here so you can see how the dataset evolves over time.
-
v1.1.0 — Apr 2026
- •Added 5-axis taxonomy (Role / Topic / Level / Intent / Tag).
- •Inline filters on Topic and Role pages with URL hash sync.
- •Show more / pagination on rankings (now up to 100 books per ranking page).
- •Site-wide search with Pagefind (intent filter, taxonomy detection).
- •Provenance banners and per-page
ItemListstructured data.
-
v1.0.0 — Mar 2026
- •Initial public release. 1.27M DEV articles scanned, 657 books published.
- •Full manual audit completed: 27 non-book entries removed, 345 title/author corrections applied.
- •Accuracy metrics measured: book precision 99.7%, article recall 99.0%, extraction recall 97.6%.
How We Improve
We continuously refine our pipeline:
- •New detection patterns are added as we discover missed book formats
- •Deduplication rules grow as new book aliases appear
- •Accuracy metrics are re-measured with each major pipeline update
- •Additional data sources (Hashnode, etc.) are planned
Found an issue with our data?
If you notice a wrong book, missing title, or data error, please let us know at geekpeak.dev@gmail.com
Affiliate disclosure. GeekPeak participates in the Amazon Associates affiliate program; book detail pages contain affiliate links. This does not affect ranking order — scores are computed from DEV.to mentions only, before any monetization layer (see Scoring Formula above).