Our Test Protocol
How We Test AI Summarizers
Published April 2026. Updated when test protocol changes.
The 10-Point Rubric
Every AI summarizer we review is scored against the same rubric. The rubric is applied by a human reviewer who has read the original document in full before reading the AI summary. Criteria:
1. Key findings captured (2 points)
The summary correctly identifies the primary conclusions, results, or main argument of the source document. Deductions for missing major findings or mischaracterizing conclusions.
2. Methods described accurately (1.5 points)
For research papers: the approach, dataset, or analytical method is correctly described. For meeting notes: the decisions and their rationale. For legal documents: the operative clauses and their conditions.
3. Figures, tables, and data referenced (1.5 points)
Key quantitative findings, charts, or data tables in the source are reflected in the summary. Full points require that at least 70% of the significant data points be mentioned.
4. Nuance and conditionality preserved (2 points)
The single most important criterion for high-stakes content. Deductions for: inverting conditional statements (e.g., dropping 'unless', 'except', or 'subject to'), collapsing caveats into certainty, and removing hedge language where the hedge is material.
5. Disciplinary / domain terminology accuracy (1.5 points)
Specialist terms are used correctly. For academic papers: no substitution of incorrect synonyms. For legal: correct clause-type labels. For medical: correct anatomical and pharmacological terminology.
6. Output format usefulness (1.5 points)
Is the output format appropriate for the use case? Meeting summaries should have action items and owners. Research papers should distinguish findings from methods. Contracts should flag clauses by type. Free-form prose gets half credit for this criterion regardless of quality.
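The six criteria above can be read as a weighted checklist that totals 10 points. As a minimal sketch (the criterion keys below are shorthand labels of our choosing, not official names), the rubric and its scoring reduce to:

```python
# Sketch of the rubric as a data structure. Keys are shorthand labels;
# values are the maximum points listed for each criterion above.
RUBRIC = {
    "key_findings": 2.0,
    "methods": 1.5,
    "figures_data": 1.5,
    "nuance": 2.0,
    "terminology": 1.5,
    "output_format": 1.5,
}

assert sum(RUBRIC.values()) == 10.0  # the rubric totals 10 points

def score(awarded: dict) -> float:
    """Sum the reviewer's awarded points, capping each criterion at its max."""
    return sum(min(awarded.get(k, 0.0), cap) for k, cap in RUBRIC.items())

# Example: full marks except nuance (1.0/2.0) and figures/data (0.5/1.5)
total = score({"key_findings": 2.0, "methods": 1.5, "figures_data": 0.5,
               "nuance": 1.0, "terminology": 1.5, "output_format": 1.5})
print(total)  # 8.0
```

Capping each criterion at its maximum keeps a reviewer's data entry honest: a typo of 3.0 for a 2-point criterion cannot inflate the total.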
Test Document Library
We use the same documents for all tools within a category. This ensures comparability. Our test artifacts:
| Category | Test Document | Length | Source |
|---|---|---|---|
| PDFs (general) | arXiv:2310.11511 - LLM Survey | 47 pages | Public arXiv |
| PDFs (long) | Synthetic 200-page legal deposition | 200 pages | Published test artifact |
| Meetings | Scripted 47-min team meeting | 47 minutes | Synthetic for privacy |
| YouTube | University transformer lecture | 31 minutes | Public YouTube |
| Research papers | 8 arXiv papers, 4 disciplines | 15-60 pages each | Public arXiv |
| Legal (NDA) | Synthetic NDA | 18 pages | Published test artifact |
| Legal (MSA) | Synthetic SaaS MSA | 34 pages | Published test artifact |
| Articles | 5 long-form articles, varied topics | 1,800-3,200 words each | Publicly accessible |
| Medical | Published medical abstract (PubMed) | Abstract only | Public PubMed |
| Books | Non-fiction chapter (public domain) | 4,200 words | Open Library |
Review and Update Cadence
Monthly
Pricing verification. Every price quoted on this site is checked monthly against the vendor's public pricing page, and pages are updated when prices change.
Quarterly
Full test re-run. We re-run the rubric on our standard test documents quarterly to catch material changes in model quality, output format, or feature set.
Annual
Major review. Full competitive landscape reassessment. New tools added, discontinued tools removed, rubric updated if the summarizer landscape has materially changed.
Affiliate Disclosure
Some links on this site are affiliate links. This means that if you click through and purchase a subscription, we may receive a commission from the vendor. Affiliate relationships exist with: QuillBot (via Impact Radius), Jasper (via Impact), Otter.ai (via Impact), Fireflies.ai (via Impact), Scribbr (via Impact and direct), and Paperpal (direct).
Our verdicts are not influenced by commission rates. NotebookLM is recommended on every page where it is the honest winner despite paying us $0 in affiliate revenue. Where a paid tool with an affiliate program is recommended over a free tool, the recommendation is based on the test rubric scores and the specific use case, not the commission.
We do not accept payment for positive reviews, sponsored placements in results grids, or paid inclusion in any recommendation. Our test process is editorially independent.
Conflicts of Interest Statement
This site is operated independently. We have no ownership interests in any of the tools reviewed. We are not employed by, contracted to, or otherwise commercially affiliated with QuillBot, Otter, Fireflies, NotebookLM, Google, Adobe, Spellbook, Harvey, CoCounsel, Scholarcy, SciSummary, Paperpal, Scribbr, Blinkist, Shortform, Eightify, NoteGPT, Jasper, or any other tool reviewed here beyond the affiliate commission relationships disclosed above.
Corrections Policy
If you find an error in a price, feature description, or test result on this site, please contact us. We will investigate and, if the error is confirmed, issue a correction within 48 hours. We do not remove negative reviews or alter verdicts in response to vendor requests, but we do correct factual errors promptly. Corrections are noted inline with a timestamp.