The war against PDFs

In a recent article, "The War on PDFs is Heating up", The Economist argues that the once-revolutionary PDF is increasingly out of step with the needs of a data-driven, AI-powered world. The Portable Document Format was designed to solve a real problem: preserving layout and visual consistency across devices. Yet those very strengths have become structural weaknesses.

The Digital Taxidermy of the Portable Document Format

Created by Adobe in the early 1990s, the PDF was built for visual fidelity. It ensured that a document looked identical whether opened on a Mac, a PC, or printed on paper. For contracts, government filings, and academic papers, this reliability was transformative. But the format was conceived before the web went mainstream and long before modern AI. It was designed to replicate paper, not to fuel intelligent systems.

That design decision now looks costly.

The Invisible Wall: Data Extraction and the AI Gap

At its core, a PDF prioritizes appearance over structure. Text is often stored as positioned characters rather than semantically meaningful elements like headings, tables, or lists. To a human reader, the document is perfectly legible. To a machine, it can resemble a jumble of coordinates and drawing instructions. Extracting clean data from PDFs requires additional processing — optical character recognition, layout reconstruction, or specialized parsing tools — each introducing friction, expense, and error.
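To make the "jumble of coordinates" concrete, here is a minimal Python sketch of what layout reconstruction involves. It does not read a real PDF: the glyph list, the `reconstruct_lines` function, and the `space_gap` threshold are all hypothetical, chosen only to illustrate the idea that a machine receives characters with positions and has to guess the lines and word breaks itself.

```python
from collections import defaultdict

# Hypothetical glyph records in the form (x, y, character). This loosely
# mimics positioned text on a page; the storage order is deliberately
# scrambled because it need not match reading order.
glyphs = [
    (104, 700, "l"), (72, 680, "4"), (72, 700, "T"), (96, 700, "a"),
    (80, 680, "2"), (80, 700, "o"), (88, 700, "t"),
    (130, 700, "d"), (138, 700, "u"), (146, 700, "e"),
]

def reconstruct_lines(glyphs, space_gap=12):
    """Rebuild text lines from positioned characters.

    Assumes characters on one line share an exact y value and that a
    horizontal jump larger than space_gap means a word break. Real
    documents are messier, which is where errors creep in.
    """
    rows = defaultdict(list)
    for x, y, ch in glyphs:
        rows[y].append((x, ch))

    lines = []
    # Larger y is treated as higher on the page, so read top to bottom.
    for y in sorted(rows, reverse=True):
        row = sorted(rows[y])  # left to right
        text = row[0][1]
        for (prev_x, _), (x, ch) in zip(row, row[1:]):
            if x - prev_x > space_gap:
                text += " "  # infer a space from the gap
            text += ch
        lines.append(text)
    return lines

print(reconstruct_lines(glyphs))
```

Even in this toy case, the words and the line order are inferred rather than stored. Change the threshold or skew the baseline slightly and the output changes, which is the kind of friction the paragraph above describes.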

This becomes a serious liability in the age of artificial intelligence. Large language models and automated systems depend on well-structured data. Yet vast swaths of corporate knowledge, scientific research, and regulatory documentation exist only as PDFs. Instead of being easily machine-readable, this information must be laboriously decoded. That means higher computational costs, slower workflows, and less reliable outputs. In effect, PDFs act as a bottleneck in otherwise streamlined digital systems.

The disadvantages extend beyond AI. PDFs are notoriously difficult to edit collaboratively. Unlike cloud-native documents that allow real-time updates and structured version control, PDFs often circulate as static attachments. Teams end up managing multiple versions of the same file, increasing the risk of errors and confusion. In environments that demand agility and continuous iteration, the format feels rigid and outdated.

Accessibility is another persistent weakness. While standards like tagged PDFs and PDF/UA exist, properly structured, fully accessible PDFs remain the exception rather than the rule. Many documents lack correct tagging, logical reading order, or usable form fields, making them challenging for screen readers and assistive technologies. Ensuring compliance requires expertise and additional labour, further increasing the total cost of ownership.

The Superiority of Responsive HTML over Static Containers

From a data strategy perspective, PDFs also limit interoperability. Modern digital ecosystems rely on APIs, structured data feeds, and dynamic content. A PDF, by contrast, is a sealed container. Information is locked inside a page-based snapshot rather than flowing seamlessly between systems. Organizations attempting automation, analytics, or AI integration often find themselves converting PDFs into more usable formats — an extra step that would be unnecessary if the information were born structured.
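By way of contrast, the sketch below shows how little work is needed when the same kind of information is published as structured HTML. It uses only Python's standard `html.parser` module. The table contents, the `TableRows` class, and the `records` variable are made up for illustration and do not come from the article.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the cell text of each table row (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._in_cell:
            self._row[-1] += data.strip()

# Hypothetical sample data, not taken from any real report.
html = """
<table>
  <tr><th>Quarter</th><th>Revenue</th></tr>
  <tr><td>Q1</td><td>120</td></tr>
  <tr><td>Q2</td><td>135</td></tr>
</table>
"""

parser = TableRows()
parser.feed(html)
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

Here the markup already says which text is a header and which is a cell, so no coordinates, thresholds, or OCR are involved. The conversion step mentioned above exists only because a PDF discards that structure.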

Despite these shortcomings, PDFs remain deeply entrenched in legal, regulatory, and archival systems. Their fixed nature is precisely why they are trusted for official documentation. But the article suggests that this trust comes at a growing cost. As businesses prioritize automation and intelligence, formats designed for static presentation increasingly clash with systems designed for computation.

The “war against PDFs” is not about aesthetics; it is about efficiency and future readiness. In a world where machines are as important as human readers, a format optimized for paper may no longer be fit for purpose. PDFs are unlikely to disappear overnight, but their disadvantages are becoming harder to ignore — and harder for modern enterprises to justify.