Some of my folders hold hundreds to thousands of PDF files, mostly conference proceedings and ebooks.

I would like to browse these folders with the same ease I browse photos, using software like the free IrfanView (http://www.irfanview.com/) or the commercial ACDSee (http://www.acdsee.com/). Unfortunately, these tools do NOT natively support PDFs. If you search, you’ll find a “solution” based on GhostScript, but it is far from perfect. So far, in fact, that it is no solution at all, at least for me and for anyone who values ACDSee’s stability.

The problem is scale: everything is fine for 5, 50, perhaps even 500 files, but the house of cards collapses beyond that. The GhostScript solution for ACDSee works for small quantities, but when you browse to a folder with hundreds of PDF files, ACDSee will probably crash. I found the problem even more disturbing because, with that plugin installed, ACDSee sometimes crashed even when not browsing PDFs, an issue that went away once I removed the GhostScript “solution”. Since I don’t want to put my photo database at risk, this wasn’t the way for me.

Then I tried the nuclear weapon: Adobe Bridge, a tool that comes with Adobe’s “Creative Suite” and natively supports most photo formats as well as PDFs. Version CS4 is the most stable, but Adobe Bridge is a monster that will devour all your available RAM the moment you present it with a challenge: for example, if I browse to my Canon 5D RAW folder, all the RAM just evaporates; ACDSee does a much better job here. The same goes for browsing PDFs: in order to show the first page of a PDF as its thumbnail, any software must open the PDF file, then read and render that page. The question is: once the thumbnail is extracted, where will it be stored? Adobe Bridge stores the thumbnails in a cache folder that is *limited* to 500,000 items, while ACDSee keeps a simple database that doesn’t need a dedicated server. While 500K items seems like a lot, the headroom shrinks in no time, because every file you display in Bridge, however small, is cached and takes a slot from the pool, requiring you to “purge the cache” sooner than expected.
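
To make the caching question concrete, here is a minimal Python sketch of a thumbnail cache with a hard item limit. It is only an illustration of the idea, not how Bridge or ACDSee actually work: the cache lives in memory rather than on disk, `render_first_page` is a hypothetical callback standing in for whatever PDF renderer you have, and the eviction policy (drop the least recently used thumbnail) is my own assumption.

```python
import hashlib
import os
from collections import OrderedDict
from typing import Callable


class ThumbnailCache:
    """Tiny in-memory thumbnail cache with a hard item limit (illustrative only)."""

    def __init__(self, render_first_page: Callable[[str], bytes],
                 max_items: int = 500_000):
        # Hypothetical renderer: takes a PDF path, returns image bytes.
        self._render = render_first_page
        self._max_items = max_items
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    @staticmethod
    def _key(path: str) -> str:
        # Key on path + size + modification time, so an edited file
        # gets a fresh thumbnail instead of a stale one.
        st = os.stat(path)
        raw = f"{os.path.abspath(path)}|{st.st_size}|{st.st_mtime_ns}"
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()

    def get(self, path: str) -> bytes:
        key = self._key(path)
        if key in self._items:
            self._items.move_to_end(key)      # mark as recently used
            return self._items[key]
        thumb = self._render(path)            # the slow step: open PDF, draw page 1
        self._items[key] = thumb
        if len(self._items) > self._max_items:
            self._items.popitem(last=False)   # evict the least recently used entry
        return thumb

    def __len__(self) -> int:
        return len(self._items)
```

The slow rendering step only runs on a cache miss, which is why a viewer without any cache feels like the first visit every time, and why a cache that fills up with throwaway entries stops helping for the folders you actually care about.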

[Screenshot: Adobe Bridge CS4]

So, Adobe Bridge does support PDFs natively, but it is so memory hungry with serious content that it becomes hard to run alongside other applications: sure, you can have a web browser, Acrobat viewer and Excel open at the same time… but only if you don’t have many active web sessions, your .DOC is rather simple and your Excel workbook isn’t forecasting some trend; otherwise, expect plenty of memory page swapping.

Even if you give Adobe Bridge all the resources it wants, it will stop thumbnailing as soon as it reaches its cache limit (arbitrarily capped at 500K items, and set much lower by default). On top of that, I am sorry to say, Bridge can crash. And it is slow with big folders. In a single sentence: Adobe Bridge seems to do the job until the moment you raise the bar; then you’ll find yourself waiting, crashing or getting no results at all, which is far from the experience I want.

One Bridge-like “solution”, nearly an underground product that no one seems to know about, is STDUtility’s Explorer (http://www.stdutility.com/stduexplorer.html): it is free and it natively supports PDFs! But it has a major design flaw: it doesn’t cache thumbnails, meaning that browsing to a folder is always like the [slow] first time. It also doesn’t scale: ask it to browse 500 PDFs and you’ll run out of memory, if it doesn’t crash first.

I thought this was the end of the road, until I found “A-PDF Thumbnailer”: http://www.a-pdf.com/thumbnailer/

“A-PDF Thumbnailer” is a very focused product, strictly aimed at creating thumbnails from selected PDF files. As an extra, it can also generate HTML pages with tables containing the thumbnails and linking to the original documents, so that you can later navigate the collection using any web browser. This program doesn’t worry about having an integrated viewer, a caching system or a search function – it is totally specialized in creating thumbnails. And guess what? It works fine! Finally, something that just works, without crashing, without eating up all the available RAM, allowing the computer to remain usable, and providing a *near* solution for browsing large PDF collections.

Use it in two steps: (1) select the PDFs to thumbnail and drag them to the program’s window, (2) press generate. Then wait. For thousands of files, you may well have to wait hours, but it is faster than all the alternatives I mentioned. Below, the image shows a batch of 3381 PDFs being processed by the program.

[Screenshot: A-PDF Thumbnailer processing a batch of 3381 PDFs]

One very important aspect is that it just works! The thumbnail extraction process can fail, but the number of failures with A-PDF’s solution isn’t greater than with Adobe Bridge or any of the other alternatives I tested.

So, why the *near*?

Five reasons:

(1) if a file’s name contains strange characters, that file will be ignored and trigger an error message;

[Screenshot: A-PDF Thumbnailer]

(2) sometimes it can silently hang on the conversion of a file;

(3) sometimes, after failing to convert a file, it also fails on all the files that follow (this is its most annoying error);

(4) it is NOT recursive: if you drag a folder that has no files of its own, only subfolders containing PDFs, it won’t find a single file to process;

(5) the HTML output is limited to tables (and I can live with that), but each table is capped at 50 rows and 12 columns, i.e. 600 thumbnails, which is not enough in many cases, even though extra tables are built and linked. Why not just lift the limits and allow a single-page, single-table document?
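
Reasons (1) and (4) can be worked around before dragging anything into the program. The Python sketch below walks a folder tree, collects every PDF it finds, and sets aside the ones whose names might cause trouble. I don’t know exactly which characters the program rejects, so treating anything outside printable ASCII as “strange” is purely my assumption; the function names are mine as well.

```python
import os
from typing import Iterator, List, Tuple


def find_pdfs(root: str) -> Iterator[str]:
    """Yield every .pdf file under root, descending into subfolders."""
    for folder, _subdirs, files in os.walk(root):
        for name in sorted(files):
            if name.lower().endswith(".pdf"):
                yield os.path.join(folder, name)


def has_unusual_chars(filename: str) -> bool:
    """Assumption: 'strange' means anything outside printable ASCII."""
    return any(not (32 <= ord(ch) < 127) for ch in filename)


def split_by_name(root: str) -> Tuple[List[str], List[str]]:
    """Return (plain, suspect) lists of PDF paths found under root."""
    plain: List[str] = []
    suspect: List[str] = []
    for path in find_pdfs(root):
        if has_unusual_chars(os.path.basename(path)):
            suspect.append(path)   # probably worth renaming first
        else:
            plain.append(path)
    return plain, suspect
```

Calling `split_by_name` on the top folder of a collection gives one list that should be safe to feed to the thumbnailer and another with the files to rename first.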

Is it a solution for me?

It [nearly] is, for now: once the thumbnails have been extracted, I can browse the folders using IrfanView or ACDSee, without having to worry about PDF processing, available RAM or crashes.

Flawed as it is, “A-PDF Thumbnailer” is the simplest yet most effective solution I have found for browsing large collections of PDFs.