Random Monday: Fun with Batch Files - Converting PDFs to TXTs

I was fiddling with Kryloff Technologies’ GetText last week to export the first 100 issues of Computer Gaming World to TXT format, so I could read them on my mobile phone using TequilaCat Book Reader, when I realized that typing, cutting and pasting commands at the Command Prompt 100 times isn’t a good way to spend a precious weekend. Admittedly, plain TXT isn’t the best way to enjoy these treasures, but I’ve little time to read except on the way to work :(

Writing a program to do this in C# .NET, which I use at work, would be trivial, but it seems like overkill for such a simple job, so I decided to brush up on my rusty knowledge of DOS commands to automate, or at least semi-automate, the process.

Do read on if you’d like to understand how the batch files work; otherwise simply scroll to the bottom of this post to download the premade batch files that convert either a single file or a folder of PDFs to text.

Creating a batch file (a plain text file with a .BAT extension) containing the following command:

%~dp0gettext.exe %1 %1.txt

and dropping it into the GetText folder lets me drag and drop any PDF (or in fact any file GetText can convert) onto the batch file and immediately convert it to a .TXT file in the same directory as the original.

%~dp0 expands to the drive and folder of the batch file itself, with a trailing backslash but without the batch file’s own name. Since the batch file sits in the GetText folder, this is also the folder that contains gettext.exe.
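
To see the substitution for yourself, a throwaway batch file along these lines prints what the %~ modifiers expand to for the batch file itself. The name show-path.bat is just my placeholder, and this is only a sketch for poking around, not part of the premade downloads:

```batch
@echo off
rem show-path.bat (hypothetical name): prints how cmd.exe expands %0.
rem %0 is the batch file as it was invoked; the ~ modifiers pick parts of it.
echo Drive and folder (with trailing backslash): %~dp0
echo File name and extension only:               %~nx0
echo Fully qualified path:                       %~f0
rem Keep the window open when the file is double-clicked.
pause
```

Double-click it from a few different folders and the first line changes with the location, which is why the one-liner keeps working wherever the GetText folder ends up.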

%1 refers to the file being dropped onto the batch file. On some systems I found that I needed to enclose %1 in double quotes to cater for paths with spaces in them, whereas other systems handled such paths without the extra quotes.
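
If the quoting differences bother you, one way around them is to write "%~1" instead of %1: the ~ strips any double quotes already wrapped around the argument, so you can safely add your own. Here is a slightly more defensive sketch of the single-file batch file. It assumes, as above, that gettext.exe lives in the same folder as the batch file and takes the input file followed by the output file; the usage message is my own addition.

```batch
@echo off
rem Sketch only: a more defensive take on the single-file one-liner.
rem Assumes gettext.exe sits next to this batch file and is called as
rem "gettext.exe input output", exactly like the original command.
if "%~1"=="" (
    echo Drop a PDF onto this batch file, or pass its path as the first argument.
    pause
    exit /b 1
)
rem "%~1" removes any quotes already around the path and adds exactly one pair,
rem so paths with spaces work whether or not they arrived quoted.
"%~dp0gettext.exe" "%~1" "%~1.txt"
```

Quoting "%~dp0gettext.exe" as well covers the case where the GetText folder itself has spaces in its path.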

Now automatically converting a single file isn’t too shabby but it’s still not good enough. Dragging and dropping 10 files is OK but not 100 :( So I Googled for some resources and found Rick Lively’s Command Reference, a handy guide for all DOS/Command Prompt commands together with notes and examples.

Now armed with a FOR loop from this reference, I created another batch file to process all the files in a folder.

for %%f in (%1\*.pdf) do %~dp0gettext.exe %%f %%f.txt

A short explanation of what’s going on here.

%1 is the folder you dropped onto the batch file.

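The FOR statement then loops over every file in this folder with a PDF extension, placing each file’s path in the variable %%f in turn and passing it to GetText. The doubled percent sign is needed inside a batch file; typed directly at the Command Prompt it would be a single %f.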
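
The same loop can be made a little more forgiving about spaces in folder and file names. The following is only a sketch under the same assumptions about how gettext.exe is called; the progress message, the usage label and the checks at the top are my own additions rather than anything in the premade files.

```batch
@echo off
rem Sketch only: folder version with quoting and a basic sanity check.
rem Assumes gettext.exe sits next to this batch file and is called as
rem "gettext.exe input output", as in the original one-liner.
if "%~1"=="" goto usage
if not exist "%~1\*.pdf" goto usage

rem Quoting the wildcard pattern keeps folders with spaces in one piece;
rem %%f then holds each matching PDF's path without quotes.
for %%f in ("%~1\*.pdf") do (
    echo Converting %%~nxf ...
    "%~dp0gettext.exe" "%%f" "%%f.txt"
)
pause
exit /b 0

:usage
echo Drop a folder containing PDFs onto this batch file.
pause
exit /b 1
```

As in the one-liner, each output file keeps the original name with .txt tacked on the end (for example issue1.pdf.txt). Swapping "%%f.txt" for "%%~dpnf.txt" should drop the .pdf part if you prefer.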

Not bad for a single line of text. Sometimes, knowing a few common DOS commands gets a task done faster than writing code :)

Check out Kryloff Technologies’ GetText utility, Rick Lively’s downloadable Command Reference or my premade batch files. Create shortcuts to the two batch files and you can use them in most situations to convert your PDFs to text by dragging and dropping a file or a folder onto the respective shortcut.

