pdffing.htm: How to search the web, by fravia+ pdffing

~ Those annoying pdf files ~


Version: November 2000

This section is not yet fully operative.

OK, admittedly, pdf files are a pain in the stomach: cumbersome, difficult to grep, search, and automate for retrieval, awkward for cut-and-paste purposes, clogging your computers with the Acrobat overload. But they also have some positive aspects, of course, hence people still use them, and you will find USEFUL pdf files every now and then on the web: see for instance the very important manual for scooter (AltaVista's spider) at
http://altavista.software.digital.com/docs/search/reference_manual.pdf
Obviously, once you have found them, you may want to fiddle with them: use them, catalogue them, grep them, whatever.
But, alas, pdf files are annoying: they can be write-protected, password-protected, whatever. Searchers should, of course, know how to overcome these small annoyances. Hence this section. Enjoy!

New essays on this page

[The use of [email protected]] by fravia+ (September 2000)
[Converting an Acrobat PDF into ASCII] by Wolfgang Redtenbacher (March 2000)
[Enabling Print-Challenged PDF Files] by Kayaker (March 2000)
[Converting Word-Docs (or anything you like) to PDF] by ashok hariharan (November 2000)

New essays, on other pages

How to create PDF-Files
from any application that allows printing
(The GSview approach)
by hassan, May 2000

Old essays on other pages
(note that phase 1 and phase 6 are related for pdf purposes)

PHASE 1, by JimBob, October 1997
~

The Aerial trick
How to crack any pdf security setting
(The Aerial RTF format converter)

PHASE 2, by zeezee, October 1997
~

Create PDF documents for free reversing Adobe PDF Writer
You know how to create .pdf documents? No? I will -shortly- explain it
(another nice HIEW lesson)

PHASE 3, by Zer0+, November 1997
~

Quick starting notes and pdf again
(boolean variables inside PDF files)

PHASE 4, by Ragica, November 1997
~

A Response to +ORC's Message Regarding reversing PDF
the biggest collection ever (in fact the only collection ever so far!) of information regarding hacking PDF and links to relevant information
(Kevin Lair CGI-hacks, the GhostScript hack, many good starting points for the USER crack)

PHASE 5, by SiuL+Hacky, November 1997
~

Linux cracking: the live approach (acrobat reader)
Linux advanced reverse engineering: imported functions

PHASE 6, by Snatch, November 1997
~

Cracking all nag-screen and time-trial protections (Aerial32 as example)
(Resource-ID fishing)

Converting Word-Docs (or anything you like) to PDF

Thanks to ashok hariharan for having collated this useful information


The Word document format is very lousy: it just takes up too much
space. Now you can convert all your huge documents to lightweight PDFs.
Just follow these steps blindly:
1) Go to the Start menu -> Settings -> Printers and select 'Add Printer'.
2) Select 'Local Printer' when it prompts you for a local or network printer.
3) From the list of printers, select any printer whose name ends with 'PS';
this indicates that the printer has PostScript support. I generally choose
something like 'HP Color Laserjet 5/5M PS'.
4) Click Next and, in the next screen, select FILE: as the port.
5) Click Next again and Finish (say No when asked whether it should be the default printer).
6) Download and install Ghostscript and GSview (GhostView) from http://www.cs.wisc.edu/~ghost/ (GSview is only a front end and needs Ghostscript to work).
7) Now open the Word document that you want to convert to PDF in
WinWord.
8) In WinWord select File->Print. As the printer name, select the printer
that you just added and check the option 'Print to file'. Now click OK.
9) In the 'Print to File' Save As dialog, save the file to a folder as filename.prn.
10) Now launch GSview.
11) Use File->Open to open filename.prn in GSview.
12) Now use File->Print. The printer setup dialog is displayed.
13) Set 'Device:' to pdfwrite and 'Resolution:' to 300, check 'Print to File'
and click OK; when prompted, enter filename.pdf as the output file name.
14) That's it! You can now view your old Word document as a PDF file in
Acrobat Reader.
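If you would rather skip the GSview dialogs, the same pdfwrite conversion can also be run from the command line, since GSview just drives Ghostscript. Here is a minimal C sketch, assuming a Ghostscript console executable on your PATH (called "gs" here; Windows builds usually name it "gswin32c"). The helper names build_gs_command() and prn_to_pdf() are hypothetical; -dBATCH, -dNOPAUSE, -sDEVICE=pdfwrite, -r300 and -sOutputFile= are standard Ghostscript options, with -r300 mirroring the resolution chosen in step 13.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds a Ghostscript command line that converts
   the PostScript .prn file from step 9 into a PDF using the pdfwrite
   device at 300 dpi (the settings of step 13).
   Returns 0 on success, -1 if a file name contains a double quote or
   the buffer is too small. */
int build_gs_command(char *buf, size_t size, const char *gs_exe,
                     const char *prn_file, const char *pdf_file)
{
    assert(buf != NULL && gs_exe != NULL && prn_file != NULL && pdf_file != NULL);

    /* File names are wrapped in double quotes below, so a name that
       itself contains one would break the command line. */
    if (strchr(prn_file, '"') != NULL || strchr(pdf_file, '"') != NULL)
        return -1;

    int n = snprintf(buf, size,
                     "%s -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -r300 "
                     "-sOutputFile=\"%s\" \"%s\"",
                     gs_exe, pdf_file, prn_file);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Runs the conversion through the shell. Returns the value of system()
   (0 when the command exited successfully, nonzero otherwise), or -1
   if the command could not be built. */
int prn_to_pdf(const char *gs_exe, const char *prn_file, const char *pdf_file)
{
    char cmd[1024];
    if (build_gs_command(cmd, sizeof cmd, gs_exe, prn_file, pdf_file) != 0)
        return -1;
    return system(cmd);
}
```

For example, prn_to_pdf("gswin32c", "filename.prn", "filename.pdf") should leave filename.pdf next to your .prn file, provided Ghostscript is installed as assumed above.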

Converting an Acrobat PDF into ASCII

Thanks to Wolfgang Redtenbacher for the bulk of the following advice about converting an Acrobat PDF file into ASCII text

Several solutions have been suggested, such as using Aerial (which can be easily cracked) or sending an e-mail to "[email protected]".

What does not seem to be widely known, however, is that there are freeware programs that convert PDF to TXT locally.

One solution is to download Acrobat 4.0 (ca. 6 MB) from www.adobe.com, plus the accessibility plug-in (ca. 1.2 MB) from "access.adobe.com". This plug-in lets you load a PDF file into Acrobat and save it as .TXT or .HTM.
An even better solution (in terms of program size and conversion quality) is the program "pdftotext", which is part of the XPDF package (a freeware PDF viewer for several operating systems).

You need to download the following files:

DOS: ftp://ftp.foolabs.com/pub/xpdf/xpdf-0.90-dos.zip (1298148 bytes) and
ftp://ftp.cdrom.com/pub/infozip/MSDOS/gzip124.exe (119146 bytes)

Win32: ftp://ftp.foolabs.com/pub/xpdf/xpdf-0.90-win32.zip (584326 bytes) and
http://www.gzip.org/gzip124xN.zip (62203 bytes)

After unpacking, you only need one file from each archive (either DOS or Win32):

pdftotext.exe (964341 bytes for DOS, 354304 bytes for Win32)
gzip.exe (39910 bytes for DOS, 91648 bytes for Win32)

Move these two files into a directory that is in your search path (environment variable PATH=...), enter the command "pdftotext xyz.pdf", and within seconds you get the ASCII text conversion in the file "xyz.txt" (replace "xyz" with the real file name, of course).
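Since the whole point is grepping and cataloguing, you will soon want to convert a whole pile of PDFs in one go. Below is a minimal C sketch of such a batch run, assuming pdftotext is on your PATH as described above; default_txt_name() and convert_all() are hypothetical helpers. default_txt_name() only predicts the "xyz.pdf" -> "xyz.txt" naming described in the text, so you know where to look for each result; also accepting ".PDF", and simply appending ".txt" to names without a .pdf suffix, are our assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: predicts the output name pdftotext uses when no
   text file is given ("xyz.pdf" -> "xyz.txt"). A ".pdf" or ".PDF"
   suffix is replaced; other names just get ".txt" appended (assumed).
   Returns 0 on success, -1 if out is too small. */
int default_txt_name(const char *pdf, char *out, size_t size)
{
    assert(pdf != NULL && out != NULL);
    size_t len = strlen(pdf);
    size_t stem = len;
    if (len >= 4 && (strcmp(pdf + len - 4, ".pdf") == 0 ||
                     strcmp(pdf + len - 4, ".PDF") == 0))
        stem = len - 4;
    int n = snprintf(out, size, "%.*s.txt", (int)stem, pdf);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Runs "pdftotext file" for every name in files (pdftotext must be on
   the PATH). Returns how many conversions failed or were skipped. */
int convert_all(const char *files[], int count)
{
    char cmd[1024];
    int failures = 0;
    for (int i = 0; i < count; i++) {
        /* Skip names that would break the quoting below. */
        if (strchr(files[i], '"') != NULL) {
            failures++;
            continue;
        }
        int n = snprintf(cmd, sizeof cmd, "pdftotext \"%s\"", files[i]);
        if (n < 0 || (size_t)n >= sizeof cmd) {  /* name too long */
            failures++;
            continue;
        }
        if (system(cmd) != 0)
            failures++;
    }
    return failures;
}
```

The resulting .txt files can then be grepped and catalogued with your usual tools.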

NOTE: While the Win32 version of "pdftotext.exe" is more compact than the DOS version (which contains additional DOS extender code), it does not work with the widespread DOS version of "gzip.exe" as it needs gzip with long file name support. Therefore make sure to use either both programs in the DOS version or both in the Win32 version. (The DOS version runs flawlessly on a Win32 platform - it is just a bigger EXE-file.)

Enabling Print-Challenged PDF Files


I've seen a number of queries recently about printing PDF files when the
Document Security doesn't allow printing, so I thought I'd pass this along
before I file it with my notes.

With Acrobat Reader 4.0, all that is required is to enable the Print menu item:
enable it and you can print. There is no second check before the code that
actually calls the Print Common Dialog! This is different from Acrobat Reader 3.x,
where there is a second check, which stupidly gives you a message box saying
"This Operation is Not Allowed" if you click on your newly enabled Print
function. By essentially saying "You shouldn't have been able to do this", it
gives the reverser something to work with to bypass it, of course.

The check for whether printing is allowed occurs as soon as you click on the File
drop-down menu. I was using the "Building Win95 Apps" PDF by Kevin Goodman, which is
available here and there.

You all know the Win32 API function EnableMenuItem of course:

The EnableMenuItem function enables, disables, or grays the specified menu item.

BOOL EnableMenuItem(
    HMENU hMenu,        // handle to menu
    UINT uIDEnableItem, // menu item to enable, disable, or gray
    UINT uEnable        // menu item flags
);

The 2nd parameter specifies the menu item in question: it is either the
identifier of the menu item, if uEnable contains the MF_BYCOMMAND flag,
or the zero-based relative position of the menu item, if uEnable contains
the MF_BYPOSITION flag. MF_BYCOMMAND is the default, but here the
MF_BYPOSITION flag is used.
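To make the two interpretations of the 2nd parameter concrete, here is a small portable C model of EnableMenuItem on a flat menu (no submenus). It does not call Windows: the flag values are the ones winuser.h defines (MF_ENABLED 0, MF_GRAYED 1, MF_DISABLED 2, MF_BYCOMMAND 0, MF_BYPOSITION 0x400), and, as documented for the real function, it returns the item's previous state, or -1 if the item does not exist. The MenuItem type and model_enable_menu_item() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values as defined in winuser.h, repeated here so the sketch
   compiles on any platform. */
#define MF_ENABLED    0x0000u
#define MF_GRAYED     0x0001u
#define MF_DISABLED   0x0002u
#define MF_BYCOMMAND  0x0000u
#define MF_BYPOSITION 0x0400u

typedef struct {
    unsigned id;     /* menu identifier, 0 for separators */
    unsigned state;  /* MF_ENABLED, or MF_GRAYED and/or MF_DISABLED */
} MenuItem;

/* Hypothetical model of EnableMenuItem: uIDEnableItem is read as a
   zero-based position when MF_BYPOSITION is set, otherwise as an
   identifier. Returns the previous state, or -1 if there is no such
   item. */
int model_enable_menu_item(MenuItem *menu, size_t count,
                           unsigned uIDEnableItem, unsigned uEnable)
{
    assert(menu != NULL);
    MenuItem *item = NULL;

    if (uEnable & MF_BYPOSITION) {
        /* Position: separators count too. */
        if (uIDEnableItem < count)
            item = &menu[uIDEnableItem];
    } else {
        /* MF_BYCOMMAND (value 0, the default): look up the identifier. */
        for (size_t i = 0; i < count; i++) {
            if (menu[i].id == uIDEnableItem) {
                item = &menu[i];
                break;
            }
        }
    }
    if (item == NULL)
        return -1;

    int previous = (int)item->state;
    item->state = uEnable & (MF_GRAYED | MF_DISABLED);
    return previous;
}
```

Note that MF_BYCOMMAND and MF_ENABLED are both 0, so a uEnable of MF_BYPOSITION | MF_ENABLED is simply 0x400.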

We can find out the position of the Print menu item in the drop-down list with
the GetMenuItemID function, which retrieves the identifier of the menu item
located at the specified position in a menu:

UINT GetMenuItemID(
    HMENU hMenu, // handle of menu
    int nPos     // position of menu item
);

For a regular menu item, the return value is its identifier.
For a submenu, the return value is 0xFFFFFFFF (-1).
For a separator, the return value is 0.

If you cycle through the GetMenuItemID calls with a breakpoint that watches the
2nd parameter (the first parameter PUSHed, as seen in SoftIce), you see an interesting
pattern forming. The first item in the drop-down list has position 0 (hex) and returns
an identifier, the second has position 1, and so on, separators included.

The following table can be made (positions and identifiers in hex):

Position  Identifier  Menu Item
0         1770        Open
1         0           Separator
2         1772        Close
3         0           Separator
4         1774        Page Setup
5         1775        Print
6         0           Separator
7         FFFFFFFF    Document Info (submenu)
8         FFFFFFFF    Preferences (submenu)
9         0           Separator
A         1783        Adobe Online
B         0           Separator
C         1785        Recent File 1
D         17CC        Recent File 2 (up to 4 files)
E         0           Separator
F         1787        Exit

So, without even going to this trouble, we can deduce the MF_BYPOSITION value
of that 2nd parameter of EnableMenuItem simply by counting the menu items in the
drop-down list, separators included, starting from 0.
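The counting rule can be checked against the table with a few lines of C. This sketch hard-codes the File menu as listed in the table (labels and identifiers from the author's trace, submenus given the documented 0xFFFFFFFF return value); file_menu, kind_of() and position_of() are hypothetical names, not anything found inside Acrobat.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { SEPARATOR, SUBMENU, COMMAND } ItemKind;

/* One entry per table row: the label and what GetMenuItemID returned. */
typedef struct {
    const char *label;
    uint32_t id;
} Entry;

/* Acrobat Reader 4.0 File menu in drop-down order; "-" is a separator. */
static const Entry file_menu[] = {
    {"Open", 0x1770},              {"-", 0},
    {"Close", 0x1772},             {"-", 0},
    {"Page Setup", 0x1774},        {"Print", 0x1775},
    {"-", 0},                      {"Document Info", 0xFFFFFFFFu},
    {"Preferences", 0xFFFFFFFFu},  {"-", 0},
    {"Adobe Online", 0x1783},      {"-", 0},
    {"Recent File 1", 0x1785},     {"Recent File 2", 0x17CC},
    {"-", 0},                      {"Exit", 0x1787},
};

/* Classify a GetMenuItemID return value (the three documented cases). */
ItemKind kind_of(uint32_t id)
{
    if (id == 0)
        return SEPARATOR;
    if (id == 0xFFFFFFFFu)
        return SUBMENU;
    return COMMAND;
}

/* The MF_BYPOSITION value of an item is just its zero-based index,
   separators and submenus included. Returns -1 if the label is absent. */
int position_of(const char *label)
{
    assert(label != NULL);
    size_t n = sizeof file_menu / sizeof file_menu[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(file_menu[i].label, label) == 0)
            return (int)i;
    return -1;
}
```

position_of("Print") gives 5, the position value used in the breakpoint conditions.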

OK, now what? We know that somewhere between the moment GetMenuItemID returns the
identifier (1775) of the Print menu item and the moment EnableMenuItem is called,
there is a check to see whether this file is actually supposed to be printable.

So how about doing a TRACE between the two and seeing what's going on?

We want the first break to be when the 2nd parameter of GetMenuItemID (the first
parameter PUSHed) is equal to 5 (the position number of Print in the drop down list).

When the breakpoint fires on entry to the function, the return address sits at
ESP, so the 1st parameter is at (ESP+4) and the 2nd at (ESP+8); this works:

BPX GetMenuItemID IF *(SS:ESP+8)==5
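Why ESP+8? A toy C model of the stack makes the offsets visible. It assumes the 32-bit stdcall convention used by Win32 API functions such as GetMenuItemID (arguments pushed right to left, then the return address pushed by CALL) and describes the stack at the moment a BPX on the function's first instruction fires. The Stack type, the helper names and the sample values in it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* A toy 32-bit stack that grows downwards like the x86 one;
   esp is a byte offset into mem. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;
} Stack;

static void push(Stack *s, uint32_t value)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* Read the dword at ESP+offset, like *(SS:ESP+offset) in SoftIce. */
static uint32_t dword_at(const Stack *s, uint32_t offset)
{
    uint32_t addr = s->esp + offset;
    assert(addr % 4 == 0 && addr / 4 < STACK_WORDS);
    return s->mem[addr / 4];
}

/* The caller's side of GetMenuItemID(hMenu, nPos): the result is the
   stack as a BPX on the function entry sees it. */
Stack call_get_menu_item_id(uint32_t hMenu, uint32_t nPos, uint32_t ret_addr)
{
    Stack s = { {0}, STACK_WORDS * 4 };
    push(&s, nPos);      /* 2nd parameter: PUSHed first */
    push(&s, hMenu);     /* 1st parameter */
    push(&s, ret_addr);  /* pushed by the CALL instruction */
    return s;
}
```

Once the function's own prologue has pushed EBP or other registers, these offsets no longer hold, which is why the condition is evaluated right at the breakpoint on entry.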

If we set up a macro to display this address in the data window we can verify it
broke at the right time:

MACRO Position = "dd SS:ESP+8"

and the first BPX can become:

BPX GetMenuItemID IF *(SS:ESP+8)==5 DO "Position"

When it breaks, press F11 to return to the caller and notice the menu identifier (1775)
in EAX and the position (5) in the first line of the data window.

Set up the second breakpoint similarly:

BPX EnableMenuItem IF *(SS:ESP+8)==5 DO "Position"
(again, we are looking at the 2nd parameter on the stack)

Then set up the Trace. You may want to increase the Trace Buffer size from the
default of 8K.

TASK will give the task name, Acrord32.

BPRW Acrord32 T will set the trace.

Press F5 and you will break back into SoftIce once the code between the two
function calls has been executed. Press F11 to return to Acrord32. You might temporarily
toggle off the Register/Data/Code windows with WR/WD/WC and maximize LINES before
typing SHOW to display the trace.

SHOW 1 will show the last instruction executed:

000001  0137:0054C117  FF159C295700    CALL  [USER32!EnableMenuItem]

and you can use the arrow keys to scroll up and down.

A full screen's worth of this trace gives a nice screen dump with the Icedump
PAGEIN N c:\filename.txt command.

Looking back through the traced code, you quickly spot a suspicious jump. You can
patch it, or force EAX to 1 a few lines earlier by changing

SBB EAX, EAX
INC EAX (EAX=0)

into

XOR EAX, EAX
INC EAX (EAX=1)
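Why does this work? SBB EAX, EAX subtracts EAX plus the carry flag from EAX, so it leaves 0 when CF is clear and 0xFFFFFFFF when CF is set; the following INC EAX turns that into 1 or 0. The "(EAX=0)" in the original code therefore means the carry flag was set at that point. XOR EAX, EAX ignores the flags, so INC EAX always gives 1. Here is a minimal C model of both sequences; the byte values are one common encoding of each instruction (check the actual bytes in your own disassembly), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One common encoding of each instruction: both are 2 bytes long,
   so the patch fits in place. */
static const uint8_t SBB_EAX_EAX[] = { 0x1B, 0xC0 };
static const uint8_t XOR_EAX_EAX[] = { 0x33, 0xC0 };

/* Original code: SBB EAX, EAX computes EAX - (EAX + CF) = -CF, i.e. 0
   if the carry flag is clear and 0xFFFFFFFF if it is set; INC EAX then
   yields 1 or 0. */
uint32_t original_sequence(int carry_flag)
{
    assert(carry_flag == 0 || carry_flag == 1);
    uint32_t eax = 0u - (uint32_t)carry_flag;  /* SBB EAX, EAX */
    eax += 1u;                                 /* INC EAX */
    return eax;
}

/* Patched code: XOR EAX, EAX clears EAX whatever the flags say,
   so INC EAX always yields 1. */
uint32_t patched_sequence(int carry_flag)
{
    (void)carry_flag;   /* the patched code no longer depends on CF */
    uint32_t eax = 0u;  /* XOR EAX, EAX */
    eax += 1u;          /* INC EAX */
    return eax;
}
```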

I won't give the actual addresses to patch, as that would take the fun out ;-)

Please correct any mistakes I might have made.

Cheers, Kayaker

The use of [email protected]

This is an extract from an email I wrote some time ago... common knowledge (see also Wolfgang Redtenbacher's contribution), but hey! it works fine for me!

More and more documents are stored in Adobe's pdf format on the Web.

That may be fine for frill-formatting purposes, but it is quite annoying for the rest of us, since pdf files are quite cumbersome for cut & paste and for search & grep purposes. I have realized that many don't know that there's a nice (email) utility by Adobe itself for those of you who prefer plain *.txt files (which can be searched, cut, pasted or grepped ad libitum).

Simply send an email with your pdf files attached (i.e. use the "insert file" option) to the following email address:

[email protected]

You don't need to include any body text or a subject.

After a couple of minutes: "Hey bingo!" you'll get your text file emailed back to you (for free of course).