[lug] Combining pdf documents

rm at fabula.de
Tue Jan 14 07:30:56 MST 2003

On Mon, Jan 13, 2003 at 11:17:05AM -0700, J. Wayde Allen wrote:
> On Mon, 13 Jan 2003 rm at fabula.de wrote:
> > I assume your question implies that you don't want/can't use
> > Acrobat for that task (not the reader - inserting documents is
> > only available in the full version).
> No, I needed to explore the range of possibilities.  We do have the option
> of using Acrobat for this task.  The actual question is how to take papers
> submitted to the ISART conference
> <http://www.its.bldrdoc.gov/meetings/art/paper_instructions.html> and
> combine them to create the proceedings.

Ah, that sounds like a tricky job. I toyed around with Perl's
PDF::API2 module yesterday just to see what would be possible without
getting into deep-sea hacking: I hacked together a simple script in
no time, but, depending on the input documents, the module eats memory
(and looking at the code I'd say it still needs a lot of redesign and
cleanup :-/ ). However, given your specific requirements
(especially modifying content like page numbers) I'd strongly advise against
using PDF as a submission format. PDF is a display format: almost no
structural information is present. You'd have to require all your
authors to use some special _visual_ markup to tag relevant bits of information
(like: use Adobe-Comic-Sans for page numbers so our program can identify them
during processing).

> > In theory it should be possible to combine pdf documents by
> > reading their document catalogs (the root object, found via the
> > trailer at the end of each file) and adding all object trees
> > to a newly created root object (but you would need to renumber
> > all objects to avoid duplicated object IDs). Doable, but most
> > likely not fun ....
> Actually, it turns out that there is indeed a provision in the Acrobat
> system for combining multiple pdf's into a single pdf.  For information on
> that see <http://adobedoc.kanisasolution.com/Acrobat5/Help.htm>.  The real
> catch seems to be how to renumber pages.  We are still tinkering with
> making this work right but the information I've found is:
>    http://adobedoc.kanisasolution.com/Acrobat5/Help.htm
>    http://www.planetpdf.com/mainpage.asp?webpageid=2180
> Our manuscript editor has been playing with the technique in the second
> URL here and has had some success.

Yes, but I think you'd want a solution that works reliably all
of the time ...
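The object renumbering idea quoted above can be sketched in a few lines of Python. This is a simplified, hypothetical model, not how PDF::API2 or Acrobat do it: each document is assumed to be already parsed into a dict mapping object numbers to the raw text of each object, and generation numbers, cross-reference tables and compressed object streams are ignored. Each document's objects are shifted past the previous document's highest object number, and every indirect reference ("N G R") is rewritten to match.

```python
from __future__ import annotations

import re

# An indirect reference looks like "12 0 R": object number, generation, "R".
# Simplification: a regex also matches look-alike text inside strings or
# content streams; a real merger has to tokenize the PDF syntax instead.
REF = re.compile(r"\b(\d+) (\d+) R\b")


def renumber(objects: dict[int, str], offset: int) -> dict[int, str]:
    """Shift every object number, and every reference to it, by offset."""
    def shift(m: re.Match) -> str:
        return f"{int(m.group(1)) + offset} {m.group(2)} R"

    return {num + offset: REF.sub(shift, body) for num, body in objects.items()}


def merge(docs: list[dict[int, str]]) -> dict[int, str]:
    """Combine several object tables into one with no duplicate IDs."""
    merged: dict[int, str] = {}
    offset = 0
    for objects in docs:
        merged.update(renumber(objects, offset))
        # The next document starts after this one's highest shifted number.
        offset += max(objects, default=0)
    return merged
```

The merged table still needs a new root /Pages node whose /Kids list each document's shifted page-tree root (each of those nodes then gets a matching /Parent entry), plus a fresh catalog, xref table and trailer; the sketch only covers the duplicate-ID problem.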

> In any case, our fall back position is simply to create whatever front and
> back material we want.  Print everything to paper, stack it all in the
> order desired.  Use a bit of white out if necessary and one of those old
> fashioned things called a typewriter to renumber the pages if necessary,
> and submit this stack to the printer.  That is the format the printer
> wants anyway.
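If the merge is done electronically instead, PDF page labels (the /PageLabels number tree in the document catalog, available since PDF 1.3) can give the combined file roman-numbered front matter followed by continuous arabic page numbers. A minimal Python sketch, assuming you know each paper's page count; the function names are made up for illustration. Labels only change the numbers a viewer displays: numbers printed in each paper's own footer are page content and would still have to be stamped over.

```python
from __future__ import annotations

from itertools import accumulate


def paper_start_pages(page_counts: list[int], first: int = 1) -> list[int]:
    """Proceedings page number on which each paper starts."""
    # accumulate(..., initial=first) yields first, first+c0, first+c0+c1, ...
    # The last value is one past the final page, so drop it.
    return list(accumulate(page_counts, initial=first))[:-1]


def page_labels(front_matter_pages: int) -> str:
    """A /PageLabels entry for the document catalog (hypothetical helper).

    Page indices in /Nums are zero-based. Front matter gets lowercase roman
    numerals (/S /r); the body restarts at 1 in arabic numerals (/S /D).
    """
    ranges = []
    if front_matter_pages:
        ranges.append("0 << /S /r >>")
    ranges.append(f"{front_matter_pages} << /S /D /St 1 >>")
    return f"/PageLabels << /Nums [ {' '.join(ranges)} ] >>"
```

For three papers of 8, 6 and 10 pages, paper_start_pages([8, 6, 10]) gives [1, 9, 15], which is also what the table of contents needs.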

> The reason for not wanting to work with source material is that technical
> papers tend to contain a good number of equations.  The variation in
> possible document creation systems as well as the danger of accidental
> font substitution is a problem we have to deal with all too often.  The
> hope is that pdf's with embedded fonts will minimize this.
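Whether a submitted PDF really embeds its fonts can be checked before accepting it. The following is a rough Python heuristic, not a PDF parser: it scans uncompressed objects for font descriptors (/Type /FontDescriptor) that lack a /FontFile, /FontFile2 or /FontFile3 entry. It will miss descriptors stored inside compressed object streams (PDF 1.5 and later), so a tool such as pdffonts from xpdf/Poppler, which reports an "emb" column per font, is the more reliable check.

```python
from __future__ import annotations

import re

# Rough patterns over raw PDF bytes (heuristic; assumes uncompressed objects).
OBJ = re.compile(rb"\d+\s+\d+\s+obj\b(.*?)\bendobj", re.DOTALL)
IS_DESCRIPTOR = re.compile(rb"/Type\s*/FontDescriptor\b")
FONT_NAME = re.compile(rb"/FontName\s*/([^\s/\[\]<>()]+)")
EMBEDDED = re.compile(rb"/FontFile[23]?\b")


def unembedded_fonts(pdf: bytes) -> list[str]:
    """Names of fonts whose descriptor has no /FontFile, /FontFile2 or /FontFile3."""
    missing = []
    for m in OBJ.finditer(pdf):
        body = m.group(1)
        if IS_DESCRIPTOR.search(body) and not EMBEDDED.search(body):
            name = FONT_NAME.search(body)
            missing.append(name.group(1).decode("latin-1") if name else "?")
    return missing
```

Typical use would be unembedded_fonts(open("paper.pdf", "rb").read()); an empty list only means nothing suspicious was found in the uncompressed part of the file.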

Oh, I'd use either (La)TeX (if the submitters are competent computer users)
or some sort of XML language (MathML looks like it's capable of handling
fairly complex equations) and transform it to LaTeX during processing.
The nice side effect of this: your publication looks much more uniform.
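As an illustration of the MathML-to-LaTeX step, here is a minimal recursive translator in Python for a handful of presentation MathML elements (math, mrow, mi, mn, mo, mfrac, msup, msub, msqrt). It is only a sketch: any other element raises an error, and multi-letter identifiers such as "sin" come out as plain letters rather than \sin. A real pipeline would more likely use XSLT or an existing converter.

```python
import xml.etree.ElementTree as ET


def to_latex(node: ET.Element) -> str:
    """Translate a small subset of presentation MathML into LaTeX."""
    tag = node.tag.split("}")[-1]  # drop any {namespace} prefix
    kids = [to_latex(child) for child in node]
    text = (node.text or "").strip()
    if tag in ("math", "mrow"):
        return " ".join(kids)
    if tag in ("mi", "mn", "mo"):
        return text
    if tag == "mfrac":
        return rf"\frac{{{kids[0]}}}{{{kids[1]}}}"
    if tag == "msup":
        return f"{{{kids[0]}}}^{{{kids[1]}}}"
    if tag == "msub":
        return f"{{{kids[0]}}}_{{{kids[1]}}}"
    if tag == "msqrt":
        # msqrt treats its children as one implied row
        return rf"\sqrt{{{' '.join(kids)}}}"
    raise ValueError(f"unsupported MathML element: {tag}")
```

For example, the MathML for y = sqrt(x^2 + 1) (an msqrt wrapping msup, mo and mn) comes out as y = \sqrt{{x}^{2} + 1}.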

       Ralf Mattes
> - Wayde
>   (wallen at lug.boulder.co.us)
>       --------------------------------------------------------
>                             ISART 2003
>        International Symposium on Advanced Radio Technologies
>          http://www.its.bldrdoc.gov/meetings/art/index.html
>       --------------------------------------------------------
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: lug.boulder.co.us port=6667 channel=#colug
