Create a PDF document out of an HTML page

Perl has several modules on CPAN for creating and manipulating PDF files. A single search for “PDF” returns over 500 modules that have something to do with PDF files.

The most useful (or rather, essential for PDF processing) are PDF::API2 and CAM::PDF. The former lends itself best to creating PDFs, and the latter to manipulating existing PDFs and extracting data (such as plain text) from them.

Though these modules make handling PDFs easier, handling PDFs still isn’t much fun. I needed a way to generate PDFs from work orders (or job tickets), and I didn’t feel like building the layout and formatting paragraphs by hand with PDF::API2, so I started to look further.

I ended up trying out PDF::FromHTML. With PDF::FromHTML you can create a simple HTML layout and let the module create a PDF out of it. You can do some basic configuration, such as changing the font and font size (check out its documentation for more). It also provides a nifty command line tool called html2pdf.pl for converting an HTML page to a PDF.

The resulting PDFs from PDF::FromHTML weren’t as pretty as I had wanted, but they were good enough for the problem I needed to solve. Once I started using these work order PDFs in practice, though, I found I needed more formatting freedom when writing the problem description. So I decided to add Markdown support through Text::Markdown.

Using Markdown, I added a list of tasks to a work order, with each item in bold text and its description underneath in normal text. Sadly, the PDFs created by PDF::FromHTML didn’t cope very well with nested HTML elements: a bold paragraph would somehow cause the next paragraph to become bold as well. I think that’s a bug in PDF::FromHTML. I’m sure it can be fixed, and shame on me for not looking into it.

So instead of seeing if I could fix the bug, I did a quick search on the internet and stumbled upon xhtml2pdf, which is provided by python-pisa/xhtml2pdf. Pisa is a Python library for converting HTML pages to PDFs. It’s far more sophisticated than PDF::FromHTML, as it supports more (all?) HTML tags and even CSS2 (plus some CSS3 stuff) for styling.

Currently my webapp uses xhtml2pdf if it’s available and otherwise falls back to PDF::FromHTML.
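To make the fallback idea concrete, here is a minimal sketch. It is written in Python purely for illustration and is not the webapp’s actual code. It assumes both converters are installed as command line tools named `xhtml2pdf` and `html2pdf.pl`, and that each accepts a source HTML file followed by a target PDF file; check each tool’s documentation before relying on that. The function names `pick_converter` and `convert` are made up for this example.

```python
import shutil
import subprocess
from typing import Callable, List, Optional


def pick_converter(
    src: str,
    dest: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the command line for the first converter found on PATH."""
    # Preferred: the xhtml2pdf command line tool.
    # Assumption: it is invoked as `xhtml2pdf source.html target.pdf`.
    if which("xhtml2pdf"):
        return ["xhtml2pdf", src, dest]
    # Fallback: the html2pdf.pl script that comes with PDF::FromHTML.
    # Assumption: it is invoked as `html2pdf.pl source.html target.pdf`.
    if which("html2pdf.pl"):
        return ["html2pdf.pl", src, dest]
    raise RuntimeError("no HTML-to-PDF converter found on PATH")


def convert(src: str, dest: str) -> None:
    """Run the chosen converter and raise if it exits with an error."""
    subprocess.run(pick_converter(src, dest), check=True)
```

The lookup function is passed in as a parameter so the selection logic can be exercised without either tool being installed.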

Some other interesting Perl PDF modules worth looking into some day are PDF::Boxer and PDF::TextBlock. And while writing this post I also found out that PhantomJS, a headless WebKit browser, can save a page to PDF as well. So even though handling PDFs still isn’t a lot of fun, all these modules and tools have made it a lot easier.

Want to use a web service to convert HTML to PDF? Then take a look at HTML2PDF Web Service.

Using HTML2PDF Web Service you can design in HTML and CSS, and convert the resulting page to PDF. Free trial available!

toa

August 12, 2013 at 22:44

There is a way to use WebKit through GTK.

Check http://bratislava.pm.org/presentation/WebKit-en.pdf

Another option is wkhtmltopdf with a virtual framebuffer (xvfb-run).

Htbaa

August 12, 2013 at 22:56

That’s a nice resource! Looks like you can do some stuff with Cairo as well. GTK wouldn’t be required for generating a PDF with WebKit though right? Because that would be quite a big dependency.

Looks like there’s also a Perl module for wkhtmltopdf called PDF::WebKit (https://metacpan.org/module/PDF::WebKit). Shame I didn’t find that one before. But I must say Python’s xhtml2pdf tool does a good job as well.

So instead of using PhantomJS for rendering I could also turn to using the WebKit wrapper.

August 13, 2013 at 00:06

> Looks like you can do some stuff with Cairo as well. GTK wouldn’t be required for generating a PDF with WebKit though right?

Maybe, I didn’t try Cairo without the Gtk bindings.

Dominic Skinner

January 26, 2017 at 11:51

There are lots of APIs that help you do this in Perl as well, such as GrabzIt’s HTML to PDF API: http://grabz.it/html-to-pdf-image-api.aspx

January 26, 2017 at 13:12

I’ll stick to my own solution: HTML2PDF Web Service at http://html2pdfwebservice.com 🙂
