Create a PDF document out of an HTML page

Perl has several modules on CPAN for creating and manipulating PDF files. Just a single search on PDF results in over 500 modules that have something to do with PDF files.

The most useful (or rather, essential for PDF processing) are PDF::API2 and CAM::PDF. The former lends itself best to creating PDFs and the latter to manipulating existing PDFs and extracting data (such as plain text) from them.

Though these modules make handling PDFs easier, handling PDFs still isn’t much fun. I needed a way to generate PDFs out of work orders (or job tickets), and I didn’t feel like building the layout and formatting paragraphs by hand with PDF::API2, so I started to look further.

I ended up trying out PDF::FromHTML. With PDF::FromHTML you can create a simple HTML layout and let the module create a PDF out of it. You can do some basic configuration such as changing fonts and font-size (check out its documentation for more). It also provides a nifty command line tool called html2pdf.pl for converting an HTML page to a PDF.

The resulting PDFs from PDF::FromHTML weren’t as pretty as I had wanted, but they were good enough for the problem I needed to solve. After I started using these work order PDFs in practice, though, I found I needed more formatting freedom when writing the problem description, so I decided to add Markdown support through Text::Markdown.

Using Markdown I added a list of tasks to a work order, with the items in bold text and the descriptions underneath them in normal text. Sadly the PDFs created by PDF::FromHTML didn’t cope very well with nested HTML elements: a bold paragraph would somehow cause the next paragraph to become bold as well. I think that’s a bug in PDF::FromHTML and I’m sure it can be fixed; shame on me for not looking into it.

So instead of seeing if I could fix the bug I did a quick search on the internet and stumbled upon xhtml2pdf, which is provided by python-pisa/xhtml2pdf. Pisa is a Python library for converting HTML pages to PDFs. It’s far more sophisticated than PDF::FromHTML, as it supports more (all?) HTML tags and even CSS2 (plus some CSS3 stuff) for styling.

Currently my webapp uses xhtml2pdf if it’s available and otherwise falls back to PDF::FromHTML.
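The fallback described above can be sketched roughly as follows. This is only an illustration in Python, not the code my webapp actually runs: it assumes both converters are installed as command-line tools named xhtml2pdf and html2pdf.pl, and the helper name pick_converter is made up for this example.

```python
import shutil
from typing import Callable, List, Optional

# Converters in order of preference. The names are the tools mentioned in the
# post; that both are reachable as executables on PATH is an assumption.
CONVERTERS: List[str] = ["xhtml2pdf", "html2pdf.pl"]


def pick_converter(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first converter found on PATH, or None if neither exists.

    `which` is injectable so the lookup can be exercised without having
    either tool installed.
    """
    for name in CONVERTERS:
        if which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    converter = pick_converter()
    if converter is None:
        print("no HTML-to-PDF converter found")
    else:
        print("would convert with", converter)
```

The chosen name can then be handed to something like subprocess.run, with whatever arguments the selected tool expects.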

Some other interesting Perl PDF modules worth looking into some day are PDF::Boxer and PDF::TextBlock. While writing this post I also found out that PhantomJS, a headless WebKit browser, has a way of saving a page to PDF. So even though handling PDFs still isn’t a lot of fun, all these modules and tools have made it a lot easier.

Want to use a web service to convert HTML to PDF? Then take a look at HTML2PDF Web Service.

Twitter Bootstrap 3

It looks like Twitter Bootstrap 3 is just around the corner! I’ve been using Twitter Bootstrap 2 for an internal project and have used it for other quick prototypes as well (including the first Twitter Bootstrap). It’s an awesome framework for quickly putting a nice looking user interface together.

Version 3 promotes itself as mobile-first, which I guess is getting more and more important these days.

At first glance it looks like the grid system is the biggest change for Twitter Bootstrap 2 users. The default style has changed a bit too: no more gradients and box-shadows (though it seems they will come back). You can always add them back yourself, but I liked the default style of version 2. A full list of changes can be found on GitHub.

A post with the best new features in Bootstrap 3.0 gives more in-depth information on what changed. Good stuff!

Vim essentials: Powerline

I thought it would be nice to start a series of posts on neat tricks and plugins for Vim, called Vim Essentials. I don’t know how often I will do this kind of post, but I’ve got a couple in mind already.

I’ll start this first post off with Powerline, a plugin for Vim that gives you a better status bar than the default one. Your current mode, filetype and file format are shown more clearly. See the picture below for an example.



After you download and install Powerline you also need to make sure that your .vimrc/.gvimrc has the following lines:

set laststatus=2   " Always show the statusline
set encoding=utf-8 " Necessary to show Unicode glyphs

Adding these two lines makes sure that the status bar is always shown and that the characters used in it can be displayed.

Looking at the screenshots provided by the author of Powerline it should also be possible to display the current Git branch, but I’ve yet to find out how to do so. The version I’ve linked here is deprecated in favor of the Python rewrite, which can be found at https://github.com/Lokaltog/powerline.

Speeding up your website with Varnish

For a week or two now I’ve been using Varnish in front of the websites on my server. Varnish is marketed as a “web application accelerator”. To be honest, the homepage of the project doesn’t really give a good description of what it does. For that you’ve got to dive a little bit deeper into their website; the about page actually advises you to read the Wikipedia article.

Basically it’s an HTTP reverse proxy that caches responses to requests in virtual memory. It’s also capable of load balancing and supports health checking of your backends (i.e. the actual apps generating and serving your pages).

So why cache HTTP responses in memory? Several reasons actually:

  • Your webapp will most likely take more time to generate a page than it takes to serve a static copy of it. Don’t believe me? Benchmark a WordPress site.
  • Since your webapp will most likely generate the same page for the same route over and over, regenerating it every time is a waste of CPU. With Varnish caching the response, your app only has to do this once. Without Varnish (or any other caching in your server stack) you can do some caching on the client side, but that still requires every visitor to request the page from your app at least once.
  • Continuing from the last sentence of the previous point, imagine getting Slashdotted. Since Varnish will be serving a cached copy of the page you’re offloading your app and your server’s resources won’t get eaten up by all your new simultaneous visitors.

This all sounds very nice, and it is, but there are some caveats. When Varnish decides whether it can serve a request from its cache, it looks not only at the requested URL but also at the request headers. Most websites place quite a few cookies on the visitor’s PC, and those are sent back to your server as part of every request. When Varnish, in its default configuration, sees cookies being sent by the client, it forwards the request directly to your app and never caches the page.
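To make the behaviour above a bit more concrete, here is a toy model in Python. It is not how Varnish is implemented or configured (Varnish uses its own configuration language); the class ToyCache and its methods are invented for this illustration. It only shows the two effects described so far: a repeated request is served from memory, and a request carrying a cookie goes straight to the app.

```python
from typing import Callable, Dict, Optional


class ToyCache:
    """A toy stand-in for a caching reverse proxy (not Varnish's real logic)."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self.backend = backend            # the "app" that renders a page for a URL
        self.store: Dict[str, str] = {}   # cached pages, keyed by URL
        self.backend_calls = 0            # how often the app had to do real work

    def _fetch(self, url: str) -> str:
        self.backend_calls += 1
        return self.backend(url)

    def get(self, url: str, cookie: Optional[str] = None) -> str:
        # A request that carries a cookie is passed to the backend and the
        # response is not stored, mirroring the default behaviour described above.
        if cookie:
            return self._fetch(url)
        # Otherwise render the page once and serve the stored copy afterwards.
        if url not in self.store:
            self.store[url] = self._fetch(url)
        return self.store[url]


if __name__ == "__main__":
    cache = ToyCache(lambda url: "page for " + url)
    cache.get("/about")
    cache.get("/about")
    cache.get("/about", cookie="session=abc")
    print("backend calls:", cache.backend_calls)
```

In this model two anonymous requests for the same URL cost one backend call, while every request with a cookie costs one more.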

Luckily you can tell Varnish to ignore cookies in certain cases, so it’ll cache a page and serve a cached copy of it. The kind of cookies you can usually ignore are those from third parties such as Google Analytics. Your app (server) probably doesn’t need them, so they can be ignored. The client-side JavaScript that requires these cookies still has access to them.
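The idea of ignoring analytics cookies can be sketched like this. Again this is plain Python for illustration, not Varnish configuration, and the function name strip_analytics_cookies is made up. The list of cookie names (__utma and friends, _ga, _gat, _gid) is an assumption about which names Google Analytics uses; adjust it to whatever your own site actually sets.

```python
import re
from typing import List, Optional

# Cookie names treated as analytics cookies. This pattern is an assumption
# made for the example, not an authoritative list.
ANALYTICS_COOKIE = re.compile(r"__utm[a-z]+|_ga|_gat|_gid")


def strip_analytics_cookies(cookie_header: Optional[str]) -> Optional[str]:
    """Remove analytics cookies from a Cookie header value.

    Returns None when nothing is left, meaning the request could be treated
    as cookie-free and therefore cacheable.
    """
    if not cookie_header:
        return None
    kept: List[str] = []
    for part in cookie_header.split(";"):
        part = part.strip()
        if not part:
            continue
        name = part.split("=", 1)[0].strip()
        if not ANALYTICS_COOKIE.fullmatch(name):
            kept.append(part)
    return "; ".join(kept) or None


if __name__ == "__main__":
    print(strip_analytics_cookies("__utma=1.2.3; session=abc"))
```

Only the header seen by the server side is changed here; the cookies in the visitor’s browser are untouched, which is why the client-side script keeps working.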

Pages with user-specific content, such as greeting messages, are also something you don’t want Varnish to cache. Since this usually requires a cookie, Varnish won’t cache the page anyway. But if you identify users in some other way, take care: you don’t want Mike to see “Welcome back, John” when Mike visits your website after John. Dynamic parts of a page can be supported with Edge Side Includes (ESI). ESI is similar to Server Side Includes (SSI). With ESI Varnish still caches your page, but the dynamic content is inserted at the places where the ESI statements are put when the page is served.

As I’m still new to Varnish myself, I’d like to hear about your experience. If I’ve said something here that’s off, please let me know so I can correct it. If I learn some neat Varnish tricks in the future I’ll blog about them here. For now I’m very pleased with running Varnish in front of my HTTP server and I feel pretty confident in case I get Slashdotted (unlikely though).
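Roughly, the ESI idea works like the sketch below. It is a toy model in Python, not Varnish’s ESI processor: it handles only the esi:include tag, and the function render_esi and the /greeting fragment URL are invented for the example.

```python
import re
from typing import Callable

# Matches <esi:include src="..."/> tags; no other ESI element is handled here.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def render_esi(cached_page: str, fetch_fragment: Callable[[str], str]) -> str:
    """Fill in the ESI include tags of a cached page at serve time.

    `cached_page` is the shared, cached copy of the page; `fetch_fragment`
    stands in for a request to the app for the per-visitor part.
    """
    return ESI_INCLUDE.sub(lambda m: fetch_fragment(m.group(1)), cached_page)


if __name__ == "__main__":
    page = '<h1>Shop</h1><esi:include src="/greeting"/><p>Static part</p>'
    print(render_esi(page, lambda src: "Welcome back, John"))
    print(render_esi(page, lambda src: "Welcome back, Mike"))
```

The page itself is cached once, and only the small greeting fragment is fetched for each visitor.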


A little while ago I claimed the domain name meldpuntdierenleed.nl (animal cruelty reporting). For months I didn’t do anything with it, mainly due to health reasons.

But after reading the Dive Into HTML5 book (which I posted about earlier) and feeling a little better, I decided to get something up and running. It’s a simple HTML5 page with some CSS3 styling. I’ve only really tested it in Chromium and Firefox 3.6. It’s only a static page with some links to websites like the WSPA where you can report animal cruelty. Along with it are some Google Ads, and Google Analytics is measuring statistics for me.

I don’t have any other plans yet with meldpuntdierenleed.nl but figured having something on it is better than nothing.