Home » Programming » Archive by category "Python"

Convert HTML to PDF with HTML2PDF Web Service

HTML2PDF Web ServiceRecently I launched my new product HTML2PDF Web Service — a web service for converting HTML to PDF.

In this post I’d like to talk about HTML2PDF Web Service. Why to choose it, how to use it and what technologies were used to create it.

Why Choose HTML2PDF Web Service?

Programmatically generating PDF documents is a painful and time consuming problem that neither makes your developers nor designers happy. With HTML2PDF Web Service you can design your invoices or reports in HTML, style them with CSS and convert the resulting page into a PDF document. Using HTML2PDF Web Service saves your developers and designers time which is better spent making your product better.

Say your web application or mobile app (or any application for that matter) needs to generate invoices or reports in PDF format. Unless you can install special HTML to PDF conversion software you’re probably stuck with some of the libraries available for your language that can programmatically generate PDF documents. To do this you would probably design your document in something like MS Word, LibreOffice Writer or perhaps HTML. After this design has been approved you can start programming your PDF module; setting up coordinates, font sizes etc. And then all of the sudden you notice your library has limited support for doing actual document layouts and presenting tabular data that can span multiple lines. Now you need to write your own routines for splitting text over multiple lines, keep track of coordinates and make sure nothing overlaps. If like me you’ve already been there, it’s quite the nightmare.

So being able to design in HTML, style with CSS (heck, even use a bit of JavaScript) and convert the resulting page to PDF would speed up this process a lot. Am I starting to tickle your interest?

How to use HTML2PDF Web Service

Simply create your soon to be PDF documents in HTML, style them with CSS and if wanted you can use JavaScript as well. The final document is best previewed in a WebKit based browser such as Google Chrome, since that’s the technology HTML2PDF Web Service uses in the background to render the HTML and convert it to PDF.

Here are some examples on how to call the web service. Converting HTML to PDF is easy with the HTML2PDF Web Service. You can pass an URL to the page you want to convert or either send the HTML code with the request.

cURL

$ curl -H "X-API-Key: F8802062-4D31-11E3-8F59-BFD4058B6BFF"
       -H "X-API-Username: MyUsername"
       -d '{"content":"<html><head><title>My page</title></head><body><h1>Hello World!</h1><p>I am an HTML page converted to PDF!</p></body></html>"}'
       https://html2pdfwebservice.com/api/convert > page.pdf

Perl

#!/usr/bin/env perl
use strict;
use warnings;
use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;
my $tx = $ua->post(
    'https://html2pdfwebservice.com/api/convert' => {
        'X-API-Username' => 'MyUsername',
        'X-API-Key'      => 'F8802062-4D31-11E3-8F59-BFD4058B6BFF'
    } => json => {url => 'http://domain.com/invoice.html'}
);
if (my $res = $tx->success) {
    my $pdf_data = $res->body;
}

Ruby

require 'net/https'
require 'uri'

uri           = URI.parse('https://html2pdfwebservice.com/api/convert')
https         = Net::HTTP.new(uri.host, uri.port)
https.use_ssl = true
# In case the SSL certificate isn't accepted
https.verify_mode = OpenSSL::SSL::VERIFY_NONE

req = Net::HTTP::Post.new(uri.path)
req['X-API-Username'] = 'MyUsername'
req['X-API-Key']      = 'F8802062-4D31-11E3-8F59-BFD4058B6BFF'
req.body              = '{"url": "http://domain.com/invoice.html"}'

res = https.request(req)
if res.code == '200'
    pdf_data = res.body
    # - or write to file -
    # File.open('invoice.pdf', 'w') { |file| file.write(res.body) }
end

PHP

$settings = array(
    'url' => 'http://domain.com/invoice.html',
);

$curl = curl_init();
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_POSTFIELDS, json_encode($settings));
curl_setopt($curl, CURLOPT_HTTPHEADER, array(
    'X-API-Username: MyUsername',
    'X-API-Key: F8802062-4D31-11E3-8F59-BFD4058B6BFF'
));

curl_setopt($curl, CURLOPT_URL, 'https://html2pdfwebservice.com/api/convert');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Helps to debug in case of issues
// curl_setopt($curl, CURLOPT_VERBOSE, 1);

// In case the SSL certificate isn't accepted because of outdated certificates
// on your server
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

$res = curl_exec($curl);
// Save PDF to disk
file_put_contents('document.pdf', $res);
curl_close($curl);

Technologies used to develop HTML2PDF Web Service

The most interesting part in developing HTML2PDF Web Service was choosing which technology to use for converting HTML to PDF. After doing research on the subject and testing several solutions I eventually went with a WebKit based solution. By using WebKit it’s easier for the end user to preview their document using a WebKit based browser.

The HTML to PDF conversion server was developed using Go. Go is a fun language to program with, does concurrency in a really nice way and can produce a native executable for Linux, OS X, Windows and some other platforms. Thanks to Go the conversion server is fast, snappy and low on memory and CPU usage. Being able to create a binary executable allows me to sell the conversion server as a standalone product as well.

To get access to the web service there’s also a web application which is written in Perl. My favorite web framework of choice has become Mojolicious for quite some time now and thus HTML2PDF Web Service has been written with it. DBIx::Class has been used for database interaction and Validation::Class is used to validate all user inputted data.

Used databases are PostgreSQL and Redis. The former is used to store user accounts, subscriptions and more. The latter is used to keep track of token usage per user.

Sign up now for a free trial

If after reading all this and you’re still reading, please do sign up for a free trial. The trial gives full access to all the features of the web service so if you like it, please consider buying a subscription.

In case of any questions, please do contact me either through the comments on this page or send an e-mail to support at support@html2pdfwebservice.com.

Create a PDF document out of an HTML page

Perl has several modules on CPAN for creating and manipulating PDF files. Just a single search on PDF results in over 500 modules that have something to do with PDF files.

The most useful (or rather essential for PDF processing) are PDF::API2 and CAM::PDF. The former lends itself best for creating PDF’s and the latter for manipulating existing PDF’s and extracting data (such as plain text) from it.

Though these modules make handling PDF’s easier, handling PDF’s still isn’t much fun. As I was in need of a way to generate PDF’s out of work orders (or job tickets) and not feeling much for creating the layout manually and properly formatting paragraphs (manually) with PDF::API2 I started to look further.

I ended up trying out PDF::FromHTML. With PDF::FromHTML you can create a simple HTML layout and let the module create a PDF out of it. You can do some basic configuration such as changing fonts and font-size (check out its documentation for more). It also provides a nifty command line tool called html2pdf.pl for converting an HTML page to a PDF.

The resulting PDF’s from PDF::FromHTML weren’t as pretty as I had wanted, but good enough for the problem I needed solving. But after I started using these work order PDF’s in practice I found I needed more formatting freedom when writing the problem description. So I decided to add Markdown support through Text::Markdown.

Using Markdown I had added a list of tasks to a work order with the items being in bold text and the descriptions underneath it in normal text. Sadly the PDF’s created by PDF::FromHTML didn’t cope very well with nested HTML-elements. A bold paragraph would somehow cause the next paragraph become bold as well. I think that’s a bug in PDF::FromHTML and I’m sure it can be fixed and shame on me for not looking into it.

So instead of seeing if I could fix the bug I did a quick search on the internet and stumbled upon xhtml2pdf, which is provided by python-pisa/xhtml2pdf. Pisa is a Python library for converting HTML pages to PDF’s. It’s far more sophisticated than PDF::FromHTML as it supports more (all?) HTML tags and even CSS2 (plus some CSS3 stuff) for styling.

Currently my webapp will be using xhtml2pdf if it’s available or either fall back to PDF::FromHTML.

Some other interesting Perl PDF modules worth looking into some day are PDF::Boxer and PDF::TextBlock. And while writing this post I also found out that PhantomJS, a headless WebKit, also has a way of saving a page to PDF. So even though handling PDF’s still isn’t a lot of fun, with all these modules and software available it has become a lot easier.

Want to use a web service to convert HTML to PDF? Then take a look at HTML2PDF Web Service.

Remote controlled boat using the Raspberry Pi with an Arduino

In the last 8 weeks or so I was working together with 3 other students researching some of the possibilities of the Raspberry Pi and what it could offer the school for future education and research purposes. The results of the many small research subjects came together in a remote controlled boat.

Sadly due to me getting sick this week I haven’t been able to properly finish the project myself, but the other team members were able to though.

The boat’s (server) software is a Python app running on the Raspberry Pi (model A). This app is responsible for communicating with the Arduino through I²C and basically tells the Arduino which channels to modify. The Arduino runs a simple C program for this. There are 2 separate motor controllers which are connected to the Arduino. The rudder is a single servo which is also connected to the Arduino. Initially we had planned on mounting a camera on the pan-tilt mechanism which is located up front, but the ideas we had for implementing computer vision didn’t quite work out. The pan-tilt mechanism has two servo’s, both connected to the Arduino. So in short the Arduino drives all the hardware and the Raspberry Pi tells the Arduino what to do.

The Arduino itself is ‘mounted’ to the Raspberry Pi with the use of a custom made connector board. The Arduino is placed on top of that. On top of the Arduino is a custom made shield which has pins for connecting the motor controllers and servo’s and powers both the Arduino and Raspberry Pi.

We decided to call it the Stackberry Pi.

Stackberry Pi

Here’s our workroom where the boat was being worked on.

Raspberry Pi + Arduino powered boat

And finally its first test run!

Raspberry Pi + Arduino powered boat

But motionless pictures are no fun. Here’s a video of the boat’s first test run.


In case you’d like to hear more details about the boat please do ask.