What are the best tools or services for converting a page of a PDF file to an image?
The first thing that usually shows up when searching for how to convert PDF to PNG or JPEG is ImageMagick's convert command, but it uses Ghostscript to render PDFs, and Ghostscript has had security issues, so I ruled that out.
As an alternative, my first thought was to render the PDF with PDF.js in Puppeteer, then take a screenshot of the full page.
I then realised that PDF.js can run on a server with Node, rendering the PDF to a canvas provided by node-canvas and then saving it as PNG or JPEG.
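A minimal sketch of that server-side rendering, assuming the pdfjs-dist package (2.x or 3.x, which ship a CommonJS build at the path used below) and the canvas package (node-canvas) are installed. The names renderPage and scaleForWidth are hypothetical, and some PDFs may need loader options that are not shown here.

```javascript
// Sketch only. Assumes `pdfjs-dist` 2.x/3.x (legacy CommonJS build) and `canvas`.

// Scale factor that makes a page `pageWidth` units wide come out `targetWidth` pixels wide.
function scaleForWidth(pageWidth, targetWidth) {
  if (!(pageWidth > 0) || !(targetWidth > 0)) {
    throw new RangeError('widths must be positive numbers');
  }
  return targetWidth / pageWidth;
}

// Renders one page (1-indexed) of a PDF held in a Buffer and returns the encoded image.
async function renderPage(pdfBuffer, pageNumber, targetWidth, mimeType = 'image/png') {
  // Loaded lazily so the helper above works without the native dependencies.
  const pdfjsLib = require('pdfjs-dist/legacy/build/pdf.js');
  const { createCanvas } = require('canvas');

  const pdf = await pdfjsLib.getDocument({ data: new Uint8Array(pdfBuffer) }).promise;
  const page = await pdf.getPage(pageNumber);

  // Measure the page at scale 1, then rescale to the requested pixel width.
  const natural = page.getViewport({ scale: 1 });
  const viewport = page.getViewport({ scale: scaleForWidth(natural.width, targetWidth) });

  const canvas = createCanvas(Math.ceil(viewport.width), Math.ceil(viewport.height));
  await page.render({ canvasContext: canvas.getContext('2d'), viewport }).promise;

  // node-canvas accepts 'image/png' or 'image/jpeg' here.
  return canvas.toBuffer(mimeType);
}
```

Under those assumptions, calling renderPage(fs.readFileSync('test.pdf'), 1, 1600) would resolve to a PNG buffer of the first page, 1600 pixels wide, ready to be written to disk or sent in a response.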
This approach worked nicely locally, but it ran into problems as a web service on now.sh: the PDFs did not embed their fonts and instead expected a standard set of fonts to be available on the system.
Running the web service as an Express app in a Docker container based on the full node:12 Docker image was the answer. Given a PDF URL, this web service can produce PNG or JPEG images of any page in a PDF at any size.
I then looked at the image processing library libvips. It uses PDFium for reading PDF files if available, and otherwise falls back to Poppler. PDFium doesn't seem easy to work with at the moment, so I didn't try too hard to build it and make it available to vips. Instead, I simply ran vipsthumbnail test.pdf --size 1600x -o test.jpg and got an image of the first page!
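Since the rest of this experiment ran in Node, here is a rough sketch of calling that same vipsthumbnail command from a script. It assumes vipsthumbnail is on the PATH and uses only the flags from the command above; the function names are hypothetical.

```javascript
const { execFile } = require('child_process');

// Builds the argument list for the vipsthumbnail invocation shown above.
function vipsthumbnailArgs(inputPath, width, outputPath) {
  return [inputPath, '--size', `${width}x`, '-o', outputPath];
}

// Runs vipsthumbnail and resolves once the process exits cleanly.
function thumbnailFirstPage(inputPath, width, outputPath) {
  return new Promise((resolve, reject) => {
    // execFile passes arguments directly, without a shell, so paths need no quoting.
    execFile('vipsthumbnail', vipsthumbnailArgs(inputPath, width, outputPath), (error) => {
      if (error) reject(error);
      else resolve();
    });
  });
}
```

Using execFile rather than exec keeps file names out of a shell command line, which matters if the input path ever comes from a user.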
Finally, I looked at Cloudinary's image transformation API, which supports lots of image formats as well as paged media like PDF. With this API, converting a PDF page to a JPEG image is simply a matter of building the right URL.
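To illustrate what building such a URL can look like, here is a small sketch that assumes the PDF has already been uploaded to a Cloudinary account. The helper name and the example cloud name and public ID are placeholders. It relies on Cloudinary's pg_ (page) and w_ (width) transformation parameters, with the file extension selecting the output format.

```javascript
// Hypothetical helper: builds a Cloudinary delivery URL for one page of an uploaded PDF.
function pdfPageImageUrl({ cloudName, publicId, page = 1, width, format = 'jpg' }) {
  // pg_<n> selects the page (1-indexed); w_<px> resizes to the given width.
  const transformations = [`pg_${page}`];
  if (width) transformations.push(`w_${width}`);

  // The public ID is left unencoded because it may contain folder separators.
  return `https://res.cloudinary.com/${encodeURIComponent(cloudName)}` +
    `/image/upload/${transformations.join(',')}/${publicId}.${format}`;
}

// Placeholder values, not a real account or asset from this post.
const url = pdfPageImageUrl({ cloudName: 'example-cloud', publicId: 'test', page: 2, width: 1600 });
```

With those placeholder values the result is https://res.cloudinary.com/example-cloud/image/upload/pg_2,w_1600/test.jpg, so the same uploaded PDF can be requested as any page, size, or format just by changing the URL.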