[Solved] PDF Generator Umbraco #help-with-other

Craig100

07/19/2024, 3:27 PM

Is there a decent PDF generator for Umbraco that works with V13? I've tried the WebWonders one but it won't install and looks like it hasn't been updated in a long while.

Dean Leigh

07/19/2024, 3:51 PM

We used IronPDF and it's pretty decent.

Spinal

07/19/2024, 7:00 PM

does it work on Umbraco Cloud?

Craig100

07/19/2024, 7:29 PM

IronPDF installs but won't run. It keeps complaining about skiasharp not being available, so I installed that separately; then it complains that skiasharp.extended.svg isn't available. That doesn't exist on NuGet that I can see. Might give up. The agency I'm working with said they'd had issues with it in the past anyway.

Dean Leigh

07/19/2024, 7:30 PM

I'm not sure but I think Moriyama Pdf preview uses it. Perhaps @AaronSadlerUK will know?

AaronSadlerUK

07/19/2024, 7:31 PM

Check out the install instructions for Moriyama.PreviewPDF; they explain what to do.

AaronSadlerUK

07/19/2024, 7:32 PM

https://github.com/Moriyama-Umbraco/Moriyama.PreviewPDF

Craig100

07/19/2024, 7:32 PM

Says it'll only work on Windows servers.

Craig100

07/19/2024, 7:32 PM

IronPDF says it'll run on .NET 8.0, so it shouldn't matter.

AaronSadlerUK

07/19/2024, 7:35 PM

I only tested on Windows when I wrote it; it uses System.Drawing, which is Windows-only... I meant just check out the instructions, as they say how to get past the error you had with IronPDF by excluding some stuff in the app settings.

Craig100

07/19/2024, 7:43 PM

QuestPDF looks really good but it's just for creating a PDF from scratch, not by taking a URL or HTML string. https://github.com/QuestPDF/QuestPDF

Craig100

07/19/2024, 7:48 PM

Ok, did that and now: Could not load file or assembly 'Mono.Posix.NETStandard.

Craig100

07/19/2024, 7:51 PM

I suspect IronPDF are telling porkies and it only runs on Windoze.

Spinal

07/19/2024, 8:02 PM

I was able to get past the skiasharp.extended.svg error by installing skiasharp.svg, but more missing dependencies came up and I couldn't get any further.

Spinal

07/19/2024, 8:05 PM

I'm starting to lose faith in PDF generators on Umbraco Cloud, and suggest having this done in a Web API on a separate Windows machine.

Craig100

07/19/2024, 8:10 PM

I'm on localhost and it still won't build without missing dependencies. The IronPDF tuts make it look easy. Just install it from Nuget and then start coding. Rubbish.

Craig100

07/19/2024, 8:10 PM

I've requested access to their Slack channel, but it's Friday and I won't hold my breath.

Spinal

07/19/2024, 8:12 PM

I was able to generate PDFs using syncfusionPDF... but it requires executing a Chromium wrapper (an executable), which does not work on Umbraco Cloud due to lack of permissions.

Craig100

07/19/2024, 8:12 PM

It'd be great if we could just hit a button to fire up the browser's print as PDF function 🙂

Craig100

07/19/2024, 8:14 PM

What is it you need to do? I just want to PDF the visible page.

Craig100

07/19/2024, 8:15 PM

If you want to build one up in code then QuestPDF looks fantastic. Look at their top video on the github link I put up about 5 mins in.

Spinal

07/19/2024, 8:19 PM

I don't want to build one from scratch. I have the code done for all the PDFS (it takes an HTML file with graphs and tables)... it works perfectly fine locally or if I deploy in a non-umbraco cloud environment... problem is when it goes to umbraco cloud, can't generate pdfs (permissions). I only found out it requires an executable when I first deployed this to the dev environment. 😦

Spinal

07/19/2024, 8:20 PM

to build from scratch, I used ASPPDF.NET for 5 years at my previous company

Spinal

07/19/2024, 8:20 PM

loved that tool

Spinal

07/19/2024, 8:20 PM

super cheap (one off $299)

Spinal

07/19/2024, 8:21 PM

and built a bunch of books with it (actual catalogues for car parts)

Craig100

07/19/2024, 8:33 PM

Syncfusion is a PITA as well. What is it with PDF generating software creators?

huwred

07/20/2024, 7:53 PM

I've used https://selectpdf.com/community-edition/ in v10; it should work in v13. I created a viewcomponent that generated a PDF from a URL passed to it.

Spinal

07/20/2024, 8:40 PM

it uses blink engine... if there's an executable like syncfusionPDF, it won't run on Umbraco Cloud unfortunately 😦

SiempreSteve

07/22/2024, 9:13 AM

I use SelectPDF via the API now. It removes any worry about Cloud / Azure. Seems pretty solid.

SiempreSteve

07/22/2024, 9:19 AM

To elaborate a little - I have a view render service, basically hit an internal url passing the user data and using a view that generates my statement. This generated html is then sent to the API via a custom pdf generator service. Then just serve up the memory stream returned. Works fine. It's behind a member login but you obviously need to be cautious on busting your api limits and what's generated etc.

Craig100

07/22/2024, 4:02 PM

Hmm, it says SelectPDF only works on Windows, so it's not a true .NET Core application. I develop on Linux, so I need proper cross-platform support.

Craig100

07/22/2024, 4:02 PM

I got in touch with IronPDF on their Slack channel, they're going to get back to me.

Jason

07/22/2024, 4:48 PM

HTML to PDF

Jason

07/22/2024, 4:49 PM

If you want to accurately render HTML, in a standards compliant way, the only way to do it is with a browser.

Jason

07/22/2024, 4:49 PM

Most HTML to PDF converters ship an HTML rendering engine, often just the rendering engine of a browser, sometimes an entire web browser wrapped up in a DLL.

Jason

07/22/2024, 4:49 PM

They're almost always rubbish.

Jason

07/22/2024, 4:50 PM

Instead of looking for a good HTML to PDF converter, why not look for a good HTML renderer that can generate PDFs?

Jason

07/22/2024, 4:53 PM

IME, the best of those is... Chrome. (Firefox is also available.)

Jason

07/22/2024, 4:54 PM

You can control Chrome or Firefox with PuppeteerSharp.

Jason

07/22/2024, 4:54 PM

They even have a guide for generating PDFs in their docs: https://www.puppeteersharp.com/examples/index.html#generate-pdf-files

Jason

07/22/2024, 4:55 PM

Fully open source with a licensing model we can use without stressing about. Cross platform.

Craig100

07/22/2024, 6:46 PM

I really don't care which way round it works, I just want to print off the current web page to PDF and have it download. Thanks for the steer on PuppeteerSharp, I'll take a look 🙂

Craig100

07/22/2024, 6:53 PM

It's all very well having all that documentation but how can it possibly miss "Installation"? I have no idea how to get it on my machine, lol

Jason

07/22/2024, 6:55 PM

It's just a nuget package.

Craig100

07/22/2024, 6:59 PM

It's just one of my pet hates. Writing a document starting from half way in, assuming everyone knows what you're talking about. There's no hint of the word "install" nor "nuget" in any of the docs. Apparently we all "just know". Anyway, rant over, lol. Thanks. I'll have a go 🙂

Jason

07/22/2024, 7:02 PM

Sure. I was more adding the extra information because a lot of devs oversimplify the process of generating a PDF from an HTML document ("it's just text and images") without thinking about the fact that they're wildly different formats with very little in common internally. The initial title etc. doesn't make it clear you're trying to turn a web page into a PDF rather than create one from scratch.

Jason

07/22/2024, 7:06 PM

https://github.com/kblok/puppeteer-sharp/issues/new?title=Improve%20docfx_project/index.md&body=Explain%20how%20would%20you%20like%20this%20document%20to%20be%20impoved

Craig100

07/22/2024, 7:09 PM

Saw that. If I get it going, I might mention it 😉

Jason

07/22/2024, 7:14 PM

One thing to bear in mind with PuppeteerSharp: you will need to download and install the Chrome or Firefox binaries. The package handles caching etc., but you may not want users to have to wait for the download the first time they use the functionality. I recommend calling the download method in Umbraco's boot pipeline (make it fire and forget).
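
A minimal sketch of the "download at boot, fire and forget" idea above. It assumes a recent PuppeteerSharp release where `BrowserFetcher.DownloadAsync()` can be called without arguments, and it uses a plain ASP.NET Core `BackgroundService` rather than an Umbraco-specific hook. The class name `BrowserWarmupService` is hypothetical, not something from the thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using PuppeteerSharp;

// Hypothetical service: fetches the browser build once when the host starts,
// so the first PDF request does not have to wait for the download.
public sealed class BrowserWarmupService : BackgroundService
{
    private readonly ILogger<BrowserWarmupService> _logger;

    public BrowserWarmupService(ILogger<BrowserWarmupService> logger) => _logger = logger;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Yield immediately so application startup is never blocked by the download.
        await Task.Yield();

        try
        {
            // Downloads the default browser build into PuppeteerSharp's local cache.
            await new BrowserFetcher().DownloadAsync();
            _logger.LogInformation("Browser binaries are ready.");
        }
        catch (Exception ex)
        {
            // Fire and forget: log the failure and let the site keep running.
            _logger.LogWarning(ex, "Browser download failed; the first PDF request may be slow or fail.");
        }
    }
}

// Registration, e.g. in Program.cs:
// builder.Services.AddHostedService<BrowserWarmupService>();
```

Whether the download is permitted at all depends on the host: the thread reports that Umbraco Cloud blocks running bundled executables, so this sketch assumes an environment where that is allowed.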

Craig100

07/22/2024, 9:08 PM

I can hit the method now. On

await page.GoToAsync("http://localhost:44342/features/");

it just reports ERR_EMPTY_RESPONSE at http://localhost:44342/features/ Just wondering if this is because it's on localhost. I haven't installed any Chrome code. Don't particularly want Chrome on my machine tbh. I run Brave which is Chromium based anyway.

Craig100

07/22/2024, 10:15 PM

Request duly submitted 🙂

Craig100

07/22/2024, 10:17 PM

I remain pretty much appalled by the state of PDF generators/converters, call them what you will. QuestPDF seems the best so far, but it's not suitable for just converting your webpage off the screen.

Jason

07/22/2024, 11:16 PM

The BrowserFetcher will take care of downloading the binaries for Chrome (assuming you're calling it). You can configure it to use Firefox if you'd prefer. Not sure why you'd get an empty response... On port 44342, are you running https or http?

Jason

07/22/2024, 11:54 PM

Have a look at the PDF spec, it's PostScript based so essentially a whole programming language in its own right. To "convert" from HTML & CSS, every property you can "see", e.g. font size, position, colour etc., needs to be expressed in a whole different programming language. Inside that programming language, key concepts are wildly different: image encoding, positional system, colour etc. Consider a simple div with a background colour, margin, border, and padding. None of those concepts even exist in a PDF. The only reason it works at all is because people have written a buttload of code to abstract and approximate from one to the other, and even then only a small part of the spec is actually implemented/supported.

Craig100

07/23/2024, 8:18 AM

I'm just running their "Generate PDF Files" example like so (screenshot linked below). It happily runs down to Ln 41, appears to skip Ln 42 and returns with an error:

NavigationException: net::ERR_CERT_AUTHORITY_INVALID at https://localhost:44342/features

If I use http then it returns with a different error:

NavigationException: net::ERR_EMPTY_RESPONSE at http://localhost:44342/features

This "could" be the age-old issue of not being able to trust the dotnet cert on Linux, even though I've generated a cert and applied it many times. You always end up allowing the browser to jump to http on its own.

https://cdn.discordapp.com/attachments/1263879966588665858/1265221418371780660/image.png?ex=66a0b8c4&is=669f6744&hm=1b01499d04dafcb40eb39b294cee0be781314cbe5ae63670dd7b768a93391689&

Jason

07/23/2024, 8:24 AM

If you try and load the page in a browser locally, which works https:// or http://?

Craig100

07/23/2024, 8:28 AM

Only the https one works but it ends up with a red line through the protocol in the browser address bar. So that's what tells you the browser has allowed it.

Craig100

07/23/2024, 8:29 AM

Have just come across this script which I'm going to try 🙂 https://blog.wille-zone.de/post/aspnetcore-devcert-for-ubuntu

Jason

07/23/2024, 8:29 AM

Or you can set

IgnoreHTTPSErrors: true

In the launchOptions

Jason

07/23/2024, 8:31 AM

(Puppeteer's launchOptions)
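
A minimal sketch of the suggestion above, for local development only. It assumes a PuppeteerSharp version that exposes `IgnoreHTTPSErrors` on `LaunchOptions`, as named in the thread; other versions may use a different property name, so check it against the package you install. The class name, method name, and output path are hypothetical.

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;

public static class LocalPdfExample
{
    // Renders one page to a PDF file, tolerating an untrusted dev certificate.
    public static async Task CreateAsync(string url, string outputPath)
    {
        // Make sure a browser build is available locally (cached after the first run).
        await new BrowserFetcher().DownloadAsync();

        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true,
            // Skips certificate validation so a self-signed localhost cert
            // does not raise ERR_CERT_AUTHORITY_INVALID. Do not use in production.
            IgnoreHTTPSErrors = true
        });

        await using var page = await browser.NewPageAsync();
        await page.GoToAsync(url);
        await page.PdfAsync(outputPath);
    }
}

// Usage, with the URL from the thread:
// await LocalPdfExample.CreateAsync("https://localhost:44342/features/", "features.pdf");
```

Trusting the development certificate properly is the cleaner fix; ignoring certificate errors is only a convenience while developing.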

Jason

07/23/2024, 8:33 AM

WRT your point of allowing the browser to "jump to http on its own" that's not how it works. HTTPS is a very different protocol to HTTP, just as HTTP is different from FTP. You can only run one protocol on a single port - HTTPS doesn't downgrade to HTTP, the communication protocol is still the same, the browser just tells you that the certificate used to encrypt the connection is invalid - or more specifically can't be trusted.

Craig100

07/23/2024, 8:37 AM

Yes, I realise I mis-explained it. It's more that the browser marks it as insecure.

Craig100

07/23/2024, 8:43 AM

Ooh, progress:- InvalidOperationException: The type 'System.ReadOnlySpan`1[System.Byte]' of property 'Preamble' on type 'System.Text.Encoding' is invalid for serialization or deserialization because it is a pointer type, is a ref struct, or contains generic parameters that have not been replaced by specific types. lol

Craig100

07/23/2024, 8:47 AM

Actually, it looks like Puppeteer has operated correctly, I even have a features.pdf file!!!!!!!! It looks exactly the same as the browser's own "Print to PDF" function, but doesn't look anything like the page, lol. I guess a print.css might fix that though. I think the error is just that I called it with : href="/umbraco/api/GeneratePdfController/CreatePdfAsync" and Rider doesn't know what to do next, lol.

Craig100

07/23/2024, 9:47 AM

This script actually works! I can now dev with https on Linux 🙂

Craig100

07/23/2024, 9:47 PM

I guess the trick now is to call it via another controller that can take url and filename args and that can return the file as a stream to download.

huwred

07/25/2024, 9:39 AM

could you use a viewcomponent?

Craig100

07/29/2024, 2:11 PM

I don't see the benefit of using a viewcomponent here or am I being thick? I'll just have a linkAction that looks like a button saying "Download PDF" and then expect to hit a controller that will stream back the PDF via a call to an API Controller that gets the PDF created. I'm hoping that that stream will then be the PDF to download.

Craig100

07/29/2024, 4:03 PM

As it happens, for some reason (Umbraco routing, I suspect), ActionLink and Url.Action don't populate the href, which makes them totally useless. So I'm going down another road: calling an API which can get the PDF created. The next trick is to return it so it's downloaded back to the user, preferably from memory so as not to leave a file hanging around.
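
One way the "return it from memory" idea could look, as a rough sketch rather than a tested Umbraco integration. It relies on PuppeteerSharp's `PdfDataAsync()`, which returns the PDF as a byte array, and on the standard MVC `File(...)` result. The controller name and route are hypothetical, the URL is fixed to the page from the thread, and the browser binaries are assumed to have been downloaded already.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using PuppeteerSharp;

[Route("pdf")] // hypothetical route; adjust so it does not clash with Umbraco routing
public class PdfDownloadController : Controller
{
    [HttpGet("download")]
    public async Task<IActionResult> Download()
    {
        // Fixed URL for illustration; accepting arbitrary URLs from the query string
        // would let callers make the server render any page they like.
        const string pageUrl = "https://localhost:44342/features/";

        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
        await using var page = await browser.NewPageAsync();
        await page.GoToAsync(pageUrl);

        // Render to a byte array instead of a file on disk.
        byte[] pdf = await page.PdfDataAsync();

        // The download name makes the browser save the response as a file.
        return File(pdf, "application/pdf", "features.pdf");
    }
}
```

Launching a browser per request is expensive, so in practice this would need the safeguards discussed later in the thread.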

Craig100

07/29/2024, 9:08 PM

So far I've only got it to save to a nominated folder and to respond to the print.css. Looking at the Puppeteer issues it looks like getting it downloadable has been asked for for a while. Not 100% convinced this is the way to go, unless someone knows better who's used it before maybe?

Jason

07/29/2024, 9:11 PM

I wouldn't stream the PDF directly. Save it to disk first then either stream the file or redirect to it.

Craig100

07/29/2024, 9:14 PM

I'd be happier if I could get the file path from it, but I think I'm going to have to just construct it from the file name.

Jason

07/29/2024, 9:15 PM

With the method I'm using, you choose the filename.

Craig100

07/29/2024, 9:15 PM

It's currently a controller that's called with a url in the page, which I'm not too happy about, of the form: https://localhost:44342/GetPdfController/CreatePdfFile?url=/&fileName=Home

Craig100

07/29/2024, 9:16 PM

I suppose converting that to a BeginUmbracoForm thing might help with the security.

Jason

07/29/2024, 9:17 PM

Also, these settings helped me with styles, had to define @page rules in CSS

```csharp
await page.PdfAsync(filePath, new PdfOptions
{
    Outline = false,
    PrintBackground = true,
    PreferCSSPageSize = true
});
```

Craig100

07/29/2024, 9:17 PM

Me too, though I'm just passing through the Model.Name of the page, but the path to where it's saved is easy to add.

Craig100

07/29/2024, 9:18 PM

Oooh now that's interesting, I was trying to get it to change the background colour just to show it was picking up the print.css and it wouldn't. However, other things were changing ok like font colour.

Craig100

07/29/2024, 9:23 PM

That looks MUCH better now 🙂

Jason

07/29/2024, 9:26 PM

Need to be pretty careful with the approach generally. Web page rendering & PDF generation are non-trivial operations (it literally involves starting a browser on your web server to do it). An endpoint like the above is easy to DoS or abuse in other ways. Consider:

- /GetPdf/{page ID}/ where page ID is validated against an allowed list of doctypes or documents that have some toggle set or something.
- Saving the file to disk, and using that as a caching mechanism for subsequent requests (PDFs are small, generating is expensive).
- Some kind of rate limiting.
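
The first two points could be sketched roughly as follows. Everything here is illustrative: the allow-list is a hard-coded dictionary standing in for a real check against doctypes or a per-document toggle, and the 10-minute lifetime and temp-folder cache location are assumptions, not settings from the thread. The browser binaries are assumed to be present already.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using PuppeteerSharp;

public class GetPdfController : Controller
{
    // Hypothetical allow-list: page ID -> URL the server is willing to render.
    private static readonly IReadOnlyDictionary<int, string> AllowedPages = new Dictionary<int, string>
    {
        [1] = "https://localhost:44342/",
        [2] = "https://localhost:44342/features/"
    };

    // Assumed values for illustration.
    private static readonly TimeSpan CacheLifetime = TimeSpan.FromMinutes(10);
    private static readonly string CacheFolder = Path.Combine(Path.GetTempPath(), "pdf-cache");

    [HttpGet("/GetPdf/{pageId:int}")]
    public async Task<IActionResult> Get(int pageId)
    {
        // Reject anything that is not explicitly allowed; no caller-supplied URLs.
        if (!AllowedPages.TryGetValue(pageId, out var url))
        {
            return NotFound();
        }

        Directory.CreateDirectory(CacheFolder);
        var filePath = Path.Combine(CacheFolder, $"{pageId}.pdf");

        // Reuse the file on disk while it is younger than the cache lifetime.
        var isFresh = System.IO.File.Exists(filePath)
            && DateTime.UtcNow - System.IO.File.GetLastWriteTimeUtc(filePath) < CacheLifetime;

        if (!isFresh)
        {
            // Note: concurrent requests for the same page could both regenerate the
            // file; a real implementation would serialise this per page ID.
            await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
            await using var page = await browser.NewPageAsync();
            await page.GoToAsync(url);
            await page.PdfAsync(filePath, new PdfOptions
            {
                PrintBackground = true,
                PreferCSSPageSize = true
            });
        }

        return PhysicalFile(filePath, "application/pdf", $"page-{pageId}.pdf");
    }
}
```

For pages that differ per user, the cache key would have to include whatever makes the page individual, or caching would need to be skipped for those pages.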

Craig100

07/29/2024, 9:32 PM

Yes, I was aware of the DoS possibilities, which is why I was thinking using a form in the background might be more secure. Doubt I'll be able to cache them though. They're all very dynamic pages. Rate limiting might be good idea though.
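
For the rate-limiting part, a sketch using the rate limiting middleware built into ASP.NET Core 7 and later. It is shown as a bare minimal app rather than an Umbraco `Program.cs`, so the wiring into Umbraco's pipeline is left as an assumption. The policy name `pdf`, the limits, and the placeholder endpoint are all hypothetical.

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.RateLimiting;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    // Respond with 429 instead of the default 503 when the limit is hit.
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // Hypothetical policy: at most 5 PDF requests per minute.
    // This is one shared bucket for all callers, not a per-user limit.
    options.AddFixedWindowLimiter("pdf", limiter =>
    {
        limiter.PermitLimit = 5;
        limiter.Window = TimeSpan.FromMinutes(1);
    });
});

var app = builder.Build();

app.UseRateLimiter();

// Placeholder endpoint standing in for the real PDF action.
app.MapGet("/GetPdf/{pageId:int}", (int pageId) => Results.Ok($"PDF for page {pageId} would be generated here"))
   .RequireRateLimiting("pdf");

app.Run();
```

On an MVC controller action, the same policy can be applied with the `[EnableRateLimiting("pdf")]` attribute. A per-member limit would need a partitioned policy keyed on the logged-in user.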

Jason

07/29/2024, 10:56 PM

> Doubt I'll be able to cache them though

Even 10 minutes is better than nothing. If it's an option, you could always eagerly cache and generate a new PDF whenever the page changes; this is what we do on publish.

Craig100

07/29/2024, 11:05 PM

These pages have user defined tables on them and are very individual. Plus I don't fancy the task of coding it, which luckily, won't be necessary 😉
