[Solved] PDF Generator
# help-with-other
c
Is there a decent PDF generator for Umbraco that works with V13? I've tried the WebWonders one but it won't install and looks like it hasn't been updated in a long while.
d
We used Iron Pdf and it's pretty decent
s
does it work on Umbraco Cloud?
c
IronPDF installs but won't run. Keeps complaining about skiasharp not being available. So installed that separately, then it complains skiasharp.extended.svg not available. That doesn't exist in nuget that I can see. Might give up. The agency I'm working with said they'd had issues with it in the past anyway.
d
I'm not sure but I think Moriyama Pdf preview uses it. Perhaps @AaronSadlerUK will know?
a
Check out the install instructions for Moriyama.PreviewPDF it explains what to do
c
Says it'll only work on Windows servers
IronPDF says it'll run on .net 8.0 so shouldn't matter
a
I only tested on Windows when I wrote it; it uses System.Drawing, which is Windows only... I just meant check out the instructions, as they explain how to get past the error you had with IronPDF by excluding some stuff in the app settings
c
QuestPDF looks really good but it's just for creating a PDF from scratch, not by taking a URL or HTML string. https://github.com/QuestPDF/QuestPDF
Ok, did that and now: Could not load file or assembly 'Mono.Posix.NETStandard.
I suspect IronPDF are telling porkies and it only runs on Windoze.
s
I was able to get past the skiasharp.extended.svg error by installing skiasharp.svg, but more dependencies came in and I couldn't get any further
I'm starting to lose faith in PDF generators on Umbraco Cloud and would suggest having this done in a Web API on a separate Windows machine
c
I'm on localhost and it still won't build without missing dependencies. The IronPDF tuts make it look easy. Just install it from Nuget and then start coding. Rubbish.
I've requested access to their Slack channel, but it's Friday and I won't hold my breath.
s
i was able to generate pdfs using syncfusionPDF... but requires execution of a chromium wrapper (executable) which does not work on umbraco cloud due to lack of permissions
c
It'd be great if we could just hit a button to fire up the browser's print as PDF function 🙂
What is it you need to do? I just want to PDF the visible page.
If you want to build one up in code then QuestPDF looks fantastic. Look at their top video on the github link I put up about 5 mins in.
s
I don't want to build one from scratch. I have the code done for all the PDFS (it takes an HTML file with graphs and tables)... it works perfectly fine locally or if I deploy in a non-umbraco cloud environment... problem is when it goes to umbraco cloud, can't generate pdfs (permissions). I only found out it requires an executable when I first deployed this to the dev environment. 😦
to build from scratch, I used ASPPDF.NET for 5 years at my previous company
loved that tool
super cheap (one off $299)
and built a bunch of books with it (actual catalogues for car parts)
c
Syncfusion is a PITA as well. What is it with PDF generating software creators?
h
I've used https://selectpdf.com/community-edition/ in v10, should work in v13. I created a viewcomponent that generated a PDF from a URL passed to it
s
it uses blink engine... if there's an executable like syncfusionPDF, it won't run on Umbraco Cloud unfortunately 😦
s
I use SelectPDF via the API now. Removes any worry about Cloud / Azure. Seems pretty solid
To elaborate a little - I have a view render service, basically hit an internal url passing the user data and using a view that generates my statement. This generated html is then sent to the API via a custom pdf generator service. Then just serve up the memory stream returned. Works fine. It's behind a member login but you obviously need to be cautious on busting your api limits and what's generated etc.
c
Hmm, it says SelectPdf only works on Windows. So not a true .NET Core application. I develop on Linux so I need proper x-platform support.
I got in touch with IronPDF on their Slack channel, they're going to get back to me.
j
HTML to PDF
If you want to accurately render HTML, in a standards compliant way, the only way to do it is with a browser.
Most HTML to PDF converters ship a HTML rendering engine, often just the rendering engine of the browser, sometimes an entire web browser wrapped up in a DLL.
They're almost always rubbish.
Instead of looking for a good HTML to PDF converter, why not look for a good HTML renderer that can generate PDFs?
IME, the best of those is... Chrome. (Firefox is also available)
You can control Chrome or Firefox with PuppeteerSharp.
They even have a guide for generating PDFs in their docs: https://www.puppeteersharp.com/examples/index.html#generate-pdf-files
Fully open source with a licensing model we can use without stressing about. Cross platform.
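A minimal sketch of that approach, assuming a recent PuppeteerSharp release installed from NuGet. The class and method names are hypothetical, and the exact `BrowserFetcher` overloads can differ between versions, so check them against the one you install.

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;

// Hypothetical helper: renders a web page to a PDF file using headless Chrome.
public static class PagePdfExample
{
    public static async Task SavePageAsPdfAsync(string url, string outputPath)
    {
        // Downloads a compatible browser build unless it is already cached locally.
        var fetcher = new BrowserFetcher();
        await fetcher.DownloadAsync();

        // Launch the browser headless and open a new tab.
        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
        await using var page = await browser.NewPageAsync();

        // Load the page, then print it to the given file path.
        await page.GoToAsync(url);
        await page.PdfAsync(outputPath);
    }
}
```

Usage would be something like `await PagePdfExample.SavePageAsPdfAsync("https://example.com/", "page.pdf");`.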
c
I really don't care which way round it works, I just want to print off the current web page to PDF and have it download. Thanks for the steer on PuppeteerSharp, I'll take a look 🙂
It's all very well having all that documentation but how can it possibly miss "Installation"? I have no idea how to get it on my machine, lol
j
It's just a nuget package.
c
It's just one of my pet hates. Writing a document starting from half way in, assuming everyone knows what you're talking about. There's no hint of the word "install" nor "nuget" in any of the docs. Apparently we all "just know". Anyway, rant over, lol. Thanks. I'll have a go 🙂
j
Sure, I was more adding the extra information because a lot of devs oversimplify the process of generating a PDF from an HTML document because "it's just text and images", without thinking about the fact that they're wildly different formats with very little in common internally. The initial title etc. doesn't make it clear you're trying to turn a web page into a PDF rather than create one from scratch.
c
Saw that. If I get it going, I might mention it 😉
j
One thing to bear in mind with PuppeteerSharp: you will need to download and install the Chrome or Firefox binaries. The package handles caching etc. but you may not want users to have to wait for the download the first time they use the functionality. I recommend calling the download method in Umbraco's boot pipeline (make it fire and forget).
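One way to do that fire-and-forget download could look like the sketch below. It assumes a standard .NET hosted service rather than an Umbraco-specific hook, and `BrowserDownloadService` is a hypothetical name.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using PuppeteerSharp;

// Hypothetical background service: fetches the browser binaries once at startup
// so the first PDF request does not have to wait for the download.
public sealed class BrowserDownloadService : BackgroundService
{
    private readonly ILogger<BrowserDownloadService> _logger;

    public BrowserDownloadService(ILogger<BrowserDownloadService> logger) => _logger = logger;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        try
        {
            // Returns quickly if the binaries are already cached.
            await new BrowserFetcher().DownloadAsync();
            _logger.LogInformation("Browser binaries are ready for PDF generation.");
        }
        catch (Exception ex)
        {
            // Swallow the failure so site startup is not blocked; the download
            // can be attempted again when a PDF is first requested.
            _logger.LogWarning(ex, "Browser download failed at startup.");
        }
    }
}

// Registration (assumption: a minimal-hosting Program.cs):
// builder.Services.AddHostedService<BrowserDownloadService>();
```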
c
I can hit the method now. On
await page.GoToAsync("http://localhost:44342/features/");
it just reports ERR_EMPTY_RESPONSE at http://localhost:44342/features/ Just wondering if this is because it's on localhost. I haven't installed any Chrome code. Don't particularly want Chrome on my machine tbh. I run Brave which is Chromium based anyway.
Request duly submitted 🙂
I remain pretty much appalled by the state of PDF Generators/Converters. call them what you will. QuestPDF seems the best so far, but not suitable for just converting your webpage off the screen.
j
The BrowserFetcher will take care of downloading the binaries for chrome (assuming you're calling it). You can configure it to use Firefox if you'd prefer. Not sure why you'd get an empty response... port 44342 are you running https or http?
Have a look at the PDF spec, it's PostScript-based so essentially a whole programming language in its own right. To "convert" from HTML & CSS, every property you can "see", e.g. font size, position, colour etc., needs to be expressed in a whole different programming language. Inside that programming language, key concepts are wildly different: image encoding, positional system, colour etc. Consider a simple div with a background colour, margin, border, and padding. None of those concepts even exist in a PDF. The only reason it works at all is because people have written a buttload of code to abstract and approximate from one to the other, and even then only a small part of the spec is actually implemented/supported.
c
I'm just running their "Generate PDF Files" example like so:- It happily runs down to Ln41, appears to skip Ln 42 and returns with an error:- NavigationException: net::ERR_CERT_AUTHORITY_INVALID at https://localhost:44342/features. If I use http then it returns with a different error:- NavigationException: net::ERR_EMPTY_RESPONSE at http://localhost:44342/features. This "could" be the age-old issue of not being able to trust the dotnet cert on Linux, even though I've generated a cert and applied it many times. You always end up allowing the browser to jump to http on its own. https://cdn.discordapp.com/attachments/1263879966588665858/1265221418371780660/image.png?ex=66a0b8c4&is=669f6744&hm=1b01499d04dafcb40eb39b294cee0be781314cbe5ae63670dd7b768a93391689&
j
If you try and load the page in a browser locally, which works https:// or http://?
c
Only the https one works but it ends up with a red line through the protocol in the browser address bar. So that's what tells you the browser has allowed it.
Have just come across this script which I'm going to try 🙂 https://blog.wille-zone.de/post/aspnetcore-devcert-for-ubuntu
j
Or you can set
IgnoreHTTPSErrors: true
In the launchOptions
(Puppeteer's launchOptions)
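In PuppeteerSharp terms, that setting might be applied as in the sketch below. It is intended for local development only. The helper name is hypothetical, and the property name follows the one quoted above, so check it against your installed version in case it differs between releases.

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;

// Hypothetical helper for local development against an untrusted dev certificate.
public static class DevCertExample
{
    public static async Task SaveLocalPageAsync(string url, string outputPath)
    {
        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true,
            // Skips certificate validation, so an untrusted localhost cert no longer
            // causes ERR_CERT_AUTHORITY_INVALID. Not something to enable in production.
            IgnoreHTTPSErrors = true
        });

        await using var page = await browser.NewPageAsync();
        await page.GoToAsync(url);
        await page.PdfAsync(outputPath);
    }
}
```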
WRT your point of allowing the browser to "jump to http on its own" that's not how it works. HTTPS is a very different protocol to HTTP, just as HTTP is different from FTP. You can only run one protocol on a single port - HTTPS doesn't downgrade to HTTP, the communication protocol is still the same, the browser just tells you that the certificate used to encrypt the connection is invalid - or more specifically can't be trusted.
c
Yes, I realise I mis-explained it. It's more that the browser marks it as insecure.
Ooh, progress:- InvalidOperationException: The type 'System.ReadOnlySpan`1[System.Byte]' of property 'Preamble' on type 'System.Text.Encoding' is invalid for serialization or deserialization because it is a pointer type, is a ref struct, or contains generic parameters that have not been replaced by specific types. lol
Actually, it looks like Puppeteer has operated correctly, I even have a features.pdf file!!!!!!!! It looks exactly the same as the browser's own "Print to PDF" function, but doesn't look anything like the page, lol. I guess a print.css might fix that though. I think the error is just that I called it with : href="/umbraco/api/GeneratePdfController/CreatePdfAsync" and Rider doesn't know what to do next, lol.
This script actually works! I can now dev with https on Linux 🙂
I guess the trick now is to call it via another controller that can take url and filename args and that can return the file as a stream to download.
h
could you use a viewcomponent?
c
I don't see the benefit of using a viewcomponent here or am I being thick? I'll just have a linkAction that looks like a button saying "Download PDF" and then expect to hit a controller that will stream back the PDF via a call to an API Controller that gets the PDF created. I'm hoping that that stream will then be the PDF to download.
As it happens, for some reason, umbraco routing I suspect, actionlink and urlaction don't populate the href, which makes them totally useless. So going down another road, calling an api which can get the pdf created then the next trick is to return it so it's downloaded back to the user, pref from memory so as not to leave a file hanging around.
So far I've only got it to save to a nominated folder and to respond to the print.css. Looking at the Puppeteer issues it looks like getting it downloadable has been asked for for a while. Not 100% convinced this is the way to go, unless someone knows better who's used it before maybe?
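For reference, PuppeteerSharp can also hand back the PDF as bytes instead of writing a file, which a controller can return as a download. A minimal sketch, assuming a plain ASP.NET Core controller; the route, class name, and fixed URL are hypothetical, and how the controller is routed inside Umbraco is not shown.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using PuppeteerSharp;

[ApiController]
[Route("pdf-download")] // hypothetical route
public class PdfDownloadController : ControllerBase
{
    [HttpGet]
    public async Task<IActionResult> Get()
    {
        // Fixed URL for illustration; a real endpoint should not accept arbitrary URLs.
        const string pageUrl = "https://localhost:44342/features/";

        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
        await using var page = await browser.NewPageAsync();
        await page.GoToAsync(pageUrl);

        // Renders the PDF into memory rather than to a file on disk.
        byte[] pdf = await page.PdfDataAsync();

        // The third argument sets the download file name sent to the browser.
        return File(pdf, "application/pdf", "features.pdf");
    }
}
```

Nothing is left on disk this way, at the cost of generating the PDF again on every request.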
j
I wouldn't stream the PDF directly. Save it to disk first then either stream the file or redirect to it.
c
I'd be happier if I could get the file path from it but I think I'm going to have to just construct it from the file name..
j
The method I'm using you choose the filename.
c
It's currently a controller that's called with a url in the page, which I'm not too happy about, of the form: https://localhost:44342/GetPdfController/CreatePdfFile?url=/&fileName=Home
I suppose converting that to a beginUmbracoForm thing might help with the security
j
Also, these settings helped me with styles, had to define @page rules in CSS
await page.PdfAsync(filePath, new PdfOptions
{
    Outline = false,
    PrintBackground = true,
    PreferCSSPageSize = true
});
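If editing the site's print stylesheet is awkward, an `@page` rule could also be injected just before printing. This is a sketch only: the helper name and the A4/20mm values are illustrative assumptions, and it assumes a PuppeteerSharp version that exposes the `IPage` interface.

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;

// Hypothetical helper: adds an @page rule to an already-loaded page, then prints it.
public static class PrintStyledPdf
{
    public static async Task SaveAsync(IPage page, string filePath)
    {
        // Illustrative page size and margin; adjust to the real layout.
        await page.AddStyleTagAsync(new AddTagOptions
        {
            Content = "@page { size: A4; margin: 20mm; }"
        });

        await page.PdfAsync(filePath, new PdfOptions
        {
            PrintBackground = true,   // keep CSS background colours and images
            PreferCSSPageSize = true  // let the @page size take priority
        });
    }
}
```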
c
Me too, though I'm just passing through the Model.Name of the page, but the path to where it's saved is easy to add.
Oooh now that's interesting, I was trying to get it to change the background colour just to show it was picking up the print.css and it wouldn't. However, other things were changing ok like font colour.
That looks MUCH better now 🙂
j
Need to be pretty careful with the approach generally: web page rendering and PDF generation are non-trivial operations (it literally involves starting a browser on your web server). An endpoint like the above is easy to DoS or abuse in other ways. Consider: a route like /GetPdf/{page ID}/, where the page ID is validated against an allowed list of doctypes or documents that have some toggle set or something; saving the file to disk and using that as a caching mechanism for subsequent requests (PDFs are small, generating is expensive); and some kind of rate limiting.
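A rough sketch of the first two suggestions, under stated assumptions: the allow-list is a hard-coded dictionary standing in for a real lookup of permitted Umbraco pages, the cache folder is a temp directory, and all names are hypothetical. Rate limiting and locking against concurrent generation are left out.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using PuppeteerSharp;

[ApiController]
[Route("GetPdf")]
public class GetPdfController : ControllerBase
{
    // Hypothetical allow-list mapping page IDs to URLs; a real site would
    // check the doctype or a toggle on the content item instead.
    private static readonly IReadOnlyDictionary<int, string> AllowedPages =
        new Dictionary<int, string> { [1234] = "https://localhost:44342/features/" };

    private static readonly string CacheFolder = Path.Combine(Path.GetTempPath(), "pdf-cache");

    [HttpGet("{pageId:int}")]
    public async Task<IActionResult> Get(int pageId)
    {
        // Reject anything not explicitly allowed, so callers cannot pass arbitrary URLs.
        if (!AllowedPages.TryGetValue(pageId, out var url))
            return NotFound();

        Directory.CreateDirectory(CacheFolder);
        var filePath = Path.Combine(CacheFolder, $"{pageId}.pdf");

        // Only start a browser when there is no cached copy on disk.
        if (!System.IO.File.Exists(filePath))
        {
            await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
            await using var page = await browser.NewPageAsync();
            await page.GoToAsync(url);
            await page.PdfAsync(filePath);
        }

        return PhysicalFile(filePath, "application/pdf", $"{pageId}.pdf");
    }
}
```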
c
Yes, I was aware of the DoS possibilities, which is why I was thinking using a form in the background might be more secure. Doubt I'll be able to cache them though. They're all very dynamic pages. Rate limiting might be good idea though.
j
> Doubt I'll be able to cache them though
Even 10 minutes is better than nothing. If it's an option, you could always eagerly cache and generate a new PDF whenever the page changes; this is what we do on publish.
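A short-lived cache like that could be as simple as checking the age of the file on disk before regenerating it. A minimal sketch with hypothetical names; the current time is passed in so the check is easy to test.

```csharp
using System;
using System.IO;

// Hypothetical helper for a time-based PDF cache on disk.
public static class PdfCache
{
    // Returns true when the file exists and was last written within maxAge of utcNow.
    public static bool IsFresh(string filePath, TimeSpan maxAge, DateTime utcNow)
    {
        if (!File.Exists(filePath))
            return false;

        return utcNow - File.GetLastWriteTimeUtc(filePath) <= maxAge;
    }
}
```

A caller would regenerate the PDF only when `PdfCache.IsFresh(path, TimeSpan.FromMinutes(10), DateTime.UtcNow)` returns false.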
c
These pages have user defined tables on them and are very individual. Plus I don't fancy the task of coding it, which luckily, won't be necessary 😉