PDF Search Using PDFIndex in Examine Not Returning Results Umbraco #help-with-umbraco

Olti

01/31/2025, 9:03 AM

❓ Questions:

1. Do I need to specify a media root node for searching in PDFs?
   - Right now, I am NOT filtering by a specific folder, but still get 0 results.
   - If I need to filter by folder, how should I implement that?
2. Why does Examine Management find PDFs in the Backoffice, but my query in the service returns 0 results?
   - Is my query missing something?
   - Do I need to specify "__IndexType:pdf" in the query?
3. Is there a better way to link PDFs to the pages they are embedded in?
   - If I want to display the page where the PDF is used, what's the best approach?

Jemayn

01/31/2025, 9:24 AM

When you create your document query and you start with:

csharp
var criteria = searcher.CreateQuery("content", BooleanOperation.And)

That corresponds to a Lucene query that checks for

__IndexType: content

which is helpful in the external index, as you won't find items with __IndexType: media there. However, in your document approach you have this:

csharp
var criteria = searcher.CreateQuery("media", BooleanOperation.And)

Which corresponds to

__IndexType: media

whereas, at least on the sites I have running with the PdfIndex, the __IndexType field has the value

pdf

Also, if you set a breakpoint right after you execute the search, you can see the full Lucene search string on your criteria. It is often really helpful for debugging.
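As an alternative to a breakpoint, the same information can be written to a log. The following is only a sketch: QueryDebug and Dump are hypothetical names, the field name fileTextContent and the search term are taken from this thread, and the exact text returned by ToString() may differ between Examine versions.

```csharp
using System;
using Examine;
using Examine.Search;

// Hypothetical helper, not part of Examine or Umbraco.
public static class QueryDebug
{
    // Builds a simple field query under the given category and prints
    // the criteria so the category and Lucene query can be inspected.
    public static void Dump(ISearcher searcher, string category)
    {
        IBooleanOperation criteria = searcher
            .CreateQuery(category, BooleanOperation.And)
            .Field("fileTextContent", "konzept");

        // The printed text should resemble the "Category: ..., LuceneQuery: ..."
        // lines quoted elsewhere in this thread.
        Console.WriteLine(criteria.ToString());
    }
}
```

Calling Dump once with "media" and once with "pdf" makes it easy to compare which category each query carries.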

Jemayn

01/31/2025, 9:25 AM

Also consider having a look at this blogpost: https://dev.to/jemayn/index-pdfs-on-their-pages-in-umbraco-30l1

Olti

01/31/2025, 9:39 AM

@Jemayn Thanks for your response! I've read through the blog post you mentioned, and it was very insightful. However, I'm still struggling with why my document search is returning no results. For context:

1. When I search in Pages, everything works perfectly. Here's an example of the generated Lucene query:

Category: content, LuceneQuery: +(combinedField:alexander~2) +__Published:y +searchablePath:1591 -hideFromInternalSearch:1 -__NodeTypeAlias:usnsitemapxml -__NodeTypeAlias:usnrobotstxt

This query returns the expected results.

2. When I search in Documents (using the PDFIndex), my query looks like this:

Category: media, LuceneQuery: +fileTextContent:konzept~2

However, this query returns 0 results, even though I can find the correct document (Marketing Konzept Hunziker) in Examine Management under the same PDFIndex when I search manually.

Here's where I'm confused:

- Does the query need additional fields to properly target the PDFIndex? Should I explicitly include something like +__IndexType:pdf, or is that already implied by the query's Category: media?
- When debugging, how can I verify that the query is actually searching where it should?

Just to clarify, the generated Lucene query for document search looks like this when I log it:

Category: media, LuceneQuery: +fileTextContent:konzept~2

It seems correct, but something is still missing. Any advice or guidance on this would be greatly appreciated! https://cdn.discordapp.com/attachments/1334810683426209842/1334820235957637142/image.png?ex=679debb8&is=679c9a38&hm=1fba9a0b31270f2432b9845851e9ecf3f00e9bea3850337cfc4ca2cc7174189a&

Jemayn

01/31/2025, 9:48 AM

I tried to explain it in my comment above, but basically, when you write this in your code:

var criteria = searcher.CreateQuery("media", BooleanOperation.And)

that is the mistake. It should instead be:

var criteria = searcher.CreateQuery("pdf", BooleanOperation.And)

The code you have now adds a filter on __IndexType that matches 0 results, because all indexed PDFs by default have __IndexType: pdf, not media.
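Put together, a minimal sketch of a search against the PDF index with the corrected category might look like the following. PdfSearchService is a hypothetical class name; the index name "PDFIndex" and the field fileTextContent come from this thread; index.Searcher assumes a recent Examine version (older versions expose the searcher differently), so treat this as an illustration and not the exact code from the original service.

```csharp
using System.Collections.Generic;
using System.Linq;
using Examine;
using Examine.Search;

// Hypothetical service; only the Examine calls are library API.
public class PdfSearchService
{
    private readonly IExamineManager _examineManager;

    public PdfSearchService(IExamineManager examineManager)
    {
        _examineManager = examineManager;
    }

    public IReadOnlyList<ISearchResult> Search(string term)
    {
        // "PDFIndex" is the index name used in this thread (assumption).
        if (!_examineManager.TryGetIndex("PDFIndex", out IIndex index))
        {
            return new List<ISearchResult>();
        }

        // The category is applied as a filter on __IndexType,
        // so it must be "pdf" here, not "media".
        IBooleanOperation criteria = index.Searcher
            .CreateQuery("pdf", BooleanOperation.And)
            .Field("fileTextContent", term);

        return criteria.Execute().ToList();
    }
}
```

Passing null as the category instead of "pdf" would skip the __IndexType filter entirely, which can be a quick way to confirm that the category is what is excluding the results.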

Olti

01/31/2025, 9:55 AM

Yeah, I just found out. Thanks a lot!!!
