PDF Search Using PDFIndex in Examine Not Returning...
# help-with-umbraco
o
❓ Questions:
1. Do I need to specify a media root node for searching in PDFs?
   - Right now, I am NOT filtering by a specific folder, but still get 0 results.
   - If I need to filter by folder, how should I implement that?
2. Why does Examine Management find PDFs in the Backoffice, but my query in the service returns 0 results?
   - Is my query missing something?
   - Do I need to specify "__IndexType:pdf" in the query?
3. Is there a better way to link PDFs to the pages they are embedded in?
   - If I want to display the page where the PDF is used, what’s the best approach?
j
When you create your page (content) query and start with:
```csharp
var criteria = searcher.CreateQuery("content", BooleanOperation.And);
```
that adds a Lucene clause requiring `__IndexType:content`, which is helpful in the External index because you then won't get items with `__IndexType:media`. However, in your document search you have this:
```csharp
var criteria = searcher.CreateQuery("media", BooleanOperation.And);
```
That corresponds to `__IndexType:media`, whereas, at least on the sites I have running with the PdfIndex, the `__IndexType` field has the value `pdf`.
Also, if you set a breakpoint right after you execute the search, you can see the full Lucene query string on your criteria; it is often really helpful for debugging.
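To make the category behaviour concrete, here is a small, dependency-free C# model. It does not use Examine at all: the class and method names (`CategoryFilterModel`, `EffectiveQuery`, `CountMatches`, `SamplePdfIndex`, `Report`) are hypothetical, and matching is a plain case-insensitive whole-word comparison rather than Lucene analysis or fuzzy matching. It only mimics the point above: the category passed to `CreateQuery` acts as a required `__IndexType` clause, so a PDF entry stored with `__IndexType` `pdf` can never match the `media` category. In Examine's logged string the category is reported separately (`Category: media, LuceneQuery: ...`), so the clause does not show up inside the `LuceneQuery` part.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model, NOT Examine: it only mimics how a query category
// becomes a required "+__IndexType:<category>" clause.
public static class CategoryFilterModel
{
    // Render the effective query: the category clause plus one field clause.
    public static string EffectiveQuery(string category, string field, string term) =>
        $"+__IndexType:{category} +{field}:{term}";

    // A document matches only if its __IndexType equals the category AND
    // the field contains the term as a whole word (case-insensitive).
    // Real Lucene analysis and fuzzy matching are deliberately left out.
    public static int CountMatches(
        IEnumerable<Dictionary<string, string>> docs,
        string category, string field, string term) =>
        docs.Count(doc =>
            doc.TryGetValue("__IndexType", out var type) && type == category &&
            doc.TryGetValue(field, out var text) &&
            text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Any(word => string.Equals(word, term, StringComparison.OrdinalIgnoreCase)));

    // Assumed sample data: one PDF entry stored with __IndexType "pdf".
    public static List<Dictionary<string, string>> SamplePdfIndex() =>
        new List<Dictionary<string, string>>
        {
            new Dictionary<string, string>
            {
                ["__IndexType"] = "pdf",
                ["fileTextContent"] = "Marketing Konzept Hunziker",
            },
        };

    // One report line per category: the effective query and its hit count.
    public static IEnumerable<string> Report(params string[] categories) =>
        categories.Select(category =>
            $"{EffectiveQuery(category, "fileTextContent", "konzept")} -> " +
            $"{CountMatches(SamplePdfIndex(), category, "fileTextContent", "konzept")} hit(s)");
}
```

Calling `CategoryFilterModel.Report("media", "pdf")` gives one line per category: the `media` line reports 0 hits and the `pdf` line reports 1 hit, which mirrors the 0-result search in this thread and the fix of switching the category to `pdf`.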
It's also worth having a look at this blog post: https://dev.to/jemayn/index-pdfs-on-their-pages-in-umbraco-30l1
o
@Jemayn Thanks for your response! I’ve read through the blog post you mentioned, and it was very insightful. However, I’m still struggling to understand why my document search returns no results. For context:
1. When I search in Pages, everything works perfectly. Here's an example of the generated Lucene query:
```
Category: content, LuceneQuery: +(combinedField:alexander~2) +__Published:y +searchablePath:1591 -hideFromInternalSearch:1 -__NodeTypeAlias:usnsitemapxml -__NodeTypeAlias:usnrobotstxt
```
This query returns the expected results.
2. When I search in Documents (using the PDFIndex), my query looks like this:
```
Category: media, LuceneQuery: +fileTextContent:konzept~2
```
However, this query returns 0 results, even though I can find the correct document (Marketing Konzept Hunziker) under the same PDFIndex when I search manually in Examine Management. Here’s where I’m confused:
- Does the query need additional fields to properly target the PDFIndex? Should I explicitly include something like +__IndexType:pdf, or is that already implied by the query’s Category: media?
- When debugging, how can I verify that the query is actually searching where it should?
It seems correct, but something is still missing. Any advice or guidance on this would be greatly appreciated!
j
I tried to explain it in my comment above, but basically, when you write this in your code:
```csharp
var criteria = searcher.CreateQuery("media", BooleanOperation.And);
```
you make a mistake; it should instead be:
```csharp
var criteria = searcher.CreateQuery("pdf", BooleanOperation.And);
```
The code you have now adds a filter on `__IndexType` that has 0 results, because all indexed PDFs have `__IndexType: pdf` by default, not `media`.
o
Yeah, I just found out. Thanks a lot!!!