Issue with Filtering PDF Search by Path in Examine (Umbraco 13) Umbraco #help-with-umbraco

Olti

01/31/2025, 11:40 AM

Hey everyone, I'm implementing a PDF search using Examine in Umbraco 13, and everything is working except for filtering by path.

What Works ✅
- Searching for text inside PDFs works using fileTextContent.
- Searching by nodeName also works.
- If I don't filter by path, I get results.

What Doesn't Work ❌
- When I try to filter by path, I get zero results.
- The same query works in Examine Management but not in my code.

How My Examine Data Looks:
In Examine Management, when I search for a PDF, I see this in the index:

path: -1,1600,1603,1611,2227
__IndexType: pdf
nodeName: Marketing Document
fileTextContent: "This document contains marketing strategies..."

- 1603 is my media root folder (selected by the user).
- 2227 is the actual PDF file inside /HQM/blog1/.

Lucene Query That Works in Examine Management
If I manually search in Examine Management with:

fileTextContent:marketing~2 OR nodeName:marketing~2

I get results.

Lucene Query That My Code Generates (Fails)
Here’s what my code generates:

Lucene Query: { Category: pdf, LuceneQuery: +(fileTextContent:marketing~2 nodeName:marketing~2) +path:1603* }

This returns zero results ❌, even though 1603* should match -1,1600,1603,1611,2227.

Olti

01/31/2025, 11:40 AM

What I Tried So Far 🔄

❌ criteria.And().Field("path", $"{rootNode.Id}*")
Fails: Lucene Query: { Category: pdf, LuceneQuery: +(fileTextContent:marketing~2 nodeName:marketing~2) +path:1603 } → No results. Wildcard is ignored.

❌ criteria.And().NativeQuery($"path:{rootNode.Id}*")
Fails: Lucene Query: { Category: pdf, LuceneQuery: +(fileTextContent:marketing~2 nodeName:marketing~2) +path:1603* } → No results.

❌ criteria.And().Field("path", "*1603*")
Fails: Lucene Query: { Category: pdf, LuceneQuery: +(fileTextContent:marketing~2 nodeName:marketing~2) +path:*1603* } → No results. Throws error: '*' or '?' not allowed as first character in WildcardQuery

How Can I Filter PDFs by Path Correctly?
I need to filter PDFs so that only those inside a specific media folder (e.g., 1603) appear.
- Does anyone have experience with Examine's path filtering in Umbraco?
- Why is path:1603* returning nothing, even though the paths exist?
- Is there a better way to search PDFs within a media folder and its children?

Thanks in advance! 🚀

Jemayn

01/31/2025, 11:43 AM

> 1603* should match -1,1600,1603,1611,2227.

This is technically not true. *1603* should match, or -1,1600,1603* should.

I haven't worked with paths in the PDF index, but it will depend on how the values are indexed. You can't visually see a difference in the backoffice viewer between values indexed as "1,2,3,4" and [1,2,3,4], but the way Examine handles a multi-value field is very different from the way it handles a single-value field containing a string of comma-separated values.

Olti

01/31/2025, 11:55 AM

Thanks for the clarification. I debugged the results in Visual Studio and inspected the path property of a matching PDF result. It matches exactly what I see in the backoffice Examine Management UI.

Here’s what I found:
- The path value of the result is: -1,1600,1603,1611,2227.
- When I use +path:1603* in my Lucene query, it still does not return results even though 1603 is part of the path.
- I also tried using +path:*1603* as a RawQuery to account for all possible matches where 1603 appears anywhere in the path, but it seems not to work or throws errors when attempting the wildcard logic.

Do you know if Examine treats path fields differently when querying? Or is there something else I should check to ensure this works? Thanks for your help!

https://cdn.discordapp.com/attachments/1334850818012352575/1334854616843685969/image.png?ex=679e0bbd&is=679cba3d&hm=d55748fa8ab8726f75c10c716a459532b6ae2f1d7af3bb3fd1a1ceefdb74ac12&

Jemayn

01/31/2025, 1:27 PM

Historically with Examine, people will edit the path field and make it index the values separately or with spaces in between. IIRC it is because Lucene strips specific characters when it searches, among which is the comma. So if you index "-1,1600,1603,1611,2227", Lucene will treat it as "11600160316112227", which leads to issues when filtering by one of the node ids. My guess is that is what you are running into here. So re-indexing the path in a TransformingIndexValues event handler may fix your issue.
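To make the point above concrete, here is a small self-contained sketch of why a prefix filter such as 1603* finds nothing until the ids are indexed as separate terms. It is a toy model, not Lucene or Examine: PathPrefixDemo and its methods are hypothetical names, and the exact terms a real analyzer produces for the path may differ. The only assumption it relies on is that a prefix query matches a document when one of its indexed terms starts with the prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: this is NOT Lucene or Examine code. All names are hypothetical.
public static class PathPrefixDemo
{
    // Stand-in for a prefix query such as "1603*": a document matches
    // only if at least one of its indexed terms STARTS WITH the prefix.
    public static bool PrefixMatches(IEnumerable<string> terms, string prefix) =>
        terms.Any(t => t.StartsWith(prefix, StringComparison.Ordinal));

    // Assumed case 1: the whole comma-separated path is kept as a single term.
    public static string[] AsSingleTerm(string path) => new[] { path };

    // Case 2: commas become spaces and the value is split on whitespace,
    // so every node id ends up as its own term.
    public static string[] AsSeparateTerms(string path) =>
        path.Replace(",", " ").Split(' ', StringSplitOptions.RemoveEmptyEntries);
}
```

With the path from this thread, PrefixMatches(AsSingleTerm("-1,1600,1603,1611,2227"), "1603") is false, because the only term starts with "-1". The same holds for the collapsed form "11600160316112227". PrefixMatches(AsSeparateTerms("-1,1600,1603,1611,2227"), "1603") is true, because "1603" is now a term of its own.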

Olti

01/31/2025, 1:48 PM

I solved it like this:

Olti

01/31/2025, 1:49 PM

using System.Collections.Generic;
using System.Linq;
using Examine;
using Microsoft.Extensions.Logging;
using Umbraco.Cms.Core.Events;
using Umbraco.Cms.Core.Notifications;

namespace HunzikerIntranet.HQM.Umbraco.Infrastructure.Examine
{
    public class PDFIndexPathTransformer : INotificationHandler<UmbracoApplicationStartedNotification>
    {
        private readonly IExamineManager _examineManager;
        private readonly ILogger<PDFIndexPathTransformer> _logger;

        public PDFIndexPathTransformer(IExamineManager examineManager, ILogger<PDFIndexPathTransformer> logger)
        {
            _examineManager = examineManager;
            _logger = logger;
        }

        public void Handle(UmbracoApplicationStartedNotification notification)
        {
            if (!_examineManager.TryGetIndex("PDFIndex", out var pdfIndex))
            {
                _logger.LogError("PDFIndex not found in Examine.");
                return;
            }

            pdfIndex.TransformingIndexValues += IndexOnTransformingIndexValues;
        }

        private void IndexOnTransformingIndexValues(object? sender, IndexingItemEventArgs e)
        {
            if (!e.ValueSet.Values.ContainsKey("path")) return;

            var rawPath = e.ValueSet.Values["path"].FirstOrDefault()?.ToString();
            if (string.IsNullOrEmpty(rawPath)) return;

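            // Replace commas with spaces so each node id is indexed as its own token.
            var searchablePath = rawPath.Replace(",", " ");

            // Copy the existing values, add the new field, and write everything back.
            var indexFields = e.ValueSet.Values.ToDictionary(x => x.Key, x => x.Value.ToList());
            indexFields["searchablePath"] = new List<object> { searchablePath };

            e.SetValues(indexFields.ToDictionary(x => x.Key, x => (IEnumerable<object>)x.Value));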
        }
    }
}
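A notification handler like the one above only runs once it is registered with Umbraco. The thread does not show how that was done, so the following is a minimal sketch of one common way to do it, using a composer. The class name PDFIndexPathTransformerComposer is an assumption. The namespace is copied from the handler so that PDFIndexPathTransformer resolves.

```csharp
using Umbraco.Cms.Core.Composing;
using Umbraco.Cms.Core.DependencyInjection;
using Umbraco.Cms.Core.Notifications;

namespace HunzikerIntranet.HQM.Umbraco.Infrastructure.Examine
{
    // Hypothetical composer name. Umbraco discovers IComposer implementations at startup.
    public class PDFIndexPathTransformerComposer : IComposer
    {
        public void Compose(IUmbracoBuilder builder)
        {
            // Hook the handler up to the application-started notification,
            // which is where it subscribes to TransformingIndexValues.
            builder.AddNotificationHandler<UmbracoApplicationStartedNotification, PDFIndexPathTransformer>();
        }
    }
}
```

Documents that were indexed before the handler was in place will not have the searchablePath field until the index is rebuilt, for example from Examine Management in the backoffice.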

Jemayn

01/31/2025, 1:51 PM

Yes exactly like that, so now it should work if you filter on searchablePath instead of path
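For reference, a query against the new field could look roughly like the sketch below. It assumes the index name "PDFIndex", the category "pdf", and the field names shown earlier in this thread. PdfFolderSearch is a hypothetical class name, and the fuzzy operator (~2) from the original query is left out to keep the example short.

```csharp
using System.Collections.Generic;
using System.Linq;
using Examine;

// Hypothetical helper class. Index, category and field names are taken from this thread.
public class PdfFolderSearch
{
    private readonly IExamineManager _examineManager;

    public PdfFolderSearch(IExamineManager examineManager) => _examineManager = examineManager;

    public IEnumerable<ISearchResult> Search(string term, int folderId)
    {
        if (!_examineManager.TryGetIndex("PDFIndex", out var pdfIndex))
        {
            return Enumerable.Empty<ISearchResult>();
        }

        return pdfIndex.Searcher
            .CreateQuery("pdf")
            // Match the term in either the extracted PDF text or the node name.
            .GroupedOr(new[] { "fileTextContent", "nodeName" }, term)
            .And()
            // No wildcard here: the folder id is filtered as a whole term
            // of the space-separated searchablePath field.
            .Field("searchablePath", folderId.ToString())
            .Execute();
    }
}
```

Calling Search("marketing", 1603) would then be expected to return only PDFs whose path contains the folder 1603, provided the index has been rebuilt so that searchablePath exists on every document.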

Olti

01/31/2025, 1:55 PM

Yes, it's working now, I wanted to tell you that.

Olti

01/31/2025, 1:55 PM

@Jemayn thanks for your help

Jemayn

01/31/2025, 2:16 PM

Glad to hear it, and np 🙂
