Examine: Unable to Access Content Fields in ExternalIndex (Custom Search Component) Umbraco #help-with-umbraco

Olti

02/12/2025, 1:01 PM

Hi everyone! 👋 I'm implementing a custom search component in Umbraco that searches pages (not documents) using Examine's ExternalIndex. However, I am unable to access the correct content field that contains the actual text of the page.

I need to:
1. Extract a text snippet from the search results where the search term appears.
2. Highlight the search term in that snippet.
3. Ensure I'm using the correct field from ExternalIndex to retrieve the page content.

Currently, I am trying to access:
- "mainContent_en-us"
- "combinedField"

However, both are always null in my Razor view.

What I have tried:
- Checked Examine Management in the Umbraco Backoffice to inspect indexed fields.
- Attempted to retrieve content from different fields.
- Used item.Value("fieldName"), but it always returns null.

ExternalIndex fields (example): When searching for the term "arbeitsalltag", here are some relevant fields from the index:

__NodeId    1650
__NodeTypeAlias    USNPage
__Path    -1,2260,1591,1606,1650
__Published    y
combinedField    icon-layout color-black y ... (includes full text)
mainContent_en-us    0 {"text":"","headingtag":""} {"text":"","headingtag":""} (seems empty)
nodeName    1.3 Vision Mission Leitbild
searchablePath    -1 2260 1591 1606 1650

It seems that combinedField contains the text, but I can't access it via item.Value("combinedField"). This is my current search implementation:

Olti

02/12/2025, 1:02 PM

https://cdn.discordapp.com/attachments/1339219843987144735/1339220034857336893/message.txt?ex=67aded59&is=67ac9bd9&hm=87082cf9b5719e9137a479b2666860342a0537bf3af5c3196d216a27ebf124db&

Olti

02/12/2025, 1:03 PM

1. Which field should I use in ExternalIndex to get the actual page content?
2. Why does item.Value("combinedField") always return null?
3. Is there a different way to access the text from Examine results?

Any help is greatly appreciated! 🙏 Thanks in advance! 🚀

Olti

02/12/2025, 4:18 PM

What is the best approach to do that? Because I know that we have the combinedField in the ExternalIndex, but the string is very messy and it has JSON in there.

Olti

02/12/2025, 4:20 PM

So is it a good idea to create a new field "plainText" and extract the combinedField to just have a plainText? Can anyone tell me if it is a good approach?

Olti

02/12/2025, 4:23 PM

@Jemayn you are the examine pro I think here haha, if you have time, I would appreciate it.

Olti

02/12/2025, 5:00 PM

Just tried it, but it's not good, because I can't get the text clean enough. There are too many IDs and other stuff (JSON) in it.

Mike Chambers

02/12/2025, 5:12 PM

https://dev.to/jemayn/indexing-blocklist-data-in-umbraco-9-334o any use?

Mike Chambers

02/12/2025, 5:13 PM

also prob overkill but out of the box there is https://docs.umbraco.com/umbraco-cms/reference/configuration/indexingsettings#explicitlyindexeachnestedproperty

Mike Chambers

02/12/2025, 5:18 PM

for custom property editors..

public override IPropertyIndexValueFactory PropertyIndexValueFactory => _tagPropertyIndexValueFactory;

Or other core propertyIndexValueFactories here https://github.com/umbraco/Umbraco-CMS/tree/contrib/src/Umbraco.Core/PropertyEditors If you can reuse.. or for inspiration.

Olti

02/12/2025, 5:38 PM

@Mike Chambers thanks! The thing here is that I am using uSkinned, and there are Pods and things like that, so I need to iterate through them as well, and I don't know what other property editors there are...

Jemayn

02/13/2025, 12:05 PM

Hey @Olti If I understand your needs correctly, then I would not spend time doing this implementation myself. The FullTextSearch package can do it all for you, it requires a little bit of setting up, but it contains the highlighting feature to show the search term in the result highlighted as well: https://github.com/skttl/umbraco-fulltextsearch8/blob/v5/dev/docs/developers-guide-v4.md

Olti

02/13/2025, 12:08 PM

Hey @Jemayn Oh okay, didn't know this existed. But the thing is, that my search component, should work on every page, so it can be reused. I can show you my current Code: PagedSearchResult.cs:

using Umbraco.Cms.Core.Models.PublishedContent;

namespace HunzikerIntranet.HQM.Umbraco.Models
{
    public class PagedSearchResult
    {
        public IEnumerable<IPublishedContent> Results { get; set; }
            = Enumerable.Empty<IPublishedContent>();

        public long TotalItemCount { get; set; }
        public int CurrentPage { get; set; }
        public int TotalPages { get; set; }
        public int PageSize { get; set; }
    }
}

Olti

02/13/2025, 12:08 PM

SearchService.cs https://cdn.discordapp.com/attachments/1339219843987144735/1339568887808659456/message.txt?ex=67af323e&is=67ade0be&hm=a252b81709e001d125d9dd1fd0c94354301c9d45bb615bfa35a5a7730d122284&

Olti

02/13/2025, 12:09 PM

Everything works, but I just need the preview-text to show up in the results... And that's my component: https://cdn.discordapp.com/attachments/1339219843987144735/1339569145787973652/message.txt?ex=67af327c&is=67ade0fc&hm=d361ac8bc21075b21de5a10afcb0e733f690553fd923c034e16563a8fabd0390&

Jemayn

02/13/2025, 12:26 PM

I think I understand now. The combinedField in your index probably contains all the content of the page in one long string. It is a custom field that does not correspond to one specific property, but to a bunch of them combined into one field. Because of this you cannot just say

item.Value<string>("combinedField")

as the item in this case is an IPublishedContent, which is the cached model of the content node. It exposes the individual properties that were put together into the combinedField, not the combinedField itself. Looping through the properties is a bad idea, as they will differ for each page type and some of them are probably nested block lists etc. Instead, in your SearchService you can read the value of the combinedField search field, which gives you the full content, and then try to find the text surrounding the search term. In your service you do this:

var finalSearchList = searchResults
                .ToPublishedSearchResults(umbracoContext.PublishedSnapshot.Content)
                .Select(x => x.Content);

Which basically converts your search result data into IPublishedContent, but in doing so you throw away all of the additional indexed data that exists only in the index and not on the node. If you instead read the field with the alias combinedField from each search result, you would have the text and could use something to find your search term within that text and show it on the page. The package I mentioned before has a "highlighter helper" which takes the full text of the field, the search term and the length of your summary, and then gives you a summary centered around the search term: https://github.com/skttl/umbraco-fulltextsearch8/blob/v5/dev/src/Our.Umbraco.FullTextSearch/Helpers/Highlighter.cs
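To illustrate the idea of reading the text from the index and cutting a summary around the search term, here is a minimal sketch. It is not the package's Highlighter and not anything from Umbraco or Examine: SearchSnippet and Build are hypothetical names, and the helper only assumes that the stored fields of a search result are available as a string dictionary (Examine's ISearchResult should expose this as Values, but check that against your Examine version).

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper for illustration; not part of Umbraco, Examine or FullTextSearch.
public static class SearchSnippet
{
    // Returns up to `radius` characters on either side of the first
    // case-insensitive match of `term` in the given field, HTML-encoded,
    // with the match wrapped in <mark>. Returns an empty string when the
    // field is missing or the term does not occur in it.
    public static string Build(
        IReadOnlyDictionary<string, string> fields,
        string fieldName,
        string term,
        int radius = 80)
    {
        if (string.IsNullOrWhiteSpace(term)
            || !fields.TryGetValue(fieldName, out var text)
            || string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        // Ordinal comparison keeps the matched length equal to term.Length.
        int hit = text.IndexOf(term, StringComparison.OrdinalIgnoreCase);
        if (hit < 0)
        {
            return string.Empty;
        }

        int start = Math.Max(0, hit - radius);
        int end = Math.Min(text.Length, hit + term.Length + radius);

        // Encode each part separately so only the <mark> tag is real markup.
        string before = WebUtility.HtmlEncode(text.Substring(start, hit - start));
        string match = WebUtility.HtmlEncode(text.Substring(hit, term.Length));
        string after = WebUtility.HtmlEncode(
            text.Substring(hit + term.Length, end - hit - term.Length));

        return (start > 0 ? "..." : "")
            + before + "<mark>" + match + "</mark>" + after
            + (end < text.Length ? "..." : "");
    }
}
```

In the service you would call something like SearchSnippet.Build(result.Values, "combinedField", searchTerm) for each search result before converting it to IPublishedContent, and carry the string along next to the content item. Because the return value contains markup, the Razor view has to output it with Html.Raw. This only cuts the text; it does not clean the JSON out of combinedField.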

Olti

02/13/2025, 12:33 PM

@Jemayn okay, I am gonna try using the package and I will come back here and tell you if it's working or not haha

Olti

02/13/2025, 12:34 PM

@Jemayn thanks for the package. I will give it a try

Olti

02/13/2025, 1:10 PM

Hey @Jemayn, I just implemented it, but I still have the same problem as before. I mean, the highlighting logic is implemented now, but I still don't have a clean field with pure text... Somehow it always shows me this text in the bodyText... My pages are protected for members only, is it because of that? https://cdn.discordapp.com/attachments/1339219843987144735/1339584423330451500/image.png?ex=67af40b6&is=67adef36&hm=29158f517c6ad040a8cb6e457af971306c6b2b5a9454789908ec3faa66207663&

Olti

02/13/2025, 1:11 PM

It's the text from my login page

Olti

02/13/2025, 1:24 PM

It was because of that. I removed the protection and then rebuilt the index, and the text is now OK, but still not good: it contains too much stuff like the page title, page description, navigation links and so on. So it is not only the blocklist content. And the problem is that I cannot remove the protection, the pages need to be protected... Any solution/idea?

Olti

02/13/2025, 1:24 PM

@Jemayn

Mike Chambers

02/13/2025, 2:25 PM

using Examine;
using Examine.Lucene;
using Microsoft.Extensions.Options;
using Umbraco.Cms.Core;
using Umbraco.Cms.Core.Composing;
using Umbraco.Cms.Core.Scoping;
using Umbraco.Cms.Core.Services;
using Umbraco.Cms.Infrastructure.Examine;

namespace www.Extensions.Examine
{
    public sealed class ConfigureExamineIndexOptions : IConfigureNamedOptions<LuceneDirectoryIndexOptions>
    {
        private readonly IPublicAccessService _publicAccessService;
        [Obsolete("Update when Umbraco Core ContentValueSetValidator is updated")]
        private readonly IScopeProvider _scopeProvider;

        [Obsolete("Update when Umbraco Core ContentValueSetValidator is updated")]
        public ConfigureExamineIndexOptions(IPublicAccessService publicAccessService, IScopeProvider scopeProvider)
        {
            _publicAccessService = publicAccessService;
            _scopeProvider = scopeProvider;
        }

        [Obsolete("Update when Umbraco Core ContentValueSetValidator is updated")]
        public void Configure(string? name, LuceneDirectoryIndexOptions options)
        {
            switch (name)
            {
                //NB you need to rebuild the examine index for these changes to take effect
                case Constants.UmbracoIndexes.ExternalIndexName:
                    options.Validator = new ContentValueSetValidator(true, true, _publicAccessService, _scopeProvider);
                    break;                
            }
        }

        public void Configure(LuceneDirectoryIndexOptions options) => throw new NotImplementedException("This is never called and is just part of the interface");
    }

    public class SearchComposer : IComposer
    {
        public void Compose(IUmbracoBuilder builder)
        {
            // Custom Examine configuration
            builder.Services.ConfigureOptions<ConfigureExamineIndexOptions>();
        }
    }
}

Mike Chambers

02/13/2025, 2:25 PM

options.Validator = new ContentValueSetValidator(true, true, _publicAccessService, _scopeProvider);

changes the external index to include protected nodes.

Mike Chambers

02/13/2025, 2:29 PM

public ContentValueSetValidator(
        bool publishedValuesOnly,
        bool supportProtectedContent,
        IPublicAccessService? publicAccessService,
        IScopeProvider? scopeProvider)

Olti

02/14/2025, 10:19 AM

Hey @Mike Chambers, thanks. I tried that, but it didn't work... I am now using a library called "FullTextSearch" v. 4.1.0.

Olti

02/14/2025, 10:20 AM

@Jemayn any idea maybe? I also tried this: builder.Services.AddUnique(); but it didn't work either.

Olti

02/14/2025, 11:10 AM

That's my program.cs:

using HunzikerIntranet.HQM.Umbraco.Infrastructure.Configuration;
using HunzikerIntranet.HQM.Umbraco.Interfaces;
using HunzikerIntranet.HQM.Umbraco.Security;
using HunzikerIntranet.HQM.Umbraco.Services;

WebApplicationBuilder builder = WebApplication.CreateBuilder(args);

builder.Services.Configure<AzureAdSettings>(
    builder.Configuration.GetSection("AzureAd")
);

builder.Services.AddTransient<ISearchService, SearchService>();

builder.CreateUmbracoBuilder()
    .AddBackOffice()
    .AddWebsite()
    .AddDeliveryApi()
    .AddComposers()
    .ConfigureAuthenticationMembers(builder.Configuration)
    .Build();

WebApplication app = builder.Build();

await app.BootUmbracoAsync();


app.UseUmbraco()
    .WithMiddleware(u =>
    {
        u.UseBackOffice();
        u.UseWebsite();
    })
    .WithEndpoints(u =>
    {
        u.UseInstallerEndpoints();
        u.UseBackOfficeEndpoints();
        u.UseWebsiteEndpoints();
    });

await app.RunAsync();

and I already tried this "services.AddUnique();" but it didn't work...

Mike Chambers

02/14/2025, 11:17 AM

you do need to rebuild the indexes for the changes to propagate... if you didn't notice the comment.. haven't used full text search.. but hopefully will get you there.

Olti

02/14/2025, 11:18 AM

I tried rebuilding the indexes, but it still returns an empty string.

Mike Chambers

02/14/2025, 11:19 AM

protected nodes are now in the index though?

Mike Chambers

02/14/2025, 11:19 AM

or always have been? (sorry thought I read that you weren't seeing your protected nodes at all in there)

Olti

02/14/2025, 11:20 AM

they were always there. But the only problem I had was with the bodyText from the package: https://cdn.discordapp.com/attachments/1339219843987144735/1339919072711675958/image.png?ex=67b07861&is=67af26e1&hm=d9b7ead1d1a5085af3a5bcb4ad0a9bf906122a60175f16b576134f9cd0013d31&

Olti

02/14/2025, 11:20 AM

That's the HTML from my login page: https://cdn.discordapp.com/attachments/1339219843987144735/1339919208288489473/image.png?ex=67b07881&is=67af2701&hm=363ac5614462e671b20ff6e20bb3be5d125a40b60d215761d69e19b0badc6c92&

Mike Chambers

02/14/2025, 11:20 AM

Are you looking in the ExternalIndex and not the InternalIndex? I find it odd that protected nodes are in the default external index.

Olti

02/14/2025, 11:21 AM

I think the package does that: https://github.com/skttl/umbraco-fulltextsearch8/blob/v5/dev/docs/developers-guide-v4.md

Olti

02/14/2025, 11:23 AM

I think the combinedField comes from Umbraco, and it was already there before. We can see the content in it too, but it is not a clean string and cleaning it is practically impossible. So I had to use the package, because it offers a lot of advantages such as the highlighting and the summary. https://cdn.discordapp.com/attachments/1339219843987144735/1339919908754161705/image.png?ex=67b07928&is=67af27a8&hm=14bb1cbefa20d1989eec511d2f2ab956a854be197f45af1cc2235e2f79a6d79c&

Olti

02/14/2025, 11:23 AM

but yeah the bodyText_culture is generated by the package and the indexer from the package (searchBot or something like that)

Olti

02/14/2025, 11:24 AM

when I remove all of the protection on the page, then rebuild the indexes, I get the full bodyText and summary...

Mike Chambers

02/14/2025, 11:42 AM

Are you using a package to protect certain aspects of the page, and not "Restrict public access" for the entire node?

Mike Chambers

02/14/2025, 11:43 AM

https://cdn.discordapp.com/attachments/1339219843987144735/1339924881067868222/image.png?ex=67b07dca&is=67af2c4a&hm=c3448a89eb9e1d6ee6737b70b216d4c5b3eb7e46c97ea9c36cfa5cee166b1506&

Olti

02/14/2025, 12:14 PM

No I just use the restrict public access.

Olti

02/14/2025, 12:42 PM

Still searching for a solution. I tried a lot of things with ChatGPT, but no solution found...

Olti

02/14/2025, 2:08 PM

The external Indexer from Umbraco https://cdn.discordapp.com/attachments/1339219843987144735/1339961381981650954/image.png?ex=67b09fc8&is=67af4e48&hm=46041b57ef4c7fa7665d345ff63dee4ecfd4788caaafb9e069b531a81b9f4ef7&

Olti

02/14/2025, 2:08 PM

my thirdparty package https://cdn.discordapp.com/attachments/1339219843987144735/1339961424679669780/image.png?ex=67b09fd2&is=67af4e52&hm=d06d401f2e84e684f797f2cd76d6762ed432170a3220b8db323877c81974cdd8&

Olti

02/14/2025, 2:08 PM

https://cdn.discordapp.com/attachments/1339219843987144735/1339961470120759316/image.png?ex=67b09fdd&is=67af4e5d&hm=b9399fa70044d39819b78e794b3435b2d4c81c72996a65ae8c3c133e05668a60&

Olti

02/17/2025, 8:18 AM

Does anyone have a solution?
