Examine: Unable to Access Content Fields in Extern...
# help-with-umbraco
o
Hi everyone! 👋 I'm implementing a custom search component in Umbraco that searches pages (not documents) using Examine's ExternalIndex. However, I am unable to access the correct content field that contains the actual text of the page.
I need to:
1. Extract a text snippet from the search results where the search term appears.
2. Highlight the search term in that snippet.
3. Ensure I’m using the correct field from ExternalIndex to retrieve the page content.
Currently, I am trying to access:
- "mainContent_en-us"
- "combinedField"
However, both are always null in my Razor view.
What I Have Tried:
- Checked Examine Management in the Umbraco Backoffice to inspect indexed fields.
- Attempted to retrieve content from different fields.
- Used item.Value("fieldName"), but it always returns null.
ExternalIndex Fields (Example):
When searching for the term "arbeitsalltag", here are some relevant fields from the index:
__NodeId    1650
__NodeTypeAlias    USNPage
__Path    -1,2260,1591,1606,1650
__Published    y
combinedField    icon-layout color-black y ... (includes full text)
mainContent_en-us    0 {"text":"","headingtag":""} {"text":"","headingtag":""} (seems empty)
nodeName    1.3 Vision Mission Leitbild
searchablePath    -1 2260 1591 1606 1650
It seems that combinedField contains the text, but I can't access it via item.Value("combinedField"). This is my current search implementation:
1. Which field should I use in ExternalIndex to get the actual page content?
2. Why does item.Value("combinedField") always return null?
3. Is there a different way to access the text from Examine results?
Any help is greatly appreciated! 🙏 Thanks in advance! 🚀
What is the best approach to do that? Because I know that we have the combinedField in the ExternalIndex, but the string is very messy and it has JSON in there.
So is it a good idea to create a new field "plainText" and extract the combinedField to just have a plainText? Can anyone tell me if it is a good approach?
@Jemayn you are the examine pro I think here haha, if you have time, I would appreciate it.
Just tried it, but it's not good, because I can't get the text clean enough. There are too many IDs and other stuff (JSON) in it.
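For reference, the "plainText" field idea could be wired up roughly like this. This is only a sketch under assumptions: the names `PlainTextIndexComponent`, `PlainTextIndexComposer`, `SourceFields` and `Clean` are made up for illustration, the field names in `SourceFields` are taken from the index listing above and may not be the right ones for every page type, and `Clean` is a crude placeholder. Getting clean text out of block list JSON is exactly the hard part and is not solved here.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Examine;
using Umbraco.Cms.Core;
using Umbraco.Cms.Core.Composing;
using Umbraco.Cms.Infrastructure.Examine;

// Hypothetical component: adds a "plainText" field to content items in the ExternalIndex.
public sealed class PlainTextIndexComponent : IComponent
{
    // Assumption: index fields to merge; adjust to what your document types really contain.
    private static readonly string[] SourceFields = { "nodeName", "mainContent_en-us" };

    private readonly IExamineManager _examineManager;

    public PlainTextIndexComponent(IExamineManager examineManager)
        => _examineManager = examineManager;

    public void Initialize()
    {
        if (_examineManager.TryGetIndex(Constants.UmbracoIndexes.ExternalIndexName, out IIndex index)
            && index is BaseIndexProvider provider)
        {
            provider.TransformingIndexValues += OnTransformingIndexValues;
        }
    }

    public void Terminate() { }

    private static void OnTransformingIndexValues(object? sender, IndexingItemEventArgs e)
    {
        if (e.ValueSet.Category != IndexTypes.Content) return;

        // Copy the existing values so nothing that is already indexed gets lost.
        var values = e.ValueSet.Values.ToDictionary(kv => kv.Key, kv => kv.Value.ToList());

        var raw = SourceFields
            .Where(values.ContainsKey)
            .SelectMany(field => values[field])
            .Select(v => v?.ToString() ?? string.Empty);

        values["plainText"] = new List<object> { Clean(string.Join(" ", raw)) };
        e.SetValues(values.ToDictionary(kv => kv.Key, kv => (IEnumerable<object>)kv.Value));
    }

    // Crude placeholder: strips HTML tags and collapses whitespace. It does NOT handle JSON.
    public static string Clean(string input)
    {
        string noTags = Regex.Replace(input, "<[^>]+>", " ");
        return Regex.Replace(noTags, @"\s+", " ").Trim();
    }
}

// Registers the component so Umbraco runs it at startup.
public sealed class PlainTextIndexComposer : ComponentComposer<PlainTextIndexComponent> { }
```

The index has to be rebuilt before the new field shows up in Examine Management.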
m
for custom property editors..
public override IPropertyIndexValueFactory PropertyIndexValueFactory => _tagPropertyIndexValueFactory;
Or other core propertyIndexValueFactories here https://github.com/umbraco/Umbraco-CMS/tree/contrib/src/Umbraco.Core/PropertyEditors If you can reuse.. or for inspiration.
o
@Mike Chambers thanks! The thing here is that I am using uSkinned... And there are Pods and things like that, so I need to iterate through them as well, and I don't know what other Property Editors there are...
j
Hey @Olti If I understand your needs correctly, then I would not spend time doing this implementation myself. The FullTextSearch package can do it all for you, it requires a little bit of setting up, but it contains the highlighting feature to show the search term in the result highlighted as well: https://github.com/skttl/umbraco-fulltextsearch8/blob/v5/dev/docs/developers-guide-v4.md
o
Hey @Jemayn Oh okay, didn't know this existed. But the thing is, that my search component, should work on every page, so it can be reused. I can show you my current Code: PagedSearchResult.cs:
using Umbraco.Cms.Core.Models.PublishedContent;

namespace HunzikerIntranet.HQM.Umbraco.Models
{
    public class PagedSearchResult
    {
        public IEnumerable<IPublishedContent> Results { get; set; }
            = Enumerable.Empty<IPublishedContent>();

        public long TotalItemCount { get; set; }
        public int CurrentPage { get; set; }
        public int TotalPages { get; set; }
        public int PageSize { get; set; }
    }
}
j
I think I understand now. The combinedField in your index probably contains all of the page's content in one long string. It is a custom field that does not correspond to one specific property but rather to a bunch of them, all put into one field. Because of this you cannot just say
item.Value<string>("combinedField")
as the item in this case is an IPublishedContent, which is the cached model of the content node. It instead has the bunch of individual properties that are put together into the combinedField. Looping through the properties is a bad idea, as they will differ for each page type and some of them are probably nested block lists etc. Instead, in your searchService you can get the value of the combinedField search field, which has the full content, and then try to find the text surrounding the search term. In your service you do this:
var finalSearchList = searchResults
                .ToPublishedSearchResults(umbracoContext.PublishedSnapshot.Content)
                .Select(x => x.Content);
This basically converts your search result data into IPublishedContent, but in doing so you throw away all of the additional indexed data that doesn't exist on the node but only in the index. If, for each search result, you retrieved the field with the alias combinedField, then you would have the text and could use something to find your search term within it and show it on the page. The package I mentioned before has a "highlighter helper" which takes the full text of the field, the search term and the length of your summary, and gives you a summary centered around the search term: https://github.com/skttl/umbraco-fulltextsearch8/blob/v5/dev/src/Our.Umbraco.FullTextSearch/Helpers/Highlighter.cs
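A minimal sketch of that idea, assuming the raw Examine results are still available in the service: read the field from `ISearchResult.Values` before converting to IPublishedContent, then cut a window around the first match and wrap the matches in `<mark>`. The names `SearchHit`, `SnippetBuilder` and `SearchHitMapper` are hypothetical, and this is not the package's Highlighter, just a simple stand-in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;
using Examine;

// Hypothetical DTO: node id and score from the index plus a ready-to-render snippet.
public sealed record SearchHit(string NodeId, float Score, string SnippetHtml);

public static class SnippetBuilder
{
    // Returns up to `radius` characters on each side of the first match,
    // HTML-encoded, with every match inside that window wrapped in <mark>.
    public static string Build(string text, string term, int radius = 80)
    {
        if (string.IsNullOrWhiteSpace(text) || string.IsNullOrWhiteSpace(term))
            return string.Empty;

        int hit = text.IndexOf(term, StringComparison.OrdinalIgnoreCase);
        int start = hit < 0 ? 0 : Math.Max(0, hit - radius);
        int end = Math.Min(text.Length, (hit < 0 ? 0 : hit + term.Length) + radius);
        string window = text.Substring(start, end - start);

        var html = new StringBuilder(start > 0 ? "..." : string.Empty);
        int pos = 0;
        int idx;
        while ((idx = window.IndexOf(term, pos, StringComparison.OrdinalIgnoreCase)) >= 0)
        {
            // Encode each piece separately so index text can never inject markup.
            html.Append(WebUtility.HtmlEncode(window.Substring(pos, idx - pos)));
            html.Append("<mark>")
                .Append(WebUtility.HtmlEncode(window.Substring(idx, term.Length)))
                .Append("</mark>");
            pos = idx + term.Length;
        }
        html.Append(WebUtility.HtmlEncode(window.Substring(pos)));
        if (end < text.Length) html.Append("...");
        return html.ToString();
    }
}

public static class SearchHitMapper
{
    // Assumption: "combinedField" is the index field that holds the page text.
    public static IEnumerable<SearchHit> ToHits(
        IEnumerable<ISearchResult> results, string term, string fieldName = "combinedField")
        => results.Select(r =>
        {
            r.Values.TryGetValue(fieldName, out string? text);
            return new SearchHit(r.Id, r.Score, SnippetBuilder.Build(text ?? string.Empty, term));
        });
}
```

In the service this would be called on the raw search results, and the Razor view would output `SnippetHtml` with `@Html.Raw(...)`. The node id in `NodeId` can still be used to look up the IPublishedContent for the URL and name.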
o
@Jemayn okay, I am gonna try using the package and I will come back here again and tell you if its working or not haha
@Jemayn thanks for the package. I will give it a try
Hey @Jemayn, I just implemented it. But I still have the same problem as before. I mean, the highlighting logic is now implemented, but I still don't have a clean field with pure text... Somehow it always shows me this text in the bodyText... My pages are protected for members only, is it because of that? https://cdn.discordapp.com/attachments/1339219843987144735/1339584423330451500/image.png?ex=67af40b6&is=67adef36&hm=29158f517c6ad040a8cb6e457af971306c6b2b5a9454789908ec3faa66207663&
It's the text from my login page
It was because of that. I removed the protection and then rebuilt the index, and the text is now OK, but still not good: it contains too much stuff like the page title, page description, navigation links and so on. So it is not only the block list content. And the problem is that I cannot remove the protection, the pages need to be protected... Any solution/idea?
@Jemayn
m
using Examine;
using Examine.Lucene;
using Microsoft.Extensions.Options;
using Umbraco.Cms.Core;
using Umbraco.Cms.Core.Composing;
using Umbraco.Cms.Core.Scoping;
using Umbraco.Cms.Core.Services;
using Umbraco.Cms.Infrastructure.Examine;

namespace www.Extensions.Examine
{
    public sealed class ConfigureExamineIndexOptions : IConfigureNamedOptions<LuceneDirectoryIndexOptions>
    {
        private readonly IPublicAccessService _publicAccessService;
        [Obsolete("Update when Umbraco Core ContentValueSetValidator is updated")]
        private readonly IScopeProvider _scopeProvider;

        [Obsolete("Update when Umbraco Core ContentValueSetValidator is updated")]
        public ConfigureExamineIndexOptions(IPublicAccessService publicAccessService, IScopeProvider scopeProvider)
        {
            _publicAccessService = publicAccessService;
            _scopeProvider = scopeProvider;
        }

        [Obsolete("Update when Umbraco Core ContentValueSetValidator is updated")]
        public void Configure(string? name, LuceneDirectoryIndexOptions options)
        {
            switch (name)
            {
                //NB you need to rebuild the examine index for these changes to take effect
                case Constants.UmbracoIndexes.ExternalIndexName:
                    options.Validator = new ContentValueSetValidator(true, true, _publicAccessService, _scopeProvider);
                    break;                
            }
        }

        public void Configure(LuceneDirectoryIndexOptions options) => throw new NotImplementedException("This is never called and is just part of the interface");
    }

    public class SearchComposer : IComposer
    {
        public void Compose(IUmbracoBuilder builder)
        {
            // Custom Examine configuration
            builder.Services.ConfigureOptions<ConfigureExamineIndexOptions>();
        }
    }
}
options.Validator = new ContentValueSetValidator(true, true, _publicAccessService, _scopeProvider);
changes the external index to include protected nodes.
public ContentValueSetValidator(
        bool publishedValuesOnly,
        bool supportProtectedContent,
        IPublicAccessService? publicAccessService,
        IScopeProvider? scopeProvider)
o
Hey @Mike Chambers, thanks. I tried that, but it didn't work... I am now using a library called "FullTextSearch", version 4.1.0.
@Jemayn any idea, maybe? I also tried this: builder.Services.AddUnique(); but it didn't work either.
That's my program.cs:
using HunzikerIntranet.HQM.Umbraco.Infrastructure.Configuration;
using HunzikerIntranet.HQM.Umbraco.Interfaces;
using HunzikerIntranet.HQM.Umbraco.Security;
using HunzikerIntranet.HQM.Umbraco.Services;

WebApplicationBuilder builder = WebApplication.CreateBuilder(args);

builder.Services.Configure<AzureAdSettings>(
    builder.Configuration.GetSection("AzureAd")
);

builder.Services.AddTransient<ISearchService, SearchService>();

builder.CreateUmbracoBuilder()
    .AddBackOffice()
    .AddWebsite()
    .AddDeliveryApi()
    .AddComposers()
    .ConfigureAuthenticationMembers(builder.Configuration)
    .Build();

WebApplication app = builder.Build();

await app.BootUmbracoAsync();


app.UseUmbraco()
    .WithMiddleware(u =>
    {
        u.UseBackOffice();
        u.UseWebsite();
    })
    .WithEndpoints(u =>
    {
        u.UseInstallerEndpoints();
        u.UseBackOfficeEndpoints();
        u.UseWebsiteEndpoints();
    });

await app.RunAsync();
and I already tried this "services.AddUnique();" but it didn't work...
m
you do need to rebuild the indexes for the changes to propagate... if you didn't notice the comment.. haven't used full text search.. but hopefully will get you there.
o
I tried rebuilding the indexes, but it still returns an empty string.
m
protected nodes are now in the index though?
or always have been? (sorry thought I read that you weren't seeing your protected nodes at all in there)
m
You are looking in the ExternalIndex and not the InternalIndex, right? I find it odd that protected nodes are in the default external index.
The combinedField comes from Umbraco, I think, and it was already there before, see the screenshot. We can see the content there too, but it's not a clean string, and cleaning it is impossible... So I had to use the package, because it offers a lot of advantages such as the highlighting and summary. https://cdn.discordapp.com/attachments/1339219843987144735/1339919908754161705/image.png?ex=67b07928&is=67af27a8&hm=14bb1cbefa20d1989eec511d2f2ab956a854be197f45af1cc2235e2f79a6d79c&
but yeah the bodyText_culture is generated by the package and the indexer from the package (searchBot or something like that)
when I remove all of the protection on the page, then rebuild the indexes, I get the full bodyText and summary...
m
Are you using a package to protect certain aspects of the page, and not "restrict public access" for the entire node?
o
No I just use the restrict public access.
Still searching for a solution. I tried a lot of things with ChatGPT, but no solution found...
anyone a solution?