Examine MultipleCharacterWildcard results in same scores for results
a
I was wondering how you guys work with MultipleCharacterWildcard. I've added a search to my blog overview page so people can easily search for words, but currently it only returns exact matches. For example, if you search for "Naturisten" you won't find "Naturistencamping". So I've added MultipleCharacterWildcard() to my keyword, and now the search finds everything properly. Great! However, all the items returned have exactly the same score. According to ChatGPT: "When you perform a wildcard query, Lucene looks up the matching terms in the inverted index and retrieves the documents containing those terms. However, since wildcard queries are considered as exact matches, the relevance scoring for these documents might not differentiate between them effectively, leading to identical scores for all matches." Now, I'm quite sure other people have run into this. How did you work around or fix this?
p
@User May I suggest “all” instead of "guys"? We use gender inclusive language in this Discord. 😀
m
How about including the exact match as well, with a boost value that's higher than the wildcard match's?
So you search for Naturisten OR Naturisten and boost the first one higher. (Not sure what your blog is about lol)
And the second Naturisten should be a wildcard (with asterisks around it), but the chat formats that as italic and I'm not sure what to do about it 😉
d
☝️ this is what I do as well: just add the exact match with a boost modifier to increase scores for exact matches
a
Got an example? 😄 Not sure how to do that
m
I gave up trying to understand scoring.. especially when we can't remove IDF https://github.com/Shazwazza/Examine/issues/131 😦 also boosting seems to be a complete mystery.. several tests revealed boosting higher resulted in lower scores! and fuzzy ?????
string.Boost(float) is in the Examine string extensions (using Examine).
```csharp
// Per field: exact (escaped) match plus a boosted match; the boost drops with field priority.
query.And().Group(n => n
    .GroupedOr(new[] { "nodeName" }, keyword.Escape(), keyword.Boost(12))
    .Or()
    .GroupedOr(new[] { summaryAlias }, keyword.Escape(), keyword.Boost(10))
    .Or()
    .GroupedOr(new[] { descriptionAlias }, keyword.Escape(), keyword.Boost(8))
    .Or()
    .GroupedOr(new[] { authorityAlias }, keyword.Escape(), keyword.Boost(6))
    .Or()
    .GroupedOr(new[] { notesAlias }, keyword.Escape(), keyword.Boost(4))
    .Or()
    .GroupedOr(new[] { prerequisitesAlias }, keyword.Escape(), keyword.Boost(2))
    .Or()
    // Unboosted fallbacks across all fields: fuzzy matches, then wildcard matches.
    .GroupedOr(new[] { "nodeName", summaryAlias, descriptionAlias, authorityAlias, notesAlias, prerequisitesAlias }, keyword.Fuzzy())
    .Or()
    .GroupedOr(new[] { "nodeName", summaryAlias, descriptionAlias, authorityAlias, notesAlias, prerequisitesAlias }, keyword.MultipleCharacterWildcard())
);
```
d
This is an example that I use, except I use Fuzzy search:
```csharp
if (search.Text is not null)
{
    // Split the search text into individual terms.
    var terms = search.Text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    query = query.And(sq =>
    {
        // Each term is searched twice: once boosted, once fuzzy.
        var boosted = terms.Select(t => t.Boost(5)).ToArray();
        var fuzzy = terms.Select(t => t.Fuzzy(0.7f)).ToArray();

        var wordTerms = boosted.Concat(fuzzy);

        var op = sq.GroupedOr(new[] { Index.Defaults.IndexFieldNames.Word }, wordTerms.ToArray());

        return op;
    }, BooleanOperation.Or);
}
```
a
hmm, ok that explains a little bit
will test, thanks!
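
For reference, the suggestion in this thread (exact match with a boost, OR-ed with the wildcard match) boils down to a minimal sketch like the one below. Lucene's default rewrite for wildcard queries is constant-score, which is consistent with the identical scores reported above; the boosted clause is what lets exact hits rank higher. The class name BlogSearchQueries, the method name ExactOrWildcard and the boost value of 10 are hypothetical, and "nodeName" is just the field borrowed from the first example. It assumes an Examine version where the string extensions live in the Examine namespace and IQuery lives in Examine.Search.

```csharp
using Examine;          // string extensions: Boost(), MultipleCharacterWildcard()
using Examine.Search;   // IQuery, IBooleanOperation, IExamineValue

// Hypothetical helper, not part of Examine itself.
public static class BlogSearchQueries
{
    public static IBooleanOperation ExactOrWildcard(IQuery query, string keyword)
    {
        var values = new IExamineValue[]
        {
            // Exact term, boosted so documents containing it score higher.
            // The boost value (10) is an arbitrary assumption; tune it for your index.
            keyword.Boost(10f),

            // Wildcard term, so "Naturisten" also finds "Naturistencamping".
            keyword.MultipleCharacterWildcard()
        };

        // The values are OR-ed together within the given field(s).
        return query.GroupedOr(new[] { "nodeName" }, values);
    }
}
```

Calling Execute() on the returned operation runs the search. A document that matches both clauses should then score above one that only matches the wildcard, although (as noted earlier in the thread) the absolute score values can still be hard to interpret.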