Can the Umbraco Examine Search index static html files? Umbraco #help-with-umbraco

pdqumbraco

03/01/2024, 8:59 AM

As the title suggests, is it possible for Examine search to index a folder of static HTML files? We have an archive of around 800 HTML files that we want to make available via the search. I know there have been additions like searching PDFs, so I'm wondering whether the same could be done with .html files.

Lars-Erik

03/01/2024, 9:49 AM

Hi @pdqumbraco, Examine, and Lucene underneath it, can index anything you can turn into simple value types. See the documentation about creating a custom "product index" here: https://docs.umbraco.com/umbraco-cms/reference/searching/examine/indexing#creating-your-own-index In the ProductIndexPopulator, instead of getting content from Umbraco, you'd scrape your HTML files using whatever means you like. The ValueSet collection could be as simple as a "body" field with all the text, or you could parse metadata as well and store several fields per "doc". After that it'll behave just like any other index. Also see the part about MultiIndexSearcher in the same area of the docs for examples of how you can combine this search with your regular content search.
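A rough sketch of what such a populator might look like, assuming Umbraco 9 or later with the Examine package. The class name, index name, folder path, category and field names below are made up for illustration, and the index itself, plus the populator's registration in the DI container, still has to be set up as described in the linked docs. Tag stripping here is a naive regex just to keep the sketch short.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using Examine;
using Umbraco.Cms.Infrastructure.Examine;

// Hypothetical populator: fills a custom index from a folder of static HTML files
// instead of from Umbraco content.
public class HtmlArchiveIndexPopulator : IndexPopulator
{
    // Assumed values, not from the thread: change to match your own setup.
    private const string IndexName = "HtmlArchiveIndex";
    private const string ArchiveFolder = @"C:\archive\html";

    public HtmlArchiveIndexPopulator()
    {
        // Tells Umbraco which index this populator is responsible for.
        RegisterIndex(IndexName);
    }

    protected override void PopulateIndexes(IReadOnlyList<IIndex> indexes)
    {
        var valueSets = new List<ValueSet>();

        foreach (var path in Directory.EnumerateFiles(ArchiveFolder, "*.html", SearchOption.AllDirectories))
        {
            var html = File.ReadAllText(path);

            // Naive tag removal; a real HTML parser would do a better job.
            var body = Regex.Replace(html, "<[^>]+>", " ");

            // One ValueSet per file: the relative path is the document id,
            // and "body" is the single searchable field.
            var id = Path.GetRelativePath(ArchiveFolder, path);
            valueSets.Add(new ValueSet(
                id,
                "htmlArchive",
                new Dictionary<string, object>
                {
                    ["path"] = id,
                    ["body"] = body
                }));
        }

        foreach (var index in indexes)
        {
            index.IndexItems(valueSets);
        }
    }
}
```

Building the whole list in memory should be fine for an archive of around 800 files; for a much larger one, indexing in batches would probably be kinder on memory.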

pdqumbraco

03/01/2024, 3:05 PM

Hi @Lars-Erik, thanks for responding and for the link. What I couldn't see from the docs is how you point Examine to a folder of HTML files. Or do you have to upload the HTML files to the media library?

Lars-Erik

03/01/2024, 3:16 PM

You just use the file system: System.IO and possibly HtmlAgilityPack. Nothing Umbraco-related, just plain .NET. 🙃
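For illustration, reading one file with System.IO and HtmlAgilityPack could look roughly like this. It is only a sketch: the HtmlArchiveReader class and its Read method are hypothetical names, and it assumes the HtmlAgilityPack NuGet package is installed.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical helper: pulls a title and the visible text out of one HTML file.
public static class HtmlArchiveReader
{
    public static (string Title, string Body) Read(string path)
    {
        var doc = new HtmlDocument();
        doc.Load(path);

        // Drop script and style blocks so their contents don't end up in the index.
        // SelectNodes returns null when nothing matches.
        var junk = doc.DocumentNode.SelectNodes("//script|//style");
        if (junk != null)
        {
            foreach (var node in junk.ToList())
            {
                node.Remove();
            }
        }

        // Fall back to the file name when the page has no <title>.
        var title = doc.DocumentNode.SelectSingleNode("//title")?.InnerText
                    ?? Path.GetFileNameWithoutExtension(path);

        // Fall back to the whole document when there is no <body>.
        var bodyNode = doc.DocumentNode.SelectSingleNode("//body") ?? doc.DocumentNode;

        // Decode entities such as &amp; and collapse runs of whitespace.
        var body = Regex.Replace(HtmlEntity.DeEntitize(bodyNode.InnerText), @"\s+", " ").Trim();

        return (HtmlEntity.DeEntitize(title).Trim(), body);
    }
}
```

The returned title and body could then go into whatever fields you choose for each indexed document, for example "title" and "body".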

pdqumbraco

03/01/2024, 4:28 PM

Thanks Lars, I shall give it a go - wish me luck 😀
