Can the Umbraco Examine Search index static html f...
# help-with-umbraco
p
As the title suggests, is it possible for Examine search to index a folder of static HTML files? We have an archive of around 800 HTML files that we want to make available via the search. I know there have been additions like searching PDFs, so I'm just wondering if the same could be done with .html files.
l
Hi @pdqumbraco, Examine, and Lucene underneath it, can index anything you can turn into simple value types. See the documentation about creating a custom "product index" here: https://docs.umbraco.com/umbraco-cms/reference/searching/examine/indexing#creating-your-own-index In the `ProductIndexPopulator`, instead of getting content from Umbraco, you'd scrape your HTML files using whatever means you like. The `ValueSet` collection could be as simple as a "body" field with all the text, or you could parse metadata as well and store several fields per "doc". After that it'll behave just like any other index. Also see the part about MultiIndexSearcher in the same area of the docs for examples of how you can combine the search with your regular content search.
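To make that concrete, here is a rough sketch of what such a populator might look like. It assumes Umbraco 9 or later (where `IndexPopulator` lives in `Umbraco.Cms.Infrastructure.Examine`) and that the index itself is already set up as described in the linked docs. The class name, index name, folder path and field names are all made up for illustration, and the regex tag stripping is only a crude stand-in for a real HTML parser.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using Examine;
using Umbraco.Cms.Infrastructure.Examine;

// Hypothetical populator; "HtmlArchiveIndex" is an assumed index name.
public class HtmlArchiveIndexPopulator : IndexPopulator
{
    // Assumption: the archive sits in this folder on the server.
    private const string ArchiveFolder = @"C:\archive\html";

    public HtmlArchiveIndexPopulator()
    {
        // Tell Umbraco which index this populator feeds.
        RegisterIndex("HtmlArchiveIndex");
    }

    protected override void PopulateIndexes(IReadOnlyList<IIndex> indexes)
    {
        // One ValueSet per file, with just a "path" and a "body" field.
        var valueSets = Directory
            .EnumerateFiles(ArchiveFolder, "*.html", SearchOption.AllDirectories)
            .Select(path => new ValueSet(
                path,        // id: the file path is unique per document
                "archive",   // category (assumed name)
                "htmlFile",  // item type (assumed name)
                new Dictionary<string, object>
                {
                    ["path"] = path,
                    // Crude tag removal; a proper HTML parser would do better.
                    ["body"] = Regex.Replace(File.ReadAllText(path), "<[^>]+>", " ")
                }))
            .ToList();

        foreach (var index in indexes)
        {
            index.IndexItems(valueSets);
        }
    }
}
```

The populator still has to be registered with dependency injection alongside the index, as the linked documentation shows for its product example.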
p
Hi @Lars-Erik, thanks for responding and for the link. What I couldn't see from the docs is how you point Examine to a folder of HTML files. Or do you have to upload the HTML files to the media library?
l
You just use the file system, System.IO and possibly HtmlAgilityPack. Nothing Umbraco-related. Just plain .NET. 🙃
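As a minimal sketch of that plain .NET part, something along these lines could walk a folder and pull out the text of each file. It assumes the HtmlAgilityPack NuGet package is installed. `HtmlArchiveReader`, `ReadFolder` and the field names are hypothetical, chosen only for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using HtmlAgilityPack; // NuGet package, not part of .NET itself

// Hypothetical helper: yields one dictionary of fields per .html file.
public static class HtmlArchiveReader
{
    public static IEnumerable<Dictionary<string, object>> ReadFolder(string folder)
    {
        foreach (var path in Directory.EnumerateFiles(folder, "*.html", SearchOption.AllDirectories))
        {
            var doc = new HtmlDocument();
            doc.Load(path);

            // Remove script and style elements so their contents stay out of the text.
            // SelectNodes returns null when nothing matches.
            var noise = doc.DocumentNode.SelectNodes("//script|//style");
            if (noise != null)
            {
                foreach (var node in noise.ToList())
                {
                    node.Remove();
                }
            }

            // Fall back to the file name if the page has no <title>.
            var title = doc.DocumentNode.SelectSingleNode("//title")?.InnerText
                        ?? Path.GetFileNameWithoutExtension(path);

            // Fall back to the whole document if there is no <body>.
            var body = doc.DocumentNode.SelectSingleNode("//body")?.InnerText
                       ?? doc.DocumentNode.InnerText;

            yield return new Dictionary<string, object>
            {
                ["path"] = path,
                ["title"] = HtmlEntity.DeEntitize(title).Trim(),
                ["body"] = HtmlEntity.DeEntitize(body).Trim()
            };
        }
    }
}
```

Each dictionary holds only simple values, so it can be handed to Examine as the field set for one document.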
p
Thanks Lars, I shall give it a go - wish me luck 😀