Find content pages by content of related pdf file?
d
A client is asking if it's possible to find content pages based on the contents of pdf files that are referenced on said content page. Has anyone done this before? Should I be comfortable with indexing the content of pdf files on every page that it's used on?
j
I've done that: we indexed the PDF content on the page and added that field to our global search
d
Alright, how was your experience doing that? I imagine you have to watch for content and media changes and reindex content accordingly?
j
We did it in a TransformingIndexValues event and told the client that whenever they save a page, the content of the PDF they picked is added to that page's search index. We also told them that if they went into the media section and replaced the file on the PDF media node, they would need to do a full index rebuild. As this was a use case they said they would almost never need (and frankly didn't know was possible until we told them), we didn't spend much time catching that potential issue.
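(A rough, language-neutral sketch of that idea in Python, not the actual Umbraco or Examine API. `Page`, `PDF_TEXT`, `transform_index_values`, `search` and the `pdfContent` field name are all made up for illustration, and the PDF text is assumed to have been extracted already.)

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    id: int
    title: str
    pdf_ids: list = field(default_factory=list)  # media ids of the PDFs picked on the page


# Hypothetical stand-in for text already extracted from each PDF media item.
PDF_TEXT = {10: "annual report 2023 revenue", 11: "safety data sheet"}


def transform_index_values(page, values):
    """Return a copy of the page's index values with the picked PDFs' text added."""
    texts = [PDF_TEXT[i] for i in page.pdf_ids if i in PDF_TEXT]
    out = dict(values)
    out["pdfContent"] = " ".join(texts)  # extra field that global search also queries
    return out


def search(index, term):
    """Naive substring search over every indexed field; returns matching page ids."""
    term = term.lower()
    return sorted(
        page_id
        for page_id, vals in index.items()
        if any(term in v.lower() for v in vals.values())
    )


# Indexing happens when the page is saved, so the PDF text is a snapshot from that moment.
page = Page(1, "Investor relations", [10])
index = {page.id: transform_index_values(page, {"title": page.title})}
```

Searching for a word that only occurs in the PDF now finds the page, but the indexed text stays as it was at save time until the page is saved again or the index is rebuilt.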
d
That makes sense, perhaps I can pull a trick like that as well
Thanks!!
j
Np 🙂
If I had to support replacing the file, I think I'd do it via a notification that finds any references to the media node through the relations service. But I prefer to keep it simple if the client doesn't really need it.
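(Roughly what that could look like, again as a Python sketch rather than real Umbraco code. The `relations` map stands in for whatever the relations service would return, and `track_reference`, `on_media_saved` and the `reindex` callback are hypothetical names.)

```python
from collections import defaultdict

# media id -> ids of pages that reference it; hypothetical stand-in for a relations lookup
relations = defaultdict(set)


def track_reference(page_id, media_id):
    """Record that a page references a media item."""
    relations[media_id].add(page_id)


def on_media_saved(media_id, reindex):
    """Handler for a 'media saved' notification: reindex every page that uses the media."""
    affected = sorted(relations.get(media_id, ()))
    for page_id in affected:
        reindex(page_id)
    return affected


track_reference(1, 10)
track_reference(3, 10)
track_reference(2, 11)
```

With something like this in place, replacing the file on media node 10 would only reindex pages 1 and 3 instead of requiring a full rebuild.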
d
I wish I had that; I can't stand these little loose ends that can possibly mess up the index. I've been thoroughly taught that a content editor should not be able to break the CMS