Sitecore Multi Site with Coveo HTML Indexing in Docker Containers
Background
I have been using Coveo on a Sitecore instance to index Sitecore content into a Coveo index for Federated Search. The Coveo for Sitecore connector is a fantastic way to index Sitecore content. The Sitecore instance details are as follows:
- Sitecore v10.0 XM
- Modules installed are: JavaScript Service (Headless Service) v15.0.1, Sitecore Experience Accelerator v10.0, Coveo for Sitecore Connector v5.0.x
- Multi Site setup using SXA Multi Site Manager
- Media Library contains document files that need to be indexed to the correct site. The setting Media.MediaLinkServerUrl is not suitable.
- Local Development environment runs in Docker Containers, and other environments run in Azure App Service (Platform as a Service aka PaaS)
- 1 x Coveo Non Production environment, 1 x Coveo Production environment
- Both sites are Headless Tenant / Sites, running in Integrated Mode (the mode actually makes no difference here)
- There are pages protected with a login gate. However, a protected page should still show up in results when it matches a valid search term, regardless of whether the user is logged in
Jeff L’Heureux (https://twitter.com/jflh) from the Sitecore Demo Team (also a Coveo MVP) has given me a lot of help, and was kind enough to jump on video calls with me to explain in detail the challenges and how he overcame them.
Docker Containers Complexity
Working with Docker Containers has its challenges with Coveo for Sitecore.
1. There is no way for Coveo to invoke the web site locally to verify that content is indexed or deleted properly. Set Coveo.Indexing.CommittedDocumentsPollingEnabled and Coveo.Indexing.DeletedDocumentsPollingEnabled to false.
2. The Docker Network by default does not understand the host names that are used from the browser. The fix is to add the labels in Traefik, as well as Network Aliases on the Docker Services (cm, cd, RenderingHost). Each Network Alias is the same full host name that is used in the browser. This matters because the HttpWebRequest for HTML Indexing from the ContentManagement role needs to communicate properly with the ContentDelivery role or the Rendering Host.
3. The preference is for local developers to share the same set of Coveo for Sitecore indexes. There is not enough active development on the Sitecore side (especially now that the site is live in production) to justify the same really nice setup as the Sitecore Platform Demo, which is agnostic of the Coveo Organisation and Indexes.
https://github.com/Sitecore/Sitecore.Demo.Platform/tree/develop/docker/images/windows/demo-init/Jobs
What I have done instead is to activate the Coveo indexes for the local development environment, and then store all the Coveo config files, which are renamed and filled in with the appropriate details during Activation.
The complexity lies in the fact that there is an encryption key stored in the Properties table of the Web database. A dacpac for that one database record is created, and then installed to the local database.
Sitecore Multi Site Complexity
1. Coveo for Sitecore has no concept of switching to a Public Facing Host Name (e.g. TargetHostName for CD, or the Public Facing Rendering Host URL). The Web database Coveo search index needs to have the correct public facing URL hostname. Using Sitecore Experience Accelerator may have complicated this scenario, as the Indexing role does not have any context of the ContentDelivery or Rendering Host address.
The solution is to add 2 site properties for the ContentManagement site in SXA Site Grouping. I have opted to call these coveoIngestCMURL and coveoIngestCDURL.
2. The Media Library has no concept of Multi Site. In SXA, the default Media Library path for each site is /project/[Tenant]/[Site].
With the above in mind, and taking some fantastic inspiration from the Sitecore Platform Demo application, I have come up with an approach to support multi site indexing that ensures the correct URLs are used for each site in the Master and Web Database Context.
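To make the idea concrete, the resolution logic can be sketched in a few lines of Python. This is only a language-neutral illustration, not the Coveo for Sitecore API and not my actual pipeline code. The site table, host names and tenant name are made up, and the rule that the Web database uses coveoIngestCDURL while the Master database uses coveoIngestCMURL is an assumption of the sketch.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site table. In a real setup these values would be read from
# the site properties in SXA Site Grouping; hosts and tenant are made up.
SITES = [
    {
        "name": "site-a",
        "media_root": "/project/tenant/site-a",
        "coveoIngestCMURL": "https://cm.site-a.localhost",
        "coveoIngestCDURL": "https://www.site-a.example",
    },
    {
        "name": "site-b",
        "media_root": "/project/tenant/site-b",
        "coveoIngestCMURL": "https://cm.site-b.localhost",
        "coveoIngestCDURL": "https://www.site-b.example",
    },
]


def base_url_for(site, database):
    """Assumption: the Web database gets the CD URL, anything else the CM URL."""
    key = "coveoIngestCDURL" if database.lower() == "web" else "coveoIngestCMURL"
    return site[key]


def site_for_media_path(media_path, sites=SITES):
    """Match a media item to a site by its /project/[Tenant]/[Site] prefix."""
    # The trailing slash stops "site-a" from also matching "site-ab".
    path = media_path.lower().rstrip("/") + "/"
    for site in sites:
        root = site["media_root"].lower().rstrip("/") + "/"
        if path.startswith(root):
            return site
    return None


def rewrite_host(url, base_url):
    """Swap the scheme and host of url for those of base_url."""
    original = urlsplit(url)
    base = urlsplit(base_url)
    return urlunsplit(
        (base.scheme, base.netloc, original.path, original.query, original.fragment)
    )


def indexed_url(url, media_path, database):
    """Return the URL to store in the index; unchanged if no site matches."""
    site = site_for_media_path(media_path)
    if site is None:
        return url
    return rewrite_host(url, base_url_for(site, database))
```

Calling indexed_url with a media URL generated on the CM host, the item's Media Library path and the database name gives back the same path and query string on the host configured for the matching site. Items outside every site's media folder are left untouched.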
Here is the Config Patch
3. There are Protected / Paywalled Pages in the site. Authentication headers can be added so that the crawler can authenticate properly during Coveo HTML Index Crawling. I was a little lazy and decided to just keep it super simple.
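For readers who do want to go down the authentication header route, here is a minimal Python sketch of the general idea, and it is not what I implemented. The header name X-Indexing-Token, the shared secret and both function names are hypothetical. The request that fetches the page HTML carries a secret header, and the site checks that header before deciding whether to skip the login gate.

```python
import hmac
from urllib.request import Request

# Hypothetical header name; any custom header agreed by both sides would do.
CRAWLER_HEADER = "X-Indexing-Token"


def build_crawl_request(url, token):
    """Build (but do not send) a page request that carries the shared secret."""
    return Request(url, headers={CRAWLER_HEADER: token})


def is_trusted_crawler(headers, expected_token):
    """Site-side check: does the incoming request carry the expected secret?"""
    supplied = headers.get(CRAWLER_HEADER, "")
    # compare_digest avoids leaking the secret through timing differences.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

In a real site the secret would come from environment-specific configuration rather than source code, and the check would sit wherever the login gate decides to redirect anonymous visitors.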
Final Words
Coveo is really powerful, but the initial setup can be tricky to get right so that it plays nicely in a containerised environment and in a multi site setup. I hope this blog post helps readers resolve some of these challenges, or at least gives some inspiration on how to resolve them.