Sitecore Multi Site with Coveo HTML Indexing in Docker Containers

Vincent Lui
Apr 11, 2023

Image: an old-school library indexing system. Photo by Maksym Kaharlytskyi on Unsplash.

Background

I have been using Coveo on a Sitecore instance to index Sitecore content into a Coveo index for federated search. The Coveo for Sitecore connector is a convenient way to do this. The Sitecore instance is set up as follows:

  • Sitecore v10.0 XM
  • Modules installed are: JavaScript Service (Headless Service) v15.0.1, Sitecore Experience Accelerator v10.0, Coveo for Sitecore Connector v5.0.x
  • Multi Site setup using SXA Multi Site Manager
  • The Media Library contains document files that need to be indexed against the correct site. The Media.MediaLinkServerUrl setting is not suitable, because it is a single instance-wide value and cannot vary per site.
  • The local development environment runs in Docker containers; the other environments run in Azure App Service (Platform as a Service, PaaS)
  • One Coveo non-production environment and one Coveo production environment
  • Both sites are headless tenants/sites running in Integrated Mode (the mode makes no difference here)
  • Some pages are protected by a login gate. These pages should still appear in the results when they match a search term, regardless of whether the user is logged in

Jeff L’Heureux (https://twitter.com/jflh) from the Sitecore Demo Team, who is also a Coveo MVP, gave me a lot of help. He was kind enough to jump on video calls with me to explain the challenges in detail and how he overcame them.

Docker Containers Complexity

Running Coveo for Sitecore in Docker containers has its challenges.

  1. Coveo has no way to invoke the local web site to verify that content has been indexed or deleted properly, so set Coveo.Indexing.CommittedDocumentsPollingEnabled and Coveo.Indexing.DeletedDocumentsPollingEnabled to false.
  2. By default, the Docker network does not resolve the host names that are used from the browser. The fix is to add the labels in Traefik and to add network aliases to the Docker services (cm, cd, RenderingHost). Each network alias is the same full host name that is used in the browser. This matters because the HttpWebRequest that the ContentManagement role makes for HTML indexing must be able to reach the ContentDelivery role or the rendering host.

  3. The preference is for local developers to share the same set of Coveo for Sitecore indexes. There is not enough active development on the Sitecore side (especially now that the site is live in production) to justify the kind of setup used in the Sitecore Platform Demo, which is agnostic of the Coveo organisation and indexes:
     https://github.com/Sitecore/Sitecore.Demo.Platform/tree/develop/docker/images/windows/demo-init/Jobs
     What I have done instead is to activate Coveo indexes for the local development environment, and then store all the Coveo config files that are renamed and filled in with the appropriate details during activation.
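One way to confirm the fix in item 2 is to check, from inside the Docker network, that every public host name actually resolves. The following is only a minimal sketch of that check, written in Python for brevity; the function name and the host names in the usage comment are hypothetical, and inside a Windows-based Sitecore container the same check would have to be done with whatever tooling that image provides.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def resolve_hosts(
    hosts: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each host name to its resolved IPv4 address, or None if it does not resolve."""
    results: Dict[str, Optional[str]] = {}
    for host in hosts:
        try:
            results[host] = resolver(host)
        except OSError:
            # socket.gaierror (name resolution failure) is a subclass of OSError.
            results[host] = None
    return results


# Example usage (hypothetical host names; use the ones set as network aliases):
#   for name, ip in resolve_hosts(["cm.example.localhost", "www.example.localhost"]).items():
#       print(name, ip or "NOT RESOLVED")
```

Any host name that maps to None here is one that the HTML indexing request from the ContentManagement role would also fail to reach.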

Coveo for Sitecore indexes for the local development environment

The complication is that there is an encryption key stored in the Properties table of the Web database.

A dacpac containing only that one database record is created and then installed to the local database.

Sitecore Multi Site Complexity

Always have every single environment and every single role in SXA Site Grouping.
  1. Coveo for Sitecore has no concept of switching to a public-facing host name (e.g. TargetHostName for CD, or the public-facing rendering host URL), yet the Coveo search index for the Web database needs the correct public-facing host name in its URLs. Using Sitecore Experience Accelerator may have complicated this scenario, as the indexing role has no context for the ContentDelivery or rendering host address.
    The solution is to add two site properties to the ContentManagement site in SXA Site Grouping. I have opted to call these coveoIngestCMURL and coveoIngestCDURL.
    Note how the Valid for Environment field is filled with {Environment}-{Sitecore Role}.
Additional Site Properties specified for Coveo to use during Indexing
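
To illustrate how such site properties might be looked up at indexing time, here is a minimal sketch in Python (for brevity only; an actual Sitecore implementation would live in .NET code). It assumes that each site has one Site Grouping entry per environment and role, and that coveoIngestCMURL applies when indexing the Master database while coveoIngestCDURL applies when indexing Web. The SiteDefinition class, the function name, and all sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class SiteDefinition:
    """Simplified, hypothetical model of one SXA Site Grouping entry."""
    site: str
    valid_for_environment: str  # "{Environment}-{Sitecore Role}"
    properties: Dict[str, str] = field(default_factory=dict)


def resolve_ingest_url(
    definitions: Iterable[SiteDefinition],
    site: str,
    environment: str,
    role: str,
    database: str,
) -> Optional[str]:
    """Return the base URL to use for indexed documents, or None if no entry matches."""
    key = f"{environment}-{role}"
    # Assumption: Master content is served by CM, Web content by CD / rendering host.
    prop = "coveoIngestCMURL" if database.lower() == "master" else "coveoIngestCDURL"
    for definition in definitions:
        if definition.site == site and definition.valid_for_environment == key:
            return definition.properties.get(prop)
    return None


# Example usage with hypothetical values:
#   entries = [SiteDefinition("SiteA", "Local-ContentManagement",
#                             {"coveoIngestCMURL": "https://cm.sitea.localhost",
#                              "coveoIngestCDURL": "https://www.sitea.localhost"})]
#   resolve_ingest_url(entries, "SiteA", "Local", "ContentManagement", "web")
```

Keying the lookup on the same {Environment}-{Sitecore Role} string as the Valid for Environment field means one set of Site Grouping items can be shared across all environments.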

  2. The Media Library has no concept of multi-site. In SXA, the default Media Library path for each site is /project/[Tenant]/[Site].

With the above in mind, and taking some inspiration from the Sitecore Platform Demo application, I have come up with an approach to support multi-site indexing that ensures the correct URLs are used for each site in both the Master and Web database contexts.
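
The general idea can be sketched as two small steps: work out which site a media item belongs to from its path, then swap the host of the generated URL for that site's public URL. The Python below is only an illustration under assumptions, not the actual implementation: it assumes the full item path starts with /sitecore/media library/Project/, and the function names and sample URLs are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumed full prefix for the SXA convention /project/[Tenant]/[Site].
MEDIA_ROOT = "/sitecore/media library/project/"


def site_for_media_path(item_path: str) -> Optional[Tuple[str, str]]:
    """Return (tenant, site) for a media item under the SXA project folder, else None."""
    if not item_path.lower().startswith(MEDIA_ROOT):
        return None
    parts = item_path[len(MEDIA_ROOT):].split("/")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        return None
    return parts[0], parts[1]


def rewrite_host(url: str, public_base_url: str) -> str:
    """Replace the scheme and host of url with those of public_base_url."""
    original = urlsplit(url)
    public = urlsplit(public_base_url)
    return urlunsplit(
        (public.scheme, public.netloc, original.path, original.query, original.fragment)
    )


# Example usage with hypothetical values:
#   site_for_media_path("/sitecore/media library/Project/Acme/SiteA/docs/report")
#   rewrite_host("http://cm/-/media/Project/Acme/SiteA/docs/report.pdf",
#                "https://www.sitea.example")
```

Media items outside the project folder return None, so they can keep whatever URL they would have had by default.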

Here is the Config Patch

  3. Some pages in the site are protected or paywalled. Authentication headers can be added to the request so that the Coveo HTML indexing crawl can authenticate properly. I was a little lazy and decided to keep this super simple.
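
One simple shape this can take is a shared secret sent in a request header, which the login gate checks before deciding whether to let the request through. The sketch below is an assumption about how that could work, not a description of what was actually deployed; the header name X-Indexing-Key and both function names are hypothetical, and Python is used only to keep the example short.

```python
import hmac
from typing import Dict, Mapping

INDEXING_HEADER = "X-Indexing-Key"  # hypothetical header name


def crawl_headers(secret: str) -> Dict[str, str]:
    """Headers the indexing request would send along with each page fetch."""
    return {INDEXING_HEADER: secret}


def is_indexing_request(headers: Mapping[str, str], expected_secret: str) -> bool:
    """Return True if the request carries the expected indexing secret."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == INDEXING_HEADER.lower():
            # Constant-time comparison avoids leaking the secret through timing.
            return hmac.compare_digest(value.encode(), expected_secret.encode())
    return False


# Example usage with a hypothetical secret:
#   is_indexing_request(crawl_headers("s3cret"), "s3cret")
```

The secret should come from environment-specific configuration rather than source control, since anyone who knows it can read the protected pages.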

Final Words

Coveo is really powerful, but the initial setup can be tricky when it has to play nicely in a containerised environment and in a multi-site setup. I hope this blog post helps readers resolve some of these challenges, or at least gives some inspiration on how to resolve them.

Written by Vincent Lui

Sitecore Technology MVP 2020–2025 | Solution Architect on Sitecore, Akamai, Microsoft Azure | Passionate on DevSecOps Lifecycle
