Have you ever used a new piece of technology and immediately wished you could poke around inside — “How’d they do that?” Us too! Luckily, the open source ethos that informs everything we do means that the guts of the new Jetpack Search are there for the poking.
For a change this big, we also thought it’d be useful to share why we felt large changes to our architecture were necessary in order to improve the search experience.
Why change it?
We launched Jetpack Search two years ago, mostly as a wrapper around an Elasticsearch search API that we integrated into the standard way that WordPress themes handle search. This approach made it easy for a site admin to customize the search query and integrate with existing themes any way they wanted, but there were tradeoffs:
- It was stodgy. WordPress Core search results don’t feel modern; they are designed as an archive listing of posts. There is no indication of what terms matched. There is no spelling correction. The results often take up too much space on the page, so they are particularly tough on mobile devices. In almost every theme, search and content discovery are an afterthought.
- It could be faster. We used Elasticsearch to replace MySQL to speed things up, but this left a lot more delay than necessary. A website visitor would send a search request to your server; your server would send a request to our API; the whole page would be generated; and then it would all be sent back to the visitor. Depending on the locations of the user, the server, and our API, the data might make multiple hops across the planet.
- It wasn’t fully usable. User experience research shows that filtering is a big aid to discovery and exploration, but most themes lack sidebars, or have sidebars that won’t work on mobile devices.
Yes, users got faster and better results by just replacing MySQL with Elasticsearch. But great search provides relevant results very quickly so that the visitor is encouraged to explore even when they are on a mobile device. Any software architecture that is built to work on many thousands (hopefully many millions) of websites is going to have tradeoffs. In our new architecture we remove some control of the search query from the site admin and use an overlay on top of the theme. In return, we think we have been able to build the best and easiest-to-deploy WordPress site search experience.
To build this new search experience, we completely re-architected Jetpack Search. There are many small improvements, but the big ones are:
- An optimized API. Jetpack Search now isn’t just a wrapper around an Elasticsearch API, but an API optimized for WordPress sites that can run multiple search queries when needed for producing the best results.
- New indexing. Our rebuilt Elasticsearch index supports search as you type, matches against comments as well as posts, and uses pageview stats to improve the result rankings.
- Speed. Our search and filtering now go directly from the visitor’s browser to our API to minimize delay.
The new and improved search and discovery
These changes solve all our original issues with Jetpack Search, and they greatly advance the most important goals of search and discovery: better search relevancy, fast search and filtering, and reduced load on your server.
Improved Search Relevancy
A lot goes into building great search results, and the new Jetpack Search architecture helps in a few key areas.
Our prior default search algorithm used BM25 for ranking, then slightly reranked results based on how recently they were posted. We also had some example code to allow boosting based on the number of comments and likes, which we used as a rough stand-in for popularity. For some sites recency worked well, and for others popularity worked. But it was tough to predict which would be better, and selecting one or the other was a fairly blunt tool.
To further improve our search algorithm, we started experimenting with adding the percentage of pageviews from the past 30 days into the index. We found that pageviews are a much better ranking signal because they somewhat combine both popularity and recency. So now most of our result ranking is strongly influenced by the number of pageviews a post or page gets. Conveniently, if you get a lot of Google Search traffic, our search results should be heavily influenced by Google’s ranking algorithm.
Our ranking algorithm is pretty new, so it is still hand-tuned. As we collect more data, we will improve how well it works on many different types of sites. Our new architecture also uses anonymized tracking and A/B testing, which will let us run experiments on our search algorithm over time so we can make consistent, ongoing improvements.
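To make the idea concrete, here is a minimal sketch of how a pageview signal could be folded into a text relevance score. The function names, the logarithmic formula, and the `weight` parameter are all assumptions for illustration; the actual hand-tuned Jetpack Search formula is not described in this post.

```python
import math


def rank_score(text_score: float, pageview_share: float, weight: float = 2.0) -> float:
    """Boost a text relevance score by a post's share of recent pageviews.

    text_score:     relevance score from the text match (for example BM25).
    pageview_share: fraction (0.0 to 1.0) of the site's pageviews over the
                    past 30 days that went to this post.
    weight:         hypothetical tuning knob, not a real Jetpack setting.
    """
    if not 0.0 <= pageview_share <= 1.0:
        raise ValueError("pageview_share must be between 0 and 1")
    # log1p dampens the boost so one viral post does not drown out
    # every other result.
    return text_score * (1.0 + weight * math.log1p(100.0 * pageview_share))


def rank(results: list[dict]) -> list[dict]:
    """Sort result dicts (with 'text_score' and 'pageview_share') best first."""
    return sorted(
        results,
        key=lambda r: rank_score(r["text_score"], r["pageview_share"]),
        reverse=True,
    )
```

With a multiplicative boost like this, a post with no recent traffic keeps its plain text score, while a slightly weaker text match that visitors actually read can move ahead of it.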
Ultimately, search needs to match content on the site against what the site visitor is searching for. There’s a lot that can get in the way: typos, users not knowing correct spellings, local spelling variants (“favorite” vs “favourite”), not using the same word to describe something as the website owner (e.g. “car” vs “automobile”), not having any content that matches, and more.
Our API now does a simple form of spelling correction. We know from other experiments we have run that between 20% and 30% of all searches have some form of typo or spelling error. We implemented a version of spelling correction for our old Jetpack Search architecture in late 2018, but it required running multiple Elasticsearch queries and we were never happy with how it integrated with the theme. So we never actually launched it.
With our new architecture, if there are no matching search results, we make an educated guess at what the user meant and quickly run a second search query. We hope to put a pretty big dent in how often users get zero search results.
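The zero-results fallback can be sketched roughly as follows. This is an illustration only: it uses Python's standard `difflib` for the educated guess and a toy in-memory index, whereas the real API presumably uses Elasticsearch for both steps. The names `search_with_fallback`, `toy_search`, and `vocabulary` are hypothetical.

```python
import difflib
from typing import Callable


def search_with_fallback(
    query: str,
    search: Callable[[str], list],
    vocabulary: list[str],
) -> tuple[str, list]:
    """Run a search; if nothing matches, retry once with corrected spelling.

    Returns the query that was finally used along with its results.
    """
    results = search(query)
    if results:
        return query, results

    # Guess a replacement for each term from words known to the index.
    corrected_terms = []
    for term in query.lower().split():
        matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=0.8)
        corrected_terms.append(matches[0] if matches else term)
    corrected = " ".join(corrected_terms)

    if corrected == query.lower():
        return query, []  # no better guess available
    return corrected, search(corrected)


# Toy stand-in for the search backend (assumption, for demonstration only).
posts = {1: "elasticsearch powers fast search", 2: "wordpress themes and widgets"}
vocabulary = sorted({word for text in posts.values() for word in text.split()})


def toy_search(query: str) -> list:
    terms = query.lower().split()
    return [pid for pid, text in posts.items() if all(t in text.split() for t in terms)]
```

The second query only runs when the first one comes back empty, so searches that already match pay no extra cost.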
When matching content, it also almost always helps to have more text to search against. We also include the content of a post’s 100 most recent comments along with the post content to further improve the likelihood of a match. We are also experimenting with adding content from some common post meta such as: WooCommerce SKU, product categories, and a few dozen others.
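As a rough sketch of what "more text to search against" could look like at indexing time, the snippet below builds a document that carries a post's 100 most recent comments alongside its content. The `Post` and `Comment` classes and the field names are assumptions, not the actual Jetpack index schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

MAX_COMMENTS = 100  # the post mentions the 100 most recent comments


@dataclass
class Comment:
    posted_at: datetime
    text: str


@dataclass
class Post:
    title: str
    content: str
    comments: list[Comment] = field(default_factory=list)


def build_search_document(post: Post) -> dict:
    """Return the fields to index for a post, newest comments first."""
    recent = sorted(post.comments, key=lambda c: c.posted_at, reverse=True)
    return {
        "title": post.title,
        "content": post.content,
        # Hypothetical field name; kept separate from "content" so a
        # match can later be attributed to a comment.
        "comments": [c.text for c in recent[:MAX_COMMENTS]],
    }
```

Keeping comments in their own field, rather than appending them to the post body, is one way to later tell the searcher where a match came from.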
Most big public search websites have some common components that are not possible on top of Core WordPress. Two big ones: highlighting matching terms and faceted search results.
Highlighting matching terms gives the searcher a lot more context about why a result matched the search query. They can see the context around the matching terms, and see whether it came from a comment, the title, or the main text. We can then condense the search results to take up less space on the screen so that the user can quickly scan through multiple results.
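Here is a minimal sketch of term highlighting with a condensed snippet. It is a plain-Python illustration, not the highlighter Jetpack Search uses (Elasticsearch has its own); the `highlight` function, the `<mark>` wrapper, and the `radius` parameter are assumptions.

```python
import html
import re
from typing import Optional


def highlight(text: str, query: str, radius: int = 40) -> Optional[str]:
    """Return a short HTML snippet around the first match, or None.

    Matching terms are wrapped in <mark>; everything else is HTML-escaped.
    """
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return None
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)

    first = pattern.search(text)
    if first is None:
        return None

    # Keep only a window of text around the first match.
    start = max(0, first.start() - radius)
    end = min(len(text), first.end() + radius)

    # With one capturing group, split() alternates: plain, match, plain, ...
    pieces = []
    for i, part in enumerate(pattern.split(text[start:end])):
        escaped = html.escape(part)
        pieces.append(f"<mark>{escaped}</mark>" if i % 2 == 1 else escaped)

    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + "".join(pieces) + suffix
```

For example, `highlight("Jetpack Search is fast", "search")` wraps the word "Search" in a `<mark>` element and leaves the rest of the sentence untouched.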
Faceting means that we display a set of filters in the sidebar so users can quickly filter long lists of results to find what they need. Refining the search query might lead to zero results; filtering broad results helps ensure that the user won’t run into a dead end.
We had filtering and facets in the previous version of Jetpack Search, but only 10% of sites actually configured them, and they didn’t work if the user was on a mobile device or your theme didn’t have a sidebar. Our new overlay UI ensures that we can automatically configure filters for all sites while still letting admins customize the settings, sidebar or no.
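To illustrate why filtering on facets avoids dead ends, here is a small sketch that counts facet values over a result set and then filters by one of them. In practice these counts would come from the search backend rather than from Python; `facet_counts`, `apply_filter`, and the `categories` field are hypothetical names.

```python
from collections import Counter


def facet_counts(results: list[dict], field: str) -> list[tuple[str, int]]:
    """Count how many results carry each value of a multi-valued field."""
    counts: Counter = Counter()
    for doc in results:
        for value in doc.get(field, []):
            counts[value] += 1
    return counts.most_common()  # most frequent values first


def apply_filter(results: list[dict], field: str, value: str) -> list[dict]:
    """Keep only the results that carry the chosen facet value."""
    return [doc for doc in results if value in doc.get(field, [])]
```

Because every filter shown to the user is derived from the current results, choosing one always leaves at least one result on screen.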
Fast Search and Filtering
Here’s a flow diagram for a search query on the old architecture:
And here’s what it looks like in the new architecture:
For an end user sitting in New Zealand (where one of our developers lives) the old search would take multiple seconds to display a set of results. The new architecture takes about half a second to go from his browser to our data center in California. I’m sitting in Colorado, and it takes about 0.2 seconds to get my search results from Texas.
Beyond the re-architecture, we deployed a number of other technologies to speed up search results:
- We cache search results in the user’s browser, so that if they rerun a search they won’t even need to make a new API request.
- For every API request, we use Memcache to cache every Elasticsearch query so that about 50% of our API searches don’t need to go to Elasticsearch at all.
- We don’t use MySQL at all for the API requests. All response data comes from Elasticsearch.
- All images we render use the globally deployed Jetpack CDN. (Our current search results don’t display images, though we have a prototype.)
- Our search index and algorithm are optimized for search as you type so that we can display relevant results as quickly as possible.
We also take a wide view of “search,” so we speed up filtering and content discovery. We know that only 1% of page loads on the millions of WordPress.com sites are from search, while 9% are from tags and category pages (as a comparison, 13% are homepages). So when our search widget is configured in the sidebar of a site, all of the filters and facets displayed there will open the search overlay rather than requiring a new page load. The faster content discovery is, the more engaging and faster the site will feel.
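The query caching in the list above follows the familiar cache-aside pattern, which can be sketched as below. A plain dictionary stands in for Memcache here, there is no expiry, and `cache_key` and `cached_search` are hypothetical names, so treat this as an outline of the idea rather than how the API is actually written.

```python
import hashlib
import json
from typing import Callable


def cache_key(query: dict) -> str:
    """Build a stable key so equivalent queries share one cache entry."""
    # Sorting keys makes {"q": "a", "page": 1} and {"page": 1, "q": "a"}
    # serialize identically.
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    return "search:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cached_search(query: dict, run_query: Callable[[dict], list], cache: dict) -> list:
    """Return cached results when present; otherwise query and store them."""
    key = cache_key(query)
    if key in cache:
        return cache[key]
    results = run_query(query)  # only reached on a cache miss
    cache[key] = results
    return results
```

A real deployment would also need an expiry time on each entry so that newly published or edited posts show up in cached searches.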
Remove the Load on Your Server
WordPress can run small sites on $5 a month hosts, or huge sites on hosts that cost thousands of dollars a month. In either case, heavy search use can easily break the MySQL database. We’ve seen searches that take 30-60 seconds to complete on MySQL take less than 100ms when offloaded to Elasticsearch. Part of the magic of Jetpack is that your busiest day of traffic is still a normal day for us. With our new architecture, the search and filter queries won’t even go to your server — they’ll all be offloaded to our API.
In order to achieve that magic, we cache a lot of the data from your website onto our servers and use that to build a search index for you. If you update a tag that’s on 10,000 posts, we’ll reindex all of those posts to keep everything in sync. It can take a while to build the initial cache, but once it’s done, it’s a big load off your site.
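As a sketch of the tag example, the code below finds every post that carries an updated tag and splits the work into batches. The data structure, the function names, and the batch size of 500 are assumptions for illustration; the post does not describe how the real sync pipeline schedules reindexing.

```python
from typing import Iterator


def posts_to_reindex(tag_id: int, post_tags: dict[int, set[int]]) -> list[int]:
    """Return the IDs of all posts that carry the updated tag."""
    return sorted(pid for pid, tags in post_tags.items() if tag_id in tags)


def batches(ids: list[int], size: int = 500) -> Iterator[list[int]]:
    """Split a long list of post IDs into fixed-size reindexing batches."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

With these numbers, a tag that appears on 10,000 posts would produce 20 batches of 500 posts each.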
Into the Future
We think that the new Jetpack Search architecture will serve your site well into the future by offering your visitors a more streamlined, modern experience. We’ve made a few tradeoffs that reduce customization options, but we feel that the search experience is at least 10x better than before and definitely better than anything else on the market. We also expect that this new architecture will help us to continue improving your search experience over time. We hope you like Jetpack Search as much as we enjoyed building it.
Upgrade your WordPress search experience today.
Did that sound like your idea of fun? Come work with us!