
Jetpack Search: Document Fields (Advanced)

Jetpack Search is a powerful replacement for the search capability built into WordPress. It is a paid upgrade to the Jetpack plugin that provides higher quality results and an improved search experience. Upgrade today to get started.

Note: This support article is intended for Jetpack Inline Search rather than the newer Jetpack Instant Search solution. Jetpack Inline Search is deprecated and we recommend all users switch to Jetpack Instant Search, our newer search solution. For more details on the difference between Inline Search and Instant Search, check out this page.

This support article covers how to customize Jetpack Search by using the Jetpack Search API and is intended for developers.

Each search result is a single Elasticsearch document. Currently, there is only a single document type. The top-level code for building our index is open source and is the best place to look if you want the details. Below is a written description of all the fields that are safe to rely on. Certain fields (especially those about extracted post content) are likely to change due to Gutenberg indexing, so they have been left out.

Description of the Fields

The field tables below describe each field using the following terms.

Data Type: description of the data that is stored

  • number
  • string
  • boolean
  • date

Type Details: details about the mapping for that field

  • short, integer, long
  • boolean
  • text: An analyzed string. Tokenized into multiple terms.
  • keyword: The string is treated as a single term. Keyword strings get truncated.
  • token_count: A count of the number of tokens in the text. i.e. the word count.
  • date: The date data object takes dates in ISO 8601 format either with times (yyyy-MM-dd HH:mm:ss) or without (yyyy-MM-dd).
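
As a small illustration of the two date formats listed above, here is a minimal Python sketch that renders a value either with or without a time part. The helper name `format_es_date` is hypothetical and is not part of Jetpack Search.

```python
from datetime import date, datetime


def format_es_date(value, with_time=True):
    """Render a date or datetime as yyyy-MM-dd HH:mm:ss or yyyy-MM-dd."""
    if with_time:
        if not isinstance(value, datetime):
            # A plain date has no time part; this sketch assumes midnight.
            value = datetime(value.year, value.month, value.day)
        return value.strftime("%Y-%m-%d %H:%M:%S")
    return value.strftime("%Y-%m-%d")


print(format_es_date(datetime(2020, 3, 9, 14, 5, 7)))
print(format_es_date(date(2020, 3, 9), with_time=False))
```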

Tokenization languages

There is a default language analysis used for text fields, and then also custom language analysis for 29 languages (Arabic, Bulgarian, Catalan, Czech, Danish, German, Greek, English, Spanish, Basque, Persian, Finnish, French, Hebrew, Hindi, Hungarian, Armenian, Indonesian, Italian, Japanese, Korean, Dutch, Norwegian, Portuguese, Romanian, Russian, Swedish, Turkish, Chinese).

The language analyzers are defined in this code.

Post Fields

Post Info

Field Name | Data Type | Type Details | Notes
site_id | number | short | 1 for WordPress.com, 2 for Jetpack
sticky | boolean | boolean |
permalink.url.analyzed | string | text | URL no protocol
permalink.url.raw | string | keyword | URL no protocol
has_password | boolean | boolean |
public | boolean | boolean |
featured_image | string | keyword | URL no protocol
featured_image_url.url.analyzed | string | text | URL no protocol
featured_image_url.url.raw | string | keyword | URL no protocol
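
The boolean fields above lend themselves to exact-match filtering. The following Python sketch builds an Elasticsearch `bool` filter from `term` clauses on `public`, `has_password`, and `sticky`. The function name `public_posts_filter` is hypothetical, and how the resulting body is handed to the Jetpack Search API is outside the scope of this sketch.

```python
import json


def public_posts_filter(sticky_only=False):
    """Build a bool filter for public posts without a password."""
    filters = [
        {"term": {"public": True}},
        {"term": {"has_password": False}},
    ]
    if sticky_only:
        # Optionally restrict the results to sticky posts.
        filters.append({"term": {"sticky": True}})
    return {"bool": {"filter": filters}}


query_body = {"query": public_posts_filter(sticky_only=True)}
print(json.dumps(query_body, indent=2))
```

Filter clauses do not contribute to relevance scoring, which is usually what you want for yes/no fields like these.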

Post Language

The post language is determined dynamically by detecting the language in the post title, content, and excerpt fields. If the post language cannot be detected, the fallback is the blog’s configured language.

Field Name | Data Type | Type Details | Notes
lang | string | keyword | Two letter ISO 639 code

Post Author

The post author is the user that authored the post. If it’s a Jetpack site and we are unable to determine the corresponding user, the author_id field will be set to 0.

Field Name | Data Type | Type Details | Notes
… | … | … | display name
… | … | … | display name
… | … | … | username
author_id | number | … | user id

Post Content

All content in the post is processed in a similar way, with HTML and shortcodes stripped. The content is analyzed to detect its language and is then indexed into the matching [LANG] field. Content is also always indexed into the default field(s).

Field Name | Data Type | Type Details | Notes
all_content.default | string | text | default analyzer
all_content.default.engram | string | text | search as you type analyzer
all_content.default.word_count | number | token_count | count of words; works for whitespace delimited languages
all_content.[LANG] | string | text | [LANG]_analyzer analyzer
all_content.[LANG].engram | string | text | search as you type with lang specific analysis
all_content.[LANG].word_count | number | token_count | count of words, but will exclude stop words. Good for ja, ko, zh
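
Because content is indexed into both the default field and a language-specific field, a query can target both at once. The Python sketch below builds an Elasticsearch `multi_match` query for that purpose. It assumes that `[LANG]` is replaced by the two letter language code (for example `all_content.en`); the function name `content_query` is hypothetical.

```python
import json


def content_query(text, lang=None):
    """Search the default content field and, optionally, one language field."""
    fields = ["all_content.default"]
    if lang:
        # Assumption: [LANG] in the field name is the two letter language code.
        fields.append(f"all_content.{lang}")
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


print(json.dumps(content_query("coffee roasting", lang="en"), indent=2))
```

For search-as-you-type behavior, the same pattern could point at the `.engram` subfields instead.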

Post Tags and Categories

Field Name | Data Type | Type Details | Notes
tag_cat_count | number | short | Total number of tags and categories
… | … | … | default analyzer
…[LANG] | string | text | [LANG] analyzer
tag.slug_slash_name | string | keyword | combines slug and formatted name for displaying aggregations
tag.term_id | number | long |
… | … | … | analyzer
…[LANG] | string | text | [LANG] analyzer
category.slug_slash_name | string | keyword | combines slug and formatted name for displaying aggregations
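
The `slug_slash_name` fields are meant for aggregations, so a typical use is a `terms` aggregation followed by splitting each bucket key for display. The Python sketch below shows both steps. It assumes, based only on the field name, that each value has the form `slug/Name`; both function names are hypothetical.

```python
import json


def tag_aggregation(size=10):
    """Request the most common tags without returning any documents."""
    return {
        "size": 0,
        "aggs": {
            "tags": {"terms": {"field": "tag.slug_slash_name", "size": size}}
        },
    }


def split_slug_slash_name(key):
    """Split a bucket key on the first slash into (slug, name)."""
    # Assumption: the slug never contains a slash, but the name might.
    slug, _, name = key.partition("/")
    return slug, name


print(json.dumps(tag_aggregation(size=5), indent=2))
print(split_slug_slash_name("coffee/Coffee"))
```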

Post Custom Taxonomies

The taxonomy fields are dynamic, and the [NAME] portion of the field name depends on the name of the post taxonomy. There is a hard-coded list of taxonomies that are indexed. This list has not yet been synced to Jetpack.

Field Name | Data Type | Type Details | Notes
taxonomy.[NAME].name | string | text | default analyzer
taxonomy.[NAME].name.slug_slash_name | string | keyword | combines slug and formatted name for displaying aggregations
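
Since the field names are built from the taxonomy name, it can help to generate them in one place. The following Python sketch does that and builds a `match` query on the analyzed name field. The taxonomy name `genre` and both function names are hypothetical, and the taxonomy must be on the indexed list for the field to exist.

```python
def taxonomy_fields(name):
    """Return the field names for one taxonomy, following the table above."""
    base = f"taxonomy.{name}.name"
    return {"name": base, "slug_slash_name": f"{base}.slug_slash_name"}


def taxonomy_match(name, text):
    """Build a match query on the analyzed taxonomy name field."""
    return {"match": {taxonomy_fields(name)["name"]: text}}


print(taxonomy_fields("genre"))
print(taxonomy_match("genre", "science fiction"))
```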

Post Interactions

Field Name | Data Type | Type Details | Notes
like_count | number | short | Number of users that liked this post
comment_count | number | integer | Number of users that commented on this post
is_reblogged | boolean | boolean | Post contains reblogged content from another site
reblog_count | number | long | Number of times this post was reblogged elsewhere
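
The numeric interaction fields can be used for both filtering and sorting. The Python sketch below combines a `range` filter on `like_count` with a descending sort on `comment_count`, excluding reblogged content. The function name `popular_posts` and the default threshold are hypothetical.

```python
import json


def popular_posts(min_likes=10):
    """Original posts with at least min_likes likes, most commented first."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"like_count": {"gte": min_likes}}},
                    {"term": {"is_reblogged": False}},
                ]
            }
        },
        "sort": [{"comment_count": {"order": "desc"}}],
    }


print(json.dumps(popular_posts(min_likes=25), indent=2))
```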

Post Dates

Each of the dates associated with the post is stored both as a date data type and broken out into token parts to make granular date-based searches easier. For example, finding all posts that were published on a Tuesday (date_token.day_of_week), or those that were modified in the second half of each hour (modified_token.seconds_from_hour).

Field Name | Data Type | Type Details | Notes
date_token.year | number | short | 4 digit year
date_token.hour | number | byte | 24 hour format
date_token.day_of_year | number | short | The day of the year (starting from 0)
date_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday
date_token.week_of_year | number | byte | Week number of year
date_token.seconds_from_day | number | integer | Seconds since midnight of day
date_token.seconds_from_hour | number | short | Seconds since start of hour
date_gmt_token.year | number | short | 4 digit year
date_gmt_token.hour | number | byte | 24 hour format
date_gmt_token.day_of_year | number | short | The day of the year (starting from 0)
date_gmt_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday
date_gmt_token.week_of_year | number | byte | Week number of year
date_gmt_token.seconds_from_day | number | integer | Seconds since midnight of day
date_gmt_token.seconds_from_hour | number | short | Seconds since start of hour
modified_token.year | number | short | 4 digit year
modified_token.hour | number | byte | 24 hour format
modified_token.day_of_year | number | short | The day of the year (starting from 0)
modified_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday
modified_token.week_of_year | number | byte | Week number of year
modified_token.seconds_from_day | number | integer | Seconds since midnight of day
modified_token.seconds_from_hour | number | short | Seconds since start of hour
modified_gmt_token.year | number | short | 4 digit year
modified_gmt_token.hour | number | byte | 24 hour format
modified_gmt_token.day_of_year | number | short | The day of the year (starting from 0)
modified_gmt_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday
modified_gmt_token.week_of_year | number | byte | Week number of year
modified_gmt_token.seconds_from_day | number | integer | Seconds since midnight of day
modified_gmt_token.seconds_from_hour | number | short | Seconds since start of hour
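
To make the token conventions concrete, the Python sketch below computes several token values for a given datetime and builds the two example queries mentioned in this section (published on a Tuesday, modified in the second half of the hour). All function names are hypothetical. `week_of_year` is left out because the week numbering scheme is not stated here.

```python
from datetime import datetime


def date_tokens(dt):
    """Compute token values following the conventions in the table above."""
    return {
        "year": dt.year,
        "hour": dt.hour,
        # tm_yday starts at 1, while the table counts days from 0.
        "day_of_year": dt.timetuple().tm_yday - 1,
        # isoweekday() returns 1 for Monday through 7 for Sunday.
        "day_of_week": dt.isoweekday(),
        "seconds_from_day": dt.hour * 3600 + dt.minute * 60 + dt.second,
        "seconds_from_hour": dt.minute * 60 + dt.second,
    }


def published_on_weekday(day_of_week):
    """Match posts published on one weekday (2 is Tuesday)."""
    return {"term": {"date_token.day_of_week": day_of_week}}


def modified_in_second_half_of_hour():
    """Match posts modified 1800 or more seconds after the start of an hour."""
    return {"range": {"modified_token.seconds_from_hour": {"gte": 1800}}}


print(date_tokens(datetime(2021, 1, 5, 13, 45, 30)))
print(published_on_weekday(2))
print(modified_in_second_half_of_hour())
```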

Post Meta

There is a hard-coded list of post meta keys that are indexed. This list has not yet been synced to Jetpack.

The post meta fields are dynamic, and the [KEY] portion of the field name depends on the name (key) of the post meta being indexed. To accommodate advanced querying, all post meta values are cast and indexed as numeric and boolean values in addition to being indexed as strings.

Field Name | Data Type | Type Details | Notes
meta.[KEY].value | string | text | default analyzer
meta.[KEY].date | date | date | If it looks like a date
meta.[KEY].long | number | long | Value cast as 64bit integer (bigint) if it looks like a number
meta.[KEY].double | number | double | Value cast as floating point number if it looks like a number
meta.[KEY].boolean | boolean | boolean | Value cast as boolean
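
Because each meta value is indexed under several typed subfields, numeric comparisons can target the `long` or `double` variant directly. The Python sketch below builds the field name for a given key and a `range` query on it. The meta key `price` and both function names are hypothetical, and the key has to be on the indexed list for the query to match anything.

```python
META_SUBFIELDS = {"value", "date", "long", "double", "boolean"}


def meta_field(key, kind="value"):
    """Return the field name for one typed variant of a post meta key."""
    if kind not in META_SUBFIELDS:
        raise ValueError(f"unknown meta subfield: {kind}")
    return f"meta.{key}.{kind}"


def meta_range(key, gte=None, lte=None, kind="long"):
    """Build a range query on a numeric (or date) variant of a meta key."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"range": {meta_field(key, kind): bounds}}


print(meta_range("price", gte=10, lte=50))
```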

* Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.

If you have any further questions about Jetpack Search, please check out our related support documentation. If your questions aren’t answered there feel free to contact us and we will be happy to help.
