Advanced Search: Document Fields

Jetpack Search is a powerful replacement for the search capability built into WordPress. It is a paid upgrade to the Jetpack plugin that provides higher quality results and an improved search experience. Upgrade today to get started.

This support article covers how to customize search using the search API and is intended for developers.


Each search result is a single Elasticsearch document. Currently there is only a single document type. The top-level code for building our index is open source and is the best place to look if you want the details. Below is a written description of all the fields that are safe to rely on. Certain fields (especially those for extracted post content) are likely to change due to Gutenberg indexing, so they have been left out.

Description of the Fields

We use a few specific terms to describe the fields below.

Data Type: description of the data that is stored

  • number
  • string
  • boolean
  • date

Type Details: details about the mapping for that field

  • byte, short, integer, long, double
  • boolean
  • text: An analyzed string. Tokenized into multiple terms.
  • keyword: The string is treated as a single term. Keyword strings get truncated.
  • token_count: A count of the number of tokens in the text. i.e. the word count.
  • date: The date data object takes dates in ISO 8601 format either with times (yyyy-MM-dd HH:mm:ss) or without (yyyy-MM-dd).
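
As a small illustration of the two date forms listed above, the following sketch formats a Python datetime both ways. The helper name format_index_date is hypothetical and is not part of any Jetpack or Elasticsearch API.

```python
from datetime import datetime


def format_index_date(value: datetime, with_time: bool = True) -> str:
    """Format a datetime in one of the two forms listed above."""
    # yyyy-MM-dd HH:mm:ss corresponds to %Y-%m-%d %H:%M:%S in strftime notation.
    pattern = "%Y-%m-%d %H:%M:%S" if with_time else "%Y-%m-%d"
    return value.strftime(pattern)


published = datetime(2021, 3, 9, 14, 5, 30)
print(format_index_date(published))                   # 2021-03-09 14:05:30
print(format_index_date(published, with_time=False))  # 2021-03-09
```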

Tokenization languages

Text fields use a default language analysis, and there is also custom language analysis for 29 languages. The language analyzers are defined in this code.

Post Fields

Post Info

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| site_id | number | short | 1 for WordPress.com, 2 for Jetpack |
| blog_id | number | integer | |
| post_id | number | long | |
| parent_post_id | number | long | |
| ancestor_post_ids | number | long | |
| sticky | boolean | boolean | |
| menu_order | number | integer | |
| slug | string | keyword | |
| permalink.url.analyzed | string | text | URL no protocol |
| permalink.url.raw | string | keyword | URL no protocol |
| permalink.host | string | keyword | |
| permalink.reverse_host | string | keyword | |
| post_type | string | keyword | |
| post_format | string | keyword | |
| post_status | string | keyword | |
| has_password | boolean | boolean | |
| public | boolean | boolean | |
| featured_image | string | keyword | URL no protocol |
| featured_image_url.url.analyzed | string | text | URL no protocol |
| featured_image_url.url.raw | string | keyword | URL no protocol |
| featured_image_url.host | string | keyword | |
| featured_image_url.reverse_host | string | keyword | |
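
Because most of the Post Info fields are keyword, numeric, or boolean, they lend themselves to exact-match filtering. The sketch below builds an Elasticsearch bool/filter query body as a plain Python dict. The function name and the blog_id value are hypothetical, and how the body is submitted to the search API is outside the scope of this example.

```python
import json


def build_post_info_filter(blog_id: int, post_type: str = "post") -> dict:
    """Build a query body that filters on exact-match Post Info fields."""
    return {
        "query": {
            "bool": {
                # Filter clauses match exactly and do not affect scoring.
                "filter": [
                    {"term": {"blog_id": blog_id}},
                    {"term": {"post_type": post_type}},
                    {"term": {"has_password": False}},
                ]
            }
        }
    }


# 12345 is a placeholder blog ID used only for illustration.
print(json.dumps(build_post_info_filter(12345), indent=2))
```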

Post Language

The post language is determined dynamically by detecting the language in the post title, content, and excerpt fields. If the post language cannot be detected, the fallback is the blog’s configured language.

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| lang | string | keyword | Two letter ISO 639 code |

Post Author

The post author is the WordPress.com user that authored the post. If it’s a Jetpack site and we are unable to determine the corresponding WordPress.com user, the author_id field will be set to 0.

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| author | string | text | WordPress.com display name |
| author.raw | string | keyword | WordPress.com display name |
| author_login | string | keyword | WordPress.com username |
| author_id | number | integer | WordPress.com user id |
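
The author fields show the difference between text and keyword mappings: author is analyzed and suits a match query, while author.raw holds the display name as a single term and suits a term query. The following is a minimal sketch that builds both query bodies, plus one that finds posts whose author could not be resolved (author_id of 0). The function names are hypothetical.

```python
def author_name_search(name: str) -> dict:
    """Full-text match against the analyzed display name."""
    return {"query": {"match": {"author": name}}}


def author_exact(display_name: str) -> dict:
    """Exact match against the unanalyzed (keyword) display name."""
    return {"query": {"term": {"author.raw": display_name}}}


def posts_with_unknown_author() -> dict:
    """Posts where no WordPress.com user could be determined."""
    return {"query": {"term": {"author_id": 0}}}


print(author_name_search("jane"))
print(author_exact("Jane Doe"))
print(posts_with_unknown_author())
```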

Post Content

All content in the post is processed in a similar way, with HTML and shortcodes stripped. The content is analyzed to detect its language and is then indexed into the appropriate [LANG] field. Content is also always indexed into the default field(s).

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| all_content.default | string | text | default analyzer |
| all_content.default.engram | string | text | search as you type analyzer |
| all_content.default.word_count | number | token_count | count of words – works for whitespace delimited languages |
| all_content.[LANG] | string | text | [LANG]_analyzer analyzer |
| all_content.[LANG].engram | string | text | search as you type with lang specific analysis |
| all_content.[LANG].word_count | number | token_count | count of words, but will exclude stop words. Good for ja, ko, zh |
| title.default | string | text | |
| title.default.word_count | number | token_count | |
| title.[LANG] | string | text | |
| title.[LANG].word_count | number | token_count | |
| excerpt.default | string | text | |
| excerpt.default.word_count | number | token_count | |
| excerpt.[LANG] | string | text | |
| excerpt.[LANG].word_count | number | token_count | |
| content.default | string | text | |
| content.default.word_count | number | token_count | |
| content.[LANG] | string | text | |
| content.[LANG].word_count | number | token_count | |
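
Since content is always indexed into the default fields and additionally into a language-specific field, a query can target both. The sketch below expands the [LANG] placeholder into concrete field names and builds a multi_match query body. The helper names are hypothetical, and "es" is only an assumed example of a language code with a custom analyzer.

```python
from typing import List, Optional

# Base fields from the table above that have default and [LANG] variants.
BASE_FIELDS = ("title", "excerpt", "content")


def content_fields(lang: Optional[str] = None) -> List[str]:
    """Return the default fields, plus language-specific ones if lang is given."""
    fields = [f"{base}.default" for base in BASE_FIELDS]
    if lang:
        fields += [f"{base}.{lang}" for base in BASE_FIELDS]
    return fields


def build_content_query(text: str, lang: Optional[str] = None) -> dict:
    """Build a multi_match query body over the selected content fields."""
    return {
        "query": {
            "multi_match": {"query": text, "fields": content_fields(lang)}
        }
    }


print(build_content_query("receta de paella", lang="es"))
```

Leaving lang unset restricts the query to the default fields, which is the safer choice when the language of the search text is unknown.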

Post Tags and Categories

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| tag_cat_count | number | short | Total number of tags and categories |
| tag.name.default | string | text | default analyzer |
| tag.name.[LANG] | string | text | [LANG] analyzer |
| tag.slug_slash_name | string | keyword | combines slug and formatted name for displaying aggregations |
| tag.slug | string | keyword | |
| tag.term_id | number | long | |
| category.name.default | string | text | default analyzer |
| category.name.[LANG] | string | text | [LANG] analyzer |
| category.slug_slash_name | string | keyword | combines slug and formatted name for displaying aggregations |
| category.slug | string | keyword | |
| category.term_id | number | long | |
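
The slug_slash_name fields exist for displaying aggregations, so a natural use is a terms aggregation whose bucket keys are then split into a slug and a display name. The sketch below assumes the key has the form "slug/Name", with the first slash as the separator; that exact format is an assumption based on the field name, and the function names are hypothetical.

```python
from typing import Tuple


def category_aggregation(size: int = 10) -> dict:
    """Build a request body that returns only category buckets (no hits)."""
    return {
        "size": 0,
        "aggs": {
            "categories": {
                "terms": {"field": "category.slug_slash_name", "size": size}
            }
        },
    }


def split_slug_slash_name(key: str) -> Tuple[str, str]:
    """Split an assumed 'slug/Name' bucket key at the first slash."""
    slug, _, name = key.partition("/")
    return slug, name


print(category_aggregation(5))
print(split_slug_slash_name("news/Company News"))
```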

Post Custom Taxonomies

The taxonomy fields are dynamic: the [NAME] portion of the field name depends on the name of the post taxonomy. Only a hardcoded list of taxonomies is indexed. This list has not yet been synced to Jetpack.

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| taxonomy.[NAME].name | string | text | default analyzer |
| taxonomy.[NAME].name.slug_slash_name | string | keyword | combines slug and formatted name for displaying aggregations |
| taxonomy.[NAME].slug | string | keyword | |
| taxonomy.[NAME].term_id | number | long | |
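
To show how the [NAME] placeholder expands, the sketch below builds a concrete field name and a term filter on the taxonomy slug. The taxonomy name "genre" and both function names are hypothetical; whether a given taxonomy is actually indexed depends on the hardcoded list mentioned above.

```python
def taxonomy_field(name: str, leaf: str = "slug") -> str:
    """Expand taxonomy.[NAME].<leaf> for a concrete taxonomy name."""
    return f"taxonomy.{name}.{leaf}"


def taxonomy_slug_filter(name: str, slug: str) -> dict:
    """Build a term filter on the keyword slug field of a taxonomy."""
    return {"term": {taxonomy_field(name): slug}}


print(taxonomy_field("genre", "term_id"))
print(taxonomy_slug_filter("genre", "science-fiction"))
```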

Post Interactions

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| like_count | number | short | |
| liker_ids | number | integer | WordPress.com users that liked this post |
| comment_count | number | integer | |
| commenter_ids | number | integer | WordPress.com users that commented on this post |
| is_reblogged | boolean | boolean | Post contains reblogged content from another site |
| reblog_count | number | long | Number of times this post was reblogged elsewhere |
| reblogger_ids | number | long | WordPress.com users that reblogged this post elsewhere |
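
The interaction counts are numeric, so they can be used in range filters and sorts. The following minimal sketch builds a request body that keeps posts with at least a given number of likes and orders them by comment count. The function name and threshold are hypothetical.

```python
def popular_posts(min_likes: int) -> dict:
    """Filter by like_count and sort by comment_count, highest first."""
    return {
        "query": {
            "bool": {
                "filter": [{"range": {"like_count": {"gte": min_likes}}}]
            }
        },
        "sort": [{"comment_count": {"order": "desc"}}],
    }


print(popular_posts(10))
```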

Post Dates

Each of the dates associated with the post is stored both as a date data type and broken out into token parts to make granular date-based searches easier. For example, you can find all posts that were published on a Tuesday (date_token.day_of_week), or those that were modified in the second half of each hour (modified_token.seconds_from_hour).

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| date | date | date | |
| date_token.year | number | short | 4 digit year |
| date_token.month | number | byte | |
| date_token.day | number | byte | |
| date_token.hour | number | byte | 24 hour format |
| date_token.minute | number | byte | |
| date_token.second | number | byte | |
| date_token.day_of_year | number | short | The day of the year (starting from 0) |
| date_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
| date_token.week_of_year | number | byte | Week number of year |
| date_token.seconds_from_day | number | integer | Seconds since midnight of day |
| date_token.seconds_from_hour | number | short | Seconds since start of hour |
| date_gmt | date | date | |
| date_gmt_token.year | number | short | 4 digit year |
| date_gmt_token.month | number | byte | |
| date_gmt_token.day | number | byte | |
| date_gmt_token.hour | number | byte | 24 hour format |
| date_gmt_token.minute | number | byte | |
| date_gmt_token.second | number | byte | |
| date_gmt_token.day_of_year | number | short | The day of the year (starting from 0) |
| date_gmt_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
| date_gmt_token.week_of_year | number | byte | Week number of year |
| date_gmt_token.seconds_from_day | number | integer | Seconds since midnight of day |
| date_gmt_token.seconds_from_hour | number | short | Seconds since start of hour |
| modified | date | date | |
| modified_token.year | number | short | 4 digit year |
| modified_token.month | number | byte | |
| modified_token.day | number | byte | |
| modified_token.hour | number | byte | 24 hour format |
| modified_token.minute | number | byte | |
| modified_token.second | number | byte | |
| modified_token.day_of_year | number | short | The day of the year (starting from 0) |
| modified_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
| modified_token.week_of_year | number | byte | Week number of year |
| modified_token.seconds_from_day | number | integer | Seconds since midnight of day |
| modified_token.seconds_from_hour | number | short | Seconds since start of hour |
| modified_gmt | date | date | |
| modified_gmt_token.year | number | short | 4 digit year |
| modified_gmt_token.month | number | byte | |
| modified_gmt_token.day | number | byte | |
| modified_gmt_token.hour | number | byte | 24 hour format |
| modified_gmt_token.minute | number | byte | |
| modified_gmt_token.second | number | byte | |
| modified_gmt_token.day_of_year | number | short | The day of the year (starting from 0) |
| modified_gmt_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
| modified_gmt_token.week_of_year | number | byte | Week number of year |
| modified_gmt_token.seconds_from_day | number | integer | Seconds since midnight of day |
| modified_gmt_token.seconds_from_hour | number | short | Seconds since start of hour |
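
The two searches described above (posts published on a Tuesday, posts modified in the second half of an hour) can be sketched as query bodies, alongside a helper that derives token values from a Python datetime following the notes in the table. This is only an illustration of the documented semantics, not the indexing code itself. The function names are hypothetical, and week_of_year is omitted because the table does not say which week-numbering scheme is used.

```python
from datetime import datetime


def date_tokens(value: datetime) -> dict:
    """Derive token parts from a datetime, following the table's notes."""
    return {
        "year": value.year,
        "month": value.month,
        "day": value.day,
        "hour": value.hour,                            # 24 hour format
        "minute": value.minute,
        "second": value.second,
        "day_of_year": value.timetuple().tm_yday - 1,  # starting from 0
        "day_of_week": value.isoweekday(),             # 1 = Monday .. 7 = Sunday
        "seconds_from_day": value.hour * 3600 + value.minute * 60 + value.second,
        "seconds_from_hour": value.minute * 60 + value.second,
    }


def published_on_weekday(day_of_week: int) -> dict:
    """Posts published on a given weekday (2 = Tuesday)."""
    return {"query": {"term": {"date_token.day_of_week": day_of_week}}}


def modified_in_second_half_of_hour() -> dict:
    """Posts modified 1800 or more seconds after the start of an hour."""
    return {"query": {"range": {"modified_token.seconds_from_hour": {"gte": 1800}}}}


tokens = date_tokens(datetime(2021, 3, 9, 14, 5, 30))  # a Tuesday
print(tokens["day_of_week"], tokens["seconds_from_hour"])  # 2 330
print(published_on_weekday(2))
print(modified_in_second_half_of_hour())
```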

Post Meta

There is a hardcoded list of post meta keys that are indexed. This list has not yet been synced to Jetpack.

The post meta fields are dynamic: the [KEY] portion of the field name depends on the name (key) of the post meta being indexed. To accommodate advanced querying, all post meta values are cast and indexed as numeric and boolean values in addition to being indexed as strings.

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| meta.[KEY].value | string | text | default analyzer |
| meta.[KEY].value.raw | string | keyword | |
| meta.[KEY].date | date | date | If it looks like a date |
| meta.[KEY].long | number | long | Value cast as 64bit integer (bigint) if it looks like a number |
| meta.[KEY].double | number | double | Value cast as floating point number if it looks like a number |
| meta.[KEY].boolean | boolean | boolean | Value cast as boolean |
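
Because each meta value is also indexed in typed variants, numeric comparisons can target the long or double field instead of the string. The sketch below expands the [KEY] placeholder and builds a range filter on the double variant. The meta key "price" and the function names are hypothetical; a key is only searchable if it is on the hardcoded list mentioned above.

```python
from typing import Optional

# Variants listed in the table above.
META_VARIANTS = {"value", "value.raw", "date", "long", "double", "boolean"}


def meta_field(key: str, variant: str) -> str:
    """Expand meta.[KEY].<variant> for a concrete meta key."""
    if variant not in META_VARIANTS:
        raise ValueError(f"unknown meta variant: {variant}")
    return f"meta.{key}.{variant}"


def meta_number_range(key: str, gte: Optional[float] = None,
                      lte: Optional[float] = None) -> dict:
    """Build a range filter on the floating point variant of a meta value."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"range": {meta_field(key, "double"): bounds}}


print(meta_number_range("price", gte=10.0, lte=25.5))
```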

* Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.
