Search relevancy

Getting relevant results is critical for the success of the community. This section describes the parameters that affect the relevancy score of a piece of content and the rank it gets when you search for a specific phrase.
The relevancy rank is calculated as follows:
Rank = (SimilarityScore + ProximityScore) * OutcomeType * ObjectType * Recency * SocialScore
These parameters affect the rank of a content item and can boost it toward the top of the search results. Each one is described in the sections that follow.
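As a rough illustration of the rank formula above, the following Python sketch multiplies the factors together. The function name, the argument names, and the neutral default of 1.0 for each multiplier are assumptions made for this example, not part of the product.

```python
def rank(similarity, proximity, outcome=1.0, object_type=1.0,
         recency=1.0, social=1.0):
    """Combine the factors as in the rank formula.

    similarity and proximity are added; every other factor is a multiplier.
    A multiplier of 1.0 (assumed default) leaves the rank unchanged.
    """
    return (similarity + proximity) * outcome * object_type * recency * social


# Same text match, different outcome types (boost values from Table 3).
base = dict(similarity=1.1, proximity=1.6, object_type=1.4)
finalized = rank(outcome=1.4, **base)
outdated = rank(outcome=0.1, **base)
print(round(finalized, 3), round(outdated, 3))
```

Because the text-match scores are summed before the multipliers are applied, a single low multiplier (such as 0.1 for outdated content) scales down the whole rank regardless of how well the text matches.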

Similarity score

When you search for a phrase, the system looks at each word in the phrase and checks the match type and the place of the match for this word. Each combination of match type and place has its own boost score. The default settings are listed in Table 1.

The boost score is normalized by the number of times the search term appears in the given content (the more often it appears, the better) and by the number of times the term appears in the search index (the more common the term, the less impact it has on the rank).

Match types that Cloud Search employs:

  • Raw: Exact matches of the search term.
  • Analyzed: Matches that are created by the language analyzer. In this case, stemming is used, that is, the search looks for the root of the word. For example, "focusing" also finds "focus", "focused", and other related words with the same stem.
  • Edgengram: Partial match, used for wildcard search matches and matches in search-as-you-type queries.

Places of matches that Cloud Search employs:

  • Subject: Title field of content items
  • Body: Content of content items
  • Tags: Tags added to content items

Table 1. Similarity boosts

              Match place
Match type    Subject    Body    Tags
Raw           1.0        0.1     0.5
Analyzed      1.0        0.1
Edgengram     1.0        0.1     0.5
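The default boosts in Table 1 can be read as a lookup keyed by match type and place. The sketch below is illustrative only: the names are hypothetical, summing the boosts of several matches is an assumption, and the normalization by term frequency and index frequency described earlier is not modeled. Table 1 lists only two values for Analyzed; the sketch assumes they belong to Subject and Body.

```python
# Default boosts from Table 1, keyed by (match type, place).
# Assumption: the two Analyzed values apply to Subject and Body.
SIMILARITY_BOOSTS = {
    ("raw", "subject"): 1.0,
    ("raw", "body"): 0.1,
    ("raw", "tags"): 0.5,
    ("analyzed", "subject"): 1.0,
    ("analyzed", "body"): 0.1,
    ("edgengram", "subject"): 1.0,
    ("edgengram", "body"): 0.1,
    ("edgengram", "tags"): 0.5,
}


def similarity_boost(matches):
    """Sum the boosts for a list of (match_type, place) pairs.

    Pairs without a listed boost contribute 0.0 (an assumption).
    """
    return sum(
        SIMILARITY_BOOSTS.get((match_type.lower(), place.lower()), 0.0)
        for match_type, place in matches
    )


# A word found exactly in the title and through stemming in the body.
print(similarity_boost([("Raw", "Subject"), ("Analyzed", "Body")]))
```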

Proximity score

The proximity score measures how closely the phrase the user searches for matches what appears in the content. When a user searches for a phrase made up of several words, the phrase may appear in the content exactly as typed, or it may appear in a slightly different form. For example, content with the term "product one-pager brochure" is an approximate match when searching for "product brochure".

Types of proximity boosts:

  • Exact match: When all the search terms appear in the content next to each other
  • Proximity match: When all the search terms appear less than three words apart from each other

The proximity score is also used to boost more relevant results. Exact matches get boosted more than proximity matches. The default settings are listed in Table 2.

Table 2. Proximity and exact match boosts

Place      Proximity boost    Exact match boost
Subject    0.5                1.6
Body       0.5                1.0
Tags*      0.1                1.0

* A proximity match on Tags is unlikely to occur.
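The difference between an exact match and a proximity match can be sketched as follows. This is a simplified illustration, not the actual implementation: the function names are hypothetical, terms are split on whitespace, they must appear in the same order as in the query, and "less than three words apart" is interpreted as at most two words between consecutive terms.

```python
# Default boosts from Table 2.
PROXIMITY_BOOSTS = {
    "subject": {"proximity": 0.5, "exact": 1.6},
    "body": {"proximity": 0.5, "exact": 1.0},
    "tags": {"proximity": 0.1, "exact": 1.0},
}


def _chain(words, terms, pos, reach):
    """True if the remaining terms follow position pos, each within reach words."""
    if not terms:
        return True
    for nxt in range(pos + 1, min(pos + 1 + reach, len(words))):
        if words[nxt] == terms[0] and _chain(words, terms[1:], nxt, reach):
            return True
    return False


def classify_match(content, query, reach=3):
    """Return "exact", "proximity", or "none" for a query against content."""
    words = content.lower().split()
    terms = query.lower().split()
    if not terms:
        return "none"
    starts = [i for i, word in enumerate(words) if word == terms[0]]
    if any(_chain(words, terms[1:], start, 1) for start in starts):
        return "exact"  # all terms directly next to each other
    if any(_chain(words, terms[1:], start, reach) for start in starts):
        return "proximity"  # at most reach - 1 words between terms
    return "none"


def proximity_boost(place, content, query):
    """Look up the Table 2 boost; no match contributes 0.0 (an assumption)."""
    kind = classify_match(content, query)
    return PROXIMITY_BOOSTS[place.lower()].get(kind, 0.0)


print(classify_match("product one-pager brochure", "product brochure"))
print(proximity_boost("Subject", "new product brochure", "product brochure"))
```

With these assumptions, the "product one-pager brochure" example is classified as a proximity match, while a title containing "product brochure" verbatim receives the higher exact match boost.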

Additionally, frequency is taken into account. The score depends heavily on how often the search term occurs in the field relative to the length of the field. For example, if a 20,000-word essay makes a single reference to the movie "Finding Nemo" and another document in the system has only 50 words and includes "Finding Nemo", the latter is considered more relevant to a query for "nemo".
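The "Finding Nemo" example can be reproduced with a simple term-density calculation. The exact normalization used by the search service is not documented here, so the function below (a hypothetical `term_density`) only shows the general idea: the same single occurrence weighs more in a shorter field.

```python
def term_density(text, term):
    """Occurrences of term divided by the total number of words in text."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


# A 20,000-word essay and a 50-word note, each mentioning "nemo" once.
essay = "word " * 19998 + "Finding Nemo"
note = "word " * 48 + "Finding Nemo"

print(term_density(essay, "nemo"))
print(term_density(note, "nemo"))
```

The density for the short note is several hundred times higher than for the essay, which matches the ordering described above.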

Outcome type

Content in Jive can be marked with structured outcomes. The search results are boosted based on outcome type.

The boosts given to content according to outcome type are listed in Table 3.

Table 3. Outcome boosts

Outcome      Boost    Outcome     Boost
Finalized    1.4      Official    2.0
Outdated     0.1      Default     1.0

The score of a content item is multiplied by the boost for its outcome type.

Note that a higher boost ranks content higher in the search results, so the 0.1 boost for outdated content significantly reduces its rank.

Object type

Similar to the outcome boost, there is a boost based on the type of content. Documents and blogs are ranked higher in the search results because they usually hold more comprehensive content that may be more relevant to the searching user. The settings are listed in Table 4.

Table 4. Object boosts

Object        Boost    Object           Boost
Document      1.4      Poll             1.0
Blog          1.4      Idea             1.0
Discussion    1.0      Video            1.0
Question      1.0      Status Update    1.0
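Since the outcome and object boosts are both plain multipliers, their combined effect is easy to compute. The sketch below uses the default values from Tables 3 and 4; the function name and the fallback of 1.0 for unlisted types are assumptions made for this example.

```python
# Default boosts from Table 3 and Table 4.
OUTCOME_BOOSTS = {
    "finalized": 1.4,
    "official": 2.0,
    "outdated": 0.1,
    "default": 1.0,
}
OBJECT_BOOSTS = {
    "document": 1.4,
    "blog": 1.4,
    "discussion": 1.0,
    "question": 1.0,
    "poll": 1.0,
    "idea": 1.0,
    "video": 1.0,
    "status update": 1.0,
}


def type_multiplier(outcome, object_type):
    """Combined outcome and object multiplier; unlisted types count as 1.0."""
    return (OUTCOME_BOOSTS.get(outcome.lower(), 1.0)
            * OBJECT_BOOSTS.get(object_type.lower(), 1.0))


for outcome, object_type in [("Official", "Document"),
                             ("Default", "Discussion"),
                             ("Outdated", "Document")]:
    print(outcome, object_type, round(type_multiplier(outcome, object_type), 2))
```

With the defaults, an official document is multiplied by 2.8, while an outdated document drops to 0.14, well below an unmarked discussion at 1.0.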

Recency

Recency (or time decay) lowers the score for older content. The effect of content age on the score is shown in the following figure:

Figure: Recency boost by default

Recency score calculation is based on the following parameters:

Table 5. Recency parameters for calculations

Parameter        Default    Description
Drop speed       50         Determines how fast the algorithm reduces the content score as the content ages.
Max value        4 weeks    Determines the most recent period within which content keeps the same score without decay.
Minimum score    0.9        Limits the score difference between a very old document and a newly created one to a factor of 2 at most. It is set so that even the oldest relevant content can be found while fresh content is still preferred.

Social score

The Jive R2E2 service calculates a social score for the search phrase based on the user's activities, follows, and other behavioral connections.

The R2E2 service (previously Jive Find) provides improved search relevance by incorporating social information into search. Search rankings are tailored for individuals based on dynamic signals derived from activity within Jive. As users use Jive, data is generated about activities, such as views, creates, responses, and likes. These activities are processed in the Jive Recommender service and summarized into a form that can be used by Jive Search to enhance the relevance of the search results. When a user searches for content or places, items that are considered close to the user (based on the activities performed by the user or other individuals connected to the user) are given a boost in the search rankings. This personalizes search results for each user.

The details of how user activity translates into levels of boost change over time as the system is optimized.