How can I make sure that dynamic snippets in the results list are perfect?

10/31/2012

Web pages almost always contain navigation sections, such as menus and notes, whose text is not representative of the page's actual content.

Distinguishing the real content from navigational text is a prerequisite for:

  • Maximum-precision search results
  • Informative and relevant text snippets in the result list
  • Functional duplicate control

SiteSeeker has two methods for determining what is content and what is navigation or auxiliary information in a web page: (1) automatic analysis of the HTML code, or (2) specific labels in the HTML code.

Automatic navigation detection

The HTML code of a web page almost always gives too little information about where the important content begins and ends. Automatically assessing what is content and what is navigation is therefore very hard in most cases. SiteSeeker nevertheless analyses the HTML code and the contents of each web page and makes an educated guess. In most cases SiteSeeker's assessment is correct, but if you want the best possible performance, we recommend the labelling described below.

Labels for navigation sections

Since identification of navigation sections is crucial for search quality, SiteSeeker lets you mark up which parts of the text are navigation. SiteSeeker then excludes those parts when indexing the page. With labelling, you make sure that only relevant words are indexed, which gives maximum search accuracy and text snippets that are always representative.

This adjustment can be made in the website templates when using a content management system. In this case, the adjustment is quickly done and you do not need to edit every document on the website.
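
As an illustration of labelling in a template rather than in every document, here is a minimal sketch in Python. The template, its slot names ($title, $menu, $content, $footer) and the helper functions are hypothetical and not part of SiteSeeker or any particular CMS; only the two label strings come from this article.

```python
from string import Template

# Label strings as given in this article.
NO_INDEX_OPEN = "<!--eri-no-index-->"
NO_INDEX_CLOSE = "<!--/eri-no-index-->"


def no_index(fragment: str) -> str:
    """Wrap an HTML fragment in the labels that mark it as navigation."""
    return f"{NO_INDEX_OPEN}\n{fragment}\n{NO_INDEX_CLOSE}"


# Hypothetical page template; the slot names are made up for this sketch.
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>\n"
    "<body>\n$menu\n<div id=\"content\">$content</div>\n$footer\n</body></html>"
)


def render_page(title: str, menu: str, content: str, footer: str) -> str:
    """Render a page with menu and footer labelled once, in the template."""
    return PAGE_TEMPLATE.substitute(
        title=title,
        menu=no_index(menu),      # navigation: labelled
        content=content,          # real content: left unlabelled
        footer=no_index(footer),  # auxiliary text: labelled
    )
```

Every page rendered through such a template gets the labels automatically, and both labels end up inside the BODY section.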

The labels consist of two special comments that specify the beginning and the end of the undesired section. In the following example, the only sections indexed by SiteSeeker are those outside the comment pairs:

    Headline; preamble; some body text...
    <!--eri-no-index-->
    menu item 1, menu item 2, etc
    <!--/eri-no-index-->
    more body text...
    <!--eri-no-index-->
    footer text...
    <!--/eri-no-index-->

Please note that links within a section enclosed by <!--eri-no-index--> work as normal and that link texts are associated with the linked documents. In this way, SiteSeeker's analysis of the link structure can contribute to the relevancy of the hits.

Please note that <!--eri-no-index--> and <!--/eri-no-index--> must be located in the BODY section of the web page in order to function properly. No text from the HEAD section except the title is displayed in the text snippets. The meta description (Description) can be displayed in the result list if desired; you will find the relevant setting in SiteSeeker Admin.
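
The placement requirement can be checked before a template is published. The following is a rough sketch only: the function name check_labels is hypothetical, the BODY section is located with simple regular expressions rather than a full HTML parser, and the expectation that every opening label has a closing label is an assumption drawn from the description above, not documented SiteSeeker behaviour.

```python
import re
from typing import List

LABEL_RE = re.compile(r"<!--(/?)eri-no-index-->")
BODY_OPEN_RE = re.compile(r"<body\b[^>]*>", re.IGNORECASE)
BODY_CLOSE_RE = re.compile(r"</body\s*>", re.IGNORECASE)


def check_labels(html: str) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    body_open = BODY_OPEN_RE.search(html)
    body_close = BODY_CLOSE_RE.search(html)
    # If a tag is missing, fall back to the document boundary.
    body_start = body_open.end() if body_open else 0
    body_end = body_close.start() if body_close else len(html)

    depth = 0  # number of sections currently open
    for match in LABEL_RE.finditer(html):
        if not (body_start <= match.start() < body_end):
            problems.append(f"label at offset {match.start()} is outside BODY")
        if match.group(1):  # "/" captured: this is a closing label
            if depth == 0:
                problems.append(
                    f"closing label at offset {match.start()} has no opening label"
                )
            else:
                depth -= 1
        else:
            depth += 1
    if depth:
        problems.append(f"{depth} opening label(s) never closed")
    return problems
```

Running check_labels on the rendered HTML of a few representative pages is usually enough, since the labels normally come from shared templates.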

<!--eri-no-index--> sections can be nested. This is useful if you place labels around text sections in different blocks or controls in your CMS and then insert those blocks into different documents.
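
To preview which text remains outside the labelled sections, including nested ones, a short script can help. This is a sketch for checking your own markup, not SiteSeeker's implementation: the function name visible_to_index is hypothetical, and treating nested sections by counting open labels is an assumption about how nesting behaves.

```python
import re

LABEL_RE = re.compile(r"<!--(/?)eri-no-index-->")


def visible_to_index(html: str) -> str:
    """Return the text that lies outside every eri-no-index section."""
    kept = []
    depth = 0      # how many no-index sections are currently open
    position = 0   # offset just after the previous label
    for match in LABEL_RE.finditer(html):
        if depth == 0:
            # Text between the previous label and this one is outside all sections.
            kept.append(html[position:match.start()])
        if match.group(1):
            depth = max(depth - 1, 0)  # closing label; stray closers are ignored
        else:
            depth += 1                 # opening label
        position = match.end()
    if depth == 0:
        kept.append(html[position:])   # trailing text after the last label
    return "".join(kept)
```

Applied to the example above, the function keeps the headline, preamble and body text and drops the menu items and the footer text.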

If a page can be found by searching for a word that only occurs within an <!--eri-no-index--> section, the reason could be that the word is also part of:

  • Hidden meta information for the document, e.g. among keywords or description
  • Link texts in other documents that are referring to the document
  • The URL of the page
  • A different inflectional form, in some other part of the text, in the meta information, in a link text or in the URL.

If all of the above cases can be ruled out, please contact SiteSeeker support.
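
Before contacting support, the first three cases in the list can be checked with a small script. This is a hedged sketch: explain_hit and MetaCollector are hypothetical names, the link texts from other documents must be supplied by you, matching is plain case-insensitive substring search, and inflectional forms are not handled at all.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the content of the keywords and description meta tags."""

    def __init__(self):
        super().__init__()
        self.meta_text = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() in ("keywords", "description"):
            self.meta_text.append(attributes.get("content") or "")


def explain_hit(word, url, html, inbound_link_texts):
    """Return the places outside the body text where `word` occurs."""
    needle = word.lower()
    collector = MetaCollector()
    collector.feed(html)
    reasons = []
    if any(needle in text.lower() for text in collector.meta_text):
        reasons.append("meta information")
    if any(needle in text.lower() for text in inbound_link_texts):
        reasons.append("link text in another document")
    if needle in url.lower():
        reasons.append("URL")
    return reasons
```

An empty result means none of these three cases applies; the inflectional-form case still has to be checked by hand.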