The H.W. Wilson Company - New York, Dublin
 
 
 

  Relevancy Ranking and Search Rules Explained

   
 

WilsonWeb offers a unique and satisfying search experience. Powerful WilsonWeb search tools (many of which operate automatically behind the scenes) retrieve the most relevant information, whether the user is a novice or a more experienced researcher.

Periodical and Book Databases

All - Smart Search

Which fields are searched?
1. Subject (includes use-for terms from thesaurus)
2. Title, Title enhancement
3. Abstract
4. Author
5. Journal name
6. Full text of the article and bibliographic record

How are the results displayed?
Results are ranked by relevancy:
100% = A match in the subject field
90% = An exact match as a use-for term
80% = A partial match in the subject field
70% = Term found in Title, Title enhancement, or abstract
60% = Term found in the full text of the bibliographic record or the article itself

Keyword

Which fields are searched?
1. Subject
2. Title, Title enhancement
3. Abstract
4. Author
5. Journal name

How are the results displayed?
Results are displayed in date order (descending from the most current entry)

Every search choice above returns only articles that contain all of your search terms.

Biography Databases

All - Smart Search

Which fields are searched?
1. Person’s Name
2. Profession
3. Place of Birth (the state and/or country of birth)
4. Date of Birth
5. Date of Death
6. Gender
7. Race/Ethnicity
8. Works by the person (books, plays, movies, but not for all of the older bios)
9. Subject headings of magazine articles about the person
10. Text of biographies

How are the results displayed?
Results are ranked by relevancy:
100% = A match in the following fields: Person, Profession, Place of Origin, Gender, Race, Birth Date, or Death Date
90% = The words were found in the note that appears beside the names in the result set
80% = The words are found in the Works by section at the end of a biography, or in the subject headings list on the Articles with Subject screen.
70% = The words are found near each other in the text of the biographies
Lower = Only one of the words you typed was found

Keyword

Which fields are searched?
1. Person
2. Profession
3. Source
4. Text of Biography

How are the results displayed?
Results are displayed in alphabetical order

 

Note: Enclosing search terms in “double quotes” creates an exact, bound phrase search.

Any WilsonWeb search will automatically include stemmed variants of your terms. To search without this feature, enclose your term(s) in “double quotes” in the text entry box.
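
The interaction between stemming and double quotes can be pictured with a small sketch. This is an illustration only, not WilsonWeb's or Verity's implementation: the names `naive_stem`, `parse_query`, and `matches` are hypothetical, and the stemmer simply strips a trailing "s" as a stand-in for a real stemming algorithm.

```python
import re


def naive_stem(word: str) -> str:
    """Rough stand-in for a real stemmer (assumption): lowercase, strip one trailing 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def parse_query(query: str) -> dict:
    """Split a query into exact phrases (double-quoted) and stemmed loose terms."""
    # Treat curly double quotes the same as straight ones.
    normalized = query.replace("\u201c", '"').replace("\u201d", '"')
    phrases = [p.strip().lower() for p in re.findall(r'"([^"]+)"', normalized)]
    remainder = re.sub(r'"[^"]+"', " ", normalized)
    terms = [naive_stem(t) for t in remainder.split()]
    return {"phrases": phrases, "terms": terms}


def matches(text: str, parsed: dict) -> bool:
    """True if every phrase occurs verbatim and every loose term occurs in stemmed form."""
    lowered = text.lower()
    stems = {naive_stem(w) for w in re.findall(r"[a-z0-9]+", lowered)}
    return (all(p in lowered for p in parsed["phrases"])
            and all(t in stems for t in parsed["terms"]))
```

In this sketch, the unquoted query toxic molds matches a title containing "toxic mold", while the quoted query "toxic molds" does not, because the quoted form must appear exactly as typed.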

The All - Smart Search and the Keyword Search both search the same fields except:

  • The All - Smart Search also searches the bibliographic record, the thesaurus use-for postings, and the full text of the article (when the option "Expand: Also search within the full text of the articles for extra results" has been selected).

  • The results of the All - Smart Search are sorted by relevancy, with the most relevant articles at the top of the list. The results of the Keyword Search are simply sorted by date, with the most recent articles at the top of the results set.

Search Strategies for the First Time or Novice User

  • If you would like WilsonWeb to find relevant articles by using its own search rules, select All – Smart Search. See other dropdown boxes for alternative ways to focus your results. If you know what term(s), topic, or names you would like to search, select a simple Keyword search found on the Advanced Search screen in the as: dropdown box.

  • Want to use your own words as search terms? Use the Natural Language option on the Basic Search screen.

  • Not sure what Subject, Journal, or Name you are looking for? Use the Browse mode and select a field from the dropdown list that will take you to the appropriate alphabetical listing.

  • Need other ideas for a subject? Use the Thesaurus, entering your term, and you’ll get suggestions for alternative, broader, or narrower terms to search.

Search Strategies for the Advanced Researcher

  • Focusing the search on specific criteria: Use the Advanced Search screen. To run the Wilson rules-based search, which takes advantage of Wilson’s metadata, select All – Smart Search. Otherwise, select one of the other fields from the "as:" dropdown box.

  • Want to construct your own search? Select the Boolean radio button on the Basic Search screen. Note examples of a Boolean search on the screen. For more complex Boolean searching use the Verity search engine syntax found in Help.

  • To find alphabetical lists for such fields as Subject, Journal Name, ISSN, etc., use the Browse mode.

  • Want to view other possible subjects based upon Wilson’s quality indexing? Go to the Thesaurus and enter your term.

  • View and manipulate current session search strings in Search History. You can also retrieve saved searches and update your results.

Important Information about All – Smart Search, Relevancy Ranking, and the Verity Search Engine

The All – Smart Search query on the Advanced Search screen is much more than simply a keyword search of all fields. Unique to WilsonWeb, the All – Smart Search query is a rules-based search whose basis is the quality indexing of H.W. Wilson. A full explanation of how this operates is given below.

Other key elements described below include an explanation of Relevancy Ranking—a key feature in evaluating your results set. Wilson’s approach to Relevancy Ranking is unique. When you do an All – Smart Search query from the Advanced Search screen, WilsonWeb launches a rules-based search on Wilson’s proprietary metadata, rather than simply relying on density or proximity of words to create a ranking. The new WilsonWeb system is also coupled with the powerful Verity search engine (described below). Verity has been specifically chosen for its power in searching unstructured data such as that found on the Internet.

Advanced Search Screen

The default setting on the Advanced Search screen is an All – Smart Search query. This is not the same as the old ERL WilsonWeb Words anywhere query or any other keyword search.

Important Tip: Do not type your own Boolean syntax or any other special-character syntax (for example, truncation) into the text entry boxes when using the All – Smart Search query. If you wish to construct your own Boolean search string, please use the Basic Search screen (see below for information).

  • The All – Smart Search query has at its core the native Verity search algorithms (see Basic Search below for an explanation). However, in order to take full advantage of Wilson’s exceptionally rich indexing metadata, we have overlaid a hierarchical set of Wilson-created search rules.

  • These rules evaluate and rank the occurrence of words primarily by the importance of the field where the word is found, and not simply by the number of times that the word occurs. Thus, finding a term in the subject heading of an article is acknowledged to be more important than simply finding that term in the title, abstract, or full text of the article.

  • The Wilson search rules make use of the metadata created from Wilson indexing by acknowledging and calculating the importance of where the words are found. WilsonWeb looks for occurrences of words in different fields of the record, and assigns a score reflecting the importance of the field in which the word is found.

Note: The only time that the Wilson search rules are employed in a search is when an All – Smart Search query is requested. This can be triggered either from the Advanced Search screen by leaving the dropdown box at its default of All – Smart Search, or from the Basic Search screen (see instructions below).

The Advanced Search screen also allows queries to selected single fields. Use the drop-down boxes to select the field(s) that you wish to search. These fielded queries will not employ the Wilson search rule hierarchy. Only the Verity search algorithms will be employed for these queries. The use of double quotation marks around a phrase triggers an exact, bound phrase search in these fielded queries.

Selecting Keyword from the dropdown box triggers a search of the traditional citation fields (Title, Title Enhancement, Subject, Abstract, Author, and Journal Name). The use of double quotation marks around a phrase triggers an exact, bound phrase search in these fielded queries.

Various search limits are also available from this screen, including limiting the search to full text articles, peer reviewed articles, various types of articles, or articles with selected physical description characteristics.

Basic Search Screen

  • The default search here is a Natural Language search of the complete record. This is an enhanced keyword search of all text available. All fields are searched including, when available, the full text of the item. The Verity search engine, using the Verity relevancy ranking algorithm (see details below), computes the number of occurrences of the words searched and other variations on that information to derive a relevancy-ranked set of search results. Unlike the All – Smart Search query, this search does not record any difference in the importance of finding the words in a subject heading or any other part of the record. All occurrences of the words are judged to be of equal importance.

  • This screen also allows the user to enter either a Boolean search string, a specific field-limited query (selected by the user’s inclusion of a field code), or a free text search. See Help for tips on how to do this.

  • The user can specify a Boolean search using field codes to search a specific field and choosing a specific operator to use for the search. The Basic Search screen is programmed to automatically recognize a Boolean search by the presence of an operator (for example, and, or, not, and in, among others). The Advanced Search screen does not have that ability.

  • The Verity search engine alone executes the search unless an All – Smart Search query is entered. Selection of the All – Smart Search query triggers the application of the Wilson proprietary search rules in addition to the Verity search algorithms.
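
The automatic recognition of a Boolean search could work along the following lines. This is a hedged sketch, not the actual WilsonWeb logic: `looks_boolean` and `search_mode` are hypothetical names, the operator set contains only the four operators named above, and ignoring operators inside double quotes is an assumption made for the illustration.

```python
import re

# Only the operators named in the text; the real list is longer ("among others").
BOOLEAN_OPERATORS = {"and", "or", "not", "in"}


def looks_boolean(query: str) -> bool:
    """Return True if any unquoted token is a known operator."""
    # Assumption: an operator inside a quoted phrase is just a word of that phrase.
    unquoted = re.sub(r'"[^"]*"', " ", query)
    # Tokens may be written bare (and) or in angle brackets (<in>).
    tokens = re.findall(r"<?\w+>?", unquoted.lower())
    return any(tok.strip("<>") in BOOLEAN_OPERATORS for tok in tokens)


def search_mode(query: str) -> str:
    """Pick the mode a Basic Search box might use for this query."""
    return "boolean" if looks_boolean(query) else "natural language"
```

For example, `search_mode("toxic mold")` returns "natural language", while `search_mode("toxic and mold")` returns "boolean".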

Verity Relevancy-Ranking Algorithm Explained

Search results are ranked using the following parameters when more than one word is entered on the Basic Search screen. For a single term entered in Basic Search, only parameter 1 applies:

  1. Density: the number of times the word occurs relative to the size of the document. This differs from simple frequency.

  2. Proximity: the occurrence of the terms near one another.

  3. Completeness: entries that contain more of the query terms are ranked higher than entries that contain fewer.

  4. De-emphasis on Specific Words: words that occur very frequently in the document are given less weight, much as stop words are.
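
The first three parameters can be made concrete with a small sketch. These functions are illustrative assumptions, not Verity's formulas: how Verity actually measures each signal, and how it weights and combines them, is not described here, so the sketch computes each one separately and stops there.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def density(term, tokens):
    """Occurrences of the term relative to document length (not a raw count)."""
    return tokens.count(term) / len(tokens) if tokens else 0.0


def completeness(terms, tokens):
    """Fraction of the distinct query terms that appear in the document."""
    wanted, present = set(terms), set(tokens)
    return len(wanted & present) / len(wanted) if wanted else 0.0


def min_span(terms, tokens):
    """Length of the shortest token window containing every term, or None if one is missing."""
    wanted = set(terms)
    if not wanted or not wanted.issubset(tokens):
        return None
    best = len(tokens)
    last_seen = {}
    for i, tok in enumerate(tokens):
        if tok in wanted:
            last_seen[tok] = i
            if len(last_seen) == len(wanted):
                # Shortest window ending at i starts at the oldest "last seen" position.
                best = min(best, i - min(last_seen.values()) + 1)
    return best
```

With the query terms toxic and mold, the text "toxic mold found in school" has a span of 2, whereas "mold was found and the report called it toxic" has a span of 9. Both are complete matches, but a proximity-aware ranking would favor the first.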

Note also:

  • The stemmed variations of the query words are also included in the search (and hits of stemmed words are factored into the rank). For example, searching toxic mold also yields results for toxic molds.

  • Common stop words are automatically dropped from the search unless they are entered in quotation marks. If your query consists exclusively of stop words, they are not dropped.

  • The Basic Search screen is capable of handling a Verity Query Language (VQL) query. Verity Query Language queries are ranked according to Verity algorithms. The Wilson search rules are not applied.

  • The Basic Search screen is capable of doing a Wilson rules-based search if the user types '(your search term) <in> all'.
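
The stop-word behavior described in the notes above can be sketched as follows. The stop-word list is a hypothetical placeholder (the list WilsonWeb actually uses is not given here), `effective_terms` is an illustrative name, and only straight double quotes are handled.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def effective_terms(query: str) -> list:
    """Return the phrases and words that would actually be searched."""
    # Each match is either a quoted phrase (group 1) or a bare word (group 2).
    parts = re.findall(r'"([^"]+)"|(\S+)', query.lower())
    phrases = [phrase for phrase, _ in parts if phrase]
    words = [word for _, word in parts if word]
    kept = [w for w in words if w not in STOP_WORDS]
    # A query made up exclusively of stop words is searched as typed.
    if not kept and not phrases:
        kept = words
    return phrases + kept
```

Quoted phrases are returned first and keep their stop words, so `effective_terms('"to be" jazz')` gives `['to be', 'jazz']`, while `effective_terms("the history of jazz")` gives `['history', 'jazz']`.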

More Details of the Wilson Search Rules

A condensed version of the Bibliographic search rules would look like this:

  1. 100% = Exact match on subject

  2. 90% = Exact match on see reference

  3. 80% = Partial match on subject with three variations permitted
    a. The first part of a non-topical subject starts with your word, or the word in any order
    b. All of the words in the query are found in the subject (any order)
    c. A see reference occurs as a phrase in the subject (as when there is a subdivision)

  4. 70% = Exact match of the term in the title or other title (title enhancement)

  5. 60% and lower = Match on all of the words in the query. This search includes all non-limit fields such as abstract, source, or author plus anywhere in the full text of the article. Documents with more occurrences of the words will score higher.
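
The condensed rules above amount to checking the most important field first and stopping at the first rule that matches. The sketch below shows that idea under several assumptions: the `Record` structure and `score` function are hypothetical, rule 3c (see references inside subdivided subjects) is omitted, "exact match in the title" is read as the query appearing as a phrase in the title, and the "60% and lower" tier is returned as a flat 60 rather than graded by number of occurrences.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical bibliographic record, for illustration only."""
    title: str
    subjects: list = field(default_factory=list)
    see_refs: list = field(default_factory=list)  # thesaurus use-for terms
    other_text: str = ""                          # abstract, author, source, full text


def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, rec: Record) -> int:
    """Return the relevancy tier of the first rule that matches, or 0 if none does."""
    q = query.lower().strip()
    q_words = set(words(q))
    if not q_words:
        return 0
    subjects = [s.lower() for s in rec.subjects]
    if q in subjects:                                   # rule 1: exact subject
        return 100
    if q in [r.lower() for r in rec.see_refs]:          # rule 2: exact see reference
        return 90
    if any(s.startswith(q) or q_words <= set(words(s)) for s in subjects):
        return 80                                       # rules 3a and 3b: partial subject
    if q in rec.title.lower():                          # rule 4: phrase in title
        return 70
    everything = " ".join([rec.title, rec.other_text] + rec.subjects)
    if q_words <= set(words(everything)):               # rule 5: all words anywhere
        return 60
    return 0
```

A result list would then be ordered with something like `sorted(records, key=lambda r: score(query, r), reverse=True)`, so that a subject match always outranks a match found only in the title or full text, however often the word occurs there.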

There are Different Rules for the Different Classes of Product

Biography rules are:

  • 100% = All the words you typed are found in any of the following fields: Person, Profession, Place of Origin, Gender, Race, Birth Date, or Death Date

  • 90% = The words are found in the note that appears beside the names in the result set (not yet implemented)

  • 80% = The words are found in the Works by section at the end of a biography, or in the subject headings list on the Articles with Subject screen.

  • 70% = The words are found near each other in the text of the biographies (biographies that mention what you searched on, but are not necessarily about that topic/person)

  • Lower = Most likely only one of the words you typed was found

Famous First Facts rules are:

  • 100% = Match on article heading or place

  • 90% = Match on descriptor

  • 80% = Match on free text

WilsonWeb proprietary search rules apply to Advanced Search only when you leave the field drop-down at the All - Smart Search selection. If you choose author or personal name, a miniature version of the rules applies (the part of the rules dealing with names). If you select any other drop-down field, the native Verity relevancy-ranking algorithm applies.
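
The paragraph above is effectively a three-way decision on the drop-down selection, which can be summarized in a few lines. The function name `ranking_for` and the returned labels are hypothetical; only the three outcomes come from the text.

```python
import re


def ranking_for(field_choice: str) -> str:
    """Map an Advanced Search drop-down selection to the ranking that applies."""
    # Ignore case, spacing, and hyphen/en-dash differences in the label.
    key = re.sub(r"[\s\-\u2013]+", "", field_choice.lower())
    if key == "allsmartsearch":
        return "Wilson search rules plus Verity"
    if key in {"author", "personalname"}:
        return "Wilson name rules only"
    return "Verity relevancy ranking only"
```

For example, `ranking_for("All - Smart Search")` selects the full Wilson rules, `ranking_for("Author")` the name rules, and `ranking_for("Abstract")` the native Verity ranking.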

Searching in the Wilson Biographies Databases (Overview)

In almost all cases, an All - Smart Search in Advanced Search is your best bet for searching.

The All - Smart Search is much more powerful than simply a search for the words in all fields. The All - Smart Search combines the power of fielded searches with the ease of a keyword search. Because students are so comfortable with searches in Web sites like Google, where they just type in a word(s), we have implemented the All - Smart Search, which allows a user to get the power of our indexing without needing to know which fields to search in.

We have indexed the biographies with the following information:

  • Person’s Name (including cross-reference names such as Samuel Clemens for Mark Twain)

  • Profession

  • Place of Birth (the state and/or country of birth)

  • Date of Birth

  • Date of Death

  • Gender

  • Race/Ethnicity

  • Works by the person (books, plays, movies, but not for all of the older bios)

  • Subject headings of magazine articles about the person

The terms found in these fields are specifically about the person, as opposed to just a word that appears in the text of a biography which might have no real connection to the person you are searching for.

Therefore, we created rules for the All - Smart Search that rank records with your search terms in these fields high in the results, while records that only mention the terms in the text of the biography rank lower.

For example, if you search on Kurt Vonnegut in All - Smart Search, the first two results are 100% hits on two different Kurt Vonneguts. Following that are numerous 70% hits, alphabetically listed, for entries that contain the words Kurt Vonnegut near each other in the text of the biographies.

Or, if you searched on Breakfast of Champions (a book by Vonnegut), you would get an 80% hit on the record for Kurt Vonnegut because those words are a title by him, followed by 70% hits for names that mention the phrase Breakfast of Champions in the text of the biographies; followed by numerous hits with lower rankings, which mention one of the words.
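
The two examples above can be reproduced with a simplified scorer. This is a sketch under stated assumptions, not the production rules: the `Bio` structure, the function names, the 10-token "near each other" window, and the value 50 for the "lower" tier are all hypothetical, only the Person and Profession fields are modeled among the indexed fields, and the two records other than Vonnegut's are invented for the illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Bio:
    """Hypothetical biography record with only a few of the indexed fields."""
    person: str
    profession: str = ""
    works: list = field(default_factory=list)
    text: str = ""


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def near(query_words, toks, window=10):
    """True if all query words fall inside some run of `window` consecutive tokens."""
    wanted = set(query_words)
    return any(wanted <= set(toks[i:i + window]) for i in range(len(toks)))


def bio_score(query: str, bio: Bio) -> int:
    q = tokens(query)
    if not q:
        return 0
    if any(set(q) <= set(tokens(f)) for f in (bio.person, bio.profession)):
        return 100                      # all words in an indexed field
    if any(set(q) <= set(tokens(w)) for w in bio.works):
        return 80                       # all words in a "Works by" entry
    text_tokens = tokens(bio.text)
    if near(q, text_tokens):
        return 70                       # words near each other in the text
    if any(w in text_tokens for w in q):
        return 50                       # arbitrary stand-in for "lower"
    return 0


def rank(query: str, bios: list) -> list:
    """Order by score (highest first), then alphabetically by name within a score."""
    hits = [(bio_score(query, b), b.person) for b in bios]
    return sorted((h for h in hits if h[0] > 0), key=lambda h: (-h[0], h[1]))
```

Given a Vonnegut record whose works include Breakfast of Champions, a second record whose text mentions the book, and a third whose text only mentions breakfast, `rank("Breakfast of Champions", ...)` returns them at 80, 70, and 50 respectively, mirroring the ordering described above.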

To sum up the All - Smart Search, users get the power of fielded searching on well-indexed material without needing any knowledge of which fields to search. A user who just sits down and types what they are looking for is very likely to get the records they need, with the most important records listed first, followed by less relevant records.

Other Fields

While the All - Smart Search will satisfy most searching needs, we have made fielded searching possible. Among the more important fields are:

Person

This limits your search to the Person (name) field.

For example, in comparison to the All - Smart Search, if you searched on Kurt Vonnegut in Person, you would only get two hits, which are the two people named Kurt Vonnegut. These are the same two records as the two 100% hits from an All - Smart Search on Kurt Vonnegut. However, in the Person search, you will not get the records for the biographies that mention his name (the 70% hits from the All - Smart Search).

Text

This is a search of words contained in the text of a biography. This does not mean a word found anywhere. In addition to the biographies, we have records for books and magazines, and the headings and text of those citations are not considered text.

Other Fields

There are specific lines with drop-down lists for Birth Date, Profession/Activity, Place of Origin, Gender, and Ethnic Background, all of which will only retrieve records that have your search terms in those fields.

Browse

The Browse screen is useful if you don’t know the spelling of a term you are searching. All of our indexed fields (Name, Profession, Gender, Place of Origin, and others) can be searched from Browse.

For example, if you didn’t know how to spell Vonnegut, you could go to Browse, choose the Person field (which is the default), and type vonn. You will be presented with an alphabetical list of all names, with an indicator between Vonk, Nancy and Vonnegut, Bernard so you can easily scroll through the list to find the person you want.
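
Positioning an indicator in an alphabetical list from a partial spelling is a classic binary-search insertion point. The sketch below assumes the list is already sorted case-insensitively; `browse_position` is a hypothetical name, and the three-name list is a tiny stand-in for the real index.

```python
import bisect


def browse_position(sorted_names, typed):
    """Index at which the typed prefix would be inserted into the sorted list."""
    keys = [name.lower() for name in sorted_names]
    return bisect.bisect_left(keys, typed.lower())


names = ["Vonk, Nancy", "Vonnegut, Bernard", "Vonnegut, Kurt"]
position = browse_position(names, "vonn")
before, after = names[position - 1], names[position]
```

Here `position` is 1, so the indicator falls between "Vonk, Nancy" and "Vonnegut, Bernard", and the user can scroll down from there.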

 

© 2008 The HW Wilson Company®  800-367-6770 / 718-588-8400

950 University Avenue, Bronx, New York 10452