Accessing WilsonWeb via Federated or Meta Search Systems

   
 

Contents

Section I: Introduction

Section II: Methods Available

   Z39.50 Method

     Overview

     Advantages and Disadvantages

     Authentication

     Search Queries and Result Count

     Result Displays

     Releasing the Session

   HTTP Method

     Overview

     Advantages and Disadvantages

     Authentication

     Search Queries and Result Count

     Result Displays

     Releasing the Session

Section III: Other Considerations

Section IV: Reference Documents

Section V: Contact Information

 

Section I: Introduction

This document describes methodologies for accessing WilsonWeb subscriptions via meta search gateways. The intended audience is the meta search gateway development staff. It presents two alternative ways one might approach sending queries via a gateway and then presenting the results in either a summary HTML format or a detail HTML format. It includes a discussion of the pros and cons of the two methodologies. For the purposes of this document, the term gateway simply means an intermediate system that performs a WilsonWeb search on behalf of some end user.

Section II: Methods Available

The model described is one where the gateway sends a query to WilsonWeb on behalf of some meta search system user. It is presumed that once the gateway determines if there are any hits on WilsonWeb for the given query, it will provide the user with an HTTP link that will send the user to WilsonWeb in order to display the records from WilsonWeb. Other possibilities exist, but since this is the predominant model requested by users, it is the only one described here.

WilsonWeb has two public interfaces for sending queries: one sends the query via Z39.50; the other sends it via a search URL with the query specified as a URL parameter. A meta search to WilsonWeb could be built using either query method. The choice of methodology will depend on the capabilities and limitations of the gateway technology, taking into account the capabilities and limitations of WilsonWeb. In both cases, the gateway needs to log the user in, indicate the desired database(s), perform the search, format the results (or some summary of the results) and insert the results into the web page viewed by the searcher. Also considered are the resource allocations of the two methods and other details of session management.


Z39.50 Method

Overview

In this method, the gateway performs the query using Z39.50 and receives back a structured result set (in MARC or XML, say). The gateway then parses the result to get the desired level of information. It can then transform the result into HTML for the purposes of inserting the WilsonWeb result into the page sent back to the user. The HTML in this result will contain one or more HTTP links back to WilsonWeb that will permit browsing the result record details.

Advantages and Disadvantages

The advantage of the Z39.50 query is the detailed structure provided in the query response. A Z39.50 query request can be sent that returns only a hit count and no result records. Also, the Z39.50 server allows greater control over the number of records returned and the structure of the records returned. It is more straightforward to page through the results one block at a time in Z39.50 and receive records in either brief or full format as desired. Result response sets can be sorted into a number of desired orders. Also, the query syntax of Z39.50 is standardized, so the developer does not need to program a new query syntax in order to perform the query. The technologies used to access other Z39.50 targets can be utilized against Wilson’s Z39.50 server. Session management is also more straightforward with Z39.50 queries.

The disadvantage of this approach is that the meta search system must be capable of converting the Z39.50 result set to HTML, since HTML is not one of the result formats native to the Z39.50 server.

The details of using the Wilson Z39.50 server are located here:
http://www.hwwilson.com/Documentation/WilsonWeb/z3950.htm.
The following provides a simple overview.

Authentication

All WilsonWeb subscribers can log into the Z39.50 server using their WilsonWeb userid and password. Send this as the id/pass component of the Z39.50 ‘init’ request. Although the Z39.50 server supports TCP/IP authentication, it is presumed this is only relevant for a gateway when it is operating from within the address space of the customer. Gateways operating as an ASP will not be able to use TCP/IP authentication and should use id/pass instead.

Search Queries and Result Count

Specify the desired database ID(s) and the Z39.50 query according to the Z39.50 documentation. The Z39.50 search response will indicate the size of the result set.

Result Displays

Use Z39.50 ‘present’ requests to return any number of records from the result set (up to 20,000 total for any given query). The formats available are MARC, SUTRS, GRS and XML. XML is probably the most beneficial because simple conversion and/or field selection operations can be done using XSLT, SAX or the DOM. By default records are returned in relevance ranked order; the Z39.50 ‘sort’ request can be used to return records in some other order if desired.

The next step is to format a display of each WilsonWeb record retrieved and to build a URL that will return the user to a given record. A Z39.50 XML response record from OmniFile might look like this:

<?xml version="1.0"?>
<record>
<TI>Kuru, prions, and human affairs: talking about epidemics</TI>
<AU>Lindenbaum, Shirley</AU>
<JN>Annual Review of Anthropology</JN>
<SO>Annual Review of Anthropology v. 30 (2001) p. 363-85</SO>
<PY>2001</PY>
<ISSN>0084-6570</ISSN>
<SU>Epidemics; Medical anthropology; Kuru; Bovine spongiform encephalopathy</SU>
<UD>20020122</UD>
<AN>200100104527017</AN>
</record>
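
As a minimal sketch of the field selection step, assuming the XML format is requested and that every record has the flat structure of the sample above, a record can be read with Python's standard ElementTree parser. The function names parse_record and subjects are illustrative only, and splitting <SU> on semicolons is an assumption based on this one sample.

```python
import xml.etree.ElementTree as ET

# The sample OmniFile record shown above.
SAMPLE_RECORD = """<?xml version="1.0"?>
<record>
<TI>Kuru, prions, and human affairs: talking about epidemics</TI>
<AU>Lindenbaum, Shirley</AU>
<JN>Annual Review of Anthropology</JN>
<SO>Annual Review of Anthropology v. 30 (2001) p. 363-85</SO>
<PY>2001</PY>
<ISSN>0084-6570</ISSN>
<SU>Epidemics; Medical anthropology; Kuru; Bovine spongiform encephalopathy</SU>
<UD>20020122</UD>
<AN>200100104527017</AN>
</record>"""


def parse_record(xml_text):
    """Return a dict mapping each field tag (TI, AU, AN, ...) to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def subjects(fields):
    """Split the SU field into individual headings (assumes ';' separators)."""
    return [s.strip() for s in fields.get("SU", "").split(";") if s.strip()]


fields = parse_record(SAMPLE_RECORD)
print(fields["TI"])
print(fields["AN"])
print(subjects(fields))
```

Other databases may return different fields, so a gateway would probably want to treat every tag as optional.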

The accession number <AN> from the above record can be used to produce a link back to WilsonWeb with the above record displayed. The URL would use the following syntax:

http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml?sid=HWW:OMNIFT&pid=%3Can%3E200100104527017%3C/an%3E

The jumpstart.jhtml page will do automatic TCP/IP authentication (if applicable). It will draw the WilsonWeb frameset and insert the requested record in the display frame. The ‘sid’ parameter specifies the desired database. The ‘pid’ parameter specifies the desired record. The accession number is wrapped in the tag <an>…</an> which is shown above in URL encoding.

If TCP/IP auto-authentication is not applicable, then the institution’s userid and password can be placed on the URL using the ‘un’ and ‘pw’ parameters:

http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml?sid=HWW:OMNIFT&pid=%3Can%3E200100104527017%3C/an%3E&un=userid&pw=password
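
The link syntax above can be generated mechanically from an accession number. The following is a sketch only: the function name record_link and its arguments are hypothetical, and it assumes the ‘sid’, ‘pid’, ‘un’ and ‘pw’ parameters behave exactly as described in this section.

```python
from typing import Optional
from urllib.parse import quote

JUMPSTART = "http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml"


def record_link(accession_number: str,
                sid: str = "HWW:OMNIFT",
                userid: Optional[str] = None,
                password: Optional[str] = None) -> str:
    """Build a jumpstart.jhtml link that displays a single record."""
    # Wrap the accession number in <an>...</an> and URL-encode it.
    # quote() leaves '/' unencoded by default, matching the sample URL.
    pid = quote(f"<an>{accession_number}</an>")
    url = f"{JUMPSTART}?sid={quote(sid, safe=':')}&pid={pid}"
    # Append credentials only when TCP/IP authentication does not apply.
    if userid is not None and password is not None:
        url += f"&un={quote(userid, safe='')}&pw={quote(password, safe='')}"
    return url


print(record_link("200100104527017"))
print(record_link("200100104527017", userid="userid", password="password"))
```

Note that a link carrying ‘un’ and ‘pw’ exposes the institution's credentials to anyone who can see the URL, so a gateway may prefer to use it only where TCP/IP authentication is unavailable.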

Releasing the Session

WilsonWeb subscribers either have a site license to their subscribed databases or they may have a license limited by simultaneous users. In cases where there are limits on simultaneous usage (and to reduce the amount of server overhead), it is important to properly close the Z39.50 session after the search is performed and the desired records are extracted. Some subscriptions are limited to a single simultaneous user and in this case proper session management is particularly important. Use the Z39.50 ‘close’ request to end the Z39.50 session explicitly.

If the Z39.50 session is not closed, then the simultaneous user slot provided to do the Z39.50 query will remain unavailable to other searchers. Since WilsonWeb is used to display the results in the above example, it is even more important to close the Z39.50 session, because the jumpstart.jhtml page will attempt to open a new session for the user.

WilsonWeb uses cookies to manage sessions for web users. If a user is browsing results from WilsonWeb by following links produced by the meta search gateway, then the user’s browser needs to return this cookie (a jsessionid) on each WilsonWeb request. The cookies are not persistent and will be forgotten if the user closes their browser.


HTTP Method

Overview

In this method, the gateway performs the authentication, search, display and logout all using HTTP URLs. Some meta search gateways attempt to perform these operations by submitting search forms copied from the WilsonWeb frameset on behalf of the user and proxying the HTML results in some fashion. While this is technically feasible, it is probably more advantageous to perform the search using ‘get’ URLs: the URL syntax is less likely to change, while the format of an HTML search form is more subject to change.

Advantages and Disadvantages

WilsonWeb uses a frameset that makes proxying results more difficult than non-frameset systems. In addition, the internal frame used for result displays references javascript functions that are in other frames. For this reason, it is important to send the user to a properly constructed frameset when attempting to view WilsonWeb results in HTML. The URL for the result set frame cannot be used as a destination link itself.

It is not possible on WilsonWeb, using URLs alone, to perform a search and receive only the hit count as a response. Only the Z39.50 method is capable of providing a result set count in a structured way. In order to get the result set size from an HTTP query, the gateway would need to isolate the hit count in the result frames returned.

Using the WilsonWeb URL search syntax alone, it is not possible to change the result set size, sort order, or record format (brief/full). Nor is it possible to jump to any page other than the topmost page (records 1-10) of the result set. Again, only the Z39.50 method provides these controls.

It is perhaps tempting to overcome the limitations of WilsonWeb’s jumpstart.jhtml syntax by submitting forms for record pagination, sort order changes or other operations that can be performed by forms but not by the documented target URL parameters. These forms are not likely to be stable and their use is not advised; Z39.50 probably provides a more stable approach.

Authentication

In the HTTP method, authentication and search are performed in a single step. The ‘jumpstart.jhtml’ page can log the user in using TCP/IP address checking, but this method applies only when the search is being conducted from a machine within the customer’s IP address space. Otherwise, the un= and pw= parameters must be specified to authenticate the user.

In response to the search query, WilsonWeb will return an HTTP ‘Set-Cookie’ header with a jsessionid. It is important for the gateway to accept this cookie and use it in the URL links described below. Otherwise, subsequent attempts to view records will generate new sessions and consume additional simultaneous user slots (perhaps to the point where no additional users will be allowed to view results).

Search Queries and Result Count

The details of the URL query syntax are located here:

http://www.hwwilson.com/Documentation/WilsonWeb/targetURL.htm.

Using the advanced syntax, the gateway can perform any type of query. By default, up to 10 brief records in relevance rank order starting from the first will be returned in a frameset. The URL of the internal frame cannot be used as a destination link by itself. In order to provide a link to the destination result set, see the next section.

As stated above, the result count is shown as text within one of the returned frames. The page (http://vnweb.hwwilsonweb.com/hww/results/results_common_right_top.jhtml) will have HTML text looking like this:

<b>Search&nbsp;Summary Total:&nbsp;</b>1800
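
Isolating the hit count from that frame can be sketched with a regular expression. This assumes the markup stays exactly as shown above, which is not guaranteed; the function name hit_count is hypothetical, and the tolerance for thousands separators is an assumption rather than documented behavior.

```python
import re
from typing import Optional

# Matches "Summary Total:" followed by optional &nbsp;/whitespace, the closing
# </b>, and then the number itself.
_TOTAL = re.compile(r"Summary\s+Total:(?:&nbsp;|\s)*</b>\s*([\d,]+)",
                    re.IGNORECASE)


def hit_count(frame_html: str) -> Optional[int]:
    """Return the result count found in the summary frame, or None."""
    match = _TOTAL.search(frame_html)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


sample = "<b>Search&nbsp;Summary Total:&nbsp;</b>1800"
print(hit_count(sample))
```

Returning None rather than 0 when the pattern is absent lets the gateway distinguish "no hits" from "page layout changed".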

Result Displays 

Since the internal frame URL containing the records is not valid as a start page, the gateway must either:

  1. Return the entire frameset produced by the query to the user, or

  2. Provide a link to the jumpstart.jhtml URL which re-does the query, or

  3. Parse the HTML returned to provide jumpstart.jhtml links that return the user to one of the returned records.

Number 1 may not be feasible because it would mean proxying several HTTP response streams. (There are 7 visible frames and 1 hidden frame in the WilsonWeb frameset.)

Number 2 is feasible, but the important step here is to insert the jsessionid returned in the initial Set-Cookie header into the URL that re-does the query. This will ensure that the same session used to perform the query will be used to view the results. The jsessionid goes on the URL just before the first query parameter like this:

http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml;jsessionid=BMORP1XGEOMNFQA3DILCFFQ?prod=OMNIFT&query=cat+%3Cin%3E+KW
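
The two steps involved (reading the jsessionid out of the cookie header, then placing it on the URL) might look like the sketch below. Both function names are hypothetical, and the cookie name is matched case-insensitively because the exact spelling sent by the server is an assumption here.

```python
from http.cookies import SimpleCookie
from typing import Optional


def session_id_from_set_cookie(header_value: str) -> Optional[str]:
    """Extract the jsessionid value from a Set-Cookie header value."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    for name, morsel in cookie.items():
        # Assumption: the cookie is named "jsessionid" in some letter case.
        if name.lower() == "jsessionid":
            return morsel.value
    return None


def with_session(url: str, session_id: str) -> str:
    """Insert ;jsessionid=... just before the first query parameter."""
    path, sep, query = url.partition("?")
    return f"{path};jsessionid={session_id}{sep}{query}"


sid = session_id_from_set_cookie("JSESSIONID=BMORP1XGEOMNFQA3DILCFFQ; Path=/hww")
print(with_session(
    "http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml?prod=OMNIFT&query=cat+%3Cin%3E+KW",
    sid))
```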

Number 3 means isolating the Accession Number of a returned record so that a jumpstart.jhtml link can be built to re-display the frameset showing the desired record. Such a link would look like this:

http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml;jsessionid=BMORP1XGEOMNFQA3DILCFFQ?sid=HWW:OMNIFT&pid=%3Can%3E200100104527017%3C/an%3E

How to obtain the record accession number from the HTML display: the checkboxes used for marking records contain the record’s accession number as a value. If you parse the HTML response looking for the checkbox next to each record, it will look like this:

<input type="checkbox" name="checkbox" value="200100104527017" onClick="javascript:doCheckbox( this );" />1

Find and extract the record’s accession number from the value attribute of the checkbox. This allows linking back to individual records within WilsonWeb, but because the result set display contains only 10 records (by default), it will not be possible to link to record 11 or beyond. For this reason, re-doing the whole query is probably the best approach.
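
A sketch of the checkbox extraction, using Python's standard HTML parser, is shown here. It assumes each record's checkbox carries type="checkbox" and name="checkbox" as in the sample above; the class and function names are illustrative only.

```python
from html.parser import HTMLParser


class AccessionNumberParser(HTMLParser):
    """Collects the value attribute of each record-marking checkbox."""

    def __init__(self):
        super().__init__()
        self.accession_numbers = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <input ... />.
        attributes = dict(attrs)
        if (tag == "input"
                and (attributes.get("type") or "").lower() == "checkbox"
                and attributes.get("name") == "checkbox"
                and attributes.get("value")):
            self.accession_numbers.append(attributes["value"])


def accession_numbers(result_html):
    """Return the accession numbers found in a result frame, in page order."""
    parser = AccessionNumberParser()
    parser.feed(result_html)
    parser.close()
    return parser.accession_numbers


sample = ('<input type="checkbox" name="checkbox" value="200100104527017" '
          'onClick="javascript:doCheckbox( this );" />1')
print(accession_numbers(sample))
```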

Releasing the Session

The following URL will release the session. Again, the jsessionid will need to be included in order for WilsonWeb to identify the correct session.

http://vnweb.hwwilsonweb.com/hww/logout.jhtml;jsessionid=BMORP1XGEOMNFQA3DILCFFQ

Whether or not to release the session depends on whether the gateway intends to return the user to the same session. If the link to view records uses the same jsessionid as was returned during the query, then you do not want to release the session. If the link to view records does not contain the jsessionid, then it would be best to release the session. If the gateway’s session manager leaves sessions open with the expectation that users may (soon) return to do another query, then the session can be left open for better performance, but the jsessionids should be coded into all URLs used for viewing results.
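
The release rule described above reduces to a small decision, sketched here under the assumption that the gateway knows whether its result links carry the jsessionid. The function names logout_url and should_release are hypothetical.

```python
LOGOUT = "http://vnweb.hwwilsonweb.com/hww/logout.jhtml"


def logout_url(session_id: str) -> str:
    """Build the URL that releases the given WilsonWeb session."""
    return f"{LOGOUT};jsessionid={session_id}"


def should_release(links_carry_session_id: bool) -> bool:
    """Release the session unless the result links will re-use it."""
    # Links that embed the jsessionid return the user to the same session,
    # so that session must stay open. Links without it start a new session,
    # so the one used for the search should be released.
    return not links_carry_session_id


session_id = "BMORP1XGEOMNFQA3DILCFFQ"
if should_release(links_carry_session_id=False):
    print(logout_url(session_id))
```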

In any case, if the user (or the gateway) does not return to the same session, WilsonWeb will time out the session allocated to the search after 20 minutes of inactivity. Any simultaneous user slots that were allocated during the search will remain unavailable to other users until the time out occurs or until the session is explicitly terminated. For this reason, we recommend making a logout option available even though few users are likely to employ it.

Also, statistically speaking, if the number of systems searched in parallel with WilsonWeb is large, the user may have many result sets to choose from. In this case, the number of searchers who return to the WilsonWeb result set may be much smaller than the number of searches performed. When this is true and the number of searches performed per session is also low, it may be best to release the session, since return users are only a small percentage of total searches. Although re-establishing a session takes more server processing than re-using one, it is best not to accumulate idle sessions, since these occupy simultaneous user slots. In any case, the best solution is to allow the customer to choose the session management policy based on whether they have limited simultaneous user slots.


Section III: Other Considerations

The WilsonWeb product line contains a number of product types: article indexes with full text, book indexes, biography databases, fact databases and art object databases. To reduce overhead and increase performance, it is best to limit searches to those products containing the document type the user is looking for. Because of the variety of products, the gateway also needs to be able to parse different result types when constructing links back to WilsonWeb.

Within WilsonWeb, a result record will often contain a rich set of hyperlinks to full text documents or other related information. Some of these are customer-specific links to OPAC servers and OpenURL link resolvers. Since these are dynamically inserted, the user will gain access to the most relevant information if they are returned to WilsonWeb to view results. Attempts to reformat query results obtained from Z39.50 or the web are less likely to provide these extra features.

Multi-database searches in WilsonWeb are already de-duped. Therefore the gateway need not perform any extra de-duping steps when searching multiple WilsonWeb databases.

Section IV: Reference Documents

WilsonWeb URL Target syntax
http://www.hwwilson.com/Documentation/WilsonWeb/targetURL.htm

WilsonWeb Z39.50 Server Information
http://www.hwwilson.com/Documentation/WilsonWeb/z3950.htm

Section V: Contact Information

For questions regarding the contents of this document, please send an email to Bernie Seiler.
