Contents
Section I: Introduction
Section II: Methods Available
Z39.50 Method
Overview
Advantages and Disadvantages
Authentication
Search Queries and Result Count
Result Displays
Releasing the Session
HTTP Method
Overview
Advantages and Disadvantages
Authentication
Search Queries and Result Count
Result Displays
Releasing the Session
Section III: Other Considerations
Section IV: Reference Documents
Section V: Contact Information
Section I: Introduction
This document describes methods for accessing WilsonWeb subscriptions through meta search gateways. It is intended for meta search gateway developers. It presents two alternative ways to send queries through a gateway and then present the results in either a summary or a detailed HTML format, and it discusses the pros and cons of each. In this document, the term gateway simply means an intermediate system that performs a WilsonWeb search on behalf of an end user.
Section II: Methods Available
The model described here is one in which the gateway sends a query to WilsonWeb on behalf of a meta search system user. It is presumed that once the gateway determines whether WilsonWeb has any hits for the query, it will give the user an HTTP link that sends the user to WilsonWeb to display the records. Other models are possible, but since this is the one users most often request, it is the only one described here.
WilsonWeb has two public interfaces for sending queries: Z39.50, and a search URL that carries the query as a URL parameter. A meta search to WilsonWeb could be built on either one. The choice depends on the capabilities and limitations of the gateway technology, weighed against those of WilsonWeb. In both cases, the gateway needs to log the user in, select the desired database(s), perform the search, format the results (or a summary of them), and insert them into the web page the searcher sees. The resource usage and session management details of each method are also considered.
Z39.50 Method
Overview
In this method, the gateway performs the query using Z39.50 and receives a structured result set (in MARC or XML, for example). The gateway then parses the result to extract the desired level of information and transforms it into HTML for insertion into the page sent back to the user. That HTML will contain one or more HTTP links back to WilsonWeb for browsing the record details.
Advantages and Disadvantages
The advantage of a Z39.50 query is the detailed structure of the response. A Z39.50 search request can be sent that returns only a hit count and no records. The Z39.50 server also allows greater control over the number and structure of the records returned: it is straightforward to page through the results one block at a time and to receive records in either brief or full format, and result sets can be sorted in several orders. In addition, the Z39.50 query syntax is standardized, so the developer does not need to implement a new query syntax, and the technologies used to access other Z39.50 targets can be used against Wilson’s Z39.50 server. Session management is also more straightforward with Z39.50.
The disadvantage of this approach is that the meta search system must be able to convert the Z39.50 result set to HTML, since HTML is not one of the Z39.50 server’s native result formats.
The details of using the Wilson Z39.50 server are
located here:
http://www.hwwilson.com/Documentation/WilsonWeb/z3950.htm.
The following provides a simple overview.
Authentication
All WilsonWeb subscribers can log into the Z39.50 server with their WilsonWeb userid and password, sent as the id/pass component of the Z39.50 ‘init’ request. The Z39.50 server also supports TCP/IP (IP address) authentication, but this is presumed to be relevant only when the gateway operates from within the customer’s IP address space. Gateways operating as an ASP cannot use TCP/IP authentication and should use id/pass instead.
Search Queries and Result Count
Specify the desired database ID(s) and the Z39.50 query
according to the Z39.50 documentation. The Z39.50 search response will
indicate the size of the result set.
Result Displays
Use Z39.50 ‘present’ requests to retrieve any number of records from the result set (up to 20,000 in total for a given query). The available formats are MARC, SUTRS, GRS and XML. XML is probably the most convenient, because simple conversion and field selection can be done with XSLT, SAX or the DOM. By default, records are returned in relevance-ranked order; the Z39.50 ‘sort’ request can be used to return them in another order.
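A gateway that pages through a large result set has to split it into successive ‘present’ requests. The Python sketch below computes (start, count) pairs for those requests, assuming 1-based start positions (the Z39.50 present request’s start point counts from 1) and the 20,000-record ceiling stated above. The function name and the default block size of 10 are illustrative choices; actually sending the requests is left to whatever Z39.50 client library the gateway uses.

```python
from typing import Iterator, Tuple

# Ceiling on records retrievable for a single query, per the text above.
MAX_PRESENTABLE = 20_000


def present_ranges(hit_count: int, block_size: int = 10) -> Iterator[Tuple[int, int]]:
    """Yield (start, count) pairs for successive Z39.50 'present' requests.

    Start positions are 1-based. block_size is an illustrative default,
    not a server requirement.
    """
    if block_size < 1:
        raise ValueError("block_size must be positive")
    # Never ask for more than the server will present for one query.
    total = min(hit_count, MAX_PRESENTABLE)
    start = 1
    while start <= total:
        # The last block may be shorter than block_size.
        count = min(block_size, total - start + 1)
        yield start, count
        start += count
```

For example, list(present_ranges(25)) gives [(1, 10), (11, 10), (21, 5)]. Generating ranges lazily lets the gateway stop after the first block if that is all the user is shown.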
The next step is to format a display of each WilsonWeb
record retrieved and to build a URL that will return the user to a given
record. A Z39.50 XML response record from OmniFile might look like this:
<?xml version="1.0"?>
<record>
<TI>Kuru, prions, and human affairs: talking about epidemics</TI>
<AU>Lindenbaum, Shirley</AU>
<JN>Annual Review of Anthropology</JN>
<SO>Annual Review of Anthropology v. 30 (2001) p. 363-85</SO>
<PY>2001</PY>
<ISSN>0084-6570</ISSN>
<SU>Epidemics; Medical anthropology; Kuru; Bovine spongiform encephalopathy</SU>
<UD>20020122</UD>
<AN>200100104527017</AN>
</record>
The accession number (<AN>) in the record above can be used to build a link back to WilsonWeb that displays this record. The URL uses the following syntax:
http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml?sid=HWW:OMNIFT&pid=%3Can%3E200100104527017%3C/an%3E
The jumpstart.jhtml page performs automatic TCP/IP authentication (if applicable). It draws the WilsonWeb frameset and inserts the requested record in the display frame. The ‘sid’ parameter specifies the desired database and the ‘pid’ parameter specifies the desired record. The accession number is wrapped in an <an>…</an> tag, which appears URL-encoded above as %3Can%3E…%3C/an%3E.
If TCP/IP auto-authentication does not apply, the institution’s userid and password can be placed on the URL with the ‘un’ and ‘pw’ parameters:
http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml?sid=HWW:OMNIFT&pid=%3Can%3E200100104527017%3C/an%3E&un=userid&pw=password
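Both link formats above can be generated directly from a Z39.50 XML record. The following Python sketch assumes each record has the flat <record> layout of the OmniFile example, with the accession number in <AN>; other WilsonWeb products may use different record structures. The function name is hypothetical. It percent-encodes the pid value while leaving ‘:’ and ‘/’ literal, so its output matches the example URLs shown above.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import quote, urlencode

JUMPSTART_URL = "http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml"


def record_link(record_xml: str, sid: str,
                un: Optional[str] = None, pw: Optional[str] = None) -> str:
    """Build a jumpstart.jhtml link that displays one record in WilsonWeb.

    Assumes the accession number is in a top-level <AN> element, as in the
    OmniFile example record. un/pw are added only when both are given
    (for customers without TCP/IP auto-authentication).
    """
    root = ET.fromstring(record_xml)
    accession = (root.findtext("AN") or "").strip()
    if not accession:
        raise ValueError("record has no <AN> accession number")
    params = {"sid": sid, "pid": f"<an>{accession}</an>"}
    if un is not None and pw is not None:
        params["un"] = un
        params["pw"] = pw
    # quote (not quote_plus) with ':' and '/' kept literal reproduces the
    # encoding used in the documented examples: %3Can%3E...%3C/an%3E
    query = urlencode(params, safe=":/", quote_via=quote)
    return f"{JUMPSTART_URL}?{query}"
```

Called as record_link(xml_text, "HWW:OMNIFT") on the example record, it returns the first URL above; passing un and pw appends them as in the second.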
Releasing the Session
WilsonWeb subscribers have either a site license to their subscribed databases or a license limited by the number of simultaneous users. Where simultaneous usage is limited (and to reduce server overhead), it is important to close the Z39.50 session properly once the search is performed and the desired records are extracted. Some subscriptions are limited to a single simultaneous user, which makes proper session management particularly important. Use the Z39.50 ‘close’ request to end the Z39.50 session explicitly.
If the Z39.50 session is not closed, the simultaneous user slot used for the Z39.50 query remains unavailable to other searchers. Closing it matters even more when, as in the example above, WilsonWeb is used to display the results, because the jumpstart.jhtml page will open a new session for the user.
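One way to make sure the ‘close’ request is always sent, even when the search or present step fails, is to wrap the session in a context manager. In this Python sketch, open_connection and the connection object’s close() method are hypothetical stand-ins for whatever Z39.50 client library the gateway uses; the point is only the try/finally structure that guarantees the close.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def z3950_session(open_connection: Callable[[str, str], Any],
                  user: str, password: str) -> Iterator[Any]:
    """Open a Z39.50 session and guarantee it is closed afterwards.

    open_connection is a hypothetical factory: it is assumed to send the
    'init' request with the WilsonWeb userid/password as id/pass and to
    return a connection object whose close() sends the Z39.50 'close'.
    """
    conn = open_connection(user, password)
    try:
        yield conn
    finally:
        # Runs on normal exit and on exceptions, so the simultaneous
        # user slot is released promptly either way.
        conn.close()
```

A gateway would then run its search and present requests inside "with z3950_session(factory, userid, password) as conn:", and the session is closed when the block exits.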
WilsonWeb uses cookies to manage sessions for web users.
If a user is browsing results from WilsonWeb by following links produced
by the meta search gateway, then the user’s browser needs to return this
cookie (a jsessionid) on each WilsonWeb request. The cookies are not
persistent and will be forgotten if the user closes their browser.
HTTP Method
Overview
In this method, the gateway performs authentication, search, display and logout entirely through HTTP URLs. Some meta search gateways attempt to do this by submitting search forms copied from the WilsonWeb frameset on behalf of the user and proxying the HTML results in some fashion. While this is technically feasible, it is probably better to perform the search with HTTP GET URLs, because the URL syntax is less likely to change than the format of an HTML search form.
Advantages and Disadvantages
WilsonWeb uses a frameset, which makes proxying results more difficult than with non-frameset systems. In addition, the internal frame used for result displays references JavaScript functions defined in other frames. For this reason, the user must be sent to a properly constructed frameset to view WilsonWeb results in HTML; the URL of the result set frame cannot be used as a destination link by itself.
Using URLs alone, it is not possible to perform a search on WilsonWeb and receive only the hit count. Only the Z39.50 method provides a result set count in a structured form. To get the result set size from an HTTP query, the gateway would need to extract the hit count from the HTML of the returned result frames.
The WilsonWeb URL search syntax also cannot change the result set size, sort order, or record format (brief/full), nor can it jump to any page other than the first page (records 1-10) of the result set. Again, only the Z39.50 method offers these controls.
It may be tempting to work around the limitations of WilsonWeb’s jumpstart.jhtml syntax by submitting forms for record pagination, sort order changes or other operations that forms can perform but the documented target URL parameters cannot. These forms are not likely to be stable, and using them is not advised; Z39.50 probably provides a more stable approach.
Authentication
In the HTTP method, authentication and search are performed in a single step. The ‘jumpstart.jhtml’ page can log the user in by checking the TCP/IP address, but only when the search is conducted from a machine within the customer’s IP address space. Otherwise, the un= and pw= parameters must be specified to authenticate the user.
In response to the search query, WilsonWeb returns an HTTP ‘Set-Cookie’ header containing a jsessionid. The gateway must accept this cookie and use its value in the URL links described below. Otherwise, each subsequent attempt to view records will create a new session and consume additional simultaneous user slots (possibly to the point where no further users are allowed to view results).
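Python’s standard http.cookies module can pull the session identifier out of the Set-Cookie header values. This sketch matches the cookie name case-insensitively, because Java servlet containers commonly send it as JSESSIONID; that spelling is an assumption, not something this document specifies. The function name is hypothetical.

```python
from http.cookies import CookieError, SimpleCookie
from typing import Iterable, Optional


def extract_jsessionid(set_cookie_headers: Iterable[str]) -> Optional[str]:
    """Return the jsessionid value from one or more Set-Cookie header values.

    The cookie name is matched case-insensitively. Returns None if no
    jsessionid cookie is present.
    """
    for header in set_cookie_headers:
        cookie = SimpleCookie()
        try:
            cookie.load(header)
        except CookieError:
            continue  # skip header values the stdlib parser rejects
        for name, morsel in cookie.items():
            if name.lower() == "jsessionid":
                return morsel.value
    return None
```

With urllib.request, the header values for a response can be passed as response.headers.get_all("Set-Cookie") or [].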
Search Queries and Result Count
The details of the URL query syntax are located here:
http://www.hwwilson.com/Documentation/WilsonWeb/targetURL.htm.
Using the advanced syntax, the gateway can perform any type of query. By default, up to 10 brief records, in relevance-ranked order starting from the first, are returned in a frameset. The URL of the internal frame cannot be used as a destination link by itself; for ways to link to the result set, see the next section.
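For the simple case, a search URL can be assembled with the standard library. This Python sketch uses only the prod, query, un and pw parameters that appear in this document; the full parameter set and query syntax are defined in the URL target documentation referenced above, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

JUMPSTART_URL = "http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml"


def search_url(prod: str, query: str,
               un: Optional[str] = None, pw: Optional[str] = None) -> str:
    """Build a jumpstart.jhtml search URL.

    un/pw are appended only when both are supplied, i.e. when TCP/IP
    address checking does not apply.
    """
    params = {"prod": prod, "query": query}
    if un is not None and pw is not None:
        params["un"] = un
        params["pw"] = pw
    # urlencode's default quote_plus encoding turns "cat <in> KW" into
    # "cat+%3Cin%3E+KW", the form used in this document's example URL.
    return f"{JUMPSTART_URL}?{urlencode(params)}"
```

search_url("OMNIFT", "cat <in> KW") produces the query URL shown (with a jsessionid added) in the Result Displays section.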
As stated above, the result count appears only as text within one of the returned frames. The page http://vnweb.hwwilsonweb.com/hww/results/results_common_right_top.jhtml contains HTML text like this:
<b>Search Summary Total: </b>1800
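Because the count appears only as HTML text, the gateway has to scrape it. A minimal Python sketch, assuming the markup shown above: the pattern tolerates extra whitespace, letter case and thousands separators, but any change to that page’s HTML would break it, which is the fragility discussed earlier. The function name is hypothetical.

```python
import re
from typing import Optional

# Matches e.g. "<b>Search Summary Total: </b>1800" from
# results_common_right_top.jhtml; tied to the markup shown in this document.
_TOTAL_RE = re.compile(
    r"<b>\s*Search Summary Total:\s*</b>\s*([\d,]+)", re.IGNORECASE
)


def extract_hit_count(html: str) -> Optional[int]:
    """Return the result set size, or None if the summary text is absent."""
    match = _TOTAL_RE.search(html)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))
```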
Result Displays
Since the internal frame URL containing the records is not valid as a start page, the gateway must either:
1. Return the entire frameset produced by the query to the user, or
2. Provide a link to the jumpstart.jhtml URL that re-does the query, or
3. Parse the HTML returned to provide jumpstart.jhtml links that return the user to one of the returned records.
Number 1 may not be feasible because it would mean
proxying several HTTP response streams. (There are 7 visible frames and 1
hidden frame in the WilsonWeb frameset.)
Number 2 is feasible, but the important step is to insert the jsessionid returned in the initial Set-Cookie header into the URL that re-does the query. This ensures that the session used to perform the query is also used to view the results. The jsessionid goes immediately after the page name, before the ‘?’ that starts the query parameters, like this:
http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml;jsessionid=BMORP1XGEOMNFQA3DILCFFQ?prod=OMNIFT&query=cat+%3Cin%3E+KW
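Splicing the jsessionid into an existing URL is a small string operation. In this Python sketch the identifier is added as a ‘;jsessionid=’ path parameter on the page name, ahead of the query string, as in the example above; the same function also produces the logout URL described under Releasing the Session. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def with_jsessionid(url: str, jsessionid: str) -> str:
    """Insert ';jsessionid=...' after the page name, before any '?query'.

    Assumes the URL does not already carry a jsessionid path parameter.
    """
    parts = urlsplit(url)  # urlsplit keeps any ';params' inside parts.path
    path = f"{parts.path};jsessionid={jsessionid}"
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))
```

Applied to http://vnweb.hwwilsonweb.com/hww/logout.jhtml, it yields the logout URL with the session identifier attached.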
Number 3 means isolating the accession number of a returned record so that a jumpstart.jhtml link can be built to redisplay the frameset showing that record. Such a link looks like this:
http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml;jsessionid=BMORP1XGEOMNFQA3DILCFFQ?sid=HWW:OMNIFT&pid=%3Can%3E200100104527017%3C/an%3E
To obtain a record’s accession number from the HTML display: the checkboxes used for marking records carry the record’s accession number as their value. If you parse the HTML response looking for the checkbox next to each record, it will look like this:
<input type="checkbox" name="checkbox" value="200100104527017" onClick="javascript:doCheckbox( this );" />1
Extract the record’s accession number from the checkbox’s value attribute. This allows linking back to individual records within WilsonWeb, but since the result set display contains only 10 records by default, it is not possible to link to record 11 or later. For this reason, re-doing the whole query is probably the best approach.
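Rather than matching the checkbox markup with a regular expression, a gateway can use an HTML parser. The following Python sketch uses the standard library’s html.parser and collects the value of every <input type="checkbox" name="checkbox"> element; that attribute pattern is taken from the example above and may not hold for every WilsonWeb product or page version. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class AccessionNumberParser(HTMLParser):
    """Collect values of record-marking checkboxes from a result frame."""

    def __init__(self) -> None:
        super().__init__()
        self.accession_numbers: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attributes = dict(attrs)
        if (attributes.get("type") == "checkbox"
                and attributes.get("name") == "checkbox"
                and attributes.get("value")):
            self.accession_numbers.append(attributes["value"])


def extract_accession_numbers(html: str) -> List[str]:
    parser = AccessionNumberParser()
    parser.feed(html)
    parser.close()
    return parser.accession_numbers
```

Each value can then be wrapped as <an>…</an> in the pid parameter of a jumpstart.jhtml link like the one shown above.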
Releasing the Session
The following URL will release the session. Again, the
jsessionid will need to be included in order for WilsonWeb to identify the
correct session.
http://vnweb.hwwilsonweb.com/hww/logout.jhtml;jsessionid=BMORP1XGEOMNFQA3DILCFFQ
Whether to release the session depends on whether the gateway intends to return the user to the same session. If the link to view records uses the jsessionid returned during the query, do not release the session. If the link to view records does not contain the jsessionid, it is best to release the session. If the gateway’s session manager leaves sessions open in the expectation that users may soon return to do another query, the session can be left open for better performance, but the jsessionid should then be coded into every URL used to view results.
In any case, if the user (or the gateway) does not
return to the same session, WilsonWeb will time out the session allocated
to the search after 20 minutes of inactivity. Any simultaneous user slots
that were allocated during the search will remain unavailable to other
users until the time out occurs or until the session is explicitly
terminated. For this reason, we recommend making a logout option available
even though few users are likely to employ it.
Also, statistically speaking, if many systems are searched in parallel with WilsonWeb, the user has many result sets to choose from, so the number of searchers who return to the WilsonWeb result set may be much smaller than the number of searches performed. When this is true, and the number of searches per session is also low, it may be best to release the session, since returning users are a small fraction of all searches. Although re-establishing a session takes more server processing than re-using one, it is best not to accumulate idle sessions, because they occupy simultaneous user slots. In any case, the best solution is to let the customer choose the session management policy, based on whether or not they have limited simultaneous user slots.
Section III: Other Considerations
The WilsonWeb product line includes several product types: article indexes with full text, book indexes, biography databases, fact databases and art object databases. To reduce overhead and improve performance, it is best to limit searches to products containing the type of document the user is looking for. And because the products vary, the gateway needs to be able to parse different result types when constructing links back to WilsonWeb.
Within WilsonWeb, a result record will often contain a
rich set of hyperlinks to full text documents or other related
information. Some of these are customer-specific links to OPAC servers and
OpenURL link resolvers. Since these are dynamically inserted, the user
will gain access to the most relevant information if they are returned to
WilsonWeb to view results. Attempts to reformat query results obtained
from Z39.50 or the web are less likely to provide these extra features.
Multi-database searches in WilsonWeb are already
de-duped. Therefore the gateway need not perform any extra de-duping steps
when searching multiple WilsonWeb databases.
Section IV: Reference Documents
WilsonWeb URL Target syntax
http://www.hwwilson.com/Documentation/WilsonWeb/targetURL.htm
WilsonWeb Z39.50 Server Information
http://www.hwwilson.com/Documentation/WilsonWeb/z3950.htm
Section V: Contact Information
For questions regarding the contents of this document, please send an email to Bernie Seiler.