Fall Meeting Prague

Nov 2009


User requirements for the BHL-Europe portal


Evaluation of the User Requirement Survey

Oct-Nov 2009

BHL-Europe User Requirements (PDF files of results)


Participants: total 52 (for questions 8-14 only 33 due to a program bug)

19 participants (37 %) responded to all questions, but only their answers to questions 1-7 were transmitted; the other answers were lost. The bug affected all consortium members equally (I analysed the IP addresses, which were transmitted for all participants). Results for questions 1-7 did not differ significantly between the two groups, so the remaining 33 participants should be as representative as the entire group.
[18 Nov]: The cause of the bug was evidently that 19 participants did not enter a freetext answer to question 13 "Any further suggestions?". As a result, the complete block of questions 8-14 was not submitted to SurveyMonkey, and the participants had no opportunity to notice this. Lesson: we should never make a freetext-only question obligatory.

In most questions, one of three options could be selected: "very important", "important" and "not important".
I would have preferred to elicit answers such as "I agree with" or "I support" a given statement. This would have made the evaluation easier.

The three options represented unevenly spaced levels of agreement; a five-point scale would have been better.
On such a scale: "very important" = 1, "important" = 2, "not important" = 5.
Levels 3 and 4 ("moderately important" and "of little importance") were not offered.

It is impossible to say how participants understood the term "important" in this context: as a moderate answer, or as a stronger-than-moderate one? In the evaluation I assume that some read it one way and some the other, so "important" may represent a value of roughly 2.5.
Many participants selected only 1 or 2 and never 5 (for example, all CSIC participants). This makes it very difficult to detect preferences.
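A minimal sketch of how the three answers could be mapped onto the five-point scale above, assuming "important" is scored as the midpoint 2.5 (configurable). The second helper is my own suggestion, not something used in this evaluation: it subtracts each respondent's own mean, so that a participant who only ever chose 1 or 2 still shows which options they rated above their personal average. The function names are hypothetical.

```python
from statistics import mean

def score(answer: str, important_value: float = 2.5) -> float:
    """Map an answer onto the 1 (very important) ... 5 (not important) scale.

    "important" is ambiguous (2 = strong, 3 = moderate), so it defaults to 2.5.
    """
    scale = {"very important": 1.0, "important": important_value, "not important": 5.0}
    return scale[answer]

def mean_score(answers: list[str], important_value: float = 2.5) -> float:
    """Mean score for one option across participants; lower = rated more important."""
    return mean(score(a, important_value) for a in answers)

def centered(respondent_answers: dict[str, str], important_value: float = 2.5) -> dict[str, float]:
    """Subtract a respondent's own mean, exposing relative preferences even
    when that respondent never uses the "not important" end of the scale."""
    scores = {opt: score(a, important_value) for opt, a in respondent_answers.items()}
    avg = mean(scores.values())
    return {opt: s - avg for opt, s in scores.items()}
```

For example, the answers "very important", "important", "not important", "important" give a mean of 2.75 with the 2.5 reading and 2.5 if "important" is scored as 2, which shows how much the interpretation of "important" shifts the result.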

Evidently, only potential users working in taxonomy at scientific institutes were selected for the survey.
Participants must have been told what kind of service BHL-Europe would like to create (a BHL-like portal with some additional functions).
Many participants did not know much about the data resources. Some seem to have thought that BHL-Europe has the staff to evaluate the literature and to tell users which species have been reported from which geographic units.


Evaluation of the survey in detail


1. Describe a situation in which you would come to the BHL-Europe Portal.

95 % look for information on species
The first option served as a control question and merely confirms that the selection of participants was appropriate.
65 % (1-6 times/week: 93 %) consult the original description
65 % (CSIC 100 %) are interested in geographical information (GBIF,
40-50 % (CSIC 100 %) would like to use uBio-like tools (lists of publications in which a taxonomic name is mentioned; the option "in a given time interval" was less important, 15 %)
30 % (38 % of those who selected "taxonomic names" in question 6) need local fauna/flora overviews (several participants noted that we wrote "fauna" and forgot "flora")

--> original descriptions and geographical distribution data are needed; uBio-like tools are also requested

Freetext answers:

Requested items within the scope of BHL-Europe:

Old texts are often difficult to order from libraries. Here BHL could really help!

The "standard" monographs and journals are much less of a problem to find than the 'unusual' ones. Eastern Europe and the former USSR are troublesome.
The articles which are in natural history journals rather than botanical journals are harder to obtain. Plant names that are in seedlists are problematic, and the documentation for those that are in exsiccata are very hard to get.

I'm looking for a scanned, and if possible usably OCR'd version of a specific taxonomic work. This is about the only potentially useful function I currently see BHL fulfilling. I would never use BHL to try and discover the formal taxonomic history of a taxon name (as most of your above questions imply), as so much of the content at BHL is very badly OCR'd and there is no expectation that the full relevant literature has even been scanned.

The following requirements suggest that modern literature should be digitized:

What is the conservation status of a given species?

How many publications on a particular species/group are published in a particular time interval? E.g. in 2004.

Information on the biology of the species (ecology, genetics, nature conservation, etc.). Would it be possible to distinguish papers at the metadata level?

Requested items outside the scope of BHL-Europe:

Prominent links to wider museum / herbarium collections that have been digitised for the web under other digitisation schemes (i.e. a true portal).

Publications on a given taxon in a country or set of countries.
This requires thorough analysis of the contents by experts!

Also infraspecific variation
This requires thorough analysis of the contents by experts! 


2. What sort of information would you primarily request when you come to this site?

90 % taxonomic literature
85 % scientific names
60 % bibliographies (it was unclear what this was supposed to mean in this context)
50 % artwork
35 % tables, statistical evaluations


scientists' biographies (2 x)


3. How are you going to use the items you are looking for?

For the first four options, participants simply answered as in question 2:
they look up literature (90 %), scientific names (75 %), artwork (50 %).
"Browse through the items online" was selected by 50 %.
The participants did not understand that the main point was to choose between "I know exactly what I am looking for, usually I also know exactly the page I have to consult" and "I just open the book and read it".

Second four options:

80 % (1-6 times/week: 90 %) save to hard disk (they need PDFs and accurate OCR that can be downloaded quickly)
55 % read online
40 % print on paper
30 % link to their website (we should have offered the option "I don't have such a website", and should also have included "integrate into my database or expert library")

This result seems to reflect the current situation, in which it is much quicker to download a PDF to the hard disk and read it there than to read it online in a very slow browser. It would have been important to ask the reasons for their preferences.
- PDFs can also be read offline.
- Online content can also be read when you are not using your own computer.
We should have asked the participants to explain their situation in more detail.


4. Which parts of the items you are going to use?

90 % use text (1-6 times/week: 100 %)
80 % use images
60 % use tables, table of contents, abstract, list of references
50 % use subject and person indexes


Species and other higher taxa descriptions, with strong emphasis on original descriptions. This includes text, images but only rarely other types of information.
This was the expected answer - of course all parts are needed in a book, but mainly text and images.

Would it be possible to include a geographical index as well?
This documents that the questions caused serious misunderstandings. The participant thought that BHL-Europe would create such indexes.


5. What is the search functionality you would like to have? How do you search for literature?

This did not give a clear result; I suspect that nobody knew what the given options actually meant. Many options were not self-explanatory, and several participants noted this in the freetext.
The freetext answers to this question were much more useful.

85 % Google-like search
75 % Search for all the words, Boolean operators (and, or, not)
60 % Browse catalogue, browse content/items, use simple search, choose metadata fields, choose between search options, search for exact phrases, search for any of the words
30 % Tolerant search (hare gives hair)
20 % Search without the words

A Google-like search is a search for all the words. This is also how searching an online catalogue usually works.
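To make the difference between the offered options concrete, here is a minimal sketch of "search for all the words" (the Google-like default), "any of the words", "without the words" and "exact phrase" on a single title string. It assumes simple lower-case word tokenization; a real catalogue would query an index rather than scan strings, and the function names are hypothetical.

```python
import re

def tokens(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is ignored."""
    return re.findall(r"\w+", text.lower())

def matches(doc: str, all_words=(), any_words=(), without_words=(), phrase=None) -> bool:
    """True if doc satisfies every search option that was filled in."""
    words = set(tokens(doc))
    # "Search for all the words" (Boolean AND, the Google-like default)
    if not all(w.lower() in words for w in all_words):
        return False
    # "Search for any of the words" (Boolean OR)
    if any_words and not any(w.lower() in words for w in any_words):
        return False
    # "Search without the words" (Boolean NOT)
    if any(w.lower() in words for w in without_words):
        return False
    # "Search for exact phrases": the phrase tokens must appear consecutively
    if phrase is not None:
        p, t = tokens(phrase), tokens(doc)
        if not any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1)):
            return False
    return True
```

A Google-like box would simply split the user's input and pass it as `all_words`, for example `matches("Systema naturae per regna tria naturae", all_words="systema naturae".split())`.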


But you MUST do a much better job indexing by author, year etc. what you have scanned, and providing a portal that makes this information accessible to potential users.

Despite several attempts to use BHL I have not been able (except with insider help) to even find literature that you have scanned - and once found, the quality has been unacceptable.

In general all search functions should be optimised and have a response time of 5 seconds maximum, providing a list with DIRECT links to relevant documents.

Diacritics should be matched to the closest plain Latin letter (e.g. ñ = n), as an option that can be switched on or off
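The diacritics request can be sketched with Unicode decomposition from the Python standard library: NFKD splits "ñ" into "n" plus a combining tilde, and dropping combining marks leaves the plain letter. This is one possible approach, not a description of any existing BHL function; the helper names are hypothetical, and letters that do not decompose (such as "ø") are left unchanged by this method.

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Replace accented letters by their base letter, e.g. "ñ" -> "n"."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def name_matches(query: str, candidate: str, fold: bool = True) -> bool:
    """Case-insensitive comparison; diacritic folding can be switched off."""
    if fold:
        query, candidate = fold_diacritics(query), fold_diacritics(candidate)
    return query.casefold() == candidate.casefold()
```

With folding enabled, a query for "Munoz" finds "Muñoz"; with `fold=False` the user gets only exact matches, which corresponds to the "to be selected or not" part of the request.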


6. Which are the metadata fields you would like to include in your (simple, advanced) search?

In other words: what will users type in to find something?
It would perhaps have been better to ask what users are used to. It should also have been clear to us that offering many options is most useful. We should have asked which option should be the default.

100 % author
95 %  year
90 %  title
85 %  taxonomic names
65 %  subject
50 %  object entity (title, serial, volume...)
30 %  object type (text, image)
25 %  contributor
20 %  language

--> author and year are extremely important for the search.
--> scientific literature research is largely independent of language


The more options for a search, the better (but with only one option required). Please see the RHS plant selector website for an excellent example:


7. How would you like to sort the results?

Results of which kind of request? It makes a difference whether the user searched for literature or for taxonomic names.

95 % by author
80 % by year (current OPAC model)
60 % by title alphabetical (current BHL model)
50 % by content relevance (current Gallica model)

Opinions on the importance of reversing the sort direction (a-z / z-a) were split 50:50.


By species name (2 x).

--> a scientific literature search page should sort by author first, then by year.
There are currently no online literature catalogues that sort by author. We have to discuss how such a service can be achieved. In many BHL metadata records the author does not show up or is hidden.
Journals cannot be sorted by author. In OPACs they are sorted by year and listed under the year in which the first volume of the journal appeared.
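A minimal sketch of the requested ordering, assuming each record carries an optional author and a year (for journals, the year of the first volume, following the OPAC practice described above). Records without an author, such as journals or items with missing author metadata, are placed after the authored works and ordered by year. The `Record` type and field names are hypothetical, and the journal in the usage example is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    title: str
    year: int                     # for journals: year of the first volume
    author: Optional[str] = None  # None for journals or missing metadata

def sort_key(r: Record):
    # 1) authored records first, 2) by author, 3) by year, 4) title as tie-breaker
    return (r.author is None, (r.author or "").casefold(), r.year, r.title.casefold())

def sort_results(records: list[Record]) -> list[Record]:
    return sorted(records, key=sort_key)
```

Usage: `sort_results([Record("Species plantarum", 1753, "Linnaeus"), Record("Example Journal", 1850), Record("Histoire naturelle", 1749, "Buffon"), Record("Systema naturae", 1735, "Linnaeus")])` lists Buffon first, then the two Linnaeus works by year, and the journal last. The tie-breaking rules are a design choice that would need discussion.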


8. What is the quality of digital content you need and want for display and download?
What file-types would you like to see at the BHL-Europe Portal?

File formats:

95 % PDF (70 % selected parts)
60 % OCR text (40 % all documents)
50 % JPG
40 % Excel files


70 % Low resolution for display
60 % High resolution for download
30 % Low resolution for download
15 % High resolution for display


Download of Articles - not only the whole book.

All possibilities. After looking through the content quickly, it may become important to have high resolution.

It's very important for me to have quick access to botanical descriptions, so I prefer low/average resolution. Maybe it would be useful to be able to switch between low & high resolution for the same item?

It is of utmost importance to use different scanning types. A book of 500 text pages does not have to be scanned in 1200 dpi colour scans! All text pages can easily be scanned in b/w and saved as .tiff files. This way a pdf generated from such a book can be quite small and is much more comfortable to work with!
On the other hand, colour plates as well as greytone plates SHOULD be scanned in high resolution and in colour so the information in the drawings / paintings is preserved in the best way possible, which is highly important! 

For the web presentation all pages should be small (albeit of a quality allowing good reading). This way a fast loading process for online viewing is guaranteed. Downloadable pdfs should be in high resolution but of manageable size (see comment above).

Download of results (bibliographies, etc) in RIS, BibTeX or TXT format in order to be able to import into bibliography software
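As an illustration of the RIS request, here is a minimal sketch that writes one reference in the RIS tagged format ("TY  - " opens a record, "ER  - " closes it). The input dictionary keys are hypothetical, and only a few common tags are used; reference managers differ somewhat in how they interpret individual RIS tags, so an export would need testing against the target software.

```python
def to_ris(ref: dict) -> str:
    """Serialize one reference (hypothetical dict layout) as an RIS record."""
    lines = [f"TY  - {ref.get('type', 'BOOK')}"]          # record type, e.g. BOOK or JOUR
    for author in ref.get("authors", []):
        lines.append(f"AU  - {author}")                  # one AU line per author
    for tag, key in (("PY", "year"), ("TI", "title"), ("JO", "journal"),
                     ("VL", "volume"), ("SP", "start_page"), ("EP", "end_page")):
        if ref.get(key) is not None:
            lines.append(f"{tag}  - {ref[key]}")
    lines.append("ER  - ")                               # end of record
    return "\n".join(lines) + "\n"
```

A result list would be exported by concatenating one such record per hit into a single `.ris` file.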

--> Users want to be able to choose between various file formats and file sizes. The default browser presentation should use small files that load extremely quickly. PDF downloads should work as in Gallica, with the option to select start and end pages and to choose the file size.

One option would be an intermediate default file size that users can increase. The program could also remember, for each item, the PDF file size of the last download and offer that size as the default to the next user. In this way, text-only works would quickly acquire small default file sizes, and plate volumes large ones.
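The per-item memory proposed above could look roughly like this minimal sketch. It assumes three hypothetical size presets and an in-memory dictionary keyed by a hypothetical item identifier; a production portal would persist this in its database.

```python
SIZES = ("small", "medium", "large")  # hypothetical PDF size presets

class DownloadSizeMemory:
    """Offer, per item, the PDF size chosen by the previous user."""

    def __init__(self, default: str = "medium"):
        self.default = default        # intermediate default for items never downloaded
        self._last: dict[str, str] = {}

    def suggested_size(self, item_id: str) -> str:
        return self._last.get(item_id, self.default)

    def record_download(self, item_id: str, size: str) -> None:
        if size not in SIZES:
            raise ValueError(f"unknown size: {size}")
        self._last[item_id] = size
```

Remembering only the last choice makes the default react quickly but also lets a single atypical download change it; counting choices per item and offering the most frequent one would be a more stable variant.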

OCR is of slightly lower importance; users know that OCR quality is not reliable. Taxonomists rarely rely on OCR, they work with image-based or image-supported PDFs.


9. What other additional services would you like to use in the BHL-Europe Portal?
Consider what you can perform on other similar Web sites.

80 % Stable URL of books
50 % Stable URL of individual pages
30 % Include functionalities of other providers
20 % Translations


It is important that plate numbers are named in the digitized books! For instance, in other databases the page numbers are given but the plate numbers are omitted, so one has to spend a long time until one finds the plate of interest!


10. What is the feedback functionality you would like to use?

This question would have required some calibration questions, answerable with "I agree / ... / I do not agree", to find out how many participants are actually aware of certain problems (speed, missing plate numbers). Only those participants could be expected to give useful answers.

50 % Bug reporting form
40 % General feedback form
35 % Scan quality report
30 % Metadata editing  (I doubt that all users understood the term "metadata", I assume many thought they should correct errors in original texts or OCRs)
20 % See progress of error fixing

--> bugs should be reported, metadata errors not fixed by users

We face the problem that soon millions of books without plate numbers will need plate numbers added. How can this be solved? Who is going to do the work, and who is going to pay for it?


11. How frequently would you visit the BHL-Europe Portal?

Every day
1-3 times per week
1-6 times per month
1-6 days in 6 months
less frequently

- average was once every 4 days
- 50 % of the participants would visit less than once in 1 week
- 25 % of the participants would visit less than once in 1 month
- "every day" was selected by fewer users than "once in 3 months"

--> web presentations must remain stable over long periods; most users would not consult BHL-Europe every day


12. Which way do you like to know novelties about the portal?

70 % website
50 % e-mail
40 % newsletter (1-6 times/week: 60 %)
15 % RSS feed (1-6 times/week: 7 %)


13. Any other suggestions?


Make the site faster!

In general it is very important to access the information in very few steps and to have very high computing power to provide query results very fast (maximum 5 seconds).

Efficient search functions:

Make it more comprehensive (I rarely find what I am looking for)!

Site should be simple:

Keep the web front page simple, not filled with "all the things we can offer at one sight".

Downloads of relevant books are sufficient. Don't play with complex issues!

One single website for literature:

I need just one global access point for all the literature, searching across all repositories and all available catalogues.

Import references 'automatically' into webservices like 2Collab, CiteULike, EndNote Web, Connotea, and Mendeley


14. BHL-Europe Partner acronym?

There were only 33 answers; several participants did not know the acronym, and some entered unrecognizable strings (?: EDLF, BR, EVAFU). Filtering by partner acronym was not possible when one participant wrote "CSIC" and the next "csic". A scroll box (drop-down list) would have been needed, worded so that it is also understood by scientists who do not know the administrative acronym of their institution as a BHL-Europe consortium member.
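A fixed list in the survey form is the real fix, but freetext answers that have already been collected can still be tallied case-insensitively. A minimal sketch, assuming the canonical spellings are the partner acronyms counted in this survey; anything that matches none of them is counted as "unrecognized". The helper names are hypothetical.

```python
from collections import Counter

# Canonical spellings taken from the partner counts reported in this survey.
PARTNERS = ["NBGB", "CSIC", "MNHN", "RGBE", "MfN", "NMP", "UH-Viikki",
            "RBINS", "UGOE", "NAT", "MIZPAS", "LANDOE"]
_CANONICAL = {p.casefold(): p for p in PARTNERS}

def normalize(answer: str) -> str:
    """Map "csic", " CSIC " etc. to "CSIC"; unknown strings to "unrecognized"."""
    return _CANONICAL.get(answer.strip().casefold(), "unrecognized")

def count_partners(answers) -> Counter:
    return Counter(normalize(a) for a in answers)
```

This only handles differences in case and surrounding spaces; misspellings or entries such as "EVAFU" still cannot be assigned to a partner, which is why a drop-down list is preferable.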

NBGB (7), CSIC (7), MNHN (6), RGBE (4), MfN (4), NMP (3), UH-Viikki (3), RBINS (3), UGOE (2), NAT (2), MNHN (1), MIZPAS (1), LANDOE (1)

Summary: main results of the survey

Generally: the main statements of my Leiden and Graz 2009 presentations were confirmed, particularly those on basic front page design requirements, efficiency, stable URL of books, speed and file size requirements.
The main differences and new findings were:

For the presentation of the web portal:

For the default search function of the front page:

For the presentation of the results on the front page:

For the presentation of the digitized works:

For page-level metadata:

Links to AnimalBase contributions for: Meeting Berlin May 2009; Meeting Leiden Aug 2009