Tuesday, 25 July 2017

Data protection in libraries

The Data Protection Act 1998 affects libraries and in 2011 CILIP issued Privacy Guidelines to help librarians ensure that they comply with the spirit and letter of that law. This helps them resolve "the tension between the freedom of access to information and the right of an individual to privacy concerning personal data".

As the guidelines state: "Information professionals collect data from their users in order to provide the most efficient and effective service. We must protect the personal data we collect and be very clear to our users why we collect the data, how we store it, how we process it and for how long we keep the data. Occasionally by law we may be required to give personal data to a third party; we need to ensure that this is justified."

Libraries hold information on their users' names, addresses, dates of birth, sex, how many books they have borrowed and what those books are, charges incurred, and so on. Libraries need to take decisions, usually in the form of institutional policies, which lay out how long that data can be kept and on what occasions it can be accessed or passed on to other organisations. Libraries should undertake data audits to ensure that they do not hold any more data than is necessary, and that they are complying with the law. Access to the information needs to be limited to those staff who are authorised and need access to it, and staff need to be trained to be aware of the issues involved. Data is a valuable resource and can be shared and re-used for other purposes, in medical research for instance, but this is problematic; there are government guidelines that cover data sharing.

Personal data can also be lost, for instance when memory sticks go missing, in which case the users concerned need to be informed. To avoid this, data should not be put on memory sticks, or, if it is, the memory sticks should be encrypted. Decommissioned equipment should be wiped, disaster plans should include a clause on how to handle personal information, and libraries should have policies on how to handle memory sticks found in the library.
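For what it is worth, part of the data audit described above can be automated. Here is a minimal Python sketch, assuming a purely illustrative two-year retention period and made-up record fields, that flags user records whose last activity falls outside the retention policy:

```python
from datetime import date, timedelta

# Hypothetical retention policy: records inactive for more than two
# years are flagged for review. The period and the record fields are
# illustrative; a real library would set them in institutional policy.
RETENTION = timedelta(days=2 * 365)

def records_to_purge(records, today):
    """Return records whose last activity falls outside the retention
    period and which therefore hold more data than necessary."""
    return [r for r in records if today - r["last_activity"] > RETENTION]

records = [
    {"user": "A", "last_activity": date(2017, 6, 1)},
    {"user": "B", "last_activity": date(2014, 3, 15)},
]
stale = records_to_purge(records, today=date(2017, 7, 25))
print([r["user"] for r in stale])  # only user B is outside the period
```

A real audit would of course also cover what fields are held at all, not just how old they are.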

Library users have a right to make 'subject access requests' to find out what data is being held about them, and, in exceptional circumstances, police and national security staff can make requests for information relating to users. Library staff need to be trained in how to handle such requests. The law also covers the use of CCTV, photography and filming. Other laws also relate to these areas, such as the Human Rights Act 1998, the Regulation of Investigatory Powers Act 2000 and the Terrorism Act 2006.

Fortunately the CILIP ethics committee can advise CILIP members and JISC also provides advice on these matters.

Bibliography: CILIP, User Privacy in Libraries: Guidelines for the Reflective Practitioner

Wednesday, 19 July 2017

Text and Data Mining in Medicine

Image credit: Open Knowledge P3260857, https://flic.kr/p/e87SvU
For the first time today we had a visit from a user who wanted advice on how to undertake text and data mining using already created data. The profile of clinical informatics as a discipline has been rising. It has been said that the patient is the most underused resource in medicine, and clinical informatics experts are now able to interact not only with the reports of research projects but with the data from those projects, as well as with other databases holding data from the users of health services.

The ethical and legal issues around this work are complex and important, for instance the necessity to ensure that the data is anonymised. At Cambridge University we now have an Office of Scholarly Communication to which researchers can be directed. This office advises on Open Access, data repositories and data management. They train researchers and librarians to be aware of some of these issues and signpost them to more specific services where appropriate. I was pleased to find a Text and Data Mining library guide, including some helpful videos, to which I could direct my reader. I also directed him to our hospital's Research and Development (R&D) team, and found him some useful books on our library catalogue and some articles that dealt with data mining in his own specialist healthcare area.

In Cambridge researchers use Symplectic Elements to manage their research data. There are also journals such as 'Medical Informatics' or the 'Journal of Biomedical Informatics' that deal with this area. Before a library reader can embark on a career in clinical informatics they need to understand the basic concepts that all library users need to know, so I told him about the database training that we offer and explained that he would need this to understand, for instance, how to use MeSH subject headings. We looked and discovered that there is a MeSH heading for Data Mining (https://www.ncbi.nlm.nih.gov/mesh/?term=data+mining) and one for Medical Informatics (https://www.ncbi.nlm.nih.gov/mesh/?term=medical+informatics). In Cambridge the Cambridge Big Data Strategic Research Initiative brings together researchers from across the University to address challenges presented by our access to unprecedented volumes of data, and a Data Mining symposium was held recently to address these issues.
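Incidentally, the MeSH-tagged searches a reader would run in PubMed's web interface can also be built programmatically through NCBI's E-utilities. Here is a small Python sketch that only constructs the search URL (no request is actually sent):

```python
from urllib.parse import urlencode

# NCBI's E-utilities expose PubMed searching over HTTP; the esearch
# endpoint accepts the same query syntax as the web interface,
# including MeSH qualifiers.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def mesh_search_url(term, retmax=20):
    """Build an esearch URL restricting the query to a MeSH heading."""
    params = {
        "db": "pubmed",
        "term": f"{term}[MeSH Terms]",
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{BASE}?{urlencode(params)}"

url = mesh_search_url("data mining")
print(url)
```

Fetching that URL returns a JSON list of matching PubMed IDs, which could then be passed to the efetch endpoint for full records.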

Key books on the subject seem to be The Cambridge Text Mining Handbook, and Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten, Eibe Frank and Mark A. Hall.

Monday, 17 July 2017

Flipster mobile journals app

I was interested to read an advert from EBSCO which explained their Flipster app. https://vimeo.com/220054241?utm_medium=email&utm_source=all_c_ww&utm_campaign=all_inside_flipster_20170714&utm_content=watch-now. A library can use the app to add a carousel to its website that advertises its most popular EBSCO e-journals, and readers can read them easily using a mobile phone app.

Monday, 3 July 2017

Linked Data

There has been some mention of Linked Data in the library lists that I subscribe to. Here are some links about it:

Manu Sporny's "What is Linked Data" (see below)

OCLC's "Linked Data for Libraries"

OCLC Research: Hanging together : Data Designed for Discovery

See also my earlier post: Managing Digital data

Tuesday, 27 June 2017

EAHIL Conference in Dublin

EAHIL is the European Association for Health Information and Libraries, and the EAHIL2017 Dublin Conference this year was on the theme of Diversity in Practice: Integrating, Inspiring and Innovative. My colleagues who attended the conference in Dublin have published some excellent posts about their experiences. EJB87 has written an interesting account of the issues involved in leading online training, arising from a talk given by Thomas Allen from the World Health Organisation in Switzerland. This included watching the video “Conference Call in Real Life”. Apparently the conference tool that most people use these days is WebEx.

The same colleague, who likes to remain anonymous, also attended a Continuing Education workshop on search strategy issues1. The talk discussed PRESS (Peer Review of Electronic Search Strategies) and AMSTAR (A MeaSurement Tool to Assess systematic Reviews). A similar talk, entitled 'Workshop 4. How to teach search methods for evidence based practice: horses for courses or one size fits all', was given by a member of the Evidence Synthesis team at PenCLAHRC, University of Exeter.

Several other sessions discussed the importance of gathering impact data to demonstrate the value of library services. The NHS Knowledge for Healthcare Impact Toolkit is useful when considering how to do this. I learned the new term 'knowledge mobilisation'2. My colleague LibrarianErrant (who also likes to remain anonymous) has posted about this issue, amongst other things.


1. Brettle, Alison, and Rogers, Morwenna, PRESSing search strategies and AMSTARing systematic reviews: have a go session. http://eahil2017.net/wp-content/uploads/2017/03/CEC-Course-DescriptionsV6.pdf accessed 27/06/2017

2. Treadway, Victoria [et al.], The role of an embedded librarian as knowledge mobiliser in critical care, in EAHIL2017 Diversity in Practice: Inspiring, Innovative [...] Book of Abstracts, http://eahil2017.net/wp-content/uploads/2017/06/Abstracts-ICML-EAHIL-2017.pdf accessed 27/06/2017

You can download the conference abstracts booklet here.

Libraries in Latvia

Yesterday I attended a talk by two Latvian librarians, Gunta Vaserina and Inge Batare. Gunta gave us an introduction to life in Latvia. Latvia is a European Union country but the population is relatively small, at about 2 million, and the language is Latvian. There are four main cities: Riga, Jūrmala, Daugavpils and Ventspils. She spoke a little about these and showed us photos. Daugavpils began as a castle by a river. Ventspils has miles of straight white beach, with complexes overhung with woods. There is a very beautiful national park in the Riga area, which looked rather Alpine to my eyes. Gunta explained that there are national days when everyone celebrates with dancing and song, particularly the Summer Solstice and the Mārtiņi festivals.

Inge spoke about the many universities, which include the University of Latvia, Riga Technical University and Rīga Stradiņš University, where the pair work. At their university there are 14 librarians. They showed slides of the workspaces in their library. Their staff provide 800 hours of information skills training a year and their library has over 200,000 electronic resources.

Wednesday, 21 June 2017

E learning for Healthcare

E-Learning for Healthcare is a web resource used to train NHS staff. It provides tools and resources, and Athens account holders can access hundreds of different online courses by registering on the site and enrolling for a course.
The courses are listed on the site and can be accessed here: http://www.e-lfh.org.uk/home/

There are hundreds of courses on a wide range of subjects, for instance budgeting, safe use of insulin, supporting carers, safeguarding children, pain management or leadership for clinicians.  A nurse I spoke to said that this was her preferred learning method since you could take your time and do the courses anywhere you could take your computer.

Her only concern was that her credits on that website would not synchronise with her DOT training log on the hospital intranet.

I note that our hospital, Addenbrooke's, also known as Cambridge University Hospitals, has developed its own online learning portal: http://elearning.cam-pgmc.ac.uk/

Wednesday, 14 June 2017

Managing Citations, Altmetrics and the Ultimate Research Tool

Librarian graphic from AskALibrarian
As you may remember (I see I last posted on this subject on 21st December 2016) there is a 23 Research Things MOOC available to Cambridge academics, run by Cambridge librarians. The nice thing about a MOOC is that you can do it at your own pace. So, finally, we reach the last three things. Thing 21 is Managing Citations, a subject which librarians delight in pontificating about.

When you present work it is important not only to have something valuable to say but to present it in an orderly and professional manner. Indeed, in life I have found that one can get away with not saying anything very much at all, as long as you say it in an orderly manner. Conversely, you can put forward the most original, relevant and well-founded arguments, but if you don't present them on an appropriate stage, with panache and accuracy, nobody will pay much attention, and you will suffer the fate of Patrick Matthew, who postulated his theory of natural selection in the appendix to his 1831 book on naval timber and arboriculture. Or the fate of the friend of a very brilliant friend of mine, whose first attempt to publish in a very reputable journal was delayed when peers criticised his referencing style. Fortunately he got published there in the end, though, and named me as a contributor.

Anyway, whatever and wherever you publish, it has to look good, and that involves managing your citations nicely. The best way to do this, of course, is not, as I remember my professor telling me in my youth, by using a system of filing cards, but rather by using bibliographical reference management software such as EndNote, Mendeley or, the one I use, Zotero. The 23 Research Things video concentrated on the latter, which is free, and stressed that you can use it not only for citing things, as I did, but also for storing PDFs and whole web pages, so that all the resources for a research project can be stored safely online where moths and rust cannot decay.
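For the curious, Zotero also has a web API, so a library you have filed away can be queried by scripts as well as through the desktop client. A small Python sketch that builds the request URL for listing top-level items (the user ID is a placeholder, and no request is actually sent):

```python
from urllib.parse import urlencode

# Zotero's web API (v3) exposes a user's library at
# https://api.zotero.org/users/<userID>/..., read-only without a key
# for public libraries. The user ID below is a made-up placeholder.
API_ROOT = "https://api.zotero.org"

def items_url(user_id, limit=25, item_type=None):
    """Build a URL listing the top-level items in a user's Zotero library."""
    params = {"format": "json", "limit": limit}
    if item_type:
        params["itemType"] = item_type  # e.g. restrict to journal articles
    return f"{API_ROOT}/users/{user_id}/items/top?{urlencode(params)}"

url = items_url("1234567", item_type="journalArticle")
print(url)
```

Private libraries additionally require an API key, generated from the Zotero account settings.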

The next video (22) tackled the subject of altmetrics (see my previous post on the subject), discussing how to use analytics in Twitter. I learned about TweetReach and the importance of tracking the impact of your research.

Then we came to the final video (23) on the Ultimate Research Resource, which turned out to be... the library. Libraries are said to be the poor man's university - nobody knows who said this first - but in that case what is a university library? So, next time you need help, who ya gonna call?

Wednesday, 10 May 2017

Moving over to a new Library Management System

We are currently implementing a new library management system to work with our recently installed cross-resource discovery system and I went to a talk about this on 10th January.


The purchase was approved after a long tendering and consultation process, not described here. After buying the product, the first implementation discussions were held at company executive board level, in June 2016. The company are based abroad.

The next meetings were between the University and the company Implementation team, based in Europe, who were made up of a range of specialists including data migration specialists and product specialists.

Communication channels will be set up, including the use of a Basecamp online project management site, with web sessions and project management timelines covering all tasks involved, who does what, when, key deadlines, etc.

They discussed how multiple databases will be merged into one, as well as the practicalities of the migration. The server for the new library management system (LMS) database is in the Netherlands.

In Cambridge there is also a data preparation team who will clean up the data for export by running reports to identify errors. Some work on the records can be done within the old library management system using the Batch Cat functionality. This then gets re-loaded but the system can only handle so many index keys at once so the records have to be done in batches.

A test database is being loaded to run parallel to the final database. Records are uploaded to the test database for testing. The first data test load to be uploaded will be dummy records provided by the company. This will allow for testing of the functionality of the system.

The next records uploaded, which will be in March, will be the first version of the data. This data will have to be prepared for a few days prior to that, so all work on cleaning up the data will have to finish by that date.

Once the records are loaded at each stage, more data review work will be done and any further necessary changes made. Then the same records will be loaded again, to ensure that any errors identified have been eliminated.

The University is currently in the process of recruiting a trainer, who will then recruit a team of trainers. They will design a training process and material. Much training material is already available on the company’s Knowledge Centre. Library staff will be invited for training by these trainers, but they will be given material to watch and read beforehand as a pre-requisite to attending the courses.

Much of the training will be modular, so there will be different training for each of the Acquisitions, Resource Management (the new name for cataloguing), Systems Administration, Circulation, 'Fulfillment' (American spelling with two 'l's; systems users, locations, groups and policies) and Analytics modules. Not everyone will attend all modules, only those deemed appropriate for them.

A University team will prepare to integrate the new cross-resource discovery system into the University's existing processes, for instance working on the University Library website, liaising with the University authentication software team, on cloud printing functionality for the LMS outputs, and with the team that administers CUFS payments. The recently installed resource discovery system user interface was designed to integrate seamlessly with Alma, but it is currently bolted on to the six databases in the previous LMS and needs to be prepared for the switch-over, so there will be more system configuration work to do than would have been necessary had a discovery system and its underlying database been implemented at once.

In order to prepare for the migration of databases, detailed forms describing the databases have to be completed using Excel. Since copy-and-paste in Excel would affect the formulas, each of the fields has to be filled in by hand and then proofread. The existing six databases include data from about seventy libraries, so filling in data about these libraries takes many weeks - one form took four weeks to complete. Once the forms are done they have to be checked and reworked in an iterative process.

The data from the library databases will be transferred online to the server based in mainland Europe using a highly secure 'tunnelling protocol' through the program Port Tunnel.

Under the new system each patron, who currently could have numerous records in many libraries, should have a unique record that is valid across the whole system. This involves some patron de-duplication work.
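As a toy illustration of what that de-duplication involves, here is a Python sketch that collapses per-library patron records into single system-wide records. The matching key (normalised name plus date of birth) and the fields are my own invention for demonstration; a real migration would match on a more robust identifier such as a university card number:

```python
# Collapse multiple per-library patron records into one record per
# person, keeping track of which libraries each patron belongs to.

def dedupe_patrons(records):
    merged = {}
    for rec in records:
        # Normalise the name so trivial variations still match.
        key = (rec["name"].strip().lower(), rec["dob"])
        if key not in merged:
            merged[key] = {**rec, "libraries": [rec["library"]]}
        else:
            merged[key]["libraries"].append(rec["library"])
    return list(merged.values())

patrons = [
    {"name": "Ada Smith", "dob": "1990-01-01", "library": "Medical"},
    {"name": "ada smith ", "dob": "1990-01-01", "library": "Main"},
    {"name": "Bob Jones", "dob": "1985-05-05", "library": "Main"},
]
unique = dedupe_patrons(patrons)
print(len(unique))  # two distinct patrons
```

The awkward cases, as ever, are the near-matches that a script cannot safely merge and a human has to review.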

All work on the old system will freeze on 17th July, which will be deemed the fiscal rollover date for the year. All reports will be run by that day and the Voyager clients will no longer be available for cataloguing, acquisitions work or serials check-in. There will be no patron functionality on the OPAC, so patrons will not be able to manage their loans or to reserve or hold books. Circulation will still be available, however, through the offline circulation client of the new LMS.

During this week the final version of the data will be loaded onto the new database, and final health checks run. Towards the end of that week the final implementation of the discovery platform will take place, when it is linked to the new database, and then the product will go live.

The new database will contain statistics on how many times a book has been borrowed, or how many times a reader has borrowed books, but the transaction history will not be carried across.

The work of the Reader Services Working Group

In order for so many libraries to share one system there will have to be some changes to processes. It was appreciated that many processes had been policy-led, but nevertheless in future library managers will have to select their library system policies from a restricted palette of choices.

The group would discuss how the implementation would affect current processes and give recommendations as to how processes could be tweaked to fit the functionality. As the single patron records would be visible to and shared by all libraries, policies would need to be drawn up as to who could edit which fields in the patron record, what sort of notes, if any, could be added to records, what text would appear on circulation notices sent out, etc.

It was already known that a single record expiry date would apply across the system, and that there was functionality for a block to be put on a record across the system if, for example, anyone's debts amounted to over £1,000 or some other very high figure.

Procedures would need to be drawn up to deal with patrons with disabilities, for instance those using proxy patrons.

Practices could become more standardised. There might, for instance be less scope for flexibility and for staff to override the system in the case of individual circumstances than there is now.

Renewal allowances in the new system do not operate according to the number of times a record has been renewed but according to how much time has passed since the date of borrowing. There is a self-issue client available for the current system, but there will not be one for the new system unless a self-issue machine is purchased from the company. The user group were negotiating with the company as to whether some workaround could be effected for users who currently use this client with a self-issue machine provided by a different company, or sometimes just with a scanner attached to a computer on a desk.

Policies were being discussed with a catalogue advisory group.

Work on the data migration

8,263,662 records were checked and 2,771,178 of these were amended.

The Batch Cat program in the existing LMS had to be run to make some changes to these. Before this was done, a program was run to identify records with problematic Library of Congress number fields; apparently some cataloguers mistake the significant space for an error and delete it. Potentially problematic records had to be checked in person. Once these records were corrected, all the records were run through the Batch Cat program to make bulk changes across the system, in which 1.5 million records were changed.
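To give a flavour of the sort of check involved, here is an illustrative Python sketch that flags 010 (Library of Congress Control Number) values whose significant padding spaces appear to have been deleted. The twelve-character rule below is my own simplification for demonstration, not the actual validation logic used in the migration:

```python
import re

# In MARC field 010 $a the LCCN is traditionally padded with
# significant spaces (e.g. "   85153773 " is twelve characters).
# Cataloguers sometimes delete that padding, thinking it is an error.

def suspect_lccns(records):
    """Return the IDs of records whose 010 $a does not match the
    assumed padded layout (twelve characters, optional alphabetic
    prefix, digits, optional trailing spaces)."""
    bad = []
    for rec_id, f010 in records:
        if len(f010) != 12 or not re.fullmatch(r"[ a-z]*\d+ *", f010):
            bad.append(rec_id)
    return bad

sample = [
    ("rec1", "   85153773 "),  # correctly padded
    ("rec2", "85153773"),      # padding spaces deleted
]
print(suspect_lccns(sample))  # flags rec2
```

Anything flagged would then go on the list for a person to check, as described above.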

Once the changes were made, an email was sent to each library involved listing the changes that had happened to their records and giving them access to a web program which displayed a simple online view of the changed records. Paul suggested that we might do randomised spot checks on these records, but also, using our knowledge of the collections we worked with, we could select specific items to test if we identified that there might be duplication problems with those records. He said you could spot records that might be problematic because, if you sorted the records by their Connect number, you could see instances where multiple items had been de-duplicated to the same new bibliographic record.

Mistaken de-duplication usually resulted from a cataloguing error. In some instances records had to be changed manually, usually by editing the 245 field, and some libraries were sent long lists of such problem records and asked to edit them; again, a web display was available for them to check how the records were displaying.

To do list: 
At the end of the meeting a summary was given of the preparations that libraries would have to make, particularly:
  • They must allocate staff time for training, including preparing for meetings beforehand by watching prerequisite videos. 
  • They must watch out for the data preparation emails and undertake the suggested data checks. 
  • They must warn users in good time of the coming changes, including the systems freeze. 
  • They must ensure that books do not fall due during the week of the systems freeze. 
  • They must prepare to deploy staff on other activities, or encourage staff to take leave, during the week of the systems freeze. 

We asked who could run reports on the new system. It should be explained that at present there are a limited number of tailor-made reports available, which are not part of the purchased system but were written in Pico as a user workaround by a canny librarian on the University Library staff, with a user interface on a University Library web page available to restricted staff through University authentication. These reports were originally devised as a workaround because librarians could not run reports on their own stock or their own patrons through the existing system, as many of us were used to doing in other systems such as Unicorn.

Unfortunately it sounds as if most librarians will still not be able to create their own reports but will instead have access to another set of tailor-made reports designed as and when the implementation team have time, after the implementation.

This means that if a librarian wants a tailor-made report they will not be able to design one for themselves but will have to depend on asking another busy systems librarian for help, which the systems librarian might or might not consider a very high priority.

This means in effect that cataloguers or library managers who want to identify rogue records, to do work on a particular section of the stock, to get a shelf-list, or generally to get an overview of the quality of records in their library, will not be able to do so for some time, which is disappointing.

Library widget

Another question asked whether the library widget would still work. We were told that it would not (presumably a new one might be created if deemed appropriate).

When we returned to the library there was some discussion about self-issue machines. Apparently all self-issue machines are SIP2-compliant, but in the opinion of one member of staff the practice in some libraries of just using a computer with a scanner on a desk causes lots of problems, because users do not seem capable of putting a book in the correct place for the desensitiser to work. The member of staff said she spent all day attending to alarms that went off when readers tried to leave with their still-sensitised books.

The type of self-issue machine used in public libraries, where you just place your book on a plate, is much more idiot-proof, but requires RFID tags to be fitted to all the books in the library.

Wednesday, 3 May 2017


A reader has requested an article cited on ResearchGate, which prompted me to read this interesting article on the subject on Wikipedia.

Chain indexing

This article on chain indexing, a concept invented by Dr S. R. Ranganathan, made me think. Since both classmarks and subject headings relate to subject, it is silly to duplicate the work, and ideally it would be good if libraries could find a way to do both jobs at once.

Data privacy

Interesting article on libraries and data privacy here.

Tuesday, 18 April 2017


MarcEdit is free software developed primarily for the library cataloging community to lower technical barriers and empower librarians to take more control over their organization’s metadata. Created in 1999 by Terry Reese as a global editing utility for MARC data, MarcEdit has since evolved into a full suite of tools to create, harvest and manipulate a variety of non-MARC metadata as well.

Terry runs courses on how and why to use this software, and his courses have been advertised lately on library lists, which is how I got to hear about them. These courses are useful for anyone manipulating information in databases at institutional level.

Friday, 31 March 2017

Making an infographic

Venngage is a program that you can try for free to create nice infographics.
A poster made with venngage

You can see the full version of this infographic online. If you want to download your creation you can upgrade, which costs $19 per month for a personal account or $49 per month for a business account.

A similar program is piktochart.

Wednesday, 15 March 2017