Friday, 30 March 2012

Short-cited IV: bibliometric comparisons

As part of my on-going attempt to get to grips with citation data, I attended a short lunchtime session on bibliometrics at the library today. Unfortunately, I did not really learn much more than I had from exploring my own ResearcherID and Google Scholar pages, which I think says more about the murkiness of bibliometrics than about my own thoroughness.

I had hoped to learn a bit more about Scopus, as this is apparently the bibliometrics provider that will be used as part of REF but, due to its expense, we do not have access at Southampton. (This raises the question of how useful Scopus really is - if the average person cannot access it, what's the point?)

The introduction reiterated some key points that are worth repeating. The first is that no one source is (currently) ideal and so it can be sensible to use more than one. Moreover, the biases in Web of Science and Google Scholar are very discipline-specific. The Computer Science REF panel, for example, apparently uses Google Scholar instead of Scopus, as this better reflects their important citations. (Conference proceedings are much more important in Computer Science than in biology, which is actually a bit of a barrier for computational biology bringing together computer scientists and bioinformaticians.)

Beyond the citation data sources, there are also differences in citation profiles for different disciplines, depending on how active a field is (crudely, more researchers = more citations) and how rapidly ideas are taken up and/or can be applied (Adler R et al (2008) Citation Statistics).

This is clearly true for both the age of papers cited (the citation curve, top) and the number of publications cited. There is an issue within an issue here, as exemplified by the bottom chart: how do you define a field? When does biological science become life science? This is a particular issue for me, as bioinformatics often falls down the cracks: neither computer science nor experimental biology are appropriate comparators.

The upshot of this is that you should only ever compare like-with-like, i.e. the same metric using the same dataset and the same subject area. With this in mind, it does not really matter too much what you use; although the absolute numbers differ, the relative rankings of authors have been shown to be fairly consistent between resources. (I need to dig out the source for this!) This is, of course, only true if it is the citation stats that you are after. Personally, I link to my citation pages predominantly as a way to look at the citations, so I want completeness. I'm still not convinced that bibliometric scores are really useful for any kind of comparison beyond a bit of ego massaging, which brings me onto the final point of the introduction that is worth repeating: don't use metrics alone to make decisions. Any decisions. Ever!

The session did not go into the different citation metrics themselves but did highlight a few resources for calculating different statistics, although only using Google Scholar (with all its pitfalls) as a source. In addition to Publish or Perish, which I mentioned before, they drew attention to QuadSearch, the Mozilla Firefox Google Scholar add-in, and Scholarometer. As with the fancy Microsoft Academic Research stats and comparisons, however, "garbage in, garbage out". In particular, I am not sure that any of these tools give you the option to edit your Google Scholar data by adding or subtracting citations. (The cited work included, yes, but not the citations.) Publish or Perish certainly doesn't seem to and this is the one that the presenter thought was the best.

My take-home message: if there is a genuine desire by governments, league tables and employers to make use of citation data, they need to invest in making robust and accurate tools for collecting these data and policing them. (If I were disingenuous, I could easily "claim" some other R Edwards publications in my ResearcherID, Google or Microsoft profile and I suspect that not many people would notice.) Until that time, I am not sure that there is much we can do apart from learn how to best present our own citations on the one hand, while resisting their use on the other. (Never use metrics alone to make decisions!)

Wednesday, 21 March 2012

Short-cited III: affirmative action

In a previous post about citation metrics, I criticised Web of Science for under-estimating citations due to (a) only including stuff that's indexed in ISI, and (b) mistakes in their database that cause citations to be lost.

Although I stand by those criticisms - and I am not the first to make them - it is worth pointing out that something can be done about the second one, at least. In another previous post, I mentioned a paper about Bobtail squid that had cited one of my papers on E. hux proteomics. This was not listed as a citation in ISI due to an error in the reference list for that paper. Fortunately, Thomson Reuters Technical Support allows you to report errors and, I can happily report in this case, they will fix them.

If your Google Scholar count for a given paper is higher than for Web of Science, it might therefore be worth having a quick look at the difference and checking the reference lists of those missing papers.

Monday, 19 March 2012

Another River Cottage veg success

Tonight, my lovely wife made another recipe from Hugh Fearnley-Whittingstall's Veg Every Day cookbook. Two recipes, in fact: Coconut Leeks and Carrot, Cashew and Orange Salad. Happily, the recipes for both are available on the Channel 4 website (just click the links) as the series accompanying the book (or is it vice versa?) is currently showing.

Both recipes are delicious and feel very wholesome as well as being very tasty. The coconut leek dish is hearty and, with rice, makes a perfect main dish for dinner. Carrot and cumin is an excellent combination and with the orange too, this is a very fresh and light salad that I am looking forward to having again. (Leftovers for lunch, yay!) With the weather brightening up, the start of BBQ season will be upon us before we know it and this would make an excellent side salad. Bring on the sun!

Sunday, 18 March 2012

The Drift Inn, New Forest

Today, we met my parents for lunch at the Drift Inn, a New Forest pub attached to the Beaulieu Hotel. We first visited this pub in January, when my brother was over from Ireland for a conference in the hotel and we met him for a pint and some dinner. The pub was cosy and the food was really tasty and we made a note to return. The atmosphere in the pub was really nice, with locals and their dogs as well as city folk like ourselves. As they say on their website:
Local ales, good food, cosy fires and friendly staff... Wellies, walking boots and dogs welcome.
I'd certainly agree with all of the above. The menu is a mixture of classics (burger, lasagne, fish & chips etc.) and some more interesting dishes (bubble and squeak burger). They also have daily specials, which always include a pie and a steak of some sort. That first night, tempted as I always am by a good burger, I had game pie. It was delicious and boded very well for the other menu items. My starter - bubble and squeak topped with an egg and mushroom sauce - was also good.

A couple of weeks ago, we went back, this time for Saturday lunch. Tempted as I was again by the burger, I was in a bit of a fish mood and went for the beer-battered fish and chips. It was every bit as good as my memory of the pie and the chips were fantastic - crispy on the outside and fluffy in the middle; they must be at least twice-cooked. The mushy peas were a little disappointing, being mushed actual peas, rather than the usual processed kind. They were still nice, just not what I expected (or fancied).

Two successes out of two meant that it was the natural choice for an (early) Mother's Day/Birthday/St Patrick's Day lunch with the folks. Today, I succumbed and went for the burger. (Although the Beef & Guinness pie was a close second choice.) Although not quite up to the standard of the Dandy Lion in Bradford-on-Avon, the burger was really good; juicy and meaty and full of flavour. Mmmmm. The chips were fantastic again and definitely get my seal of approval. The allure of the starters was too strong and my wife and I (unnecessarily) shared some smoked mackerel with celeriac remoulade. Good as it was, next time I think I'll save room for a chocolate brownie! (Although the potato skins my parents shared for a starter also looked tempting.)

Stuffed to the gills, we skipped dessert and went for a bit of a stroll after lunch before heading back home for the England-Ireland rugby match. A future pub walk beckons, I think.

Saturday, 17 March 2012

Happy Paddy's Day!

Or, Happy St Patrick's Day, if you are feeling more formal. Maybe even "St Pat's". But not Happy "St Patty's Day". Never "St Patty's Day"!! (No, I'm not Irish but having lived in Dublin for six years, I am pretty sure about this.) He's not a burger, he was a 5th Century Amazing Maurice (or, more accurately, Keith) of snakes. (You might need to be a Pratchett fan with a cursory knowledge of post-glaciation species distributions to get that one.)

That is almost my last word on the matter, other than to share this rather good and geeky St Patrick's Day YouTube song that is doing the rounds, combining two of my favourite things: beer and science. I like the fact that the singer looks like he's got in the spirit and had a jar or two before the performance. My only real criticism... it's not the black stuff in the glass.

SPOILER ALERT! Here are the lyrics:
In the year of our lord eighteen hundred and eleven
On March the seventeenth day
I will raise up a beer and I'll raise up a cheer
For Saccharomyces cerevisiae
Here's to brewers yeast, that humblest of all beast
Producing carbon gas reducing acetaldehyde
But my friends that isn't all -- it makes ethyl alcohol
That is what the yeast excretes and that's what we imbibe

Anaerobic respiration*
Also known as fermentation
NADH oxidation
Give me a beer


My intestinal wall absorbs that ethanol
And soon it passes through my blood-brain barrier
There's a girl in the next seat who I didn't think that sweet
But after a few drinks I want to marry her
I guess it's not surprising, my dopamine is rising
And my glutamate receptors are all shot
I'd surely be bemoaning all the extra serotonin
But my judgement is impaired and my confidence is not

Allosteric modulation
No Long Term Potentiation
Hastens my inebriation
Give me a beer


When ethanol is in me, some shows up in my kidneys
And inhibits vasopressin by degrees
A decrease in aquaporins hinders water re-absorption
And pretty soon I really have to pee
Well my liver breaks it down so my body can rebound
But my store of glycogen is soon depleted
And tomorrow when I'm sober I will also be hungover
Cause I flushed electrolytes that my nerves and muscles needed

Diuretic activation
Urination urination
Urination dehydration
Give me a beer


I also love the little disclaimer added by the author, cadamole, marked with the asterisk:
*Actually, this isn't true. While both anaerobic respiration and fermentation occur without the use of oxygen, anaerobic respiration utilizes the electron transport chain to generate ATP, while fermentation does not. My bad. I would have remembered that if I wasn't trying so hard to rhyme. A new corrected version is now up on my channel:
Dedication to accuracy such as this should really be rewarded, so go on... click on the link and give this guy some more views/likes. It's what St Patrick would have wanted.

Mmmm.... Madeira

Tonight's wine tasting was a Madeira wine tasting. Madeiras come in a variety of styles but the most famous (I think) and best are very raisiny fortified wines. They range in sweetness but the sweetest "Malmsey" style is the nicest, in my book.

We had some interesting wines (Madeira table wine = bad!), including a 40 year old Verdelho from Blandy's. This was pretty nice, I must admit, and at £133 a pop, this was not a wine I was likely to drink any other time (or again, for that matter).

The standout wine of the night for me, however, was a delicious 1994 Cossart Gordon Malmsey Colheita, paired with a slice of Madeira cake - not the English sponge cake but, rather, the traditional honey cake. (Recipe here, where I also half-inched the picture, below.)

Madeira wine has a pretty interesting history. I'm certainly not going to repeat all of it here (that's what Wikipedia is for) but I like the fact that Madeira, like so many great discoveries, was an accident. The Madeirans were shipping their wine abroad for sale to far-flung destinations, such as India, but they failed to sell it all - if it's like the rosé table wine we had tonight, I'm not surprised. The unsold wine was shipped back to Madeira on the same ships but, by the time it got home, it had changed. For the better. Gone was the pale, bland (I wonder if this was the origin of "Blandy's") wine and in its place was something dark and raisiny and delicious. For a while, they deliberately shipped it long distances to make the transformation before finally realising that it was the heat of passing through the tropics that was responsible. Now, they just stick the barrels up near the winery roof for a while, where the wine can get the required heat without the cost.

A favoured wine of Napoleon and Churchill, among others, it's an interesting wine with an interesting history and well worth a glass or two - especially if you have some tasty Madeira honey cake to go with it!

Friday, 16 March 2012

Python ValueError: bad marshal data

I have been programming for many years but consider myself to be somewhat of an "empirical programmer", i.e. I am almost entirely self-taught. As a result, I sometimes come across new and exciting error messages that I have never encountered before and do not understand. I have just had one such error:
ValueError: bad marshal data
This was associated with an import command for several modules.

I still don't know what bad marshal data is (it sounds like it should be something to do with Wild West movies) but, fortunately, I have found an easy fix: just delete all the compiled *.pyc files. Missing ones are remade when you run your Python code anyway. Problem solved without any need to delve into the murky underworld of bad marshals.
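For what it's worth, "marshal" is the serialisation format Python uses for the bytecode inside those compiled *.pyc files, so the error usually just means a stale or corrupted *.pyc. A minimal sketch of the clean-up (the function name and directory layout are my own, for illustration):

```python
import os

def remove_pyc(root="."):
    """Walk a directory tree and delete all compiled *.pyc files.

    Python regenerates them automatically the next time the
    corresponding modules are imported, so this is safe to do.
    Returns the list of deleted file paths.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyc"):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return removed
```

Running `remove_pyc()` from the top of the offending project achieves the same thing as deleting the files by hand.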

Tuesday, 13 March 2012

The Grant Museum of Zoology

Today I was in London for a meeting at the Nuffield Foundation on Bedford Square. After the meeting, I had a bit of time to kill before the train home, so I paid a little visit to The Grant Museum of Zoology, which is part of University College London.

This place is great and well worth a visit if you are in the area and like Natural History. It's only open 1-5pm on weekdays but it's pretty small, so you don't need much time to look around. The essence of the museum is quite "old school" and it reminded me quite a lot of the Natural History Museum in Dublin a.k.a. "The Dead Zoo", with lots of skeletons and stuffed animals.

One of my favourite exhibits is the "jar of moles", which is quite literally a jar of moles! It was accompanied by jars of other animals, such as lizards. As the signs explained, these collections of the same animal were compiled for teaching purposes, not just because someone was addicted to collecting moles. The cabinet next to the jars of animals had a bunch of different brains in jars along with possibly my favourite sign in the museum. It was next to a brain coral and read quite simply: "Brain coral. This is not a brain."

As well as educating about the animals themselves, with lots of skeletons (such as this mole, to keep the theme going), the Grant is educational about museums themselves; there are lessons on taxidermy and critter display mixed in with the collections and some history about some of the collectors. (They have some of Thomas Huxley's specimens.) It's also an interesting mix of old and new - sitting in front of glass cabinets full of preserved specimens were a bunch of iPads as part of their "QRator" project, with interactive questions and challenges.

All in all, a good way to kill half an hour or so in the Euston area.

Monday, 12 March 2012

Short-cited II: Microsoft Academic Research

Since my last post, my Microsoft Academic Research page has been updated in light of my editing. It seems that my initial views of over-inflation were wrong, though. Whereas Google has my h-index at +2 versus Web of Science, Microsoft has me at -2. This seems to be, at least in part, due to a lack of recent publications. My 2012 papers are missing, for example, which makes me wonder how many other papers (and therefore citations) are missing.

The site does seem to have some nice features but, because the underlying data seems to be unreliable, I'm not convinced that they're (currently) that useful. The co-authors listing, for example, has one of my regular collaborators Norman Davey twice, presumably because he moved institutions. Unfortunately, the Citation Graph does not work on an iPad, so I cannot tell whether it makes up for any of the mess.

Similarly, the organisation comparison function seems interesting but does it really mean anything if you cannot trust it? I guess that kind of sums up bibliometrics all round, though.

Monday, 5 March 2012

A short-cited look at bibliometrics with Web of Knowledge

A few things have got me thinking more about citations recently. First, the dreaded "Research Excellence Framework" is looming, for which each academic in the UK has to submit their four "best" papers. "Best", of course, is a rather subjective issue. Although journal Impact Factors and citation data are not meant to be taken into account for REF, the number of times that a paper has been cited is one indicator that can help determine how your papers are received. The second thing was that I was recently looking through a bunch of CVs as part of a recruitment process and the importance of maximising your perceived impact was clear.

I've had one eye on these issues for a while, so I have had both my ResearcherID and Google Scholar publication metrics linked from my website (and this blog) but never really thought too hard about either. My assumption was that the ResearcherID metrics, provided through Thomson Reuters Web of Knowledge, would be the better metric provider, as it's an "official" supplier of citations and is manually curated. Google Scholar, on the other hand, is more automated and has a tendency to over-inflate citations by including stuff that might be weeded out by more careful citation monitors.

Looking into things a bit more, though, the ResearcherID metrics do not appear to be as trustworthy as I had assumed. The problem with Web of Knowledge is that (a) they only include stuff that's indexed in ISI, and (b) for a manually curated citation index there seem to be a lot of mistakes that cause citations to be lost. As a result, although libraries seem to prefer Web of Knowledge, the citation metric calculator "Publish or Perish" uses Google Scholar. The publications of mine that I've looked at certainly back this up. One, for example, has ZERO citations on Web of Knowledge but Google Scholar lists three perfectly acceptable (in my mind) peer-reviewed citations.

It's not all positive for Google Scholar, though, as the criticisms levelled at it are also valid. Although Google are the kings of automated searches and returning relevant data, they are not flawless and I have noticed the odd duplication here and there. I have not checked yet (as the numbers I've checked are bigger) but I would not be surprised if there were also some citations missing; I know this has been an issue for some colleagues. The other problem is that, unlike ISI, there is little or no filtering of the types of citations returned. Going to the other end of the spectrum, and looking at my most highly cited paper, Google Scholar had added 15 citations, including the PDF manual of one of my software packages. (There's a lesson in citation-inflation there, I think!)

This issue of over- and under-reporting of citations is not new and has been reported but I had never realised the extent of the under-reporting before. Furthermore, a couple of the other extras returned by Google are less cut-and-dried with respect to their "inflation" status. One was a doctoral thesis, which is a genuine peer-reviewed publication and I would consider a real citation. Indeed, including theses could be one of the biggest assets of Google Scholar, for it is normally hard (or impossible) to find out about theses that cite your work unless it is followed up by a paper. As a result, not only is the citation a useful discovery but potentially the thesis itself. Another two were foreign language (i.e. not English) publications, which again might well be perfectly valid. (Not speaking Polish, I cannot tell in this instance.)

Then, there is Microsoft Academic Research, although this seems to inflate things even more as it includes extra publications belonging to other people - at the moment, anyway. I've created a LiveID account and done a bit of cleaning up of my publication list, so it will be interesting to see what it says after that. (I now have a couple of publications missing but I am not sure which ones.)

So, which to use?! Currently, my feeling is that none of them are perfect. For me, ResearcherID is a definite underestimate but, at the same time, the extra 96(!) citations from Google Scholar are not all valid. It makes a difference - my h-index goes up by 2 with Google versus ISI - but it would be just as bad to be perceived as inflating my citation metrics as it would be to under-sell myself. The only real solution at the moment is to provide both metrics (and maybe Microsoft too, if that is different again) and keep an eye out for one that allows editing of both publications and citations. (I don't think any of them currently have this function.) In the long run, though, I have a horrible feeling that I'll have to compile the genuine citations from the different sources myself. If nothing else, it will settle the question of which is best - for me, at least.
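For anyone wondering why an extra handful of citations shifts the h-index at all: the h-index is just the largest h such that h of your papers have at least h citations each, so a few extra citations on mid-ranking papers can nudge it up. A minimal sketch (the citation counts below are made up purely for illustration):

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    # Walk down the ranked list; while the paper at rank r still has
    # at least r citations, the h-index is at least r.
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical counts: four papers with >= 4 citations each, so h = 4.
print(h_index([10, 8, 5, 4, 3, 0]))  # prints 4
```

This also makes the like-with-like point concrete: feed the same formula the Web of Knowledge counts and the Google Scholar counts for the same author and you can get different answers, because the inputs differ, not the metric.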

Saturday, 3 March 2012