Finding and managing literature on EA topics

Hi! I'm Kat Steiner - a few of you may have met me at EA Global London recently. I'm a librarian at the University of Oxford, so I spend a lot of time working with people on how to find literature (books, journal articles, reports) in their chosen area, how to organise it, and how to reference it correctly. After the conference, I realised that these are useful skills for the EA community to have as well, and I'm more than willing to teach them! I usually take several hours in a practical class doing this so this is my attempt to distil just the fundamentals into a relatively readable blog post.  

Disclaimer: I am not an academic, and EA covers a broad range of topics. I'm sure many of you will have favourite sources which I will inevitably fail to mention. None of this is going to be a comprehensive list! So please share any databases and websites that you couldn't do without in the comments for everyone to see, and I hope that some of the sources I do include are new to you and worth checking out.

Second disclaimer: I am based in Oxford, so I am more familiar with the databases that Oxford subscribes to. If you are affiliated with another university, it is worth seeing if your library has guides (often called LibGuides) on the databases you have access to.


The TL;DR:

1) What are you going to search for?

This involves breaking your question into concepts, thinking about the relative importance of each one, and coming up with synonyms, related concepts, broader and narrower terms, alternate spellings, that sort of thing. It feels like unnecessary work but it will save you time in the long run, and stop you missing important content.

2) Where are you going to search for it?

Think about how much time you have - it's usually worth using at least two databases suited to your subject, like arXiv, PubMed, Web of Science or Scopus.

You can also leverage Google's search algorithm to search within domains like .gov or .gov.uk, for useful PDF reports.

3) How are you going to search for it?

How does each website work? Is there an advanced search you can use for the words you came up with in 1? Can you filter by useful things like date or language? Do articles come with keywords, tags, or a thesaurus of useful terms?

4) Managing your PDFs, citations and referencing

Reference management software can save you a lot of time if you're writing long-form academic work or for publication. It will do all the pesky formatting of your references to a particular style, keep track of what you cite in your work as you go along, and even help you manage your PDFs. But it won't read the stuff for you, and you will have to do some data cleanup.

5) How libraries and librarians can help you!

Even if you're not part of a university, a local academic library may still be of use. You can often pay a small fee to be able to go there and use their electronic subscriptions (for non-commercial use), which means better databases and access to a lot more journal articles.

Librarians can also help, either in person, or by writing LibGuides - try searching Google for some of those on your area of interest.

6) An attempt at a not-at-all comprehensive list of sources of literature

What it says on the tin.

OK, let's get down to the details.


What are you going to search for?

We're going to take a concrete example. You have a vague idea what you want to know about, which is 'the effectiveness of deworming interventions in Africa'. You might go to PubMed Central to search for that as it's medicine-related, but if you just put it in the simple search you get 459 results, way too many to read! And they might not even contain all of the relevant literature - what if the article was about Kenya but didn't mention Africa?

It's important to think about your terms - what synonyms might there be? What broader or narrower terms could you use? Do you get too many results or too few? How important are your concepts - do you really want articles that mention deworming in the full text but not the abstract, or the title? Are there alternative spellings for some of the concepts? Globalisation vs. globalization, labour vs labor are two common ones. Also acronyms are important - consider spelling them out as well as using the abbreviation (QALY, DALY, RCT).

Mind-mapping is great for this sort of work. For the example above, I came up with:

I've split the question into 3 concepts and tried to think of related terms for each one. Then I've grouped them together and thought about how I would logically search for them with OR and AND (these are called Boolean operators). Some databases also allow NOT, but you have to be careful in using it because you can lose relevant papers just because they mention an irrelevant word in passing.
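The grouping described above can be sketched in a few lines of Python. This is only an illustration of the logic (OR within a concept, AND between concepts), not something any database requires; the function names are made up for the example, and the terms are the ones used for the deworming question later in this post.

```python
def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def or_group(terms: list[str]) -> str:
    """Join the synonyms for one concept with OR, inside brackets."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query(concepts: list[list[str]]) -> str:
    """Join the concept groups with AND."""
    return " AND ".join(or_group(group) for group in concepts)


# One inner list per concept: deworming, Africa, effectiveness.
concepts = [
    ["deworm*", "de-worm*", "intestinal worm*", "soil-transmitted helminth*"],
    ["Africa", "African", "Africans"],
    ["effective", "effectiveness", "cost-effective*", "cost effective*", "cost-benefi*"],
]

query = build_query(concepts)
print(query)
```

Keeping the concepts in a list like this makes it easy to add a synonym later and rebuild the whole search string, rather than editing a long query by hand.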


Where are you going to search for it?

We're all pretty used to searching Google and perhaps Google Scholar - you plug in a few words and you get millions of results. You scroll down the first half a page, click a few links, and you're away. Fantastic. But what about all the stuff you're missing because it's on the 5th page, or the 50th?

Google will always try to give you as many results as it can, in what it considers the most 'relevant' order. But when you're searching for dry academic literature, this can work against you, and you run the risk of not finding important things, as well as wasting time reading poorly-researched stuff.

Google Scholar is a little different - it's trying to find scholarly articles and citations which match your search. But it will still include a lot of stuff that's completely unrelated to your topic if you're not careful, and it doesn't vet its contents, nor does it contain everything ever written. And if you want to search really specifically, its attempts to be clever can work against you.

Subject-specific databases are very different from Google in that they don't do as much of the searching work for you. They won't search for synonyms or related terms, so you need to think about those first - luckily we did that in the previous section. But they are more likely to give you actually relevant results instead of a lot of noise. They are also curated by humans (or in the case of Semantic Scholar, machine-learning techniques), which means that someone has tried to work out what an article is all about, even if it doesn't give many clues in the title. Some databases tag articles with subjects or keywords drawn from a controlled vocabulary to help you, and some like Web of Science will tell you where an article has been cited later (although never comprehensively).

See the end of this article for a list of some databases you might want to try searching in, as well as good sources of data and reports.


How are you going to search for it?

Now we need to think of a way to search for these concepts. It's generally best to look for an advanced search option. That will tell you what search techniques are available - sometimes you can narrow down by date, language, and so on. Often they look a lot like this:

So I might build my search up in PubMed with: 

deworm* OR de-worm* OR "intestinal worm*" OR "soil-transmitted helminth*"

I'm using * to say that I don't care what comes after the start of the word - this picks up things like deworming or worms.

I'm also using the " marks to say I want to find a whole phrase and not the separate words.

And I'm using OR in capitals to say I want to match at least one of these terms because they're synonyms (sort of).

I'd probably also choose to search for these in the title, because I do actually want my article to be about deworming. I do that with the drop-down menu.

Then on the next line, I add my next concept, Africa. I don't want to lose things that don't mention Africa in the title or abstract if they mention other countries, so I'll search all fields for 

Africa OR African OR Africans

I could use the * again but PubMed gives me an error - it found too many options for words starting Africa, so I pick the ones I care the most about.

I type this into the second line down so the concepts are linked by AND, because I want to find something to do with deworming AND something to do with Africa.

Then I move on to my third concept. I might search in the abstract for

effective OR effectiveness OR "cost-effective*" OR "cost effective*" OR "cost-benefi*"

Having done that search I find 25 results. Much more manageable! It's worth doing a few more searches with different combinations of title, abstract, full text, just to see if there are some I'm missing, but that's a really good start.
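For anyone who prefers typing the whole search into a single box, here is a rough Python sketch of how the three lines above could be combined into one string. It rests on some assumptions: that PubMed accepts field tags in square brackets after each term, that the tags are called [Title], [All Fields] and [Title/Abstract] (the last being the closest match to an abstract search), and that a search can be opened via the URL pattern shown. Check all of these against PubMed's own advanced search builder before relying on them.

```python
from urllib.parse import urlencode

# Assumed PubMed field tags; verify them in the advanced search builder.
TITLE = "Title"
ALL_FIELDS = "All Fields"
TITLE_ABSTRACT = "Title/Abstract"  # closest tag to an abstract-only search


def tagged_group(terms, field):
    """OR together the terms for one concept, each restricted to one field."""
    return "(" + " OR ".join(f"{term}[{field}]" for term in terms) + ")"


deworming = ['deworm*', 'de-worm*', '"intestinal worm*"', '"soil-transmitted helminth*"']
africa = ["Africa", "African", "Africans"]
effectiveness = ['effective', 'effectiveness', '"cost-effective*"',
                 '"cost effective*"', '"cost-benefi*"']

# AND between the three concepts, as in the three lines of the search form.
query = " AND ".join([
    tagged_group(deworming, TITLE),
    tagged_group(africa, ALL_FIELDS),
    tagged_group(effectiveness, TITLE_ABSTRACT),
])

# Assumed URL pattern for opening the search in a browser.
search_url = "https://pubmed.ncbi.nlm.nih.gov/?" + urlencode({"term": query})

print(query)
print(search_url)
```

The number of results will change over time as new articles are indexed, so don't expect to reproduce the counts quoted here exactly.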

Obviously, PubMed isn't going to have all the articles ever on deworming, so you might want to try a different database like Web of Science (if you have it through your university) which covers more of the social sciences as well - doing that quickly gave me 50 results so you do get different things.

Other tips and tricks

Some databases have a thesaurus and a set of keywords for each article - these can be assigned manually by experts or, as in Semantic Scholar, by machine learning. They are great if you've missed an important synonym or bit of jargon from the field. You're unlikely to find a perfect thesaurus term for each of your concepts, but you can use a combination of your own terms and those from a thesaurus to good effect.

Web of Science does lots of fancy work on citations - you can see who has cited a paper later. Google Scholar also does this for free so it's useful to check that out if you think you've found a seminal paper. To trace citations backwards, look at the list of references as you're reading and try and find them.

Being clever when searching Google

Even Google has an advanced search option! After you've run a search it's under 'Settings'. You can use quote marks " to search for particular phrases or filter by region, language, date.

You can also use it to search a particular domain - say you want to know what the UK government has written about deworming - you can put .gov.uk in the 'site or domain' box (only one domain at a time, sadly). You can dictate that your search terms appear in the title of the page, or the url, not just somewhere in the text. You can also restrict the file type, so if you're looking for reports, PDFs would be a good bet.
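The same restrictions can also be typed straight into the normal Google search box using operators. The sketch below assembles such a query in Python; site:, filetype: and intitle: are long-standing Google operators, but the helper function and its parameter names are invented for this example, and the search terms are just the deworming ones from earlier.

```python
from urllib.parse import urlencode


def google_query(terms, site=None, filetype=None, in_title=None):
    """Combine search terms with optional site, file type and title restrictions."""
    parts = list(terms)
    if in_title:
        parts.append(f"intitle:{in_title}")
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


# UK government PDFs mentioning deworming and the quoted phrase.
q = google_query(['"soil-transmitted helminth"', "deworming"],
                 site="gov.uk", filetype="pdf")
url = "https://www.google.com/search?" + urlencode({"q": q})

print(q)
print(url)
```

To cover a second domain such as .gov, run the search again with a different site value.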

Here is a video of me demoing some of these techniques:



Reference management

Here is a teaser of what reference management software can help you achieve:

If you are thinking of writing content for publication, or really any long-form academic work, you should be thinking about how you're going to keep track of and reference what you read. It's easy to lose track of where ideas came from and you don't want to be accused of plagiarism down the line, or waste lots of time having to search for things you read months ago all over again.

Reference management software does some of this work for you. The two main free options are Mendeley and Zotero, and the two main paid ones are EndNote and RefWorks. If you use LaTeX to typeset then you may be familiar with BibTeX - you can also use the other types of software even if you are using LaTeX.

Most of the choice between them is personal preference and whether or not you want to pay (or you have institutional access). They all do broadly the same thing, so I will keep things fairly generic, but with a Mendeley flavour, as that's what I know most about.

You can use browser plugins like the Mendeley Web Importer to add details of what you're reading in your browser (webpages, online news sites, PDFs of journal articles) automatically to your library. 

You can use your software as a way to organise your PDFs, either by saving them all to one folder and getting the software to add details of anything in there into your library, or by having the software automatically rename your PDFs using a particular schema.
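To give a feel for what a renaming schema does, here is a small Python sketch that builds a file name from an article's details. The "Author - Year - Title" pattern and the function name are assumptions made for illustration (each tool offers its own options), and the record is a placeholder rather than a real paper.

```python
import re


def pdf_filename(author: str, year: int, title: str, max_title_words: int = 6) -> str:
    """Build a file name of the form 'Author - Year - Short title.pdf'."""
    short_title = " ".join(title.split()[:max_title_words])
    raw = f"{author} - {year} - {short_title}"
    # Remove characters that are not allowed in file names on common systems.
    safe = re.sub(r'[\\/:*?"<>|]', "", raw)
    # Collapse any repeated spaces left behind.
    safe = re.sub(r"\s+", " ", safe).strip()
    return safe + ".pdf"


# Placeholder record, not a real article.
name = pdf_filename("Jenkins", 2014, "Placeholder title: an example record")
print(name)
```

A consistent pattern like this means you can find a PDF by author or year from your file browser, even without opening the reference manager.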

You can use plugins for Word, Pages, etc. to help you correctly reference - find the item you want to cite in your library and it will automatically insert the reference in your text and in the Reference List at the end. If you need to correct anything about the citation (page numbers, year, authors), you can refresh your document and it will automatically make the corrections!

Some versions of this software will allow you to share libraries and documents (with limited cloud storage capacity usually) with other people so this can be helpful when you are writing collaboratively.

Of particular relevance to EA: you may be working on something really interdisciplinary and want to submit it to several different journals. These journals will all have their own ways of referencing, and these vary widely between disciplines. For example, philosophy uses footnotes and endnotes, while social sciences mostly use in-line citations - this is where you would say something like "Librarianship is an interesting degree to study (Jenkins, 2014)," and then both have a reference list at the end. And the references will have slightly different formatting - italics, where to put the full stops, how many authors to include if there are loads. You don't want to be doing all of that by hand if each journal has different requirements - instead, your reference management software will do it all with one click just by selecting a different citation style.
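The idea behind that one click is that each reference is stored once as structured data, and a citation style is just a different way of printing it. The Python sketch below shows this with a placeholder record based on the made-up Jenkins example; the two formats are deliberately simplified and are not any journal's official style, and all the names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    authors: list[str]
    year: int
    title: str
    journal: str


def short_names(authors: list[str]) -> str:
    """Shorten the author list for an in-line citation."""
    if len(authors) == 1:
        return authors[0]
    if len(authors) == 2:
        return f"{authors[0]} and {authors[1]}"
    return f"{authors[0]} et al."


def in_text(ref: Reference) -> str:
    """Author-date citation, as used in many social science journals."""
    return f"({short_names(ref.authors)}, {ref.year})"


def reference_list_entry(ref: Reference) -> str:
    """Simplified author-date entry for the reference list."""
    names = ", ".join(ref.authors)
    return f"{names} ({ref.year}). {ref.title}. {ref.journal}."


def footnote(ref: Reference, number: int) -> str:
    """Simplified numbered note, closer to what a philosophy journal might want."""
    names = ", ".join(ref.authors)
    return f'{number}. {names}, "{ref.title}," {ref.journal} ({ref.year}).'


# Placeholder record, not a real publication.
ref = Reference(["Jenkins"], 2014, "Placeholder title", "Placeholder Journal")

print(in_text(ref))
print(reference_list_entry(ref))
print(footnote(ref, 1))
```

Because the record itself never changes, fixing a wrong year or author in one place corrects every style at once, which is why cleaning up your library is worth the effort.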

Cautionary note: This software isn't going to solve all your problems. You do still have to do a fair amount of manual cleanup on your library of citations - if you put rubbish in, you'll get rubbish out in your reference list. Sometimes browser plugins can't pick stuff up from a secured PDF and you have to type the details of the journal article in yourself. But overall, I think anyone planning to write anything for publication should definitely consider getting to grips with Zotero or Mendeley - it will save you time in the long run!


Where can you go for more help?

If you're part of a university, your librarian! I do one-to-one tutorials on these things all the time, and big classes for all our new graduate students. Your library almost certainly does too.

The internet! There are loads of resources written by librarians called LibGuides which are available free to read. Oxford's one on reference management software is here: http://libguides.bodleian.ox.ac.uk/reference-management. LibGuides are also fantastic sources of places to search - here are some on EA-related areas:

If you're not affiliated with a university, but you live in a big university town, you should check what membership options there are for independent researchers. For example, in Oxford, it's pretty cheap to get a reader card for the Bodleian Libraries, and then you can go in and access all of their online resources (except a few legal databases). More information is here: http://www.bodleian.ox.ac.uk/using/getting-a-readers-card

This is something for non-profits to consider as well: it may be worth factoring in a regular bit of money and time for someone to go and sit in a library and run some literature searches - you'll often get better results with a subject-specific database and you can download PDFs of the articles to read later using the library's paid subscriptions. (Be aware that this usually isn't allowed if you're a commercial business, but otherwise it's fine as long as you're not obviously trying to download the entirety of JSTOR in one go).

The Bodleian even offers a scanning service where if they only have a print copy of an older item, you can have a scan of a chapter or article within 24 hours for £2 if you have a reader card. So you wouldn't even need to send someone to scan it themselves.


An attempt at a not-at-all comprehensive list of sources of literature

[Some of these are completely free, some have some free and some paid-for content, and some are subscription-only. See above for my suggestions on what to do if you don't have a subscription. Some will provide citations but not necessarily full text - these are known as 'bibliographic databases'. The ones in bold are those that I think are the largest - if you are tight on time I would pick a couple of these from your subject area and search them.]

arXiv.org (free) - database of pre-prints (versions of articles before a publisher formatted them) from science, mathematics, computer science, economics, and engineering disciplines

SSRN (free) - the Social Science Research Network - like arXiv but for the social sciences more broadly

PubMed Central (free) - a massive repository of medical science literature

Semantic Scholar (free) - a search engine which allows you to search across various free repositories including arXiv.org and PubMed Central. It uses machine learning to classify papers, giving it some of the advantages of a subject-specific database, although without an advanced search option

RePEc (free) - Research Papers in Economics - a volunteer-run repository for economics pre-prints and papers

NBER (free) - National Bureau of Economic Research - they produce lots of US reports on economics

PhilPapers (free) - a bibliographic database of philosophy papers (not necessarily the full text)

The Existential Risk Research Assessment (free) - a new (and incomplete) bibliography of papers on existential risk, being put together by the Centre for the Study of Existential Risk and crowd-sourced by people like you!

UK Data Service (free) - access to major UK government-sponsored surveys and economic data

World Bank Data Catalog (free) - access to the World Bank's global development data

ICPSR (free) - a huge data archive of social science research data

EThOS (free) - the best resource for UK dissertations and theses. Not all will be available online.

ORA (free) - Oxford's own repository of pre-print papers - many will not be available until after an embargo period of 6 months - 2 years
Many other institutions have their own repositories - if you are looking for a particular paper by an academic, you can try looking there for a copy

JSTOR (subscription) - a huge collection of digitised journal articles covering all subjects

Scopus (subscription) - a major interdisciplinary bibliographic database

Web of Science (subscription) - a major interdisciplinary bibliographic database, including collections like the Social Sciences Citation Index.

Philosopher's Index (subscription) - one of the biggest bibliographic databases of philosophy

EconLit (subscription) - indexes over 120 years of economics literature from around the world

OECD iLibrary (subscription) - the online library of the Organisation for Economic Cooperation and Development including data, reports, articles and books

ACM Digital Library (subscription) - journal articles and conference proceedings from the Association for Computing Machinery

MathSciNet (subscription) - a bibliographic database for the mathematical sciences

PsycINFO (subscription) - a large bibliographic database for psychology

Comment author: RyanCarey 14 November 2017 01:41:21AM 4 points

Thanks for writing this. I agree that reference management is really useful for paper-writing, and I have come across a bunch of these resources repeatedly. I get the impression people vary a bunch in how much they use subject-specific databases and the structured queries. I usually get by pretty well with Google Scholar. I don't encounter too much noise with the machine learning and biology work that I tend to read, although I can imagine they would be super useful if I was publishing a literature review.

The video at the start is a cool blog post structure. I wonder if anyone else will try it...

Comment author: kastrel 14 November 2017 10:47:53AM 3 points

Thanks! I really didn't want it to be boring and dry, and I'm not on here a lot so I thought having a face to put to the blog would help.

How thorough you need to be absolutely depends on what you're working on - obviously if you're writing a literature review for publication you need to do a bit more due diligence than if you're just looking for the next thing to read. I would recommend Semantic Scholar as a more finely-tuned alternative to Google Scholar while still having a lot of free content.

Comment author: casebash 15 November 2017 03:24:41AM 0 points

"I would recommend Semantic Scholar as a more finely-tuned alternative to Google Scholar while still having a lot of free content" - any specific ways in which it works better?

Comment author: kastrel 16 November 2017 09:49:34AM 0 points

I haven't used it in anger yet, but I think Semantic Scholar only searches databases that give you free access to the PDFs - so if you want to know you'll actually be able to click through and read the article, that's an advantage over Google Scholar, which will bring citations which are paywalled or unavailable online as results.

I believe it also only searches (fairly) respectable databases like arXiv and PubMed Central, so you are less likely to get poor-quality results.

Comment author: Pablo_Stafforini 13 November 2017 07:33:48PM 4 points

Thank you for writing this! The images under 'What are you going to search for?' are not loading.

Comment author: kastrel 14 November 2017 10:45:41AM 1 point

Thanks for flagging this up - I think I've fixed that now.

Comment author: joshjacobson 13 November 2017 09:02:32PM * 1 point

This looks great! Looking forward to doing a more detailed read when I have more time, but I already see some resources and techniques I wasn't aware of or have failed to fully implement thus far, so this will serve as added motivation and a nice reference.

Sci-Hub is another resource that is likely to be highly useful to those without institutional access to journal subscriptions. And I find that the archive of Data Is Plural is a great source for data on a wide variety of topics: http://bit.ly/2h3bNzQ

Comment author: kastrel 14 November 2017 10:49:39AM 2 points

Thanks! I can't recommend Sci-Hub or I might have my librarianship license revoked! But that archive looks really interesting.