Search Strategies #1
Searching for a needle in a haystack... but are you looking in the right haystack?
Sometimes searching for a reference can feel like searching for a needle in a haystack. For some searches you want more than one needle - and sometimes you want more than one needle in more than one haystack. And depending on your needs, you might want to look for sewing needles and knitting needles. You might even be looking for record needles!
Before you search, ask yourself: are you looking for the right needle in the right haystack?
There are many ways to find information now. The beauty of digital information management is that a reference can be indexed, tagged, and virtually stored in many locations. That doesn’t necessarily make it any easier to find information, though, as the sheer volume of material presents a challenge. This is why you should formulate a search strategy that takes into account what to search, as well as where to search. If you are a scientist conducting a search for scientific references, you may limit yourself to PubMed, SciFinder, or Google Scholar. But there are many options out there, and it helps to familiarize yourself with the tools at your disposal.
Search engines are not databases
There is a tendency to conflate search engines with databases. When you say “I searched PubMed”, you are using the PubMed search engine to search the MEDLINE database, in addition to a few other sources. When you say “I searched SciFinder”, you are using the SciFinder search platform to search the CAplus database.
And even if two databases cover the same kind of material - say, scientific journals - they are updated at different frequencies, and each has its own way of indexing its contents. You might recognize that a search in PubMed will overlap with a search in SciFinder, as both MEDLINE and CAplus (Chemical Abstracts Plus) encompass references pertaining to biomedical literature. But the CAplus database will include references containing chemical substance-related information, and MEDLINE will include more medical reference literature that may not be captured by CAplus.
Search engines and databases can vary quite widely. And you may not be searching as broadly as you think.
Be aware of your blind spots
You may think that a particular bit of information is new because you haven’t found it in your Google search. But without a thorough, comprehensive search, the information may already be out there… you just haven’t found it because you weren’t looking in the right place. Note that most of the search engines and databases I discuss are pertinent to the chemical and biological sciences - but not necessarily relevant to information pertaining to, say, sports science or baseball.
Using a search platform and database that is specifically tailored to a field of study can shut as many doors as it opens. In other words, developing a search strategy that is too narrow and limited to searching only one database may inadvertently omit wide swaths of information. This is where a knowledgeable information scientist can help you find appropriate databases and search platforms.
We can use MEDLINE and CAplus again as an example of blind spots. CAplus includes patent literature, whereas MEDLINE does not. I suspect that many biologists don’t think about patents because patents aren’t indexed in MEDLINE and are not searchable in PubMed. Chemists who have learned to use SciFinder as a search engine will likely be exposed to patent literature, simply because a search for an obscure chemical compound may lead to finding an equally obscure patent publication.
Be aware of what sources your search engine and database are using.
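To make the blind spot concrete, here is a minimal Python sketch. The record identifiers are invented placeholders, not real search results, and the function name is my own. The only coverage claim borrowed from the text above is that CAplus includes patent literature and MEDLINE does not.

```python
# Toy illustration only: these record IDs are invented placeholders,
# not real search results from either database.
medline_hits = {"journal-article-1", "journal-article-2", "clinical-report-1"}
caplus_hits = {"journal-article-1", "journal-article-2", "patent-1", "patent-2"}


def coverage_report(hits_a, hits_b):
    """Split two result sets into shared, unique, and combined records."""
    return {
        "in_both": hits_a & hits_b,      # found by either search
        "only_in_a": hits_a - hits_b,    # missed if you search only B
        "only_in_b": hits_b - hits_a,    # missed if you search only A
        "combined": hits_a | hits_b,     # what searching both haystacks yields
    }


report = coverage_report(medline_hits, caplus_hits)

# Records a MEDLINE-only search would never surface in this toy example.
print(sorted(report["only_in_b"]))
```

In this toy setup, the records that appear only in the second set are the patents. Comparing result sets this way is a quick check on whether a single-database search is as broad as you assumed.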
Indexing is subjective
Again, the beauty of digital information management and storage is that a reference can be indexed, tagged, and virtually stored in many locations. But that doesn’t mean everyone keeps their information stored in the same way.
If you visit someone’s house and you need a towel, where will you look? You might first look in the linen closet, if there is one, or in the bathroom. But what if you are looking for a dish towel? Would that be in the linen closet or in the kitchen? Are the beach towels stored in the linen closet or with the swim gear? What if you are looking for a cleaning cloth - would that be in the kitchen, or the mud room, or the garage, or…?
These are all towels, so they could be indexed as towels, but they may also be indexed by location, or function, or color, or texture, or size, or any other property. But indexing is somewhat subjective. Some households may not take note of the texture or material of their towels, whereas some households may painstakingly separate their Egyptian cotton bath sheets from their microfiber dusting cloths.
Using indexing can power up your search, but it can also work against you. Know how to use indexing effectively.
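The towel analogy can be sketched in a few lines of Python. This is a toy example under my own assumptions: the inventory, the field names, and the build_index helper are all hypothetical, and real databases use far richer controlled vocabularies. It shows how the same items land in different places depending on which property the indexer chose.

```python
from collections import defaultdict

# Hypothetical household inventory, following the towel analogy.
towels = [
    {"name": "bath sheet", "location": "linen closet",
     "function": "drying off", "material": "Egyptian cotton"},
    {"name": "dish towel", "location": "kitchen",
     "function": "drying dishes", "material": "cotton"},
    {"name": "beach towel", "location": "swim gear",
     "function": "drying off", "material": "cotton"},
    {"name": "dusting cloth", "location": "garage",
     "function": "cleaning", "material": "microfiber"},
]


def build_index(items, field):
    """Group item names under each value of the chosen field."""
    index = defaultdict(list)
    for item in items:
        index[item[field]].append(item["name"])
    return dict(index)


by_location = build_index(towels, "location")
by_function = build_index(towels, "function")
by_material = build_index(towels, "material")

# Looking only in the linen closet misses the beach towel,
# even though both towels serve the same function.
print(by_location.get("linen closet", []))
print(by_function.get("drying off", []))
```

A search by location and a search by function return different sets from the same inventory. Neither index is wrong; they simply reflect different choices about what matters, which is why it pays to learn how a given database indexes its contents before you rely on it.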
A few search platforms and databases
If you are a scientist, by the time you make it through your undergraduate years, you will most likely have encountered the PubMed, PubChem, and Google Scholar search engines. They are handy and free. But chances are you are not using them to their full potential. And if you are affiliated with an institution - be it academic, industry, or governmental - you may have access to some advanced search platforms. Below I have listed a few search platforms and databases - look for them on your information center’s website.
CAplus is a database available from CAS (Chemical Abstracts Service). If you are looking for chemical substances, this is the spot. (I will spare you my experiences with the physical version of Chemical Abstracts… for now.)
SciFinder is a search platform from CAS that is a reasonably user-friendly way to access the CAplus database. Although biologists tend to shy away from SciFinder, its sequence searching capabilities (BLAST, anyone?), patent literature coverage, and alert notifications all make it a resource that biologists should learn.
STN is a search platform from CAS. It is the gold standard for chemical substance searching, and any thorough and comprehensive search will include an STN search. You can peruse the STN database summary sheets, including the CAplus database summary sheet, to learn more about what’s possible with an STN search. It is not user friendly. :)
Reaxys is available from Elsevier. It is a descendant of the Beilstein CrossFire platform. Comparing Reaxys and SciFinder is a great way to study personal preferences in a search engine, as well as subtle differences between how you access the same material through different databases.
EMBASE is a biomedical research database from Elsevier. It includes journals and conferences not captured by MEDLINE, so it captures more references, and it indexes its contents in a way that empowers a knowledgeable searcher to find more granular information. Emtree indexing is similar to MeSH indexing. A search in EMBASE will almost always yield more hits than a search in MEDLINE.
The Ovid search platform can access many databases, including EMBASE and Ovid MEDLINE. You can search across multiple databases simultaneously (assuming you subscribe to those databases). In certain databases, you can even search the full text of a document, not just the abstract.
Web of Science is a Clarivate product, and it includes information sourced from Clarivate’s Derwent Innovation intellectual property platform.
Cortellis is a search platform from Clarivate that includes modules such as Competitive Intelligence (formerly the Investigational Drugs Database, IDdb3), Drug Discovery Intelligence (formerly Integrity), Regulatory Intelligence, OFF-X, and more. The Cortellis DDI dashboard is a fun tool for exploring research, and Cortellis CI is a good resource if you’re looking to dip your toes into patent information and analysis. You can also find Current Patents Gazette housed within the Cortellis CI platform. If you work in industry, Cortellis is a valuable tool for accessing a high-level overview of research areas and patents.
One thing you will notice, if you haven’t already: some of these platforms and databases come from the same companies (e.g., CAS, Elsevier, Clarivate), which also produce other information resources. Sometimes there is the potential for cross-platform integration.
Now what?
Everything in this post is information that your institution’s librarians or information scientists can share - but so many people assume that they know how to search. They don’t ask for help and they don’t learn about other available tools and resources. But information science is constantly growing and adapting, and it is well worth your time to learn more about what’s out there.
I hope to make this the first of a series of posts about search strategies. Future topics will include more about indexing, developing and refining a search strategy, ontologies, setting up effective literature alerts, knowledge graphs and citation maps, AI and natural language processing, and more. If you have any requests for specific topics, please ask.