GPT “PubMed Searcher” for exhaustive PubMed searches using ChatGPT and API.
ChatGPT has a web access function that uses Bing, but when searching for articles it is better to have it use a dedicated literature search engine. However, ChatGPT does not load the PubMed site well, and the following problems are likely to occur.
- Hallucination occurs when PubMed search results are not loaded (false results are displayed).
- The abstract page of an article does not load properly.
- Only one page of the article search results can be read.
This happens for the following reasons.
(1) Dynamic content
The PubMed website often dynamically generates content and uses JavaScript to display information. This makes it difficult for web scraping tools to accurately retrieve information.
(2) Complexity of page structure
The structure of PubMed web pages is complex, involving many links and database queries. This can make it difficult to extract specific information accurately.
(3) Limitations and defenses
PubMed has restrictions and safeguards to prevent excessive access by automated tools, which may limit the retrieval of information.
Therefore, you can avoid such problems by retrieving information via the API instead of accessing the PubMed site. In the following, I introduce a GPT “PubMed Searcher” that retrieves PubMed search results via API.
1. How GPT "PubMed Searcher" works
Paid users of ChatGPT (Plus members: $20/month subscription) can create their own personal bot, called a GPT, and a PubMed Searcher can be built as one of these GPTs. Because it relies on an individual API key, it cannot be opened to others and can only be used by the person who created it. PubMed makes its API available free of charge (strictly speaking, the API is NCBI's Entrez API, and PubMed is specified through the parameter that selects the database). API stands for Application Programming Interface: an interface that lets software communicate with other software, a so-called “key for exchanging information”. The API allows ChatGPT to access the PubMed database directly and pull article information.
This GPT calls the PubMed API with your own API key, which requires an NCBI account. In other words, even after you create this GPT, you must not allow others to use it; it is for your own use only. For this reason, the PubMed Searcher is not published in the GPT Store, and you must create it yourself. However, creating one by following the procedure below is not difficult, and you can even do it from your smartphone.
2. Procedure for creating GPT "PubMed Searcher"
(1) Create an NCBI account
First, access the following URL to create an NCBI account.
https://www.ncbi.nlm.nih.gov/account/
(2) Obtain an API key
After creating an account, access the following account settings page and obtain an API key from the API Key Management section at the bottom of the page. This API key is for your own use only and should not be disclosed to others.
https://www.ncbi.nlm.nih.gov/account/settings
(3) Creating a GPT
Access the web version of ChatGPT, click “Explore GPTs” on the left bar, and then click "+ Create" on the upper right to open the GPT builder. Select "Configure" in the middle of the top bar, and set as follows:
- Icon image: You can set any image you like, or ask DALL-E 3 to generate one for you.
- Name: Set any name you like (e.g., PubMed Searcher).
- Instructions: Copy and paste the following.
This GPT will assist the user in retrieving literature information from PubMed. It should be able to search PubMed using a specific query provided by the user and return relevant article information. GPT should leverage NCBI account details and API keys to access PubMed data.
Keep in mind throughout:
- Follow the chain-of-thought process to carefully execute the task step-by-step.
- Before outputting your response in Task Enforcement, make sure that you have properly followed the GPT's prompt, and if not, revise your response accordingly.
Handling a Clinical Question:
```yaml
clinical_question:
  steps:
    - Extract as many relevant terms (synonyms or near-synonyms) as possible from the Clinical Question provided by the user, according to the following rules:
        - Terms to be included in the search formula should not be in Japanese.
        - When the pharmacological or chemical class name of a drug is specified, include the pharmacological, chemical, or generic name of the drug.
    - Create your search query according to the following rules:
        - Enclose terms in double quotation marks.
        - Use the AND and OR operators to combine terms logically.
        - Ensure the search query is highly sensitive but not highly specific.
    - Ask the user for confirmation before performing the search.
search_results:
  steps:
    - Indicate the number of results found.
    - If ESearch results exceed 3, do not display article details in the response.
    - Use ESummary to compile all results into an Excel file.
    - Include all articles in the Excel file without setting a limit on the number of articles.
excel_file:
  columns:
    - PMID
    - URL: Create a hyperlink to the article's PubMed page in the format https://pubmed.ncbi.nlm.nih.gov/PMID/
    - Title
  upload:
    steps:
      - Verify that the number of articles in the file matches the number of search results.
      - Ensure each article includes all required information.
      - Redo the process if any information is missing.
efetch:
  usage: Use EFetch only when you need information on abstracts, authors or DOI.
elink:
  usage: Use ELink when you need information on similar papers, referenced papers or citing papers.
```
- Conversation starters: No special setting required.
- Knowledge: No special settings required.
- Capabilities: Check "Web Browsing" and "Code Interpreter".
- Actions: Click "Create new action" and copy and paste the following into the "Schema" field
```yaml
openapi: 3.0.0
info:
  title: PubMed API
  description: API to search and retrieve literature from PubMed using esearch, esummary, efetch, and elink endpoints.
  version: 1.0.0
servers:
  - url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils
    description: NCBI E-utilities server
paths:
  /esearch.fcgi:
    get:
      operationId: searchLiterature
      summary: Searches for literature in PubMed.
      parameters:
        - name: db
          in: query
          required: true
          schema:
            type: string
          description: The database to search (e.g., pubmed).
        - name: term
          in: query
          required: true
          schema:
            type: string
          description: The search term(s).
        - name: retmax
          in: query
          required: false
          schema:
            type: integer
          description: The maximum number of results to return.
        - name: retmode
          in: query
          required: false
          schema:
            type: string
          description: The return mode (e.g., xml, json).
        - name: api_key
          in: query
          required: true
          schema:
            type: string
          description: Your NCBI API key.
      responses:
        '200':
          description: Search results
          content:
            application/json:
              schema:
                type: object
                properties:
                  count:
                    type: integer
                  retmax:
                    type: integer
                  retstart:
                    type: integer
                  ids:
                    type: array
                    items:
                      type: string
  /esummary.fcgi:
    get:
      operationId: getSummary
      summary: Retrieves the summary of literature based on search results.
      parameters:
        - name: db
          in: query
          required: true
          schema:
            type: string
          description: The database to search (e.g., pubmed).
        - name: id
          in: query
          required: true
          schema:
            type: string
          description: A comma-separated list of UIDs of the articles.
        - name: retmode
          in: query
          required: false
          schema:
            type: string
          description: The return mode (e.g., xml, json).
        - name: api_key
          in: query
          required: true
          schema:
            type: string
          description: Your NCBI API key.
      responses:
        '200':
          description: Summary details of the search results
          content:
            application/json:
              schema:
                type: object
                properties:
                  uid:
                    type: string
                  title:
                    type: string
                  source:
                    type: string
                  pubdate:
                    type: string
                  authors:
                    type: array
                    items:
                      type: string
                  volume:
                    type: string
                  issue:
                    type: string
                  pages:
                    type: string
                  doi:
                    type: string
  /efetch.fcgi:
    get:
      operationId: fetchDetails
      summary: Fetches the details including abstracts for specified UIDs.
      parameters:
        - name: db
          in: query
          required: true
          schema:
            type: string
          description: The database to search (e.g., pubmed).
        - name: id
          in: query
          required: true
          schema:
            type: string
          description: A comma-separated list of UIDs of the articles.
        - name: rettype
          in: query
          required: false
          schema:
            type: string
          description: The return type (e.g., abstract).
        - name: retmode
          in: query
          required: false
          schema:
            type: string
          description: The return mode (e.g., xml, text).
        - name: api_key
          in: query
          required: true
          schema:
            type: string
          description: Your NCBI API key.
      responses:
        '200':
          description: Detailed information including abstracts
          content:
            text/plain:
              schema:
                type: string
  # OpenAPI allows only one GET operation per path, so the similar-article,
  # reference, and cited-by lookups share this single ELink operation.
  /elink.fcgi:
    get:
      operationId: fetchLinkedArticles
      summary: Fetches similar articles, references, or citing articles for a specified UID.
      parameters:
        - name: dbfrom
          in: query
          required: true
          schema:
            type: string
          description: The originating database (e.g., pubmed).
        - name: db
          in: query
          required: false
          schema:
            type: string
          description: The database to link to (e.g., pubmed).
        - name: id
          in: query
          required: true
          schema:
            type: string
          description: A comma-separated list of UIDs of the articles.
        - name: cmd
          in: query
          required: false
          schema:
            type: string
          description: Command to run (e.g., neighbor).
        - name: linkname
          in: query
          required: false
          schema:
            type: string
          description: The name of the link (e.g., pubmed_pubmed_refs for references, pubmed_pubmed_citedin for citing articles).
        - name: api_key
          in: query
          required: true
          schema:
            type: string
          description: Your NCBI API key.
      responses:
        '200':
          description: Similar articles, references, or citing articles information
          content:
            text/plain:
              schema:
                type: string
```
- In the pasted schema, replace every occurrence of "Your NCBI API key." with the API key you obtained.
- Click "Create" in the upper right corner and set the sharing scope to "Only me".
3. What you can do with GPT “PubMed Searcher”
With the above instructions, this GPT can do the following.
(1) Generate PubMed search formulas according to Clinical Question
GPT generates an exhaustive PubMed search formula from the Clinical Question (or any other question) you ask. To make the search exhaustive, as many synonyms as possible are added for each term, resulting in a search formula with high sensitivity and low specificity. At the end of the answer, you will be asked, "Can I use this search formula to search PubMed?" If the formula needs to be modified, you can request the changes at that point.
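The shape of such a search formula can be sketched in a few lines of Python. This is only an illustration of the rules given in the instructions (quote each term, join synonyms with OR, join concepts with AND); the function name `build_query` and the example terms are assumptions made for this sketch, not something the GPT runs.

```python
def build_query(concepts):
    """Combine synonym groups into a PubMed boolean query.

    Each inner list holds the synonyms for one concept. Synonyms are quoted
    and joined with OR; the concept groups are then joined with AND.
    """
    groups = []
    for synonyms in concepts:
        # Quote every non-empty term, as the instructions require.
        quoted = ['"{}"'.format(term.strip()) for term in synonyms if term.strip()]
        if quoted:
            groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Illustrative concepts only: a condition and a drug class with generic names.
query = build_query([
    ["heart failure", "cardiac failure"],
    ["SGLT2 inhibitor", "dapagliflozin", "empagliflozin"],
])
print(query)
```

Adding synonyms inside a group widens the search (higher sensitivity), while adding another group narrows it (higher specificity).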
(2) Searching PubMed using ESearch
When you ask for a PubMed search, GPT first calls ESearch with the API key to obtain the number of hits and the PMIDs of the matching articles, and responds with "X papers hit". Based on these PMIDs, more detailed information on each paper can be obtained from ESummary and EFetch.
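For readers who want to see what the ESearch step looks like outside ChatGPT, here is a minimal Python sketch. It assumes `retmode=json`, in which case the hit count and PMIDs arrive under the `esearchresult` key; the helper names `esearch_url` and `parse_esearch` and the sample response values are made up for illustration.

```python
import json
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, api_key, retmax=200, retstart=0):
    """Build an ESearch request URL for the PubMed database."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": retmax,      # ESearch returns only 20 IDs unless this is raised
        "retstart": retstart,  # offset of the first ID, for paging
        "api_key": api_key,
    }
    return EUTILS_BASE + "/esearch.fcgi?" + urlencode(params)


def parse_esearch(body):
    """Return (number of hits, list of PMIDs) from an ESearch JSON body."""
    result = json.loads(body)["esearchresult"]
    # The count is delivered as a string, so convert it.
    return int(result["count"]), result["idlist"]


# Abbreviated sample response; the PMIDs are placeholders, not real articles.
sample = json.dumps({
    "esearchresult": {
        "count": "2", "retmax": "2", "retstart": "0",
        "idlist": ["11111111", "22222222"],
    }
})
count, pmids = parse_esearch(sample)
print(f"{count} papers hit: {pmids}")
```

In a real run the URL would be fetched, for example with `urllib.request.urlopen`, and the response body passed to `parse_esearch`.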
(3) Obtaining article information using ESummary
ESummary returns the title of an article, the journal in which it was published, and the year, volume, and page numbers. ESummary is called with the PMIDs obtained from ESearch in (2). However, an exhaustive search may hit a considerable number of papers, and the length of GPT's response is limited, so when four or more papers are found (this threshold can be changed), the results are automatically compiled into an Excel file using Advanced Data Analysis (formerly Code Interpreter). The Excel file contains the PMID, a link to the article's PubMed abstract page, and the title (if the journal name and other fields are included, the file cannot be processed and the request times out). Due to ChatGPT's processing limitations, it is recommended to keep the number of retrieved articles to about 200 at most. To stay below this number, increase the specificity of the search formula.
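The three-column file described above can be approximated with the following sketch. It assumes an ESummary response requested with `retmode=json`, where the records sit under `result` keyed by UID with a `uids` list giving the order. It writes CSV rather than a true Excel workbook so that it needs only the standard library; that substitution, the helper names, and the sample titles are all assumptions of this example.

```python
import csv
import io
import json


def summary_rows(body):
    """Turn an ESummary JSON body into (PMID, URL, Title) rows."""
    result = json.loads(body)["result"]
    rows = []
    for uid in result["uids"]:
        url = f"https://pubmed.ncbi.nlm.nih.gov/{uid}/"
        rows.append((uid, url, result[uid]["title"]))
    return rows


def write_csv(rows, handle):
    """Write the rows with the same three columns used in the Excel file."""
    writer = csv.writer(handle)
    writer.writerow(["PMID", "URL", "Title"])
    writer.writerows(rows)


# Abbreviated sample response; PMIDs and titles are placeholders.
sample = json.dumps({
    "result": {
        "uids": ["11111111", "22222222"],
        "11111111": {"uid": "11111111", "title": "First example title"},
        "22222222": {"uid": "22222222", "title": "Second example title"},
    }
})
rows = summary_rows(sample)

# Mirror the check in the instructions: the file must list every search hit.
expected_hits = 2
assert len(rows) == expected_hits

buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

Keeping only three columns per article keeps the file small, which matches the observation that adding more fields tends to cause timeouts.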
(4) Retrieving article abstracts using EFetch
EFetch retrieves more detailed information on articles, including abstracts and DOIs. This makes it possible to ask GPT directly to explain the contents of an abstract.
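As a rough illustration of what EFetch returns, the sketch below parses a PubMed record requested with `retmode=xml` and pulls out the PMID, title, abstract, and DOI. The element names follow the PubMed XML layout; the helper name `parse_efetch` and the sample record are invented for this example.

```python
import xml.etree.ElementTree as ET


def parse_efetch(xml_text):
    """Extract PMID, title, abstract and DOI from EFetch PubMed XML."""
    records = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        # Structured abstracts are split into several AbstractText elements,
        # so collect the text of each one and join them.
        parts = ["".join(node.itertext()).strip()
                 for node in article.iter("AbstractText")]
        title_node = article.find("MedlineCitation/Article/ArticleTitle")
        title = "".join(title_node.itertext()).strip() if title_node is not None else ""
        records.append({
            "pmid": article.findtext("MedlineCitation/PMID"),
            "title": title,
            "abstract": " ".join(parts),
            # None when the record carries no DOI.
            "doi": article.findtext(
                "PubmedData/ArticleIdList/ArticleId[@IdType='doi']"),
        })
    return records


# Minimal sample record; all values are placeholders.
sample = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">11111111</PMID>
      <Article>
        <ArticleTitle>Example title</ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">First part.</AbstractText>
          <AbstractText Label="RESULTS">Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
    <PubmedData>
      <ArticleIdList>
        <ArticleId IdType="pubmed">11111111</ArticleId>
        <ArticleId IdType="doi">10.1000/example</ArticleId>
      </ArticleIdList>
    </PubmedData>
  </PubmedArticle>
</PubmedArticleSet>
"""
records = parse_efetch(sample)
print(records[0])
```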
ELink can also be used to obtain similar articles, the references of a paper, and the articles that cite it. Note, however, that this information is not available for papers that were added to PubMed quite recently.
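The ELink lookups can be sketched the same way. The link names `pubmed_pubmed_refs` and `pubmed_pubmed_citedin` are the ones used in the schema; `pubmed_pubmed` for similar articles, the JSON layout (`linksets` containing `linksetdbs`), and the helper names are assumptions of this sketch, so check them against a real response before relying on them.

```python
import json
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Relation labels are hypothetical; the values are Entrez link names.
LINKNAMES = {
    "similar": "pubmed_pubmed",  # assumed link name for similar articles
    "references": "pubmed_pubmed_refs",
    "cited_by": "pubmed_pubmed_citedin",
}


def elink_url(pmid, relation, api_key):
    """Build an ELink request URL for one kind of related article."""
    params = {
        "dbfrom": "pubmed",
        "db": "pubmed",
        "id": pmid,
        "linkname": LINKNAMES[relation],
        "retmode": "json",
        "api_key": api_key,
    }
    return EUTILS_BASE + "/elink.fcgi?" + urlencode(params)


def linked_pmids(body, linkname):
    """Collect linked PMIDs for one link name; empty if none are present."""
    pmids = []
    for linkset in json.loads(body).get("linksets", []):
        # Recent papers may have no link data, so tolerate a missing key.
        for linksetdb in linkset.get("linksetdbs", []):
            if linksetdb.get("linkname") == linkname:
                pmids.extend(linksetdb.get("links", []))
    return pmids


# Sample responses with placeholder PMIDs: one with links, one without.
with_links = json.dumps({"linksets": [{
    "dbfrom": "pubmed", "ids": ["11111111"],
    "linksetdbs": [{"dbto": "pubmed", "linkname": "pubmed_pubmed_citedin",
                    "links": ["22222222", "33333333"]}],
}]})
no_links = json.dumps({"linksets": [{"dbfrom": "pubmed", "ids": ["44444444"]}]})

print(linked_pmids(with_links, "pubmed_pubmed_citedin"))
print(linked_pmids(no_links, "pubmed_pubmed_citedin"))
```

Returning an empty list instead of raising an error makes the "no link data yet" case for recent papers easy to report to the user.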