MedMCQA

A Multi-Subject Multi-Choice Dataset for the Medical Domain

About

MedMCQA is a large-scale Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions.
The MedMCQA task can be formulated as X = {Q, O}, where Q is the question text and O = {O1, O2, ..., On} is the set of candidate options given for that question. The goal is to select the correct answer (or answers) from the option set.
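
To make the formulation concrete, here is a minimal Python sketch of the selection step. The `score_option` function is a hypothetical stand-in (not part of the dataset or paper) for any model that assigns a plausibility score to a (question, option) pair:

```python
from typing import Callable, List

def predict(question: str,
            options: List[str],
            score_option: Callable[[str, str], float]) -> int:
    """Return the index of the highest-scoring candidate option."""
    # score_option is a hypothetical scorer, e.g. a classifier head over [CLS].
    scores = [score_option(question, opt) for opt in options]
    return max(range(len(options)), key=lambda i: scores[i])

# Usage with a trivial dummy scorer (illustrative only):
best = predict("Which vitamin deficiency causes scurvy?",
               ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
               score_option=lambda q, o: float("C" in o))
print(best)  # 2
```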

Dataset

MedMCQA contains more than 194k high-quality AIIMS & NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects, with an average token length of 12.77 and high topical diversity.
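
For reference, the dataset can be loaded from the Hugging Face Hub. The sketch below assumes the `medmcqa` dataset identifier and its published field names (`question`, `opa`–`opd`, `cop`); verify these against the repository before use:

```python
from datasets import load_dataset

# Assumes the dataset is published on the Hugging Face Hub as "medmcqa".
dataset = load_dataset("medmcqa")

sample = dataset["train"][0]
question = sample["question"]
options = [sample["opa"], sample["opb"], sample["opc"], sample["opd"]]
answer = options[sample["cop"]]  # cop: index of the correct option (per the published schema)
print(question, options, answer, sep="\n")
```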

Submission

To submit your model, please follow the instructions in the GitHub repository.

Citation

If you use MedMCQA in your research, please cite our paper:


@InProceedings{pmlr-v174-pal22a,
  title     = {MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering},
  author    = {Pal, Ankit and Umapathi, Logesh Kumar and Sankarasubbu, Malaikannan},
  booktitle = {Proceedings of the Conference on Health, Inference, and Learning},
  pages     = {248--260},
  year      = {2022},
  editor    = {Flores, Gerardo and Chen, George H and Pollard, Tom and Ho, Joyce C and Naumann, Tristan},
  volume    = {174},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--08 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v174/pal22a/pal22a.pdf},
  url       = {https://proceedings.mlr.press/v174/pal22a.html},
  abstract  = {This paper introduces MedMCQA, a new large-scale, Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions. More than 194k high-quality AIIMS & NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects are collected with an average token length of 12.77 and high topical diversity. Each sample contains a question, correct answer(s), and other options which requires a deeper language understanding as it tests the 10+ reasoning abilities of a model across a wide range of medical subjects & topics. A detailed explanation of the solution, along with the above information, is provided in this study.}
}

Leaderboard (w/o Context)

In the w/o Context setting, models receive no supporting passage; each input sequence pairs the question with one candidate option:
[CLS] Question [SEP] Option [SEP]
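
As an illustration (not necessarily the authors' exact pipeline), a BERT tokenizer produces this packing when the question and option are passed as a text pair; `bert-base-uncased` is an assumed checkpoint:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

question = "Which vitamin deficiency causes scurvy?"  # illustrative question
options = ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"]

# One sequence per option: [CLS] Question [SEP] Option [SEP]
encodings = [tokenizer(question, option) for option in options]
print(tokenizer.decode(encodings[2]["input_ids"]))
# [CLS] which vitamin deficiency causes scurvy? [SEP] vitamin c [SEP]
```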

Model | Date | Test Set Acc | Dev Set Acc
----- | ---- | ------------ | -----------
BERT Base (Devlin et al., 2019) | March 10, 2022 | 0.33 | 0.35
BioBERT (Lee et al., 2020) | March 10, 2022 | 0.37 | 0.38
SciBERT (Beltagy et al., 2019) | March 10, 2022 | 0.39 | 0.39
PubMedBERT (Gu et al., 2022) | March 10, 2022 | 0.41 | 0.40
Codex 5-shot CoT (Liévin et al., 2022) | December 5, 2022 | 0.60 | 0.63

Leaderboard (with Context)

In the with Context setting, a retrieved context is joined by the [SEP] token to the concatenation of the question and each option, producing four input sequences per question (one per option):
[CLS] Context [SEP] Question [SEP] Option [SEP]
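
A sketch of the same packing with a context segment. BERT-style tokenizers natively pack only two segments, so the question and option are joined with a literal "[SEP]" string here, which maps to the special separator token in BERT vocabularies; this is an assumption about the packing, not necessarily the paper's exact preprocessing:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint

question = "Which vitamin deficiency causes scurvy?"
options = ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"]
context = "Scurvy results from a deficiency of ascorbic acid."  # illustrative retrieved passage

# [CLS] Context [SEP] Question [SEP] Option [SEP]
encodings = [tokenizer(context, f"{question} [SEP] {option}") for option in options]
print(tokenizer.decode(encodings[2]["input_ids"]))
```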

Model | Date | Test Set Acc | Dev Set Acc
----- | ---- | ------------ | -----------
BERT Base (Devlin et al., 2019) | March 10, 2022 | 0.37 | 0.35
BioBERT (Lee et al., 2020) | March 10, 2022 | 0.42 | 0.39
SciBERT (Beltagy et al., 2019) | March 10, 2022 | 0.43 | 0.41
PubMedBERT (Gu et al., 2022) | March 10, 2022 | 0.47 | 0.43
InstructGPT zero-shot CoT (Liévin et al., 2022) | July 17, 2022 | 0.49 | 0.49
VOD BioLinkBERT (Liévin et al., 2022) | September 23, 2022 | 0.58 | 0.63