Publications (selected)

An updated list should be on my Google Scholar page (Google typically does a better job than I do).


A Sarker, M Belousov, J Friedrichs, K Hakala, S Kiritchenko, F Mehryary, S Han, T Tran, A Rios, R Kavuluru, B de Bruijn, F Ginter, D Mahata, S Mohammad, G Nenadic, G Gonzalez-Hernandez. Data and systems for medication-related text classification and concept normalization from Twitter: Insights from the Social Media Mining for Health (SMM4H) 2017 shared task. Journal of the American Medical Informatics Association. [Article in press]

A Sarker, G Gonzalez-Hernandez. An unsupervised and customizable misspelling generator for mining noisy health-related text sources. arXiv preprint arXiv:1806.00910. [under review; Journal of Biomedical Informatics]

A Magge, D Weissenbacher, A Sarker, M Scotch, G Gonzalez-Hernandez. Deep neural networks and distant supervision for geographic location mention extraction. Bioinformatics 34 (13), i565-i573.

A Sarker, G Gonzalez, FJ DeRoos, LS Nelson, J Perrone. Toxicovigilance through social media: quantifying abuse-indicating information in Twitter data. EAPCCT. 56 (6), 454.

M Rouhizadeh, A Magge, A Klein, A Sarker, G Gonzalez. A Rule-based Approach to Determining Pregnancy Timeframe from Contextual Social Media Postings. In Proceedings of the 2018 International Conference on Digital Health, 16-20.


A Sarker, P Chandrashekar, A Magge, H Cai, AZ Klein, G Gonzalez. Discovering cohorts of pregnant women from social media for safety surveillance and analysis. Journal of Medical Internet Research (JMIR). DOI: 10.2196/jmir.8164 [Article in press]. [resources]

A Sarker. A Customizable Pipeline for Social Media Text Normalization. Social Network Analysis and Mining. DOI: 10.1007/s13278-017-0464-z. [resources]

G Gonzalez, A Sarker, K O’Connor, G Savova. Capturing the Patient’s Perspective: a Review of Advances in Natural Language Processing of Health-Related Text. IMIA Yearbook of Medical Informatics. 2017:214-27

A Sarker*, D Weissenbacher*, T Tahsin, M Scotch, G Gonzalez. Extracting geographic locations from the literature for virus phylogeography using supervised and distant supervision methods. Proceedings of AMIA TBI Symposium. 2017. [*equal contribution first authors].

A Sarker, G Gonzalez. HLP@UPenn at SemEval-2017 Task 4A: A simple, self-optimizing text classification system combining dense and sparse vectors. SemEval-2017. 640-643. 2017.

AZ Klein, A Sarker, M Rouhizadeh, K O’Connor, G Gonzalez. Detecting Personal Medication Intake in Twitter: An Annotated Corpus and Baseline Classification System. BioNLP. 2017. Pages 136-142. [resources]

A Sarker, G Gonzalez. A corpus for mining drug-related knowledge from Twitter chatter: Language models and their utilities. Data in Brief. Volume 10. Pages 121-131. DOI: 2017. [resources]

A Sarker, A Magge, A Sharma. Dermatologic Concerns Communicated Through Twitter. International Journal of Dermatology. doi:10.1111/ijd.13506. 2017.

A Sarker, D Malone, G Gonzalez. Authors’ Reply to Jouanjus and Colleagues’ Comment on “Social Media Mining for Toxicovigilance: Monitoring Prescription Medication Abuse from Twitter”. Drug Safety. Feb;40(2):187-188. doi: 10.1007/s40264-016-0498-6. 2017.


I Korkontzelos, A Nikfarjam, M Shardlow, A Sarker, S Ananiadou, G Gonzalez. Analysis of the effect of sentiment analysis on extracting adverse drug reactions from tweets and forum posts. Journal of Biomedical Informatics (JBI). Volume 62. Pages 148-158. 2016.

A Sarker, K O’Connor, R Ginn, M Scotch, K Smith, D Malone, G Gonzalez. Social Media Mining for Toxicovigilance: Automatic Monitoring of Prescription Medication Abuse from Twitter. Drug Safety. Pages 1-10. DOI: 10.1007/s40264-015-0379-4. 2016. [resources]

A Sarker, D Molla, C Paris. Query-oriented evidence extraction to support evidence-based medicine practice. Journal of Biomedical Informatics (JBI). Volume 59. Pages 169-184. DOI: 2016. [code]

R Sullivan, A Sarker, K O’Connor, A Goodin, M Karlsrud, G Gonzalez. Monitoring nutritional supplements: Challenges and promises of mining user comments for adverse events. Proceedings of the Pacific Symposium on Biocomputing. 2016.

A Sarker, G Gonzalez. DiegoLab16 at SemEval-2016 Task 4: Sentiment Analysis in Twitter using Centroids, Clusters and Sentiment Lexicons. SemEval-16. 214-219. 2016.

MJ Paul, A Sarker, J Brownstein, A Nikfarjam, M Scotch, K Smith, G Gonzalez. Social media mining for public health monitoring and surveillance. Proceedings of the Pacific Symposium on Biocomputing. 2016.

A Sarker, A Nikfarjam, G Gonzalez. Social Media Mining Shared Task Workshop. Proceedings of the Pacific Symposium on Biocomputing. 2016. [description] [task 1 page] [task 2 data]


D Molla, E Santiago-Martinez, A Sarker, C Paris. A Corpus for Research in Text Processing for Evidence Based Medicine. Language Resources and Evaluation. 2015. [resources]

A Sarker, R Ginn, A Nikfarjam, K O’Connor, K Smith, S Jayaraman, T Upadhaya, G Gonzalez. Utilizing social media data for pharmacovigilance: A review. Journal of Biomedical Informatics (JBI). Volume 54. Pages 202-212. DOI: 2015. [Editor’s choice; Nominated for ATLAS (September, 2015)] [resources]

A Sarker, D Molla, C Paris. Automatic evidence quality prediction to support evidence-based decision making. Artificial Intelligence in Medicine (AIIM). Volume 64. Pages 89-103. DOI: 2015. [corpus]

A Nikfarjam, A Sarker, K O’Connor, R Ginn, G Gonzalez. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. Journal of the American Medical Informatics Association (JAMIA). DOI: Pages: 0-11. 2015. [data/resources] [code]

A Sarker, A Nikfarjam, D Weissenbacher, G Gonzalez. DIEGOLab: An Approach for Message-level Sentiment Classification in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SEMEVAL). Pages: 510-514. 2015.


A Sarker, G Gonzalez. Portable automatic text classification for adverse drug reaction detection via multi-corpus training. Journal of Biomedical Informatics (JBI). Volume 53. Pages: 196-207. DOI: 10.1016/j.jbi.2014.11.002. 2014. [data/resources] [code]

K O’Connor, A Nikfarjam, R Ginn, P Pimpalkhute, A Sarker, K Smith, G Gonzalez. Pharmacovigilance on Twitter? Mining Tweets for Adverse Drug Reactions. In Proceedings of the American Medical Informatics Association Annual Symposium (AMIA). 2014.

R Ginn, P Pimpalkhute, A Nikfarjam, A Patki, K O’Connor, A Sarker, G Gonzalez. Mining Twitter for adverse drug reaction mentions: a corpus classification benchmark. In Proceedings of the Fourth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing (BIOTXTM). 2014. [resources]

D Molla, C Jones, A Sarker. Impact of citing papers for summarisation of clinical documents. In Proceedings of the Australasian Language Technology Association (ALTA) Workshop. Pages: 79-87. 2014.

A Patki, A Sarker, P Pimpalkhute, A Nikfarjam, R Ginn, K O’Connor, K Smith, G Gonzalez. Mining Adverse Drug Reaction Signals from Social Media: Going Beyond Extraction. In Proceedings of BiolinkSIG. 2014.

A Sarker. Automated Medical Text Summarisation to Support Evidence-based Medicine. PhD Thesis. Macquarie University. 2014.


A Sarker, D Molla, C Paris. An Approach for Query-Focused Text Summarisation for Evidence Based Medicine. Artificial Intelligence in Medicine (AIME). Lecture Notes in Computer Science. Volume: 7885. Publisher: Springer Berlin Heidelberg. Pages: 295-304. 2013.

A Sarker, D Molla, C Paris. Automatic Prediction of Evidence-based Recommendations via Sentence-level Polarity Classification. In Proceedings of the International Joint Conference on Natural Language Processing. Pages: 712-718. 2013.

A Sarker, D Molla, C Paris. An Approach for Automatic Multi-label Classification of Medical Sentences. In Proceedings of the 4th International Louhi Workshop on Health Document Text Mining and Information Analysis (LOUHI). 2013.

2012 and earlier 

A Sarker, D Molla, C Paris. Extractive evidence based medicine summarisation based on sentence-specific statistics. In Proceedings of Computer Based Medical Systems (CBMS). 2012.

A Sarker, D Molla, C Paris. Towards two-step multi-document summarisation for evidence based medicine: a quantitative analysis. In Proceedings of the Australasian Language Technology Association (ALTA) Workshop. Pages: 79-87. 2012.

A Sarker, D Molla, C Paris. Outcome Polarity Identification of Medical Papers. In Proceedings of the Australasian Language Technology Association (ALTA) Workshop. Pages: 105-114. 2011.

A Sarker, D Molla, C Paris. Towards Automatic Grading of Evidence. In Proceedings of the 4th International Louhi Workshop on Health Document Text Mining and Information Analysis (LOUHI). Pages: 51-58. 2011.

D Molla, A Sarker. Automatic grading of evidence: the 2011 ALTA shared task. In Proceedings of the Australasian Language Technology Association (ALTA) Workshop. Pages: 4-8. 2011.

A Sarker, D Molla. A rule based approach for automatic identification of publication types of medical papers. In Proceedings of the Australasian Document Computing Symposium (ADCS). 2010.

A Sarker, LGC Hamey. Improved reconstruction of flutter shutter images for motion blur reduction. In Proceedings of the International Conference on Digital Image Computing: Techniques and Applications (DICTA). Pages: 317-422. DOI: 10.1109/DICTA.2010.77. 2010. [BEST PAPER PRIZE]

A Sarker. Motion Blur Reduction from Captured Images. Macquarie University. 2010. [UNIVERSITY MEDAL RECIPIENT]


Unpublished (in progress):

Also working on: 

– domain adaptation techniques for social media-based text normalization (i.e., efficient normalization of medical tweets)

– deep learning-based concept similarity measurements for the UMLS

Please email me at: if you want to know more about these research tasks in progress.