Research

My research interests lie at the intersection of computational linguistics, social media mining, applied machine learning, and biomedical sciences. Most of my recent work has been in health-related natural language processing. My research involves health-related language data from various sources, such as published literature, clinical records, and social media. I am interested in solving both big data and small data problems.

The following is a brief description of my current and past projects.

See the Publications page for a list of my recent publications.

I also publish code via Bitbucket, and data and other resources through outlets such as Mendeley.

[This page is being updated… 11/1/2017]
<!–

Selected Current Project(s):

Mining social network postings for mentions of potential adverse drug reactions

The goal of this project is to deploy the infrastructure needed to generate signals of adverse drug reactions from social media posts. The overall pipeline for the task includes several sub-systems, each containing one or more modules. A simplified pipeline, starting from data collection and ending in a visual representation of the generated signals, is shown below.

[Figure: simplified pipeline, from data collection to visual representation of generated signals]

This project is funded by the National Institutes of Health (NIH).

URL: http://projectreporter.nih.gov/project_info_description.cfm?projectnumber=1R01LM011176-01
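The signal-generation stage of such a pipeline can be illustrated with a minimal sketch. It assumes that upstream modules have already reduced each post to the drug it mentions and the adverse reactions detected in it (the `Post` schema, the `prr` function, and all drug and reaction names are hypothetical). It scores a drug-reaction pair with the proportional reporting ratio (PRR), a standard disproportionality measure from spontaneous-report pharmacovigilance. This is an illustrative stand-in, not necessarily the statistic used in this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical output of upstream NLP modules for one post."""
    drug: str
    reactions: frozenset


def prr(posts, drug, reaction):
    """Proportional reporting ratio for a (drug, reaction) pair.

    Counts over drug-mentioning posts form a 2x2 table:
      a = posts about `drug` mentioning `reaction`
      b = posts about `drug` not mentioning it
      c = posts about other drugs mentioning `reaction`
      d = posts about other drugs not mentioning it
    PRR = (a / (a + b)) / (c / (c + d)); returns None when undefined.
    """
    a = b = c = d = 0
    for post in posts:
        mentioned = reaction in post.reactions
        if post.drug == drug:
            if mentioned:
                a += 1
            else:
                b += 1
        elif mentioned:
            c += 1
        else:
            d += 1
    if a + b == 0 or c == 0:
        return None  # no posts about the drug, or a zero comparator rate
    return (a / (a + b)) / (c / (c + d))


# Toy data, invented for illustration (not project data)
posts = [
    Post("drug_a", frozenset({"nausea"})),
    Post("drug_a", frozenset({"nausea", "headache"})),
    Post("drug_a", frozenset()),
    Post("drug_a", frozenset({"headache"})),
    Post("drug_b", frozenset({"nausea"})),
    Post("drug_b", frozenset()),
    Post("drug_b", frozenset()),
    Post("drug_b", frozenset()),
]
print(prr(posts, "drug_a", "nausea"))  # (2/4) / (1/4) = 2.0
```

A PRR well above 1 means the reaction is mentioned disproportionately often alongside the drug. In practice, a screening threshold (commonly a PRR of at least 2 plus a minimum number of cases) is applied before a pair is treated as a signal worth reviewing.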

Social media based prospective observational cohort study to assess drug-safety during pregnancy

This project focuses on developing an innovative, data-centric, social media-oriented approach to complement the knowledge gathered from pregnancy registries regarding drug exposure during pregnancy. The key innovation of this project is the design of a prospective observational cohort study from social media data, in which cohort enrollment, longitudinal data collection, and outcome analysis are all automated. If successful, such an approach will enable the systematic analysis of data from cohorts with millions of members (and billions of posts). The following diagram provides a simplified view of the pipeline for this project.

[Figure: simplified view of the project pipeline]
–>

Mining social network postings for monitoring prescription medication abuse

Prescription medication abuse and overdose constitute the fastest-growing drug-related problem in the USA. The growing scale of this problem calls for improved monitoring strategies for investigating the prevalence and patterns of abuse of specific medications. The primary aims of this project are to assess whether social media can be used as a resource for automatically monitoring prescription medication abuse, and to devise automatic techniques for identifying and assessing the extent of abuse of various prescription medications.

[Figure: cohort pipeline (draft diagram)]

Text summarization for evidence-based medicine

The goal of this project is to develop algorithms for query-focused text summarization of published medical literature for evidence-based medicine. The specific tasks are: (i) automated classification of the quality of medical evidence, (ii) single-document, query-focused, extractive summarization, and (iii) multi-document summarization via sentence-level polarity classification. The following diagram illustrates these tasks and the overall workflow of the developed system.

[Figure: tasks and overall workflow of the summarization system]

URL: http://web.science.mq.edu.au/~diego/medicalnlp/
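The single-document, query-focused extractive step (task ii above) can be sketched with a generic baseline. The sketch splits a document into sentences, weights terms with TF-IDF computed over those sentences, and keeps the k sentences most similar to the query by cosine similarity, in their original order. This common baseline is chosen only for illustration. The function names and the example abstract are hypothetical, and it is not a description of the system linked above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(document, query, k=2):
    """Return the k sentences most similar to the query, in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    counts = [Counter(tokenize(s)) for s in sentences]
    n = len(counts)

    # Smoothed IDF, treating each sentence as a "document"
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}

    def weight(c):
        # Query terms absent from the document get weight 0
        return {t: tf * idf.get(t, 0.0) for t, tf in c.items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = weight(Counter(tokenize(query)))
    scores = [cosine(weight(c), q) for c in counts]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # restore original order


# Toy abstract, invented for illustration
doc = ("Aspirin reduced the risk of stroke in older adults. "
       "The trial enrolled 500 participants. "
       "Bleeding events were more common with aspirin. "
       "Funding came from a public agency.")
print(summarize(doc, "aspirin stroke risk", k=2))
```

On the toy abstract, the query "aspirin stroke risk" selects the first and third sentences. Restoring document order after ranking keeps the extract readable. A system targeting evidence-based medicine would likely add domain features (for example, evidence-quality labels from task i) on top of plain lexical similarity.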