In this paper, we evaluate word embedding models for a set of clinical notes derived from patient records, comparing a baseline model against two additional models. For the baseline, we extract embedding features using the Word2Vec module from the gensim package. We then train two models on the processed clinical notes: a word2vec skip-gram model with negative sampling and a positive pointwise mutual information (PPMI) model. Our evaluation shows that both the PPMI and skip-gram models give better results for medically related terms than the baseline, and that PPMI performs best of the three models.
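The PPMI model rests on a simple count-based formula: for a word w and a context word c, PPMI(w, c) = max(0, log2(P(w, c) / (P(w) P(c)))), with every probability estimated from co-occurrence counts. The minimal Python sketch below shows that computation on tokenized sentences. It makes assumptions the abstract does not state: a symmetric context window (default 2), no context-distribution smoothing, and scores kept only for observed pairs. The function names `cooccurrence_counts` and `ppmi` are hypothetical. This illustrates the technique and is not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cooccurrence_counts(sentences: List[List[str]], window: int = 2) -> Counter:
    """Count (word, context) pairs within a symmetric window around each token.

    Assumption: the window size is illustrative; the paper's setting is unknown.
    """
    counts: Counter = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


def ppmi(counts: Counter) -> Dict[Tuple[str, str], float]:
    """Return PPMI(w, c) = max(0, log2(P(w, c) / (P(w) * P(c)))) for observed pairs.

    With N total pair counts, P(w, c) / (P(w) * P(c)) simplifies to
    n(w, c) * N / (n(w) * n(c)), where n(w) and n(c) are marginal counts.
    """
    total = sum(counts.values())
    word_totals: Counter = Counter()
    context_totals: Counter = Counter()
    for (w, c), n in counts.items():
        word_totals[w] += n
        context_totals[c] += n
    scores: Dict[Tuple[str, str], float] = {}
    for (w, c), n in counts.items():
        pmi = math.log2(n * total / (word_totals[w] * context_totals[c]))
        scores[(w, c)] = max(0.0, pmi)  # clip negative PMI to zero
    return scores


# Usage on a toy (hypothetical) tokenized corpus:
notes = [["patient", "denies", "chest", "pain"],
         ["chest", "pain", "radiating", "to", "left", "arm"]]
scores = ppmi(cooccurrence_counts(notes, window=2))
```

Each word's row of PPMI scores over all context words serves as its sparse embedding vector. Clipping at zero discards the negative PMI values, which tend to be unreliable estimates from sparse counts.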
Hathaitorn Rojnirun, Oluseye Bankole