Discriminative Boosted Bayesian Networks For Learning Multiple Cardiovascular Procedures
Nandini Ramanan
Dallas, Texas
We consider the problem of jointly predicting three procedures, namely EKG, angioplasty and valve replacement, from Electronic Health Records (EHR), and develop a discriminative boosted Bayesian network algorithm. Our approach differs from standard Bayes net structure learners in two ways: (1) we do not assume that the number of features (observations) is uniform across training examples, and (2) our method explicitly handles the precision-recall trade-off. Our empirical evaluation on real EHR data demonstrates the superiority of the proposed approach over learning these procedures individually ...
Project status: Published/In Market
Overview / Usage
We consider the problem of modeling multiple cardiovascular procedures jointly. Specifically, we consider three of the most common procedures: electrocardiogram (EKG), angioplasty and valve replacement. For example, consider a patient who enters a hospital. Our aim is to build a system that predicts whether these procedures will be performed on the patient, based on the patient's clinical measurements and behavioral data up to the time of admission. Our hypothesis, which we verify empirically, is that modeling these procedures jointly is more effective than modeling each of them separately. Another important advantage of predicting these procedures is that it enables the prediction of future medical costs for both the patient and the hospital, leading to a better allocation of monetary and hospital resources for the patient's treatment.
Methodology / Approach
We propose a discriminative structure learning algorithm in which the log-likelihood is factored into the individual conditional likelihoods of the target variables. Specifically, we assume two types of variables: observed variables or features (in our case, the clinical measurements and the behavioral data) and target variables (in our case, the cardiovascular procedures). Given these two separate sets of variables, we learn the conditional distribution of each target variable. To avoid cyclic dependencies in the model, we assume an ordering among the target variables. This is not a critical assumption because the number of target variables is typically small enough to search the space of all possible orderings efficiently. To learn each conditional distribution, we employ gradient boosting methods, which have recently proven successful. The advantage of these methods is that they simultaneously learn the qualitative dependencies (structure) and the quantitative dependencies (parameters) of the conditional distribution. Finally, our data is highly imbalanced, i.e., the number of patients on whom a given procedure is performed is much smaller than the number of patients who did not have that procedure. We therefore employ a cost-based log-likelihood function that allows us to balance the trade-off between precision and recall (and, relatedly, between sensitivity and specificity) in a principled manner.
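As a rough illustration of two of the ideas above, the following Java sketch enumerates the candidate orderings of the three target variables and computes the pointwise gradient of a class-weighted log-likelihood for one example. It is only a sketch under stated assumptions: the class name, the method names, and the simple per-class weights `posWeight` and `negWeight` are hypothetical and chosen for illustration, and the weighted objective shown here is a generic stand-in rather than the exact cost-based log-likelihood used in the published work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: names and the weighting scheme are assumptions,
// not the project's actual implementation.
public class CostSensitiveChainSketch {

    /** Logistic link: maps a boosted score psi to P(y = 1 | parents). */
    static double sigmoid(double psi) {
        return 1.0 / (1.0 + Math.exp(-psi));
    }

    /**
     * Gradient with respect to psi of the class-weighted log-likelihood
     *   posWeight * y * log(p) + negWeight * (1 - y) * log(1 - p),
     * where p = sigmoid(psi). Gradient boosting fits the next weak learner
     * to these pointwise gradients. A larger posWeight penalizes missed
     * positives more heavily, which pushes the model toward higher recall.
     */
    static double gradient(int y, double psi, double posWeight, double negWeight) {
        double p = sigmoid(psi);
        return y == 1 ? posWeight * (1.0 - p) : -negWeight * p;
    }

    /** Enumerates every ordering of the target variables. */
    static List<List<String>> orderings(List<String> targets) {
        List<List<String>> result = new ArrayList<>();
        permute(new ArrayList<>(targets), 0, result);
        return result;
    }

    // Standard in-place swap-based permutation generator.
    private static void permute(List<String> items, int start, List<List<String>> out) {
        if (start == items.size()) {
            out.add(new ArrayList<>(items));
            return;
        }
        for (int i = start; i < items.size(); i++) {
            Collections.swap(items, start, i);
            permute(items, start + 1, out);
            Collections.swap(items, start, i); // undo the swap before the next choice
        }
    }

    public static void main(String[] args) {
        List<String> targets = Arrays.asList("EKG", "Angioplasty", "ValveReplacement");

        // With three targets there are only 3! = 6 orderings to evaluate.
        // In an ordering, each target may depend on the features and on the
        // targets that precede it, so no cycles can arise.
        for (List<String> order : orderings(targets)) {
            System.out.println(order);
        }

        // At psi = 0 the model predicts p = 0.5 for both examples.
        System.out.println(gradient(1, 0.0, 5.0, 1.0)); // prints 2.5
        System.out.println(gradient(0, 0.0, 5.0, 1.0)); // prints -0.5
    }
}
```

With equal weights this reduces to the usual gradient `y - p` of the logistic log-likelihood. Raising the weight on the rare positive class is one simple way to trade precision for recall when a procedure is performed on only a small fraction of patients.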
Technologies Used
Java
Machine learning