Assessing the role of granularity upon decision tree complexity and accuracy in predicting nursing home deficiencies

The quality of a nursing home depends not only on having a vast amount of information but also on the ability to comprehend and represent the core features that can cause a particular deficiency. Decision trees are a tool that can create a rule-based representation of the features and their causal effects. Rough Sets, on the other hand, provide mechanisms to extract significant features and to identify contradictory rules in a data set. However, little use has been made of these techniques in nursing home research; therefore, this study explores the relationships between feature granulation and decision tree comprehensibility and accuracy in the prediction of deficiencies. After cleaning data taken from the U.S. Medicare website, three forms of granulation were performed: attribute grouping, removal of insignificant attributes, and, finally, increasing data consistency by removing contradictory cases. The study found that attribute grouping decreased the tree size, whereas removing insignificant attributes decreased both tree complexity and data consistency. In addition, a positive correlation was discovered between the removal of insignificant attributes and the prediction error. Lastly, increasing the data consistency showed only a negative correlation with the prediction error. Therefore, obtaining a model that is both comprehensible and accurate requires a balance between attribute binning, attribute removal, and data consistency.
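
To make two of the granulation steps concrete, the following is a minimal sketch and not the study's implementation. It assumes that attribute grouping means binning a numeric attribute into intervals, and that a case is contradictory when another case has identical condition attributes but a different decision, which is the usual Rough Sets notion of inconsistency. The attribute names (beds, staffing, deficient), the cut points, and the four records are hypothetical and are not taken from the Medicare data.

```python
from collections import defaultdict


def bin_value(value, cut_points):
    """Map a numeric value to the index of the interval it falls in."""
    return sum(value >= c for c in sorted(cut_points))


def decisions_by_condition(rows, condition_keys, decision_key):
    """Collect the set of decisions observed for each combination of condition values."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[k] for k in condition_keys)].add(row[decision_key])
    return groups


def consistency(rows, condition_keys, decision_key):
    """Fraction of cases whose condition values map to exactly one decision."""
    groups = decisions_by_condition(rows, condition_keys, decision_key)
    consistent = sum(
        1 for row in rows
        if len(groups[tuple(row[k] for k in condition_keys)]) == 1
    )
    return consistent / len(rows)


def drop_contradictory(rows, condition_keys, decision_key):
    """Keep only cases whose condition values never occur with a different decision."""
    groups = decisions_by_condition(rows, condition_keys, decision_key)
    return [
        row for row in rows
        if len(groups[tuple(row[k] for k in condition_keys)]) == 1
    ]


# Hypothetical toy records; not real nursing home data.
CONDITIONS = ["beds", "staffing"]
DECISION = "deficient"
raw = [
    {"beds": 40, "staffing": "low", "deficient": True},
    {"beds": 45, "staffing": "low", "deficient": False},
    {"beds": 120, "staffing": "high", "deficient": False},
    {"beds": 130, "staffing": "high", "deficient": False},
]

# Attribute grouping: replace the raw bed count with a coarse interval index.
binned = [{**row, "beds": bin_value(row["beds"], [50, 100])} for row in raw]

# Increasing consistency: discard cases that contradict one another after binning.
cleaned = drop_contradictory(binned, CONDITIONS, DECISION)

print(consistency(raw, CONDITIONS, DECISION))
print(consistency(binned, CONDITIONS, DECISION))
print(consistency(cleaned, CONDITIONS, DECISION))
```

In this toy data, binning makes the first two records indistinguishable on their condition attributes while their decisions differ, so coarser granules lower consistency until the contradictory cases are dropped. Whether that trade improves a tree's accuracy depends on the data, which is the balance the study examines.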
