Friday, December 10, 2021
12:30pm - 2:30pm
Master's Presentation
Data discretization with parallel computing and its effect on model performance

Discrete values play an essential role in data mining and knowledge discovery. Attribute discretization aims to find concise data representations that are adequate for the learning task while retaining as much of the information in the original continuous attribute as possible. Discrete values correspond to intervals of numbers, which are easier to use and comprehend because they are intuitively closer to a knowledge-level representation than continuous values. Many studies show that predictive tasks can benefit from discretization: rules with discrete values are usually shorter, and discretization can improve predictive accuracy. Discretization techniques can also address data issues such as missing values and outliers while maintaining interpretability from a modeling perspective. This work investigates different discretization methods, focusing on their practical implementation in predictive modeling for the small business and consumer lending industry. We showcase the most commonly used algorithms and compare their efficiency on a real-life dataset, measuring both quality of prediction and computational cost. We also explore the most efficient method within a parallel computational framework.
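As background for the abstract above, a minimal sketch of two widely used unsupervised discretization schemes, equal-width and equal-frequency binning, which map a continuous attribute to interval indices. The function names and the toy income values are illustrative, not taken from the talk:

```python
import numpy as np

def equal_width_bins(values, n_bins):
    """Equal-width discretization: split the value range into n_bins
    intervals of identical width and map each value to its bin index."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # digitize against the interior edges; clip keeps the maximum in the last bin
    return np.clip(np.digitize(values, edges[1:-1]), 0, n_bins - 1)

def equal_frequency_bins(values, n_bins):
    """Equal-frequency (quantile) discretization: each bin holds roughly
    the same number of observations, so it is robust to skew and outliers."""
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1))
    return np.clip(np.digitize(values, edges[1:-1]), 0, n_bins - 1)

# A skewed attribute (e.g. income) shows the difference between the two:
income = [10, 12, 11, 13, 200, 210]
print(list(equal_width_bins(income, 2)))      # outliers dominate the bin edges
print(list(equal_frequency_bins(income, 2)))  # each bin gets half the values
```

Supervised methods (e.g. entropy-based splitting) instead choose cut points using the class label; the comparison of such algorithms on predictive quality and computational cost is the subject of the talk.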
Speaker: Andrei Matveev
Location: Zoom (see email)
