Comparison of Classification Methods used in Machine Learning for Dysgraphia Identification

Dysgraphia is a disorder that affects writing skills. Identifying Dysgraphia at an early stage of a child's development is difficult, but it can be detected through the problematic skills associated with the disorder. In this study, motor ability, space knowledge, copying skill, and visual-spatial response are among the features used for Dysgraphia identification. The features that contribute to Dysgraphia are analyzed using the Elastic Net (EN) feature selection technique, and the significant features are then classified using machine learning techniques. The classification models compared on the Dysgraphia dataset are K-Nearest Neighbors (KNN), Naïve Bayes, Decision Tree, Random Forest, and Support Vector Machine (SVM). The results indicate that the Random Forest model achieves the highest performance for Dysgraphia identification.


Introduction
Dysgraphia is a learning difficulty caused by a neurobiological disorder, and its symptoms can be analyzed through the problematic skills associated with it. In some cases Dysgraphia stems from Dyslexia, when a learner is unable to pronounce words properly and writes incorrect spellings [1]. Learners with Dysgraphia can show symptoms of weak motor skills, poor space knowledge, or incorrect spellings [2] [3]. Visual-motor integration is required to perform tasks related to writing [4]. Weakness in motor skills affects hand movements, which in turn causes illegible writing [5]; a lack of motor skills therefore adds indirectly to Dysgraphia symptoms. Another cause of Dysgraphia symptoms is a lack of space knowledge [3], [6]. A learner who writes outside the lines most of the time is showing a lack of space knowledge. A lack of cognition can result in Dyslexia, which indirectly contributes to Dysgraphia difficulties [7].
Figure 1 shows the skills that are related to Dysgraphia identification. Dysgraphia can be determined by analyzing motor ability and cognitive skills. Activities that focus on visual-motor integration can be used to detect motor capabilities, and space knowledge can also be used to assess motor skills. When Dysgraphia is due to Dyslexia, cognitive skills can be observed through spelling, sentence word expression, jumbled words, and handwriting legibility [21] [23]. Learners differ and can show a single symptom or a combination of symptoms of Dysgraphia [6]. It is therefore important to analyze which features contribute most to Dysgraphia identification. These features capture the learner characteristics and behavior needed to identify Dysgraphia. Identifying Dysgraphia is important for providing a learner-specific learning environment [28].
The Elastic Net (EN) feature selection method is applied to the Dysgraphia dataset, and the identification problem is then addressed with classification algorithms. The classification models are compared, and the model with the highest performance is selected for Dysgraphia identification. The work proceeds in three steps. First, features that can be used to identify Dysgraphia are collected. Second, these features are analyzed and the highly contributing ones are selected using EN. Third, the selected features are used to train KNN, Naïve Bayes, Decision Tree, Random Forest, and SVM models. The model with the highest performance is used for Dysgraphia identification.

Related Work
Learning difficulties and their severity in India have been discussed in many studies. Dysgraphia, specifically, is a learning difficulty that makes writing harder for affected learners than for learners without the disorder [8]. It is a neurobiological disorder that leads to poor writing skills and can result from weak motor skills, a lack of space knowledge, or low cognitive skills [9] [21]. Motor skills need to be developed at elementary school for better handwriting [10]. In early childhood, dot-connecting exercises help to improve underdeveloped muscular activity [5]. Space knowledge is a problematic skill for learners with Dysgraphia and results in illegible handwriting [10]. A weak visual-spatial response also makes writing difficult, and learners often confuse left and right [11]. Sentence structure and word expression are other problematic skills that contribute to Dysgraphia symptoms. Well-developed motor skills can improve a learner's writing capabilities, but how motor skills affect Dysgraphia severity has not been explored in previous studies [12].
Data mining has been used to understand the facts in data and to plan actions based on them [14]. Data analysis gives a clear picture of the facts in the data and of the correlations within it, which helps keep the data effective for prediction. Abundant scholarly data needs to be visualized properly before facts can be drawn from it. Feature selection has been used extensively to understand data and the relationships among features [14] [15] [24] [25]. Data from learners with and without Dysgraphia needs to be explored properly before machine learning and deep learning methods are applied.
Identification of Dyslexia, Dysgraphia, and Dyscalculia has been addressed using the fuzzy k-means clustering approach [13]. Deep learning has been used for writing difficulties and cognitive disabilities [16] and for spoken language understanding [17]. The machine learning models SVM, KNN, and Random Forest have been compared for Dyslexia, Dysgraphia, and Dyscalculia prediction, with all inputs extracted from game-based screening of these disabilities [19]. SVM and KNN results have also been evaluated and compared on accuracy; the models with high accuracy are then used in an ensemble model for the final results, and the performance of each individual model is compared with that of the ensemble [18]. One study proposes identifying Dyslexia, Dysgraphia, and Dyscalculia through a mobile application that analyzes handwriting and audio samples [20]. Dysgraphia has been predicted using 52 attributes extracted from handwriting (velocity, acceleration, etc.), with PCA used to visualize the attributes [21]. A machine learning model has also been used to predict developmental Dysgraphia in third-grade children: pen pressure, pen position, and pen lifts are captured with a digital writing pad, and 99 samples are given to the model as input.
The dataset used for Dysgraphia prediction in this study includes data from dot-connecting exercises, used to analyze the motor skills of learners with Dysgraphia, and writing samples, used to analyze legibility and space knowledge. Other parameters are based on skills related to sentence structure, sentence word expression, visual-motor integration, and visual-spatial relation. The pretest and handwritten content were collected as a questionnaire in a computer-based test. Score and time data from 240 learners, with and without a learning disability, are analyzed. Boolean input is taken for sentence structure, word formation, and visual-spatial response; handwritten content is analyzed using the image processing technique Structural Similarity Index Measure (SSIM); and spellings are checked with a spelling checker. Our previous study explains the extraction of features, their subtypes, and their mapping to Dysgraphia difficulties in detail [23].
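As an illustration of the SSIM comparison mentioned above, the following minimal sketch computes a single global SSIM value between two grayscale images given as flat lists of pixel intensities. It is a simplification under stated assumptions: the image pipeline of the study is not described here, the function name `global_ssim` is hypothetical, and practical SSIM implementations usually average the index over local windows instead of using one window for the whole image. The constants K1 = 0.01 and K2 = 0.03 are the conventional choices.

```python
from typing import Sequence


def global_ssim(img_a: Sequence[float], img_b: Sequence[float],
                dynamic_range: float = 255.0) -> float:
    """Single-window SSIM between two equally sized grayscale images.

    Hypothetical helper for illustration; images are flat sequences of
    pixel intensities in [0, dynamic_range].
    """
    if len(img_a) != len(img_b) or len(img_a) == 0:
        raise ValueError("images must be non-empty and the same size")

    n = len(img_a)
    mean_a = sum(img_a) / n
    mean_b = sum(img_b) / n

    # Population variances and covariance over the whole image.
    var_a = sum((a - mean_a) ** 2 for a in img_a) / n
    var_b = sum((b - mean_b) ** 2 for b in img_b) / n
    cov_ab = sum((a - mean_a) * (b - mean_b)
                 for a, b in zip(img_a, img_b)) / n

    # Conventional stabilising constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2

    numerator = (2 * mean_a * mean_b + c1) * (2 * cov_ab + c2)
    denominator = (mean_a ** 2 + mean_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


# Toy 2x2 "images": a reference pattern and its inverse (made-up data).
reference = [0, 255, 0, 255]
inverted = [255, 0, 255, 0]
same_score = global_ssim(reference, reference)   # identical images
diff_score = global_ssim(reference, inverted)    # structurally opposite
```

Identical images score 1, while the inverted pattern scores below zero because its covariance with the reference is negative. A legibility feature could be derived by comparing a learner's handwriting image with a reference sample in this way.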

Feature Selection and Classification
Features are selected using the EN feature selection technique, chosen for its performance in previous research [24] - [26]. The most important features are then used to train and test the classification models KNN, Naïve Bayes, Decision Tree, Random Forest, and SVM. Some of these models have been used for Dysgraphia prediction in previous studies [18] [19]. In this study, 80% of the data is used for training and the remaining 20% for testing. The five classification models are compared on two performance metrics: accuracy and the area under the ROC curve (AUC). Equation 1 defines accuracy:
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)
where TP (True Positives) are positive instances correctly predicted as positive, TN (True Negatives) are negative instances correctly predicted as negative, FP (False Positives) are negative instances incorrectly predicted as positive, and FN (False Negatives) are positive instances incorrectly predicted as negative. Feature selection and all classification models are implemented using the scikit-learn package in Python (version 3.6).
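The 80/20 split and the accuracy metric of Equation 1 can be sketched in plain Python as follows. This is only an illustrative sketch: the helper names `split_80_20` and `accuracy`, the fixed random seed, and the confusion-matrix counts are assumptions made for the example and are not values from the study, which used scikit-learn.

```python
import random
from typing import List, Sequence, Tuple


def split_80_20(samples: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Shuffle the samples and return (training 80%, testing 20%)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded for repeatability
    cut = len(shuffled) * 4 // 5            # integer 80% cut point
    return shuffled[:cut], shuffled[cut:]


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# 240 placeholder learner IDs, the number of learners mentioned earlier.
train_ids, test_ids = split_80_20(range(240))

# Made-up confusion counts for a 48-sample test set.
example_accuracy = accuracy(tp=20, tn=25, fp=2, fn=1)
```

With 240 samples this yields 192 training and 48 testing samples. In practice a stratified split, which keeps the class proportions equal in both parts, may be preferable when the classes are imbalanced.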

Results and Discussion
Feature selection is one way to reduce a dataset to its most informative variables. Here the data points considered include literacy skill, phonological awareness, visual-spatial relation, visual-motor integration, spellings, handwriting legibility, and short- and long-term memory. In total there are more than 20 data points, ranging from single data points such as literacy skill, phonological awareness, sentence word expression, word formation, addition, subtraction, reasoning, place value, direction, rhyming, basic mathematical skills, word problems, decoding, random naming, and spellings to derived data points such as visual-motor integration, handwriting legibility, and reading analysis. Feature extraction, feature scaling, and feature transformation are among the techniques for improving the accuracy of a data-based model. Among the various feature selection techniques available for feature analysis, this study uses Elastic Net (EN), an effective feature selection method for data exploration. The coefficient values of the analyzed features are shown in Figure 2. The highly contributing factors for Dysgraphia prediction are found to be legibility, motor skills (VMI_MS), visual-spatial response (VSR_LR_1), basic reading skills, literacy skills (LS_LI), and spellings. The feature selection process reveals that motor skill and space knowledge are the major contributing features, with high significance for Dysgraphia identification. The EN feature selection method reduces the dimensionality of the dataset by 15.97%, leaving sixteen features. These sixteen features are sorted from lowest to highest significance for Dysgraphia identification in Figure 2, and they are used to train the classification models KNN, Naïve Bayes, Decision Tree, Random Forest, and SVM.
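To make the Elastic Net selection step concrete, here is a minimal from-scratch sketch of cyclic coordinate descent on the objective (1/(2n)) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2, the form documented for scikit-learn's ElasticNet. Features whose coefficient is shrunk exactly to zero are dropped. The toy data, the values of `alpha` and `l1_ratio`, and the function names are assumptions for illustration; the hyperparameters used in the study are not stated, and the features are assumed to be centred so that no intercept is fitted.

```python
from typing import List, Sequence


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value towards zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X: Sequence[Sequence[float]], y: Sequence[float],
                alpha: float = 0.5, l1_ratio: float = 0.5,
                n_iter: int = 100) -> List[float]:
    """Fit Elastic Net coefficients by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores j.
            corr = 0.0
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                corr += X[i][j] * (y[i] - partial)
            corr /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = (soft_threshold(corr, alpha * l1_ratio)
                    / (scale + alpha * (1.0 - l1_ratio)))
    return w


# Toy data: the target depends only on feature 0; feature 1 is unrelated.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]

coefficients = elastic_net(X, y)
selected = [j for j, c in enumerate(coefficients) if c != 0.0]
```

In this toy case feature 1 is shrunk exactly to zero and would be discarded, while feature 0 survives with a shrunken coefficient of 1.4 instead of the unpenalised value of 2.0. Sorting the surviving coefficients by magnitude gives a ranking of the kind plotted in Figure 2.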
Table 1 reports the accuracy of the KNN, Naïve Bayes, Decision Tree, Random Forest, and SVM models; Random Forest has the highest accuracy. Accuracy changes when the models are trained after feature selection, which shows how feature selection can improve the overall accuracy of the trained models. Both accuracy and AUC improved after the EN feature selection method was applied.
The accuracy of every classification method increased by a few decimal points when the models were trained on the selected features only. Feature selection before classification therefore improved the overall efficacy of the classification models. Random Forest, KNN, and Decision Tree achieved nearly the same accuracy, with scores of 99.03%, 99.00%, and 99.00% respectively, while Naïve Bayes and SVM yielded 91.58% and 91.00%. Figure 3 shows that the AUC for the KNN, Decision Tree, and Random Forest ROC curves is higher than that for the Naïve Bayes and SVM ROC curves. The performance of KNN, Decision Tree, and Random Forest is thus comparable for Dysgraphia prediction, whereas Naïve Bayes and SVM perform worse.
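The AUC values compared above can be computed directly from classifier scores. The sketch below uses the rank interpretation of AUC: the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the labels and scores are made up for illustration and are not the data of the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0      # positive ranked above negative
            elif pos == neg:
                wins += 0.5      # a tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical test labels (1 = Dysgraphia) and predicted scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
```

Here three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75. A value of 1.0 means perfect separation and 0.5 corresponds to random scoring, which is why a higher AUC in Figure 3 indicates a better classifier.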
The results indicate that most of the classification algorithms perform comparably on the dataset for Dysgraphia identification. KNN, Decision Tree, and Random Forest predicted the correct output more often than Naïve Bayes and SVM. Training the models on the high-significance features selected with Elastic Net slightly improved their accuracy. A limitation of this study is that no deep learning approach is discussed or compared, so it is not known how deep learning would affect accuracy on the same dataset and selected features relative to the machine learning models. Moreover, the Dysgraphia dataset could be improved by integrating a CNN for extracting data from learners with Dysgraphia.

Conclusion
In this study, Dysgraphia has been analyzed using the EN feature selection method. The analysis shows that motor skills affect Dysgraphia severity and contribute majorly to the development of Dysgraphia from the early years of child development. Other features, such as cognition and space knowledge, also contribute to Dysgraphia severity in children. Classification models used in previous studies were compared for Dysgraphia identification, and Random Forest yielded the highest accuracy for Dysgraphia prediction at 99.03%. The dataset could be enhanced by integrating IoT devices to collect real-time data, and the classification process should be extended by comparing deep learning approaches in further studies.