From Data to Diagnosis: Knowledge-Driven, Explainable AI for Reliable Early Autism Detection

Qusai Shambour, Mahran Al-Zyoud, AbdelRahman Hussein
Interdisciplinary Journal of Information, Knowledge, and Management  •  Volume 20  •  2025  •  pp. 032

The primary aim of this study is to address the persistent challenge of delayed autism spectrum disorder (ASD) diagnosis in toddlers. Early detection enables timely interventions that can improve developmental outcomes; however, conventional approaches rely on lengthy and resource-intensive behavioral assessments. We therefore introduce an interpretable AI screening framework designed to accelerate ASD triage while providing clinically understandable rationales to support decision-making.

Traditional ASD diagnosis depends on expert behavioral evaluation and parent reports which, despite their value, are time-consuming and capacity-limited, delaying access to early intervention. With ASD prevalence rising, scalable and effective approaches are urgently needed. This study proposes a robust AI framework for early ASD detection that integrates targeted preprocessing, feature selection, principled model optimization, and post-hoc explanations, aiming to improve diagnostic utility and clarity for end users in clinical and community settings.

We develop a unified, reproducible pipeline that combines data preprocessing, class balancing, feature selection, and Bayesian hyperparameter tuning. The pipeline also incorporates SHapley Additive exPlanations (SHAP) to provide model explanations. Six diverse machine learning models – Extreme Gradient Boosting (XGB), Histogram-based Gradient Boosting (HGB), Random Forest (RF), Naïve Bayes (NB), Mixture Discriminant Analysis (MDA), and Multi-layer Perceptron (MLP) – are evaluated to assess framework robustness rather than to crown a single best classifier. A cross-cultural dataset of toddlers aged 12–36 months (n=1,560) is constructed by merging two public sources containing Q-CHAT-10 items with demographic attributes. Preprocessing removes non-informative variables and encodes categorical features; Gaussian noise-based upsampling (GNUS) mitigates post-merge imbalance; RobustScaler stabilizes training. Gradient Boosting Feature Selection (GBFS) ranks and reduces features to enhance parsimony and interpretability. Performance is reported via accuracy, precision, recall, F1, and Matthews Correlation Coefficient (MCC). Model behavior is elucidated with SHAP to reveal feature contributions and decision pathways.
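The pipeline steps above can be sketched end-to-end on synthetic stand-in data. This is an illustrative reconstruction, not the authors' code: `gnus_upsample`, the noise level `sigma`, and `k_features` are invented names and settings, and a generic scikit-learn gradient-boosting model stands in for the tuned XGB/HGB classifiers.

```python
# Hedged sketch of the described pipeline: Gaussian noise-based upsampling
# (GNUS), RobustScaler normalization, gradient-boosting feature selection
# (GBFS), and evaluation with F1 and MCC. All names and parameters here
# are illustrative assumptions, not taken from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import RobustScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, matthews_corrcoef

rng = np.random.default_rng(0)

def gnus_upsample(X, y, sigma=0.05):
    """Replicate minority-class rows with small additive Gaussian noise
    until the two classes are balanced (an assumed reading of GNUS)."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    need = counts.max() - counts.min()
    if need == 0:
        return X, y
    idx = rng.choice(np.flatnonzero(y == minority), size=need, replace=True)
    noisy = X[idx] + rng.normal(0.0, sigma, size=(need, X.shape[1]))
    return np.vstack([X, noisy]), np.concatenate([y, y[idx]])

# Synthetic imbalanced data standing in for the merged toddler dataset
X, y = make_classification(n_samples=1560, n_features=15,
                           weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Balance, then scale robustly to outliers
X_tr, y_tr = gnus_upsample(X_tr, y_tr)
scaler = RobustScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# GBFS-style selection: rank features by gradient-boosting importance,
# keep the top k (k chosen arbitrarily here)
ranker = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
k_features = 10
top = np.argsort(ranker.feature_importances_)[::-1][:k_features]

clf = GradientBoostingClassifier(random_state=0).fit(X_tr[:, top], y_tr)
pred = clf.predict(X_te[:, top])
print(f"F1={f1_score(y_te, pred):.3f}  MCC={matthews_corrcoef(y_te, pred):.3f}")
```

In the paper this skeleton is additionally wrapped in Bayesian hyperparameter tuning and cross-validated over six model families; the sketch fixes one model and default hyperparameters for brevity.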

This work presents an interpretable AI framework for early ASD detection that couples performance with clinician-oriented explanation in a single pipeline. Rather than optimizing for accuracy alone, we emphasize synergy among preprocessing, balancing, feature selection, and explanation; the multi-model evaluation provides evidence of adaptability across algorithmic families. GBFS and SHAP together ensure concise, explainable predictions. Notably, the framework achieved very strong internal validation results (high F1 and MCC across folds) with XGB, while SHAP-derived patterns aligned with clinical heuristics. Results are promising but preliminary, pending external, multi-site validation.

GNUS and robust normalization improved generalization on the cross-cultural dataset. With GBFS-selected features, XGB achieved near-ceiling internal scores across key metrics, a trend observed – though to a lesser extent – in other models after comparable optimization. SHAP consistently highlighted behaviors such as gaze-following and social/emotional responsiveness among the most influential predictors, in line with clinical practice. Collectively, the findings indicate that interpretable ML can complement conventional screening, while warranting prospective and external validation to assess generalizability and potential dataset shift.
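The paper applies SHAP to the trained tree ensembles; as a self-contained illustration of the underlying Shapley attribution (not the authors' implementation), the sketch below enumerates feature coalitions exactly for a tiny invented linear model with a mean-value baseline. The weights, means, and the single "toddler" input are all made up; the point is only that each feature's attribution measures its marginal contribution to the prediction.

```python
# Toy exact-Shapley attribution for a 3-feature linear model. For a linear
# model with a feature-mean baseline, feature i's Shapley value reduces to
# w_i * (x_i - mean_i). All numbers below are invented for illustration.
import itertools
from math import factorial
import numpy as np

w = np.array([0.8, 0.5, -0.3])    # invented model weights
mean = np.array([0.4, 0.5, 0.6])  # background (baseline) feature means
x = np.array([1.0, 1.0, 0.0])     # one invented screening response vector

def value(subset):
    """Model output with features in `subset` set to x's values and the
    remaining features held at their background means."""
    z = mean.copy()
    for i in subset:
        z[i] = x[i]
    return float(w @ z)

def shapley(i, n=3):
    """Exact Shapley value of feature i by enumerating all coalitions."""
    phi = 0.0
    others = [j for j in range(n) if j != i]
    for r in range(n):
        for s in itertools.combinations(others, r):
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            phi += weight * (value(s + (i,)) - value(s))
    return phi

phi = np.array([shapley(i) for i in range(3)])
print(phi)  # equals w * (x - mean) for this linear model
```

Real SHAP libraries avoid this exponential enumeration (e.g. with tree-specific algorithms), but the attributions they report have the same coalition-based interpretation, which is what lets clinicians read them as per-feature contributions to a screening decision.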

Clinicians and community programs may consider adopting interpretable ML as a screening aid to prioritize referrals and shorten time-to-assessment. Attention to features repeatedly identified as influential can guide focused early interventions and resource allocation.

Future studies should test the framework on larger and more diverse cohorts to evaluate generalizability. Exploring ensembles and deeper architectures, as well as alternative preprocessing, resampling, and feature selection strategies, may further enhance performance, particularly for borderline cases.

Earlier, more reliable screening can improve access to services during critical neurodevelopmental windows. Integrating interpretable AI into practice may also strengthen clinician confidence in ML-assisted tools, supporting responsible, human-centered deployment and broader public health benefits.

Next steps include conducting real-world pilots across various clinical/community settings, integrating with complementary diagnostic tools to build multimodal platforms, and systematically evaluating balancing/optimization choices. These directions will help translate the framework into practical impact and inspire analogous applications in pediatric neurodevelopmental assessment.

Keywords: data-to-diagnosis, early diagnosis, autism spectrum disorder, machine learning, explainable artificial intelligence, interpretable AI, knowledge-driven screening