Deep Learning-Based Cost-Effective and Responsive Robot for Autism Treatment

  1. Singh, Aditya 4
  2. Raj, Kislay 1
  3. Kumar, Teerath 1
  4. Verma, Swapnil 2
  5. Roy, Arunabha M. 3
  6. González Aguilera, Diego 5
  1. SFI for Research Training in Artificial Intelligence, Dublin City University, D09 Dublin, Ireland
  2. United Kingdom Atomic Energy Authority, Abingdon OX14 3DB, UK
  3. Aerospace Engineering Department, University of Michigan, Ann Arbor, MI 48109, USA
  4. Center of Intelligent Robotics, Indian Institute of Information Technology, Allahabad 211015, India
  5. Universidad de Salamanca, Salamanca, Spain



ISSN: 2504-446X

Year of publication: 2023

Volume: 7

Issue: 2

Pages: 81

Type: Article

DOI: 10.3390/DRONES7020081 (open access)

Journal: Drones


Recent studies report that people with autism spectrum disorder often learn and improve more readily in environments that involve technological tools. A robot is an excellent tool for therapy and teaching: it can transform teaching methods not only in classrooms but also in in-house clinical practice. With rapid advances in deep learning techniques, robots have become more capable of handling human behaviour. In this paper, we present a cost-efficient, socially designed robot called ‘Tinku’, developed to assist in teaching children with special needs. ‘Tinku’ is low cost yet feature-rich and can produce human-like expressions. Its design is inspired by the widely accepted animated character ‘WALL-E’. Its capabilities include offline speech processing and computer vision (using lightweight object detection models such as YOLO v3-tiny and the single shot detector, SSD) for obstacle avoidance, non-verbal communication, and expressing emotions in an anthropomorphic way. It uses an onboard deep learning technique to localize the objects in the scene and uses that information for semantic perception. We have developed several training lessons that use these features, and a sample lesson about brushing is discussed to show the robot’s capabilities. Tinku is appealing in appearance and rich in features, and its management of all these processes is impressive. It was developed under the supervision of clinical experts, and the conditions for its application were taken into account. A small survey on its appearance is also discussed. More importantly, it was tested with young children for acceptance of the technology and for compatibility in terms of voice interaction. It helps autistic children using state-of-the-art deep learning models. Autism spectrum disorders are increasingly being identified in today’s world. Studies show that children tend to interact with technology more comfortably than with a human instructor. To meet this demand, we present a cost-effective solution in the form of a robot with some common lessons for training a child with autism.
