Government statisticians have long been called upon to embrace AI, albeit with caution and calibration
With machines now capable of cognitive processes such as thinking, perceiving, learning, problem solving and decision making, coupled with advances in data collection and aggregation, analytics, and computer processing power, AI offers opportunities to complement and supplement human intelligence and to enrich the way people live and work.
In 2018, NITI Aayog developed a National Strategy for Artificial Intelligence, titled #AIforAll. This strategy aims to ensure the safe and responsible use of AI. It also focuses on how India can leverage AI for social inclusion and economic growth. The field of AI is vast and includes Machine Learning (ML), Natural Language Processing, Large Language Models, Cognitive Computing, and more. Its usage also varies with the field in which it is applied. The Official Statistical System, one of the tools of governance, is a field where the call to embrace AI has been heard, but the handshake is yet to take place.
The collection of data is one of the most important ingredients of Official Statistics. Data is collected mostly through censuses and surveys, and as a by-product of administrative action, such as the data generated in the course of tax collection. In censuses and surveys, two of the biggest problems are poor data quality and non-response, which leads to missing data. With the help of AI, the tasks of detecting missing and problematic data and of imputation, i.e. correcting incorrect values and filling in missing ones, can be done much faster by applying machine learning (ML) models.
Another possible use of AI is a Large Language Model-based chatbot in the data collection process. Such a chatbot, like ChatGPT, would give the enumerator instant answers drawn from the fieldwork manual. It would save enumerators the trouble of going through voluminous documents time and again, which adds drudgery and monotony to fieldwork. Another use case, in a country like ours with more than twenty official languages, is an AI-based utility that supports two-way conversation between the enumerator and the respondent; it would go a long way towards resolving the language barrier faced by our data warriors in the field. Further, multi-modal bots with the capacity to read images would be of great help, as they could give the enumerator immediate clarification whenever confusion about a definition arises during data collection.
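To make the imputation idea concrete, here is a minimal sketch of one simple ML-style approach: nearest-neighbour imputation, in which a missing value is filled with the average of the most similar complete records. It is an illustration under stated assumptions, not a description of any method in use in the Official Statistical System. The function name `knn_impute`, the field names and the household figures are all hypothetical, and the features are left unscaled for brevity.

```python
from statistics import mean


def knn_impute(records, target, features, k=2):
    """Fill missing `target` values with the mean of the k most similar complete records."""
    complete = [r for r in records if r[target] is not None]
    filled = []
    for r in records:
        if r[target] is not None:
            filled.append(dict(r))  # copy reported values unchanged
            continue
        # Rank complete records by squared Euclidean distance on the feature columns.
        nearest = sorted(
            complete,
            key=lambda c: sum((c[f] - r[f]) ** 2 for f in features),
        )[:k]
        # Flag the imputed value so it can be told apart from a reported one.
        filled.append({**r, target: mean(n[target] for n in nearest), "imputed": True})
    return filled


# Hypothetical survey records; None marks item non-response.
households = [
    {"size": 2, "rooms": 1, "expenditure": 8000},
    {"size": 3, "rooms": 2, "expenditure": 12000},
    {"size": 5, "rooms": 3, "expenditure": 20000},
    {"size": 6, "rooms": 3, "expenditure": 24000},
    {"size": 3, "rooms": 1, "expenditure": None},
]
result = knn_impute(households, "expenditure", ["size", "rooms"], k=2)
```

In this sketch the last household borrows its expenditure from the two households closest to it in size and number of rooms. A production system would also need to scale the features, validate the imputed values and keep an audit trail of every change.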
Another potential use of AI, especially ML, is the classification and coding of textual data for official statistics. In the coding and classification work of statistical organisations, the input is typically a text or narrative provided by a respondent in a survey or drawn from an administrative data source. For example, it could describe an individual's occupation or the economic activity of a company listed in an administrative business register. With the increasing use of new data sources, the text data that statistical organisations work with could also include product descriptions scraped from the internet or text posts obtained from social media platforms such as Twitter. The aim of classification and coding in this scenario is to assign the descriptions to categories of international or corporate statistical classification systems, such as the Standard Industrial Classification (SIC) and the Standard Occupational Classification (SOC), for easy understanding and quick analysis.
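As a rough illustration of automated coding, the sketch below assigns a free-text occupation description the code of the most similar already-coded example, and refers low-confidence cases to a human coder. It uses simple word overlap (Jaccard similarity) in place of a trained ML model. The labels such as `TEACHER` are made-up placeholders, not real SOC or SIC codes, and the function names and the threshold are assumptions chosen for the example.

```python
import re


def tokens(text):
    """Lower-case a description and split it into a set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of tokens two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_code(description, coded_examples, threshold=0.2):
    """Return (code, score) of the most similar example, or (None, score) for manual review."""
    query = tokens(description)
    best_code, best_score = None, 0.0
    for text, code in coded_examples:
        score = jaccard(query, tokens(text))
        if score > best_score:
            best_code, best_score = code, score
    if best_score < threshold:
        return None, best_score  # too uncertain: leave for a human coder
    return best_code, best_score


# Hypothetical, previously coded responses; labels are placeholders, not real codes.
coded_examples = [
    ("teaches mathematics in a primary school", "TEACHER"),
    ("grows rice and wheat on own farm", "FARMER"),
    ("drives a taxi in the city", "DRIVER"),
]
code, score = assign_code("primary school teacher of mathematics", coded_examples)
unmatched_code, _ = assign_code("sells vegetables at the market", coded_examples)
```

The threshold reflects the partial automation that suits official statistics: confident matches are coded automatically, while ambiguous descriptions are left for manual coding.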
Image data is still a relatively new type of data for statistical organisations, but there is a growing need to explore how it can be used for the production of statistics. ML models trained on labelled images could associate the variables of interest (e.g., building type, land cover type) with the images in the training data set and classify new images with reasonable levels of accuracy. Automation (or partial automation) of these tasks can allow the large volume of information in image data to be processed reliably and quickly.
The cost of deployment, including the computation required, is one of the major factors affecting the adoption of AI in an organisation. The other main challenge for statistical organisations is bringing together the required skills, as AI needs people with domain expertise, computer science and statistics backgrounds working together on a project for its meaningful use.
The alignment of AI use cases with the vision of the organisation is the most important aspect of the adoption of AI. The development of AI should not be an ad hoc process but part of a well-thought-out strategy that has the support of top management and the confidence of all stakeholders, who should see it as a tool for reducing their burden rather than replacing them or their work.
(The writers are Deputy Directors General in the Ministry of Statistics & Programme Implementation; views are personal)