Q&A with Snehamoy Mukherjee: Disrupting Life Sciences Analytics with Data Democratization and Industrialization


The life sciences sector is undergoing a massive boom in health data availability due to the rapid and widespread adoption of AI&ML tech. However, the flow of this data is still heavily restricted due to a multitude of reasons. In order to effectively reap the full benefits of data to drive improvements in health care delivery, it is essential that organizations democratize their data. In this Q&A, Snehamoy Mukherjee, Partner – Delivery at TheMathCompany, draws from his expertise and experience to provide a broad explanation of data democratization in the life sciences sector and why it will be key to delivering healthcare services that are more effective and people-centric.

1. Let's start with the fundamentals: how can using analytics in the life sciences industry aid in the goal of becoming patient-centric?

Ans. Had John Lennon, the man who immortalized the lyrics “Imagine all the people, sharing all the world”, been alive today, he would have asked us to do the same with data to help improve the lives of patients. For that reason, I cannot think of a better brand ambassador than him for the cause of introducing data democratization in the life sciences sector.
Imagine receiving diagnoses and prescription recommendations at hospitals the way you receive product suggestions on Amazon. This is exactly what sophisticated AI models will do with data. With the help of these algorithms, healthcare professionals can efficiently and accurately diagnose diseases and prescribe medications [1] at custom dosages for patients based on their unique requirements, instead of following the traditional, one-size-fits-all approach. Think of such models as a ChatGPT for doctors! [2]

2. While the life sciences data industry has been witnessing exponential growth, restrictions on data sharing have been cited as the main obstacle to getting the most out of healthcare data. To what extent is this true? Why is this so?

Ans. In 2008, Google deployed software named Google Health, with the aim of collecting data from patients and creating the single largest repository of patient-centric data in the world. This information would include patients’ medication history, existing health conditions, lab results, etc. But the ambitious project was shut down in 2012 [3] due to concerns regarding data privacy and protection. Researchers have warned that the lack of public support for such projects [4], driven by privacy concerns, has significantly hampered progress toward more efficient and precise AI-supported health care. Yet, according to ex-Google analyst Ray Kurzweil, by 2030 AI will be able to find a cure for all ailments and man will become immortal! [5] Even if things do not go to that extent, the key to significant advancement in this space will be the collection and collation of data on patients across the world in a unified data repository. Homo sapiens will then be able to achieve the status of Homo deus (with due respect to Yuval Noah Harari) [6] through the use and support of AI&ML technologies.

3. It is now obvious that healthcare companies will not be able to fully realize the value of data and analytics in the absence of data democratization. What hurdles have prevented the establishment of data democratization in this sector?

Ans. While addressing data privacy concerns and establishing standards for data use have a large role to play in advancing data democratization, it is just as necessary to know what data to collect and how to use that data to build effective models. For example, during the thick of the COVID-19 pandemic, numerous hospitals [7] made data available to data scientists across the world so that they could use it to build AI models that could find a cure for the deadly disease. But this initiative failed as the hospitals were not collecting the right kind of data. To build effective AI models, the data of those patients who had survived (who made up the larger percentage) was needed too. But the data that was shared was mostly of those patients who had died. This is what I refer to as the “succumber bias” [8], the opposite of survivor bias.

Bias in the process of data collection has had a big role to play in our inability to compile the right data for AI algorithms. To ensure that survivor/succumber bias does not creep into data, all medical and healthcare institutions and related governmental entities of the world will need to train all key people involved in data collection on the fundamental principles of AI&ML or allow data scientists to be closely involved in the data collection process in health care. While this may already be happening to some extent, we are yet to see in health care the same level of attention and scrutiny that is observed in the finance sector. I believe that this is primarily because of concerns about whether a day would come when an AI model, like ChatGPT, outperforms human healthcare personnel in effectively and accurately prescribing medications and curing ailments. But machines and tech can never replace human beings; they will only aid us in becoming better at what we do. We must, therefore, allow AI to fulfil its duty to humanity.

4. How can data democratization benefit from the industrialization of analytics? How important is analytics industrialization in the life sciences space and what would it entail?

Ans. Analytics industrialization is nothing but the automation of AI&ML tech and its use at scale, with little or no human intervention, to achieve a variety of objectives. Since this advanced AI, once deployed, can train itself, the tech and its many benefits will essentially be available for all. Transforming the life sciences sector through analytics industrialization will require the creation of what Gartner calls ‘citizen data scientists’ [9] in the medical community. This term refers to those individuals who have an understanding of creating and deploying AI models but whose primary jobs do not fall under the discipline of data science. In order to fulfil the rising need for data scientists in the life sciences industry, it will be crucial to hire those with these skills or upskill existing healthcare professionals in this area. This means that apart from treating patients, such professionals can also help collect data for future use in AI models, which will eventually help provide far superior health care for all.

5. Democratization of data would mean access to personal and health data is made easier and more available. What do you have to say to those who are concerned about the ethical implications thereof?

Ans. Pursuing data democratization strengthens the possibility of humanity being able to cure most diseases, delay aging, and even prevent aging-related complications and ailments. Here, the rewards of such an outcome outweigh, by far, the potential risks of the means. I believe that when we can assuredly predict, prevent, and cure ailments, concerns regarding privacy will naturally ease. But this is a philosophical take. On a more practical level, we should be able to manage healthcare data using the same standards and protocols that are used to secure the data of individuals in the financial sector. As AI models do not need to know the identity of patients during training, de-identified/anonymized data can always be used to develop these AI models.
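
To make the idea of de-identified training data concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any organization's actual process: the field names (name, phone, address, patient_id, age) and the helper functions are hypothetical, and a keyed hash is strictly pseudonymization, which on its own does not guarantee anonymity under any particular regulation.

```python
import hashlib
import hmac

# Hypothetical direct identifiers; a real record schema will differ.
DIRECT_IDENTIFIERS = {"name", "phone", "address"}


def pseudonymize_id(patient_id: str, secret_key: bytes) -> str:
    """Replace a patient ID with a keyed hash so records remain linkable."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a coarser band, e.g. 47 becomes '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def de_identify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers, pseudonymize the ID, and coarsen the age."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["patient_id"] = pseudonymize_id(str(record["patient_id"]), secret_key)
    cleaned["age_band"] = age_band(cleaned.pop("age"))
    return cleaned
```

The model still sees the clinical fields it needs, while the same patient maps to the same pseudonymous ID across records as long as the secret key is unchanged.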

6. Is achieving complete data democratization a lofty goal? How should one approach this objective for one’s healthcare organization?

Ans. It is undoubtedly a lofty goal, but one, as I said, that is worth far more than its weight in gold. Just as Jonas Salk made the polio vaccine available for free, all healthcare data needs to be made freely accessible for the greater good. Easy and secure data sharing by all healthcare institutions, app developers, health insurance providers, medical labs, and research centers, across international borders, will lead to remarkable progress in global health care. In a day and age where the monetization of data is a key pursuit amongst corporations, the democratization of healthcare data needs to be given top priority by governments around the world.
The challenge lies mainly in fulfilling the steps involved in the process of unifying healthcare data: first, identifying patients using their data that is dispersed across disparate data systems; then, connecting the data pertaining to each patient from across these systems; afterwards, aggregating that data at the patient level; and finally, minimizing information loss from said aggregation. Heavy research, focus on regulatory decisions, development of innovative tech applications, and a complete redesign of data management tools would all be necessary for this. A good first step to take would be to run pilot programs. This will help us understand which data is important to collect and for what objectives; following this, full-fledged systems can be deployed to achieve said objectives in full. After all, no one can stop an idea when its time comes. For AI in health care, that time is now!
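
The unification steps described above (identify, connect, aggregate, minimize information loss) can be sketched in a few lines of Python. This is only a toy illustration: the matching rule (normalized name plus date of birth), the system names, and the sample records are all made up for the example, and real patient matching is considerably harder than an exact key comparison.

```python
from collections import defaultdict


def match_key(record: dict) -> tuple:
    """Step 1: identify a patient by a normalized (name, date of birth) pair."""
    name = " ".join(record["name"].lower().split())
    return (name, record["dob"])


def unify(systems: dict) -> dict:
    """Steps 2-4: connect records per patient and aggregate without overwriting."""
    patients = defaultdict(lambda: defaultdict(list))
    for system_name, records in systems.items():
        for record in records:
            key = match_key(record)
            for field, value in record.items():
                if field in ("name", "dob"):
                    continue
                # Keep every value together with its source system, so the
                # aggregation loses as little information as possible.
                patients[key][field].append((system_name, value))
    return {key: dict(fields) for key, fields in patients.items()}


# Made-up sample records from two hypothetical source systems.
systems = {
    "hospital_ehr": [{"name": "Asha Rao", "dob": "1980-02-14", "diagnosis": "type 2 diabetes"}],
    "lab_system": [{"name": "  asha  RAO", "dob": "1980-02-14", "hba1c": 7.2}],
}
unified = unify(systems)
```

Storing each value with its source, rather than picking a single "winner" per field, is one simple way to keep the aggregation step from discarding information; a pilot program could then decide which fields actually matter.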


[1] https://www.linkedin.com/pulse/20140605190959-11405899-why-doctors-should-no-longer-write-handwritten-prescriptions/

[2] https://www.linkedin.com/pulse/20140605191509-11405899-doctor-s-tech-assistant-algorithms-instead-of-primary-care-physicians/

[3] https://www.mobihealthnews.com/11453/official-google-health-shuts-down-because-it-couldnt-scale

[4] https://www.healthcareitnews.com/news/privacy-hindering-ehr-progress-say-researchers

[5] https://www.livemint.com/news/world/how-humans-can-attain-immortality-in-future-ex-google-engineer-predicts-this-11680492655098.html

[6] https://www.ynharari.com/book/homo-deus/

[7] https://coronavirus.jhu.edu/map.html

[8] https://www.linkedin.com/pulse/succumber-bias-aiml-american-warplanes-world-war-ii-how-mukherjee-1f/

[9] https://www.gartner.com/smarterwithgartner/how-to-use-citizen-data-scientists-to-maximize-your-da-strategy