Tutorial Presenters
Affiliations
Corresponding Author: Muhammad Aurangzeb Ahmad maahmad@uw.edu
Responsible AI is a fundamental requirement for the application of AI and machine learning (ML) in healthcare. With the increased adoption of AI/ML in this domain, there is a growing recognition of the need to regulate AI/ML in healthcare to avoid potential harm and unfair bias against vulnerable populations. Regulatory bodies like the US FDA, the European Union (via the GDPR), China’s New Generation AI Governance Expert Committee, and others have either promulgated or put forward regulatory frameworks for responsible AI. Implementing a responsible AI system that complies with potential regulations is thus a daunting task given the complexity of the problem, especially for researchers and practitioners who are starting out or who lack domain expertise in both healthcare and AI. A survey of the regulations proposed by around a hundred governmental bodies and commissions, as well as leaders in the tech sector like Google, Facebook, Amazon, and Baidu, reveals that many of these proposals are short on specifics. This has also led to charges of ethics washing, where guidelines for ethical or responsible AI are used as a cover to avoid investing in meaningful AI/ML infrastructure and systems. In this tutorial we offer a guide to navigating these complex regulations and explain the practical constituent elements of a responsible AI system in healthcare in light of the proposed regulations. Additionally, we emphasize that the recommendations from regulatory bodies like the FDA or the EU are necessary but not sufficient elements of a responsible AI system.
In the tutorial we will elucidate how regulations and guidelines often focus on epistemic concerns to the detriment of practical ones, e.g., requiring fairness without explicating what fairness constitutes in a given use case. We posit that responsible AI/ML in healthcare is a systems-level problem: it encompasses the IT infrastructure, the AI/ML components, the healthcare delivery and services tied to the ML models, and the monitoring of outcomes. Regulatory regimes have traditionally expected well-defined and largely deterministic behavior from software. AI/ML in general does not fit this mold given the stochastic and probabilistic nature of such systems. Preliminary guidelines released by the FDA in early 2021 for AI/ML-based software as a medical device (henceforth the SaMD document) extend regulatory frameworks by incorporating pre-determined change control plans into software regulation, i.e., defining beforehand the scope of what is likely to change in a model. In this tutorial we will walk through a use case covering the technical requirements for such a change control plan and the limitations it may have post-deployment.
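To make the idea concrete, a pre-determined change control plan can be thought of as a machine-checkable declaration of permitted change scope. The sketch below is our own illustration: the field names, change types, and guardrail thresholds are assumptions for exposition and are not prescribed by the SaMD document.

```python
# Hypothetical sketch of a pre-determined change control plan: the scope of
# permitted model changes is declared before deployment, and a proposed update
# is checked against that scope rather than reviewed ad hoc.
CHANGE_CONTROL_PLAN = {
    "allowed_changes": {"retrain_on_new_data", "recalibrate_thresholds"},
    "frozen": {"feature_set", "model_architecture", "intended_use"},
    "performance_guardrails": {"auroc_min": 0.80, "subgroup_auroc_gap_max": 0.05},
}

def within_scope(proposed_change, observed_metrics, plan=CHANGE_CONTROL_PLAN):
    """Return True if the update type is pre-authorized and guardrails hold."""
    if proposed_change not in plan["allowed_changes"]:
        return False
    g = plan["performance_guardrails"]
    return (observed_metrics["auroc"] >= g["auroc_min"]
            and observed_metrics["subgroup_auroc_gap"] <= g["subgroup_auroc_gap_max"])
```

In this framing, a retrain that keeps performance within the declared guardrails stays in scope, while a change to a frozen element (e.g., the model architecture) falls outside the plan and would require fresh review.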
FDA’s SaMD document and the EU’s GDPR, among other AI governance documents, discuss the need for implementing sufficiently good machine learning practices. In this tutorial we seek to spell out what that means from a practical perspective for real-world use cases in healthcare throughout the machine learning cycle: data management, data specification, feature engineering, model evaluation, model specification, model explainability, model fairness, reproducibility, and checks for data leakage and model leakage. We will illustrate these with real-world use cases and example code using open-source, publicly available datasets and libraries like InterpretML, ErrorAnalysis, fairMLHealth, fairLearn, and AIF360. As an example, consider feature engineering, which needs to address questions like whether a transformation is lossy or lossless, what data missingness looks like, and which features are available at runtime versus training time.
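As a small sketch of two such lifecycle checks, the following quantifies per-feature missingness and flags features that are only known after the outcome (a common source of data leakage). The dataset and column names are entirely hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical encounter-level dataset; column names are illustrative only.
df = pd.DataFrame({
    "age": [65, 72, np.nan, 58],
    "lab_creatinine": [1.1, np.nan, np.nan, 0.9],
    "discharge_disposition": ["home", "snf", "home", np.nan],
})

# 1. Quantify missingness per feature -- a basic data-specification check.
missing_rate = df.isna().mean()

# 2. Flag features unavailable at prediction time (known only post-discharge),
#    which would leak outcome information into training.
POST_DISCHARGE_FEATURES = {"discharge_disposition"}  # assumed label, for illustration
runtime_features = [c for c in df.columns if c not in POST_DISCHARGE_FEATURES]
```

Separating the features available at training time from those available at runtime is precisely the kind of specification that GMLP-style guidance asks for but does not spell out.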
To further illustrate, consider the problem of predicting the length of stay in a hospital. We will start with the pros and cons of treating it as a classification versus a regression problem, and how the academic literature addressing this problem mostly ignores how insights derived from such prediction models are used in practice; e.g., aggregate performance metrics such as MAE for regression or precision and recall for classification are of limited use on their own. Optimizing for length-of-stay prediction in isolation may end up worsening the risk of readmission to the hospital for the population as a whole, and is thus detrimental if taken by itself. Lastly, model performance disparities across racial and ethnic groups may reveal systematic problems that need to be rectified via mitigation strategies involving clinicians, discharge planners, and the vulnerable populations who may be negatively impacted.
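The point about aggregate metrics can be shown in a few lines of pandas. The groups and values below are synthetic, constructed only to show how an overall MAE can hide a per-group disparity:

```python
import pandas as pd

# Synthetic length-of-stay predictions (in days); groups are illustrative.
eval_df = pd.DataFrame({
    "group":  ["A", "A", "B", "B", "B"],
    "y_true": [3.0, 5.0, 4.0, 8.0, 6.0],
    "y_pred": [3.5, 4.5, 6.0, 5.0, 4.0],
})

abs_err = (eval_df.y_true - eval_df.y_pred).abs()

# Overall MAE hides how the error is distributed across subpopulations.
overall_mae = abs_err.mean()

# Per-group MAE surfaces a disparity that the aggregate metric masks.
group_mae = abs_err.groupby(eval_df.group).mean()
```

Here the overall MAE looks acceptable while group B carries several times the error of group A, which is exactly the kind of disparity a mitigation strategy would need to address.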
We note that conceptualizing responsible AI as a process rather than an end product accords well with how AI/ML systems are used in practice. AI governance documents like FDA’s SaMD strongly recommend taking a stakeholder-centric view of AI/ML problems. Consider a disease prediction model: a clinician may want a model that maximizes overall performance, while a patient belonging to a minority group may want to know how likely it is that the model will score them incorrectly. Creating equitable models may require balancing multiple competing optimization criteria, and creating fair and unbiased models may require agreeing on which notion of fairness (statistical parity vs. equalized odds vs. sufficiency) applies. The same holds for model transparency: we will discuss a taxonomy of use cases in healthcare AI/ML and the trade-offs needed to balance post-hoc explanations, fairness measurements, and the practical constraints of model deployment.
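The competing fairness notions can be made concrete with a small calculation. The predictions and groups below are toy values, and the two gap metrics are hand-rolled here for transparency rather than taken from fairLearn or AIF360 (both of which provide equivalent measures):

```python
import numpy as np

# Toy binary predictions for two groups; all values are illustrative only.
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

def selection_rate(mask):
    """Fraction of the group receiving a positive prediction."""
    return y_pred[mask].mean()

def true_positive_rate(mask):
    """Fraction of the group's actual positives the model catches."""
    pos = mask & (y_true == 1)
    return y_pred[pos].mean()

a, b = group == "A", group == "B"

# Statistical parity: do groups receive positive predictions at equal rates?
parity_gap = abs(selection_rate(a) - selection_rate(b))

# Equalized odds (TPR component): are true positives detected equally often?
tpr_gap = abs(true_positive_rate(a) - true_positive_rate(b))
```

A model can close one gap while leaving the other open, which is why stakeholders must agree up front on which notion of fairness governs a given use case.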
While governance documents note the need to address real-world issues related to model performance, they rarely go into details. In the tutorial we will discuss and illustrate use cases related to model performance in training versus production, real-world usage monitoring, and how that monitoring should feed back into enhancing the models. To summarize, the focus of the tutorial is on responsible AI/ML in healthcare as a systems-level phenomenon that requires compliance at the level of the IT infrastructure, the technical aspects of the AI/ML system, the services and outcomes associated with the intended use case, and the human factors.
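One simple way to compare model behavior in training versus production is to monitor the score distribution with the Population Stability Index (PSI). The implementation below is a common heuristic sketch with simulated scores, not a regulatory requirement; the conventional rule of thumb reads PSI below 0.1 as stable and above 0.25 as significant drift.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between training and production scores."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid division by zero / log(0).
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_scores = rng.beta(2, 5, 5000)   # scores observed at training time
prod_scores  = rng.beta(2, 5, 5000)   # production scores, no drift
shifted      = rng.beta(4, 3, 5000)   # simulated post-deployment shift
```

A scheduled check like this is one concrete piece of the real-world usage monitoring that governance documents call for: a low PSI supports continued use, while a high PSI should trigger the feedback loop back into model review.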
Muhammad Aurangzeb Ahmad is a Research Scientist at KenSci Inc., a machine learning/AI healthcare informatics company focused on risk prediction in healthcare. He is also an Affiliate Associate Professor in the Department of Computer Science at the University of Washington Bothell. He has held academic appointments at the University of Washington, the Center for Cognitive Science at the University of Minnesota, the Minnesota Population Center, and the Indian Institute of Technology Kanpur. He has published more than 50 research papers in top machine learning and data mining conferences like KDD, AAAI, SDM, and PAKDD.
Dr. Carly Eckert MD, MPH, is a preventive medicine physician and epidemiologist. Carly has worked in industry for 4 years, collaborating closely with data scientists developing and implementing machine learning solutions in healthcare. Carly is the Chief Clinical Officer at Greenlight Ready where she works on last-mile implementation for public health practice. She is also a clinical advisor at KenSci and a doctoral student in epidemiology at University of Washington where she studies transfer learning in trauma outcomes prediction.
Christine received her MS in Computer Science and Systems from the University of Washington after starting her data career as an analyst and researcher at the Institute for Health Metrics and Evaluation (IHME). Her work has been published in The Lancet, the Journal of the American Medical Association (JAMA), JAMA Oncology, The Lancet Neurology, and others. Her current focus areas are ML interpretability and responsible AI in the clinical domain.
Vikas Kumar is a Data Scientist working at KenSci. In this role, Vikas works with a team of data scientists and clinicians to build consumable and trustable machine learning solutions for healthcare. His focus is in building explainable models in healthcare and application of recommendation systems in clinical settings. Vikas holds a Ph.D. with a major in Computer Science and minor in Statistics from the University of Minnesota, Twin Cities. He has worked on modeling and application of recommendation systems in various domains, such as media, location, and healthcare. His focus has been to interpret the balance users seek between known (or familiarity) and unknown (or novel) items to build adaptive recommendations. Prior to his Ph.D., he completed his Bachelor’s at the National Institute of Technology, India and worked as a software engineer in Microsoft India.
Ankur Teredesai is a Professor in the Department of Computer Science at the University of Washington Tacoma, and founder and director of the Center for Data Science at the University of Washington. He is also the founder and CTO of KenSci, a vertical machine learning/AI healthcare informatics company focused on risk prediction in healthcare. Professor Teredesai has published more than 70 research papers in top machine learning and data mining conferences like KDD, AAAI, CIKM, SDM, and PKDD. He is also the information officer of KDD. He previously presented the tutorial “Deep Explanations in Machine Learning via Interpretable Visual Methods” at PAKDD 2020, Singapore, June 11-16, 2020.