Introduction to Responsible AI: Unpacking bias
Part of what makes Responsible AI difficult is the vast set of ideas, theories, and practices that it interacts with. The CallMiner Research Lab unpacks...
November 16, 2021
Every day, it seems, another company is in trouble over a model it has built and its approach to Responsible AI.
You can look at Amazon's 2018 scandal over an experimental AI resume-screening tool that learned to favor men's resumes over women's. In 2018, an Uber self-driving test car failed to recognize a pedestrian crossing outside of a crosswalk and struck and killed her, putting the company in the hot seat over the moral implications of self-driving cars. And between December 2020 and March 2021, Google fired the two co-leads of its AI Ethics team, reportedly over concerns that their research might reflect badly on the company.
The list of scandals could go on, but public embarrassment isn't the only concern. While no company wants to be in the news for something negative, there are also real consequences when a model's prediction changes someone's life. These range across the spectrum, from ProPublica's 2016 investigation of a recidivism-prediction algorithm that rated Black defendants as higher risk than white defendants with similar records, to the Apple Card reportedly granting men higher credit limits than women with identical qualifications.
In the last few years, nearly every large company developing AI that touches people has aimed to develop some sort of framework for discussing the ethical implications of their projects. Some, like the guidelines released by the AI Ethics Impact Group led by VDE, focus on turning a company's AI values into quantifiable, implementable metrics. Others, like those released by the Software Engineering Institute at Carnegie Mellon University or implemented in Deon, a Python package, create a checklist to help data scientists think about ethics throughout the development process. Still others, like those from the Association for Computing Machinery, act more like a set of rules that all members must adhere to.
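As a rough illustration of the checklist style of framework, here is a minimal Python sketch of how a stage-by-stage ethics checklist could be represented. The class names, stage name, and questions are placeholders of our own, not Deon's actual checklist or API.

```python
from dataclasses import dataclass, field

@dataclass
class ChecklistItem:
    # A single ethics question, with space to record the team's answer.
    question: str
    answer: str = ""
    resolved: bool = False

@dataclass
class EthicsChecklist:
    # A named lifecycle stage and the questions reviewed at that stage.
    stage: str
    items: list[ChecklistItem] = field(default_factory=list)

    def open_items(self) -> list[ChecklistItem]:
        # Questions that still need discussion before the stage is "done".
        return [item for item in self.items if not item.resolved]

# Illustrative usage; the stage and questions are invented.
poc = EthicsChecklist(
    stage="Proof of concept",
    items=[
        ChecklistItem("What data is the POC built on, and who is missing from it?"),
        ChecklistItem("What would a wrong prediction cost the person it describes?"),
    ],
)
print(len(poc.open_items()))  # 2 questions still open
```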
Interestingly, nearly all of the examples above, and many more that we looked at, focus on the ethics and responsibilities involved throughout the development process. While important, and in many ways related to our work, we've found that the research lifecycle works differently from the development lifecycle.
At the CallMiner Research Lab, we understand that an approach to ethics in our space means addressing potential ethical concerns at each stage of the process. We also understand that each stage has different goals, and while we need to consider all possible ramifications, it might not always be necessary to solve every problem right away. To that end, we have developed a framework for acknowledging the potential ethical questions that arise at each stage of the process.
Our framework was developed by a team with diverse perspectives and draws heavily on existing examples from industry, as well as on the internal steps of our research process. After spending time learning about the concerns in the field and reviewing existing material, our team decided that the best framework for us would be one that allowed meaningful, constructive conversation around our findings, concerns, next steps, and potential adversarial actions. This also ensured that each member of the team was heard.
Our team's work gets passed on to others before it is used in a product or with a customer, so it was imperative that our findings, assumptions, and concerns be documented in a form that moves easily between teams. This lets us create the necessary guardrails and interfaces to ensure that tools are used for their intended purpose, and helps us educate others about any concerns or warnings we may have.
Our framework consists of a minimal set of questions for each step of the lifecycle, helping guide our conversations and ensuring we think about the things that might not be obvious to us (or, in many cases, are so obvious that we wouldn't think to talk about them).
Here, we outline the definitions and concerns of each stage, as well as some of the driving questions around Responsible AI. These aren't meant to be all-encompassing, but rather to serve as a starting point. We have found that our framework leads us to ask much deeper questions about our tools, models, and datasets, which is exactly what it is meant to do.
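To picture the kind of artifact this framework produces, here is a hypothetical sketch that records findings, assumptions, concerns, and open questions for each stage and serializes them to JSON so they can travel with the project between teams. The field names and example content are illustrative assumptions, not an actual CallMiner worksheet.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class StageReview:
    # One pass through the framework at a given lifecycle stage.
    stage: str
    findings: list[str] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)
    concerns: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

def export_reviews(reviews: list[StageReview], path: str) -> None:
    # Write the full review history to JSON so another team can pick it up.
    with open(path, "w") as handle:
        json.dump([asdict(review) for review in reviews], handle, indent=2)

# Illustrative usage with placeholder content.
reviews = [
    StageReview(
        stage="Proof of concept",
        findings=["The classifier reaches usable accuracy on a small pilot dataset."],
        assumptions=["The pilot calls come from a single industry."],
        concerns=["Accuracy has not been checked across speaker demographics."],
        open_questions=["What data would we need to test that?"],
    ),
]
export_reviews(reviews, "responsible_ai_review.json")
```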
In research, a proof of concept (POC) helps us decide whether a path of research is even worth pursuing. By definition, POCs are small in scale, restrictive, and work on limited scope and data. They also happen quickly, so that a researcher's time can be spent where it will be most impactful. In developing our framework for Responsible AI, we didn't want to change what doing a POC means, but we did want to begin thinking about possible concerns, implementations, and documentation as early as possible.
At this stage, some of our key questions are:
In asking these questions early and often, we achieve a few goals. First, we begin to decide concretely how much effort goes into doing the research "right" the first time. If a tool works well but would require a great deal of research to be accurate and responsible at production scale, it may not be a good project for us to take on. Second, identifying these assumptions and potential concerns early often drives the next set of research tasks if the project continues.
At this stage, a researcher has decided that the POC is worth moving forward with. Now we begin to consider the features to be used in our model, how we will tune it, and the impact of more or different data on the model or tool. Often we've identified a use case for the tool, and we begin evaluating different potential setups to improve the accuracy of the model.
Some of our key questions at this stage are:
In this stage, we also make sure that we've captured any and all decisions that might impact model performance, such as constraints on data availability or steps we skipped or simplified to save research time.
These questions mean that a project will sometimes stay at this stage for a while. Each time we go back over our framework, we try to address the questions we've added or left hanging, so that we have a complete picture of what has happened, what we've learned, and where we need to go.
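One way to ground the conversation at this stage is to compare candidate setups not just on overall accuracy but on accuracy per subgroup, so that a setup that looks better in aggregate but does worse for one group becomes part of the documented record. The sketch below is a hypothetical illustration; the labels, predictions, and group names are invented.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    # Accuracy overall and broken out by group label.
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    overall = sum(hits.values()) / len(y_true)
    per_group = {group: hits[group] / totals[group] for group in totals}
    return overall, per_group

# Invented evaluation data: true labels, predictions from two candidate setups,
# and a hypothetical group attribute for each example.
y_true  = [1, 0, 1, 1, 0, 1, 0, 0]
setup_a = [1, 0, 1, 0, 0, 1, 0, 1]
setup_b = [1, 0, 1, 1, 0, 0, 1, 0]
groups  = ["group_1"] * 4 + ["group_2"] * 4

# Both setups score 0.75 overall, but setup B is perfect for group_1
# and much worse for group_2, exactly the kind of decision to document.
for name, preds in [("setup A", setup_a), ("setup B", setup_b)]:
    overall, per_group = accuracy_by_group(y_true, preds, groups)
    print(name, "overall:", overall, "per group:", per_group)
```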
While we don’t have a good name for this step internally, its definition is straightforward. At this stage, something else in research or product is or will be relying on the outputs of this model. Now that someone is relying on it, any hypothetical concerns identified in steps 1 and 2 are now tangible and might have real consequences. At this stage, we focus on how other people use and understand our tool.
Some of the key questions are:
Here, we focus on some of the more tangible aspects of Responsible AI. If someone disagrees with our model, we need to be able to explain why the prediction happened (in a way that everyone understands) and then override it as necessary. This is the last step at which changes along these lines are easy to make, so this conversation is incredibly important. That said, these concerns remain prevalent throughout the rest of the process as the tool grows in size and scope.
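To make "explain and override" concrete, here is a hypothetical sketch of a thin wrapper around a simple linear scorer that reports a per-feature contribution for each decision and lets a reviewer override the outcome for a specific example. The weights, feature names, and threshold are placeholders, not how any particular CallMiner tool works.

```python
from dataclasses import dataclass, field

@dataclass
class ExplainableDecision:
    # A linear scorer with per-feature weights, a decision threshold,
    # and a record of human overrides keyed by example id.
    weights: dict[str, float]
    threshold: float = 0.0
    overrides: dict[str, bool] = field(default_factory=dict)

    def explain(self, features: dict[str, float]) -> dict[str, float]:
        # Per-feature contribution to the score (weight * value).
        return {name: self.weights.get(name, 0.0) * value
                for name, value in features.items()}

    def decide(self, example_id: str, features: dict[str, float]) -> bool:
        # A recorded human override always wins over the model's score.
        if example_id in self.overrides:
            return self.overrides[example_id]
        return sum(self.explain(features).values()) > self.threshold

    def override(self, example_id: str, decision: bool) -> None:
        # Record a human decision that replaces the model's output for this example.
        self.overrides[example_id] = decision

# Illustrative usage with invented weights and features.
model = ExplainableDecision(weights={"call_length": 0.2, "negative_phrases": -1.5})
features = {"call_length": 3.0, "negative_phrases": 1.0}
print(model.explain(features))             # {'call_length': 0.6, 'negative_phrases': -1.5}
print(model.decide("call_123", features))  # False: the score is below the threshold
model.override("call_123", True)           # a reviewer disagrees and overrides
print(model.decide("call_123", features))  # True: the human override wins
```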
At this point, the researcher knows that their tool will be moving into customer hands sooner or later. This means we need to test our tools on a much larger set of data to confirm that our model will work across the vast range of clients, industries, and use cases that customers represent.
Here, our questions tend to focus on what happens when we scale our model. Some examples are:
It is crucial that the research team accounts for what it means to move to "production scale" tools. We will not have the same granular control once the handoff is complete, nor will we always be there to help customers understand the model. These questions are intended to start a conversation around these practical steps.
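As one sketch of what accounting for production scale might look like, the hypothetical check below evaluates a quality metric per client or industry segment and flags the segments that fall below an agreed threshold, giving the scale conversation something concrete to point at. The segment names, scores, and threshold are invented.

```python
def flag_underperforming_segments(metric_by_segment: dict[str, float],
                                  threshold: float) -> list[str]:
    # Return the segments whose metric falls below the agreed minimum,
    # sorted worst-first so the biggest gaps get discussed first.
    flagged = [seg for seg, value in metric_by_segment.items() if value < threshold]
    return sorted(flagged, key=lambda seg: metric_by_segment[seg])

# Invented per-segment scores from a large-scale evaluation run.
f1_by_segment = {
    "retail": 0.84,
    "healthcare": 0.71,
    "financial_services": 0.88,
    "telecom": 0.65,
}

print(flag_underperforming_segments(f1_by_segment, threshold=0.75))
# ['telecom', 'healthcare']
```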
Now that product is finally ready for our tool, we pass it off to the developers, architects, and other teams that will help our tool become reality. While this is exciting, it’s also the final test for our process. We need to make sure that we’ve documented everything we’ve discussed about possible concerns and how to handle them, and make sure that the new teams feel prepared to continue our work. These questions include:
After we complete our research and hand our work off to product, the work on Responsible AI is not finished. Building and maintaining Responsible AI is everyone's responsibility, and the process cannot end with us. Product teams and other future users rely on the work that we've done to implement tools responsibly, and it's up to all of us to continuously monitor, revise, and reconsider the implications of our tools and their output. As a product-facing research team, we know that we cannot control how end users ultimately use our tools on their data, but we can do our part to ensure that our tools are easy and intuitive to understand and use.
We know that we don’t have perfect answers every time, and we know that our work doesn’t happen in a bubble. For everything that we research, we talk through the questions for each section as a team. By thinking about building Responsible AI early and often, it becomes easier over time for us to catch and address possible problems. As the body of research around Responsible AI continues to grow, and as the field continues to adapt to the societies around us, it is imperative that we continue to reevaluate our existing and new tools so that we can catch potential problems before they arise.
To apply this framework within your own team, download our worksheets that will help you acknowledge the potential ethical questions that arise at each stage of the process.