
Introduction to Responsible AI: The CallMiner Research Lab Responsible AI Framework


Micaela Kaplan

November 16, 2021

Responsible AI framework

Every day, it seems like there’s another company in trouble for a model that they’ve built and their approach to Responsible AI.

You can look at Amazon's 2018 scandal that involved using AI to select resumes that would be good fits for the company – that happened to include only men. In 2020, Uber's failure to recognize pedestrians outside of a crosswalk put them in the hot seat with respect to the moral implications of a self-driving car. Between December 2020 and March 2021, Google reportedly fired the two co-leads of their AI Ethics team over concerns that their research might impact the company.

The list of these scandals could go on and on – but they’re not the only concern. While no company wants to be in the news for something negative, there are also real implications about what happens when a model prediction changes someone’s life. These range across the spectrum of concerns, from the 2016 ProPublica pushback against machines that gave Black people longer jail sentences than other races to the Apple Card that gave men higher credit limits than women with identical qualifications.

In the last few years, nearly every large company developing AI that touches people has aimed to develop some sort of framework through which to discuss the ethical implications of their projects. Some, like the guidelines released by the AI Ethics group led by VDE, focus on how to translate a company’s AI values into quantifiable, implementable metrics. Others, like those released by the Software Engineering Institute at Carnegie Mellon University or implemented in Deon, a Python package, provide a checklist to help data scientists think about ethics throughout the development process. Still others, like those from the Association for Computing Machinery, act more like a set of rules that all members must adhere to.

Interestingly, nearly all of the examples above, and many more that we looked at, focus on the ethics and responsibilities involved throughout the development process. While that focus is important and in many ways related to our work, we’ve found that the research lifecycle works differently from the development lifecycle.

We, at the CallMiner Research Lab, understand that an approach to ethics in our space means we need to address potential ethical concerns at each stage of the process. We also understand that each stage of the process has different goals, and while we need to consider all possible ramifications, it might not always be necessary to solve every problem. To that end, we have developed a framework for acknowledging the potential ethical questions that arise at each stage of the process.

Our framework was developed as a team with diverse perspectives and is heavily based on the many examples in industry, as well as our internal steps of the research process. After spending time learning about many of the concerns in the field and reviewing existing material, our team decided that the best framework for us would be one that allowed us to have meaningful and constructive conversation around our findings, concerns, next steps, and potential adversarial actions. This allowed us to make sure each member of the team was heard.

The work from our team gets passed on to others before it gets used in product or with a customer, and it was imperative for us that our findings, assumptions, and concerns be documented in a form that was easily passed between teams. This allows us to create the necessary guardrails and interfaces to ensure that tools are used for their intended purpose, and helps us educate others about any concerns or warnings we may have.

Our framework consists of a minimal set of questions for each step of the lifecycle, helping guide our conversations and ensure we’re thinking about those things that might not be obvious to us (or in many cases, are so obvious that we wouldn’t think to talk about them).

Here, we outline the definitions and concerns of each stage, as well as some of the driving questions around Responsible AI. These aren’t supposed to be all encompassing, but rather serve as a starting point. We have found that our framework leads to asking much deeper questions about our tools, models, and datasets – which is exactly what it is meant to do.
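As one way to picture "a minimal set of questions for each step of the lifecycle," here is a small sketch of how a stage and its questions could be recorded so that unanswered items stay visible between reviews. The `Question` and `Stage` classes, the `open_questions` helper, and the sample answer are all hypothetical illustrations, not a description of our internal tooling or worksheets.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    """One framework question plus whatever the team has recorded so far."""
    text: str
    answer: str = ""  # stays empty until the team has discussed it

    @property
    def is_open(self) -> bool:
        return not self.answer.strip()


@dataclass
class Stage:
    """A lifecycle stage and its list of guiding questions."""
    name: str
    questions: List[Question] = field(default_factory=list)

    def open_questions(self) -> List[str]:
        # Questions left hanging, to revisit at the next review.
        return [q.text for q in self.questions if q.is_open]


poc = Stage(
    name="Proof of Concept",
    questions=[
        Question("Who are we leaving out of this first test set?"),
        Question(
            "What metrics am I using to prove success?",
            answer="Accuracy on a small held-out sample.",  # made-up answer
        ),
    ],
)

print(poc.open_questions())
```

Keeping answers next to the questions is one simple way to produce a record that can be handed to another team along with the model.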

Step 1: Proof of Concept

In research, a proof of concept (POC) helps us decide whether a path of research is even worth pursuing. By definition, POCs are small in scale and limited in scope and data. They also happen quickly so that a researcher’s time can be spent where it will be most impactful. In developing our framework for Responsible AI, we didn’t want to change what doing a POC would mean, but we did want to begin the process of thinking about possible concerns, implementations, and documentation as early as possible.

At this stage, some of our key questions are:

  • Who are we leaving out of this first test set? Where might our edge cases be? Are there assumptions related to the language that we are testing (e.g. sentence structure) that might prevent this from working in other languages?
  • How expensive would it be to get an inclusive and representative data set and to evaluate the model’s performance? From a financial standpoint? From an environmental standpoint? From a time standpoint?
  • What metrics am I using to prove success? Is there confirmation bias in that metric?
  • Who are the stakeholders in this project? Who might we need to talk to about the use case, tangible impact, and implementation of this tool?

In asking these questions early and often, we are able to achieve a few goals. First, we begin to concretely decide how much effort goes into doing the research “right” the first time. If a tool works well, but requires a ton of research to be accurate and responsible at production scale, it may not be a good research project for us to take on. Second, identifying these assumptions and potential concerns early often drives the next set of research tasks if the project continues.

Step 2: Design

In this stage, a researcher has decided that it is worth moving forward with a POC. Now, we begin to consider features to be used in our model, how we will tune it, and the impacts of more or different data on the model or tool. Often, we’ve identified a use case for the tool, and we will begin evaluating different potential setups to improve the accuracy of the model.

Some of our key questions at this stage are:

  • Are any features a proxy for socially sensitive features (e.g. location = socioeconomic status)?
  • Is our data an accurate, inclusive representation of our users? Are there any biases in the model’s output? What groups don’t get reliable outputs?
  • What common features are in our expanded dataset (distributions)? What outliers exist? Have we made a specific effort to find a potentially biased feature?
  • How could a bad actor use this model? Could the model or its outputs be manipulated to cause any form of harm? What if I give my model a skewed dataset?
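The question "What groups don’t get reliable outputs?" can be made concrete by breaking a metric down by group instead of reporting one overall number. The sketch below does this for plain accuracy using only the standard library. The function name `accuracy_by_group` and the toy labels, predictions, and group names are invented for illustration, and accuracy is only one of many metrics a team might choose.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} so that unreliable groups stand out."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


# Toy data: labels, model predictions, and group membership (all made up).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

per_group = accuracy_by_group(y_true, y_pred, groups)
gap = max(per_group.values()) - min(per_group.values())

print(per_group)
print("largest accuracy gap between groups:", gap)
```

In this toy data the overall accuracy of 0.75 hides the fact that one group is served far worse than the other, which is exactly the kind of finding worth writing down at the design stage.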

In this stage, we also make sure that we’ve captured any and all decisions that we’ve made that might impact model performance, such as data availability or steps that we skipped or simplified due to research time concerns.

These questions mean that a project will sometimes stay at this stage for a while. Each time we go over our framework, we try to go back and address questions we’ve added or left hanging, so that we have a complete picture of what’s happened, what we’ve learned, and where we need to go.

Step 3: Model is Being Used

While we don’t have a good name for this step internally, its definition is straightforward. At this stage, something else in research or product is or will be relying on the outputs of this model. Once someone is relying on it, any hypothetical concerns identified in steps 1 and 2 become tangible and might have real consequences. At this stage, we focus on how other people use and understand our tool.

Some of the key questions are:

  • Is the model explainable? How do we explain it? Can all users of the system understand the explanation (data scientists and people who are not data scientists)?
  • What is the cost/harm/consequence of a False Positive? A False Negative?
  • How might these outputs be used in other use cases? What harm might they cause? Be very diabolical!
  • Have we put in safeguards and/or identified best practice to prevent misuse?
  • How “in the loop” is the human in the loop? Are they just a rubber stamp, or are they evaluating the output? What would happen if there were no human at all (as in Robotic Process Automation)?
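To make the false positive and false negative question less abstract, it can help to count each kind of error and attach a rough cost to it. The sketch below assumes binary labels (1 = positive, 0 = negative). The function name `error_costs` and the cost values are placeholders chosen for illustration, because real costs depend entirely on the use case and on who bears the harm.

```python
def error_costs(y_true, y_pred, fp_cost, fn_cost):
    """Count false positives and false negatives and weight them by cost."""
    pairs = list(zip(y_true, y_pred))
    false_positives = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_negatives = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "total_cost": false_positives * fp_cost + false_negatives * fn_cost,
    }


# Toy labels and predictions (made up).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

# Hypothetical assumption: a miss is five times as costly as a false alarm.
result = error_costs(y_true, y_pred, fp_cost=1.0, fn_cost=5.0)
print(result)
```

Even with made-up numbers, writing the costs down forces the conversation about which error is worse and for whom.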

Here, we focus on some of the more tangible aspects of Responsible AI. If someone disagrees with our model, we need to be able to explain why it produced that output (in a way that everyone understands), and then override it as necessary. This is the last step where changes along these lines are easy to handle, so this conversation is incredibly important. That said, these concerns continue to be prevalent throughout the rest of the process as the tool grows in size and scope.

Step 4: Preparing for Product

At this point, the researcher knows that their tool will be moving into customer hands sooner or later. This means that we need to test our tools on a much larger set of data to confirm that our model will work across the wide range of clients, industries, and use cases that customers might bring.

Here, our questions tend to focus on what happens when we scale our model. Some examples are:

  • What are we sacrificing for speed and efficiency? Does this have adverse effects on our data or explainability? On our ability to understand complexities of the data/outcomes on that data?
  • Is this intended to be a universal model? How is it performing on data different from what it was trained on? What new edge cases appear in scaling?
  • Have we established a clear way of evaluating the outputs of our model? What would happen if someone misinterpreted this output? What harm might that cause? How can we prevent it?
  • Have we clearly defined the limitations for use cases for this model? What about the metrics for success?

It is crucial that the research team accounts for what it means to move to “production scale” tools. We will not have the same amount of granular control once handoff has completed, nor will we be there to help customers understand the model all the time. These questions are intended to start a conversation around these practical steps.

Step 5: Product

Now that product is finally ready for our tool, we pass it off to the developers, architects, and other teams that will help our tool become reality. While this is exciting, it’s also the final test for our process. We need to make sure that we’ve documented everything we’ve discussed about possible concerns and how to handle them, and make sure that the new teams feel prepared to continue our work. These questions include:

  • Have we passed all of our knowledge of ethical concerns to the other teams? Does the person who is consuming the knowledge understand it?
  • How will we educate our users about these possible harms? Are there ways for our users to lessen the possibility of these harms?
  • Will humans be in control of the use of the output of this model? Can we override model decisions?

What’s Next?

After we complete our research and hand off our work to product, the work on Responsible AI is not finished. Building and maintaining Responsible AI is everyone’s responsibility, and the process cannot end with us. Product and other future users rely on the work that we’ve done to implement tools responsibly, and it’s up to all of us to continuously monitor, revise, and reconsider the implications of our tools and their output. As a product-facing research team, we know that we cannot control how end users ultimately use our tool on our data, but we can do our part to ensure that it is easy and intuitive to understand and use our tools.

We know that we don’t have perfect answers every time, and we know that our work doesn’t happen in a bubble. For everything that we research, we talk through the questions for each section as a team. By thinking about building Responsible AI early and often, it becomes easier over time for us to catch and address possible problems. As the body of research around Responsible AI continues to grow, and as the field continues to adapt to the societies around us, it is imperative that we continue to reevaluate our existing and new tools so that we can catch potential problems before they arise.

To apply this framework within your own team, download our worksheets that will help you acknowledge the potential ethical questions that arise at each stage of the process.
