Data is often referred to as the world’s most valuable resource. But despite efforts to make data more open and accessible, girls remain routinely underrepresented in datasets and the technologies they drive. Only 48% of data needed to monitor SDG 5 (“Achieve gender equality and empower all women and girls”) is available, and recent reports have found AI, one of the most quickly advancing areas of tech, to be sexist.

Open Data Day 2023 asks what role open data plays in emerging technologies. Building open-source, replicable tools is the foundation of Girl Effect’s approach to technology, and has driven our work in AI from the start: our architecture for Big Sis, our first AI chatbot for girls in South Africa, laid the blueprint for the design of Bol Behen (meaning “Speak! Sister”), our menu-based chatbot for girls in India.

As Open Data Day comes to a close, our Director of Product and Technology, Lisa Adams, and Global TEGA Tech Manager, Naijeria Toweett, discuss what more can be done to make data more open to and for girls.

Q: For those new to the concept of open data, what does it mean?

Naijeria: For me, open data is the information collected about the things that happen around us that we can access publicly.

Lisa: To me, open data is data that is accessible to the public.

Q: A lot of AI is built using open datasets. Why is representation in these open datasets so important now?

Lisa: Representation and intersectionality within data are sadly still an issue. AI might be the next big thing, but many intelligences today are powered by open datasets predominantly from the global north, which rely on data in high-resource languages (like English) and are built by male-dominated teams. All of this creates bias within the technology.

It’s well understood that sex-disaggregated data helps us make better decisions by giving us the ability to better see people who are often invisible in data, yet people are still being excluded because data stakeholders are not applying a gender lens to their research.

So how do we fix this? The team involved in the data process is really important. I recently read that by 2050, 70% of all jobs will be software jobs and only 22% of the current workforce are women.

We need people who are able to cut through intersectionalities and tell useful stories working in and with data. Otherwise you could easily overlook minorities and pack them into a larger demographic.

Naijeria: In addition to all Lisa said, I think it’s important that during the data validation phase you actually involve the people you interviewed initially to collect this data, and not make assumptions about the insights you gathered. Involvement of stakeholders in the whole analytics process can often be overlooked.

Lisa: Yes, and if we don’t get it right, the risk is poorly designed products that don’t reach the people they are intended for. We protect against this at Girl Effect by involving people in our product design who are closest to the communities we work with and have experience of the issues discussed. For instance, when we retrain the AI in Big Sis, which enables the chatbot to recognise and respond to girls’ statements, we have a team that checks if the phrases are accurate. That team is made up of South African women, gender experts and content specialists who understand the communities we work with.

Q: How else do we capture girls’ voices in our data collection at Girl Effect?

Naijeria: My question with data is always: who is collecting the data and is the community involved? If you are going to collect data about me, involve me from the beginning and help me understand the context. That’s the approach we take at Girl Effect through our Technology-Enabled Girl Ambassadors, for example. TEGA is Girl Effect’s girl-operated research tool, co-created with young women, that enables girls to collect real-time insights into the lives of their peers.

Lisa: Research methods are such an important component. What do intention and compensation mean for someone who is essentially a co-author of the research? I’ve always been so fascinated by TEGA. The programme is designed to upskill and economically empower, and each TEGA gains an internationally recognised market research qualification from the Market Research Society. They are trained and paid accredited researchers.

Q: Finally, what particular struggles do you think NGOs face in terms of how to use, organise and understand data? What could be done to overcome them?

Lisa: Many NGOs are trying to solve the same problems. But we don’t always have the time, budget and scope to collect data so that it can be shared with each other, in a shared repository, for instance. So one of the calls-to-action for non-profits could be to say: let’s make sure we’re standardising our approaches to data for global usage.

Naijeria: The challenge I see is that data is often attached to a specific project, to answer a specific question. If, for example, you were doing a project on the use of sanitary towels in Kenya, the final analysis and report would only focus on that. But there are other nuggets of information that you might leave out of the report, because they are not directly relevant to the research question. When you’re collecting data about access to sanitary towels, for example, there are many other aspects that affect a girl’s access to them. Are we reporting on this as well? We forget that a lot of data has been collected over time in various projects by different organizations.

Lisa: The other challenge we all experience working in tech in the nonprofit space is mapping user journeys across complex digital ecosystems of multiple products and platforms, from Google to Facebook to TikTok. It is really hard to tell if a user has asked a specific question, accessed a service provider, or is now ready for my next recommendation to support them on their journey. But it is exciting and so important. That’s why we are working on our new Data Platform Mission at Girl Effect to try to understand the unique user journeys that exist through our products and product ecosystems, and I strongly suspect that we’re going to find something that surprises us in terms of learning how to tell stories differently about data and making connection points. Ultimately we want to make sure we’re providing girls with the most relevant content and experiences possible for them.

Thank you both!

At Girl Effect, we take girls’ data very seriously. All of our tech products require active consent from girls before they engage with our content, and the data is anonymised and contains no personally identifiable information (PII). We are transparent about how we will use girls’ data and have designed our consent process with data protection and youth participation experts to ensure it is as youth-friendly as possible. We regularly review our terms and conditions to ensure they are up to date and user-friendly.