I am not your dataset
There is a quiet assumption embedded in many conversations about data, artificial intelligence, and predictive systems: that patterns in data tell us something definitive about people.
They don’t.
They tell us something about systems.
I know this because, according to most datasets, my life should have followed a very different trajectory.
I am a woman. I am a first-generation American. I am a first-generation college student. I was in the child welfare system. I have disabilities. I was a teen mother. I was a homeless teen. I am a single parent.
If you fed those variables into many predictive models used today—in hiring systems, education risk models, credit scoring, or social service forecasting—the statistical prediction would be bleak. Those models would categorize me as “high risk,” “low probability of success,” or “structurally disadvantaged.”
And in one sense, that data would be telling the truth.
Those experiences do correlate with real structural barriers. Foster youth face enormous instability. Teen parents often encounter interrupted education pathways. Disabilities can make navigating institutions more difficult. Single parents frequently carry disproportionate economic and emotional burdens.
Those patterns are real.
But here is the problem: population-level patterns are not destiny.
And when systems built on statistical prediction begin to treat them as destiny, they quietly begin to misclassify people.
Data systems are probabilistic. They estimate likelihoods based on patterns in historical information. They might say, “People with these characteristics often experience these outcomes.” But probability is not the same thing as inevitability.
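To make that concrete, here is a small, deliberately artificial sketch in Python. Every number and variable name below is invented, and a real system would be far more elaborate, but the basic move is the same: the model assigns a probability to a pattern, while individual people keep landing on both sides of it.

```python
# A minimal sketch with synthetic data: a model's output is a probability
# about a population, not a verdict on any one person.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy historical records: one binary feature (standing in for something like
# "spent time in the child welfare system") and an outcome shaped, at the
# population level, by the barriers that feature correlates with.
barrier = rng.integers(0, 2, size=1000)
success = (rng.random(1000) < np.where(barrier == 1, 0.35, 0.75)).astype(int)

model = LogisticRegression().fit(barrier.reshape(-1, 1), success)

# The model assigns people with the barrier a low probability of success...
p = model.predict_proba([[1]])[0, 1]
print(f"predicted probability of success with the barrier: {p:.2f}")

# ...yet the very same data contains well over a hundred people with the
# barrier who succeeded anyway.
print("people with the barrier who actually succeeded:",
      int(success[barrier == 1].sum()))
```

A probability of 0.35 is a statement about a population. It says nothing certain about the next person who walks through the door.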
Every dataset contains people like me—people who fall outside the prediction.
In statistics, we are called outliers.
Outliers are not mistakes. They are not anomalies to be ignored. In fact, outliers are often the most important data points in a system. They reveal where models are oversimplifying reality.
They expose what the model failed to capture.
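Again, a tiny synthetic sketch shows the mechanism more sharply than prose can. The variables here are invented, but the pattern is general: when a model is missing a variable that matters, the people it calls outliers are exactly the people for whom that missing variable did the work.

```python
# A minimal sketch with synthetic data: the points a model misses most are
# the ones where an unmeasured variable mattered.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 500

measured = rng.normal(size=n)            # what the dataset records
unmeasured = rng.exponential(size=n)     # a mentor, a lucky break, a community
outcome = measured + 2.0 * unmeasured + rng.normal(scale=0.5, size=n)

# The model is fit on the measured variable alone.
model = LinearRegression().fit(measured.reshape(-1, 1), outcome)
residuals = outcome - model.predict(measured.reshape(-1, 1))

# The largest "outliers" are precisely the people with the most unmeasured support.
top = np.argsort(np.abs(residuals))[-20:]
print("average unmeasured factor, everyone:", round(float(unmeasured.mean()), 2))
print("average unmeasured factor, outliers:", round(float(unmeasured[top].mean()), 2))
```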
Human lives contain variables that are difficult to quantify:
Resilience.
Community support.
Creativity.
Opportunity at the right moment.
A mentor who intervenes.
A decision that changes everything.
Predictive systems rarely see these things.
They see zip codes, education histories, income levels, family structure, and employment records. They compress complex lives into columns of variables and attempt to forecast outcomes.
But those columns do not capture the full architecture of human possibility.
When automated systems are used in hiring, education, lending, or social services, the danger is not just technical error. The danger is that systems begin to mistake correlation for character.
If a dataset reflects historical inequality, models trained on that data can reproduce those inequalities. Past patterns become signals of future “risk,” even when those patterns were created by structural barriers rather than individual potential.
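You can watch this happen in a few lines of synthetic code. Everything below is invented for illustration: the proxy variable stands in for something like a zip code, and the barrier exists only inside the simulation, never in the data the model sees. The model still learns to treat the proxy as a risk signal.

```python
# A minimal sketch with synthetic data: a model trained on historical outcomes
# relabels a structural barrier as individual "risk".
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5000

proxy = rng.integers(0, 2, size=n)       # a demographic proxy, e.g. a zip code
potential = rng.normal(size=n)           # individual potential, identical across groups
barrier = 1.5 * proxy                    # a historical barrier applied to one group only
past_success = (potential - barrier + rng.normal(size=n) > 0).astype(int)

# The model never sees the barrier. It sees only the proxy and the historical outcome.
model = LogisticRegression().fit(proxy.reshape(-1, 1), past_success)

for g in (0, 1):
    p = model.predict_proba([[g]])[0, 1]
    print(f"proxy group {g}: predicted probability of success = {p:.2f}")
# The gap in predictions mirrors the old barrier, not any difference in potential.
```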
This is why conversations about artificial intelligence governance matter so deeply.
We are not simply debating technology. We are deciding how much power statistical prediction should have over human opportunity.
Data can illuminate injustice. It can reveal patterns that demand policy solutions. It can help us understand where systems are failing communities.
But data should never be allowed to define the boundaries of a person’s future.
My life is both a confirmation and a contradiction of the data.
The hardships were real. The structural barriers were real. The statistics describing those challenges are real.
And yet the prediction would have been wrong.
That is the lesson outliers teach us: models can describe the world, but they cannot fully capture it.
Human lives are always larger than the variables used to measure them.
I am not an exception to be ignored in the dataset.
I am proof that the dataset was never the whole story.