Simple Complexity: The Realities of Data Science Decision Making

Lori Schlatter
7 min read · Sep 26, 2020


Lessons learned while working with a stakeholder on complexity metrics

In the midst of a pandemic, how can parents facilitate their children’s education and minimize screen time, without adding “teacher” to their list of constant responsibilities?

As counterintuitive as it may sound, a web application called Story Squad is trying to provide a solution for that very problem.

So what is Story Squad, and what does any of this have to do with data science?

Story Squad Logo

The Prologue

Story Squad founder and former teacher Graig Peterson has a goal to provide 3rd through 7th graders with a creative outlet off-screen. Here’s how:

  • Each week, children read an installment of a story.
  • Then, they’re prompted to take time off-screen to handwrite a creative story response and illustration based on prompts from the story.
  • Photos of the story and drawing are then uploaded to the app, transcribed, and matched up with other children’s stories based on an analysis of writing complexity.
  • Every week, each “squad” of four matched users allots points to each other’s stories and drawings, and then votes on the winners of individual paired matchups.

Check out the intro video Graig created for the product:

Story Squad founder Graig Peterson speaks about his vision for Story Squad

Now it becomes clear where data science comes into this application: through the handwriting transcription and text analysis.

I’m currently halfway through an 8-week project cycle of working to bring Story Squad to life, with a cross-functional team of five web developers and two other data scientists.

For full transparency, our data science team’s expectation coming into the project was that we would be coding our own OCR (Optical Character Recognition) model and implementing complex NLP (Natural Language Processing) analysis in Python, as a greenfield project (one with no previous codebase), in order to meet the stakeholder’s goals.

Long story short, that’s not quite what we’ve ended up implementing.

The Story

Our primary tasks as a data science team are as follows:

  1. Implement an OCR to transcribe children’s handwritten stories
  2. Build some kind of writing complexity metric or algorithm
  3. Deploy an API to create an endpoint for DS integration with the website (see the sketch after this list)
  4. In future releases, build a clustering algorithm to match users into “squads” for voting
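The post doesn’t specify the web framework behind task 3, but to make the “DS endpoint” idea concrete, here is a hedged sketch assuming FastAPI (an assumption, not a detail from the project), with a trivial placeholder standing in for the real scoring formula developed later:

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class StoryIn(BaseModel):
        story_id: str
        text: str  # transcription produced by the OCR step

    @app.post("/complexity")
    def score_complexity(story: StoryIn) -> dict:
        # Placeholder scoring; the real Squad Score formula is developed
        # later in this post. Here we just return a trivial stand-in value.
        words = story.text.split()
        return {"story_id": story.story_id, "complexity": float(len(set(words)))}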

Within the first week, we learned that a previous team had begun working on the same project, but that none of their code had yet been integrated into it. It is our job to approach the task from scratch and create a usable codebase.

Also within this first week, we determined that we did not have enough training data to create our own OCR model, and recommended to our stakeholder that they permit us to continue with the previous team’s baseline approach of using Google Cloud’s Vision API.
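For reference, here is a minimal sketch of what handwriting transcription through the Vision API can look like with Google’s google-cloud-vision Python client (recent versions); the actual wrapper used in the project isn’t shown in this post, so treat the details as illustrative:

    from google.cloud import vision

    def transcribe_story(image_path: str) -> str:
        """Send a photo of a handwritten story to Google Cloud Vision
        and return the detected text (requires GCP credentials)."""
        client = vision.ImageAnnotatorClient()
        with open(image_path, "rb") as f:
            image = vision.Image(content=f.read())
        # document_text_detection is suited to dense and handwritten text
        response = client.document_text_detection(image=image)
        if response.error.message:
            raise RuntimeError(response.error.message)
        return response.full_text_annotation.text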

To cap off our shifting understanding of the project, we realized that our first approach to a complexity metric would probably not consist of any complex NLP methods. I have been spearheading the data science team’s complexity metric approach, so it’s the decision-making process around this specific technical problem that I want to focus on.

The purpose of the complexity metric(s) is twofold:

  1. Provide a future release of the application with a basis for clustering users by writing complexity to form their weekly “squads”
  2. Power a new feature, based on a suggestion I made, that will allow parents to see their children’s progress in writing complexity

We’re tackling the second use-case first, because it will be incorporated into the product on an earlier release.

Here are the requirements we are aiming to fulfill:

  • Identify a metric that will reliably capture writing complexity. Our stakeholder initially suggested replicating a Lexile-like score
  • Quantify the growth our stakeholder predicts will occur through exposure to reading and writing via Story Squad
  • Build in resilience to the inevitable errors both in children’s writing and in the transcription process (where errors are strongly correlated with handwriting clarity)

Of course, no data science project comes without constraints. Here are the primary constraints we’ve identified that we have to operate within:

  • 8-week timeframe
  • < 200 collected stories in our dataset
  • Unlabeled data (i.e., no scores pre-assigned to the training set)
  • The average story length in our dataset is shorter than the predicted average length of stories the finished Story Squad product will collect
  • The stories in our dataset represent only two of the five targeted grade levels
  • Out-of-the-box complexity metrics are trained on professionally edited work, and some explicitly specify that they should not be used on student writing

I spent the first few weeks of the project entirely on planning: reading through our onboarding and product documentation, scouring the previous team’s GitHub repositories, researching writing complexity metrics, reviewing the Figma documents, facilitating data science meetings to create outlines and timelines, and, most importantly, initiating a continual back-and-forth dialogue with our stakeholder at each step of our research and planning to ensure that our final product would be accurately aligned with his vision.

Through all of this planning, and given the goals and constraints I mentioned, our redefined scope has become clear. Instead of “code up an OCR and train a high-level NLP model to generate complexity”, our goal is to:

Select individual text analysis metrics that are the least impacted by errors in child writing and transcription from the Google Cloud Vision API, and determine what weights they should each be given in order to generate our own writing complexity formula from scratch.

This newly refined goal, which I worked closely with the stakeholder to home in on, is much clearer in its scope, even if it is also much more limited than the expectations we brought into the project.

Since clarifying our scope and our steps forward, we’ve come up with a list of factors that we are going to work to incorporate into our formula. These are inspired by other well-established complexity metrics, but cherry-picked to be most appropriate for our specific context, and least impacted by transcription errors. The factors include:

  • average length of word
  • length of story
  • number of unique words
  • number of quotation marks (used as a representation of amount of dialogue)
  • some other factors, such as number of misspellings, number of complex words, and average sentence length, are more prone to transcription errors, but we may still try to include them
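To make the formula idea concrete, here is a minimal sketch in plain Python of how factors like these might be extracted and combined into a single weighted score. The function names and weights are illustrative placeholders, not the team’s actual Squad Score code:

    def extract_features(story: str) -> dict:
        """Simple text features chosen to be robust to transcription errors."""
        words = story.split()
        return {
            "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
            "story_len": len(words),
            "unique_words": len({w.lower().strip('.,!?;:"') for w in words}),
            "quotation_marks": story.count('"'),  # rough proxy for dialogue
        }

    def complexity_score(story: str, weights: dict) -> float:
        """The 'linear equation': a weighted sum of the features above."""
        features = extract_features(story)
        return sum(weights[name] * value for name, value in features.items())

    # Example usage with made-up weights
    weights = {"avg_word_len": 5.0, "story_len": 0.1,
               "unique_words": 0.5, "quotation_marks": 1.0}
    print(complexity_score('"Hello," said the dragon to the knight.', weights))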

Our stakeholder recently also gave us a sample of stories, ranked 1–25 based on writing quality, that we can use as a small set of labels for a rough comparison against the formula. Of course, this is still a very small amount of labeled data, and we know we have a small corpus anyway. This means it will be important to keep finding other ways of validating our formula, such as weak supervision techniques or external research on writing complexity.
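With only 25 ranked stories, one lightweight sanity check is a rank correlation between the formula’s scores and the stakeholder’s rankings. Here is a sketch using SciPy, reusing the hypothetical complexity_score helper from the earlier sketch and assuming a rank of 1 marks the strongest story:

    from scipy.stats import spearmanr

    def validate_against_rankings(stories, rankings, weights):
        """Rank-correlate formula scores with the stakeholder's 1-25
        quality ranks (assumed here: rank 1 = strongest story)."""
        scores = [complexity_score(s, weights) for s in stories]
        rho, p_value = spearmanr(scores, rankings)
        # If the formula tracks quality, higher scores should pair with
        # better (numerically lower) ranks, making rho strongly negative.
        return rho, p_value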

The Ending

While I never expected to spend the better part of three weeks of an eight-week project planning out our approach, especially when the end result of that plan is nothing more than a linear equation rather than a cutting-edge NLP model, I stand by our process, and I believe we’re in the best place we could be for the project.

Our next steps are to finalize our formula, with some guidance from the small number of labels that we have, and ultimately to create a generalizable and reliable model based on what we know of our data and of writing complexity as it’s already been studied. We also plan to explore weak supervision techniques in this next phase of implementation.

What I learned from this process is that, when working with stakeholders, constant communication is key. At any step of the way, we could have chosen to start building on our assumptions or our own personal visions, and we would have created something that either wasn’t aligned with Story Squad’s goals or was built on a false assumption about the appropriate approach for our given problem. It may have taken more time to plan than we expected, but I would much prefer that to producing a product that ultimately wasn’t usable or agreeable for our client.

Complex and fancy models aren’t always the best fit for the problem, and in this case, a simple model has become the best approach to quantify complexity.

The Epilogue

Somehow four more weeks have already passed, and our project is coming to a close! Check out our final GitHub repo, or watch our product video here:

Here are some highlights of the steps we took and features we added in the last month:

  • Researched weak supervision techniques (particularly Snorkel), but ultimately found that our use case was not a good fit, since we were dealing with a regression rather than a classification problem and had no formalized way of providing “objectively correct” labels.
  • Released two versions of the Squad Score: one entirely in pure Python, and a second that also uses nltk to count adjectives. We looked at the correlation coefficient between our scores and the hand-rankings the stakeholder gave us and got -0.63, better than any of the out-of-the-box complexity metrics in the textstat package (see the sketch after this list).
  • Completed visualizations for the parent dashboard to show their children’s performance.
  • We completed the gamification component of Story Squad with our web team! That means we generated a clustering algorithm based on Squad Score, and our web team built out the gameplay interface. We were really proud of what we accomplished.
  • Read more about all of our implemented features in our README!
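As a concrete illustration of the second bullet above, here is a hedged sketch of that kind of comparison: counting adjectives with nltk and rank-correlating textstat’s off-the-shelf readability metrics against hand-rankings. The data below is placeholder; the real comparison used the 25 ranked stories:

    import nltk
    import textstat
    from scipy.stats import spearmanr

    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    def adjective_count(text: str) -> int:
        """Count adjectives (POS tags starting with 'JJ') via nltk."""
        tags = nltk.pos_tag(nltk.word_tokenize(text))
        return sum(1 for _, tag in tags if tag.startswith("JJ"))

    # Placeholder data; in practice, `stories` held the 25 transcribed
    # stories and `rankings` the stakeholder's quality ranks (1 = strongest).
    stories = ["Once upon a time, a brave knight...", "The big red dragon flew fast."]
    rankings = [1, 2]

    for name, metric in {
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade,
        "dale_chall": textstat.dale_chall_readability_score,
        "gunning_fog": textstat.gunning_fog,
    }.items():
        scores = [metric(s) for s in stories]
        rho, _ = spearmanr(scores, rankings)
        print(f"{name}: Spearman rho = {rho:.2f}")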

Coming Soon: The Sequel

Stay tuned: our stakeholder had rave reviews for our team, and he’s invited us to stay on and continue working with his team as they work to get this product into the hands of kiddos and parents across the country this year!
