Quickly onboarding vast amounts of new data to start generating meaningful insights
The Zephyr platform ingests data from countless unique data sets, with unstandardized formats and codes. The more high-quality, accurate data that Zephyr can use, the better the outcomes they can provide to their clients. As Amy Sheide, vice president of data platform and partnerships, summed it up, “More data, more better.”
However, Sheide noted, “When the data lands, it’s very disorganized and algorithms can’t make sense of it as is.”
Her colleague, data specialist Samantha Pindak, added, “Clinical data is messy, and nobody uses the same codes. In the clinical data space, EHR records are not standardized. The more data we see in different spaces regarding different diseases, there’s going to be a lot more work with obscure coding systems.”
Before Zephyr can begin applying its machine learning expertise, it must first standardize the data. That means:
- Mapping text strings to standard codes
- Validating any codes that were sent
- Confirming formatting
- Knowledge modeling for implicit features
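The first three steps above can be sketched as a minimal standardization pass. This is an illustrative example only, not Zephyr's pipeline: the code set, field names, and ICD-10-like format check are all hypothetical stand-ins for whatever terminology the team manages.

```python
import re

# Hypothetical code set: free-text synonyms mapped to standard codes
SYNONYM_TO_CODE = {
    "type 2 diabetes": "E11",
    "t2dm": "E11",
    "hypertension": "I10",
    "high blood pressure": "I10",
}
VALID_CODES = set(SYNONYM_TO_CODE.values())
# Simplified ICD-10-like shape: one letter, two digits, optional decimal
CODE_FORMAT = re.compile(r"^[A-Z]\d{2}(\.\d+)?$")

def standardize(record):
    """Validate a sent code, or map a text string to a standard code."""
    code = record.get("code")
    if code:
        # Confirm formatting, then validate against the known code set
        if CODE_FORMAT.match(code) and code in VALID_CODES:
            return {"code": code, "status": "validated"}
        return {"code": None, "status": "invalid_code"}
    # No code sent: try to map the free-text string instead
    text = record.get("text", "").strip().lower()
    mapped = SYNONYM_TO_CODE.get(text)
    if mapped:
        return {"code": mapped, "status": "mapped"}
    return {"code": None, "status": "unmapped"}
```

Records that fall out as `unmapped` or `invalid_code` are exactly the "disorganized" cases Sheide describes, the ones that need a terminologist's review before algorithms can use them.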
Zephyr’s “small but mighty” data team of three data specialists needed help. Sheide said, “We’re not going to be passing spreadsheets around. We’re not uploading and downloading. We wanted our workflow to be in the tool.”
The team also wanted to use its own APIs to protect data security and create efficiencies for its engineering and data science teams.
However, with Zephyr’s plans for rapid growth, the data team must onboard and standardize new customer data to be applied across models — at scale. Since machine learning models are only effective when they receive enough data to learn from, that scaling must happen quickly to enable Zephyr to start generating useful insights for new customers.
“We needed a solution that allowed us to leverage our content management expertise, meaning it needed to be easy and flexible,” Sheide said. This means letting Zephyr’s terminologists and data specialists manage custom code sets, generate their own mappings, and build their own mapping algorithms.