The special challenges of data analytics in health care

Data analytics is a challenge for businesses in all industries. Organizations often struggle with issues around data storage and access, data quality, data integration, pipeline reliability, security, and privacy. But the health care industry faces more challenges than most, in areas such as privacy and security, data retention, and data management:

Privacy and security: These are of particular importance for health care businesses. Successful attacks on health care data can be extremely lucrative for criminals and extremely damaging for organizations. The financial costs of a data breach may be just the beginning: reputational costs are harder to measure but may linger for long periods. Individuals whose data is stolen may suffer most of all, since health records contain personal data ranging from credit card numbers to details about diagnoses and lab tests, raising threats of identity theft and even blackmail.

In the US, the Health Insurance Portability and Accountability Act (HIPAA) prescribes how health care businesses must protect their data, whether they store it on their own premises, in shared data centers, or in SaaS applications.

Data retention: Health data must stay accessible for at least five years, and exact retention periods vary by jurisdiction and record type. That means businesses need to take a long-term approach to data stewardship and keep track of when the data gets accessed, by whom, and for what purpose. Medical data management software lets users establish access privileges and processes, such as granting temporary data viewing rights to representatives of different hospital departments. These products can index data and notes and track when data entered the system. Organizations must also put processes in place to review the data periodically, deleting it when appropriate or modifying and anonymizing it for new uses, such as gauging trends across several years.
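To make the access-tracking idea concrete, here is a minimal Python sketch of a temporary viewing grant combined with an audit trail that records who asked for a record, when, and for what purpose. Everything in it is hypothetical and for illustration only: the `AccessLog` class, its method names, and the sample user and record IDs do not come from any particular medical data management product, and a real system would persist the log rather than keep it in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class AccessLog:
    """Hypothetical in-memory audit trail; a real system would persist it."""
    grants: dict = field(default_factory=dict)   # (user, record_id) -> expiry time
    entries: list = field(default_factory=list)  # one dict per access attempt

    def grant_temporary(self, user, record_id, hours, now):
        """Give `user` viewing rights on `record_id` until `now` plus `hours`."""
        self.grants[(user, record_id)] = now + timedelta(hours=hours)

    def view(self, user, record_id, purpose, now):
        """Log who asked for what, when, and why; return whether access is allowed."""
        expiry = self.grants.get((user, record_id))
        allowed = expiry is not None and now < expiry
        self.entries.append({
            "user": user,
            "record_id": record_id,
            "purpose": purpose,
            "at": now,
            "allowed": allowed,
        })
        return allowed


log = AccessLog()
start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)

# A billing clerk gets eight hours of viewing rights on one record.
log.grant_temporary("billing_clerk", "rec-001", hours=8, now=start)

during = log.view("billing_clerk", "rec-001", "invoice review", start + timedelta(hours=1))
after = log.view("billing_clerk", "rec-001", "invoice review", start + timedelta(hours=9))
```

Denied attempts are logged as well as successful ones, since a record of who tried to reach the data can matter as much as a record of who actually saw it.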

Data management: Health organizations face big-data-related challenges that can affect patient safety. All the data that health organizations collect needs to be described, formatted, deduplicated, checked for accuracy, and made accessible for various uses (medical, billing, administrative), and the volume and velocity of big data make this task more difficult.

Some hospitals now employ patient safety experts, but in addition to having medical expertise, people in these roles must understand how data management practices can improve or hinder patient safety.
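As a rough illustration of the formatting, deduplication, and accuracy-check steps, the Python sketch below cleans a small batch of records. The field names (`patient_id`, `name`, `birth_date`), the sample rows, and the single validity rule are assumptions made for the example; real pipelines apply far more checks and usually match duplicates on more than one key.

```python
from datetime import date


def normalize(record):
    """Format step: trim whitespace and unify name spacing and casing."""
    return {
        "patient_id": record["patient_id"].strip(),
        "name": " ".join(record["name"].split()).title(),
        "birth_date": record["birth_date"].strip(),
    }


def is_valid(record, today):
    """Accuracy check: the birth date must parse and must not be in the future."""
    try:
        born = date.fromisoformat(record["birth_date"])
    except ValueError:
        return False
    return born <= today


def clean(records, today):
    """Normalize each row, set invalid rows aside, keep the first row per patient_id."""
    seen, kept, rejected = set(), [], []
    for raw in records:
        rec = normalize(raw)
        if not is_valid(rec, today):
            rejected.append(rec)
        elif rec["patient_id"] not in seen:
            seen.add(rec["patient_id"])
            kept.append(rec)
    return kept, rejected


# Hypothetical sample rows: one duplicate and one impossible birth date.
raw_rows = [
    {"patient_id": "p1 ", "name": "jane  doe", "birth_date": "1990-05-01"},
    {"patient_id": "p1", "name": "Jane Doe", "birth_date": "1990-05-01"},
    {"patient_id": "p2", "name": "john roe", "birth_date": "2990-01-01"},
]
kept, rejected = clean(raw_rows, today=date(2024, 1, 1))
```

Rejected rows are returned rather than silently dropped, so that someone can review them; in a patient-safety context, a discarded record can be as much of a problem as an inaccurate one.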

Data accessibility: All data management strategies fall short if they don't result in content that's accessible and in the correct format for reporting. Data analysts must be empowered to access the data they need and share what the data reveals.

And more: Other difficulties may arise in specific facilities. For example, a hospital with a tiny IT budget may have to improve its data analysis through small changes that add up to gradual progress. Knowing those limitations makes it possible to choose solutions that fit the facility.

Improvement starts with awareness

Health care organizations need to be aware of these obstacles and research which tools and strategies can help minimize them.

Cloud data warehouses are attractive options for businesses that run analytics against data from multiple sources. Services like Amazon Redshift, Google BigQuery, and Snowflake can scale compute and storage resources with latency measured in seconds or minutes, so businesses can get just the performance they need when they need it. There's no need to provision for peak load when the cloud can autoscale on demand.

A cloud data warehouse's ability to scale has another benefit: Data engineers can skip preload transformations and load raw data into the data warehouse, then define transformations in SQL and run them in the data warehouse at query time as needed. A centralized raw data store can support different transformations for different tools, analyses, and business processes.
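The load-first, transform-later pattern can be sketched in a few lines of Python. In this example an in-memory SQLite database stands in for a cloud data warehouse, and the table, view, and column names are hypothetical. Raw rows are loaded untouched, and two SQL views then define different transformations over the same raw table, each computed only when it is queried.

```python
import sqlite3

# An in-memory SQLite database stands in for the cloud data warehouse.
conn = sqlite3.connect(":memory:")

# Load step: raw rows go in as they arrive, with no preload transformation.
conn.execute(
    "CREATE TABLE raw_visits (patient_id TEXT, visit_date TEXT, charge_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO raw_visits VALUES (?, ?, ?)",
    [
        ("p1", "2024-01-03", 12000),
        ("p1", "2024-02-10", 8000),
        ("p2", "2024-01-15", 5000),
    ],
)

# Transformation for a billing use case: total charges per patient.
conn.execute("""
    CREATE VIEW billing_by_patient AS
    SELECT patient_id, SUM(charge_cents) AS total_cents
    FROM raw_visits
    GROUP BY patient_id
""")

# Transformation for an administrative use case: visit counts per month.
conn.execute("""
    CREATE VIEW visits_by_month AS
    SELECT substr(visit_date, 1, 7) AS month, COUNT(*) AS visits
    FROM raw_visits
    GROUP BY substr(visit_date, 1, 7)
""")

# The views are evaluated at query time against the same raw table.
billing = conn.execute(
    "SELECT patient_id, total_cents FROM billing_by_patient ORDER BY patient_id"
).fetchall()
monthly = conn.execute(
    "SELECT month, visits FROM visits_by_month ORDER BY month"
).fetchall()
```

Because both views read from `raw_visits`, adding a third analysis later means writing another view, not reloading or reshaping the source data.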

Getting data into a cloud data warehouse is the role of extract, transform, and load (ETL) tools like Stitch. To meet HIPAA requirements, an ETL solution must encrypt data both at rest and in transit, provide access logs, and offer other features to guard protected health information (PHI). Stitch's ETL platform is HIPAA-compliant.

Of course, businesses can use ETL on data of all kinds. Organizations that are just ramping up their data analytics infrastructure can start by replicating their marketing or advertising data to a data warehouse, and once they're comfortable with the process, they can add sources that contain PHI. But even when health care businesses start with other kinds of data, they should confirm that their ETL provider can support HIPAA compliance as their use cases grow.

If you're a health care business that's beginning its journey with data analytics, read about setting the data strategy for your organization, and sign up for Stitch to try it for free for 14 days.

Image credit: Bob Nichols