Goal
To make timely information about public health operations and community health more readily available to people making decisions that affect the community's health, including program managers, public health leadership, community partners, policy makers, and others.
Objectives
To establish processes and systems which:
- foster information "self-service";
- preserve an information audit trail;
- speed and simplify epidemiologists' work in generating information;
- increase epidemiologists' productivity;
- improve the quality of information;
- minimize duplication of effort; and
- minimize disruptions to productivity from position turnover.
Tools
You can implement these tools one (or a few) at a time; I would not try to implement them all at once. Some of them, like analytic datasets, can be built upon over time, so you can start small. I keep the "build for today and next year" principle in mind, making small improvements, but in a way that can be sustained and built upon.
We continue to develop all of these tools, adjusting them as we figure out how to do things better.
Analytic data sets
Rather than having data analysts start with the raw data sources, our analysts use well-organized, standardized "analytic data sets" which we create from the raw data. For each data source, we have a "data prep" program that automatically runs on an appropriate schedule to add new data to that data source's analytic data set. The data prep program:
- runs quality checks on the data and generates reports, email messages, and "unexpected value" data sets to preserve and communicate the data quality results,
- assigns standard names and values for variables that occur in multiple data sets, such as race or gender,
- converts variables into standard types with standard values that are easy to use in analyses, such as (0, 1) for ("No", "Yes") type variables,
- assigns meaningful variable names, labels, and data value formats to each variable in the analytic data set, and
- labels the dataset, makes it read-only, and stores it where all analysts can access it.
The data prep programs can include code that addresses many of the subtleties and complexities in our data sources. For instance, we drop duplicate or invalid records, keeping only the most accurate record; we set unexpected values to missing; and we add useful computed variables like BMI or the Kotelchuck index. This results in simpler analysis programs and more consistent results across analysts, and it helps analysts avoid the kind of errors that occur when someone is using a data set that is not as straightforward as it appears.
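The cleaning steps described above can be sketched roughly as follows. This is an illustration in Python only, not one of our actual SAS or R data prep programs. The field names (person_id, updated, smoker, weight_kg, height_m) and the rule of keeping the most recently updated record as the "most accurate" one are assumptions made for the example.

```python
from datetime import date

# Standard (0, 1) coding for ("No", "Yes") type variables.
YES_NO = {"yes": 1, "y": 1, "no": 0, "n": 0}


def standardize_yes_no(value):
    """Map Yes/No style text to 1/0; anything unexpected becomes None (missing)."""
    if value is None:
        return None
    return YES_NO.get(str(value).strip().lower())


def bmi(weight_kg, height_m):
    """Computed variable: body mass index, or None if an input is missing or invalid."""
    if not weight_kg or not height_m or weight_kg <= 0 or height_m <= 0:
        return None
    return round(weight_kg / height_m ** 2, 1)


def prep(raw_records):
    """Return (analytic_records, unexpected_values) from a list of raw dicts."""
    # Drop duplicates: keep one record per person, the most recently updated
    # one (a hypothetical stand-in for "most accurate").
    latest = {}
    for rec in raw_records:
        kept = latest.get(rec["person_id"])
        if kept is None or rec["updated"] > kept["updated"]:
            latest[rec["person_id"]] = rec

    analytic, unexpected = [], []
    for rec in latest.values():
        smoker = standardize_yes_no(rec["smoker"])
        if smoker is None and rec["smoker"] not in (None, ""):
            # Record the data quality finding rather than silently losing it.
            unexpected.append((rec["person_id"], "smoker", rec["smoker"]))
        analytic.append({
            "person_id": rec["person_id"],
            "smoker": smoker,
            "bmi": bmi(rec["weight_kg"], rec["height_m"]),
        })
    return analytic, unexpected


raw = [
    {"person_id": 1, "updated": date(2023, 1, 5), "smoker": "Yes", "weight_kg": 70, "height_m": 1.75},
    {"person_id": 1, "updated": date(2023, 3, 1), "smoker": "No", "weight_kg": 72, "height_m": 1.75},
    {"person_id": 2, "updated": date(2023, 2, 1), "smoker": "maybe", "weight_kg": 60, "height_m": 0},
]
analytic, unexpected = prep(raw)
print(analytic)
print(unexpected)
```

The second return value plays the role of an "unexpected value" data set: the analytic records stay clean, while the odd values are kept for the data quality report.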
Poor man's data warehouse
All of our frequently used data sets are stored as analytic data sets in one place, accessible to all of our analysts. Most data sets are not linked together yet, although we are working toward that. But at least all analysts have access to shared, core datasets, documented and set up for easy analysis, and automatically updated frequently (usually daily or monthly).
We currently store our analytic data sets as SAS data sets, but we are assessing whether to store them in an SQL database, so that they are more readily accessible via software other than SAS.
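To illustrate why an SQL database would open the data to other software, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for whatever database we might eventually choose, and the table name (births_analytic), its columns, and the sample rows are made up for the example.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; a real deployment
# would point at a shared database instead.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE births_analytic (
        record_id INTEGER PRIMARY KEY,
        birth_year INTEGER NOT NULL,
        low_birth_weight INTEGER CHECK (low_birth_weight IN (0, 1))
    )
""")
rows = [(1, 2022, 0), (2, 2022, 1), (3, 2023, 0), (4, 2023, 0)]
conn.executemany("INSERT INTO births_analytic VALUES (?, ?, ?)", rows)
conn.commit()


def percent_low_birth_weight(connection):
    """Percent of births flagged low birth weight, by year."""
    cursor = connection.execute("""
        SELECT birth_year, 100.0 * AVG(low_birth_weight)
        FROM births_analytic
        GROUP BY birth_year
        ORDER BY birth_year
    """)
    return cursor.fetchall()


print(percent_low_birth_weight(conn))
```

Because the indicator is already stored as (0, 1), the percentage is just an average, and any tool that can issue SQL can compute it.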
Limited number of tools
In order to foster shared expertise and to assure that one epidemiologist can readily pass their work to another, we try to limit the number of software tools we use.
- We try to assure that any new data system implemented within the agency allows backend access to the data via ODBC, so that we can easily integrate it into our current data flow.
- We encourage programs to use Crystal Reports or Viya (see "Interactive reporting tool, suitable for the computer-shy" below) for getting data from their systems, rather than using reporting tools within each system, so that we can more easily support them and integrate data across programs.
- Within epidemiology, everyone uses SAS or R, and within R, we try to use the same set of packages, so that we can understand and use each other's code.
If it makes a lot of sense to use some other tool, we might, but we have a strong bias toward figuring out how to get things done with our main tools, rather than using something new.
Interactive reporting tool, suitable for the computer-shy
One of the best things we've done to improve decision support in our agency is to implement a user-friendly, interactive reporting tool, which allows staff throughout the agency to view and analyze data about their activities. The tool allows them to cut their data in almost any way they would like, to drill down to record-level information, and to create dashboards. We have put a notable amount of effort into developing the tool, training agency staff in how to use it, and creating initial reports for them to use. Ongoing maintenance and training takes about ¼ FTE. Epidemiologists who support specific programs, like Chronic Disease, also support that program's use of the tool. We recently switched from using Futrix (which is no longer available) to using SAS Viya.
Data request ID ("DR number")
The "DR number" is the spine that holds our operations together. Each work product we produce is assigned a data request ID (the "DR number"), which identifies that product. The DR number is included in the filename of almost any file associated with that request, such as the analytic program, the program log, program output, or documents containing results. The DR number is also included in the results, such as in a footnote on any graphs, or in page footers of documents. So if someone has a question about something we have produced, we can ask them to look for the DR number on that result, and then quickly find the program that produced that result or other relevant information.
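As a rough illustration of how a DR number can tie files and results together, the Python sketch below builds a filename and a footnote from one ID and then recovers the ID from the footnote. The "DR" plus digits format, the footnote wording, and the helper names are assumptions for the example, not our actual conventions.

```python
import re

# Hypothetical format: "DR" followed by at least four digits.
DR_PATTERN = re.compile(r"DR\d{4,}")


def dr_filename(dr_number, description, extension):
    """Build a filename that begins with the DR number."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", description).strip("_")
    return f"{dr_number}_{slug}.{extension}"


def dr_footnote(dr_number):
    """Text to place under a graph or in a page footer."""
    return f"Source: Epidemiology data request {dr_number}"


def find_dr_number(text):
    """Pull the DR number back out of a footnote, filename, or email."""
    match = DR_PATTERN.search(text)
    return match.group(0) if match else None


print(dr_filename("DR1042", "asthma visits by zip", "sas"))
print(find_dr_number(dr_footnote("DR1042")))
```

Once the ID is recovered from a result, the same string locates the program, log, and output files, since they all carry it in their names.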
Data request database
Our data request database simplifies use of the DR number and preservation of an information audit trail. When we receive a request for information, the request is logged in the data request database, which generates the unique DR number. The database includes a button which creates a folder with the DR number, creates some program templates within that folder, and creates a parallel folder in our code repository.
The database includes tools to help us look up past requests that might be similar to the current request, and to find analytic code, output, and other work products from those past requests. Often, rather than creating a new request, we use the database to find something we've produced already that satisfies the current need.
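The two core functions described above, generating a unique DR number when a request is logged and searching past requests, can be sketched as follows. This uses Python's sqlite3 module rather than Microsoft Access, and the table layout, the DR number format, and the function names are all hypothetical.

```python
import sqlite3


def open_request_db():
    """Create an in-memory request log (a stand-in for the real database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE requests (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            requester TEXT NOT NULL,
            description TEXT NOT NULL,
            received TEXT NOT NULL DEFAULT CURRENT_DATE
        )
    """)
    return conn


def log_request(conn, requester, description):
    """Log a request and return its newly generated DR number."""
    cursor = conn.execute(
        "INSERT INTO requests (requester, description) VALUES (?, ?)",
        (requester, description),
    )
    conn.commit()
    return f"DR{cursor.lastrowid:04d}"


def find_similar(conn, keyword):
    """Return (DR number, description) for past requests mentioning keyword."""
    cursor = conn.execute(
        "SELECT id, description FROM requests WHERE description LIKE ? ORDER BY id",
        (f"%{keyword}%",),
    )
    return [(f"DR{row_id:04d}", description) for row_id, description in cursor]


db = open_request_db()
first = log_request(db, "Chronic Disease program", "Asthma visits by zip code")
second = log_request(db, "Leadership", "Immunization rates by school")
print(first, second)
print(find_similar(db, "asthma"))
```

Letting the database assign the ID guarantees uniqueness, and a simple keyword search is often enough to find an earlier work product before starting a new one.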
The database also helps us assess our productivity, operations, and distribution of clients, to help us with continuous quality improvement.
The data request database was created in Microsoft Access. A version set up to be deployed by other analysis groups is available at https://drive.google.com/file/d/1ic8lzUNSCMTqPjbbMPCoy-U_ZQFDVqQn/view?usp=sharing
Version control for analysis programs
We use Subversion to maintain a history of changes to each of our analytic programs. Subversion is a freely downloadable, open source version control system (see https://subversion.apache.org/). This is part of our "information audit trail". Through it, program versions that produced old work products can be restored, and we can quickly recover from erroneous changes to analytic programs. Good use of version control requires staff training and cultivating good version control habits.
Structured work product storage
Files associated with each work product are stored in that work product's system folder, with the folder name beginning with the request's DR number. All of the DR folders are in one place, being subfolders of our Data Requests folder.
A folder with the same name is created on our Subversion version control server, where the analytic program files are then archived.
Automated request folder setup
Creating system and version control folders adds steps that are tempting to skip when you get an urgent data request. Our data request database includes a button which creates those folders, and creates some program templates within the system folder, so we can minimize the "overhead" required to conform to our group standards.
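What such a button does can be sketched in a few lines of Python. This is an illustration only: the folder naming, the template contents, and the function name are assumptions, and the "repository" here is a plain directory standing in for a folder on the version control server.

```python
import tempfile
from pathlib import Path

# Hypothetical starter templates, one per language the group uses.
TEMPLATES = {
    "analysis.sas": "* {dr_number}: {title};\n",
    "analysis.R": "# {dr_number}: {title}\n",
}


def create_request_folders(dr_number, title, requests_root, repo_root):
    """Create the system folder, a parallel repository folder, and templates."""
    folder_name = f"{dr_number} {title}"
    work_folder = Path(requests_root) / folder_name
    repo_folder = Path(repo_root) / folder_name
    # exist_ok=False makes a reused DR number fail loudly instead of
    # silently mixing two requests in one folder.
    work_folder.mkdir(parents=True, exist_ok=False)
    repo_folder.mkdir(parents=True, exist_ok=False)
    for name, template in TEMPLATES.items():
        target = work_folder / f"{dr_number}_{name}"
        target.write_text(template.format(dr_number=dr_number, title=title))
    return work_folder, repo_folder


with tempfile.TemporaryDirectory() as scratch:
    work, repo = create_request_folders(
        "DR1042", "Asthma visits", Path(scratch) / "Data Requests", Path(scratch) / "repo"
    )
    created = sorted(p.name for p in work.iterdir())
    same_name = work.name == repo.name

print(created)
print(same_name)
```

Doing all of this in one step removes the temptation to skip the folders under time pressure.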
Standard analysis program headers
We have a standard header that we include in each of our SAS or R analytic programs. The header makes it easier for coworkers to understand each other's programs, automates the inclusion of the DR number in results, automates saving the log file, and prompts other standard programming practices we have adopted. It also includes questions to complete for a code audit.
Preserve program logs
Our standard headers automatically generate analysis program log files, documenting how the output was created, and any potential errors. These are saved in that data request's system folder.
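The idea of a header that starts a log file named after the DR number can be sketched with Python's standard logging module. Our actual headers are written for SAS and R; the function name, the log filename pattern, and the messages below are assumptions for the example.

```python
import logging
import tempfile
from pathlib import Path


def start_program_log(dr_number, program_name, request_folder):
    """Start a log file for one program run inside the request's folder."""
    log_path = Path(request_folder) / f"{dr_number}_{program_name}.log"
    logger = logging.getLogger(f"{dr_number}.{program_name}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages in the file, not on the console
    handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler, log_path


with tempfile.TemporaryDirectory() as folder:
    logger, handler, log_path = start_program_log("DR1042", "analysis", folder)
    logger.info("Read 3 records from the analytic data set")
    logger.warning("1 record had an unexpected value")
    # Close the handler so the file is flushed before it is read.
    handler.close()
    logger.removeHandler(handler)
    log_text = log_path.read_text(encoding="utf-8")

print(log_text)
```

Because the log is created by the header rather than by the analyst, it exists for every run, including the rushed ones.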
Code audits
For the most part, each analytic program is reviewed by another member of the Epidemiology staff, as a quality check. This helps identify errors, assures conformance to the group standards, and, most of all, helps us learn coding skills from each other, speeding how quickly folks in the group move up the learning curve.
As with other processes, we tested and modified the audit process a few times to trim it down to its essentials, and continue to modify it to keep it quick but valuable.
Semi-annual training
Twice a year the Epidemiology group meets to review our standard processes, to assure that everyone understands them the same way, and to discuss how we can trim or otherwise improve them. Even after several years, the sessions always produce changes in our processes or follow-up sessions for folks who want more focused training.
Orientation of new staff
Our checklist for orienting new staff includes several items to orient the staff to some of the tools and processes described here. It also includes reviewing a fairly extensive document describing our processes, and another document with information about our main data sets.
The aim is thorough orientation and periodic re-training in the group's standard processes and the use of the group's core tools.
Consumer input
Over the years, we've made several attempts to formalize consumer input about the value of our work products and our services in general. We haven't found a way to do this that has been both informative and easily sustained. We currently have at least annual, informal discussions with key customers, and (sometimes) send out an annual survey. We have tried per-work product surveys, but we haven't maintained them, gotten a good response rate, or found them to be very informative.
Rules versus Guidelines
As they said in Pirates of the Caribbean, these are "more what you'd call 'guidelines' than actual rules." We try to use these tools and processes consistently, but staff have leeway to do what they think is best.
Tools per objective
- foster information "self-service": Interactive reporting tool
- preserve an information audit trail: Data request ID, Data request database, Version control for analysis programs, Preserve program logs
- speed and simplify epidemiologists' work in generating information: Analytic data sets, Poor man's data warehouse, Data request database
- increase epidemiologists' productivity: Data request database, Automated request folder setup
- improve the quality of information: Analytic data sets, Poor man's data warehouse, Consumer input
- minimize duplication of effort: Analytic data sets, Limited number of tools, Standard analysis program headers
- minimize disruptions to productivity from position turnover: Limited number of tools, Structured work product storage, Standard analysis program headers, Code audits, Semi-annual training, Orientation of new staff