Study design

  • State your hypothesis at the outset
  • Select population of interest to guide your database selection
    • Ages, demographics
    • Clinical diagnosis or procedures
    • Need for clinical detail
    • Inpatient, outpatient, or the full spectrum of care
    • State-level or national
    • One year or across time
    • Perspective of interest – facilities, insurance providers, physicians, patients
  • Availability and costs of databases
  • Narrow your database options
  • Define cases of interest and independent variables of interest – BE EXPLICIT!
    • Learn to love codes! ICD and CPT/HCPCS codes are the clinical backbone of administrative data
    • Detailed code review and classification is ESSENTIAL
      • Must review CODEBOOKS in depth
      • Review the ICD coding manual / CPT and HCPCS coding manual – don’t just rely on the index
      • Look at exclusions and inclusions, read the definitions carefully
      • Understand the codes you are using – look at coding guidelines if necessary
      • Talk to providers familiar with the codes you want to use
      • Always check coding guidelines when analyzing trends – codes and guidelines change over time
      • Key resource: CDC National Center for Health Statistics guidelines
    • Avoid double-counting cases
      • Similar numeric diagnosis codes and E codes (external cause of injury) – many cases may have both
      • Count each case only once – count cases, not diagnoses
    • Case definitions often use a hierarchy or combination of diagnosis and procedure codes
    • When using coding algorithms from others, always review them critically
    • Consider whether you are interested in principal/primary or secondary diagnosis codes
    • Think carefully about inclusion and exclusion criteria (e.g. transfer cases, age, gender)
    • Consider potential confounders and whether you can adequately control for them
      • Clinical data elements
      • Severity adjustments – use carefully, not always a good alternative to clinical diagnoses
  • KNOW YOUR DATABASE … READ the documentation to be sure your database can support your analysis AND your case definition
    • Summary statistics (missing observations, distribution of values)
      • Presence of a data element does not mean it has been validated
    • Data element definitions and exceptions
    • Survey questionnaires
    • Data elements may or may not be present in all databases or years
  • Develop research protocol
  • BE EXPLICIT about methods
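
The points above on explicit case definitions, double-counting, principal versus secondary diagnoses, and missing observations can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records, the field names (record_id, age, dx), and the code prefixes are hypothetical and do not represent a validated case definition or the layout of any particular database. The first code in each dx list is treated as the principal diagnosis.

```python
# Hypothetical discharge records. Field names and values are illustrative only.
# By assumption, the first entry in "dx" is the principal diagnosis.
RECORDS = [
    {"record_id": 1, "age": 34, "dx": ["8208", "E8859"]},  # numeric code AND E code
    {"record_id": 2, "age": 71, "dx": ["82021"]},
    {"record_id": 3, "age": None, "dx": ["4019", "8208"]},  # match is secondary only
    {"record_id": 4, "age": 58, "dx": ["4280"]},
]

# Illustrative code prefixes, stated explicitly. A real list must come from
# an in-depth review of the coding manuals and guidelines.
CASE_PREFIXES = ("820", "E88")


def is_case(record, prefixes=CASE_PREFIXES, principal_only=False):
    """Return True if any (or only the principal) diagnosis matches the list."""
    codes = record["dx"][:1] if principal_only else record["dx"]
    return any(code.startswith(prefixes) for code in codes)


def count_cases(records, prefixes=CASE_PREFIXES, principal_only=False):
    """Count records, so each case is counted once however many codes match."""
    return sum(1 for r in records if is_case(r, prefixes, principal_only))


def count_matching_codes(records, prefixes=CASE_PREFIXES):
    """Count matching diagnoses; this overstates cases when a record has several."""
    return sum(1 for r in records for code in r["dx"] if code.startswith(prefixes))


def missing_summary(records, fields):
    """Number of records with a missing (None or absent) value for each field."""
    return {f: sum(1 for r in records if r.get(f) is None) for f in fields}


if __name__ == "__main__":
    print("cases (any diagnosis):", count_cases(RECORDS))
    print("cases (principal only):", count_cases(RECORDS, principal_only=True))
    print("matching diagnoses:", count_matching_codes(RECORDS))
    print("missing values:", missing_summary(RECORDS, ["age", "dx"]))
```

In this toy data, record 1 carries both a numeric code and an E code, so counting diagnoses gives a larger number than counting cases. Restricting to the principal diagnosis drops record 3, which shows why the choice between principal and secondary codes should be stated explicitly in the methods.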