The National Cancer Institute (NCI) recently announced that it had removed all prostate-specific antigen (PSA) data from the SEER (Surveillance, Epidemiology, and End Results) and SEER-Medicare programs.
The PSA data were removed after quality control checks revealed that a substantial number of PSA values included in the programs were incorrect. An editorial published in The Journal of Urology explores the ramifications of the removal of these data for researchers, clinicians, and administrators within the health care community, as well as the use and accuracy of large administrative datasets in general.
The SEER program, initiated by NCI in 1973 and one of the oldest and most highly regarded cancer registries in the world, is legislatively mandated to collect cancer incidence and survival data from 17 population-based cancer registries across the United States, representing roughly 28% of the U.S. population. The SEER-Medicare dataset links the cancer information in SEER to administrative claims data for patients in SEER covered under the Medicare program.
David F. Penson, MD, MPH, cautions that withdrawal of these data from SEER will have two major impacts on the field of prostate cancer research. Dr. Penson is Director of the Center for Surgical Quality and Outcomes Research and Professor and Chair of the Department of Urologic Surgery at Vanderbilt University, and is also affiliated with the VA Tennessee Valley Geriatric Research, Education, and Clinical Center, Nashville, TN.
“First, ongoing analyses using SEER and SEER-Medicare that include PSA data will have to be redesigned in light of the problems with these data. Simply put, journals will not be able to accept SEER studies that rely on the PSA data as a primary variable of interest, including those that use PSA in risk stratification systems to adjust for confounding or in cohort identification. This effect is relatively straightforward and should not cause great problems in the field going forward.”
According to the author, “The greater problem, however, is the impact of the flawed PSA data on the existing urological literature. SEER and SEER-Medicare data have been used to address questions about screening and effectiveness of treatments for localized and advanced disease. How can we now trust these studies given the problems with the PSA data?”
Dr. Penson cautions that while large administrative databases like SEER have tremendous value for answering difficult clinical and health care policy questions when used properly, researchers should think twice before publishing secondary data analyses simply because the data are relatively easy to obtain and analyze. “We cannot ask these datasets to answer questions that they are not capable of answering. In that situation we have to do the really hard work and collect primary data. It’s time for us to stop doing big data fishing expeditions and taking the easy way out.”
Reference: “The Power and the Peril of Large Administrative Databases,” by David F. Penson. DOI: dx.doi.org/10.1016/j.juro.2015.05.002. Published online in advance of The Journal of Urology, Volume 194, Issue 1 (July 2015).