SCENT Study: Identifying primary and recurrent cancer diagnoses with a SAS program

Accurate identification of primary and recurrent cancer diagnoses is critically
important to clinical researchers. Traditional identification methods and electronic
diagnosis codes have significant limitations. To overcome these limitations and
further the science of clinical cancer research, Kaiser Permanente Southern California
researchers have developed a SAS-based coding, extraction, and nomenclature tool
(SCENT). SCENT uses natural language processing to identify and extract information
from the text of electronic pathology reports. Because SAS statistical software
is widely used in clinical research settings, SCENT should be accessible to many research teams.

To assess the accuracy of SCENT, researchers conducted a validation study using
pathology reports of randomly selected breast and prostate cancer patients. The
tool successfully identified 97 percent (111/115) of confirmed cancer diagnoses
and produced only a few false positives (3/792). Additional information about SCENT
is available in a peer-reviewed publication in the Journal of the American Medical Informatics Association.
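
The counts above can be turned into percentages with a short calculation. The sketch below is illustrative only: the variable names are hypothetical, and it treats 792 simply as the denominator reported alongside the false positives, since the text here does not say exactly what that group consists of.

```python
# Counts as reported in the validation summary above.
confirmed_found = 111      # confirmed cancer diagnoses that SCENT identified
confirmed_total = 115      # all confirmed cancer diagnoses in the sample
false_positives = 3        # diagnoses flagged by SCENT that were not confirmed
false_pos_denominator = 792  # denominator reported with the false positives
                             # (assumption: its exact definition is not given here)


def percent(numerator, denominator):
    """Return numerator/denominator expressed as a percentage."""
    return 100 * numerator / denominator


identified_pct = percent(confirmed_found, confirmed_total)
false_positive_pct = percent(false_positives, false_pos_denominator)

print(f"Confirmed diagnoses identified: {identified_pct:.1f}%")
print(f"False positives: {false_positive_pct:.2f}%")
```

Rounded to the nearest whole percent, the first figure matches the 97 percent quoted above.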

SCENT Program

Licensed under the Apache License, Version 2.0. See the notice embedded within the source code for additional detail. 
Execution requires access to a licensed copy of SAS software, for which the licensee is solely responsible.


 Slides presented at the 2012 HMORN conference in Seattle, WA

 Link to online JAMIA publication and manuscript detailing SCENT’s methodology and validation.


 The Clinical Concept Dictionary in both Excel and SAS formats, contained in a ZIP file.

    (An Excel file for consolidating anatomical site coding, and a “Developmental” folder with a partially complete, streamlined version of SCENT.)