Consensus Segmentation for Positron Emission Tomography: Development and Applications in Radiation Therapy





Das, Shiva K




The use of positron emission tomography (PET) in radiation therapy has continued to grow, especially since the development of combined computed tomography (CT) and PET imaging systems in the early 1990s. Today, the biggest use of PET-CT is in oncology, where a glucose analog radiotracer is rapidly incorporated into the metabolic pathways of a variety of cancers. Images representing the in-vivo distribution of this radiotracer are used for the staging, delineation and assessment of treatment response of patients undergoing chemotherapy or radiation therapy. While PET provides functional information, its image quality is adversely affected by its lower spatial resolution. It also has unfavorable image noise characteristics due to constraints imposed by radiation dose concerns and patient compliance. As a result, PET images have less detail and a lower signal-to-noise ratio (SNR) than images produced by CT. This complicates the use of PET within many areas of radiation oncology, particularly the delineation of targets for radiation therapy and the assessment of patient response to therapy. The development of segmentation methods that can provide accurate object identification in PET images under a variety of imaging conditions has been a goal of the imaging community for years. The goals of this thesis are to: (1) investigate the effect of filtering on segmentation methods; (2) investigate whether combining individual segmentation methods can improve segmentation accuracy; (3) investigate whether consensus volumes can aid physicians with different levels of experience in defining gross tumor volumes (GTVs) for head-and-neck cancer patients; and (4) investigate whether consensus volumes can be useful in assessing early treatment response in head-and-neck cancer patients.

For this dissertation work, standard spherical objects with volumes ranging from 1.15 cc to 37 cc and two irregularly shaped objects with volumes of 16 cc and 32 cc, formed by deforming high-density plastic bottles, were placed in a standardized image quality phantom and imaged at two contrasts (4:1 and 8:1 for spheres, 4.5:1 and 9:1 for irregular objects) and three scan durations (1, 2 and 5 minutes). For the comparison of image filters, Gaussian and bilateral filters matched to produce a similar signal-to-noise ratio (SNR) in background regions were applied to raw unfiltered images. Objects were segmented using four methods: thresholding at 40% of the maximum intensity within a region-of-interest (ROI), an adaptive thresholding method that accounts for the signal of the object as well as the background, k-means clustering, and a seeded region-growing method adapted from the literature. Segmentation quality was assessed using the Dice Similarity Coefficient (DSC) and the symmetric mean absolute surface distance (SMASD). Further, models describing how DSC varies with object size, contrast, scan duration, filter choice and segmentation method were fitted using generalized estimating equations (GEEs), with standard regression fitted for comparison. The GEEs accounted for the bounded, correlated and heteroscedastic nature of the DSC metric. Our analysis revealed that object size had the largest effect on DSC for spheres, followed by contrast and scan duration. In addition, compared to filtering images with a 5 mm full-width at half maximum (FWHM) Gaussian filter, a 7 mm bilateral filter with moderate pre-smoothing (a 3 mm Gaussian; G3B7) produced significant improvements in 3 of the 4 segmentation methods for spheres. For the irregular objects, scan duration had the largest effect on DSC values, followed by contrast.
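
As a rough illustration of two of the quantities named above, the following minimal sketch applies a 40%-of-maximum threshold to a region of interest and scores the result with the DSC, defined as 2|A ∩ B| / (|A| + |B|). It is not the dissertation's implementation: the flat lists standing in for voxel arrays, the function names, the toy intensity values, and the choice of an inclusive (>=) cutoff are all assumptions made for illustration.

```python
def threshold_fraction_of_max(roi, fraction=0.40):
    """Return a binary mask of voxels at or above `fraction` of the ROI maximum.

    `roi` is a flat list of voxel intensities (a hypothetical stand-in for a
    3-D array). The inclusive comparison is an assumption.
    """
    cutoff = fraction * max(roi)
    return [1 if value >= cutoff else 0 for value in roi]


def dice(mask_a, mask_b):
    """Dice Similarity Coefficient: 2 * |A and B| / (|A| + |B|)."""
    overlap = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    # Two empty masks agree perfectly by convention.
    return 2.0 * overlap / total if total else 1.0


# Toy data (illustrative only): intensities in an ROI and the known object mask.
roi = [1.0, 2.0, 3.9, 4.1, 10.0, 8.0, 5.0, 1.5]
truth = [0, 0, 0, 0, 1, 1, 1, 0]

mask = threshold_fraction_of_max(roi)  # cutoff is 0.40 * 10.0 = 4.0
score = dice(mask, truth)
```

In this toy case the threshold admits one voxel outside the true object, so the mask has four voxels, three of which overlap the truth, giving a DSC of 6/7.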

For the study applying consensus methods to PET segmentation, an additional gradient-based method was added to the collection of individual segmentation methods used for the filtering study. Images acquired with 5 minute scan durations were filtered with a 5 mm FWHM Gaussian before objects were segmented by all individual methods. Two approaches to creating a volume reflecting the agreement between the individual methods were investigated. The first was a simple majority voting scheme (MJV), in which voxels segmented by three or more of the individual methods are included in the consensus volume. The second was the Simultaneous Truth and Performance Level Estimation (STAPLE) method, a maximum likelihood methodology previously presented in the literature but never applied to PET segmentation. Accuracy improved to match or exceed that of the best performing individual method, and importantly, both consensus methods provided robustness against poorly performing individual methods. In fact, the distributions of DSC and SMASD values for MJV and STAPLE closely match the distribution that would result if the best individual method were selected for every object (the best individual method varies by object). Given that the best individual method depends on object type, size, contrast and image noise, and cannot be known before segmentation, consensus methods offer a marked improvement over the current standard of using just one of the individual segmentation methods considered in this dissertation.
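
The majority voting rule described above can be sketched in a few lines. This is a minimal illustration, not the dissertation's code: masks are represented as flat lists of 0/1 values, and the function name and toy masks are hypothetical. The threshold of three votes follows the text, which describes five individual methods.

```python
def majority_vote(masks, min_votes=3):
    """Consensus mask: a voxel is kept if at least `min_votes` masks include it.

    `masks` is a list of equal-length binary masks (flat lists of 0/1),
    one per individual segmentation method.
    """
    return [1 if sum(votes) >= min_votes else 0 for votes in zip(*masks)]


# Toy masks from five hypothetical methods; the last one performs poorly.
masks = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [1, 0, 0, 1, 1, 1],
]

consensus = majority_vote(masks)
```

Here the voxels that only the poorly performing method selects receive a single vote and are excluded, which is the kind of robustness the paragraph above describes. STAPLE instead weights each method by an estimated performance level and is not shown.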

To explore the potential application of consensus volumes to radiation therapy, the MJV consensus method was used to produce GTVs in a population of head and neck cancer patients. This GTV and one created using simple 40% thresholding were then made available as guidance volumes to an attending head and neck radiation oncologist and to a resident who had completed their head and neck rotation. The task for each physician was to manually delineate GTVs using the CT and PET images. Each patient was contoured three times by each physician: once without guidance, once with the MJV consensus volume as guidance, and once with the 40% thresholding volume as guidance. Differences in GTV volumes between physicians were not significant, nor were differences between GTV volumes across the guidance volumes available to the physicians. However, on average, 15-20% of the provided guidance volume lay outside the final physician-defined contour.

In the final study, the MJV and STAPLE consensus volumes were used to extract maximum, peak and mean SUV measurements from two baseline PET scans and one PET scan taken during patients' prescribed radiation therapy treatments. Mean SUV values derived from consensus volumes showed smaller variability than maximum SUV values. Baseline and intratreatment variability were assessed using a Bland-Altman analysis, which showed that baseline variability in SUV was lower than intratreatment changes in SUV.
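
For readers unfamiliar with Bland-Altman analysis, the sketch below computes its two standard summaries for paired measurements: the bias (mean difference) and the 95% limits of agreement (bias plus or minus 1.96 sample standard deviations of the differences). The SUV values are invented for illustration, the function name is hypothetical, and whether the dissertation analyzed absolute or percentage differences is not stated in this abstract; absolute differences are assumed here.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Bias is the mean of the pairwise differences; the limits of agreement
    are bias -/+ 1.96 times the sample standard deviation of the differences.
    """
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Invented mean-SUV values for four patients on two baseline scans.
baseline_1 = [10.0, 8.0, 12.0, 6.0]
baseline_2 = [9.0, 8.5, 11.0, 6.5]

bias, lower, upper = bland_altman(baseline_1, baseline_2)
```

A change measured during treatment that falls outside limits estimated from repeated baseline scans is larger than test-retest variability alone would typically explain.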

The techniques developed and reported in this thesis demonstrate how filter choice affects segmentation accuracy, how the use of GEEs more appropriately accounts for the properties of a common segmentation quality metric, and how consensus volumes not only provide accuracy on par with the single best performing individual method for a given activity distribution, but also exhibit robustness against the variable performance of the individual segmentation methods that make up the consensus volume. These properties make consensus volumes appealing for a variety of tasks in radiation oncology.





McGurk, Ross (2013). Consensus Segmentation for Positron Emission Tomography: Development and Applications in Radiation Therapy. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.