Brain, Vol. 122, No. 5, 871-882, May 1999
© 1999 Oxford University Press

Development of a multiple sclerosis functional composite as a clinical trial outcome measure

Gary R. Cutter1, Monika L. Baier1, Richard A. Rudick2, Diane L. Cookfair3, Jill S. Fischer2, John Petkau11, Karl Syndulko6, Brian G. Weinshenker8, Jack P. Antel12, Christian Confavreux13, George W. Ellison7, Fred Lublin9, Aaron E. Miller4, Stephen M. Rao10, Stephen Reingold5, Alan Thompson14 and Ernest Willoughby15

1 AMC Cancer Research Center, Lakewood, Colorado, 2 Mellen Center for Multiple Sclerosis Treatment and Research, Cleveland Clinic Foundation, Cleveland, Ohio, 3 Dept of Neurology, State University of New York, Buffalo, 4 Maimonides Medical Center, Brooklyn, 5 Research Programs, National Multiple Sclerosis Society, New York, New York, 6 Neurology Service, VA Wadsworth Hospital Center, Los Angeles, 7 Department of Neurology, UCLA Medical Center, Los Angeles, California, 8 Department of Neurology, Mayo Clinic, Rochester, Minnesota, 9 Allegheny University for the Health Sciences, Philadelphia, Pennsylvania, 10 Section of Neuropsychology, MCW Clinic at Froedtert, Milwaukee, Wisconsin, 11 Department of Statistics, University of British Columbia, Vancouver, 12 Montreal Neurological Institute, Montreal, Quebec, Canada, 13 Service de Neurologie, Hôpital de l'Antiquaille, Lyon, France, 14 The National Hospital for Neurology and Neurosurgery, London, UK, 15 Department of Neurology, Auckland Hospital, Auckland, New Zealand

Correspondence to: Dr Gary Cutter, AMC Cancer Research Center, Center for Research Methodology and Biometrics, 1600 Pierce St, Lakewood, CO 80214, USA E-mail: cutterg{at}amc.org


    Abstract
The primary clinical outcome measure for evaluating multiple sclerosis in clinical trials has been Kurtzke's expanded disability status scale (EDSS). New therapies appear to favourably impact the course of multiple sclerosis and render continued use of placebo control groups more difficult. Consequently, future trials are likely to compare active treatment groups, which will most probably require increased sample sizes in order to detect therapeutic efficacy. Because more responsive outcome measures will be needed for active arm comparison studies, the National Multiple Sclerosis Society's Advisory Committee on Clinical Trials of New Agents in Multiple Sclerosis appointed a Task Force that was charged with developing improved clinical outcome measures. This Task Force acquired contemporary clinical trial and historical multiple sclerosis data for meta-analyses of primary and secondary outcome assessments to provide a basis for recommending a new outcome measure. A composite measure encompassing the major clinical dimensions of arm, leg and cognitive function was identified and termed the multiple sclerosis functional composite (MSFC). The MSFC consists of three objective, quantitative tests of neurological function that are easy to administer. Change in the MSFC over the first year of observation predicted subsequent change in the EDSS, suggesting that the MSFC is more sensitive to change than the EDSS. This paper provides details concerning the development and testing of the MSFC.

Keywords: multiple sclerosis, clinical outcome measure

Abbreviations: EDSS = expanded disability status scale; MSFC = multiple sclerosis functional composite; PASAT = paced auditory serial addition test; SDMT = symbol digit modality test


    Introduction
Kurtzke's expanded disability status scale (EDSS) is the primary clinical outcome for evaluating multiple sclerosis in clinical trials. Kurtzke originally detailed a disability status scale (Kurtzke, 1955) and later revised it to encompass a more refined classification system of patients, now known as the EDSS (Kurtzke, 1983). The EDSS is based on neurological examination of eight functional systems, usually performed by a neurologist. While problems of standardization, sensitivity, reliability and rater-to-rater variability have been documented, the EDSS remains a useful tool for classifying multiple sclerosis patients by disease severity and has been used extensively to assess disability and its changes in virtually every therapeutic drug trial for multiple sclerosis.

Because of the problems associated with its use, continued reliance upon the EDSS as an outcome measure in randomized clinical trials is a concern to the multiple sclerosis clinical research community. For the first time, there are therapeutic agents which appear to have a favourable impact on the course of disease. With the introduction of these new treatments, the use of placebo control groups in further therapeutic trials is questionable, both ethically and practically. Comparison of experimental treatments against actively treated controls will replace placebo controlled randomized clinical trials for some forms of the disease. The movement from placebo control groups to active treatment patient control groups will likely give rise to smaller between-group differences. This will escalate the required sample size or lengthen the necessary duration of trials to achieve sufficient statistical power using currently available outcome assessment tools. In the absence of more sensitive and responsive outcome measures, the burden of trials will escalate and this will be reflected in increased costs to pharmaceutical company sponsors, outside funding agencies and ultimately, to patients and third-party payers, as costs of developing successful products are passed on to end users.

In February 1994, the National Multiple Sclerosis Society sponsored an international workshop entitled ‘Outcomes Assessment in Multiple Sclerosis Clinical Trials’ in Charleston, South Carolina to review and evaluate the variety of outcome tools then available for use in multiple sclerosis. From this meeting there emerged a recommendation that a refined or new clinical rating scale be developed which met the following criteria: (i) it should be multidimensional to reflect the varied clinical expression of multiple sclerosis across patients and over time; (ii) these dimensions should change relatively independently over time; and (iii) important clinical dimensions not emphasized in current rating scales (such as cognitive function) should be measured (Whitaker et al., 1995).

Subsequent to this workshop, the National Multiple Sclerosis Society USA's Advisory Committee on Clinical Trials of New Agents in Multiple Sclerosis appointed a Task Force on Clinical Outcomes Assessment to address this recommendation. Two key components recommended by the Task Force were: (i) to focus on quantitative functional measures as components of a composite outcome measure (Tourtellotte et al., 1965; Potvin et al., 1985; Goodkin et al., 1995; Syndulko et al., 1996); and (ii) to develop specifications for a ‘meta-analysis’ of primary and secondary outcome assessments in existing multiple sclerosis clinical and historical data sets that would provide an objective basis for developing the recommended multidimensional outcome assessment tool.

In setting these guidelines, the Task Force recognized that it would require access to extant data from pharmaceutical companies (primarily from sponsored therapeutic trials) and from independent investigators (from natural history studies, independently performed clinical studies and therapeutic trials). The use of a pooled data set would allow the development of common methods and criteria for comparisons among multiple candidate measures. Furthermore, use of a pooled data set derived from a variety of sources would reduce the likelihood of bias introduced by dependence on a single data set.

The Task Force developed six guiding principles for the composite development and analyses that are reported in this paper: (i) to use measures that reflect the major clinical dimensions of multiple sclerosis; (ii) to avoid redundancy; (iii) to use simple rather than complex measures; (iv) to improve on the valuable characteristics of the EDSS; (v) to emphasize measures sensitive to change; and (vi) to develop an outcome measure that will be useful in clinical trials (and may or may not be directly useful for clinical care). These principles helped to structure the analysis plan. The process drew by necessity on the experiences of prior investigations.

In this paper the rationale, methods and primary results of the data-gathering and meta-analysis performed by the Task Force are described. The recommendations emerging from this effort, a composite clinical rating scale that satisfies the requirements set out initially, have been presented separately (Rudick et al., 1995).


    Material and methods
Obtaining data sets
Members of the National Multiple Sclerosis Society Task Force on Clinical Outcomes Assessments in multiple sclerosis identified existing longitudinal data sets from multiple sclerosis clinical trials and natural history studies that contained both clinical and functional measures. In most instances the request for data was met with a favourable response. Pharmaceutical companies were generally supportive, but had specific proprietary concerns that had to be formally addressed. One or two independent investigators were opposed to relinquishing data or were restricted by policies of their institution on release of data to outside sources. Some data were not in computer-readable format and were not pursued. Table 1 provides a list of the contributors to the data repository used in these analyses with the primary type of patients included in each data set. Data collection was completed in 15 months.


Table 1 Data sets which were analysed
 
Restrictions and agreements
The Task Force chose to analyse only variables that appeared in at least two different data sets, in order to allow generalization, to guarantee that the source of the data could be masked, and to ensure that the analyses focused on developing the outcome measure rather than on reassessing clinical trial results. Furthermore, only the placebo arms of the clinical trial studies were used, so that changes could be estimated with limited impact from treatment (the treatments differed widely across the contributing studies). For a data element to be used, the variable had to be measured at least twice, at baseline and at 1 year follow-up.

Processing and pooling data sets
Datasets were received on magnetic media, archived and examined to ensure that basic data could be used. Data dictionaries were catalogued to provide documentation for all data received.

Common naming conventions were created for each measurement scale or variable across all datasets (Table 2). Care was required to ensure similar definitions across datasets. Although two variables could have the same label on the magnetic media or documentation, it was not always the case that they had the same meaning. For example, duration of disease was computed from the time since initial diagnosis in some datasets but from onset of symptoms in others. This required the creation of two separate variables. For some common variables, including the EDSS, potential differences between studies in the definitions and administration of the instruments were ignored. The resultant pooled variable was accepted with its potential increase in variability (noise) due to these differences. Datasets were masked as to their origin in the pooled dataset.


Table 2 Common naming conventions used for measurement scale or variable across datasets
 
Guidelines for evaluation
The Task Force, based on clinical expertise, identified the major clinical dimensions of multiple sclerosis and specified criteria by which to proceed (Rudick et al., 1996). The major clinical dimensions identified for evaluation were arm, leg, cognitive and visual.

Analysis process
The criteria established by the Task Force to select candidate component measures included: good correlation with the biologically relevant clinical dimensions; good reliability of the measurement; the ability to show change over time; and availability of a minimum of two data points 1 year apart. Once the combined dataset was complete, we examined means, ranges and standard deviations of candidate variables. We next examined the correlations among candidate measures. Based on the descriptive data and these correlations, we used construct validity (the extent to which the measure of interest correlates with other measures in predicted ways, but for which no true criterion exists) to reduce the number of candidate measures. That is, individual measures within the same clinical dimension should correlate with each other (convergent validity) and not with measures of different clinical dimensions (discriminant validity). Applying these criteria, we identified a subset of candidate measures. We compared reliability estimates observed from the literature and examined means and standard deviations of change and the relationship between changes in these candidate variables and changes in the EDSS.

Strategies to increase sample sizes
Despite the quantity of prospective multiple sclerosis data available, measurements of the reduced set of candidate assessments [EDSS, nine hole peg test, box and blocks test, Purdue pegboard, timed walk, gait test, ambulation index, paced auditory serial addition test (PASAT, 2 and 3 minute versions), symbol digit modality test (SDMT), and visual function (visual acuity and visual functional system)] were available in only four of the datasets, which made up the primary subset of the pooled dataset used for these analyses (Smith, 1973; Gronwall, 1977; Conover, 1980; Hauser et al., 1983; Mathiowetz et al., 1985a, b; Goodkin et al., 1988; Desrosiers et al., 1994; Schwid et al., 1997; Wiens et al., 1997). In addition, the range of baseline EDSS scores for any specific variable differed among datasets. For each clinical dimension, we used any of the related candidate variables to represent that clinical dimension for a patient (e.g. for arm function, one study might have results from a Purdue pegboard test only, whereas another had only a nine hole peg test). If more than one variable was available, we used a hierarchy of choice based on the results of preliminary analyses of the individual measures (Table 3). The EDSS was not included in the MSFC, because the objective was to retain it as a standard against which to assess performance and to base the outcome measure on quantitative tests.
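A minimal sketch of the per-dimension hierarchy of choice, in Python. The test names and their order in `ARM_HIERARCHY` are hypothetical placeholders; the ordering actually used is the one given in Table 3. The helper `pick_dimension_measure` (also a hypothetical name) simply returns the first test in the hierarchy that a patient record contains.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical preference order for the arm dimension (most preferred first).
# The order actually used is the one in Table 3; these names are placeholders.
ARM_HIERARCHY: Sequence[str] = ("nine_hole_peg", "box_and_blocks", "purdue_pegboard")


def pick_dimension_measure(
    record: Mapping[str, Optional[float]],
    hierarchy: Sequence[str],
) -> Optional[tuple[str, float]]:
    """Return (test name, raw value) for the highest-ranked test the patient has."""
    for test in hierarchy:
        value = record.get(test)
        if value is not None:
            return test, value
    return None  # no test recorded for this dimension
```

The selected raw value still has to be standardized within its own test before it can stand in for the dimension, since a peg test time and a block count are on different scales.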


Table 3 Hierarchy of variables evaluated
 
Univariate statistics were computed using Spearman rank correlations and χ2 tests. Concurrent and predictive validity were obtained from a logistic regression model.
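Spearman's rank correlation is the Pearson correlation computed on ranks rather than raw values, which makes it insensitive to the skewed distributions seen for timed tests. The sketch below shows the calculation in plain Python, assuming that tied values receive the average of their ranks (the usual convention). The function names are illustrative; `scipy.stats.spearmanr` computes the same statistic.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Ranks 1..n, with tied values sharing the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1  # extend over the run of ties
        shared = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return cov / (sx * sy)
```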

Creation of a Z-score
To combine variables that measure the same dimension in inherently different ways, it was necessary to define a common metric. For example, the nine hole peg test measures the time required to put nine pegs into nine holes and then remove them, whereas the box and block test, also used to measure arm function, counts the number of blocks put into the box in a specified time (usually 60 s). Thus one test yields an average time and the other a count. To make the metrics from the two tests comparable, we considered how far ‘above’ or ‘below’ the average of the population in the pooled dataset the patient performed on the respective test. A patient who performed below average on the box and blocks test would be expected to perform below average on the nine hole peg test, and vice versa.

A common statistical approach which relates an individual's performance to the average performance in the population is a standardized score, also called Z-score. A Z-score is the number of standard deviation units a patient's score is below or above the average score. Provided the underlying measurements are continuous or count variables over a sizeable range, a Z-score can be created for each. Such a computation creates a `unitless' score that is no longer related to the original units of analysis (i.e. blocks or seconds) as it measures number of standard deviation units and therefore can more readily be used for comparisons.
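The Z-score calculation itself is short. In the sketch below the box and block counts are invented for illustration, and using the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not state which form of the SD was used.

```python
import statistics
from typing import Sequence


def z_scores(values: Sequence[float], ref_mean: float, ref_sd: float) -> list[float]:
    """Express each value as SD units above (+) or below (-) the reference mean."""
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    return [(v - ref_mean) / ref_sd for v in values]


# Hypothetical box and block counts (blocks moved in 60 s).
counts = [40.0, 50.0, 60.0]
mean = statistics.mean(counts)   # 50.0
sd = statistics.stdev(counts)    # sample SD = 10.0
print(z_scores(counts, mean, sd))  # [-1.0, 0.0, 1.0]
```

The result carries no units: a score of -1.0 means one standard deviation below the reference mean whether the raw test counted blocks or timed pegs.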

Quantitative composites
By using a Z-score of an arm function test to represent the quantitative performance of the upper limb and another Z-score measuring leg function to quantify performance of the lower limb, we are able to sum these two Z-scores to form an overall average score of limb functions for the patient. Patients who have poor arm function but good leg function will score closer to zero than patients who have deteriorated in both arm and leg function and are thus likely to exhibit Z-scores below average on both tests. The MSFC evaluates the performance of a patient on these two tests against the average of all patients in our cohort. Similarly, we can add test scores to include information on other important clinical dimensions of disease.
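A sketch of combining component Z-scores into a single composite value. Here the components are averaged (the sum divided by the number of dimensions), which orders patients exactly as the plain sum does; the dimension names and values are illustrative, and each Z-score is assumed to be already oriented so that higher means better.

```python
from statistics import fmean
from typing import Mapping


def composite_score(component_z: Mapping[str, float]) -> float:
    """Average the direction-aligned Z-scores of the available dimensions."""
    if not component_z:
        raise ValueError("at least one component Z-score is required")
    return fmean(component_z.values())


# Poor arm function but good leg function lands closer to zero ...
mixed = composite_score({"arm": -2.0, "leg": 1.5})        # -0.25
# ... than deterioration in both limbs.
both_poor = composite_score({"arm": -1.5, "leg": -1.5})   # -1.5
```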

To combine variables, Z-scores across all time points were computed. Some of the candidate tests measured deterioration in function as an increase in score (e.g. timed tests such as the nine hole peg test), while in other cases (e.g. the box and block test) a decrease in score reflected a deterioration in function. Therefore, the directions of Z-scores for different candidate variables were adjusted so that in all cases higher Z-scores corresponded to a better outcome. For example, the Z-score for a timed test such as the nine hole peg test, where a higher score is worse, was multiplied by –1 to make the direction of the Z-score the same as for the box and blocks test, where a lower score is worse. For ordinal or ranked data, we used van der Waerden scores, which are Z-scores based on the ranks of the values (Conover, 1980).
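Direction alignment and van der Waerden scores can be sketched as follows, assuming the reference mean and SD are supplied by the caller. The van der Waerden score of an observation with rank r among n is the standard normal quantile of r/(n + 1). Giving ties their average rank is a common convention but an assumption here. `oriented_z` and `van_der_waerden` are illustrative names.

```python
from statistics import NormalDist
from typing import Sequence


def oriented_z(values: Sequence[float], ref_mean: float, ref_sd: float,
               higher_is_worse: bool) -> list[float]:
    """Z-scores, sign-flipped when needed so that a higher score always means better."""
    sign = -1.0 if higher_is_worse else 1.0
    return [sign * (v - ref_mean) / ref_sd for v in values]


def van_der_waerden(values: Sequence[float]) -> list[float]:
    """Normal scores from ranks: inverse standard normal CDF of rank / (n + 1)."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, SD 1
    scores = []
    for x in values:
        below = sum(v < x for v in values)
        tied = sum(v == x for v in values)
        rank = below + (tied + 1) / 2  # average rank for tied values
        scores.append(std_normal.inv_cdf(rank / (n + 1)))
    return scores


# Nine hole peg times in seconds (hypothetical): slower is worse, so the sign flips.
print(oriented_z([17.0, 21.0], ref_mean=20.0, ref_sd=2.0, higher_is_worse=True))  # [1.5, -0.5]
```

A box and block count would be passed with `higher_is_worse=False`, so both arm tests end up on the same "higher is better" scale.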

Standardizing each variable required a decision about which population means and standard deviations (the standard) to use in creating the Z-score. Obvious candidates included results from normal subjects, all patients in our study cohort and all multiple sclerosis patients in the pooled dataset. The population used to standardize the Z-score does not affect comparisons on a single variable over time (e.g. assessing whether there has been significant change in the box and blocks test results), but it does matter when Z-scores are combined across clinical dimensions: the effect is equivalent to selecting different weighting factors for each component. We opted to standardize using baseline values for all patients from the datasets under consideration, primarily because of the clinical relevance of the resulting measure and its ease of interpretation. Using reference values from normal subjects skewed the Z-scores, placing most patients below the normal average, whereas interpretation in terms of better or worse than an average patient seemed more intuitive.
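In practice, the chosen standard means fixing one mean and SD per test and reusing them at every visit. A minimal sketch, assuming the baseline values from all patients are pooled into one list per test. The function names and walk times are hypothetical, and the values are shown before any direction alignment.

```python
import statistics
from typing import Sequence


def baseline_reference(baseline_values: Sequence[float]) -> tuple[float, float]:
    """Mean and sample SD of one test's baseline values, pooled across all patients."""
    return statistics.mean(baseline_values), statistics.stdev(baseline_values)


def standardize(values: Sequence[float], reference: tuple[float, float]) -> list[float]:
    """Z-scores of any visit's values against the fixed baseline reference."""
    mean, sd = reference
    return [(v - mean) / sd for v in values]


# Timed walk (seconds) for three hypothetical patients at baseline and at 1 year.
ref = baseline_reference([5.0, 6.0, 7.0])       # (6.0, 1.0)
year1 = standardize([5.5, 6.0, 9.0], ref)       # [-0.5, 0.0, 3.0]
```

Because the same reference is applied at every visit, a change in a patient's Z-score reflects only a change in raw performance. Choosing a different reference SD for one test would rescale that test's share of the composite, which is the weighting effect described above.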

Special considerations in the computation of the Z-score were necessary for patients who were unable to complete a test. Even though these patients' values are missing, they are not missing in the usual sense: they carry important information, as they represent, in some ways, the worst possible score. We evaluated several procedures for substituting a value to indicate non-performance, including an inverse score (1/time of the measure), which makes values closer to zero correspond to poorer performance and larger values to better performance. We also considered assigning a maximum Z-score to persons unable to complete the task, i.e. the largest Z-score observed among those providing data on that test. For these analyses, data substitution was done only for arm and leg function, since the reasons for missing values on the cognitive and visual tests were not recorded in the datasets. The use of a maximum Z-score results in a highly skewed distribution. When this occurs, statisticians often transform the data by taking the inverse of the raw value, i.e. 1/time. This was done for the nine hole peg test. For the timed walk, however, the inverse transformation did not yield a sufficiently extreme Z-score to indicate how poor the patient's ambulation had become (this Z-score was only –2). In most datasets, the inability to perform the nine hole peg test was coded as 777. Keeping that convention, 1/777 was used to represent the inability to perform the test, providing a value sufficiently small but still informative. For the timed walk, the actual time was used to create the Z-score. When a patient was unable to complete the test, the maximum time observed in the pooled dataset was used to create the Z-score representing this inability. A manual with information on the use of the MSFC, detailing the algorithm, will be available from the National Multiple Sclerosis Society (Fischer et al., 1999).
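The two substitution rules can be sketched as shown here. Representing "unable to complete" as `None` is an assumption of the sketch; the source datasets used the numeric code 777 for the nine hole peg test, and passing 777 directly gives the same result. `times` is assumed to hold the timed walk values for the whole pooled dataset, so that its maximum is the pooled maximum. Both function names are hypothetical.

```python
from typing import Optional, Sequence

UNABLE_CODE = 777.0  # code for "unable to perform" the nine hole peg test in most datasets


def nine_hole_peg_score(time_s: Optional[float]) -> float:
    """Inverse time (1/s): larger is better; inability to perform maps to 1/777."""
    return 1.0 / (UNABLE_CODE if time_s is None else time_s)


def fill_timed_walk(times: Sequence[Optional[float]]) -> list[float]:
    """Replace 'unable to complete' (None) with the slowest time observed in the pool."""
    observed = [t for t in times if t is not None]
    if not observed:
        raise ValueError("no completed timed walks to take a maximum from")
    slowest = max(observed)
    return [slowest if t is None else t for t in times]
```

The filled timed walk values and the inverse peg scores would then be standardized in the usual way before entering the composite.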

A composite based on such Z-scores can also measure the change in performance over time. By computing the composite at one point in time and measuring the patient at a later point in time, the difference between the Z-scores can be used to measure improvement or worsening.
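Because every component is oriented so that higher is better, change can be taken as the follow-up composite minus the baseline composite, so that a negative value indicates worsening. A minimal sketch with hypothetical, already standardized component scores; the function name is illustrative, and each visit is assumed to be scored against the same baseline reference.

```python
from statistics import fmean
from typing import Mapping


def composite_change(baseline_z: Mapping[str, float],
                     follow_up_z: Mapping[str, float]) -> float:
    """Follow-up composite minus baseline composite (negative = worsening).

    Both visits must be scored against the same reference population and
    contain the same dimensions, otherwise the difference is not meaningful.
    """
    if set(baseline_z) != set(follow_up_z):
        raise ValueError("visits must contain the same dimensions")
    return fmean(follow_up_z.values()) - fmean(baseline_z.values())


baseline = {"arm": 0.5, "leg": -0.25, "cognitive": 0.5}   # composite 0.25
year1 = {"arm": 0.0, "leg": -1.0, "cognitive": 0.25}      # composite -0.25
delta = composite_change(baseline, year1)                 # -0.5: the patient worsened
```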

Validation strategies
To ensure that we had satisfied Task Force recommendations for a refined clinical outcome measure, we examined both the concurrent and predictive validity of the MSFC. Concurrent validity was defined as change in the MSFC compared with sustained change in the EDSS over the same 1 year period (measured over 15 months to achieve sustained change at 12 months). Predictive validity was defined as change in the MSFC occurring over the first year of follow-up compared with subsequent change in EDSS among those patients with no sustained change in EDSS during the first year. Predictive validity was felt to best illustrate and validate the MSFC construction. Statistical significance tests were performed using logistic regression and {chi}2 tests.


    Results
Not surprisingly, the variable we found in more datasets than any other was the EDSS. This reflects the heavy reliance on this measure in most contemporary clinical trials. Figure 1 shows the distribution of baseline EDSS values in the pooled dataset. This distribution is similar to those reported in most natural history studies of the EDSS, with a paucity of values between EDSS 4 and 5.5 (Weinshenker et al., 1989). The relative lack of observations at EDSS 0 reflects the fact that patients normally seen in clinic-based populations have experienced their disease for some time prior to coming to the clinic. Figure 1 also presents the distribution of patients by duration of the disease from time of diagnosis. The majority of patients are more than 10 years post-onset. The average duration of disease increased as EDSS increased, but not in a smooth linear fashion. There is also a great deal of overlap in the time it takes to attain a particular EDSS level, indicative of the differing patient types included in the pooled dataset (Weinshenker et al., 1996).



Fig. 1 Distribution of patients by baseline EDSS and duration of disease (total n = 3657 subjects).

 
Because the concurrent validation strategy compares change in the MSFC with a 3 month sustained change in the EDSS occurring during the first 15 months (an increase of 1 step sustained for 3 months if the baseline EDSS score was <5.5, and of half a step if it was ≥5.5), we first examined the amount of sustained change occurring within 1 year by baseline EDSS. Figure 2 shows that, on average, 10% of patients experienced a sustained change over the first 15 months of observation, but that the rate of change varied with EDSS level.
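One plausible way to operationalize this definition is shown below. Several details are assumptions made for illustration: the visit format (months since baseline, EDSS score), the requirement that the increase begin by month 12 and be confirmed by month 15, the rule that no intervening visit may fall below the threshold, and the function names. The 5.5 cutoff follows the definition in this paragraph and is kept as a parameter.

```python
from typing import Sequence


def required_increase(baseline_edss: float, cutoff: float = 5.5) -> float:
    """EDSS increase counted as progression: 1 step below the cutoff, half a step at or above it."""
    return 1.0 if baseline_edss < cutoff else 0.5


def sustained_progression(visits: Sequence[tuple[float, float]],
                          onset_by: float = 12.0,
                          confirm_after: float = 3.0,
                          window_end: float = 15.0) -> bool:
    """True if an EDSS increase beginning by `onset_by` months is still present at a
    visit at least `confirm_after` months later (and no later than `window_end`),
    with no visit in between falling below the threshold.

    `visits` are (months since baseline, EDSS) pairs; the earliest is baseline (month 0).
    """
    ordered = sorted(visits)
    _, baseline = ordered[0]
    target = baseline + required_increase(baseline)
    for i, (onset, score) in enumerate(ordered[1:], start=1):
        if onset > onset_by:
            break  # too late to count as a first-year change
        if score < target:
            continue  # threshold not reached at this visit
        for month, later_score in ordered[i + 1:]:
            if month > window_end or later_score < target:
                break  # outside the window, or the increase was not sustained
            if month - onset >= confirm_after:
                return True
    return False
```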



Fig. 2 Percentage of patients with a 3 month sustained change in EDSS over baseline within 1 year. Numbers above each column represent the number of patients at each baseline EDSS (total n = 337 subjects).

 
Analysis of individual variables
We examined the univariate results for each candidate variable to assess whether it appeared to change over time, was present in a sufficient number of patients to provide meaningful generalization and had the desired measurement properties. We found that we did not have a measure of visual function that seemed to meet these criteria. The visual acuity values did not show sufficient change over time and did not correlate with changes in the EDSS. This left us with the remaining clinical dimensions identified by the Task Force (arm, leg and cognitive function). These three dimensions formed the basis of our 3D composite score. We then examined the performance of the variables within each clinical dimension in terms of change over time, correlation with each other and reliable measurement characteristics (data not shown).

Spearman rank correlations were computed from all available pairs of measurements. Brainstem and cerebellar functional system scores were positively correlated (r = 0.49). Among the functional tests, the nine hole peg and box and block tests were highly correlated (r = –0.73). The Purdue pegboard test was highly correlated with the upper extremity composite of the quantitative examination of neurological function (r = 0.85), which was not surprising given that the Purdue pegboard test is a component of that composite. The correlations between quantitative test measures and neurological rating scores were highest for the nine hole peg test and the box and block test with the cerebellar functional system score (r = 0.59 and –0.58, respectively). The correlations of the nine hole peg test and box and block test with the brainstem functional system score were slightly lower (r = 0.41 and –0.40, respectively). Thus, these two functional tests appeared roughly equivalent in their ability to represent the relevant Kurtzke functional system scores and most likely provide similar information. The SDMT was negatively correlated with the mental functional system score (r = –0.36), whereas both the 2 minute and the 3 minute versions of the PASAT showed a correlation of r = –0.18.

The box plots in Fig. 3A, B and C show the distributions, by baseline EDSS, of the three component measures that were ultimately included in the MSFC. The line inside each box represents the median score for that measure. Data for the timed walk of 25 feet are presented in Fig. 3A. There was limited variation and little change in score in the lower range of the EDSS, but much greater variability in timed walk scores above EDSS 5.5. Figure 3B presents the nine hole peg test scores by level of baseline EDSS. There is a roughly linear worsening in the average nine hole peg test score as the EDSS level increases. Furthermore, in the lower range of the EDSS (≤3.5) we observed a symmetric distribution of the time taken to complete the nine hole peg test, with a gradual increase in times, skewness and variation as EDSS or duration of disease increased (not shown). Finally, Fig. 3C shows the PASAT (3 minute version) scores, which overlap widely across EDSS levels and show a slight downward trend, indicating declining performance as EDSS increases.





Fig. 3 The bottom and top edges of the boxes are located at the sample 25th and 75th percentiles. The centre horizontal line is drawn at the sample median. The central vertical lines extend from the box as far as the data extend, to a distance of, at most, 1.5 interquartile ranges. (A) Patients with timed walk of 25 ft measurements versus EDSS at baseline (n = 343 subjects). (B) Patients with nine hole peg test measurements versus EDSS at baseline (n = 343 subjects). (C) Patients with PASAT (3 min version) measurements versus EDSS at baseline (n = 310 subjects).

 
The correlation between the 1 year change in nine hole peg test results and the change in EDSS was modest (r = 0.27). As expected, the correlation between the change in box and block test scores and the change in EDSS was of the same magnitude but opposite sign (r = –0.26), since higher box and block scores indicate better function. The correlation between the change in the timed walk and the EDSS change was slightly higher (r = 0.41). Among the cognitive variables, the largest correlation was for change in the PASAT (2 min version) versus change in EDSS (r = –0.15); the other two cognitive test variables had correlations of –0.13 and –0.06 for the PASAT (3 min version) and SDMT, respectively. However, ~1% more patients had PASAT (3 min version) scores than PASAT (2 min version) scores.

Analysis of composites
The three component composite that showed the most change over the entire range of EDSS consisted of the timed walk of 25 ft, the nine hole peg test and the PASAT (3 min version). Table 4 provides the Spearman rank correlations among the individual composite components, the overall composite and the EDSS. As expected, each of the individual dimensions of the composite (arm, leg and cognitive) was correlated with the overall composite (0.81, 0.67 and 0.68, respectively). Correlations between the dimensions were much lower: r = 0.39 for arm versus leg, r = 0.36 for arm versus cognitive and r = 0.21 for leg versus cognitive. The correlation between the overall composite score and the EDSS was –0.47. As expected, the leg dimension was more highly correlated with the EDSS than the arm dimension, with an even smaller correlation for the cognitive component (–0.52, –0.33 and –0.23, respectively). This provides strong evidence of face validity as well as convergent and divergent validity. That is, the MSFC and its components correlate reasonably well with the EDSS while carrying additional information on patient variability from the arm dimension, and still more from the cognitive dimension, which the EDSS does not measure.


Table 4 Correlations of components with three-part composite score and EDSS (placebo group)
 
The MSFC score was assessed for changes over time using the baseline, 1 year and 2 year assessments. Changes over time are subject to disease progression, patient drop-outs and practice effects that arise when patients perform these tests for the first time. We did not have data from which we could separately identify the magnitude of each component of change. The average composite score across all available measurements was –0.07 at baseline, –0.07 at 1 year and –0.16 at 2 years. These values show no change from baseline to 1 year and then a decline between year 1 and year 2. We re-examined these averages for the cohort of patients who had all three measurements. The average values for these subjects were –0.04 at baseline, 0.01 at 1 year and –0.14 at 2 years. These data suggest that the patients who did not have all three measurements had lower composite scores, slightly lowering the average baseline value. The cohort remaining in the studies for 2 full years shows an initial improvement (an increase in the MSFC from baseline to 1 year) followed by a substantial decline from year 1 to year 2.

The improvement could be real, but it is most likely the result of practice effects in these tests. Practice effects, that is, improvements in test scores due to increasing familiarity with the test, make patients appear to improve clinically when the change is due only to repeated testing. Because patient inexperience with the functional tests produces a baseline value lower than the patient's true level, practice effects can dampen apparent change over time; subsequent values provide more realistic assessments of the MSFC. To assess whether the improvement was real or a practice effect, we examined the mean of all the measurements taken 6 months after baseline. This value was 0.04, higher than either the baseline or the 1 year mean, suggesting that the lack of change at 1 year was in fact influenced by practice effects.
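The comparison above between all available measurements and the cohort with all three measurements amounts to computing each visit mean twice: once over everyone measured at that visit and once over completers only. A minimal sketch, assuming a hypothetical data layout in which each patient maps visit years (0, 1, 2) to an MSFC score and missed visits are simply absent; the names and data are invented for illustration.

```python
from statistics import mean

# Hypothetical layout: patient id -> {visit year: MSFC score}; missed visits are absent.
Scores = dict[str, dict[int, float]]
VISITS = (0, 1, 2)  # baseline, 1 year, 2 years


def all_available_means(scores: Scores) -> dict[int, float]:
    """Mean MSFC at each visit over every patient measured at that visit."""
    return {v: mean(s[v] for s in scores.values() if v in s) for v in VISITS}


def completer_means(scores: Scores) -> dict[int, float]:
    """Mean MSFC at each visit over patients measured at all three visits."""
    completers = [s for s in scores.values() if all(v in s for v in VISITS)]
    return {v: mean(s[v] for s in completers) for v in VISITS}


toy = {
    "p1": {0: -0.2, 1: -0.1, 2: -0.4},
    "p2": {0: 0.2, 1: 0.3, 2: 0.0},
    "p3": {0: -0.6},  # drop-out after baseline, with a low score
}
print(all_available_means(toy))
print(completer_means(toy))
```

Because the drop-out in the toy data has a low baseline score, the completer baseline mean is higher than the all-available one, mirroring the direction of the difference reported above.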

Concurrent and predictive validity
For concurrent validity, we compared the 1 year change in the MSFC score (baseline to the first annual visit) with a 3 month sustained change in the EDSS occurring during the first year, defined as a 1 step change in EDSS at or below 5.5 and a 1/2 step change above 5.5. Figure 4 shows the mean change over 1 year in the MSFC score against the change in EDSS during the same period (Spearman r = –0.22, P < 0.0001), illustrating larger changes in the MSFC as the EDSS changes more. The figure also shows this relationship separately for patients with a baseline EDSS ≤3.5 and those with a baseline EDSS >3.5. The rank correlations between the changes in these two groups were –0.18 (P = 0.0186) and –0.30 (P < 0.0001), respectively, suggesting that concurrent validity is better in more disabled patients.
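The sustained-change rule above can be written as a small predicate. This is a sketch under stated assumptions: it reads the 1 step and 1/2 step thresholds as 1.0 and 0.5 EDSS points measured from the baseline EDSS, and because the paper does not spell out how the 3 month confirmation was checked, it assumes the increase must first be reached at a visit within the first 12 months and then persist at every later visit up to and including the first visit at least 3 months after onset. Visit times in months and the function names are hypothetical.

```python
def required_increase(baseline_edss: float) -> float:
    """Assumed threshold: 1.0 point at or below EDSS 5.5, 0.5 point above it."""
    return 1.0 if baseline_edss <= 5.5 else 0.5


def has_sustained_increase(
    visits: list[tuple[float, float]],
    onset_window_months: float = 12.0,
    confirm_months: float = 3.0,
) -> bool:
    """visits: (months from baseline, EDSS), sorted by time; first entry is baseline.

    ASSUMPTION: the increase counts as sustained if it is reached at an onset visit
    inside the window and is still present at every later visit up to and including
    the first visit at least `confirm_months` after onset.
    """
    (_, base), *follow_up = visits
    step = required_increase(base)
    for i, (t_onset, edss) in enumerate(follow_up):
        if t_onset > onset_window_months or edss - base < step:
            continue  # not a candidate onset visit
        for t, e in follow_up[i + 1:]:
            if e - base < step:
                break  # increase not maintained; try a later onset
            if t - t_onset >= confirm_months:
                return True  # confirmed at least 3 months after onset
    return False


print(has_sustained_increase([(0, 2.5), (3, 3.5), (6, 3.5)]))  # True
print(has_sustained_increase([(0, 2.5), (3, 3.5), (6, 2.5), (9, 3.5)]))  # False
```

EDSS scores move in half-point increments, which are exact in binary floating point, so the threshold comparisons are not affected by rounding.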



Fig. 4 Mean change in composite versus change in EDSS within 1 year (n = 378 subjects). Diamonds = all patients; squares = baseline EDSS ≤3.5; triangles = baseline EDSS >3.5.

 
Figure 5 shows the percentage of patients experiencing a sustained EDSS change in the first year by quintile of change in the MSFC. Only 11.8% of the patients in the lowest quintile of MSFC change experienced a sustained change in the EDSS over the first year, compared with 38.7% of those in the highest quintile of composite change. This more than threefold difference supports concurrent validity. Placebo patients worsening by at least 1 SD unit on the MSFC (5.5% of all the placebo patients considered) were approximately twice as likely (odds ratio 2.1, P < 0.0001) to experience a sustained change in the EDSS (i.e. a 3 month sustained increase in the EDSS) within the same year.



Fig. 5 Percentage of patients with a sustained change in EDSS by quintile of composite change, concurrent (open columns) and predictive (stippled columns) validity (n = 378 subjects).
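The quintile analysis in Fig. 5 sorts patients by their change in the composite, splits them into five equal-sized groups and reports the percentage with a sustained EDSS change in each group. A minimal sketch, assuming the change score is oriented so that larger values mean more change (the text does not state the sign convention) and that ties are broken arbitrarily by input order; the function name and toy data are hypothetical.

```python
def quintile_event_rates(change: list[float], event: list[bool]) -> list[float]:
    """Percentage of patients with an event in each quintile of `change`.

    Quintile 1 holds the smallest changes and quintile 5 the largest; group
    sizes differ by at most one when n is not a multiple of five.
    """
    if len(change) != len(event) or len(change) < 5:
        raise ValueError("need matching lists with at least five patients")
    order = sorted(range(len(change)), key=lambda i: change[i])
    n = len(order)
    rates = []
    for q in range(5):
        members = order[q * n // 5:(q + 1) * n // 5]
        rates.append(100.0 * sum(event[i] for i in members) / len(members))
    return rates


# Hypothetical toy data: ten patients, with events concentrated at large changes.
change_toy = [0.1, 0.9, 0.3, 1.5, 0.0, 2.2, 0.4, 1.1, 0.2, 1.8]
event_toy = [False, False, False, True, False, True, False, False, False, True]
print(quintile_event_rates(change_toy, event_toy))  # [0.0, 0.0, 0.0, 50.0, 100.0]
```

The same function serves the concurrent and the predictive analyses; only the event definition (first-year versus subsequent sustained EDSS change) differs.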

 
We assessed predictive validity by estimating the ability of change in the MSFC to predict subsequent changes in the EDSS, using the rules for a sustained change described above. In this analysis, we excluded all patients who experienced a sustained change in the EDSS during the first year and followed the remaining patients over their remaining time in the dataset (mostly up to 2 years). Figure 5 also shows the predictive validity results by the original quintiles of composite change: 19.7% of the placebo patients in the quintile with the least composite change experienced a subsequent sustained change in their EDSS, compared with 39.0% of those in the quintile with the most composite change. Again, this pattern suggests good predictive validity. The odds ratio for a subsequent sustained increase in EDSS was 1.6 (P = 0.0044) for a patient who experienced a change of at least one standard deviation unit in the MSFC in the first year. Restricting follow-up to 15 months after the first year did not appreciably alter the results.
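The predictive odds ratio above compares the odds of a subsequent sustained EDSS change between patients who did and did not change by at least 1 SD unit on the MSFC in the first year, after excluding patients with a first-year EDSS change. The paper does not say how its odds ratios and P values were estimated, so the sketch below is only a crude 2×2 version with a Woolf (log-scale) 95% confidence interval. The `Patient` fields and function names are hypothetical, and treating a change of -1.0 SD or less as the exposure assumes that the 1 SD change refers to worsening.

```python
import math
from typing import NamedTuple


class Patient(NamedTuple):
    """Hypothetical per-patient summary; field names are illustrative only."""
    msfc_change_sd: float   # first-year MSFC change in SD (Z-score) units
    edss_event_year1: bool  # sustained EDSS change during the first year
    edss_event_later: bool  # sustained EDSS change after the first year


def odds_ratio_woolf(a: int, b: int, c: int, d: int,
                     z: float = 1.959964) -> tuple[float, float, float]:
    """Odds ratio (a*d)/(b*c) with a Woolf 95% confidence interval.

    a, b: exposed patients with / without the outcome;
    c, d: unexposed patients with / without the outcome.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def predictive_odds_ratio(patients: list[Patient]) -> tuple[float, float, float]:
    # Exclude patients whose EDSS already changed in the first year.
    at_risk = [p for p in patients if not p.edss_event_year1]
    # Exposure (assumed): worsening by at least 1 SD unit on the MSFC in year 1.
    exposed = [p.edss_event_later for p in at_risk if p.msfc_change_sd <= -1.0]
    unexposed = [p.edss_event_later for p in at_risk if p.msfc_change_sd > -1.0]
    a, c = sum(exposed), sum(unexposed)
    return odds_ratio_woolf(a, len(exposed) - a, c, len(unexposed) - c)


toy_patients = [
    Patient(-1.4, False, True), Patient(-1.1, False, True),
    Patient(-1.2, False, False), Patient(-2.0, False, False),
    Patient(-0.3, False, True), Patient(0.2, False, False),
    Patient(0.1, False, False), Patient(-0.5, False, False),
    Patient(0.4, False, False),
    Patient(-1.5, True, True),  # excluded: EDSS already changed in year 1
]
or_, lower, upper = predictive_odds_ratio(toy_patients)
print(round(or_, 2))  # 4.0
```

A regression model would allow adjustment for baseline EDSS or study, which a crude 2×2 table cannot.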

To assess the percentage of patients worsening on the MSFC, we dichotomized the composite change into worsening or not worsening. Approximately 40% of patients worsened in the first year (change in Z-score <0), while 60% did not (change in Z-score ≥0). Among patients with a baseline EDSS of 3.5 or below, 35.1% worsened in the first year, compared with 43.2% of those with a baseline EDSS >3.5.
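The dichotomization above treats any negative change in the composite Z-score as worsening and a change of zero or more as not worsening. A minimal sketch, with hypothetical function names and toy data, that splits patients at the baseline EDSS cutoff of 3.5 used in the text:

```python
def percent_worsening(changes: list[float]) -> float:
    """Percentage of patients whose MSFC change (Z-score units) is below zero."""
    return 100.0 * sum(c < 0 for c in changes) / len(changes)


def percent_worsening_by_baseline(baseline_edss: list[float], changes: list[float],
                                  cutoff: float = 3.5) -> tuple[float, float]:
    """Return (% worsening with baseline EDSS <= cutoff, % worsening with EDSS > cutoff)."""
    low = [c for e, c in zip(baseline_edss, changes) if e <= cutoff]
    high = [c for e, c in zip(baseline_edss, changes) if e > cutoff]
    return percent_worsening(low), percent_worsening(high)


# Hypothetical toy data; a change of exactly 0.0 counts as "not worsening".
baseline_toy = [2.0, 3.5, 3.0, 4.0, 6.0]
msfc_change_toy = [-0.1, 0.2, 0.0, -0.3, -0.05]
print(percent_worsening_by_baseline(baseline_toy, msfc_change_toy))
```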


    Discussion
 
A clinical outcome measure is the result by which an intervention is judged. In the best cases the clinical outcome is both relevant and unequivocal; for example, in evaluating a treatment for a fatal disease, death would be the primary outcome measure. Such simple outcome measures are highly desirable but may not be achievable for a complex chronic disease like multiple sclerosis, in which a narrow measure of a single clinical dimension may be inappropriate and is very likely to be misleading. The Clinical Outcomes Assessment Task Force identified four major clinical dimensions of multiple sclerosis that should be considered in any clinical trial outcome measure: arm function, leg function and ambulation, cognitive function and visual function (Rudick et al., 1996). Clinical dimensions excluded from the composite are still considered important for patient care. Visual function was excluded because the available measures were not sufficiently informative to add to this composite, and other candidates, such as contrast sensitivity, were not available. This dimension could be reconsidered in future studies.

From these measures, the Task Force has defined a composite score (MSFC) with both concurrent and predictive validity against an accepted, although imperfect, standard, the EDSS. The MSFC is designed to capture relevant information about the status of disease within and among patients, even though it does not measure every aspect of the disease that is meaningful to the patient or to the treating physician.

We found that performance on simple quantitative composites that included arm, leg and cognitive function declined with disease duration and with EDSS, the current standard clinical trial measure of neurological disability in multiple sclerosis patients. Furthermore, we found that performance on the individual components of the MSFC and on the MSFC itself correlated with EDSS, and that change in the MSFC correlated with change in EDSS. Of considerable importance, change in the MSFC predicted subsequent change in EDSS. Among patients with no change in their EDSS, a number exhibited changes in the MSFC, suggesting that composite change may be informative in a clinical trial setting earlier than EDSS change.

We used a combination of studies and measures to narrow the candidate measures for each clinical dimension of multiple sclerosis outlined by the Task Force. Ultimately, however, only the trial data provided the actual measures for evaluation. These measures were pooled using Z-scores within clinical dimensions to increase the sample size, giving more power and generalizability for assessing performance across studies and over a broader range of EDSS. This mixing of different studies and performance measures adds noise to the composite tested, making it more difficult to detect a change in the MSFC. We therefore believe that the concurrent and predictive validity demonstrated in this study may actually underestimate the validity of the MSFC.
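Pooling with Z-scores puts different tests for the same clinical dimension on a common scale, so that studies using different tests can contribute to one dimension. The text does not say which reference mean and SD were used for standardization; the sketch below simply assumes each study's own sample mean and standard deviation, and that scores have already been oriented so that higher values mean better function. Function and study names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def pooled_z_scores(records: list[tuple[str, float]]) -> list[float]:
    """Standardize one clinical dimension within each study, then pool.

    records: (study id, score) pairs, already oriented so higher = better.
    Assumes each study contributes at least two distinct scores (stdev > 0).
    """
    by_study: dict[str, list[float]] = defaultdict(list)
    for study, score in records:
        by_study[study].append(score)
    params = {s: (mean(v), stdev(v)) for s, v in by_study.items()}
    return [(score - params[s][0]) / params[s][1] for s, score in records]


# Hypothetical: studies A and B measured the same dimension on different scales.
records = [("A", 10.0), ("A", 20.0), ("A", 30.0), ("B", 1.0), ("B", 2.0), ("B", 3.0)]
print(pooled_z_scores(records))
```

After standardization, the two studies' scores land on the same scale despite their different raw units, which is what allows them to be pooled.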

An important design consideration is the elimination of practice effects. Such effects, if not controlled for in the design of a study, will lead to artificially poor performance at baseline, masking subsequent change. This matters for future trials: while most drugs today are directed at modifying disease activity, the future will hopefully produce therapies that lead to genuine improvement, which would need to be distinguished from practice effects.

While any meta-analysis has limitations, we believe our approach to be both conservative and valid. We were limited to data collected in the past, and the available datasets were dominated by the EDSS, so there was relatively little data on alternative measures. Furthermore, validating candidate measures against the EDSS risks a circular argument. Demonstrating concurrent and predictive validity of a functional composite tested against the EDSS was encouraging. However, changes in the EDSS may not be an optimal test of MSFC performance, because the EDSS is itself subject to error as well as variation. Patients are often entered into studies based on entry criteria that include the EDSS. This can create EDSS misclassification at entry that is subsequently seen as an EDSS change but is actually unrelated to any biological progression. How often this type of misclassification occurred in the pooled datasets is unknown. Also, the EDSS is known to be insensitive to arm and cognitive changes, whereas the functional composite contains these components. This too may reduce the apparent sensitivity and specificity of the composite for detecting EDSS change, making it appear that the composite does not adequately detect change over time when the problem may lie with the EDSS rather than with the composite. Additionally, failure to eliminate the learning curve can lead to overestimated improvement and underestimated worsening. Furthermore, the learning curve is likely to differ among patients, adding noise to the MSFC.

Despite these concerns and limitations, we were able to construct a quantitative functional composite that identifies EDSS change concurrently and also exhibits changes in function that precede changes in the EDSS. The approach we used to identify a more sensitive outcome measure is not new: it has been used to develop outcome measures for rheumatoid arthritis (Goldsmith et al., 1993) and for amyotrophic lateral sclerosis (Brooks et al., 1994).

The use of composites of quantitative functional tests within the multiple sclerosis community is also not new, but problems remain in the search for an optimal composite for multiple sclerosis (Syndulko et al., 1996). For statistical parsimony, the number of variables in any composite should be limited: including many variables of which only a few exhibit change leads to composite scores that, on average, show little change. Adding more measures also increases cost. Averaging Z-scores, as presented here, may allow different components to be substituted simply in different ranges of disease severity, increasing the utility of this approach. Most encouraging, however, is the good performance of a single composite over the wide range of EDSS represented in this study.
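The averaging of Z-scores described above can be sketched as follows. The assumptions are ours, not the paper's: each component is standardized against a hypothetical reference mean and SD, the two timed tests (the 25 ft walk and the nine hole peg test) are sign-flipped so that a higher Z-score always means better function, and details such as how peg-test times from the two hands are combined are left out. Because the composite is a plain average, another test for the same dimension could be substituted by supplying its own reference values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Hypothetical reference-population mean and SD for one component test."""
    mean: float
    sd: float


def z_score(value: float, ref: Reference, higher_is_worse: bool = False) -> float:
    """Standardize a raw score; flip the sign for timed tests so higher = better."""
    z = (value - ref.mean) / ref.sd
    return -z if higher_is_worse else z


def msfc_score(walk_s: float, peg_s: float, pasat_correct: float,
               refs: dict[str, Reference]) -> float:
    """Mean of the leg, arm and cognitive Z-scores."""
    z_leg = z_score(walk_s, refs["walk"], higher_is_worse=True)  # timed 25 ft walk
    z_arm = z_score(peg_s, refs["peg"], higher_is_worse=True)    # nine hole peg test
    z_cog = z_score(pasat_correct, refs["pasat"])                # PASAT (3 min version)
    return (z_leg + z_arm + z_cog) / 3


# Hypothetical reference values, not taken from the study.
refs = {"walk": Reference(6.0, 2.0), "peg": Reference(25.0, 5.0), "pasat": Reference(45.0, 10.0)}
print(round(msfc_score(8.0, 20.0, 55.0, refs), 3))  # 0.333
```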

It must be kept in mind that we are searching for a composite that will work as an outcome measure in a clinical trial. Although the patients included spanned the entire EDSS range, from low to high, this composite may not be suitable for individual patient care or evaluation, and a change in the composite may not in itself represent a meaningful clinical change, even though it is linked to clinical change. The predictive validity shown here supports the use of such a composite as a surrogate for clinically meaningful change in the setting of a clinical trial. While several pooled datasets were used for this validation, analyses within individual study datasets confirmed similar performance. These results are not shown because our agreements allowed us to use the data only in a masked fashion. Future studies should validate the MSFC against patients' perceptions of improvement or worsening.

An attractive characteristic of the functional composite outcome measure is that measures within a clinical dimension are interchangeable; for example, if a performance test superior to the nine hole peg test or the box and block test is developed, it may be substituted in the arm dimension. Similarly, the MSFC might be improved by adding a component for visual function. However, clear guidelines need to be developed for comparing the performance of new measures.

The MSFC is a continuous variable that tends to follow a normal distribution, although this is not essential; normality does, however, facilitate various types of analyses. The simplest approach is to treat change in the composite measure as a single post-treatment index of overall performance and to compare the two randomized groups using a t test. The measure can also be used as a time-dependent outcome, with each group assessed by survival analysis until a specified amount of change has occurred. More sophisticated and generally more powerful analyses are made possible by newer longitudinal data methods capable of handling repeated measures over time.
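The two simplest analyses mentioned above can be sketched in a few lines: a pooled-variance two-sample t statistic on change scores, and a time-to-event outcome defined as the first visit at which the change from baseline reaches a chosen threshold. This is a sketch only; the -1.0 SD threshold, the function names and the toy values are hypothetical, and neither P values nor the survival analysis itself (for example, a Kaplan–Meier estimate over the resulting times, with `None` marking censored patients) are shown.

```python
import math
from statistics import mean, variance
from typing import Optional


def pooled_t(group_a: list[float], group_b: list[float]) -> tuple[float, int]:
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se, na + nb - 2


def months_to_worsening(visits: list[tuple[float, float]],
                        threshold: float = -1.0) -> Optional[float]:
    """First visit time at which the MSFC change from baseline is <= threshold.

    visits: (months from baseline, MSFC score), sorted by time, baseline first.
    Returns None when the threshold is never reached (a censored observation).
    """
    (_, baseline), *follow_up = visits
    for months, score in follow_up:
        if score - baseline <= threshold:
            return months
    return None


# Hypothetical change scores for two randomized groups.
t, df = pooled_t([0.1, 0.2, 0.3], [-0.2, -0.1, 0.0])
print(round(t, 3), df)  # 3.674 4
print(months_to_worsening([(0, 0.2), (6, -0.3), (12, -0.9)]))  # 12
```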

The results presented here demonstrate an evidence-based concept of a specific type of composite useful for assessing longitudinal changes in multiple sclerosis patients. This composite was developed using the guidelines set forth by the National Multiple Sclerosis Society Task Force on Clinical Outcomes Assessment and meets the criteria set forth by this committee for defining a new outcome measure (Rudick et al., 1997). This paper is not meant to be an exhaustive evaluation of all potential measures, as the study was limited by the information collected by prior investigators. The measure should be tested in prospective studies to confirm our findings. The results do, however, set forth a promising, simple, quantitative functional composite of the major clinical dimensions of multiple sclerosis as a potential outcome measure for future multiple sclerosis clinical trials.


    Acknowledgments
 
The work of the Task Force was supported by the US National Multiple Sclerosis Society with an unrestricted educational grant from Berlex Laboratories. The Task Force could not have completed its work without support from investigators and study sponsors who made their data available: Berlex Pharmaceuticals; Biogen, Inc.; The Canadian Cooperative Multiple Sclerosis Study Group (John Noseworthy, MD); EDMUS (Christian Confavreux, MD); Elan Pharmaceutical Research Corporation; George Ellison, MD; Donald Goodkin, MD; The Multiple Sclerosis Collaborative Research Group (Lawrence Jacobs, MD); The Multiple Sclerosis Study Group; COSTAR (Donald Paty, MD, Donald Studney, MD); Stephen Rao, Ph.D.; Richard Rudick, MD; Sandoz Pharmaceuticals; Karl Syndulko, Ph.D.; Teva Pharmaceutical Industries, Ltd; Howard Weiner, MD.


    References
Brooks BR, Lewis D, Rawling J, Sanjak M, Belden D, Hakim H, et al. The natural history of amyotrophic lateral sclerosis. In: Williams AC, editor. Motor neuron disease. London: Chapman & Hall; 1994. p. 131–69.

Canadian Cooperative Multiple Sclerosis Study Group. The Canadian cooperative trial of cyclophosphamide and plasma exchange in progressive multiple sclerosis [see comments]. Lancet 1991; 337: 441–6.

Confavreux C, Aimard G, Devic M. Course and prognosis of multiple sclerosis assessed by the computerized data processing of 349 patients. Brain 1980; 103: 281–300.

Conover WJ. Practical nonparametric statistics. 2nd ed. New York: John Wiley; 1980.

Desrosiers J, Bravo G, Hebert R, Dutil E, Mercier L. Validation of the Box and Block Test as a measure of dexterity of elderly people: reliability, validity, and norms studies. Arch Phys Med Rehabil 1994; 75: 751–5.

Ellison GW, Myers LW, Mickey MR, Graves MC, Tourtellotte WW, Syndulko K, et al. A placebo-controlled, randomized, double-masked, variable dosage, clinical trial of azathioprine with and without methylprednisolone in multiple sclerosis. Neurology 1989; 39: 1018–26.

Ellison GW, Myers LW, Leake BD, Mickey MR, Ke D, Syndulko K, et al. Design strategies in multiple sclerosis clinical trials. The Cyclosporine Multiple Sclerosis Study Group. Ann Neurol 1994; 36 Suppl: S108–12.

Fischer J, Jack AJ, Kniker JE, Cutter G, National Multiple Sclerosis Society Outcome Assessment Task Force. Administration and scoring manual for the Multiple Sclerosis Functional Composite. New York: Demos Publications; 1999. In press.

Goldsmith CH, Smythe HA, Helewa A. Interpretation and power of a pooled index. J Rheumatol 1993; 20: 575–8.

Goodkin DE, Hertsgaard D, Seminary J. Upper extremity function in multiple sclerosis: improving assessment sensitivity with Box-and-Block and Nine-Hole Peg Tests. Arch Phys Med Rehabil 1988; 69: 850–4.

Goodkin DE, Bailly RC, Teetzen ML, Hertsgaard D, Beatty WW. The efficacy of azathioprine in relapsing-remitting multiple sclerosis. Neurology 1991; 41: 20–5.

Goodkin DE, Rudick RA, VanderBrug Medendorp S, Daughtry MM, Schwetz KM, Fischer J, et al. Low-dose (7.5 mg) oral methotrexate reduces the rate of progression in chronic progressive multiple sclerosis [see comments]. Ann Neurol 1995; 37: 30–40. Comment in: Ann Neurol 1995; 37: 5–6, Comment in: Ann Neurol 1995; 38: 832–3, Comment in: Ann Neurol 1996; 39: 684.

Gronwall DMA. Paced auditory serial-addition task: a measure of recovery from concussion. Percept Mot Skills 1977; 44: 367–73.

Hauser SL, Dawson DM, Lehrich JR, Beal MF, Kevy SV, Propper RD, et al. Intensive immunosuppression in progressive multiple sclerosis: a randomized three-arm study of high-dose intravenous cyclophosphamide, plasma exchange, and ACTH. N Engl J Med 1983; 308: 173–80.

IFNB Multiple Sclerosis Study Group. Interferon beta-1b is effective in relapsing-remitting multiple sclerosis. I. Clinical results of a multicenter, randomized, double-blind, placebo-controlled trial [see comments]. Neurology 1993; 43: 655–61. Comment in: Neurology 1993; 43: 641–3.

Jacobs LD, Cookfair DL, Rudick RA, Herndon RM, Richert JR, Salazar AM, et al. Intramuscular interferon beta-1a for disease progression in relapsing multiple sclerosis [see comments] [published erratum appears in Ann Neurol 1996; 40: 480]. Ann Neurol 1996; 39: 285–94. Comment in: ACP J Club 1996; 125: 35, Comment in: Ann Neurol 1996; 40: 951–3, Comment in: Ann Neurol 1997; 41: 560, Comment in: Ann Neurol 1997; 42: 982.

Johnson KP, Brooks BR, Cohen JA, Ford CC, Goldstein J, Lisak RP, et al. Copolymer 1 reduces relapse rate and improves disability in relapsing-remitting multiple sclerosis: results of a phase III multicenter, double-blind placebo-controlled trial. The Copolymer 1 Multiple Sclerosis Study Group [see comments]. Neurology 1995; 45: 1268–76. Comment in: Neurology 1995; 45: 1245–7, Comment in: ACP J Club 1996; 124: 2–3.

Kurtzke JF. A new scale for evaluating disability in multiple sclerosis. Neurology 1955; 5: 580–3.

Kurtzke JF. Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS). Neurology 1983; 33: 1444–52.

Mathiowetz V, Weber K, Kashman N, Volland G. Adult norms for 9 Hole Peg Test of finger dexterity. Occup Ther J Res 1985a; 5: 24–38.

Mathiowetz V, Volland G, Kashman N, Weber K. Adult norms for Box and Block Test of manual dexterity. Am J Occup Ther 1985b; 39: 386–91.

Multiple Sclerosis Study Group. Efficacy and toxicity of cyclosporine in chronic progressive multiple sclerosis: a randomized, double-blinded, placebo-controlled clinical trial [see comments]. Ann Neurol 1990; 27: 591–605. Comment in: Ann Neurol 1991; 29: 226.

Paty D, Studney D, Redekop K, Lublin F. MS COSTAR: a computerized patient record adapted for clinical research purposes. Ann Neurol 1994; 36 Suppl: S134–5.

Potvin AR, Tourtellotte WW. Quantitative examination of neurologic function. Volume II. Methodology for test and patient assessments and design of a computer-automated system. Boca Raton (FL): CRC Press; 1985.

Rao SM, Leo GJ, Bernardin L, Unverzagt F. Cognitive dysfunction in multiple sclerosis. I. Frequency, patterns, and prediction [see comments]. Neurology 1991; 41: 685–91. Comment in: Neurology 1991; 41: 2014–5.

Rudick RA, Medendorp SV, Namey M, Boyle S, Fischer J. Multiple sclerosis progression in a natural history study: predictive value of cerebrospinal fluid free kappa light chains. Mult Scler 1995; 1: 150–5.

Rudick R, Antel J, Confavreux C, Cutter G, Ellison G, Fischer J, et al. Clinical outcomes assessment in multiple sclerosis [Review]. Ann Neurol 1996; 40: 469–79.

Rudick R, Antel J, Confavreux C, Cutter G, Ellison G, Fischer J, et al. Recommendations from the National Multiple Sclerosis Society Clinical Outcome Assessment Task Force. Ann Neurol 1997; 42: 379–82.

Schwid SR, Goodman AD, Mattson DH, Mihai C, Donohoe KM, Petrie MD, et al. The measurement of ambulatory impairment in multiple sclerosis. Neurology 1997; 49: 1419–24.

Sipe JC, Knobler RL, Braheny SL, Rice GP, Panitch HS, Oldstone MB. A neurologic rating scale (NRS) for use in multiple sclerosis. Neurology 1984; 34: 1368–72.

Smith A. Symbol-Digit Modalities Test Manual. Los Angeles: Western Psychological Services; 1973.

Syndulko K, Ke D, Ellison GW, Baumhefner RW, Myers LW, Tourtellotte WW, and the Multiple Sclerosis Study Group. Comparative evaluations of neuroperformance and clinical outcome assessments in chronic progressive multiple sclerosis. I. Reliability, validity and sensitivity to disease progression. Mult Scler 1996; 2: 142–56.

Tourtellotte WW, Haerer AF, Simpson JF, Kuzma JW, Sikorski J. Quantitative clinical neurological testing. I. A study of a battery of tests designed to evaluate in part the neurological function of patients with multiple sclerosis and its use in a therapeutic trial. Ann NY Acad Sci 1965; 122: 480–505.

Weiner HL, Dau PC, Khatri BO, Petajan JH, Birnbaum G, McQuillen MP, et al. Double-blind study of true vs. sham plasma exchange in patients treated with immunosuppression for acute attacks of multiple sclerosis [see comments]. Neurology 1989; 39: 1143–9. Comment in: Neurology 1990; 40: 864–6.

Weinshenker BG, Bass B, Rice GP, Noseworthy J, Carriere W, Baskerville J, et al. The natural history of multiple sclerosis: a geographically based study. I. Clinical course and disability. Brain 1989; 112: 133–46.

Weinshenker BG, Issa M, Baskerville J. Meta-analysis of the placebo-treated groups in clinical trials of progressive MS. Neurology 1996; 46: 1613–9.

Whitaker JN, McFarland HF, Rudge P, Reingold SC. Outcomes assessment in multiple sclerosis clinical trials: a critical analysis. Mult Scler 1995; 1: 37–47.

Wiens AN, Fuller KH, Crossen JR. Paced Auditory Serial Addition Test: adult norms and moderator variables. J Clin Exp Neuropsychol 1997; 19: 473–83.

Received June 9, 1998. Revised October 21, 1998. Accepted December 14, 1998.

