Epidemiological research typically works with large datasets in which many variables are candidates for logistic regression model building. From these candidates, the variable combinations that form a sufficient logistic regression model have to be selected. Usually, methods such as stepwise logistic regression are applied.
These methods deliver suboptimal results in most cases because they cannot screen the entire problem space, which is formed by the different variable combinations together with their resulting case sets. Screening the entire problem space requires enormous computing power. Furthermore, the resulting models have to be assessed. This paper describes an approach for computing the complete problem space on a computer grid, together with quality indicators for judging each individual model, in order to find the best-fitting models.
We use this system in epidemiological studies that address specific problems in human epidemiology.
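To make the size of the problem space concrete, the following is a minimal sketch, not the system described in the paper. It enumerates every non-empty variable combination, derives the case set that each combination leaves after dropping records with missing values, and splits the combinations into batches that could be handed to grid workers. The toy records, the variable names, the use of `None` for missing values, and the helpers `enumerate_models`, `complete_cases`, and `split_into_jobs` are all hypothetical and serve only as illustration; fitting and judging the models is left out.

```python
from itertools import combinations


def enumerate_models(variables):
    """Yield every non-empty combination of candidate variables (2**p - 1 in total)."""
    for k in range(1, len(variables) + 1):
        yield from combinations(variables, k)


def complete_cases(records, subset):
    """Return the indices of records with no missing value (None) in any variable of the subset."""
    return [
        i for i, row in enumerate(records)
        if all(row.get(v) is not None for v in subset)
    ]


def split_into_jobs(models, n_workers):
    """Partition the model list round-robin into one batch per worker."""
    return [models[i::n_workers] for i in range(n_workers)]


# Hypothetical toy data: None marks a missing value.
records = [
    {"age": 54, "smoker": 1, "bmi": 27.1},
    {"age": 61, "smoker": None, "bmi": 31.4},
    {"age": None, "smoker": 0, "bmi": 22.8},
    {"age": 47, "smoker": 1, "bmi": None},
]
variables = ["age", "smoker", "bmi"]

models = list(enumerate_models(variables))
# Each variable combination determines its own usable case set.
case_sets = {m: complete_cases(records, m) for m in models}
# Assumed number of grid workers; each batch could be fitted independently.
jobs = split_into_jobs(models, n_workers=2)
```

With p candidate variables there are 2^p - 1 combinations, so the number of models doubles with every added variable. Because the batches are independent of one another, the work lends itself to distribution across a grid.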