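Epidemiology, Medical Statistics, Genetics, and Sometimes Graduate School Life

Research notes kept by a graduate student specializing in epidemiology.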

Introducing the Power Program

This post introduces a free tool developed by the National Cancer Institute that is useful for sample size calculation and power analysis in single marker analysis.

This introduction to the Power program, a useful tool for calculating sample size and power in genetic epidemiology research, is quoted from the book "Analysis of Genetic Association Studies".
The software can be downloaded from http://dceg.cancer.gov/tools/design/power

To enter the parameter values, a user can choose "Default Values" for a new computation or "Previous Run" to repeat the previous computation with different parameter values. The program can also read a parameter file. "Case-control" is then chosen as the study design, and a control-to-case ratio needs to be specified. This ratio, in terms of our notation, is r/s = ψ0 / ψ1. Up to two exposure variables can be used. For single marker analysis, only one exposure variable is used (treating the marker as a candidate gene). The number of exposure levels, from 2 to 10, is chosen by the user. For the REC (recessive) or DOM (dominant) models, two levels are chosen, with scores 0 and 1. The significance level α (type I error) is entered and a two-sided test is chosen (do not choose this if a one-sided alternative hypothesis is to be used).

The program does not allow the user to specify the frequency of the risk allele or the minor allele frequency. Instead, it asks for the probabilities of all exposure levels. Note that the three genotypes are denoted as (G0, G1, G2) = (BB, Bb, bb). Under the REC model, the score for genotypes BB and Bb is 0 and the score for genotype bb is 1 (if b is the risk allele). The probabilities entered in the program are the sum of the genotype frequencies of BB and Bb for level (score) 0, and the genotype frequency of bb for level (score) 1. For example, if the allele frequency of b is 0.3, then 0.09 is entered for score 1 and 1 - 0.09 = 0.91 (or 0.42 + 0.49 = 0.91) is entered for score 0.

The baseline disease probability Pr(case | G0) is specified, which is the reference penetrance f0. The OR is then specified. For an exposure with more than two levels, the OR is specified for the top-to-bottom comparison (bb versus BB). Only a single OR is specified. Thus, the program assumes that one knows the genetic model. The user then decides the objective of the calculation by choosing "Sample size" with a target power, or "Power" with a given sample size (the number of cases).
After clicking "Finish", the results are output in a new window.
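
Because the program asks for exposure-level probabilities rather than an allele frequency, it can help to compute those inputs beforehand. The following is a minimal Python sketch of that arithmetic, not the Power program's own code. It assumes Hardy-Weinberg proportions (which the 0.49 / 0.42 / 0.09 example above implies), and the function names are hypothetical ones chosen for illustration. It also shows how the reference penetrance f0 and the OR together determine the penetrance in the exposed level.

```python
def hwe_genotype_freqs(q):
    """Return (BB, Bb, bb) frequencies for risk-allele (b) frequency q.

    Assumes Hardy-Weinberg proportions; this is an assumption of the sketch.
    """
    p = 1.0 - q
    return p * p, 2.0 * p * q, q * q


def level_probs(q, model):
    """Return [P(score 0), P(score 1)] for a two-level REC or DOM coding."""
    freq_BB, freq_Bb, freq_bb = hwe_genotype_freqs(q)
    if model == "REC":
        # Recessive: only bb is exposed (score 1).
        return [freq_BB + freq_Bb, freq_bb]
    if model == "DOM":
        # Dominant: Bb and bb are both exposed (score 1).
        return [freq_BB, freq_Bb + freq_bb]
    raise ValueError("model must be 'REC' or 'DOM'")


def exposed_penetrance(f0, odds_ratio):
    """Penetrance in the exposed level implied by baseline f0 and the OR."""
    odds = odds_ratio * f0 / (1.0 - f0)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Allele frequency of b = 0.3, as in the example in the text.
    print("REC level probabilities:", level_probs(0.3, "REC"))
    print("DOM level probabilities:", level_probs(0.3, "DOM"))
    # Illustrative values only: f0 = 0.1 and OR = 2 are not from the source.
    print("Exposed penetrance:", exposed_penetrance(0.1, 2.0))
```

With an allele frequency of 0.3, the REC coding reproduces the values from the text (0.91 for score 0 and 0.09 for score 1), and these are the numbers one would type into the program.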

I will summarize the reasoning behind this calculation in a later post.

20160131
RF