Sample size calculation – Is there an easy way out?

We have all learned a lot about clinical performance studies for IVD validation in the past couple of years: with thousands of Covid tests entering the EU market, their validation studies became one of the most sought-after services from CROs and biobanks. There were plenty of practice opportunities to optimise processes and tighten collaborations. One vital step of clinical performance study design, however, was taken off the curriculum: at an early stage of the pandemic, commonly accepted guidelines for Covid test validation were published which contained clear requirements regarding the number of samples to be included in the study. As a result, we didn't have to bother with the infamous issue of sample size planning.

Today, MDCG guideline 2021-21, around which all our validation efforts revolved for over two years, is part of EU Regulation 2022/1107 ('Common Specifications for certain class D in vitro diagnostic medical devices'). This document comprises twelve annexes containing tables with official (and quite challenging) sample size requirements for the state-of-the-art validation of devices intended for the detection of blood group antigens, HIV, hepatitis, and other infectious diseases. These Common Specifications can greatly facilitate study design, as the number and specifications of the samples to be procured are clear from the start. However, they place high demands on manufacturers and apply to only a dozen medical indications.

This means that for the vast majority of IVD devices, clinical performance study design still relies on statistical sample size calculation. We don't want to delve into the depths of this mathematical field here (it fills countless books and publications), but rather give a couple of hints to make life easier for non-statisticians.
First of all, there is no single correct way to conduct a sample size calculation. There are multiple statistical tests that apply to different study designs. For IVD devices, however, it usually (but not always) comes down to two types of statistical tests, aimed at one of the following study goals:

  1. I want to estimate the performance of my product with a reasonable margin of uncertainty, OR
  2. I want to show that the performance of my product does not fall below a certain value.

These goals sound similar, but the underlying formulas and the claims they provide evidence for are different. If the intended purpose of your product stipulates non-inferiority to a state-of-the-art device, estimating its performance with a two-sided margin of uncertainty (the first approach) may not be the best way to calculate the sample size for its clinical study. What you want to know, after all, is how many donors you need to recruit to show that your device is no less sensitive than a competitor's device. That question is much better served by the second approach.
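
To make the difference more tangible, here is a minimal sketch (Python, standard library only) of the two textbook formulas that typically sit behind these approaches: the normal-approximation sample size for a confidence interval of a given width, and a one-sample, one-sided test of a proportion against a minimum acceptable value. The function names and the default choices of α = 0.05 and 80 % power are illustrative assumptions, not prescriptions, and the normal approximation is only one of several options (exact methods give slightly different numbers).

```python
import math
from statistics import NormalDist


def n_ci_width(p_expected: float, half_width: float, alpha: float = 0.05) -> int:
    """Approach 1: estimate a proportion (e.g. sensitivity) so that the
    two-sided (1 - alpha) confidence interval has roughly the given half-width."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n = z ** 2 * p_expected * (1 - p_expected) / half_width ** 2
    return math.ceil(n)


def n_one_sided_test(p_expected: float, p_minimum: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approach 2: show that the true proportion does not fall below p_minimum,
    assuming the device actually performs at p_expected (one-sample,
    one-sided test of a proportion, normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    numerator = (z_a * math.sqrt(p_minimum * (1 - p_minimum))
                 + z_b * math.sqrt(p_expected * (1 - p_expected))) ** 2
    return math.ceil(numerator / (p_expected - p_minimum) ** 2)


# Illustrative case: expected sensitivity of 95 %
print(n_ci_width(0.95, 0.05))                  # -> 73 positive samples for a +/- 5 % CI
print(n_one_sided_test(0.95, p_minimum=0.90))  # -> 184 positive samples to rule out Se < 90 %
```

With these illustrative numbers, the second approach, which has to distinguish 95 % from 90 % rather than merely put an interval around 95 %, calls for considerably more samples; with a wider gap between expected and minimum performance, the relationship can just as well be reversed.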

Knowing which statistical approach is appropriate for your study is not much use if you don't know the formulae or how to apply them (which is true for most of us). Fortunately, there are several statistical publications that provide tables with pre-calculated sample sizes for different expected sensitivity/specificity values, margins of uncertainty (i.e., the width of the confidence interval; applies to the first approach), levels of disease prevalence, or minimum accepted sensitivity/specificity values (applies to the second approach).
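
One of these table parameters deserves a closer look: disease prevalence. A sample size calculated for sensitivity refers to disease-positive samples; if subjects are recruited prospectively rather than drawn from a characterised sample bank, the total enrolment has to be scaled up accordingly. A hedged sketch of that conversion, using an illustrative helper name rather than any official formula:

```python
import math


def n_total_from_prevalence(n_positive: int, prevalence: float) -> int:
    """Scale the required number of disease-positive samples up to the total
    number of subjects to enrol, given the expected prevalence in the
    recruited population."""
    return math.ceil(n_positive / prevalence)


print(n_total_from_prevalence(73, 0.10))  # -> 730 subjects at 10 % expected prevalence
```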

[Infobox: Yes, this means that the outcome of the study (i.e., the performance of your product) must already be anticipated during sample size planning. Remember that the goal of sample size planning is to find a number of samples that is large enough to sufficiently support your performance claim, but not so large that the study becomes economically and ethically unreasonable. The expected outcome can be deduced from pre-studies or findings from your product development phase, but also from competitors' performance studies. For instance, if the state of the art in medicine requires that your product reach a certain minimum sensitivity, you know that this sensitivity should be the expected outcome of your study, otherwise your product would not be usable (AND you know that you should choose the second approach to sample size calculation).]
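
To put hypothetical numbers on that example: with the illustrative one-sided sketch above, a state-of-the-art minimum sensitivity of 92 % combined with an expected sensitivity of 97 % from pre-studies (α = 0.05, 80 % power) comes out at roughly 140 disease-positive samples. The figures are made up and only meant to show how the two inputs feed into the calculation.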

Some example publications are:

  • K. Hajian-Tilaki / Journal of Biomedical Informatics 48 (2014) 193–204
  • A. Flahault et al. / Journal of Clinical Epidemiology 58 (2005) 859–862
  • F. Krummenauer, H.-U. Kauczor / Fortschr Röntgenstr 174 (2002) 1438–1444 (German)

Using sample sizes from these or similar publications will not compromise your conformity assessment as long as you provide a justification for the sample size chosen. The current state of the art for your product type or findings from pre-studies can serve as that justification.
Of course, several factors not discussed here also influence sample size estimation (e.g., statistical power and the significance level). Also, this article does not apply to quantitative or semi-quantitative assays. For simple qualitative IVD assays, however, the statistical complexity is limited, which is what allows the generalisations made here in the first place. For anything more complex, consulting a statistician is the only safe way.