- Clojure for Data Science
- Henry Garner
- 256 characters
- 2025-02-24 01:23:13
The t-statistic
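When using the t-distribution, we look up the t-statistic. Like the z-statistic, this value quantifies how unlikely a particular observed deviation is. For a two-sample t-test, the t-statistic is calculated as follows:
$$t = \frac{\bar{a} - \bar{b}}{S_{ab}}$$
Here, $\bar{a}$ and $\bar{b}$ are the two sample means and $S_{ab}$ is the pooled standard error. We could calculate the pooled standard error in the same way as we did earlier, where $n_a$ and $n_b$ are the sample sizes:
$$\sigma_{ab} = \sqrt{\frac{\sigma_a^2}{n_a} + \frac{\sigma_b^2}{n_b}}$$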
However, this formula assumes knowledge of the population standard deviations $\sigma_a$ and $\sigma_b$, which can only be estimated reliably from large samples. The t-test is designed for small samples and does not require us to know the population variance.
As a result, for the t-test, we write the pooled standard error as the square root of the sum of the squared standard errors of the two samples:
$$S_{ab} = \sqrt{S_{\bar{a}}^2 + S_{\bar{b}}^2}$$
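To see why the two forms agree, write each standard error in terms of its sample standard deviation. In the lines below, $s_a$ and $s_b$ denote the sample standard deviations. This notation is introduced here for illustration and is not used elsewhere in this section:

$$S_{\bar{a}} = \frac{s_a}{\sqrt{n_a}} \;\Rightarrow\; S_{\bar{a}}^2 = \frac{s_a^2}{n_a}$$
$$S_{ab} = \sqrt{S_{\bar{a}}^2 + S_{\bar{b}}^2} = \sqrt{\frac{s_a^2}{n_a} + \frac{s_b^2}{n_b}}$$

This is the population-based formula with $\sigma_a$ and $\sigma_b$ replaced by their sample estimates.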
In practice, the two equations for the pooled standard error yield identical results for the same input sequences, because the population standard deviations are estimated from the samples in either case. The difference in notation merely illustrates that the t-test depends only on sample statistics as input. The pooled standard error can be calculated as follows:
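```clojure
;; Assumes incanter.core is aliased as i; standard-error is defined elsewhere (not shown here).
(defn pooled-standard-error [a b]
  (i/sqrt (+ (i/sq (standard-error a))
             (i/sq (standard-error b)))))
```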
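The function above depends on Incanter and on a `standard-error` helper that is not shown in this section. The following dependency-free sketch performs the same calculation in plain Clojure. The names `mean`, `sample-variance`, `sample-standard-error` and `pooled-se` are hypothetical, chosen for illustration. The sample variance is assumed to use the n - 1 divisor.

```clojure
;; Dependency-free sketch; all names here are hypothetical, not the book's.
(defn mean
  "Arithmetic mean of a non-empty sequence, as a double."
  [xs]
  (/ (reduce + xs) (double (count xs))))

(defn sample-variance
  "Sample variance with the n - 1 divisor (assumed convention)."
  [xs]
  (let [m (mean xs)]
    (/ (reduce + (map (fn [x] (let [d (- x m)] (* d d))) xs))
       (dec (count xs)))))

(defn sample-standard-error
  "Standard error of the mean: sample standard deviation / sqrt(n)."
  [xs]
  (/ (Math/sqrt (sample-variance xs))
     (Math/sqrt (count xs))))

(defn pooled-se
  "Square root of the sum of the two squared standard errors."
  [a b]
  (let [se-a (sample-standard-error a)
        se-b (sample-standard-error b)]
    (Math/sqrt (+ (* se-a se-a) (* se-b se-b)))))

;; Hypothetical samples: SE_a^2 = 2.5/5 = 0.5 and SE_b^2 = (20/3)/4 = 5/3,
;; so the result is sqrt(0.5 + 5/3) = sqrt(13/6), approximately 1.472.
(pooled-se [1 2 3 4 5] [2 4 6 8])
```

Squaring each standard error gives back the sample variance divided by the sample size. This is why the result matches $\sqrt{s_a^2/n_a + s_b^2/n_b}$.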
Although the two statistics are represented differently in mathematical notation, in practice the t-statistic is calculated in exactly the same way as the z-statistic:
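```clojure
;; z-stat, load-data and map-vals are assumed to be defined earlier (not shown here).
(def t-stat z-stat)

(defn ex-2-15 []
  (let [data (->> (load-data "new-site.tsv")
                  (:rows)
                  (group-by :site)
                  (map-vals (partial map :dwell-time)))
        a (get data 0)
        b (get data 1)]
    (t-stat a b)))

(ex-2-15)
;; -1.647
```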
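The file `new-site.tsv` is not reproduced here. The following plain-Clojure sketch applies the same formula, $t = (\bar{a} - \bar{b}) / S_{ab}$, to two small hypothetical samples instead. The helper names and the data are hypothetical, and the n - 1 sample variance is an assumption. The denominator, $\sqrt{s_a^2/n_a + s_b^2/n_b}$, is the same one used by Welch's t-test.

```clojure
;; Dependency-free sketch; helper names and sample data are hypothetical.
(defn mean
  "Arithmetic mean of a non-empty sequence, as a double."
  [xs]
  (/ (reduce + xs) (double (count xs))))

(defn sample-variance
  "Sample variance with the n - 1 divisor (assumed convention)."
  [xs]
  (let [m (mean xs)]
    (/ (reduce + (map (fn [x] (let [d (- x m)] (* d d))) xs))
       (dec (count xs)))))

(defn t-statistic
  "Difference in sample means divided by sqrt(s_a^2/n_a + s_b^2/n_b)."
  [a b]
  (let [se (Math/sqrt (+ (/ (sample-variance a) (count a))
                         (/ (sample-variance b) (count b))))]
    (/ (- (mean a) (mean b)) se)))

;; Means are 3.0 and 5.0 and the standard error is sqrt(13/6),
;; so t = -2 / sqrt(13/6), approximately -1.359.
(t-statistic [1 2 3 4 5] [2 4 6 8])
```

Swapping the arguments flips the sign of t but leaves its magnitude unchanged.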
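The difference between the two statistics is conceptual rather than algorithmic. The z-statistic is only applicable when the sampling distribution is normal with a known (or accurately estimated) standard error. The t-statistic, by contrast, is compared against the t-distribution, whose heavier tails allow for the extra uncertainty of estimating the standard error from small samples.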