- Clojure for Data Science
- Henry Garner
- 2025-02-24 01:23:13
The t-statistic
When using the t-distribution, we look up the t-statistic. Like the z-statistic, this value quantifies how unlikely a particular observed deviation is. For a two-sample t-test, the t-statistic is calculated in the following way:
$$t = \frac{\bar{X}_a - \bar{X}_b}{S_{ab}}$$
Here, $S_{ab}$ is the pooled standard error. We could calculate the pooled standard error in the same way as we did earlier:
$$\sigma_{ab} = \sqrt{\frac{\sigma_a^2}{n_a} + \frac{\sigma_b^2}{n_b}}$$
However, the equation assumes knowledge of the population parameters σa and σb, which can only be approximated from large samples. The t-test is designed for small samples and does not require us to make assumptions about population variance.
As a result, for the t-test, we write the pooled standard error as the square root of the sum of the squared standard errors, where $S_a$ and $S_b$ are the standard errors of the two samples:
$$S_{ab} = \sqrt{S_a^2 + S_b^2}$$
In practice, the earlier two equations for the pooled standard error yield identical results, given the same input sequences. The difference in notation just serves to illustrate that with the t-test, we depend only on sample statistics as input. The pooled standard error can be calculated in the following way:
(defn pooled-standard-error [a b]
  (i/sqrt (+ (i/sq (standard-error a))
             (i/sq (standard-error b)))))
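The claim that both forms of the pooled standard error agree can be checked numerically. The following is a minimal sketch in plain Clojure, without Incanter. The helper names, the made-up sample values, and the choice of the n - 1 denominator for the variance are all assumptions made for illustration; the chapter's own `standard-error` may be defined differently. The equality holds either way, because squaring a standard error s/√n gives s²/n.

```clojure
;; Illustrative stand-ins for the chapter's helpers (names are assumptions).
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn variance
  "Sample variance with the n - 1 denominator (an assumption of this sketch)."
  [xs]
  (let [m (mean xs)]
    (/ (reduce + (map #(let [d (- % m)] (* d d)) xs))
       (dec (count xs)))))

(defn standard-error
  "Standard error of the mean: sqrt(variance / n)."
  [xs]
  (Math/sqrt (/ (variance xs) (count xs))))

;; Form 1: sqrt(s_a^2 / n_a + s_b^2 / n_b)
(defn pooled-se-from-variances [a b]
  (Math/sqrt (+ (/ (variance a) (count a))
                (/ (variance b) (count b)))))

;; Form 2: sqrt(S_a^2 + S_b^2), built from the two standard errors
(defn pooled-se-from-standard-errors [a b]
  (let [sq #(* % %)]
    (Math/sqrt (+ (sq (standard-error a))
                  (sq (standard-error b))))))

;; Made-up samples of unequal size, for illustration only.
(def sample-a [2.0 4.0 6.0 8.0])
(def sample-b [1.0 3.0 5.0 7.0 9.0])

(println (pooled-se-from-variances sample-a sample-b))
(println (pooled-se-from-standard-errors sample-a sample-b))
```

Both calls print approximately 1.915, differing at most by floating-point rounding.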
Although they are written differently in mathematical notation, in practice the t-statistic is calculated in exactly the same way as the z-statistic:
(def t-stat z-stat)

(defn ex-2-15 []
  (let [data (->> (load-data "new-site.tsv")
                  (:rows)
                  (group-by :site)
                  (map-vals (partial map :dwell-time)))
        a (get data 0)
        b (get data 1)]
    (t-stat a b)))

;; -1.647
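For readers without the chapter's data file or helper functions to hand, the calculation can be sketched end to end in plain Clojure. This sketch assumes the statistic is the difference in sample means divided by the pooled standard error, and that the standard error uses the n - 1 variance denominator. The helper definitions and the dwell times below are invented for illustration and are not taken from `new-site.tsv`.

```clojure
;; Sketch only: plain Clojure stand-ins for the chapter's helpers.
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn standard-error
  "Standard error of the mean, from the sample variance (n - 1 denominator)."
  [xs]
  (let [m  (mean xs)
        n  (count xs)
        ss (reduce + (map #(Math/pow (- % m) 2) xs))] ; sum of squared deviations
    (Math/sqrt (/ ss (dec n) n))))

(defn pooled-standard-error [a b]
  (Math/sqrt (+ (Math/pow (standard-error a) 2)
                (Math/pow (standard-error b) 2))))

(defn t-stat
  "Difference in sample means divided by the pooled standard error."
  [a b]
  (/ (- (mean a) (mean b))
     (pooled-standard-error a b)))

;; Made-up dwell times in seconds, for illustration only.
(def site-a [72.0 80.0 85.0 90.0 98.0])
(def site-b [88.0 95.0 101.0 110.0 116.0])

(println (t-stat site-a site-b))
```

With these invented samples the statistic is roughly -2.54. It is negative because the first sample has the lower mean, and swapping the arguments flips the sign without changing the magnitude.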
The difference between the two statistics is conceptual rather than algorithmic: the z-statistic is only applicable when the samples follow a normal distribution.