Cover page
Clojure for Data Science
Credits
About the Author
Acknowledgments
About the Reviewer
www.PacktPub.com
Support files, eBooks, discount offers, and more
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Chapter 1. Statistics
Downloading the sample code
Running the examples
Downloading the data
Inspecting the data
Data scrubbing
Descriptive statistics
Variance
Quantiles
Binning data
Histograms
The normal distribution
Poincaré's baker
Skewness
Comparative visualizations
The importance of visualizations
Adding columns
Comparative visualizations of electorate data
Visualizing the Russian election data
Comparative visualizations
Summary
Chapter 2. Inference
Introducing AcmeContent
Download the sample code
Load and inspect the data
Visualizing the dwell times
The exponential distribution
The central limit theorem
Standard error
Samples and populations
Confidence intervals
Visualizing different populations
Hypothesis testing
Testing a new site design
The t-statistic
Performing the t-test
One-sample t-test
Resampling
Testing multiple designs
Multiple comparisons
The browser simulation
jStat
B1
Plotting probability densities
State and Reagent
Simulating multiple tests
The Bonferroni correction
Analysis of variance
The F-distribution
The F-statistic
The F-test
Effect size
Summary
Chapter 3. Correlation
About the data
Inspecting the data
Visualizing the data
The log-normal distribution
Covariance
Pearson's correlation
Hypothesis testing
Confidence intervals
Regression
Ordinary least squares
Goodness-of-fit and R-square
Multiple linear regression
Matrices
The normal equation
Multiple R-squared
Adjusted R-squared
Collinearity
Prediction
Summary
Chapter 4. Classification
About the data
Inspecting the data
Comparisons with relative risk and odds
The standard error of a proportion
The binomial distribution
Significance testing proportions
Chi-squared multiple significance testing
Classification with logistic regression
Implementing logistic regression with Incanter
Probability
Naive Bayes classification
Decision trees
Classification with clj-ml
Bias and variance
Ensemble learning and random forests
Saving the classifier to a file
Summary
Chapter 5. Big Data
Downloading the code and data
The reducers library
Mathematical folds with Tesser
Multiple regression with gradient descent
Scaling gradient descent with Hadoop
Stochastic gradient descent
Summary
Chapter 6. Clustering
Downloading the data
Extracting the data
Inspecting the data
Clustering text
Creating term frequency vectors
Clustering with k-means and Incanter
Better clustering with TF-IDF
Large-scale clustering with Mahout
Running k-means clustering with Mahout
Cluster evaluation measures
The drawbacks of k-means
The curse of dimensionality
Summary
Chapter 7. Recommender Systems
Download the code and data
Inspect the data
Parse the data
Types of recommender systems
Item-based and user-based recommenders
Slope One recommenders
Building a user-based recommender with Mahout
k-nearest neighbors
Recommender evaluation with Mahout
Probabilistic methods for large sets
Jaccard similarity for large sets with MinHash
Dimensionality reduction
Large-scale machine learning with Apache Spark and MLlib
Machine learning on Spark with MLlib
Summary
Chapter 8. Network Analysis
Download the data
Graph traversal with Loom
Breadth-first and depth-first search
Finding the shortest path
Whole-graph analysis
Scale-free networks
Distributed graph computation with GraphX
Summary
Chapter 9. Time Series
About the data
Fitting curves with a linear model
Time series decomposition
Discrete time models
Maximum likelihood estimation
Time series forecasting
Summary
Chapter 10. Visualization
Download the code and data
Exploratory data visualization
Using Quil for visualization
Visualization for communication
Summary
Index
Last updated: 2021-07-16 20:05:34