This course was developed by Jeffrey Girard and Shirley Wang for the Pittsburgh Summer Methodology Series (July 19-23, 2021).


Jeffrey Girard
University of Kansas
Shirley Wang
Harvard University

Whereas statistical methods traditionally used in the social and behavioral sciences emphasize interpretability and quantification of uncertainty, machine learning methods emphasize complexity and accuracy of predictions. Machine learning methods are thus particularly well-suited for applications where (1) there are nonlinear and complex relationships among a large number of predictor variables and (2) accurately predicting the outcome variable is more important than fully understanding the relationships between variables.

This workshop will provide a hands-on introduction to the application of machine learning techniques in R using the {caret} package (with elements drawn from {tidymodels} such as {recipes} and {yardstick}). It will emphasize practical knowledge and conceptual intuitions (akin to teaching you how to drive a car) rather than technical and theoretical mastery (akin to teaching you how to build a car). In addition, rather than briefly surveying the full breadth of available machine learning techniques, this workshop will provide a deep dive into three supervised learning methods with broad applicability in the social and behavioral sciences: regularized regression models, decision tree-based models, and support vector machines. The final day of the workshop will also provide an opportunity for attendees to consult with the instructors on implementing machine learning methods in their own research. Taken together, this workshop’s practical focus will allow attendees to learn how to formulate a good research question, prepare data for analysis, set up a rigorous cross-validation procedure, evaluate predictive performance, and interpret and report results for a scientific audience.

Although attendees of all backgrounds are welcome and the skills taught will be broadly applicable, example datasets and advice will be tailored specifically to the social and behavioral sciences (e.g., psychology, medicine, education, and related fields). Workshop attendees are not expected to have any background knowledge of machine learning, but some proficiency with R (e.g., knowledge of how to import data and manipulate data frames) will be assumed and some familiarity with statistical modeling (e.g., linear models) will be helpful.


These materials are made freely available and may be re-used according to the CC-BY License.