torchchoice
Authors: Tianyu Du, Ayush Kanodia and Susan Athey; Contact: tianyudu@stanford.edu
Acknowledgements: We would like to thank Erik Sverdrup, Charles Pebereau and Keshav Agrawal for their feedback.
torchchoice
is a library for flexible, fast choice modeling with PyTorch: it has logit and nested logit models, designed for both estimation and prediction. See the complete documentation for more details.
Unique features:
1. GPU support via torch for speed
2. Specify customized models
3. Specify availability sets
4. Maximum Likelihood Estimation (MLE) (optionally, reporting standard errors or MAP inference with Bayesian Priors on coefficients)
5. Estimation via minimization of Cross Entropy Loss (optionally with L1/L2 regularization)
Introduction
Logistic Regression and Choice Models
Logistic Regression models the probability that user \(u\) chooses item \(i\) in session \(s\) by the logistic function
where,
here \(X\), \(W\) are predictors (independent variables) for users and items respectively (these can be constant or can vary across session), and greek letters \(\alpha\), \(\beta\) and \(\gamma\) are learned parameters. \(A_{us}\) is the set of items available for user \(u\) in session \(s\).
When users are choosing over items, we can write utility \(U_{uis}\) that user \(u\) derives from item \(i\) in session \(s\), as
where \(\epsilon\) is an unobserved random error term.
If we assume iid extreme value type 1 errors for \(\epsilon_{uis}\), this leads to the above logistic probabilities of user \(u\) choosing item \(i\) in session \(s\), as shown by McFadden, and as often studied in Econometrics.
Package
We implement a fully flexible setup, where we allow 1. coefficients (\(\alpha\), \(\beta\), \(\gamma\), \(\dots\)) to be constant, userspecific (i.e., \(\alpha=\alpha_u\)), itemspecific (i.e., \(\alpha=\alpha_i\)), sessionspecific (i.e., \(\alpha=\alpha_t\)), or (session, item)specific (i.e., \(\alpha=\alpha_{ti}\)). For example, specifying \(\alpha\) to be itemspecific is equivalent to adding an itemlevel fixed effect. 2. Observables (\(X\), \(Y\), \(\dots\)) to be constant, userspecific, itemspecific, sessionspecific, or (session, item) (such as price) and (session, user) (such as income) specific as well. 3. Specifying availability sets \(A_{us}\)
This flexibility in coefficients and features allows for more than 20 types of additive terms to \(U_{uis}\), which enables modelling rich structures.
As such, this package can be used to learn such models for 1. Parameter Estimation, as in the Transportation Choice example below 2. Prediction, as in the MNIST handwritten digits classification example below
Examples with Utility Form: 1. Transportation Choice (from the Mode Canada dataset) (Detailed Tutorial)
This is also described as a conditional logit model in Econometrics. We note the shapes/sizes of each of the components in the above model. Suppose there are U users, I items and S sessions; in this case there is one user per session, so that U = S
Then,  \(X^{itemsession: (cost, freq, ovt)}_{it}\) is a matrix of size (I x S) x (3); it has three entries for each itemsession, and is like a price; its coefficient \(\beta^{1}\) has constant variation and is of size (1) x (3).  \(X^{session: income}_{it}\) is a matrix which is of size (S) x (1); it has one entry for each session, and it denotes income of the user making the choice in the session. In this case, it is equivalent to \(X^{usersession: income}_{it}\) since we observe a user making a decision only once; its coefficient \(\beta^2_i\) has item level variation and is of size (I) x (1)  \(X_{it}^{itemsession:ivt}\) is a matrix of size (I x S) x (1); this has one entry for each itemsession; it is the price; its coefficent \(\beta^3_i\) has item level variation and is of size (I) x (3)
 MNIST classification (Upcoming Detailed Tutorial)
We note the shapes/sizes of each of the components in the above model. Suppose there are U users, I items and S sessions; in this case, an item is one of the 10 possible digits, so I = 10; there is one user per session, so that U=S; and each session is an image being classified. Then,  \(X^{session:pixelvalues}_{t}\) is a matrix of size (S) x (H x W) where H x W are the dimensions of the image being classified; its coefficient \(\beta_i\) has item level vartiation and is of size (I) x (1)
This is a classic problem used for exposition in Computer Science to motivate various Machine Learning models. There is no concept of a user in this setup. Our package allows for models of this nature and is fully usable for Machine Learning problems with added flexibility over scikitlearn logistic regression
We highly recommend users to go through tutorials we prepared to get a better understanding of what the package offers. We present multiple examples, and for each case we specify the utility form.
Installation
 Clone the repository to your local machine or server.
 Install required dependencies using:
pip3 install r requirements.txt
.  Run
pip3 install torchchoice
.  Check installation by running
python3 c 'import torch_choice; print(torch_choice.__version__)'
.
The installation page provides more details on installation.
Example Usage  Transportation Choice Dataset
In this demonstration, we setup a minimal example of fitting a conditional logit model using our package. We provide equivalent R code as well for reference, to aid replicating from R to this package.
We are modelling people's choices on transportation modes using the publicly available ModeCanada
dataset.
More information about the ModeCanada: Mode Choice for the MontrealToronto Corridor.
In this example, we are estimating the utility for user \(u\) to choose transport method \(i\) in session \(s\) as $$ U_{uis} = \alpha_i + \beta_i \text{income}_s + \gamma \text{cost} + \delta \text{freq} + \eta \text{ovt} + \iota_i \text{ivt} + \varepsilon $$ this is equivalent to the functional form described in the previous section
Mode Canada with TorchChoice
# load packages.
import pandas as pd
import torch_choice
# load data.
df = pd.read_csv('https://raw.githubusercontent.com/gsbDBI/torchchoice/main/tutorials/public_datasets/ModeCanada.csv?token=GHSAT0AAAAAABRGHCCSNNQARRMU63W7P7F4YWYP5HA').query('noalt == 4').reset_index(drop=True)
# format data.
data = torch_choice.utils.easy_data_wrapper.EasyDatasetWrapper(
main_data=df,
purchase_record_column='case',
choice_column='choice',
item_name_column='alt',
user_index_column='case',
session_index_column='case',
session_observable_columns=['income'],
price_observable_columns=['cost', 'freq', 'ovt', 'ivt'])
# define the conditional logit model.
model = torch_choice.model.ConditionalLogitModel(
coef_variation_dict={'price_cost': 'constant',
'price_freq': 'constant',
'price_ovt': 'constant',
'session_income': 'item',
'price_ivt': 'itemfull',
'intercept': 'item'},
num_items=4)
# fit the conditional logit model.
torch_choice.utils.run_helper.run(model, data.choice_dataset, num_epochs=5000, learning_rate=0.01, batch_size=1)
Mode Canada with R
We include the R code for the ModeCanada example as well.
# load packages.
library("mlogit")
# load data.
ModeCanada < read.csv('https://raw.githubusercontent.com/gsbDBI/torchchoice/main/tutorials/public_datasets/ModeCanada.csv?token=GHSAT0AAAAAABRGHCCSNNQARRMU63W7P7F4YWYP5HA')
ModeCanada < select(ModeCanada, X)
ModeCanada$alt < as.factor(ModeCanada$alt)
# format data.
MC < dfidx(ModeCanada, subset = noalt == 4)
# fit the data.
ml.MC1 < mlogit(choice ~ cost + freq + ovt  income  ivt, MC, reflevel='air')
summary(ml.MC1)
What's in the package?
Overall, the torchchoice
package offers the following features:

The package includes a data management module called
ChoiceDataset
, which is built upon PyTorch's dataset module. Our dataset implementation allows users to easily move data between CPU and GPU. Unlike traditional long or wide formats, theChoiceDataset
offers a memoryefficient way to manage observables. 
The package provides a (1) conditional logit model and (2) a nested logit model for consumer choice modeling.

The package leverage GPU acceleration using PyTorch and easily scale to large dataset of millions of choice records. All models are trained using stateoftheart optimizers by in PyTorch. These optimization algorithms are tested to be scalable by modern machine learning practitioners. However, you can rest assure that the package runs flawlessly when no GPU is used as well.

Setting up the PyTorch training pipelines can be frustrating. We provide easytouse PyTorch lightning wrapper of models to free researchers from the hassle from setting up PyTorch optimizers and training loops.