Tutorial: Conditional Logit Model on ModeCanada Dataset
Author: Tianyu Du (tianyudu@stanford.edu)
Updated: May 3, 2022
Reference: This tutorial is adapted from the Random utility model and the multinomial logit model chapter in the documentation of the mlogit package in R.
Please note that the dataset involved in this example is fairly small (2,779 choice records), so we don't expect the performance to be faster than the R implementation. We provide this tutorial mainly to check the correctness of our estimation; the full potential of PyTorch is better exploited on much larger datasets.
The executable Jupyter notebook for this tutorial is located at Random Utility Model (RUM) 1: Conditional Logit Model.
Let's first import essential Python packages.
```python
from time import time

import pandas as pd
import torch

from torch_choice.data import ChoiceDataset, utils
from torch_choice.model import ConditionalLogitModel
from torch_choice import run
```
This tutorial runs both with and without a graphics processing unit (GPU); however, our package is much faster with a GPU.
```python
if torch.cuda.is_available():
    print(f'CUDA device used: {torch.cuda.get_device_name()}')
    device = 'cuda'
else:
    print('Running tutorial on CPU.')
    device = 'cpu'
```
Running tutorial on CPU.
Load Dataset
We have included the `ModeCanada` dataset in our package; it is located at `./public_datasets/`.
The `ModeCanada` dataset contains individuals' choices of travel mode.
The raw dataset is in long format, in which the `case` variable identifies each choice.
Using the terminology from the data management tutorial, each choice is called a purchasing record (i.e., the consumer bought a ticket for a particular travel mode), and the total number of choices made is denoted as \(B\).
For example, the first four rows below (with `case == 109`) correspond to the first choice; the `alt` column lists all available alternatives/items, and the `choice` column identifies which alternative/item was chosen. In the second row of the data snapshot below, we have `choice == 1` and `alt == 'air'` for `case == 109`, which indicates that the travel mode chosen in case 109 was `air`.
Now we convert the raw dataset into a format compatible with our model; for a detailed walkthrough of compatible formats, please refer to the data management tutorial.
We focus on cases in which four alternatives were available by filtering on `noalt == 4`.
```python
df = pd.read_csv('./public_datasets/ModeCanada.csv')
df = df.query('noalt == 4').reset_index(drop=True)
df.sort_values(by='case', inplace=True)
df.head()
```
|   | Unnamed: 0 | case | alt | choice | dist | cost | ivt | ovt | freq | income | urban | noalt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 304 | 109 | train | 0 | 377 | 58.25 | 215 | 74 | 4 | 45 | 0 | 4 |
| 1 | 305 | 109 | air | 1 | 377 | 142.80 | 56 | 85 | 9 | 45 | 0 | 4 |
| 2 | 306 | 109 | bus | 0 | 377 | 27.52 | 301 | 63 | 8 | 45 | 0 | 4 |
| 3 | 307 | 109 | car | 0 | 377 | 71.63 | 262 | 0 | 0 | 45 | 0 | 4 |
| 4 | 308 | 110 | train | 0 | 377 | 58.25 | 215 | 74 | 4 | 70 | 0 | 4 |
Since there are 4 rows corresponding to each purchasing record, the length of the long-format data is \(4 \times B\). Please refer to the data management tutorial for the notation.
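We can verify this by checking the shape of the filtered dataframe; with \(B = 2779\), we expect \(4 \times 2779 = 11116\) rows:

```python
df.shape
```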
(11116, 12)
Construct the `item_index` tensor
The first step is to construct the `item_index` tensor, which identifies the item (i.e., travel mode) chosen in each purchasing record.
```python
item_index = df[df['choice'] == 1].sort_values(by='case')['alt'].reset_index(drop=True)
print(item_index)
```
0 air
1 air
2 air
3 air
4 air
...
2774 car
2775 car
2776 car
2777 car
2778 car
Name: alt, Length: 2779, dtype: object
Since we will be training our model using PyTorch, we need to encode `{'air', 'bus', 'car', 'train'}` as integer values.
| Travel Mode Name | Encoded Integer Value |
| --- | --- |
| air | 0 |
| bus | 1 |
| car | 2 |
| train | 3 |
The generated `item_index` will be a tensor of length 2,779 (i.e., the number of purchasing records in this dataset) with values in `{0, 1, 2, 3}`.
```python
item_names = ['air', 'bus', 'car', 'train']
num_items = 4
encoder = dict(zip(item_names, range(num_items)))
print(f"{encoder=:}")
item_index = item_index.map(lambda x: encoder[x])
item_index = torch.LongTensor(item_index)
print(f"{item_index=:}")
```
encoder={'air': 0, 'bus': 1, 'car': 2, 'train': 3}
item_index=tensor([0, 0, 0, ..., 2, 2, 2])
Construct Observables
Next, let's construct tensors for the observables.
As mentioned in the data management tutorial, the session captures the temporal dimension of our data.
Since the values of `cost`, `freq`, and `ovt` differ across purchasing records and across items, it is natural to give each purchasing record its own session. Consequently, these three variables are `price` observables, since they vary by both item and session.
The tensor holding these observables has shape \((\text{number of purchasing records}, \text{number of items}, 3)\).
We do the same for the `ivt` variable, but put `ivt` into a separate tensor because we want to model its coefficient differently later.
```python
price_cost_freq_ovt = utils.pivot3d(df, dim0='case', dim1='alt',
                                    values=['cost', 'freq', 'ovt'])
print(f'{price_cost_freq_ovt.shape=:}')

price_ivt = utils.pivot3d(df, dim0='case', dim1='alt', values='ivt')
print(f'{price_ivt.shape=:}')
```
price_cost_freq_ovt.shape=torch.Size([2779, 4, 3])
price_ivt.shape=torch.Size([2779, 4, 1])
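Under the hood, `utils.pivot3d` reshapes the long-format dataframe into a 3-dimensional tensor. Here is a rough sketch of what it computes (an illustrative re-implementation with plain pandas, not the package's actual code):

```python
import numpy as np
import torch

def pivot3d_sketch(df, dim0, dim1, values):
    """Pivot a long-format dataframe into a (num_dim0, num_dim1, num_values) tensor."""
    if isinstance(values, str):
        values = [values]
    # One 2-D pivot per value column, stacked along the last axis.
    slices = [df.pivot(index=dim0, columns=dim1, values=v).values for v in values]
    return torch.tensor(np.stack(slices, axis=-1), dtype=torch.float32)

# pivot3d_sketch(df, 'case', 'alt', ['cost', 'freq', 'ovt']).shape == (2779, 4, 3)
```

Note that pandas sorts the pivoted columns alphabetically (`air`, `bus`, `car`, `train`), which happens to match our integer encoding here.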
In contrast, the `income` variable varies only by session (i.e., purchasing record), not by item; `income` is therefore naturally a `session` variable.
```python
session_income = df.groupby('case')['income'].first()
session_income = torch.Tensor(session_income.values).view(-1, 1)
print(f'{session_income.shape=:}')
```
session_income.shape=torch.Size([2779, 1])
To summarize, the `ChoiceDataset` we constructed contains 2,779 choice records. Since the original dataset does not reveal the identity of each decision maker, we treat all 2,779 choices as being made by a single user, but in 2,779 different sessions, to capture the variation across records.
In this case, `cost`, `freq`, and `ovt` are observables depending on both sessions and items, so we created the `price_cost_freq_ovt` tensor with shape `(num_sessions, num_items, 3) = (2779, 4, 3)` to hold these variables.
In contrast, the `income` information depends only on the session, not on items, so we created the `session_income` tensor to store it.
Because we wish to fit item-specific coefficients for the `ivt` variable, which also varies by both session and item, we created the separate `price_ivt` tensor in addition to `price_cost_freq_ovt`.
Lastly, we gather all the tensors we created into a single `ChoiceDataset` object and move the dataset to the appropriate device.
```python
dataset = ChoiceDataset(item_index=item_index,
                        price_cost_freq_ovt=price_cost_freq_ovt,
                        session_income=session_income,
                        price_ivt=price_ivt).to(device)
```
No `session_index` is provided, assume each choice instance is in its own session.
You can `print(dataset)` to check the shapes of the tensors contained in the `ChoiceDataset`.
ChoiceDataset(label=[], item_index=[2779], user_index=[], session_index=[2779], item_availability=[], price_cost_freq_ovt=[2779, 4, 3], session_income=[2779, 1], price_ivt=[2779, 4, 1], device=cpu)
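Each tensor passed to the `ChoiceDataset` constructor is stored as an attribute of the dataset object and can be accessed directly; for example:

```python
print(dataset.item_index.shape)  # torch.Size([2779])
print(dataset.price_ivt.shape)   # torch.Size([2779, 4, 1])
```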
Create the Model
We now construct the `ConditionalLogitModel` to fit the dataset we built above.
To start with, we aim to estimate the following model formulation:

\[
U_{uit} = \beta^0_i + \beta^{1\top} X^{price: (cost, freq, ovt)}_{it} + \beta^2_i X^{session: income}_{t} + \beta^3_i X^{price: ivt}_{it} + \epsilon_{uit}
\]

where \(u\) denotes the (single) user, \(i\) indexes items, and \(t\) indexes sessions.
We now initialize the `ConditionalLogitModel` to predict choices from the dataset.
Please see the documentation for a complete description of the `ConditionalLogitModel` class.
At its core, the `ConditionalLogitModel` constructor requires the following four components.
Define the variation of each \(\beta\) using `coef_variation_dict`
The keyword `coef_variation_dict` is a dictionary with variable names (defined above while constructing the dataset) as keys and values from `{constant, user, item, item-full}`.
For instance, since we wish to have constant coefficients for the `cost`, `freq`, and `ovt` observables, and these three observables are stored in the `price_cost_freq_ovt` tensor of the choice dataset, we set `coef_variation_dict['price_cost_freq_ovt'] = 'constant'` (corresponding to the \(\beta^{1\top} X^{price: (cost, freq, ovt)}_{it}\) term above).
The model allows the option of zeroing out the coefficient for one item.
The variation of \(\beta^3\) above is specified as `item-full`, which indicates that 4 values of \(\beta^3\) are learned (one for each item).
In contrast, \(\beta^0\) and \(\beta^2\) are specified to have variation `item` instead of `item-full`; in this case, the \(\beta\) corresponding to the first item (i.e., the baseline item, which is encoded as 0 in the label tensor, `air` in our example) is forced to be zero.
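Numerically, `item`-level variation means only `num_items - 1` free coefficients are learned, with the baseline item's coefficient pinned at zero. A minimal sketch of the idea (illustrative tensors only, not the package's internals):

```python
import torch

num_items = 4
beta_free = torch.randn(num_items - 1, requires_grad=True)  # learned coefficients for bus, car, train
beta_full = torch.cat([torch.zeros(1), beta_free])          # coefficients actually applied to items
# beta_full[0] == 0 is the coefficient of the baseline item (air).
```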
The researcher needs to declare `intercept` explicitly for the model to fit an intercept as well; otherwise, the model assumes a zero intercept term.
Define the dimension of each \(\beta\) using `num_param_dict`
The `num_param_dict` is a dictionary with exactly the same keys as `coef_variation_dict`.
Each dictionary value gives the dimension of the corresponding observable, and hence the dimension of the coefficient. For example, `price_cost_freq_ovt` consists of three observables, so we set the corresponding value to three.
Even though the model can infer `num_param_dict['intercept'] = 1`, we recommend that the researcher include it for completeness.
Number of items
The `num_items` keyword informs the model how many alternatives users are choosing from.
Number of users
The `num_users` keyword is an optional integer informing the model how many users there are in the dataset. However, in this example we implicitly assume that a single user makes all of the decisions and there are no `user_obs` involved, so the `num_users` argument is not supplied.
```python
model = ConditionalLogitModel(coef_variation_dict={'price_cost_freq_ovt': 'constant',
                                                   'session_income': 'item',
                                                   'price_ivt': 'item-full',
                                                   'intercept': 'item'},
                              num_param_dict={'price_cost_freq_ovt': 3,
                                              'session_income': 1,
                                              'price_ivt': 1,
                                              'intercept': 1},
                              num_items=4)
```
Then we move the model to the appropriate device.
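Since the model is a regular PyTorch module, this is simply:

```python
model = model.to(device)
```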
One can `print` the `ConditionalLogitModel` object to obtain a summary of the model.
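```python
print(model)
```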
ConditionalLogitModel(
(coef_dict): ModuleDict(
(price_cost_freq_ovt[constant]): Coefficient(variation=constant, num_items=4, num_users=None, num_params=3, 3 trainable parameters in total, device=cpu).
(session_income[item]): Coefficient(variation=item, num_items=4, num_users=None, num_params=1, 3 trainable parameters in total, device=cpu).
(price_ivt[item-full]): Coefficient(variation=item-full, num_items=4, num_users=None, num_params=1, 4 trainable parameters in total, device=cpu).
(intercept[item]): Coefficient(variation=item, num_items=4, num_users=None, num_params=1, 3 trainable parameters in total, device=cpu).
)
)
Conditional logistic discrete choice model, expects input features:
X[price_cost_freq_ovt[constant]] with 3 parameters, with constant level variation.
X[session_income[item]] with 1 parameters, with item level variation.
X[price_ivt[item-full]] with 1 parameters, with item-full level variation.
X[intercept[item]] with 1 parameters, with item level variation.
device=cpu
Creating the Model Using a Formula
Alternatively, researchers can create the model using a `formula` string, as in R.
The formula consists of additive terms separated by the `+` sign, each of the form `(variable_name|variation)`, where `variable_name` is the name of a variable in the dataset and `variation` is one of `constant`, `user`, `item`, or `item-full`. Initializing the model with a `formula` requires passing in the `ChoiceDataset` object as well, so that the model can infer the dimension of each variable.
These two ways of creating the model are equivalent.
```python
model = ConditionalLogitModel(
    formula='(price_cost_freq_ovt|constant) + (session_income|item) + (price_ivt|item-full) + (intercept|item)',
    dataset=dataset,
    num_items=4)
```
ConditionalLogitModel(
(coef_dict): ModuleDict(
(price_cost_freq_ovt[constant]): Coefficient(variation=constant, num_items=4, num_users=None, num_params=3, 3 trainable parameters in total, device=cpu).
(session_income[item]): Coefficient(variation=item, num_items=4, num_users=None, num_params=1, 3 trainable parameters in total, device=cpu).
(price_ivt[item-full]): Coefficient(variation=item-full, num_items=4, num_users=None, num_params=1, 4 trainable parameters in total, device=cpu).
(intercept[item]): Coefficient(variation=item, num_items=4, num_users=None, num_params=1, 3 trainable parameters in total, device=cpu).
)
)
Conditional logistic discrete choice model, expects input features:
X[price_cost_freq_ovt[constant]] with 3 parameters, with constant level variation.
X[session_income[item]] with 1 parameters, with item level variation.
X[price_ivt[item-full]] with 1 parameters, with item-full level variation.
X[intercept[item]] with 1 parameters, with item level variation.
device=cpu
Train the Model
We provide an easy-to-use helper function `run()` (imported from `torch_choice` above) to fit the model on a particular dataset. The same runner works for both `ConditionalLogitModel` and `NestedLogitModel` (covered later) instances.
The `run()` method supports mini-batch updates as well; for a small dataset like the one we are working with here, we can set `batch_size=-1` to perform full-batch gradient updates.
Here we use the LBFGS optimizer since we are working on a small dataset with only 2,779 choice records and 13 coefficients to be estimated. For larger datasets and larger models, we recommend using the Adam optimizer instead.
```python
start_time = time()
run(model, dataset, num_epochs=500, learning_rate=0.01, model_optimizer="LBFGS", batch_size=-1)
print('Time taken:', time() - start_time)
```
==================== model received ====================
ConditionalLogitModel(
(coef_dict): ModuleDict(
(price_cost_freq_ovt[constant]): Coefficient(variation=constant, num_items=4, num_users=None, num_params=3, 3 trainable parameters in total, device=cpu).
(session_income[item]): Coefficient(variation=item, num_items=4, num_users=None, num_params=1, 3 trainable parameters in total, device=cpu).
(price_ivt[item-full]): Coefficient(variation=item-full, num_items=4, num_users=None, num_params=1, 4 trainable parameters in total, device=cpu).
(intercept[item]): Coefficient(variation=item, num_items=4, num_users=None, num_params=1, 3 trainable parameters in total, device=cpu).
)
)
Conditional logistic discrete choice model, expects input features:
X[price_cost_freq_ovt[constant]] with 3 parameters, with constant level variation.
X[session_income[item]] with 1 parameters, with item level variation.
X[price_ivt[item-full]] with 1 parameters, with item-full level variation.
X[intercept[item]] with 1 parameters, with item level variation.
device=cpu
==================== data set received ====================
[Train dataset] ChoiceDataset(label=[], item_index=[2779], user_index=[], session_index=[2779], item_availability=[], price_cost_freq_ovt=[2779, 4, 3], session_income=[2779, 1], price_ivt=[2779, 4, 1], device=cpu)
[Validation dataset] None
[Test dataset] None
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
------------------------------------------------
0 | model | ConditionalLogitModel | 13
------------------------------------------------
13 Trainable params
0 Non-trainable params
13 Total params
0.000 Total estimated model params size (MB)
Epoch 499: 100%|██████████| 1/1 [00:00<00:00, 76.72it/s, loss=1.87e+03, v_num=32]
`Trainer.fit` stopped: `max_epochs=500` reached.
Epoch 499: 100%|██████████| 1/1 [00:00<00:00, 69.67it/s, loss=1.87e+03, v_num=32]
Time taken for training: 11.521849155426025
Skip testing, no test dataset is provided.
==================== model results ====================
Log-likelihood: [Training] -1874.3427734375, [Validation] N/A, [Test] N/A
| Coefficient | Estimation | Std. Err. | z-value | Pr(>|z|) | Significance |
|:--------------------------------|-------------:|------------:|-----------:|------------:|:---------------|
| price_cost_freq_ovt[constant]_0 | -0.0333421 | 0.00709556 | -4.69901 | 2.61425e-06 | *** |
| price_cost_freq_ovt[constant]_1 | 0.0925304 | 0.00509757 | 18.1518 | 0 | *** |
| price_cost_freq_ovt[constant]_2 | -0.0430032 | 0.00322472 | -13.3355 | 0 | *** |
| session_income[item]_0 | -0.0890796 | 0.0183469 | -4.8553 | 1.20205e-06 | *** |
| session_income[item]_1 | -0.0279925 | 0.00387254 | -7.22846 | 4.88498e-13 | *** |
| session_income[item]_2 | -0.038146 | 0.00408307 | -9.34248 | 0 | *** |
| price_ivt[item-full]_0 | 0.0595089 | 0.0100727 | 5.90794 | 3.46418e-09 | *** |
| price_ivt[item-full]_1 | -0.00678188 | 0.00443289 | -1.5299 | 0.126042 | |
| price_ivt[item-full]_2 | -0.00645982 | 0.00189848 | -3.40262 | 0.000667424 | *** |
| price_ivt[item-full]_3 | -0.00145029 | 0.00118748 | -1.22132 | 0.221965 | |
| intercept[item]_0 | 0.697311 | 1.28022 | 0.544681 | 0.585973 | |
| intercept[item]_1 | 1.8437 | 0.708514 | 2.6022 | 0.0092627 | ** |
| intercept[item]_2 | 3.27381 | 0.624416 | 5.24299 | 1.57999e-07 | *** |
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Time taken: 11.617464065551758
Parameter Estimation from R
The following is the R output from the `mlogit` implementation. The estimates, standard errors, and log-likelihood from our `torch_choice` implementation match the results from the `mlogit` implementation: the final log-likelihoods of the models estimated by the two packages are both around -1874.
The `run()` method calculates the standard errors using \(\sqrt{\text{diag}(H^{-1})}\), where \(H\) is the Hessian of the negative log-likelihood with respect to the model parameters.
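For intuition, here is a minimal sketch of that computation on a toy one-parameter model (illustrative only, not the package's internal code):

```python
import torch
from torch.autograd.functional import hessian

# Toy data: 100 draws from N(2, 1); the MLE of the mean is the sample average.
data = torch.randn(100) + 2.0

def negative_log_likelihood(mu):
    # NLL of the N(mu, 1) model, up to an additive constant.
    return 0.5 * ((data - mu) ** 2).sum()

mu_hat = data.mean()                                        # maximum likelihood estimate
H = hessian(negative_log_likelihood, mu_hat).reshape(1, 1)  # Hessian at the MLE
std_err = torch.sqrt(torch.diag(torch.linalg.inv(H)))
print(std_err)  # exactly 1 / sqrt(100) = 0.1 for this toy model
```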
The coefficient names differ slightly between the two packages; one can use the following conversion table (constructed by matching estimates and standard errors) to compare the results reported by both packages.

| torch_choice coefficient | mlogit coefficient |
| --- | --- |
| price_cost_freq_ovt[constant]_0 | cost |
| price_cost_freq_ovt[constant]_1 | freq |
| price_cost_freq_ovt[constant]_2 | ovt |
| session_income[item]_0 | income:bus |
| session_income[item]_1 | income:car |
| session_income[item]_2 | income:train |
| price_ivt[item-full]_0 | ivt:air |
| price_ivt[item-full]_1 | ivt:bus |
| price_ivt[item-full]_2 | ivt:car |
| price_ivt[item-full]_3 | ivt:train |
| intercept[item]_0 | (Intercept):bus |
| intercept[item]_1 | (Intercept):car |
| intercept[item]_2 | (Intercept):train |
R Output
install.packages("mlogit")
library("mlogit")
data("ModeCanada", package = "mlogit")
MC <- dfidx(ModeCanada, subset = noalt == 4)
ml.MC1 <- mlogit(choice ~ cost + freq + ovt | income | ivt, MC, reflevel='air')
summary(ml.MC1)
Call:
mlogit(formula = choice ~ cost + freq + ovt | income | ivt, data = MC,
reflevel = "air", method = "nr")
Frequencies of alternatives:choice
air train bus car
0.3738755 0.1666067 0.0035984 0.4559194
nr method
9 iterations, 0h:0m:0s
g'(-H)^-1g = 0.00014
successive function values within tolerance limits
Coefficients :
Estimate Std. Error z-value Pr(>|z|)
(Intercept):train 3.2741952 0.6244152 5.2436 1.575e-07 ***
(Intercept):bus 0.6983381 1.2802466 0.5455 0.5854292
(Intercept):car 1.8441129 0.7085089 2.6028 0.0092464 **
cost -0.0333389 0.0070955 -4.6986 2.620e-06 ***
freq 0.0925297 0.0050976 18.1517 < 2.2e-16 ***
ovt -0.0430036 0.0032247 -13.3356 < 2.2e-16 ***
income:train -0.0381466 0.0040831 -9.3426 < 2.2e-16 ***
income:bus -0.0890867 0.0183471 -4.8556 1.200e-06 ***
income:car -0.0279930 0.0038726 -7.2286 4.881e-13 ***
ivt:air 0.0595097 0.0100727 5.9080 3.463e-09 ***
ivt:train -0.0014504 0.0011875 -1.2214 0.2219430
ivt:bus -0.0067835 0.0044334 -1.5301 0.1259938
ivt:car -0.0064603 0.0018985 -3.4029 0.0006668 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -1874.3
McFadden R^2: 0.35443
Likelihood ratio test : chisq = 2058.1 (p.value = < 2.22e-16)