# Tutorial: Conditional Logit Model on ModeCanada Dataset

**Author: Tianyu Du (tianyudu@stanford.edu)**

**Update: May. 3, 2022**

**Reference:** This tutorial is modified from the Random utility model and the multinomial logit model in th documentation of `mlogit`

package in R.

Please note that the dataset involved in this example is fairly small (2,779 choice records), so we don't expect the performance to be faster than the R implementation.

We provide this tutorial mainly to check the correctness of our prediction. The fully potential of PyTorch is better exploited on much larger dataset.

The executable Jupyter notebook for this tutorial is located at Random Utility Model (RUM) 1: Conditional Logit Model.

Let's first import essential Python packages.

```
from time import time
import pandas as pd
import torch
from torch_choice.data import ChoiceDataset, utils
from torch_choice.model import ConditionalLogitModel
from torch_choice import run
```

```
/Users/tianyudu/miniforge3/envs/dev/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: dlopen(/Users/tianyudu/miniforge3/envs/dev/lib/python3.9/site-packages/torchvision/image.so, 0x0006): Symbol not found: __ZN2at4_ops19empty_memory_format4callEN3c108ArrayRefIxEENS2_8optionalINS2_10ScalarTypeEEENS5_INS2_6LayoutEEENS5_INS2_6DeviceEEENS5_IbEENS5_INS2_12MemoryFormatEEE
Referenced from: <B3E58761-2785-34C6-A89B-F37110C88A05> /Users/tianyudu/miniforge3/envs/dev/lib/python3.9/site-packages/torchvision/image.so
Expected in: <AE6DCE26-A528-35ED-BB3D-88890D27E6B9> /Users/tianyudu/miniforge3/envs/dev/lib/python3.9/site-packages/torch/lib/libtorch_cpu.dylib
warn(f"Failed to load image Python extension: {e}")
```

This tutorial will run both with and without graphic processing unit (GPU). However, our package is *much* faster with GPU.

```
if torch.cuda.is_available():
print(f'CUDA device used: {torch.cuda.get_device_name()}')
device = 'cuda'
else:
print('Running tutorial on CPU.')
device = 'cpu'
```

```
Running tutorial on CPU.
```

## Load Dataset

We have included the `ModeCanada`

dataset in our package, which is located at `./public_datasets/`

.

The `ModeCanada`

dataset contains individuals' choice on traveling methods.

The raw dataset is in a long-format, in which the `case`

variable identifies each choice.
Using the terminology mentioned in the data management tutorial, each choice is called a *purchasing record* (i.e., consumer bought the ticket of a particular travelling mode), and the total number of choices made is denoted as \(B\).

For example, the first four row below (with `case == 109`

) corresponds to the first choice, the `alt`

column lists all alternatives/items available.

The `choice`

column identifies which alternative/item is chosen. The second row in the data snapshot below, we have `choice == 1`

and `alt == 'air'`

for `case == 109`

. This indicates the travelling mode chosen in `case = 109`

was `air`

.

Now we convert the raw dataset into the format compatible with our model, for a detailed tutorial on the compatible formats, please refer to the data management tutorial.

We focus on cases when four alternatives were available by filtering `noalt == 4`

.

```
df = pd.read_csv('./public_datasets/ModeCanada.csv')
df = df.query('noalt == 4').reset_index(drop=True)
df.sort_values(by='case', inplace=True)
df.head()
```

Unnamed: 0 | case | alt | choice | dist | cost | ivt | ovt | freq | income | urban | noalt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|

0 | 304 | 109 | train | 0 | 377 | 58.25 | 215 | 74 | 4 | 45 | 0 | 4 |

1 | 305 | 109 | air | 1 | 377 | 142.80 | 56 | 85 | 9 | 45 | 0 | 4 |

2 | 306 | 109 | bus | 0 | 377 | 27.52 | 301 | 63 | 8 | 45 | 0 | 4 |

3 | 307 | 109 | car | 0 | 377 | 71.63 | 262 | 0 | 0 | 45 | 0 | 4 |

4 | 308 | 110 | train | 0 | 377 | 58.25 | 215 | 74 | 4 | 70 | 0 | 4 |

Since there are 4 rows corresponding to each *purchasing record*, the length of the long-format data is \(4 \times B\).
Please refer to the data management tutorial for notations.

```
(11116, 12)
```

### Construct the `item_index`

tensor

The first thing is to construct the `item_index`

tensor identifying which item (i.e., travel mode) was chosen in each purchasing record.

We can now construct the `item_index`

array containing which item was chosen in each purchasing record.

```
item_index = df[df['choice'] == 1].sort_values(by='case')['alt'].reset_index(drop=True)
print(item_index)
```

```
0 air
1 air
2 air
3 air
4 air
...
2774 car
2775 car
2776 car
2777 car
2778 car
Name: alt, Length: 2779, dtype: object
```

Since we will be training our model using `PyTorch`

, we need to encode `{'air', 'bus', 'car', 'train'}`

into integer values.

Travel Mode Name | Encoded Integer Values |
---|---|

air | 0 |

bus | 1 |

car | 2 |

train | 3 |

The generated `item_index`

would be a tensor of shape 2,778 (i.e., number of purchasing records in this dataset) with values `{0, 1, 2, 3}`

.

```
item_names = ['air', 'bus', 'car', 'train']
num_items = 4
encoder = dict(zip(item_names, range(num_items)))
print(f"{encoder=:}")
item_index = item_index.map(lambda x: encoder[x])
item_index = torch.LongTensor(item_index)
print(f"{item_index=:}")
```

```
encoder={'air': 0, 'bus': 1, 'car': 2, 'train': 3}
item_index=tensor([0, 0, 0, ..., 2, 2, 2])
```

### Construct Observables

Then let's constrct tensors for observables.
As mentioned in the data management tutorial, the *session* is capturing the temporal dimension of our data.
Since we have different values `cost`

, `freq`

and `ovt`

for each purchasing record and for each item, it's natural to say each purchasing record has its own session.

Consequently, these three variables are `price`

observables since they vary by both item and session.
The tensor holding these observables has shape \((\text{numer of purchasing records}, \text{number of items}, 3)\)

We do the same for variable `ivt`

, we put `ivt`

into a separate tensor because we want to model its coefficient differently later.

```
price_cost_freq_ovt = utils.pivot3d(df, dim0='case', dim1='alt',
values=['cost', 'freq', 'ovt'])
print(f'{price_cost_freq_ovt.shape=:}')
price_ivt = utils.pivot3d(df, dim0='case', dim1='alt', values='ivt')
print(f'{price_ivt.shape=:}')
```

```
price_cost_freq_ovt.shape=torch.Size([2779, 4, 3])
price_ivt.shape=torch.Size([2779, 4, 1])
```

In contrast, the `income`

variable varies only by session (i.e., purchasing record), but not by item. `income`

is therefore naturally a `session`

variable.

```
session_income = df.groupby('case')['income'].first()
session_income = torch.Tensor(session_income.values).view(-1, 1)
print(f'{session_income.shape=:}')
```

```
session_income.shape=torch.Size([2779, 1])
```

To summarize, the `ChoiceDataset`

constructed contains 2779 choice records. Since the original dataset did not reveal the identity of each decision maker, we consider all 2779 choices were made by a single user but in 2779 different sessions to handle variations.

In this case, the `cost`

, `freq`

and `ovt`

are observables depending on both sessions and items, we created a `price_cost_freq_ovt`

tensor with shape `(num_sessions, num_items, 3) = (2779, 4, 3)`

to contain these variables.
In contrast, the `income`

information depends only on session but not on items, hence we create the `session_income`

tensor to store it.

Because we wish to fit item-specific coefficients for the `ivt`

variable, which varies by both sessions and items as well, we create another `price_ivt`

tensor in addition to the `price_cost_freq_ovt`

tensor.

Lastly, we put all tensors we created to a single `ChoiceDataset`

object, and move the dataset to the appropriate device.

```
dataset = ChoiceDataset(item_index=item_index,
price_cost_freq_ovt=price_cost_freq_ovt,
session_income=session_income,
price_ivt=price_ivt
).to(device)
```

```
No `session_index` is provided, assume each choice instance is in its own session.
```

You can `print(dataset)`

to check shapes of tensors contained in the `ChoiceDataset`

.

```
ChoiceDataset(label=[], item_index=[2779], user_index=[], session_index=[2779], item_availability=[], price_cost_freq_ovt=[2779, 4, 3], session_income=[2779, 1], price_ivt=[2779, 4, 1], device=cpu)
```

## Create the Model

We now construct the `ConditionalLogitModel`

to fit the dataset we constructed above.

To start with, we aim to estimate the following model formulation:

We now initialize the `ConditionalLogitModel`

to predict choices from the dataset.
Please see the documentation for a complete description of the `ConditionalLogitModel`

class.

At it's core, the `ConditionalLogitModel`

constructor requires the following four components.

### Define variation of each \(\beta\) using `coef_variation_dict`

The keyword `coef_variation_dict`

is a dictionary with variable names (defined above while constructing the dataset) as keys and values from `{constant, user, item, item-full}`

.

For instance, since we wish to have constant coefficients for `cost`

, `freq`

and `ovt`

observables, and these three observables are stored in the `price_cost_freq_ovt`

tensor of the choice dataset, we set `coef_variation_dict['price_cost_freq_ovt'] = 'constant'`

(corresponding to the \(\beta^{1\top} X^{price: (cost, freq, ovt)}_{it}\) term above).

The models allows for the option of zeroing coefficient for one item.
The variation of \(\beta^3\) above is specified as `item-full`

which indicates 4 values of \(\beta^3\) is learned (one for each item).
In contrast, \(\beta^0, \beta^2\) are specified to have variation `item`

instead of `item-full`

. In this case, the \(\beta\) correspond to the first item (i.e., the baseline item, which is encoded as 0 in the label tensor, `air`

in our example) is force to be zero.

The researcher needs to declare `intercept`

explicitly for the model to fit an intercept as well, otherwise the model assumes zero intercept term.

### Define the dimension of each \(\beta\) using `num_param_dict`

The `num_param_dict`

is a dictionary with keys exactly the same as the `coef_variation_dict`

.
Each of dictionary values tells the dimension of the corresponding observables, hence the dimension of the coefficient.
For example, the `price_cost_freq_ovt`

consists of three observables and we set the corresponding to three.

Even the model can infer `num_param_dict['intercept'] = 1`

, but we recommend the research to include it for completeness.

### Number of items

The `num_items`

keyword informs the model how many alternatives users are choosing from.

### Number of users

The `num_users`

keyword is an optional integer informing the model how many users there are in the dataset. However, in this example we implicitly assume there is only one user making all the decisions and we do not have any `user_obs`

involved, hence `num_users`

argument is not supplied.

```
model = ConditionalLogitModel(coef_variation_dict={'price_cost_freq_ovt': 'constant',
'session_income': 'item',
'price_ivt': 'item-full',
'intercept': 'item'},
num_param_dict={'price_cost_freq_ovt': 3,
'session_income': 1,
'price_ivt': 1,
'intercept': 1},
num_items=4)
```

Then we move the model to the appropriate device.

One can print the `ConditionalLogitModel`

object to obtain a summary of the model.

```
ConditionalLogitModel(
(coef_dict): ModuleDict(
(price_cost_freq_ovt[constant]): Coefficient(variation=constant, num_items=4, num_users=None, num_params=3, 3 trainable parameters in total, device=cpu).
(session_income[item]): Coefficient(variation=item, num_items=4, num_users=None, num_params=1, 3 trainable parameters in total, device=cpu).
(price_ivt[item-full]): Coefficient(variation=item-full, num_items=4, num_users=None, num_params=1, 4 trainable parameters in total, device=cpu).
(intercept[item]): Coefficient(variation=item, num_items=4, num_users=None, num_params=1, 3 trainable parameters in total, device=cpu).
)
)
Conditional logistic discrete choice model, expects input features:
X[price_cost_freq_ovt[constant]] with 3 parameters, with constant level variation.
X[session_income[item]] with 1 parameters, with item level variation.
X[price_ivt[item-full]] with 1 parameters, with item-full level variation.
X[intercept[item]] with 1 parameters, with item level variation.
device=cpu
```

## Creating Model using Formula

Alternatively, researchers can create the model using a `formula`

like in R.

The formula consists of a list of additive terms separated by `+`

sign, and each term looks like `(variable_name|variation)`

. Where `variable_name`

is the name of the variable in the dataset, and `variation`

is one of `constant`

, `user`

, `item`

, `item-full`

. Initializing the model using `formula`

requires you to pass in the `ChoiceDataset`

object as well so that the model can infer the dimension of each variable.

These two ways of creating models lead to equivalent models.

```
model = model = ConditionalLogitModel(
formula='(price_cost_freq_ovt|constant) + (session_income|item) + (price_ivt|item-full) + (intercept|item)',
dataset=dataset,
num_items=4)
```

```
ConditionalLogitModel(
(coef_dict): ModuleDict(
(price_cost_freq_ovt[constant]): Coefficient(variation=constant, num_items=4, num_users=None, num_params=3, 3 trainable parameters in total, device=cpu).
(session_income[item]): Coefficient(variation=item, num_items=4, num_users=None, num_params=1, 3 trainable parameters in total, device=cpu).
(price_ivt[item-full]): Coefficient(variation=item-full, num_items=4, num_users=None, num_params=1, 4 trainable parameters in total, device=cpu).
(intercept[item]): Coefficient(variation=item, num_items=4, num_users=None, num_params=1, 3 trainable parameters in total, device=cpu).
)
)
Conditional logistic discrete choice model, expects input features:
X[price_cost_freq_ovt[constant]] with 3 parameters, with constant level variation.
X[session_income[item]] with 1 parameters, with item level variation.
X[price_ivt[item-full]] with 1 parameters, with item-full level variation.
X[intercept[item]] with 1 parameters, with item level variation.
device=cpu
```

## Train the Model

We provide an easy-to-use helper function `run()`

imported from `torch_choice.utils.run_helper`

to fit the model with a particular dataset.

We provide an easy-to-use model runner for both `ConditionalLogitModel`

and `NestedLogitModel`

(see later) instances.

The `run()`

mehtod supports mini-batch updating as well, for small datasets like the one we are dealing right now, we can use `batch_size = -1`

to conduct full-batch gradient update.

Here we use the LBFGS optimizer since we are working on a small dataset with only 2,779 choice records and 13 coefficients to be estimated. For larger datasets and larger models, we recommend using the Adam optimizer instead.

```
start_time = time()
run(model, dataset, num_epochs=500, learning_rate=0.01, model_optimizer="LBFGS", batch_size=-1)
print('Time taken:', time() - start_time)
```

```
==================== model received ====================
ConditionalLogitModel(
(coef_dict): ModuleDict(
(price_cost_freq_ovt[constant]): Coefficient(variation=constant, num_items=4, num_users=None, num_params=3, 3 trainable parameters in total, device=cpu).
(session_income[item]): Coefficient(variation=item, num_items=4, num_users=None, num_params=1, 3 trainable parameters in total, device=cpu).
(price_ivt[item-full]): Coefficient(variation=item-full, num_items=4, num_users=None, num_params=1, 4 trainable parameters in total, device=cpu).
(intercept[item]): Coefficient(variation=item, num_items=4, num_users=None, num_params=1, 3 trainable parameters in total, device=cpu).
)
)
Conditional logistic discrete choice model, expects input features:
X[price_cost_freq_ovt[constant]] with 3 parameters, with constant level variation.
X[session_income[item]] with 1 parameters, with item level variation.
X[price_ivt[item-full]] with 1 parameters, with item-full level variation.
X[intercept[item]] with 1 parameters, with item level variation.
device=cpu
==================== data set received ====================
[Train dataset] ChoiceDataset(label=[], item_index=[2779], user_index=[], session_index=[2779], item_availability=[], price_cost_freq_ovt=[2779, 4, 3], session_income=[2779, 1], price_ivt=[2779, 4, 1], device=cpu)
[Validation dataset] None
[Test dataset] None
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/Users/tianyudu/miniforge3/envs/dev/lib/python3.9/site-packages/pytorch_lightning/trainer/setup.py:201: UserWarning: MPS available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='mps', devices=1)`.
rank_zero_warn(
/Users/tianyudu/miniforge3/envs/dev/lib/python3.9/site-packages/pytorch_lightning/trainer/configuration_validator.py:108: PossibleUserWarning: You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.
rank_zero_warn(
| Name | Type | Params
------------------------------------------------
0 | model | ConditionalLogitModel | 13
------------------------------------------------
13 Trainable params
0 Non-trainable params
13 Total params
0.000 Total estimated model params size (MB)
/Users/tianyudu/miniforge3/envs/dev/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 10 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
/Users/tianyudu/miniforge3/envs/dev/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1609: PossibleUserWarning: The number of training batches (1) is smaller than the logging interval Trainer(log_every_n_steps=5). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
rank_zero_warn(
Epoch 499: 100%|██████████| 1/1 [00:00<00:00, 76.72it/s, loss=1.87e+03, v_num=32]
`Trainer.fit` stopped: `max_epochs=500` reached.
Epoch 499: 100%|██████████| 1/1 [00:00<00:00, 69.67it/s, loss=1.87e+03, v_num=32]
Time taken for training: 11.521849155426025
Skip testing, no test dataset is provided.
==================== model results ====================
Log-likelihood: [Training] -1874.3427734375, [Validation] N/A, [Test] N/A
| Coefficient | Estimation | Std. Err. | z-value | Pr(>|z|) | Significance |
|:--------------------------------|-------------:|------------:|-----------:|------------:|:---------------|
| price_cost_freq_ovt[constant]_0 | -0.0333421 | 0.00709556 | -4.69901 | 2.61425e-06 | *** |
| price_cost_freq_ovt[constant]_1 | 0.0925304 | 0.00509757 | 18.1518 | 0 | *** |
| price_cost_freq_ovt[constant]_2 | -0.0430032 | 0.00322472 | -13.3355 | 0 | *** |
| session_income[item]_0 | -0.0890796 | 0.0183469 | -4.8553 | 1.20205e-06 | *** |
| session_income[item]_1 | -0.0279925 | 0.00387254 | -7.22846 | 4.88498e-13 | *** |
| session_income[item]_2 | -0.038146 | 0.00408307 | -9.34248 | 0 | *** |
| price_ivt[item-full]_0 | 0.0595089 | 0.0100727 | 5.90794 | 3.46418e-09 | *** |
| price_ivt[item-full]_1 | -0.00678188 | 0.00443289 | -1.5299 | 0.126042 | |
| price_ivt[item-full]_2 | -0.00645982 | 0.00189848 | -3.40262 | 0.000667424 | *** |
| price_ivt[item-full]_3 | -0.00145029 | 0.00118748 | -1.22132 | 0.221965 | |
| intercept[item]_0 | 0.697311 | 1.28022 | 0.544681 | 0.585973 | |
| intercept[item]_1 | 1.8437 | 0.708514 | 2.6022 | 0.0092627 | ** |
| intercept[item]_2 | 3.27381 | 0.624416 | 5.24299 | 1.57999e-07 | *** |
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Time taken: 11.617464065551758
```

### Parameter Estimation from `R`

The following is the R-output from the `mlogit`

implementation, the estimation, standard error, and log-likelihood from our `torch_choice`

implementation is the same as the result from `mlogit`

implementation.

We see that the final log-likelihood of models estimated using two packages are all around `-1874`

.

The `run()`

method calculates the standard deviation using \(\sqrt{\text{diag}(H^{-1})}\), where \(H\) is the hessian of negative log-likelihood with repsect to model parameters.

Names of coefficients are slightly different, one can use the following conversion table to compare estimations and standard deviations reported by both packages.

### R Output

```
install.packages("mlogit")
library("mlogit")
data("ModeCanada", package = "mlogit")
MC <- dfidx(ModeCanada, subset = noalt == 4)
ml.MC1 <- mlogit(choice ~ cost + freq + ovt | income | ivt, MC, reflevel='air')
summary(ml.MC1)
```

```
Call:
mlogit(formula = choice ~ cost + freq + ovt | income | ivt, data = MC,
reflevel = "air", method = "nr")
Frequencies of alternatives:choice
air train bus car
0.3738755 0.1666067 0.0035984 0.4559194
nr method
9 iterations, 0h:0m:0s
g'(-H)^-1g = 0.00014
successive function values within tolerance limits
Coefficients :
Estimate Std. Error z-value Pr(>|z|)
(Intercept):train 3.2741952 0.6244152 5.2436 1.575e-07 ***
(Intercept):bus 0.6983381 1.2802466 0.5455 0.5854292
(Intercept):car 1.8441129 0.7085089 2.6028 0.0092464 **
cost -0.0333389 0.0070955 -4.6986 2.620e-06 ***
freq 0.0925297 0.0050976 18.1517 < 2.2e-16 ***
ovt -0.0430036 0.0032247 -13.3356 < 2.2e-16 ***
income:train -0.0381466 0.0040831 -9.3426 < 2.2e-16 ***
income:bus -0.0890867 0.0183471 -4.8556 1.200e-06 ***
income:car -0.0279930 0.0038726 -7.2286 4.881e-13 ***
ivt:air 0.0595097 0.0100727 5.9080 3.463e-09 ***
ivt:train -0.0014504 0.0011875 -1.2214 0.2219430
ivt:bus -0.0067835 0.0044334 -1.5301 0.1259938
ivt:car -0.0064603 0.0018985 -3.4029 0.0006668 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -1874.3
McFadden R^2: 0.35443
Likelihood ratio test : chisq = 2058.1 (p.value = < 2.22e-16)
```