A very simple Variational Quantum Classifier (VQC)

Matheus Cammarosano Hidalgo
8 min read · Jun 11, 2023

--

Introduction

In my previous posts (here and here) I started talking about QML and why and how to study it. From now on I will share my studies and findings, which so far are very basic.

Today I am going to design a Variational Quantum Classifier (VQC), which is a hybrid computing classifier, as can be seen in Figure 1.

Figure 1 — Hybrid computing diagram

Here, the quantum computer runs a parameterizable quantum circuit, which is analogous to the mathematical function we use as a model in a classical classification problem, such as logistic regression, SVM and many others. The classical computer is responsible for the optimization: it sets the circuit parameters, runs the quantum computer, gathers the results and refines the parameters according to the loss function. So the only difference here is that instead of using one of our everyday classification algorithms, we are using a quantum circuit as the model (a.k.a. the ansatz).
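To make this loop concrete, here is a minimal toy sketch of the hybrid scheme in Pennylane (my own illustration, not the Titanic model: a one-qubit circuit whose single parameter is tuned classically until the measured expectation value reaches -1):

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

# quantum side: a one-parameter "model" evaluated on the simulator
@qml.qnode(dev)
def model(theta):
    qml.RY(theta, wires=0)
    return qml.expval(qml.PauliZ(0))

# classical side: a loss computed on the measured value
def loss(theta):
    return (model(theta) + 1.0) ** 2  # minimal when the expectation hits -1

opt = qml.GradientDescentOptimizer(stepsize=0.4)
theta = np.array(0.1, requires_grad=True)
for _ in range(50):
    theta = opt.step(loss, theta)  # run circuit, gather result, refine parameter

print(theta, model(theta))  # theta approaches pi, the expectation approaches -1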

However, as simple as it seems, there are challenges surrounding this hybrid computing architecture. As I mentioned in my first post, classical and quantum computing follow different paradigms, so our quantum circuit must be organized in a few stages.

Figure 2 — Stages of quantum circuit

The feature map is the first stage of the circuit, in which the classical data is encoded as qubits. There are many ways to encode it, since the feature map is a mathematical transformation from one vector space to another (a Hilbert space). Researchers have been studying how to find the best mapping for each problem, that is, turning the mapping itself into an optimization problem. This is rather interesting because a good mapping implies a good separation between data of different classes, which simplifies the classification problem a lot. Pennylane has some interesting and basic readings about this here and here. Unfortunately, Qiskit has deprecated Aqua, but they also had some feature maps here. And I also want to take a dive and read this article.

In the second stage we design the quantum circuit that is going to be our model. Here we can be as creative as we want, but we must remember that the same old rules still apply: don't use too many parameters for simple problems, to avoid overfitting, and don't use too few, to avoid bias. But the most important thing is: since we are using quantum computing, it is essential to work with superposition and entanglement, in order to get the best from the quantum computing paradigm.

Quantum circuits are simple and tricky at the same time: they are linear transformations, but they are analyzed as componentized circuits, which demands attention. As I mentioned in my last post, Frank Zickert's book and blog are the best references I have read to learn about quantum circuits.
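One way to see the "linear transformation" point is to extract a circuit's matrix directly. A small sketch of my own (assuming a recent Pennylane version, where qml.matrix accepts a quantum function):

import pennylane as qml
import numpy as np

def subcircuit():
    qml.Hadamard(wires=0)
    qml.CNOT(wires=[0, 1])

# the whole two-qubit circuit is just this 4x4 unitary matrix
U = qml.matrix(subcircuit, wire_order=[0, 1])()
print(np.round(U, 3))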

Problem

With a basic understanding of the VQC in place, in this post we are going to design a classifier based on the Titanic dataset from Kaggle, which contains information about passengers and whether or not they survived.

The variables in this dataset are:

  • PassengerID
  • Passenger name
  • Class (First, second or third)
  • Gender
  • Age
  • SibSp (siblings and/or spouses aboard)
  • Parch (parents or children aboard)
  • Ticket
  • Fare
  • Cabin
  • Embarked
  • Survived

Here, we are building a classifier that predicts whether a passenger survived based on their characteristics. I am not going to deep dive into feature analysis and selection because that is not the focus of this post, so I will jump straight to the selected variables:

  • is_child (if age <12)
  • is_class1 (if person is in the first class)
  • is_class2
  • is_female

Basically, children and women from the first class have higher survival rates, and survival decreases for the second and third classes. All variables here are booleans, which is fine for such a simple model and will also greatly simplify our feature map selection.

Feature embedding and mapping

I think this Pennylane post is a very nice introduction to quantum embedding, as it shows that different methodologies have inherent pros and cons. Basis embedding might be the simplest embedding, but at the same time it can be costly in terms of qubits. Amplitude embedding provides significant qubit savings, but it can compress the vector space and produce bad separations between classes.
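To make the qubit-cost tradeoff concrete, here is a small sketch of my own: four binary features need four qubits with basis embedding, while amplitude embedding packs the same four numbers into the amplitudes of just two qubits.

import pennylane as qml

# basis embedding: one qubit per binary feature -> 4 qubits for 4 features
dev_basis = qml.device("default.qubit", wires=4)

@qml.qnode(dev_basis)
def basis_embed(x):
    qml.BasisEmbedding(x, wires=range(4))
    return qml.state()

# amplitude embedding: n qubits hold 2^n amplitudes -> 4 features fit in 2 qubits
dev_amp = qml.device("default.qubit", wires=2)

@qml.qnode(dev_amp)
def amplitude_embed(x):
    qml.AmplitudeEmbedding(x, wires=range(2), normalize=True)
    return qml.state()

print(basis_embed([1, 0, 1, 0]))              # a 16-dimensional basis state
print(amplitude_embed([1.0, 0.0, 1.0, 0.0]))  # a 4-dimensional normalized vector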

Since we are working with just four variables I am going to use Basis Embedding with no further mapping circuits, for the sake of simplicity.

In this case, we simply convert each classical bit into its equivalent qubit state. Therefore, if our four variables are 1010, they will be encoded as |1010>.
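As a quick sanity check (again a sketch of my own), we can verify that Basis Embedding really produces |1010>: the whole amplitude ends up at index 0b1010 = 10 of the state vector.

import pennylane as qml

dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev)
def embed(x):
    qml.BasisEmbedding(x, wires=range(4))
    return qml.state()

state = embed([1, 0, 1, 0])
print(state[0b1010])  # (1+0j): all the amplitude sits on |1010>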

Model (Ansatz)

The ansatz is the parameterizable quantum circuit designed as our model. As I mentioned before, this circuit must possess some degree of superposition and entanglement to justify the use of a quantum device in our project.

The chosen circuit is shown:

Figure 3 — Ansatz for our Titanic classifier

This is a common circuit that I took from a Pennylane example. If you haven't studied quantum circuits it might look sophisticated, but the idea is rather simple. This is a two-layer circuit, since the core structure is repeated twice. First we have rotations around the Z, Y and Z axes for each qubit; the idea here is to insert some degree of superposition on each qubit separately. These rotations are parameterized, and on each iteration of the algorithm the parameters are updated by the classical computer. We talk about rotations around the Y and Z axes because the state space of a qubit is a sphere (the Bloch sphere): RZ only changes the qubit's phase, while RY affects how close the qubit is to |0> and |1>.
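In Pennylane this Z-Y-Z pattern is what qml.Rot implements, which is the gate used in the code below. A small sketch verifying that qml.Rot(phi, theta, omega) acts as RZ(phi), then RY(theta), then RZ(omega):

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def with_rot(phi, theta, omega):
    qml.Rot(phi, theta, omega, wires=0)
    return qml.state()

@qml.qnode(dev)
def with_rz_ry_rz(phi, theta, omega):
    qml.RZ(phi, wires=0)
    qml.RY(theta, wires=0)
    qml.RZ(omega, wires=0)
    return qml.state()

# both circuits prepare exactly the same state
print(np.allclose(with_rot(0.1, 0.2, 0.3), with_rz_ry_rz(0.1, 0.2, 0.3)))  # True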

After that we have four Controlled-NOT (CNOT) gates between pairs of qubits. A CNOT is a quantum gate that inverts one qubit's state depending on the state of another qubit (target and control, respectively). Arranged in a ring like this, these gates entangle all the qubits in our circuit. In the second layer we apply a new set of rotations, which is not simply a logical repetition of the first layer, because now all states are entangled, which means that rotating the first qubit will also affect the others! And finally we have a new set of CNOT gates.
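To see the entangling effect of a CNOT in isolation (a toy example, not part of the classifier): putting the control qubit in superposition and applying a CNOT produces a Bell state, which cannot be written as a product of two single-qubit states.

import pennylane as qml

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def bell():
    qml.Hadamard(wires=0)   # superposition on the control qubit
    qml.CNOT(wires=[0, 1])  # flip the target iff the control is |1>
    return qml.state()

print(bell())  # ~[0.707, 0, 0, 0.707]: the Bell state (|00> + |11>)/sqrt(2)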

This is a very simplistic explanation of our circuit, but with some study and practice these concepts will become intuitive to you (I confess I am still in this learning process!).

Optimizer

In this project I am using the Adam optimizer from Pennylane (in my last post I said I was using Qiskit, but I had some issues with deprecated functions, so I returned to Pennylane). I tested different learning rates until I found one that converged smoothly to our optimum.

Code and Results

I implemented the code using Pennylane with a qubit simulator. I confess I got most of the quantum part of the code from tutorials, and I made some changes in the optimizer: the original code used the Nesterov momentum optimizer, and I had better results and convergence with Adam.

import pennylane as qml
from pennylane import numpy as np
from pennylane.optimize import AdamOptimizer

from sklearn.model_selection import train_test_split
import pandas as pd

from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

import math

num_qubits = 4
num_layers = 2

dev = qml.device("default.qubit", wires=num_qubits)

# quantum circuit functions
def statepreparation(x):
    # encode the 4 binary features as the corresponding basis state
    qml.BasisEmbedding(x, wires=range(0, num_qubits))

def layer(W):

    # parameterized Z-Y-Z rotations on every qubit...
    qml.Rot(W[0, 0], W[0, 1], W[0, 2], wires=0)
    qml.Rot(W[1, 0], W[1, 1], W[1, 2], wires=1)
    qml.Rot(W[2, 0], W[2, 1], W[2, 2], wires=2)
    qml.Rot(W[3, 0], W[3, 1], W[3, 2], wires=3)

    # ...followed by a ring of CNOTs that entangles all qubits
    qml.CNOT(wires=[0, 1])
    qml.CNOT(wires=[1, 2])
    qml.CNOT(wires=[2, 3])
    qml.CNOT(wires=[3, 0])

@qml.qnode(dev, interface="autograd")
def circuit(weights, x):

    statepreparation(x)

    for W in weights:
        layer(W)

    # the prediction is the expectation of Pauli-Z on the first qubit, in [-1, 1]
    return qml.expval(qml.PauliZ(0))

def variational_classifier(weights, bias, x):
    return circuit(weights, x) + bias

def square_loss(labels, predictions):
    loss = 0
    for l, p in zip(labels, predictions):
        loss = loss + (l - p) ** 2

    loss = loss / len(labels)
    return loss

def accuracy(labels, predictions):

    score = 0
    for l, p in zip(labels, predictions):
        if abs(l - p) < 1e-5:
            score = score + 1
    score = score / len(labels)

    return score

def cost(weights, bias, X, Y):
    predictions = [variational_classifier(weights, bias, x) for x in X]
    return square_loss(Y, predictions)

# preparing data
df_train = pd.read_csv('train.csv')

df_train['Pclass'] = df_train['Pclass'].astype(str)

# one-hot encode class, sex and embarkation port; cast to int so the dummy
# columns are 0/1 values, as BasisEmbedding expects
df_train = pd.concat([df_train, pd.get_dummies(df_train[['Pclass', 'Sex', 'Embarked']]).astype(int)], axis=1)

# I will fill missing ages with the median
df_train['Age'] = df_train['Age'].fillna(df_train['Age'].median())

df_train['is_child'] = df_train['Age'].map(lambda x: 1 if x < 12 else 0)
cols_model = ['is_child', 'Pclass_1', 'Pclass_2', 'Sex_female']

X_train, X_test, y_train, y_test = train_test_split(df_train[cols_model], df_train['Survived'], test_size=0.10, random_state=42, stratify=df_train['Survived'])

X_train = np.array(X_train.values, requires_grad=False)
Y_train = np.array(y_train.values * 2 - np.ones(len(y_train)), requires_grad=False)

# setting init params
np.random.seed(0)
weights_init = 0.01 * np.random.randn(num_layers, num_qubits, 3, requires_grad=True)
bias_init = np.array(0.0, requires_grad=True)

opt = AdamOptimizer(0.125)
num_it = 70
batch_size = math.floor(len(X_train)/num_it)

weights = weights_init
bias = bias_init
for it in range(num_it):

    # Update the weights by one optimizer step on a random batch
    batch_index = np.random.randint(0, len(X_train), (batch_size,))
    X_batch = X_train[batch_index]
    Y_batch = Y_train[batch_index]
    weights, bias, _, _ = opt.step(cost, weights, bias, X_batch, Y_batch)

    # Compute accuracy on the full training set
    predictions = [np.sign(variational_classifier(weights, bias, x)) for x in X_train]
    acc = accuracy(Y_train, predictions)

    print(
        "Iter: {:5d} | Cost: {:0.7f} | Accuracy: {:0.7f} ".format(
            it + 1, cost(weights, bias, X_train, Y_train), acc
        )
    )

X_test = np.array(X_test.values, requires_grad=False)
Y_test = np.array(y_test.values * 2 - np.ones(len(y_test)), requires_grad=False)

predictions = [np.sign(variational_classifier(weights, bias, x)) for x in X_test]

# print the test metrics (as bare expressions the results would be discarded in a script)
print(accuracy_score(Y_test, predictions))
print(precision_score(Y_test, predictions))
print(recall_score(Y_test, predictions))
print(f1_score(Y_test, predictions, average='macro'))

I also ran some tests varying the number of layers in our ansatz, and it seems that 2 layers gives the best result here. With one layer we don't have the second set of rotations, the one that acts on the entangled superposition, and this resulted in bias: our model always predicted the same result (not surviving, the majority class). With more layers I didn't get better results, mostly because our problem is simple and more parameters would lead to overfitting.

Our model had the following results:

Accuracy: 78.89%

Precision: 76.67%

Recall: 65.71%

F1: 77.12%

These are solid results; the model is actually predicting, not just guessing the majority class. But we can compare our VQC to classical algorithms, so I trained a logistic regression and got the following results:

Accuracy: 75.56%

Precision: 69.70%

Recall: 65.71%

F1: 74.00%

Our VQC performed slightly better than the logistic regression model! Well, that doesn't mean that a VQC is necessarily better, just that this specific model with this specific optimization process performed better. But the main idea of this post is to show that it is simple to build a quantum classifier, and even though this is nothing extraordinary, it is a simple and effective use of QML.
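For reference, here is a minimal sketch of the kind of baseline I mean, assuming scikit-learn defaults on the same train/test split as above (the exact settings are not important for the comparison):

from sklearn.linear_model import LogisticRegression

# default logistic regression on the same features; X_train/X_test and the
# original 0/1 labels y_train/y_test come from the code above
clf = LogisticRegression()
clf.fit(X_train, y_train)
preds = clf.predict(X_test)

print(accuracy_score(y_test, preds))
print(precision_score(y_test, preds))
print(recall_score(y_test, preds))
print(f1_score(y_test, preds, average='macro'))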
