Companies and organizations that provide customer support via email will,
over time, accumulate a large corpus of text documents. With advances in
Machine Learning, the possibilities for using this data to improve customer
support efficiency are steadily increasing. The aim of this study is to analyze
and evaluate the use of Deep Learning methods for automating the process of
classifying support errands. The study is based on a Swedish company’s domain,
where the classification was made within the company’s predefined categories. A
dataset was built by obtaining email support errands (subject and body pairs)
from the company’s support database. Each errand in the dataset belonged
to one of nine separate categories. The evaluation was done by analyzing the
change in classification accuracy when using different methods for data
cleaning and different network architectures. The scope was limited to
examining the effects of different combinations of Convolutional
Neural Networks (CNN) and Recurrent Neural Networks (RNN), the latter in the
form of both unidirectional and bidirectional Long Short-Term Memory (LSTM)
cells. The results of this study show no increase in classification accuracy from
any of the examined data cleaning methods. However, a feature reduction of
the vocabulary was shown to have no negative impact on the accuracy
either. A feature reduction might therefore still be beneficial for reducing
other costs, such as the time required to train a network, and possibly for
helping to prevent overfitting. Among the examined network architectures, CNNs
were shown to outperform RNNs on this dataset. The most accurate network
architecture was a single convolutional network, which reached classification
rates of 79.3 and 75.4 percent on two different test sets, respectively. The
results also show that some categories are harder to classify than others
because they are not sufficiently distinct from the other categories in the
dataset.
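
The feature reduction of the vocabulary mentioned above can be illustrated
with a minimal sketch. The abstract does not state how the reduction was
performed, so the minimum-frequency threshold, the `<unk>` placeholder token,
the function names and the toy errands below are all assumptions made for
illustration, not the method used in the study.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for words dropped from the vocabulary


def build_vocabulary(documents, min_count=2):
    """Keep only words that occur at least `min_count` times in the corpus."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    return {word for word, n in counts.items() if n >= min_count}


def reduce_document(document, vocabulary):
    """Replace every word outside the vocabulary with the UNK token."""
    return [w if w in vocabulary else UNK for w in document.lower().split()]


# Toy stand-ins for "subject + body" errands; not data from the study.
errands = [
    "invoice missing for order",
    "cannot login to account",
    "invoice sent to wrong account",
]

vocab = build_vocabulary(errands, min_count=2)
reduced = [reduce_document(e, vocab) for e in errands]
```

With this threshold only the words that appear at least twice survive, so the
input dimension seen by a network shrinks while frequent, potentially
category-bearing words are kept.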