A Semantic Loss Function for Deep Learning with Symbolic Knowledge
Date:
Introduction
The paper proposes a way to incorporate symbolic knowledge into the training of a neural network without changing the network's existing architecture. This is done by constraining the outputs of the network using propositional logic. Given a propositional sentence describing the relationship between the network's outputs, the paper injects this symbolic knowledge into training alongside the normal training data, without changing the training procedure of the deep learning toolkit.
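As an illustration (a minimal sketch of the idea, not the authors' implementation), the only change to a standard training loop is an extra term in the loss; `semantic_loss` stands in for the quantity defined later in this post, and the weight `w` is a hypothetical hyperparameter:

```python
import torch

def total_loss(task_loss: torch.Tensor,
               semantic_loss: torch.Tensor,
               w: float = 0.001) -> torch.Tensor:
    # The architecture and optimiser are untouched; the symbolic knowledge
    # enters only through this extra, differentiable penalty term.
    return task_loss + w * semantic_loss
```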
Link to the paper
Why combine symbolic knowledge with Deep Learning?
This paper suggests that including symbolic knowledge in the form of constraints (sentences) in Boolean logic can improve learning and, ultimately, the predictions. Although deep learning and logic are both fundamental ingredients of Artificial Intelligence, combining the two still poses serious challenges: deep learning relies on continuous, smooth, and differentiable functions, whereas logic is discrete, symbolic, and has strong semantics.
Constraints example
In a multi-class classification problem the NN outputs a probability for each class, and since this is a classification task we ideally want exactly one class to be predicted. One might consider rounding each class probability to obtain a Boolean value, but when the rounded assignment does not satisfy the constraint there is no signal to nudge the NN towards satisfying it. The solution is a probabilistic interpretation of the logic: each output of the neural network (one per class) is treated as an independent Bernoulli random variable, and the negative logarithm of the probability that the constraint is satisfied defines the 'semantic loss', which is a clever way of adding a regularisation term to the loss.
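For the exactly-one constraint the satisfying assignments are exactly the one-hot vectors, so the probability of satisfying the constraint has a simple closed form. A minimal PyTorch sketch (my own, assuming the network produces per-class Bernoulli probabilities `p`, e.g. via sigmoids):

```python
import torch

def exactly_one_semantic_loss(p: torch.Tensor) -> torch.Tensor:
    """Semantic loss for the 'exactly one class is true' constraint.

    p: tensor of shape (batch, n_classes), each entry an independent
       Bernoulli probability for that class.
    Returns -log P(the constraint is satisfied) per example.
    """
    n = p.shape[-1]
    eye = torch.eye(n, dtype=p.dtype, device=p.device)
    # Enumerate the satisfying assignments (the one-hot vectors) and sum
    # their probabilities under the independent-Bernoulli interpretation.
    state_probs = [
        torch.where(eye[k] > 0, p, 1.0 - p).prod(dim=-1)  # P(class k on, rest off)
        for k in range(n)
    ]
    prob_sat = torch.stack(state_probs, dim=-1).sum(dim=-1)
    return -torch.log(prob_sat.clamp_min(1e-12))

# Example with three classes.
p = torch.tensor([[0.9, 0.1, 0.2]], requires_grad=True)
loss = exactly_one_semantic_loss(p)   # -log(0.674) here
loss.sum().backward()                 # gradients flow back to the outputs
```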
How are the derivatives computed?
The constraint over the neural network's outputs is compiled into a logical circuit built from conjunction and disjunction gates. This logical circuit can be turned into an arithmetic circuit by replacing each conjunction with a * (multiplication) and each disjunction with a + (addition), with the network's output probabilities as the leaf values. The benefit of such an arithmetic circuit is that it is differentiable, so the derivatives of the semantic loss can be computed and passed back to the neural network for back-propagation.
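As a toy illustration (my own sketch, not the paper's circuit compiler), here is the exactly-one-of-three constraint written out directly as an arithmetic circuit: each AND becomes a product, each OR a sum, and the leaves are the output probabilities and their complements. Because the circuit is only additions and multiplications, autodiff gives the gradient of the semantic loss with respect to the network outputs:

```python
import torch

# Output probabilities for three classes (would come from the network).
p = torch.tensor([0.9, 0.1, 0.2], requires_grad=True)
q = 1.0 - p   # leaves for the negated literals

# Logical circuit:    (X1 AND ~X2 AND ~X3) OR (~X1 AND X2 AND ~X3) OR (~X1 AND ~X2 AND X3)
# Arithmetic circuit: AND -> *, OR -> +
prob_sat = (p[0] * q[1] * q[2]) + (q[0] * p[1] * q[2]) + (q[0] * q[1] * p[2])

semantic_loss = -torch.log(prob_sat)
semantic_loss.backward()
print(p.grad)   # derivative of the loss w.r.t. each output probability
```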