Abstract
Legal case matching, which automatically constructs a model to estimate the
similarities between the source and target cases, has played an essential role
in intelligent legal systems. Semantic text matching models have been applied
to the task where the source and target legal cases are considered as long-form
text documents. These general-purpose matching models make predictions
based solely on the texts of the legal cases, overlooking the essential role of
the law articles in legal case matching. In the real world, the matching
results (e.g., relevance labels) are strongly affected by the law articles
because the contents and the judgments of a legal case are fundamentally
grounded in law. From a causal perspective, a matching decision is affected by
the mediation effect of the law articles cited by the legal cases and the direct
effect of the key circumstances (e.g., detailed fact descriptions) in the legal
cases. In light of this observation, this paper proposes a model-agnostic causal
learning framework called Law-Match, under which the legal case matching models
are learned by respecting the corresponding law articles. Given a pair of legal
cases and the related law articles, Law-Match considers the embeddings of the
law articles as instrumental variables (IVs), and the embeddings of legal cases
as treatments. Using IV regression, the treatments can be decomposed into
law-related and law-unrelated parts, respectively reflecting the mediation and
direct effects. These two parts are then combined with different weights to
collectively support the final matching prediction. We show that the framework
is model-agnostic: a number of legal case matching models can serve as
the underlying models. Comprehensive experiments show that Law-Match can
outperform state-of-the-art baselines on three public datasets.
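
The decomposition described above can be illustrated with a minimal sketch of a first-stage IV regression. This is not the authors' implementation: the function names `iv_decompose` and `recombine`, the fixed weights, the small ridge term, and the random toy embeddings are all assumptions made for illustration. The sketch regresses the case embeddings (treatments) on the law article embeddings (instruments) by least squares, takes the fitted values as the law-related part and the residuals as the law-unrelated part, and recombines the two with different weights.

```python
import numpy as np

def iv_decompose(case_emb, law_emb, ridge=1e-6):
    """Split treatments into a part explained by the instruments and a residual.

    case_emb: (n, d_case) array of treatments (hypothetical case embeddings).
    law_emb:  (n, d_law) array of instruments (hypothetical law embeddings).
    ridge:    small regularizer for numerical stability (an assumption).
    """
    X = np.asarray(case_emb, dtype=float)
    Z = np.asarray(law_emb, dtype=float)
    # Least-squares projection of X onto the column space of Z.
    gram = Z.T @ Z + ridge * np.eye(Z.shape[1])
    coef = np.linalg.solve(gram, Z.T @ X)
    law_related = Z @ coef           # fitted values: explained by law articles
    law_unrelated = X - law_related  # residuals: not explained by law articles
    return law_related, law_unrelated

def recombine(law_related, law_unrelated, w_related=0.5, w_unrelated=0.5):
    """Weighted combination of the two parts; fixed weights are an assumption."""
    return w_related * law_related + w_unrelated * law_unrelated

# Toy data: case embeddings that partly depend on the law embeddings.
rng = np.random.default_rng(0)
law_emb = rng.normal(size=(8, 4))
case_emb = law_emb @ rng.normal(size=(4, 6)) + 0.1 * rng.normal(size=(8, 6))

law_related, law_unrelated = iv_decompose(case_emb, law_emb)
combined = recombine(law_related, law_unrelated, w_related=0.7, w_unrelated=0.3)
```

In this sketch the two parts always sum back to the original embeddings, so the weights only change how much each part contributes to the representation passed to the underlying matching model. In a learned framework the weights would presumably be trained rather than fixed.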