Conditional Random Fields (CRF) Tutorial Section

Conditional Random Fields (CRF)

Master Conditional Random Fields, a discriminative probabilistic framework for labeling sequential NLP data.

Conditional Random Fields (CRFs) were among the most capable statistical sequence models of the pre-deep-learning era. They combine the ability of Hidden Markov Models (HMMs) to model dependencies between neighboring labels in a sequence with the ability of Maximum Entropy models to use large numbers of arbitrary, overlapping features.

Generative vs Discriminative
HMMs are generative: they model the joint probability P(Labels, Words), i.e. they try to learn how the data (words and labels together) was generated.
CRFs are discriminative: they model the conditional probability P(Labels | Words) directly. They spend no effort modeling how the words themselves arise; they only learn to separate the correct label sequence from the incorrect ones, given the words.
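For a sentence x = x_1 … x_n with tags y = y_1 … y_n, the standard first-order (linear-chain) forms of the two models can be written as below. The notation is ours rather than the tutorial's: f_k are feature functions, λ_k their weights, Z(x) the normalizer that sums over every possible tag sequence, and y_0 a fixed start state.

```latex
% HMM (generative): joint probability of tags and words
P(\mathbf{y}, \mathbf{x}) = \prod_{t=1}^{n} P(y_t \mid y_{t-1}) \, P(x_t \mid y_t)

% Linear-chain CRF (discriminative): conditional probability of tags given words
P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\Big( \sum_{t=1}^{n} \sum_{k=1}^{K} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big)

Z(\mathbf{x}) = \sum_{\mathbf{y}'}
  \exp\Big( \sum_{t=1}^{n} \sum_{k=1}^{K} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)
```

The HMM's emission term P(x_t | y_t) sees only the current word, whereas every CRF feature f_k receives the whole sentence x.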

                

The Feature Function Advantage

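The superpower of CRFs in Named Entity Recognition (NER) is that you can hand-craft thousands of highly specific "Feature Functions". In a linear-chain CRF, each feature function may inspect the entire input sentence (any word, at any position), while depending on at most the current tag and the previous tag.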
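As a minimal sketch, assuming a linear-chain CRF and BIO-style tag names such as B-PER and I-PER (our choice, not the tutorial's), a feature function can be modeled as a plain Python function of (previous tag, current tag, whole sentence, position) that returns 0 or 1. The function names below are hypothetical illustrations.

```python
from typing import Callable, List, Optional

# Hypothetical signature for a linear-chain CRF feature function:
# (previous tag, current tag, whole sentence, position) -> 0 or 1.
FeatureFn = Callable[[Optional[str], str, List[str], int], int]


def next_word_is_said(prev_tag: Optional[str], tag: str, words: List[str], i: int) -> int:
    # Looks AHEAD at word i+1, which an HMM emission P(word | tag) cannot see.
    return int(tag == "B-PER" and i + 1 < len(words) and words[i + 1] == "said")


def prev_word_is_mr(prev_tag: Optional[str], tag: str, words: List[str], i: int) -> int:
    # Looks BACK at word i-1.
    return int(tag == "B-PER" and i > 0 and words[i - 1] == "Mr.")


def per_continues(prev_tag: Optional[str], tag: str, words: List[str], i: int) -> int:
    # Transition feature: depends only on the pair of adjacent tags.
    return int(prev_tag == "B-PER" and tag == "I-PER")


FEATURES: List[FeatureFn] = [next_word_is_said, prev_word_is_mr, per_continues]


def active_features(prev_tag: Optional[str], tag: str, words: List[str], i: int) -> List[str]:
    """Names of the feature functions that fire (return 1) at position i."""
    return [f.__name__ for f in FEATURES if f(prev_tag, tag, words, i)]


words = ["Mr.", "Smith", "said", "hello"]
print(active_features("O", "B-PER", words, 1))  # ['next_word_is_said', 'prev_word_is_mr']
```

Both positional features fire for "Smith" at the same time, even though one reads the word before it and the other the word after it.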

Example CRF Custom Features for NER

When predicting whether the current word is part of a "Person" entity, a CRF can use all of these features simultaneously:

F1: Is the current word capitalized? (Yes/No)
F2: Is the previous word "Mr."? (Yes/No)
F3: Does the word consist entirely of digits? (Yes/No)
F4: Is the word in our predefined list of cities? (Yes/No)
F5: Does the word end in the suffix "-tion"? (Yes/No)
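The five Yes/No tests above can be sketched as a per-token feature extractor that returns a dictionary of booleans. This dict-per-token layout is similar in spirit to the input expected by toolkits such as sklearn-crfsuite, but the function name, feature names, and the tiny city list here are hypothetical. In a CRF, each test is typically paired with each candidate tag to form a separate feature function with its own weight.

```python
from typing import Dict, List, Set

# Hypothetical gazetteer for F4; a real system would load a much larger list.
CITIES: Set[str] = {"paris", "london", "tokyo"}


def token_features(words: List[str], i: int) -> Dict[str, bool]:
    """Yes/No observation tests F1-F5 for the word at position i."""
    word = words[i]
    return {
        "F1_is_capitalized": word[:1].isupper(),
        "F2_prev_is_mr": i > 0 and words[i - 1] == "Mr.",
        "F3_all_digits": word.isdigit(),
        "F4_in_city_list": word.lower() in CITIES,
        "F5_ends_in_tion": word.lower().endswith("tion"),
    }


sentence = ["Mr.", "Paris", "visited", "the", "station"]
for i, word in enumerate(sentence):
    # List only the tests that answered "Yes" for this word.
    fired = [name for name, on in token_features(sentence, i).items() if on]
    print(word, fired)
```

For "Paris" after "Mr.", F1, F2, and F4 all fire at once: overlapping and partly conflicting evidence that the CRF weighs jointly rather than treating as independent.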

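An HMM cannot easily use these overlapping features: it generates each word from its tag alone and assumes the observations are conditionally independent given the tags, an assumption that correlated features describing the same word (capitalization, suffix, gazetteer membership) clearly violate. A CRF instead learns a real-valued weight for each feature function, sums the weighted features across the whole sentence to score a candidate label sequence, and normalizes that score over all possible label sequences.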
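The weight-and-sum idea can be made concrete with a toy two-tag model. This is a sketch under stated assumptions: the tags, features, and weights are hypothetical and hand-set (a trained CRF learns the weights), and the normalizer Z is computed by brute-force enumeration of every tag sequence, which is exponential in sentence length. Real CRF implementations compute Z with the forward algorithm instead.

```python
import itertools
import math
from typing import Dict, List, Sequence

TAGS = ["O", "PER"]

# Hypothetical hand-set weights; a trained CRF would learn these from data.
WEIGHTS: Dict[str, float] = {
    "capitalized&PER": 1.5,
    "prev_is_mr&PER": 2.0,
    "lowercase&O": 1.0,
    "abbreviation&O": 2.5,
    "PER->PER": 0.5,
}


def fired_features(prev_tag: str, tag: str, words: List[str], i: int) -> List[str]:
    """Names of the indicator features that equal 1 at position i."""
    w = words[i]
    fired = []
    if w[:1].isupper() and tag == "PER":
        fired.append("capitalized&PER")
    if i > 0 and words[i - 1] == "Mr." and tag == "PER":
        fired.append("prev_is_mr&PER")
    if w.islower() and tag == "O":
        fired.append("lowercase&O")
    if w.endswith(".") and tag == "O":
        fired.append("abbreviation&O")
    if prev_tag == "PER" and tag == "PER":
        fired.append("PER->PER")
    return fired


def score(tags: Sequence[str], words: List[str]) -> float:
    """Weighted sum of every fired feature, over every position in the sentence."""
    total, prev = 0.0, "<START>"
    for i, tag in enumerate(tags):
        total += sum(WEIGHTS[name] for name in fired_features(prev, tag, words, i))
        prev = tag
    return total


def conditional_prob(tags: Sequence[str], words: List[str]) -> float:
    """P(tags | words) = exp(score) / Z, with Z summed over ALL tag sequences."""
    z = sum(math.exp(score(seq, words))
            for seq in itertools.product(TAGS, repeat=len(words)))
    return math.exp(score(tags, words)) / z


words = ["Mr.", "Smith", "arrived"]
best = max(itertools.product(TAGS, repeat=len(words)), key=lambda s: score(s, words))
print(best, round(conditional_prob(best, words), 3))  # ('O', 'PER', 'O') 0.379
```

The best sequence gets only about 38% of the probability mass because competitors such as ('PER', 'PER', 'O') score almost as high; this global normalization over whole sequences is what distinguishes a CRF from scoring each word separately.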

Modern Usage: BiLSTM-CRF

CRFs did not disappear with the advent of deep learning. Before Transformers, the BiLSTM-CRF architecture was the state of the art for NER.

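A bidirectional LSTM (BiLSTM) reads the sentence in both directions and outputs a raw score for every tag at every word. A CRF layer on top adds learned transition scores between adjacent tags and selects the highest-scoring tag sequence for the whole sentence, which lets the model respect sequence rules (e.g., an 'Inside-Person' tag should never directly follow a 'Beginning-Location' tag) instead of choosing each word's tag independently.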
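The decoding step of such a CRF layer can be sketched with the Viterbi algorithm over emission scores (standing in for BiLSTM outputs) plus tag-to-tag transition scores. Assumptions: the emission numbers are hypothetical stand-ins, BIO tag names are used, and the transition scores are hard-coded, with illegal pairs set to -inf; in a real BiLSTM-CRF both kinds of scores are learned, and the learned transitions usually only make such pairs very unlikely rather than impossible.

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")
TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]


def transition(prev: str, cur: str) -> float:
    """Hypothetical transition scores: an I-X tag may only follow B-X or I-X.

    Setting illegal pairs to -inf turns the rule into a hard constraint."""
    if cur.startswith("I-") and prev not in ("B-" + cur[2:], "I-" + cur[2:]):
        return NEG_INF
    return 0.0


def viterbi(emissions: List[Dict[str, float]]) -> Tuple[List[str], float]:
    """Highest-scoring tag sequence under emission + transition scores."""
    # An I- tag cannot start a sentence.
    best = {t: NEG_INF if t.startswith("I-") else emissions[0][t] for t in TAGS}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            # Best previous tag for reaching `cur` at this position.
            prev = max(TAGS, key=lambda p: best[p] + transition(p, cur))
            new_best[cur] = best[prev] + transition(prev, cur) + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag.
    last = max(TAGS, key=lambda t: best[t])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1], best[last]


# Hypothetical BiLSTM output scores for the two words "Washington Irving".
emissions = [
    {"O": 0.1, "B-PER": 1.0, "I-PER": 0.0, "B-LOC": 1.2, "I-LOC": 0.0},
    {"O": 0.2, "B-PER": 0.3, "I-PER": 1.5, "B-LOC": 0.1, "I-LOC": 0.4},
]
greedy = [max(s, key=s.get) for s in emissions]
print("greedy :", greedy)               # ['B-LOC', 'I-PER'] (an illegal pair)
print("viterbi:", viterbi(emissions))   # (['B-PER', 'I-PER'], 2.5)
```

Picking each word's top tag independently yields B-LOC followed by I-PER, exactly the kind of invalid sequence the CRF layer rules out; decoding the whole sentence jointly recovers a consistent B-PER, I-PER span.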
