In the real world (especially in banking or insurance), critical information does not come in a SQL table. It comes in PDFs, urgent emails, or incident reports written in a messy way. If you have to analyze 2,000 of these per day, either you have an army of people reading them—or you build something automated.
In this post, I want to show you how I designed an NLP pipeline to tackle this problem. It’s not about “adding AI just for the sake of it,” but about creating a logical flow that classifies, extracts, and cleans the information for us. The best part is that, with a few tweaks, this approach works just as well for risk analysis as it does for classifying support tickets or legal contracts.
What we’re going to build does four key things:
- Prioritizes. Is this a fire or a routine alert?
- Labels. Is it fraud, a technical failure, or a legal issue?
- Detects data. Extracts IPs, accounts, emails... the “hard” stuff.
- Understands context. Identifies which laws or companies are mentioned.
The flow architecture (keep it simple)
There’s no need to overcomplicate things with huge models for everything. Here we combine business logic (rules) with statistical models.
The flow is:
Raw document → Urgency classifier → Risk labeling → Technical data extractor → NER (Entities) → Structured JSON.
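That flow can be sketched as plain function composition. Every function here is a toy stand-in for the corresponding stage (the names are illustrative, not a real implementation), just to show how each step enriches the same result:

```python
import re

# Toy stand-ins for each pipeline stage, wired together in order.
def classify_urgency(text: str) -> str:
    # Stage 1: rules + weights (simplified to a keyword check here)
    return "high" if any(k in text.lower() for k in ("critical", "fraud")) else "low"

def label_risks(text: str) -> list:
    # Stage 2: multi-label keyword matching
    keywords = {"operational": ["failure", "error"], "compliance": ["mifid", "cnmv"]}
    t = text.lower()
    return [cat for cat, kws in keywords.items() if any(k in t for k in kws)]

def extract_indicators(text: str) -> dict:
    # Stage 3: regex extraction of "hard" data (only IPs shown here)
    return {"ipv4": re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", text)}

def extract_entities(text: str) -> dict:
    # Stage 4: NER + custom dictionary (reduced to the dictionary half)
    known = {"regulations": ["MiFID II"], "organizations": ["CNMV"]}
    t = text.lower()
    return {k: [e for e in v if e.lower() in t] for k, v in known.items()}

def run_pipeline(raw_text: str) -> dict:
    # Raw document in, structured JSON-ready dict out.
    return {
        "criticality": classify_urgency(raw_text),
        "categories": label_risks(raw_text),
        "indicators": extract_indicators(raw_text),
        "entities": extract_entities(raw_text),
    }
```

Each stage only reads the raw text and writes one key, so stages can be swapped or upgraded independently.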
A real case to test it
To avoid staying in theory, let’s use this incident alert (I made it up, but it’s very close to what you’d find in a real system):
OPERATIONAL RISK ALERT - ID-2026-00142 A critical failure has been detected in the transaction processing system... it affects credit cards. It started at 14:23 UTC and impacts around 15,000 customers. Data: IP 192.168.1.50 | Error: ERR-DB-TIMEOUT | TXNs: TXN-A7B3C9D2, TXN-F4E8A1B6. Warning: It exceeds the 0.5% MiFID II threshold. CNMV must be notified.
Classifying the fire: criticality
The first step is to know whether we need to escalate immediately or if it can wait until tomorrow. For this, we use a weight-based system. If words like “penalty” or “fraud” appear, the score skyrockets.
class CriticalityClassifier:
    def __init__(self):
        # Define what keeps us up at night
        self.keywords_critico = {'fraud': 10, 'loss': 10, 'non-compliance': 10, 'fine': 10}
        self.keywords_alto = {'risk': 7, 'alert': 6, 'failure': 5}
        # ... (other levels)

    def _calculate_scores(self, text: str) -> dict:
        text_lower = text.lower()
        # Add points based on matches
        return {
            'critical': sum(w for k, w in self.keywords_critico.items() if k in text_lower),
            'high': sum(w for k, w in self.keywords_alto.items() if k in text_lower),
            # ...
        }
Result: in our example, it would yield a HIGH level with 57% confidence. Enough to trigger an automatic alert.
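Turning those raw scores into a level plus a confidence figure can be done in several ways; the post doesn't show how its 57% is derived, so the normalization below (the winner's share of all points scored) is an assumption, just to illustrate the idea:

```python
def pick_level(scores: dict) -> tuple:
    """Winner-takes-all over the per-level scores.

    Confidence is computed as the winning level's share of all points,
    which is an assumed normalization, not the post's exact formula.
    """
    total = sum(scores.values())
    if total == 0:
        return ("low", 0.0)  # nothing matched: default to the lowest level
    level = max(scores, key=scores.get)
    return (level, scores[level] / total)
```

With this scheme, a document that scores points on several levels at once ends up with a lower confidence, which is exactly the kind of case you want flagged for human review.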
What are we dealing with? (Categories)
A document can belong to multiple categories at the same time. For example: technical (server outage) and compliance (regulatory breach). I used precompiled Regex patterns because, honestly, for specific keywords they are much faster and cheaper than a neural network.
# A small map of what to look for in each risk
self.category_keywords = {
    'operational': ['failure', 'error', 'outage', 'technical incident'],
    'compliance': ['regulation', 'penalty', 'cnmv', 'mifid'],
    'cybersecurity': ['attack', 'malware', 'hack', 'breach']
}
Pulling the technical "threads"
This is where NLP shines by saving time. Instead of having an analyst copy and paste IPs or error codes, we let regular expressions do the dirty work.
We extract simultaneously:
- IPs: 192.168.1.50
- Transaction IDs: TXN-A7B3C9D2
- Emails: incidents@sample-bank.com
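All of these can run as one pass of precompiled patterns over the document. The patterns below are illustrative (the post doesn't show its exact regexes, and the IPv4 one, for instance, accepts octets above 255, so production code would validate further):

```python
import re

# One compiled pattern per indicator type, all run over the same text.
INDICATOR_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "transaction_id": re.compile(r"\bTXN-[A-Z0-9]{8}\b"),
    "error_code": re.compile(r"\bERR-[A-Z]+-[A-Z]+\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def extract_indicators(text: str) -> dict:
    """Return only the indicator types that actually matched."""
    found = {name: pat.findall(text) for name, pat in INDICATOR_PATTERNS.items()}
    return {name: hits for name, hits in found.items() if hits}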
NER: who’s who?
To extract entities (organizations, laws, products), I use spaCy. It’s the Swiss Army knife for this. We add a custom dictionary because spaCy knows what a “person” is, but sometimes struggles to understand what “MiFID II” or “CNMV” are unless we explicitly tell it.
# We combine spaCy with our own domain-specific rules
self.custom_entities = {
    'regulations': ['mifid ii', 'gdpr', 'pci dss'],
    'organizations': ['cnmv', 'ecb', 'bank of spain']
}
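The dictionary half of that hybrid can be sketched without any spaCy dependency (in spaCy itself, the `EntityRuler` component plays this role alongside the statistical NER). The function below is a standalone illustration, not the post's actual code:

```python
import re

custom_entities = {
    'regulations': ['mifid ii', 'gdpr', 'pci dss'],
    'organizations': ['cnmv', 'ecb', 'bank of spain'],
}

def match_custom_entities(text: str) -> dict:
    """Case-insensitive whole-phrase lookup.

    Returns each entity exactly as it appears in the document
    (e.g. "MiFID II"), grouped by label.
    """
    results = {}
    for label, phrases in custom_entities.items():
        pattern = re.compile(
            r'\b(?:' + '|'.join(map(re.escape, phrases)) + r')\b',
            re.IGNORECASE,
        )
        hits = pattern.findall(text)
        if hits:
            results[label] = hits
    return results
```

In the full pipeline, these rule-based hits would be merged with spaCy's statistical entities (people, dates, amounts), with the rules taking priority for domain terms.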
The final result: JSON
In the end, the pipeline outputs something as clean as this:
{
  "criticality": "high",
  "categories": ["operational", "compliance", "technical"],
  "indicators": {
    "ipv4": ["192.168.1.50"],
    "error_code": ["ERR-DB-TIMEOUT"]
  },
  "entities": {
    "regulations": ["MiFID II"],
    "organizations": ["CNMV"]
  }
}
Does this scale?
If you run it sequentially, it might put you to sleep. But using ThreadPoolExecutor in Python, I’ve managed to process around 1,500 documents per minute. For most companies, that’s more than enough to handle the entire incoming flow in real time.
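A minimal sketch of that fan-out, where `process_document` stands in for the full pipeline run (the helper names are mine):

```python
from concurrent.futures import ThreadPoolExecutor

def process_document(doc: str) -> dict:
    # Stand-in for the real pipeline (classify, label, extract).
    return {"length": len(doc), "critical": "fraud" in doc.lower()}

def process_batch(docs: list, workers: int = 8) -> list:
    """Fan a batch of documents out to a thread pool.

    executor.map preserves input order, which keeps each result
    traceable back to its source document.
    """
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(process_document, docs))
```

One caveat worth stating: threads help most when stages wait on I/O or call libraries that release the GIL; for purely Python-level CPU-bound work, `ProcessPoolExecutor` may scale better.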
Real metrics:
- Latency: 38 ms (blazing fast).
- Accuracy: ~95% (the remaining 5% is usually very ambiguous language that requires human judgment).
Conclusion
Automating this is not just about cost savings—it’s about reaction time. If a system detects a regulatory breach in 40 milliseconds, you can mitigate the risk before it turns into a million-dollar fine.
What do you think? Would you bring in heavier models like Transformers (BERT/LLMs), or do you believe this hybrid approach is more stable for risk classification? See you in the comments.