NYU / Rutgers University · Finance

SECBert

A BERT model pre-trained on SEC EDGAR filings for understanding regulatory financial documents and compliance text.

Overview

SECBert is a domain-specific BERT model trained on a large corpus of SEC EDGAR filings including 10-K, 10-Q, and 8-K documents. The model captures the unique language patterns, legal terminology, and regulatory conventions found in securities filings. It is particularly effective for tasks involving regulatory document analysis, compliance monitoring, and risk factor extraction from public company disclosures.
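As a sketch of how such a model might be queried, the snippet below uses the standard Hugging Face fill-mask pipeline. The checkpoint name is a placeholder assumption, not the model's actual Hugging Face identifier; substitute the real one before running.

```python
# Hypothetical usage sketch: querying a SEC-domain BERT through the
# standard Hugging Face fill-mask pipeline. The model id below is an
# assumption; substitute the actual published checkpoint name.

def mask_term(sentence: str, term: str, mask_token: str = "[MASK]") -> str:
    """Replace the first occurrence of `term` with the model's mask token."""
    return sentence.replace(term, mask_token, 1)

def predict_masked(sentence: str, model_name: str = "SECBert-base"):
    """Run fill-mask over a sentence containing [MASK].

    Downloads the checkpoint on first use; `model_name` is a
    placeholder and must point at a real Hugging Face repo.
    """
    from transformers import pipeline  # requires `pip install transformers`
    fill = pipeline("fill-mask", model=model_name)
    return fill(sentence)

if __name__ == "__main__":
    masked = mask_term(
        "The company filed its annual report on Form 10-K with the SEC.",
        "10-K",
    )
    print(masked)
    # predict_masked(masked)  # uncomment once a real checkpoint is available
```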

Parameters

110M

Architecture

BERT-Base

Training Data

SEC EDGAR filings (10-K, 10-Q, 8-K)

Context Window

512 tokens

License

Research use

Capabilities

SEC filing analysis and classification

Risk factor extraction from 10-K filings

Regulatory compliance text understanding

Financial entity recognition in SEC documents

Filing section identification and parsing
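The last capability, filing section identification, often starts with plain pattern matching before any model is applied. A minimal sketch for pulling the Item 1A risk-factor section out of a 10-K follows; the header regexes are illustrative assumptions, since item formatting varies widely across filers.

```python
import re

# Hypothetical pre-processing sketch: locate the "Item 1A. Risk Factors"
# section of a 10-K so it can be fed to the model. Header formats differ
# across filers, so these patterns are illustrative assumptions.
RISK_START = re.compile(r"item\s+1a\.?\s+risk\s+factors", re.IGNORECASE)
RISK_END = re.compile(r"item\s+1b\.?\s+unresolved\s+staff\s+comments", re.IGNORECASE)

def extract_risk_factors(filing_text: str) -> str:
    """Return the text between Item 1A and Item 1B, or '' if not found."""
    start = RISK_START.search(filing_text)
    if not start:
        return ""
    end = RISK_END.search(filing_text, start.end())
    stop = end.start() if end else len(filing_text)
    return filing_text[start.end():stop].strip()
```

A real pipeline would also handle tables of contents (which repeat the item headers) and HTML markup in EDGAR filings.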

Use Cases

Automated analysis of 10-K risk factor disclosures

Monitoring SEC filings for material changes in company disclosures

Extracting key financial metrics from quarterly filings

Compliance document review and classification

Pros

  • Specialized in regulatory financial document understanding
  • Captures SEC-specific language patterns and conventions
  • Effective for compliance and risk analysis workflows
  • Lightweight deployment requirements

Cons

  • Narrow focus on SEC filings limits broader financial use
  • 512-token context insufficient for full filing sections
  • Encoder-only; cannot generate regulatory text
  • Limited to U.S. SEC regulatory documents
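The 512-token limit noted above is typically worked around by chunking long sections into overlapping windows. A minimal sliding-window sketch over a token list follows; the window and stride values are assumptions, and a real pipeline would operate on the model's subword tokenizer output rather than raw tokens.

```python
def sliding_windows(tokens, window=512, stride=256):
    """Split a token list into overlapping windows so long filing
    sections fit a 512-token context. The overlap (window - stride)
    preserves context across chunk boundaries."""
    if len(tokens) <= window:
        return [tokens]
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
    return chunks
```

Per-chunk predictions (e.g., risk-factor classifications) can then be aggregated by majority vote or score averaging.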

Pricing

Free for research use. Available on Hugging Face. Lightweight enough to run on standard compute infrastructure.
