Getting Started
Installation
To use tulit from a clone of the repository, install it with Poetry:
$ poetry shell
$ poetry install
This will install the package and its dependencies.
Alternatively, you can install the package using pip:
$ pip install tulit
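After either installation route, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of tulit.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be found for import."""
    # find_spec returns None when no importable module of that name exists.
    return importlib.util.find_spec(package_name) is not None


status = "available" if is_installed("tulit") else "not found"
print(f"tulit is {status} in this environment")
```

If the check reports "not found", the usual cause is that the command was run outside the virtual environment in which the package was installed.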
Basic usage
The tulit package has two main components:
Client: Query and retrieve data from legal sources across Europe:
    EU: Cellar (EU Publications Office)
    Member States: Finland, France, Germany, Ireland, Italy, Luxembourg, Malta, Portugal, Spain
    Regional: Italian regions (Veneto)
Parser: Convert legal documents from various formats to JSON:
    XML: Akoma Ntoso, FORMEX 4, BOE XML
    HTML: Cellar variants, Regional parsers
    JSON: Legifrance
Retrieving legal documents
EU Cellar Client
Retrieve documents from the EU Publications Office:
from tulit.client.eu.cellar import CellarClient
client = CellarClient(download_dir='./database', log_dir='./logs')
file_format = 'fmx4' # Or 'xhtml', 'pdfa', etc.
celex = "32024R0903"
documents = client.download(celex=celex, format=file_format)
print(f"Downloaded: {documents}")
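The celex argument above is a CELEX number, which encodes a sector, a year, a document type and a sequence number (here: sector 3, year 2024, type R for regulation, number 0903). As a rough sketch, an identifier can be checked before it is sent to the client. The helper parse_celex and the pattern below are assumptions made for illustration: they are not part of tulit, and the pattern covers only the common sector/year/type/number layout, not corrigenda or other extended forms.

```python
import re
from typing import NamedTuple, Optional


class Celex(NamedTuple):
    sector: str
    year: int
    doc_type: str
    number: str


# Simplified pattern (assumption): one sector character, a four-digit year,
# one or two type letters, and a four-digit sequence number.
_CELEX_RE = re.compile(
    r"^(?P<sector>[0-9CE])(?P<year>\d{4})(?P<doc_type>[A-Z]{1,2})(?P<number>\d{4})$"
)


def parse_celex(celex: str) -> Optional[Celex]:
    """Split a basic CELEX number into its parts, or return None if it does not match."""
    match = _CELEX_RE.match(celex.strip())
    if match is None:
        return None
    return Celex(
        sector=match["sector"],
        year=int(match["year"]),
        doc_type=match["doc_type"],
        number=match["number"],
    )


print(parse_celex("32024R0903"))
```

A None result only means the identifier does not fit this simplified pattern; it may still be a valid CELEX number of a form the sketch does not cover.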
Member State Clients
Italy (Normattiva):
from tulit.client.state.normattiva import NormativaClient
client = NormativaClient(download_dir='./database')
# Download by URN
client.download(urn='urn:nir:stato:decreto.legge:2024-01-01;1')
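The URN in this example follows the layout urn:nir:authority:act-type:date;number. The sketch below splits such a string into its parts, which can help when building URNs programmatically. The helper parse_nir_urn is hypothetical and not part of tulit, and it assumes this simple five-part form only; longer URNs with version or partition suffixes are rejected or left unparsed.

```python
from datetime import date
from typing import NamedTuple


class NirUrn(NamedTuple):
    authority: str
    act_type: str
    issued: date
    number: str


def parse_nir_urn(urn: str) -> NirUrn:
    """Parse a simple 'urn:nir:<authority>:<type>:<date>;<number>' string."""
    parts = urn.split(":")
    if len(parts) != 5 or parts[:2] != ["urn", "nir"]:
        raise ValueError(f"Not a simple urn:nir identifier: {urn!r}")
    authority, act_type, details = parts[2:]
    # The last component holds the ISO date and the act number, separated by ';'.
    issued_text, separator, number = details.partition(";")
    if not separator or not number:
        raise ValueError(f"Missing ';<number>' part in: {urn!r}")
    return NirUrn(authority, act_type, date.fromisoformat(issued_text), number)


print(parse_nir_urn("urn:nir:stato:decreto.legge:2024-01-01;1"))
```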
Luxembourg (Legilux):
from tulit.client.state.legilux import LegiluxClient
client = LegiluxClient(download_dir='./database')
client.download(eli='eli/etat/leg/code/travail')
Germany (RIS):
from tulit.client.state.germany import GermanyClient
client = GermanyClient(download_dir='./database')
client.download(doc_type='bgbl', year='2024', number='145')
Regional Clients
Veneto (Italy):
from tulit.client.regional.veneto import VenetoClient
client = VenetoClient(download_dir='./database')
client.download(bur_number='1', year='2024')
Parsing legal documents
The tulit parsers support legislative documents in the following formats:
XML Formats:
    Akoma Ntoso 3.0 (EU, German LegalDocML, Luxembourg variants)
    FORMEX 4 (EU legislative documents)
    BOE XML (Spanish Official Gazette)
HTML Formats:
    Cellar XHTML (semantic structure)
    Cellar Standard HTML (simple structure)
    EU Legislative Proposals
    Veneto Regional HTML
JSON Formats:
    Legifrance JSON
Parsing XML Documents
Akoma Ntoso: The package automatically detects variants (EU, German, Luxembourg); a variant-specific parser can also be instantiated directly:
from tulit.parser.xml.akomantoso import AKN4EUParser, GermanLegalDocMLParser
eu_parser = AKN4EUParser()
eu_parser.parse('tests/data/akn/eu/32014L0092.akn')
german_parser = GermanLegalDocMLParser()
german_parser.parse('tests/data/akn/germany/document.akn')
FORMEX 4: Parse EU legislative documents in FORMEX format:
from tulit.parser.xml.formex import Formex4Parser
parser = Formex4Parser()
formex_file = 'tests/data/formex/c008bcb6-e7ec-11ee-9ea8-01aa75ed71a1.0006.02/DOC_1/L_202400903EN.000101.fmx.xml'
result = parser.parse(formex_file)
# Access parsed content
print(f"Found {len(parser.articles)} articles")
for article in parser.articles:
    print(f"Article {article.number}: {article.title}")
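When several FORMEX files have been downloaded, for example into the ./database directory used by the client examples, the same parse call can be applied to each of them. The sketch below is one possible way to do that. The helpers find_formex_files and parse_all are hypothetical, and creating a fresh parser per document is an assumption made to avoid carrying state from one file to the next.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def find_formex_files(root: str) -> List[Path]:
    """Return every *.fmx.xml file below root, in sorted order."""
    return sorted(Path(root).rglob("*.fmx.xml"))


def parse_all(paths: Iterable[Path], make_parser: Callable[[], object]) -> Dict[str, object]:
    """Parse each path with its own parser instance and return them keyed by path."""
    parsers = {}
    for path in paths:
        parser = make_parser()  # assumption: one parser object per document
        parser.parse(str(path))
        parsers[str(path)] = parser
    return parsers


# Intended use with tulit installed (not executed here):
# from tulit.parser.xml.formex import Formex4Parser
# parsed = parse_all(find_formex_files("./database"), Formex4Parser)
```

Passing the parser class as a factory keeps the helper independent of any one format, so the same loop could be reused with the other parser classes.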
Parsing HTML Documents
Cellar HTML: Parse documents from EU Cellar in semantic XHTML format:
from tulit.parser.html.cellar import CellarHTMLParser
parser = CellarHTMLParser()
html_file = 'tests/data/html/c008bcb6-e7ec-11ee-9ea8-01aa75ed71a1.0006.03/DOC_1.html'
parser.parse(html_file)
Cellar Standard HTML: Parse documents with simple <TXT_TE> structure:
from tulit.parser.html.cellar import CellarStandardHTMLParser
parser = CellarStandardHTMLParser()
parser.parse('document.html')
EU Proposals: Parse legislative proposals with special structure:
from tulit.parser.html.cellar import ProposalHTMLParser
parser = ProposalHTMLParser()
parser.parse('proposal.html')
Veneto Regional: Parse Italian regional legislation:
from tulit.parser.html.veneto import VenetoHTMLParser
parser = VenetoHTMLParser()
parser.parse('tests/data/html/veneto/esg.html')
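The examples above choose a parser class by hand. When files of several formats sit in one directory, a small lookup keyed on the file name can suggest which class to try. The sketch below returns class names as strings so that it runs without tulit installed. The mapping is an assumption based only on the file extensions used in the example paths in this guide, and guess_parser_name is a hypothetical helper, not part of the package.

```python
from pathlib import Path
from typing import Optional

# Assumed mapping from file-name suffix to the parser class named in this guide.
# Longer, more specific suffixes come first.
_SUFFIX_TO_PARSER = (
    (".fmx.xml", "Formex4Parser"),
    (".akn", "AKN4EUParser"),
    (".html", "CellarHTMLParser"),
)


def guess_parser_name(path: str) -> Optional[str]:
    """Suggest a parser class name for a file, or None if the suffix is unknown."""
    file_name = Path(path).name.lower()
    for suffix, parser_name in _SUFFIX_TO_PARSER:
        if file_name.endswith(suffix):
            return parser_name
    return None


print(guess_parser_name("tests/data/akn/eu/32014L0092.akn"))
```

A suffix alone cannot distinguish, for instance, Cellar XHTML from Veneto regional HTML or the EU Akoma Ntoso variant from the German one, so the result is best treated as a first guess rather than a reliable detection.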
Accessing Parsed Content
After parsing, the document structure is available through parser attributes:
# Parse a document
from tulit.parser.xml.formex import Formex4Parser
parser = Formex4Parser()
parser.parse('document.fmx.xml')
# Metadata and preface
print(f"Title: {parser.preface}")
# Preamble components
for citation in parser.citations:
    print(f"Citation: {citation.content}")
for recital in parser.recitals:
    print(f"Recital {recital.number}: {recital.content}")
print(f"Formula: {parser.formula}")
print(f"Preamble final: {parser.preamble_final}")
# Body structure
for chapter in parser.chapters:
    print(f"Chapter {chapter.number}: {chapter.title}")
for article in parser.articles:
    print(f"Article {article.number}: {article.title}")
    print(f"Content: {article.content}")
    for child in article.children:
        print(f" {child.type} {child.number}: {child.content}")
# Conclusions
print(f"Conclusions: {parser.conclusions}")
# Export to JSON
json_output = parser.export_to_json()
print(json_output)
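To keep the exported JSON rather than only print it, the output can be written to disk. This guide does not state whether export_to_json returns a JSON string or an already-decoded Python structure, so the sketch below accepts either; that flexibility, and the helper name save_json, are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Any


def save_json(exported: Any, destination: str) -> Path:
    """Write exported JSON (a string or a decoded structure) to destination as UTF-8."""
    # Decode strings first so the file is always written with consistent formatting.
    data = json.loads(exported) if isinstance(exported, str) else exported
    out_path = Path(destination)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps accented characters readable in the output file.
    out_path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    return out_path


# Intended use after parsing (not executed here):
# save_json(parser.export_to_json(), "./output/document.json")
```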