Getting Started

Installation

To use tulit, first install it using poetry:

$ poetry shell
$ poetry install

This will install the package and its dependencies.

Alternatively, you can install the package using pip:

$ pip install tulit

Basic usage

The tulit package has two main components:

Client: Query and retrieve data from legal sources across Europe:
- EU: Cellar (EU Publications Office)
- Member States: Finland, France, Germany, Ireland, Italy, Luxembourg, Malta, Portugal, Spain
- Regional: Italian regions (Veneto)
Parser: Convert legal documents from various formats to JSON:
- XML: Akoma Ntoso, FORMEX 4, BOE XML
- HTML: Cellar variants, Regional parsers
- JSON: Legifrance

Retrieving legal documents

EU Cellar Client

Retrieve documents from the EU Publications Office:

from tulit.client.eu.cellar import CellarClient

client = CellarClient(download_dir='./database', log_dir='./logs')

file_format = 'fmx4'  # Or 'xhtml', 'pdfa', etc.
celex = "32024R0903"

documents = client.download(celex=celex, format=file_format)
print(f"Downloaded: {documents}")

Member State Clients

Italy (Normattiva):

from tulit.client.state.normattiva import NormativaClient

client = NormativaClient(download_dir='./database')
# Download by URN
client.download(urn='urn:nir:stato:decreto.legge:2024-01-01;1')

Luxembourg (Legilux):

from tulit.client.state.legilux import LegiluxClient

client = LegiluxClient(download_dir='./database')
client.download(eli='eli/etat/leg/code/travail')

Germany (RIS):

from tulit.client.state.germany import GermanyClient

client = GermanyClient(download_dir='./database')
client.download(doc_type='bgbl', year='2024', number='145')

Regional Clients

Veneto (Italy):

from tulit.client.regional.veneto import VenetoClient

client = VenetoClient(download_dir='./database')
client.download(bur_number='1', year='2024')

Parsing legal documents

The tulit parsers support legislative documents in the following formats:

XML Formats:

Akoma Ntoso 3.0 (EU, German LegalDocML, Luxembourg variants)

FORMEX 4 (EU legislative documents)

BOE XML (Spanish Official Gazette)

HTML Formats:

Cellar XHTML (semantic structure)

Cellar Standard HTML (simple structure)

EU Legislative Proposals

Veneto Regional HTML

JSON Formats:

Legifrance JSON

Parsing XML Documents

Akoma Ntoso: The package automatically detects variants (EU, German, Luxembourg):

from tulit.parser.xml.akomantoso import AKN4EUParser, GermanLegalDocMLParser

eu_parser = AKN4EUParser()
eu_parser.parse('tests/data/akn/eu/32014L0092.akn')

german_parser = GermanLegalDocMLParser()
german_parser.parse('tests/data/akn/germany/document.akn')

FORMEX 4: Parse EU legislative documents in FORMEX format:

from tulit.parser.xml.formex import Formex4Parser

parser = Formex4Parser()
formex_file = 'tests/data/formex/c008bcb6-e7ec-11ee-9ea8-01aa75ed71a1.0006.02/DOC_1/L_202400903EN.000101.fmx.xml'
result = parser.parse(formex_file)

# Access parsed content
print(f"Found {len(parser.articles)} articles")
for article in parser.articles:
    print(f"Article {article.number}: {article.title}")

Parsing HTML Documents

Cellar HTML: Parse documents from EU Cellar in semantic XHTML format:

from tulit.parser.html.cellar import CellarHTMLParser

parser = CellarHTMLParser()
html_file = 'tests/data/html/c008bcb6-e7ec-11ee-9ea8-01aa75ed71a1.0006.03/DOC_1.html'
parser.parse(html_file)

Cellar Standard HTML: Parse documents with simple <TXT_TE> structure:

from tulit.parser.html.cellar import CellarStandardHTMLParser

parser = CellarStandardHTMLParser()
parser.parse('document.html')

EU Proposals: Parse legislative proposals with special structure:

from tulit.parser.html.cellar import ProposalHTMLParser

parser = ProposalHTMLParser()
parser.parse('proposal.html')

Veneto Regional: Parse Italian regional legislation:

from tulit.parser.html.veneto import VenetoHTMLParser

parser = VenetoHTMLParser()
parser.parse('tests/data/html/veneto/esg.html')

Accessing Parsed Content

After parsing, the document structure is available through parser attributes:

# Parse a document
from tulit.parser.xml.formex import Formex4Parser
parser = Formex4Parser()
parser.parse('document.fmx.xml')

# Metadata and preface
print(f"Title: {parser.preface}")

# Preamble components
for citation in parser.citations:
    print(f"Citation: {citation.content}")

for recital in parser.recitals:
    print(f"Recital {recital.number}: {recital.content}")

print(f"Formula: {parser.formula}")
print(f"Preamble final: {parser.preamble_final}")

# Body structure
for chapter in parser.chapters:
    print(f"Chapter {chapter.number}: {chapter.title}")

for article in parser.articles:
    print(f"Article {article.number}: {article.title}")
    print(f"Content: {article.content}")
    for child in article.children:
        print(f"  {child.type} {child.number}: {child.content}")

# Conclusions
print(f"Conclusions: {parser.conclusions}")

# Export to JSON
json_output = parser.export_to_json()
print(json_output)