Federated search using pycsw

pycsw

Tom Kralidis
Angelos Tzotsos

OSGeo Project

FOSS4G Europe 2026

This presentation is available at pycsw.org/publications/foss4g-europe2026

Outline

  • Introduction to pycsw
  • Features
  • Architecture
  • Federated search implementation
  • Next steps

Introduction

Introduction

  • pycsw is an OGC API - Records, OGC CSW server and STAC API implementation written in Python
  • Open Source project released under the MIT license
  • Runs on all major platforms (Windows, Linux, Mac OS X)
  • OSGeo Project since 11 March 2015

Introduction

  • fully implements OGC API - Records and the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web]
  • also implements additional search APIs: STAC API, OpenSearch, OAI-PMH, SRU
  • allows for the publishing and discovery of geospatial metadata

Introduction

The project is certified OGC Compliant, and is an OGC Reference Implementation for both CSW 2.0.2 and 3.0.0

Aiming for OGC API - Records Reference Implementation

This product conforms to the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web], Revision 2.0.2. OGC, OGC®, and CERTIFIED OGC COMPLIANT are trademarks or registered trademarks of the Open Geospatial Consortium, Inc. in the United States and other countries.

This product conforms to the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web], Revision 3.0.0. OGC, OGC®, and CERTIFIED OGC COMPLIANT are trademarks or registered trademarks of the Open Geospatial Consortium, Inc. in the United States and other countries.

pycsw is an OSGeo Project

Project History

  • 2010: Development started by Tom Kralidis
  • Feb 2011: Project officially announced
  • Apr 2011: First official release (0.1), passing all CITE tests
  • Jul 2011: Version 1.0 released
  • Feb 2012: pycsw included in OSGeoLive
  • Jan 2013: 1.4 certified OGC Compliant, Reference Implementation of OGC CSW 2.0.2
  • Apr 2013: pycsw entered OSGeo Incubation
  • Feb 2014: pycsw powers data.gov
  • Mar 2015: pycsw graduates OSGeo Incubation

Project History

  • Jul 2016: Version 2.0 released
  • Jul 2016: Reference implementation of OGC CSW 3.0.0
  • May 2019: 2.4.0 released
  • Dec 2020: Happy 10th birthday!
  • Dec 2020: 2.6.0 released
  • Jul 2021: OGC API - Records and STAC implementation
  • Oct 2021: CQL implementation using pygeofilter
  • May 2022: XSL transformations, JSON storage
  • Sep 2023: OGC API Transactions (OAFeat Part 4)
  • Oct 2023 - present: Numerous specification updates (OGC API, STAC, OpenSearch)
    • Development alongside OGC API - Records SWG
    • STAC compliance
  • Feb 2025: 2.6.2 released (maintenance)

Goals

  • Lightweight and easy to set up: a standalone catalogue with no GUI or metadata editing front end, designed for exposing ready-to-go metadata (files or an existing DB) through a CSW interface
  • Extensible: the ability to add metadata formats and map them to a common information model and core/additional queryables
  • OGC compliant: always pass CITE tests (integrated into CI)

Features

Features

  • Certified OGC Compliant and OGC Reference Implementation for both CSW 2.0.2 and CSW 3.0.0
  • Implements OGC API - Records
  • Implements STAC API 1.0.0
  • Harvesting support for WMS, WFS, WCS, WPS, WAF, CSW, SOS
  • Implements ISO Metadata Application Profile 1.0.0
  • Implements FGDC CSDGM Application Profile for CSW 2.0
  • Implements OGC OpenSearch Geo, Time and EO Extensions
  • Implements INSPIRE Discovery Services 3.0
  • Supports ISO, Dublin Core, DIF, FGDC, Atom and GM03 metadata models
  • Standalone or embedded deployment (CGI or WSGI)
  • Transactional capabilities (CSW-T)
  • Flexible repository configuration (SQLite, PostgreSQL, PostGIS, MySQL)
  • Federated catalogue distributed searching

More features...

  • Simple configuration
  • Extensible plugin architecture (profiles, repositories/backends)
  • Seamless integration with Python environments (e.g. GeoNode, HHypermap, Open Data Catalog)
  • Integration with CKAN through ckanext-spatial and ckanext-publicamundi
  • Includes a command-line utility to administer the metadata repository
  • Implements the Search/Retrieval via URL (SRU) search protocol
  • Implements Open Archives Initiative Protocol for Metadata Harvesting
  • Implements Full Text Search capabilities
  • Realtime XML Schema validation

Standards Support

  • OGC API - Records: Part 1 - Core
  • OGC API - Records: Part 2 - Facets
  • OGC API - Records: Part 4 - Federated Search
  • OGC CSW 2.0.2, 3.0.0
  • OGC Filter 1.1.0, 2.0.0
  • OGC OWS Common 1.0.0, 2.0.0
  • OGC SFSQL 1.2.1
  • SOAP 1.2
  • SRU 1.1
  • OAI-PMH 2.0
  • OGC OpenSearch 1.0
  • STAC API 1.0.0
  • OGC Common Query Language (CQL)

Standards Support

  • OGC GML 3.2.1
  • Dublin Core 1.1
  • ISO 19115 2003, ISO 19115-2 2019
  • ISO 19139 2007, ISO 19119 2005
  • NASA DIF 9.7
  • FGDC CSDGM 1998
  • OGC API - Records core record model/schema
  • STAC (API, Collection, Catalog, Item)

Architecture

Component Architecture

Software Architecture

Federated search implementation

ESA requirement

  • EOEPCA
  • Resource catalogue building block (BB)
  • Cross-Catalogue Search
  • Federation

pycsw Federated Search support

  • CSW2/CSW3 distributed search
  • OGC API - Records - Part 4: Federated Search (May 2026)

Design

              client
                 ^
                 |
                 v
               pycsw
                 ^
                 |
                 v
          /-------------\
          ^      ^      ^
          |      |      |
          v      v      v
         cat1   cat2   cat3
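
The fan-out in the diagram can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not pycsw's implementation: `REMOTE_CATALOGUES`, `search_catalogue` and `federated_search` are hypothetical names, and the remote catalogues are replaced by in-memory lists so that no network access is needed.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the remote catalogues cat1..cat3.
REMOTE_CATALOGUES = {
    "cat1": ["ice-extent", "sea-surface-temperature"],
    "cat2": ["sea-ice-thickness"],
    "cat3": ["land-cover"],
}


def search_catalogue(catalogue_id, term):
    """Return the ids of records in one catalogue whose id contains the term."""
    return [rid for rid in REMOTE_CATALOGUES[catalogue_id] if term in rid]


def federated_search(term):
    """Send the same query to every catalogue and group results by catalogue."""
    ids = list(REMOTE_CATALOGUES)
    with ThreadPoolExecutor(max_workers=len(ids)) as pool:
        # pool.map preserves input order, so results line up with ids.
        results = pool.map(lambda cid: search_catalogue(cid, term), ids)
        return dict(zip(ids, results))


if __name__ == "__main__":
    print(federated_search("ice"))
```

Querying the catalogues concurrently keeps the total response time close to that of the slowest catalogue rather than the sum of all of them.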
				

Configuration

distributedsearch:
    merge_results: true
    catalogues:
        - id: fedcat01
          type: CSW 
          title: Arctic SDI 
          url: https://catalogue.arctic-sdi.org/csw
        - id: metno
          type: OARec
          title: met.no API records
          url: https://test.wps.met.no/collections/no.met.adc:1e7f7e14-5753-5dc9-bf51-0626d9af0dce
        - id: fedcat03
          type: STAC-API
          title: Copernicus Data Space Ecosystem (CDSE) asset-level STAC catalogue
          url: https://stac.dataspace.copernicus.eu/v1
          collections:
              - daymet-annual-pr
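
Since merged results are later identified by catalogue id, a quick sanity check of the catalogue list can be useful before deployment. The sketch below is illustrative only and is not part of pycsw: `check_catalogues` and `KNOWN_TYPES` are hypothetical names, the configuration is mirrored as a Python dict rather than parsed from YAML, and the set of types contains only the three values shown in the example above.

```python
# Assumption: the YAML configuration has already been loaded into a dict;
# it is mirrored here as a literal to keep the example self-contained.
CONFIG = {
    "distributedsearch": {
        "merge_results": True,
        "catalogues": [
            {"id": "fedcat01", "type": "CSW",
             "url": "https://catalogue.arctic-sdi.org/csw"},
            {"id": "metno", "type": "OARec",
             "url": "https://test.wps.met.no/collections/"
                    "no.met.adc:1e7f7e14-5753-5dc9-bf51-0626d9af0dce"},
            {"id": "fedcat03", "type": "STAC-API",
             "url": "https://stac.dataspace.copernicus.eu/v1",
             "collections": ["daymet-annual-pr"]},
        ],
    }
}

# Only the types seen in the example; the set pycsw accepts may differ.
KNOWN_TYPES = {"CSW", "OARec", "STAC-API"}


def check_catalogues(config):
    """Return a list of human-readable problems found in the catalogue list."""
    problems = []
    seen = set()
    for cat in config["distributedsearch"]["catalogues"]:
        cid = cat.get("id")
        if not cid:
            problems.append("catalogue without id")
        elif cid in seen:
            problems.append(f"duplicate id: {cid}")
        seen.add(cid)
        if cat.get("type") not in KNOWN_TYPES:
            problems.append(f"unknown type for {cid}: {cat.get('type')}")
        if not str(cat.get("url", "")).startswith(("http://", "https://")):
            problems.append(f"missing or non-HTTP url for {cid}")
    return problems


if __name__ == "__main__":
    print(check_catalogues(CONFIG) or "configuration looks consistent")
```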
				  

Link relations

  • http://www.opengis.net/def/rel/ogc/1.0/federatedCatalogues

Endpoints

  • /collections/{collectionId}/federatedCatalogues
  • /collections/{collectionId}/federatedCatalogues/{federatedCatalogueId}
  • /collections/{collectionId}/items?distributedSearch=TRUE
    • Additional query parameters / filters passed to distributed catalogues
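
The endpoint patterns above can be assembled with the Python standard library, as in the following sketch. The base URL `https://example.org` and the collection id `metadata:main` are placeholders chosen for illustration, the helper names are hypothetical, and `q` and `limit` are used only as examples of additional query parameters passed along with `distributedSearch=TRUE`.

```python
from urllib.parse import quote, urlencode


def federated_catalogues_url(base_url, collection_id, catalogue_id=None):
    """Build the federatedCatalogues URL, optionally for a single catalogue."""
    url = (f"{base_url.rstrip('/')}/collections/"
           f"{quote(collection_id, safe=':')}/federatedCatalogues")
    if catalogue_id is not None:
        url += "/" + quote(catalogue_id, safe="")
    return url


def distributed_items_url(base_url, collection_id, **params):
    """Build an items search URL with distributed search switched on."""
    query = {"distributedSearch": "TRUE", **params}
    return (f"{base_url.rstrip('/')}/collections/"
            f"{quote(collection_id, safe=':')}/items?{urlencode(query)}")


if __name__ == "__main__":
    base = "https://example.org"  # placeholder, not a real deployment
    print(federated_catalogues_url(base, "metadata:main"))
    print(federated_catalogues_url(base, "metadata:main", "fedcat01"))
    print(distributed_items_url(base, "metadata:main", q="ice", limit=5))
```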

Search results

  • grouped by federatedSearchResults

Merging search results

  • unpacked into main features array
  • deduplication: ID prefixing
    • id123 -> fedcat01::id123
  • catalogue/source identification (federatedCatalogueId)
    • "federatedCatalogueId": "fedcat01"
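
The merging steps above can be sketched as follows. The response layout is an assumption made for illustration: `federatedSearchResults` is taken to be an object keyed by catalogue id, each value holding its own `features` array, and `federatedCatalogueId` is written into each feature's `properties`. The actual pycsw response may be structured differently, and `merge_federated_results` is a hypothetical helper.

```python
import copy


def merge_federated_results(response):
    """Unpack grouped federated results into the main features array."""
    merged = copy.deepcopy(response)  # leave the caller's response untouched
    grouped = merged.pop("federatedSearchResults", {})
    features = merged.setdefault("features", [])
    for catalogue_id, result in grouped.items():
        for feature in result.get("features", []):
            # Prefix the id so records from different catalogues cannot clash.
            feature["id"] = f"{catalogue_id}::{feature['id']}"
            # Record the source catalogue (placement in properties is assumed).
            feature.setdefault("properties", {})["federatedCatalogueId"] = catalogue_id
            features.append(feature)
    return merged


if __name__ == "__main__":
    response = {
        "type": "FeatureCollection",
        "features": [{"id": "id123", "properties": {"title": "local record"}}],
        "federatedSearchResults": {
            "fedcat01": {
                "features": [{"id": "id123", "properties": {"title": "remote record"}}]
            }
        },
    }
    for feature in merge_federated_results(response)["features"]:
        print(feature["id"], feature["properties"])
```

In this example the local `id123` and the remote `id123` both survive the merge, because the remote one becomes `fedcat01::id123`.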

Next steps

  • EODAG integration (eodag.plugins.search)
  • promote merging results at specification level (mergeResults=TRUE)
  • CRUD management of federated catalogues

Thank you

pycsw.org

@tomkralidis

@kalxas