pycsw project status 2022

pycsw

Angelos Tzotsos / @tzotsos
Tom Kralidis / @tomkralidis

OSGeo Project

FOSS4G 2022

Outline

  • Introduction to pycsw
  • Features
  • Architecture
  • Installation
  • Latest developments
  • Downstream Projects and Deployments
  • Roadmap
  • Community

Introduction

  • pycsw is an OGC API - Records and OGC CSW server implementation written in Python
  • Open Source project released under the MIT license
  • Runs on all major platforms (Windows, Linux, Mac OS X)
  • OSGeo Project since 11 March 2015

Introduction

  • pycsw fully implements the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web]
  • pycsw also implements OGC API - Records, CQL, OpenSearch, OAI-PMH, SRU, STAC
  • pycsw allows for the publishing and discovery of geospatial metadata

Introduction

The project is certified OGC Compliant, and is an OGC Reference Implementation for both CSW 2.0.2 and 3.0.0

Aiming for OGC API - Records Reference Implementation status

This product conforms to the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web], Revision 2.0.2. OGC, OGC®, and CERTIFIED OGC COMPLIANT are trademarks or registered trademarks of the Open Geospatial Consortium, Inc. in the United States and other countries.

This product conforms to the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web], Revision 3.0.0. OGC, OGC®, and CERTIFIED OGC COMPLIANT are trademarks or registered trademarks of the Open Geospatial Consortium, Inc. in the United States and other countries.

pycsw is an official OSGeo Project

OSGeo Project

Project History

  • 2010: Development started by Tom Kralidis
  • Feb 2011: Project officially announced
  • Apr 2011: First official release (0.1), passing all CITE tests
  • Jul 2011: Version 1.0 released
  • Feb 2012: pycsw included in OSGeoLive
  • Jan 2013: 1.4 certified OGC Compliant, Reference Implementation of OGC CSW 2.0.2
  • Apr 2013: pycsw entered OSGeo Incubation
  • Feb 2014: pycsw powers data.gov
  • Mar 2015: pycsw graduates OSGeo Incubation

Project History

  • Jul 2016: Version 2.0 released
  • Jul 2016: Reference implementation of OGC CSW 3.0.0
  • May 2019: 2.4.0 released
  • Dec 2020: Happy 10th birthday!
  • Dec 2020: 2.6.0 released
  • Jul 2021: OGC API - Records and STAC implementation
  • Oct 2021: CQL implementation using pygeofilter
  • May 2022: XSL transformations, JSON storage

Goals

  • Lightweight and easy to set up: a standalone catalogue with no GUI or metadata editing front end, designed for the use case of exposing ready-to-go metadata (files or an existing DB) through a CSW interface
  • Extensible: the ability to add metadata formats and map them to a common information model and core/additional queryables
  • OGC compliant: always pass CITE tests (integrated into CI)

Features

  • Certified OGC Compliant and OGC Reference Implementation for both CSW 2.0.2 and CSW 3.0.0
  • Harvesting support for WMS, WFS, WCS, WPS, WAF, CSW, SOS
  • Implements ISO Metadata Application Profile 1.0.0
  • Implements FGDC CSDGM Application Profile for CSW 2.0
  • Implements OGC OpenSearch Geo, Time and EO Extensions
  • Implements OGC API - Records (in development)
  • Implements STAC API (1.0.0-beta2)
  • Implements INSPIRE Discovery Services 3.0
  • Supports ISO, Dublin Core, DIF, FGDC, Atom and GM03 metadata models
  • Standalone or embedded deployment (CGI or WSGI)
  • Transactional capabilities (CSW-T)
  • Flexible repository configuration (SQLite, PostgreSQL, PostGIS, MySQL)
  • Federated catalogue distributed searching

More features...

  • Simple configuration
  • Extensible plugin architecture (profiles, repositories/backends)
  • Seamless integration with Python environments (e.g. GeoNode, HHypermap, Open Data Catalog)
  • Integration with CKAN through ckanext-spatial and ckanext-publicamundi
  • Includes a command-line utility to administer the metadata repository
  • Implements the Search/Retrieval via URL (SRU) search protocol
  • Implements Open Archives Initiative Protocol for Metadata Harvesting
  • Implements Full Text Search capabilities
  • Realtime XML Schema validation

Standards Support

  • OGC API - Records: Part 1 - Core
  • OGC CSW 2.0.2, 3.0.0
  • OGC Filter 1.1.0, 2.0.0
  • OGC OWS Common 1.0.0, 2.0.0
  • OGC SFSQL 1.2.1
  • SOAP 1.2
  • SRU 1.1
  • OAI-PMH 2.0
  • OGC OpenSearch 1.0
  • STAC API 1.0.0-beta2
  • OGC Common Query Language (CQL)

Standards Support

  • OGC GML 3.1.1
  • Dublin Core 1.1
  • ISO 19115 2003, ISO 19115-2 2019
  • ISO 19139 2007, ISO 19119 2005
  • NASA DIF 9.7
  • FGDC CSDGM 1998
  • OGC API - Records core record model/schema
  • STAC Item

Architecture

Component Architecture

Software Architecture

Installation

Installation: the proper way

Installation: 4 Minute Install



# Set up a virtual environment:
virtualenv pycsw && cd pycsw && . bin/activate

# Grab the pycsw source code:
git clone https://github.com/geopython/pycsw.git && cd pycsw
pip install -e . && pip install -r requirements-standalone.txt

# Create and adjust a configuration file:
cp default-sample.cfg default.cfg
vi default.cfg
# adjust paths in
# - server.home
# - repository.database
# set server.url to http://localhost:8000/

# Set up the database:
pycsw-admin.py -c setup_db -f default.cfg

# Load records from a directory of XML files (add -r to recurse into subdirectories):
pycsw-admin.py -c load_records -f default.cfg -p /path/to/xml/

# Run the server:
python ./pycsw/wsgi.py

# See that it works!
# (the URL is quoted so the shell does not interpret the & characters)
curl "http://localhost:8000/?service=CSW&version=2.0.2&request=GetCapabilities"
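
The final check can also be scripted. The following is a minimal Python sketch, assuming the server from the steps above is listening on http://localhost:8000/. The helper name capabilities_url is illustrative and not part of pycsw. Building the query string with urlencode sidesteps the shell quoting problem that raw & characters cause.

```python
from urllib.parse import urlencode


def capabilities_url(base_url, version="2.0.2"):
    """Build a CSW GetCapabilities URL (hypothetical helper, not a pycsw API)."""
    params = {
        "service": "CSW",
        "version": version,
        "request": "GetCapabilities",
    }
    # urlencode joins the key/value pairs with '&' and escapes them as needed
    return f"{base_url}?{urlencode(params)}"


url = capabilities_url("http://localhost:8000/")
print(url)
```

The resulting URL can be passed to curl (in quotes) or to urllib.request.urlopen once the server is running.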

		                

OSGeoLive

Docker - K8s

  • pycsw is available on DockerHub
  • Sample Kubernetes configuration available on GitHub
  • Helm Chart also available on GitHub

Latest developments

pycsw 2022 code sprint

Initial JSON storage support

  • Adds generic metadata object storage (beyond XML)
  • Media type qualification: metadata_type
  • Initial OGC API - Records core record parser
  • Implementation

XSLT transformations

  • pycsw's transformation is model-driven, based on core queryables
  • XSLT enables deep transformations (can transform beyond core queryables)
  • Custom business logic
  • Configuration:
    # custom XSLT (section format: xslt:input_xml_schema,output_xml_schema)
    [xslt:http://www.opengis.net/cat/csw/2.0.2,http://www.isotc211.org/2005/gmd]
    xslt=/path/to/my-custom-iso.xslt
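
To make the section naming convention concrete, here is a minimal Python sketch of how such a configuration fragment could be read with the standard library. It is an illustration under stated assumptions, not pycsw's own loading code: the function name read_xslt_mappings is hypothetical, and the sample text simply repeats the fragment above.

```python
import configparser

SAMPLE = """
# custom XSLT (section format: xslt:input_xml_schema,output_xml_schema)
[xslt:http://www.opengis.net/cat/csw/2.0.2,http://www.isotc211.org/2005/gmd]
xslt=/path/to/my-custom-iso.xslt
"""


def read_xslt_mappings(text):
    """Map (input_schema, output_schema) to an XSLT path (hypothetical helper)."""
    # Interpolation is disabled so '%' characters in values are taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    mappings = {}
    for section in parser.sections():
        if not section.startswith("xslt:"):
            continue
        # Everything after the 'xslt:' prefix is 'input_schema,output_schema'
        input_schema, output_schema = section[len("xslt:"):].split(",", 1)
        mappings[(input_schema, output_schema)] = parser.get(section, "xslt")
    return mappings


for (source, target), path in read_xslt_mappings(SAMPLE).items():
    print(f"{source} -> {target}: {path}")
```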
    				

OGC Common Query Language (CQL) support

  • Representations: CQL TEXT and CQL JSON
  • via GET (TEXT or JSON) or POST (JSON)
  • Using pygeofilter
  • CQL TEXT:
    title = 'Lorem ipsum'
    title LIKE 'foo%'
  • CQL JSON:
    {
        "op": "=",
        "args": [
            {
                "property": "title"
            },
            "Lorem ipsum"
        ]
    }
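
The two representations can be produced programmatically. The sketch below uses only the Python standard library. The helper names are hypothetical, and the items path and the filter parameter name are assumptions for illustration that should be checked against the deployed server's OpenAPI document.

```python
import json
from urllib.parse import urlencode


def cql_json_equals(prop, value):
    """Build the CQL JSON equality expression shown above (hypothetical helper)."""
    return {"op": "=", "args": [{"property": prop}, value]}


def items_url(base_url, cql_text, path="/collections/metadata:main/items"):
    """Build a GET URL carrying a CQL TEXT filter.

    The default path and the 'filter' parameter name are assumptions.
    """
    # urlencode escapes spaces, quotes and '%' so the filter survives transport
    return f"{base_url}{path}?{urlencode({'filter': cql_text})}"


print(json.dumps(cql_json_equals("title", "Lorem ipsum"), indent=4))
print(items_url("http://localhost:8000", "title LIKE 'foo%'"))
```

Percent-encoding matters here because the LIKE wildcard % would otherwise be read as the start of an escape sequence in the URL.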
    				

Recent Projects and Deployments

EOEPCA

  • ESA's Earth Observation Exploitation Platform Common Architecture
  • Exploitation Platform: A collaborative, virtual work environment providing access to EO data, algorithms, tools and ICT resources
  • Goal: to define and agree on a reusable exploitation platform architecture using open interfaces
  • pycsw is the Resource Catalogue component of the architecture

Norwegian Meteorological Institute

Serving over 900,000 metadata records for the following projects

  • NBS: Norwegian National Ground Segment for Satellite Data
  • SIOS: Svalbard Integrated Arctic Earth Observing System
  • ADC: Arctic Data Center
  • NorDataNet: Norwegian Scientific Data Network
  • NMDC: Norwegian Marine Data Centre
  • WMO GCW: Global Cryosphere Watch

Enabling data search and access to raw data as well as to data services such as OPeNDAP and WMS

https://[nbs, sios, adc, nordatanet, nmdc, gcw].csw.met.no

Norwegian Meteorological Institute

Ongoing work

  • DIF, DCAT, INSPIRE output profiles
  • Continuous deployment of CSW services integrated into a Kubernetes environment
  • Automatic update of CSW catalogues with new metadata as they are produced

Meteorological Service of Canada and WMO Information System (WIS)

Migration of Data Collection and Production Centre (DCPC) Catalogue

  • Implementation of WMO Core Metadata Profile (WCMP)
  • Discovery/search of WMO member weather/climate/water data

Roadmap

OGC API - Records / OGC SWG

  • Part of OGC API efforts
  • REST/JSON/OpenAPI/Swagger
  • pycsw is part of the OGC API - Records Standards Working Group (SWG)
  • OGC CQL (text/JSON)

STAC

  • STAC Item 1.0 support
  • STAC API 1.0.0-beta2 support
  • Tested against pystac-client

Coming soon

  • Deeper JSON metadata management support (ingest, harvest)
  • Deeper EO support / queryables / granularity
  • pygeofilter integration
  • Updated CLI tooling (Python Click)

Future releases

  • pycsw 3.0: Long-term release (CSW 2/3)
  • pycsw 4.0: OGC API - Records
  • Relationship to pygeoapi project

Community

Getting Involved

Community

https://pycsw.org/community.html

Mailing List and Gitter

https://lists.osgeo.org/mailman/listinfo/pycsw-devel

Gitter channel

Source Code, Wiki, Issues on GitHub

https://github.com/geopython/pycsw

Professional support

OSGeo Service providers

Thank you

Questions?