Towards pycsw 2.0: Project report

Angelos Tzotsos / @tzotsos
Tom Kralidis / @tomkralidis

FOSS4G 2016 Bonn

Outline

  • Introduction to pycsw
  • Features
  • What's new in pycsw 2.0
  • Architecture
  • Installation
  • Downstream Projects and Deployments
  • Future Developments
  • Community

Introduction

Introduction

  • pycsw is an OGC CSW server implementation written in Python
  • Open Source project released under the MIT license
  • Runs on all major platforms (Windows, Linux, Mac OS X)
  • OSGeo Project since 11 March 2015

Introduction

  • pycsw fully implements the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web]
  • pycsw also implements OpenSearch, OAI-PMH and SRU
  • pycsw allows for the publishing and discovery of geospatial metadata

Introduction

The project is certified OGC Compliant, and is an OGC Reference Implementation for both CSW 2.0.2 and 3.0.0

This product conforms to the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web], Revision 2.0.2. OGC, OGC®, and CERTIFIED OGC COMPLIANT are trademarks or registered trademarks of the Open Geospatial Consortium, Inc. in the United States and other countries.

This product conforms to the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web], Revision 3.0.0. OGC, OGC®, and CERTIFIED OGC COMPLIANT are trademarks or registered trademarks of the Open Geospatial Consortium, Inc. in the United States and other countries.

pycsw is an official OSGeo Project

OSGeo Project

Project History

  • 2010: Development started by Tom Kralidis
  • Feb 2011: Project officially announced
  • Apr 2011: First official release (0.1), which already passed all CITE tests
  • Jul 2011: Version 1.0 released
  • Feb 2012: pycsw included in OSGeoLive
  • Jan 2013: pycsw 1.4 certified as OGC Compliant
  • Apr 2013: pycsw entered OSGeo Incubation
  • Feb 2014: pycsw powers data.gov
  • Mar 2015: pycsw graduates from OSGeo Incubation
  • Jul 2016: Latest stable release (2.0.0)
  • Jul 2016: Reference implementation of OGC CSW 3.0.0

Goals

  • Lightweight and easy to set up: a standalone catalogue with no GUI or metadata editing front end, designed for exposing ready-to-go metadata (files or an existing database) through a CSW interface
  • Extensible: the ability to add metadata formats and map them to a common information model and to core/additional queryables
  • OGC compliant: always pass CITE tests

Features

Features

  • Certified OGC Compliant and OGC Reference Implementation for both CSW 2.0.2 and CSW 3.0.0
  • Harvesting support for WMS, WFS, WCS, WPS, WAF, CSW, SOS
  • Implements ISO Metadata Application Profile 1.0.0
  • Implements FGDC CSDGM Application Profile for CSW 2.0
  • Implements OGC OpenSearch Geo and Time Extensions
  • Implements INSPIRE Discovery Services 3.0
  • Supports ISO, Dublin Core, DIF, FGDC, Atom and GM03 metadata models
  • Standalone or embedded deployment (CGI or WSGI)
  • Transactional capabilities (CSW-T)
  • Flexible repository configuration (SQLite, PostgreSQL, PostGIS, MySQL)
  • Distributed searching across federated catalogues
  • Python 2 and 3 compatible
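To illustrate the harvesting and transactional (CSW-T) features listed above, here is a minimal sketch that builds a CSW 2.0.2 Harvest request body asking the catalogue to harvest a WMS. It assumes the target pycsw instance has transactions enabled and accepts POSTed XML at whatever endpoint you pass to `post_xml`; both function names are hypothetical and not part of pycsw.

```python
import urllib.request
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
ET.register_namespace("csw", CSW_NS)  # serialize as csw:Harvest rather than ns0:Harvest


def harvest_request(source_url, resource_type="http://www.opengis.net/wms",
                    resource_format="application/xml", interval=None):
    """Build a CSW 2.0.2 Harvest request body as a string (hypothetical helper)."""
    root = ET.Element(f"{{{CSW_NS}}}Harvest", service="CSW", version="2.0.2")
    ET.SubElement(root, f"{{{CSW_NS}}}Source").text = source_url
    # ResourceType tells the catalogue how to interpret the source (here: a WMS)
    ET.SubElement(root, f"{{{CSW_NS}}}ResourceType").text = resource_type
    ET.SubElement(root, f"{{{CSW_NS}}}ResourceFormat").text = resource_format
    if interval:
        # ISO 8601 duration, e.g. "PT12H", requesting periodic re-harvesting
        ET.SubElement(root, f"{{{CSW_NS}}}HarvestInterval").text = interval
    return ET.tostring(root, encoding="unicode")


def post_xml(endpoint, body):
    """POST an XML request body and return the raw response bytes (hypothetical helper)."""
    request = urllib.request.Request(
        endpoint, data=body.encode("utf-8"),
        headers={"Content-Type": "application/xml"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

For example, `post_xml("http://localhost/csw", harvest_request("http://example.org/wms"))` would submit the request; both URLs are placeholders.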

More features...

  • Simple configuration
  • Extensible plugin architecture (profiles, repositories/backends)
  • Seamless integration with Python environments (e.g. GeoNode, HHypermap, Open Data Catalog)
  • Integration with CKAN through ckanext-spatial and ckanext-publicamundi
  • Includes a command-line utility to administer the metadata repository
  • Implements the Search/Retrieval via URL (SRU) search protocol
  • Implements the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
  • Implements full-text search capabilities
  • Real-time XML Schema validation
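The SRU and OAI-PMH interfaces are plain KVP over HTTP, so a client only needs to build the right URL. The sketch below assumes pycsw selects these protocols through a `mode=sru` / `mode=oaipmh` parameter on the same endpoint, mirroring `mode=opensearch`; the endpoint URL and helper names are hypothetical.

```python
from urllib.parse import urlencode

ENDPOINT = "http://localhost/csw"  # hypothetical pycsw endpoint

# The six request types ("verbs") defined by OAI-PMH 2.0
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}


def sru_search_url(endpoint, query, max_records=10):
    """SRU 1.1 searchRetrieve request (assumes pycsw's mode=sru switch)."""
    params = {"mode": "sru", "operation": "searchRetrieve", "version": "1.1",
              "query": query, "maximumRecords": max_records}
    return f"{endpoint}?{urlencode(params)}"


def oaipmh_url(endpoint, verb, **arguments):
    """OAI-PMH 2.0 request (assumes pycsw's mode=oaipmh switch)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb}")
    params = {"mode": "oaipmh", "verb": verb}
    params.update(arguments)  # e.g. metadataPrefix="oai_dc"
    return f"{endpoint}?{urlencode(params)}"
```

Keeping one builder per protocol makes it explicit which parameters each protocol defines, rather than mixing SRU and OAI-PMH arguments in a single function.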

Standards Support

  • OGC CSW 2.0.2, 3.0.0
  • OGC Filter 1.1.0, 2.0.0
  • OGC OWS Common 1.0.0, 2.0.0
  • OGC GML 3.1.1
  • OGC SFSQL 1.2.1
  • Dublin Core 1.1
  • SOAP 1.2
  • ISO 19115:2003
  • ISO 19139:2007
  • ISO 19119:2005
  • NASA DIF 9.7
  • FGDC CSDGM 1998
  • SRU 1.1
  • OAI-PMH 2.0
  • OGC OpenSearch 1.0

Supported CSW Operations

  • GetCapabilities
  • DescribeRecord
  • GetRecords
  • GetRecordById
  • GetRepositoryItem
  • GetDomain
  • Harvest
  • UnHarvest
  • Transaction
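As an example of the GetRecords operation, the following sketch builds a CSW 2.0.2 POST request with an OGC Filter 1.1.0 `PropertyIsLike` constraint on the `csw:AnyText` queryable, a full-text style match. The helper name is hypothetical; the element and attribute names follow the CSW 2.0.2 and Filter 1.1.0 XML encodings.

```python
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"
OGC = "http://www.opengis.net/ogc"
ET.register_namespace("csw", CSW)
ET.register_namespace("ogc", OGC)


def get_records_anytext(term, element_set="brief", max_records=10):
    """CSW 2.0.2 GetRecords body with a Filter 1.1.0 match on csw:AnyText."""
    root = ET.Element(f"{{{CSW}}}GetRecords", service="CSW", version="2.0.2",
                      resultType="results", maxRecords=str(max_records))
    query = ET.SubElement(root, f"{{{CSW}}}Query", typeNames="csw:Record")
    # brief / summary / full controls how much of each record is returned
    ET.SubElement(query, f"{{{CSW}}}ElementSetName").text = element_set
    constraint = ET.SubElement(query, f"{{{CSW}}}Constraint", version="1.1.0")
    flt = ET.SubElement(constraint, f"{{{OGC}}}Filter")
    like = ET.SubElement(flt, f"{{{OGC}}}PropertyIsLike",
                         wildCard="%", singleChar="_", escapeChar="\\")
    ET.SubElement(like, f"{{{OGC}}}PropertyName").text = "csw:AnyText"
    # Wrap the term in wildcards so it matches anywhere in the record text
    ET.SubElement(like, f"{{{OGC}}}Literal").text = f"%{term}%"
    return ET.tostring(root, encoding="unicode")
```

The resulting string would be POSTed to the CSW endpoint with `Content-Type: application/xml`.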

What's new in pycsw 2.0.0

OGC CSW 3.0.0

  • OGC released Catalogue Service Standard v3.0 on 12 July 2016
  • Version 3.0 better aligns with newer OGC standards (such as OWS Common 2.0 and Filter 2.0)
  • Provides a developer-friendly OpenSearch Geo API
  • Supports querying via temporal extents
  • Improves distributed search to better federate catalogues
  • OGC blog post on CSW 3.0

OGC CSW 3.0.0

CSW 3.0.0 provides major features and improvements over CSW 2.0.2 as part of the evolution of OGC Catalogue Services, including the OpenSearch Geo and Time Extensions, OpenSearch 1.1 and the Atom Syndication Format.

Other enhancements include:

  • Features advertised as conformance classes
  • Simpler KVP API
  • Enhanced distributed searching functionality
  • Raw metadata response for GetRecordById
  • Proper use of HTTP status codes
  • Proper use of HTTP request/response headers
  • UnHarvest operation
  • Use of temporal predicates for query and presentation

OGC CSW 3.0.0

  • OGC provides a CSW 3.0 compliance test suite
  • The compliance suite was developed in 2015, while the specification was under review
  • pycsw 2.0 was also mainly developed in 2015, in parallel with the compliance suite
  • pycsw 2.0 development contributed to bug fixes and improvements to the OGC compliance suite
  • The pycsw development branch was the first to fully pass the compliance test suite
  • Two reference implementations were available at release time

OGC CSW 3.0.0

  • In alignment with the CSW specifications, the default version returned is the latest supported version
  • pycsw 2.0 will always behave like a 3.0.0 CSW unless the client explicitly requests a 2.0.2 CSW
  • The new default behaviour breaks the pycsw API compatibility (pycsw.server.Csw.dispatch_wsgi now returns the HTTP status code along with the response string)
  • More details in the pycsw RFC for CSW 3.0 support

OGC CSW 3.0.0



http://localhost/csw  # returns 3.0.0 Capabilities
http://localhost/csw?service=CSW&request=GetCapabilities  # returns 3.0.0 Capabilities
http://localhost/csw?service=CSW&version=2.0.2&request=GetCapabilities  # returns 2.0.2 Capabilities
http://localhost/csw?service=CSW&version=3.0.0&request=GetCapabilities  # returns 3.0.0 Capabilities
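The version defaulting shown above can be sketched as a small negotiation function. This illustrates the rule only and is not pycsw's actual implementation: it inspects just the `version` KVP parameter (OWS `AcceptVersions` handling is left out), and `negotiate_version` is a hypothetical name.

```python
from urllib.parse import parse_qs

SUPPORTED_VERSIONS = ("3.0.0", "2.0.2")  # newest first


def negotiate_version(query_string):
    """Pick the CSW version for a KVP request (illustrative sketch)."""
    # OGC KVP parameter names are case-insensitive, so normalise them
    params = {k.lower(): v[0] for k, v in parse_qs(query_string).items()}
    requested = params.get("version")
    if requested is None:
        # No version requested: behave as the latest supported version
        return SUPPORTED_VERSIONS[0]
    if requested not in SUPPORTED_VERSIONS:
        raise ValueError(f"Unsupported CSW version: {requested}")
    return requested
```

Applied to the query strings above, a bare request or one without `version` resolves to 3.0.0, and only an explicit `version=2.0.2` selects the older protocol.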


OGC OpenSearch

OGC OpenSearch

To query pycsw via OpenSearch, requests must be specified with mode=opensearch. The following parameters are supported:

  • {searchTerms} (keywords)
  • {geo:box} (bounding box of minx,miny,maxx,maxy)
  • {time:start} and {time:end} (temporal)
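A client can fill these template parameters programmatically. The sketch below assumes the OpenSearch parameters map to the KVP names `q` ({searchTerms}), `bbox` ({geo:box}) and `time` ({time:start}/{time:end}), and that requests are sent as CSW 3.0.0 GetRecords; check the OpenSearch description document of your deployment for the exact template. `opensearch_url` and the endpoint are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the URL of your own pycsw deployment.
BASE_URL = "http://localhost/csw"


def opensearch_url(base_url, keywords=None, bbox=None, start=None, end=None):
    """Build an OpenSearch GetRecords URL for pycsw (sketch).

    Assumed parameter mapping: {searchTerms} -> q, {geo:box} -> bbox,
    {time:start}/{time:end} -> time=start/end.
    """
    params = {
        "mode": "opensearch",  # routes the request to the OpenSearch interface
        "service": "CSW",
        "version": "3.0.0",
        "request": "GetRecords",
    }
    if keywords:
        params["q"] = keywords
    if bbox is not None:
        if len(bbox) != 4:
            raise ValueError("bbox must be (minx, miny, maxx, maxy)")
        params["bbox"] = ",".join(str(v) for v in bbox)
    if start or end:
        # An open-ended interval leaves one side of the slash empty
        params["time"] = f"{start or ''}/{end or ''}"
    return f"{base_url}?{urlencode(params)}"
```

For instance, `opensearch_url(BASE_URL, keywords="lorem", bbox=(-180, -90, 180, 90))` combines a keyword search with a global bounding box; `urlencode` percent-encodes the commas and slash.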

OGC OpenSearch

Some examples:

OGC OpenSearch

More examples:

Python 3 support

  • Support for Python 3 through usage of __future__ and six library
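  • Needed to review list handling thoroughly, since Python 3 changed several list-related behaviours (e.g. dict.keys() and map() no longer return lists, and sorting mixed types such as None and strings raises TypeError)
  • Python 3 support requires Python 3.4 or later
  • Continuous integration through Travis CI now covers Python 2.6, 2.7 and 3.4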
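The kind of list-related change this involved can be shown with a few Python 2/3-compatible idioms. The helpers below are hypothetical and use only built-ins; pycsw itself relied on `__future__` imports and `six` for the wider porting work.

```python
def sorted_keys(mapping):
    # dict.keys() returned a list in Python 2 but returns a view in Python 3;
    # sorted() accepts any iterable and returns a new list on both versions.
    return sorted(mapping)


def as_list(iterable):
    # map(), filter() and zip() are lazy iterators in Python 3; materialise them
    # wherever the result is indexed or iterated more than once.
    return list(iterable)


def sorted_nulls_last(values):
    # Python 2 could order None against strings; Python 3 raises TypeError.
    # Sorting on (is_none, value) only ever compares values of like types.
    return sorted(values, key=lambda v: (v is None, v))
```

Wrapping results explicitly keeps the behaviour identical on both interpreters instead of relying on Python 2's implicit lists.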

Better JSON support

  • Implemented with xmltodict, which provides parsing/serialization with a closer mapping to and from an XML content model
  • More compact JSON output (~30% reduction in payload size)
  • Closer representation / better transformation of native XML
  • Easier for downstream applications to process
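The compact mapping can be approximated with the standard library alone. This sketch is a simplified stand-in for xmltodict, not its implementation: it keys each child element by its prefixed tag, turns repeated elements into lists and ignores attributes (xmltodict would emit those with an `@` prefix). The namespace-to-prefix table is an assumption matching the sample record in the code.

```python
import json
import xml.etree.ElementTree as ET

# Namespace URI -> prefix table (assumed; matches the sample record below)
NAMESPACES = {
    "http://www.opengis.net/cat/csw/3.0": "csw30",
    "http://purl.org/dc/elements/1.1/": "dc",
    "http://purl.org/dc/terms/": "dct",
}


def prefixed(tag):
    """Turn ElementTree's '{uri}local' tag back into 'prefix:local'."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return f"{NAMESPACES.get(uri, uri)}:{local}"
    return tag


def to_compact(elem):
    """Leaf elements map to their text; others to a dict keyed by child tag."""
    children = list(elem)
    if not children:
        return elem.text
    result = {}
    for child in children:
        key, value = prefixed(child.tag), to_compact(child)
        if key in result:
            # Repeated elements (e.g. several dc:subject) are collected in a list
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


RECORD = """<csw30:SummaryRecord xmlns:csw30="http://www.opengis.net/cat/csw/3.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dct="http://purl.org/dc/terms/">
  <dc:identifier>urn:uuid:19887a8a-f6b0-4a63-ae56-7fba0e17801f</dc:identifier>
  <dc:title>Lorem ipsum</dc:title>
  <dc:subject>Tourism--Greece</dc:subject>
</csw30:SummaryRecord>"""

root = ET.fromstring(RECORD)
compact = {prefixed(root.tag): to_compact(root)}
print(json.dumps(compact, indent=2))
```

Compared with a generic tag/children/text encoding, each value sits directly under its element name, which is where the payload reduction comes from.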

Better JSON support (before)



    {
      "tag": "csw30:SummaryRecord",
      "children": [
        {
          "text": "urn:uuid:19887a8a-f6b0-4a63-ae56-7fba0e17801f",
          "tag": "dc:identifier"
        },
        {
          "text": "Lorem ipsum",
          "tag": "dc:title"
        },
        {
          "text": "http:\/\/purl.org\/dc\/dcmitype\/Image",
          "tag": "dc:type"
        },
        {
          "text": "Tourism--Greece",
          "tag": "dc:subject"
        },
        {
          "text": "image\/svg+xml",
          "tag": "dc:format"
        },
        {
          "text": "Quisque lacus diam, placerat mollis, pharetra in, commodo sed, augue. Duis iaculis arcu vel arcu.",
          "tag": "dct:abstract"
        }
      ]
    }


Better JSON support (after)



  "csw30:SummaryRecord": [
    {
      "dc:identifier": "urn:uuid:19887a8a-f6b0-4a63-ae56-7fba0e17801f",
      "dc:title": "Lorem ipsum",
      "dc:type": "http:\/\/purl.org\/dc\/dcmitype\/Image",
      "dc:subject": "Tourism--Greece",
      "dc:format": "image\/svg+xml",
      "dct:abstract": "Quisque lacus diam, placerat mollis, pharetra in, commodo sed, augue. Duis iaculis arcu vel arcu."
    }
  ]


More pycsw 2.0.0 features

  • WMTS harvesting support
  • XML output improvements
  • GM03 support for Swiss metadata
  • Added temporal extent support to WMS layer harvesting

Architecture

Component Architecture

Software Architecture

Installation

Installation: the proper way

Installation: 4 Minute Install



# Setup a virtual environment:
$ virtualenv pycsw && cd pycsw && . bin/activate

# Grab the pycsw source code:
$ git clone https://github.com/geopython/pycsw.git && cd pycsw
$ pip install -e . && pip install -r requirements-standalone.txt

# Create and adjust a configuration file:
$ cp default-sample.cfg default.cfg
$ vi default.cfg
# adjust paths in
# - server.home
# - repository.database
# set server.url to http://localhost:8000/

# Setup the database:
$ pycsw-admin.py -c setup_db -f default.cfg

# Load records by indicating a directory of XML files, use -r for recursive:
$ pycsw-admin.py -c load_records -f default.cfg -p /path/to/xml/

# Run the server:
$ python ./pycsw/wsgi.py

# See that it works!
$ curl "http://localhost:8000/?service=CSW&version=2.0.2&request=GetCapabilities"
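The final curl check can also be scripted. This sketch fetches the Capabilities document and reports its `version` attribute, raising an error if the server answers with something else (for example an `ows:ExceptionReport`). It assumes the server started by `wsgi.py` is listening on port 8000 as configured above; the function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET


def capabilities_version(xml_bytes):
    """Return the version attribute of a CSW Capabilities document."""
    root = ET.fromstring(xml_bytes)
    # Compare the local name only, so csw:Capabilities and csw30:Capabilities both pass
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "Capabilities":
        raise ValueError(f"Unexpected root element: {root.tag}")
    return root.get("version")


def check_server(url="http://localhost:8000/?service=CSW&version=2.0.2&request=GetCapabilities"):
    """Fetch GetCapabilities from a running server and return the reported version."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return capabilities_version(response.read())
```

Calling `check_server()` against the running instance should report the requested CSW version.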


OSGeo-Live

  • pycsw has been available for testing in OSGeoLive since version 5.5
  • Project Overview and Quickstart Tutorial are included

Downstream Projects and Deployments

GeoNode

Open Source Geospatial Content Management System

  • GeoNode is a web-based application and platform for developing geospatial information systems (GIS) and for deploying spatial data infrastructures (SDI).
  • pycsw is embedded and enabled as the default CSW server

CKAN

Open Data Catalog

  • Code For America Application
  • Open data publishing
  • pycsw is embedded and enabled out of the box

HHypermap (Harvard Hypermap)

Boundless Exchange

  • Boundless Exchange is a web-based GIS platform. It facilitates the creation, sharing, and collaborative use of geospatial data
  • Boundless Exchange is built on top of GeoNode, pycsw, HHypermap, GeoGig and GeoServer

Data.gov

Recent Deployments

See more at the pycsw Live Deployments Map

Future Developments

Future Developments

  • Support for GeoDCAT-AP
  • Support for Elasticsearch backend
  • Support for Solr backend
  • Improved testing (py.test)

Community

OSGeo Incubation process

Getting Involved

Community

http://pycsw.org/community.html

Mailing List and IRC

http://lists.osgeo.org/mailman/listinfo/pycsw-devel

#pycsw and #geopython on Freenode

Source Code, Wiki, Issues on GitHub

https://github.com/geopython/pycsw

Professional support

http://www.osgeo.org/search_profile?SET=1&MUL_TECH[]=00107

Workshop

Oregon Coastal & Marine Data Network pycsw Workshop materials available at http://www.coastalmarinedata.net/?p=229

Workshop source code available on https://github.com/geopython/pycsw-workshop

pycsw 2.0.0 codenamed "Doug"

The 2.0.0 release is codenamed “Doug” in honour of Doug Nebert of the FGDC. Doug was internationally recognized as a champion of metadata, discovery and interoperability. Involved in numerous international standards bodies and spatial data infrastructure initiatives, Doug was one of the editors of the CSW 3.0 specification and encouraged pycsw developers to adopt and implement CSW 3.0 as part of US data.gov efforts. Doug’s vision and expertise will always be remembered and appreciated by the pycsw development team.

Thank you

Questions?