EnviLOD Workshop Announcement

How do you discover material relevant to your research in environmental science? Do you think that the process could be better?

In the EnviLOD project we have been exploring the potential of Linked Open Data vocabularies to improve information discovery in environmental science. We have developed a new tool, based upon input from the flooding community, which demonstrates how semantic technologies can enhance environmental information discovery.

We are organising an EnviLOD dissemination workshop, which will provide an introduction to the EnviLOD project and present the newly developed LOD-based semantic enrichment tools and the associated semantic search interface.

The afternoon session will offer both more technical and less technical break-out groups, so the workshop is relevant to anybody with an interest in the discovery, management or use of environmental information.

  25 January 2013
  11.00 – 15.00

  The British Library Conference Centre
  96 Euston Rd, London NW1 2DB

EnviLOD Workshop Registration Form

#EnviLOD: User Requirements Survey Results Now Published

The British Library and HR Wallingford, as part of our work on #EnviLOD, carried out a user survey, which identified the following potential use cases for vocabularies to be tested in the semantic enrichment and search workpackages:

  1. Returning results for geographically specific queries. Beyond keyword recognition, this use case includes proximity and recognition of geographic entities that are implied, but not stated within the query (for example in a query for flooding in SW England, identifying towns such as Exeter within that region, without it being explicitly articulated in the query).
  2. Answering non-open-ended queries. In this case, the user is asking for a specific piece of information, which might pertain to a budget, a specific piece of legislation, flood levels in a particular locality, etc. For example: What is the annual flood defence expenditure in The Netherlands? These are questions that can be definitively answered, and which semantic search algorithms can likely be trained to answer with relative ease.
  3. Answering open-ended queries. In this case, a user is conducting research with the aim of learning more about a particular topic. There is no definitive ‘answer’ to the query; the question is answered once the user has established that s/he has sufficient information on the topic. For example: What are some examples of community engagement relating to flood risk management? A LOD approach is likely to find it harder to add value for these questions, but they nevertheless represent an important type of question asked by survey respondents.
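
Use case 1 can be illustrated with a minimal Python sketch. The containment table is a hand-written stand-in for what a LOD gazetteer would supply; the place names other than Exeter, and the function names `expand_region` and `matches`, are illustrative assumptions rather than part of the EnviLOD tools.

```python
# Toy containment hierarchy: region -> places directly inside it.
# Assumption: in practice this would be derived from a LOD gazetteer.
CONTAINS = {
    "SW England": ["Devon", "Cornwall"],
    "Devon": ["Exeter", "Plymouth"],
    "Cornwall": ["Truro"],
}


def expand_region(region):
    """Return the region plus every place transitively contained in it."""
    found = [region]
    for child in CONTAINS.get(region, []):
        found.extend(expand_region(child))
    return found


def matches(query_region, document_places):
    """True if a document mentions the region or any place inside it."""
    wanted = set(expand_region(query_region))
    return any(place in wanted for place in document_places)
```

With this table, a query for flooding in "SW England" would also match a document that only mentions Exeter, even though Exeter never appears in the query.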

In general, users were found to prefer Google-style keyword searches, or searches in which they could pose a question, over other types of search. The amount of subject-specific jargon used in their queries depended on the nature of the question that was asked, as well as the job held by the individual who was asking it. As such, the LOD vocabularies used in this work need to be flexible, enabling generalist queries, while also allowing subject-specific queries, where possible.

For further details, please see this public deliverable.

GATE and EnviLOD at the JISC Research Tools programme meeting

Today I went to Birmingham for the #JISCrestools programme meeting, organised by Christopher Brown and Torsten Reimer.

My presentation was on our latest #EnviLOD research on semantic annotation with Linked Open Data (DBpedia, GeoNames, and GEMET), coupled with a demo of the Mimir-based semantic search over environmental science literature.

It was a really good meeting, especially seeing in more detail related work around:

  • text mining and social science tools for analysing social media (COSMOS and the Twitter analysis workbench); 
  • SKOS-HASSET on turning the HASSET thesaurus into SKOS and publishing as Linked Open Data, as well as using it for automated indexing; 
  • the INSPIRES project on finding links between researchers;
  • the Histore project which created training modules on text mining for historians, including GATE;
  • the eHealth GATEWay to the Clouds project which will soon publish some GATE plugins for anonymisation of electronic patient records;
  • the COMTAX project on community-driven curation of taxonomic databases.
There were many other very interesting projects. There is no time to write about them all now, but they are listed here.

#EnviLOD: Project Risks and Budget

Like any project involving software development, as well as research, #EnviLOD is facing a number of risks, detailed below:

EnviLOD Budget

EnviLOD's total budget is £69,771, with £55,816 being funded by JISC.  The budget breaks down as follows:

#EnviLOD: Project Timeline and Work packages

Our project started in June 2012 and is due to finish on December 31st, 2012.  We have just completed the user requirements gathering stage and are writing up the corresponding deliverable. As soon as it is ready, we will share it here for feedback.  We also had our third meeting today, discussing the work carried out in the past two weeks on user engagement and LOD-based semantic enrichment. 

In the meantime, here are some more details on the project workplan:




  1. Project Management
  2. User Engagement & Case Studies
  3. Linked Environment Data Enrichment
  4. User-Friendly Semantic Search over Linked Data
  5. Evaluation
  6. Dissemination & Engagement

WP 1: Project Management (Responsible partner: Sheffield)

The cross-institutional nature of the project necessitates close liaison between Sheffield, the British Library (BL) and HR Wallingford. In addition to the communication that arises from collaborative working, monthly teleconferences and regular face-to-face meetings will be used to advance the project and monitor progress.
Deliverables: Project plan. Legacy plan, including sustainability and support. Final report.

WP 2: User Engagement and Case Studies (BL, HR Wallingford)

This WP covers engagement with environmental science researchers and other key stakeholders. This takes place throughout the project, but in particular: (i) early in the project, to produce detailed requirements and use cases, based on interviews; (ii) later in the project, when we will test the utility of Linked Data and assess how the vocabularies support the needs of researchers and practitioners, and whether the Linked Open Data (LOD) approach produces an added benefit in comparison with keyword searching.
Deliverables: Stakeholder analysis, requirements and use cases; User feedback.

WP 3: Linked Environment Data Enrichment (Sheffield)

This WP will deliver semantic enrichment tools, based on relevant LOD vocabularies. Where required, relevant ontologies not already connected to existing Linked Environment Data will be integrated. Sheffield’s open-source tools for lookup and term disambiguation with respect to Linked Data vocabularies will be tested and adapted to the environmental science domain. As part of this work, we are evaluating the coverage and accuracy of relevant general-purpose LOD datasets (namely GeoNames and DBpedia), when applied to data and content from our domain. Tools for LOD-based geo-location disambiguation, and for date and measurement recognition and normalisation, will also be delivered.
Our solution is based on Ontotext's high performance OWLIM semantic repository, the open-source GATE semantic annotation tools, and their integration with Linked Data endpoints. We import Linked Data into the semantic repository, which provides a SPARQL endpoint and also full text, metadata, and semantic annotation indices, which underpin the semantic search UI.
Deliverables: Open source tools for semantic enrichment with Linked Environment Data.

WP 4: User-Friendly Semantic Search over Linked Data (Sheffield)

GATE Mimir (Multi-paradigm Information Management Indexing and Retrieval) is an open-source software framework for multi-paradigm indexing and searching of semantically annotated documents. Enriching documents with explicit semantics allows users to search more effectively for ambiguous names such as London (Ontario) and London (UK). The multi-paradigm aspect of Mimir refers to the accessing and linking together of multiple information sources, such as the textual content of the documents, the semantic metadata, and the knowledge encoded in the Linked Data vocabularies. Accessing knowledge from Linked Data allows Mimir to understand generalisations, making it capable of answering more complex information needs, such as identifying documents that refer to water levels at the Thames barrier as relevant to a keyword search for flooding in south-east Britain. At the same time, the explicit LOD semantics associated with the indexed semantic metadata and content ensure that references to places called London (other than the one in the UK) are not treated as relevant results for such a query.
This WP will develop a customised semantic search interface, which enables users to carry out such powerful searches and fully benefit from the knowledge contained in Linked Data, without needing to write SPARQL queries.
Deliverable: A web-based interface for semantic search with Linked Environment Data.
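
The London example above can be sketched in a few lines of Python. This is a toy illustration of combining keyword matching with entity identifiers, not the Mimir API or its query language; the `Document` class, the two search functions and the sample texts are all hypothetical, and the two DBpedia resource URIs are used only as example identifiers.

```python
from dataclasses import dataclass, field

# Example identifiers for the two places called London.
LONDON_UK = "http://dbpedia.org/resource/London"
LONDON_ONTARIO = "http://dbpedia.org/resource/London,_Ontario"


@dataclass
class Document:
    text: str
    # Mention string -> URI assigned by a (hypothetical) enrichment step.
    annotations: dict = field(default_factory=dict)


def keyword_search(docs, keyword):
    """Plain text match: cannot tell the two Londons apart."""
    return [d for d in docs if keyword.lower() in d.text.lower()]


def semantic_search(docs, keyword, entity_uri):
    """Keyword match restricted to documents annotated with entity_uri."""
    return [
        d for d in keyword_search(docs, keyword)
        if entity_uri in d.annotations.values()
    ]


docs = [
    Document("Flooding closed roads in London after heavy rain.",
             {"London": LONDON_UK}),
    Document("Flooding was reported in London, Ontario.",
             {"London": LONDON_ONTARIO}),
]
```

Here `keyword_search(docs, "London")` returns both documents, whereas `semantic_search(docs, "flooding", LONDON_UK)` returns only the first, because the second is annotated with a different identifier.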

WP 5: Evaluation (Sheffield and BL)

First, a quantitative evaluation of the accuracy of semantic enrichment and Linked Data vocabulary coverage will be carried out, based on a human-annotated gold standard and established metrics such as F-measure. In addition, a comparative evaluation of the new semantic search web interface against the current keyword-search Envia tool will be completed, using a set of search queries supplied by the BL. Evaluation will be carried out in the context of the user requirements developed in WP2.

Deliverables: Quantitative evaluation results; A report detailing the lessons learned.
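
As a rough illustration of the F-measure mentioned above, the following Python sketch computes precision, recall and F1 for a set of predicted annotations against a gold standard. It assumes strict matching, where an annotation counts as correct only if its span and identifier both equal a gold annotation; the project's actual matching criteria may differ, and the function name and sample tuples are illustrative.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 under strict (exact) matching.

    Each annotation is any hashable value, e.g. a (start, end, uri) tuple.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data: four gold annotations, four predictions, three shared.
gold = {(0, 6, "uri:a"), (10, 15, "uri:b"), (20, 26, "uri:c"), (30, 34, "uri:d")}
predicted = {(0, 6, "uri:a"), (10, 15, "uri:b"), (20, 26, "uri:c"), (40, 44, "uri:e")}
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

F1 is the harmonic mean of precision and recall, so a tool cannot score well by annotating very little (high precision, low recall) or by annotating everything (high recall, low precision).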

WP 6: Dissemination and Engagement (Sheffield, BL, HR Wallingford)

The project will devote significant effort to dissemination, including practical activities such as demonstrations and tutorials, to show how project outputs might be exploited in other institutions. Details of planned dissemination activities are provided below.

Deliverables: Presentations; research paper; online demonstration; training materials; blog; website; user workshop; engagement with JISC programme manager and related projects.
