Alexander Artikis

 

 

Datasets for Symbolic Event Recognition

Below are a few datasets that may be used for symbolic event recognition. All of them are represented in an Event Calculus syntax and include streams of derived events that are used for the recognition of composite events of interest. They may be used, among other things, for:

  • testing the recognition accuracy of manually developed composite event definitions;
  • testing the recognition accuracy of composite event definitions that have been automatically constructed (by means of machine learning techniques);
  • testing the efficiency of event recognition algorithms.
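As an illustration of the first two uses, recognition accuracy can be measured by comparing the intervals produced by a set of composite event definitions against the annotated intervals. The sketch below is only one possible way of doing this: it assumes that both are available as (start, end) pairs of integer time-points with an exclusive end, which is an encoding chosen here for illustration and not necessarily the one used in the dataset files. The function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumed encoding: (start, end) with integer time-points, end exclusive.
Interval = Tuple[int, int]


def time_points(intervals: Iterable[Interval]) -> Set[int]:
    """Expand a collection of intervals into the set of time-points they cover."""
    points: Set[int] = set()
    for start, end in intervals:
        points.update(range(start, end))
    return points


def precision_recall_f1(
    recognised: Iterable[Interval], annotated: Iterable[Interval]
) -> Tuple[float, float, float]:
    """Compare recognised intervals with annotated ones, time-point by time-point."""
    rec = time_points(recognised)
    ann = time_points(annotated)
    true_positives = len(rec & ann)
    precision = true_positives / len(rec) if rec else 0.0
    recall = true_positives / len(ann) if ann else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # A composite event recognised over [0, 10) but annotated over [5, 15).
    print(precision_recall_f1([(0, 10)], [(5, 15)]))
```

Counting per time-point rather than per interval penalises recognised intervals that start late or end early, which may or may not be the appropriate measure for a given application.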

Please email me if you have any queries concerning these datasets, or if you have datasets that you would like to make available through this web page.

Public Space Surveillance

This dataset comes from the CAVIAR project and includes 28 surveillance videos of a public space. The videos are staged: actors walk around, sit down, meet one another, leave objects behind, fight, and so on. Each video has been manually annotated in order to provide the ground truth for both short-term activities (walking, running, being active, being inactive), detected on each video frame, and long-term activities (leaving an object unattended, people meeting, moving together, fighting, etc.), which take place over frame sequences. Details about this dataset, including the actual videos, may be found here. The first CAVIAR dataset in an Event Calculus syntax is available here. In brief, the annotated short-term activities and context information are given in the files entitled 'appearanceIndv' and 'movementIndv', whereas the annotated long-term activities are given in the files entitled 'situation' and 'context' (when available).

We have edited the first CAVIAR dataset in order to introduce the 'abrupt' short-term activity. A person is said to exhibit an 'abrupt' activity if they move abruptly while their position in the global coordinate system does not change significantly; had the position changed significantly, the short-term activity would be classified as 'running'. The annotated long-term activities have not been edited. The edited CAVIAR dataset may be found here.
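The distinction between 'abrupt' and 'running' described above can be sketched as a simple rule. This is only an illustration: the function name, the way displacement is measured and the threshold value are all assumptions made here, and the edited dataset does not necessarily derive its labels in this way.

```python
from typing import Optional


def classify_abrupt_or_running(
    moves_abruptly: bool,
    displacement: float,
    threshold: float = 5.0,  # hypothetical value for a "significant" position change
) -> Optional[str]:
    """Return 'abrupt', 'running', or None if the motion is not abrupt at all.

    `displacement` is assumed to be the change in the person's position in the
    global coordinate system over some window of frames.
    """
    if not moves_abruptly:
        # Covered by the other short-term activities (walking, active, inactive).
        return None
    if displacement > threshold:
        # Abrupt motion with a significant change of position.
        return "running"
    # Abrupt motion while remaining roughly in place.
    return "abrupt"


if __name__ == "__main__":
    print(classify_abrupt_or_running(True, 1.2))
    print(classify_abrupt_or_running(True, 12.0))
```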

Thanks to Anastasios Skarlatidis for translating the CAVIAR dataset from XML to an Event Calculus syntax.

  • Input: Instantaneous short-term activities: walking, running, active, inactive, abrupt motion.
  • Output: Durative long-term activities: leaving object unattended, people meeting, moving together, fighting, etc.
  • Size: 26419 video frames in which, on average, 2-3 short-term activities take place.
  • Noise: There are various types of noise. See, for example, this paper.
  • Known Experiments on Dataset: See the original dataset description and this paper.
  • Dataset in Other Formats: XML.

 

City Transport Management

In the context of the PRONTO project, an event recognition system is being developed with the aim of supporting the management of public transport. Buses and trams are equipped with in-vehicle units that send GPS coordinates to a central server, offering information about the current status of the transport system (for example, the location of buses and trams on the city map). Additionally, buses and trams are being equipped with sensors for in-vehicle temperature, in-vehicle noise level and acceleration. Derived events will be extracted from these sensors, from other data sources such as digital maps, and from the communication between the drivers and the public transport control centre. Given these derived events, composite events will be recognised related to, among others, the punctuality of a vehicle, passenger and driver comfort, passenger and driver safety, and passenger satisfaction.

The dataset in an Event Calculus syntax is available here. Annotation.txt describes the annotated composite events while the remaining files describe events derived from sensor data.

  • Input: Instantaneous and durative events: enter stop, leave stop, internal temperature change, noise level change, enter intersection, abrupt acceleration, abrupt deceleration, sharp turn.
  • Output: Instantaneous and durative composite events: punctuality change, uncomfortable driving style, low quality driving, reducing passenger satisfaction, etc.
  • Size: 1300 events derived from sensor data concerning 4 vehicles (2 trams and 2 buses).
  • Noise: No.
  • Known Experiments on Dataset: See Deliverable 4.1.1 of the PRONTO project.
  • Dataset in Other Formats: No.

 

Cardiac Arrhythmia Recognition

This dataset contains sequences of time-stamped symbolic abstractions of ECG cardiac waves (QRS waves and P waves) from real human ECG records. These records represent a subset of the ECG records available at the MIT-BIH Arrhythmia Database Directory. The task is to recognise episodes of arrhythmia given the occurrence time and type of the different ECG waves. Details about this task may be found here. QRS waves and arrhythmia episodes have been manually annotated by cardiologists, while P waves have been manually annotated by non-cardiologist experts (the annotations are thus provided without any warranty). The dataset in Event Calculus format is available here. The manually annotated episodes of arrhythmia are given in the .train files while the symbolic representation of the ECG waves is given in the .data files.

Note that automatically detected ECG cardiac waves are noisy. This is in contrast to the manually annotated ECG waves found in this dataset.

This dataset has been kindly offered by Francois Portet. Please contact him for further information on this dataset and event recognition application.

  • Input: Symbolic representation of instantaneous ECG waves.
  • Output: Durative arrhythmia episodes.
  • Size: 16000 ECG waves.
  • Noise: No.
  • Known Experiments on Dataset: See this PhD thesis (in French) and the Calicot website.
  • Dataset in Other Formats: The dataset in CRS format is available here.

 

Humpback Whale Songs

This dataset contains a symbolic representation of recordings of humpback whale songs, recorded in Hawaii in 1978. The task here is the thematic analysis of the songs, which have a very rich structure. In other words, the task is to recognise (various parts of) a whale song given various types of whale sound. Details about this dataset may be found here. The dataset in Event Calculus format is available here. In brief, the manually annotated parts of a whale song are given in the .train files while the symbolic representation of whale sounds is given in the .data files.

  • Input: Durative whale sounds.
  • Output: Durative whale songs.
  • Size: 2500 whale sounds.
  • Noise: No.
  • Known Experiments on Dataset: See this PhD thesis.
  • Dataset in Other Formats: No.

Thanks to Nikos Katzouris for representing this dataset in an Event Calculus syntax.

Virtual Soccer

This dataset concerns virtual soccer games played in Second Life. The task here is to recognise the composite events 'passing the ball' and 'scoring a goal' given the primitive actions of players, such as running and kicking the ball. Information about this dataset may be found here. The dataset in an Event Calculus syntax will be available in the following months.

This dataset has been kindly offered by Surangika Ranathunga and Stephen Cranefield.

  • Input: Durative events: kicking the ball, moving, running, etc.
  • Output: Durative events: passing the ball, scoring a goal.
     

 

 

 

 

 

 
