As more and more datasets become available on the Web and accessible to a broad range of heterogeneous users, the risk of applying inappropriate analyses increases. For example, inexperienced users may interpolate the carbon dioxide (CO2) emissions of coal power plants to arbitrary locations in space. But how can one interpret the CO2 emissions at a location where no power plant exists? We argue that it is only meaningful to predict (or interpolate) values for observable phenomena. In the example of CO2 emissions per power plant, this means that emissions cannot be observed everywhere, but only at the locations of power plants. Our aim is to formalize this idea so that it is machine readable, and to provide basic recommendations for appropriate analyses, or warnings against inappropriate ones, in statistical software.
For meaningful spatial statistics, we combine knowledge engineering approaches, which allow semantics to be formalized, with statistical analysis, in order to support the choice of statistical procedures that are appropriate for a particular dataset.
In spatial statistics, we distinguish different data types, such as point patterns, geostatistical data, lattice data and trajectories, and develop different methods to answer particular questions for each type of data. The information systems we use to manage and exchange data typically do not make the same distinction. It therefore becomes very easy to carry out analyses that are syntactically correct but semantically meaningless, and we will present a number of such cases.
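To make the gap between syntax and semantics concrete, the following minimal Python sketch attaches a declared data type to a dataset and looks up whether a requested operation fits it. The names `SpatialDataType`, `MEANINGFUL_OPERATIONS` and `check_operation` are hypothetical, and the small rule table is an illustrative assumption based on the CO2 example, not the project's actual rule set.

```python
from enum import Enum


class SpatialDataType(Enum):
    """The four data types named in the text."""
    POINT_PATTERN = "point pattern"
    GEOSTATISTICAL = "geostatistical data"
    LATTICE = "lattice data"
    TRAJECTORY = "trajectory"


# Illustrative assumption: a tiny, incomplete rule table.
# Types without an entry are treated as "no rule known".
MEANINGFUL_OPERATIONS = {
    SpatialDataType.POINT_PATTERN: {"intensity_estimation"},
    SpatialDataType.GEOSTATISTICAL: {"interpolation"},
}


def check_operation(data_type: SpatialDataType, operation: str) -> str:
    """Return 'ok', 'warn', or 'unknown' for an operation on a declared type."""
    allowed = MEANINGFUL_OPERATIONS.get(data_type)
    if allowed is None:
        return "unknown"  # no rule recorded for this data type
    return "ok" if operation in allowed else "warn"


# The same (x, y, value) table gives different answers
# depending on the data type declared for it.
print(check_operation(SpatialDataType.GEOSTATISTICAL, "interpolation"))
print(check_operation(SpatialDataType.POINT_PATTERN, "interpolation"))
```

A table of coordinates and values looks identical in both cases; only the declared type lets software warn that interpolating per-plant emissions is questionable.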
We have formalized the notion of meaningfulness in the context of spatio-temporal prediction and spatio-temporal aggregation. Meaningful prediction implies correspondence between the prediction function and the observation function: each prediction should predict an observable quantity. Meaningful aggregation requires correspondence between the observation window (the region or set of points over which the data are exhaustive) and the aggregation target (a region or set of points). Our theory has been formalized in a higher-order logic language (Isabelle), and an OWL (Web Ontology Language) pattern is provided for the semantic annotation of datasets, which could be used to prohibit meaningless spatial statistical operations on these data. Concrete use cases and workflows for the latter will be presented on this website.
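The two conditions can be illustrated with a small Python sketch. It is not the Isabelle theory or the OWL pattern. It is a simplified stand-in that assumes axis-aligned rectangles for regions, treats the observation window as the domain of a field, and reads "correspondence" for aggregation as "the target lies inside the window". All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle, a stand-in for an arbitrary region."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains_point(self, p: Point) -> bool:
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax

    def contains_box(self, other: "Box") -> bool:
        return (self.xmin <= other.xmin and self.ymin <= other.ymin
                and other.xmax <= self.xmax and other.ymax <= self.ymax)


@dataclass(frozen=True)
class FieldData:
    """Phenomenon assumed observable at every location inside `window`."""
    window: Box


@dataclass(frozen=True)
class MarkedPoints:
    """Phenomenon observable only at `sites` (e.g. emissions per power plant);
    `window` is the region over which the sites are exhaustively known."""
    window: Box
    sites: FrozenSet[Point]


def prediction_is_meaningful(data: Union[FieldData, MarkedPoints],
                             location: Point) -> bool:
    """A prediction is meaningful only where the quantity could be observed."""
    if isinstance(data, FieldData):
        return data.window.contains_point(location)
    return location in data.sites


def aggregation_is_meaningful(data: Union[FieldData, MarkedPoints],
                              target: Box) -> bool:
    """Simplified check: the target must be covered by the observation window."""
    return data.window.contains_box(target)


plants = MarkedPoints(Box(0, 0, 10, 10), frozenset({(1.0, 1.0), (5.0, 5.0)}))
print(prediction_is_meaningful(plants, (3.0, 3.0)))         # no plant at this location
print(aggregation_is_meaningful(plants, Box(5, 5, 15, 15)))  # target leaves the window
```

In this sketch, the CO2 example fails the prediction check at any location without a power plant, while summing emissions over a region succeeds only if the plants within that region are known exhaustively.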