Methodologies for Big Data Analytics
A programme of methodological research on the quality of data, analysis and modelling techniques for Big Data, led by Professor Maria Fasli.
About this research stream
This research underpins the work of the Centre with a focus on techniques and methods for the quality, pre-processing and analysis of Big Data. It also addresses the modelling and prediction of complex and adaptive socio-economic systems. This research focuses on the following areas:
This research will develop new methodologies, and adapt existing ones, for merging data from multiple sources. It will also develop robust techniques for data quality grading and assurance, providing automated data-quality and cleaning procedures for use by researchers.
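As an illustration only (not the Centre's methodology), merging records from two hypothetical sources and grading each merged record by its share of filled fields could be sketched as:

```python
# Minimal sketch: left-join records from two sources on a shared key,
# then grade each merged record's quality by the fraction of fields
# that are actually populated.

def merge_sources(source_a, source_b, key="id"):
    """Left-join records from source_b onto source_a by `key`."""
    index_b = {rec[key]: rec for rec in source_b}
    merged = []
    for rec in source_a:
        combined = dict(rec)
        combined.update(index_b.get(rec[key], {}))
        merged.append(combined)
    return merged

def quality_grade(record):
    """Share of fields that are present (not None or empty string)."""
    values = list(record.values())
    filled = sum(1 for v in values if v not in (None, ""))
    return filled / len(values)

a = [{"id": 1, "name": "Acme", "turnover": None}]
b = [{"id": 1, "postcode": "CO4 3SQ"}]
merged = merge_sources(a, b)
print(round(quality_grade(merged[0]), 2))  # 0.75 (3 of 4 fields filled)
```

In practice the grading step would weight fields by importance and check validity as well as presence; the sketch shows only the overall shape of an automated grading pass.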
Methods will be developed to automatically identify "unusual" data segments through an ICMetrics-based technique. Such methods will be able to alert researchers to specific data segments that require further analysis, and to identify potential issues with unsolicited data manipulation and integrity breaches.
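This is not the ICMetrics technique itself, but the underlying idea, flagging segments whose statistics deviate strongly from the rest, can be sketched generically:

```python
# Generic sketch (not ICMetrics): flag data segments whose mean lies
# more than `threshold` standard deviations from the mean of segment means.
import statistics

def flag_unusual_segments(segments, threshold=2.0):
    """Return indices of segments with outlying means."""
    means = [statistics.mean(s) for s in segments]
    centre = statistics.mean(means)
    spread = statistics.stdev(means)
    return [i for i, m in enumerate(means)
            if spread and abs(m - centre) / spread > threshold]

segments = [[1, 2, 1], [2, 1, 2], [1, 1, 2], [90, 95, 92]]
print(flag_unusual_segments(segments, threshold=1.4))  # [3]
```

A flagged index would then be surfaced to the researcher for manual inspection rather than acted on automatically.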
Textual data represents rich information, but it lacks structure and requires specialist techniques to mine and link it properly, as well as to reason with it and draw useful correlations. A set of techniques will be developed for extracting entities, the relations between them, opinions and other elements, to support semantic indexing, visualisation and anonymisation.
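A toy pattern-matching sketch conveys the extraction idea (the patterns and the "works for" relation are purely illustrative, not the techniques under development):

```python
# Toy entity/relation extraction: capitalised phrases as candidate
# entities, and a hard-coded "X works for Y" pattern as a relation.
import re

ENTITY = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b")
RELATION = re.compile(r"([A-Z][a-z]+(?: [A-Z][a-z]+)*) works for "
                      r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)")

def extract(text):
    """Return (entities, relations) found in `text`."""
    return ENTITY.findall(text), RELATION.findall(text)

ents, rels = extract("Alice Jones works for Acme Widgets.")
print(ents)  # ['Alice Jones', 'Acme Widgets']
print(rels)  # [('Alice Jones', 'Acme Widgets')]
```

Real extraction pipelines use statistical models rather than fixed patterns, but the output shape, entities plus typed relations between them, is what semantic indexing, visualisation and anonymisation would consume.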
Data generated via the interaction of users online contains a wealth of information. This research will investigate automatic methods for tracking interactions that can be used, for example, to identify service pathways in local government or business data, to aid organisations in improving service delivery to citizens and customers. Methods will also be developed to identify the context of an interaction and individual users' needs, so that tailor-made services can be provided.
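One way to picture pathway identification, sketched here over a hypothetical event schema, is to reconstruct each user's ordered sequence of service touchpoints and count the most common sequences:

```python
# Illustrative sketch: rebuild per-user service pathways from an
# interaction log of (user, timestamp, service) events, then count them.
from collections import Counter
from itertools import groupby

def pathways(events):
    """Return a Counter mapping each observed pathway to its frequency."""
    events = sorted(events)  # orders by user, then timestamp
    paths = Counter()
    for user, group in groupby(events, key=lambda e: e[0]):
        path = tuple(service for _, _, service in group)
        paths[path] += 1
    return paths

log = [("u1", 1, "enquiry"), ("u1", 2, "application"), ("u1", 3, "payment"),
       ("u2", 1, "enquiry"), ("u2", 2, "application"), ("u2", 3, "payment")]
print(pathways(log).most_common(1))
# [(('enquiry', 'application', 'payment'), 2)]
```

Frequent pathways highlight where most users flow through a service; rare or broken pathways point to delivery problems worth investigating.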
This research will also investigate machine learning and other methods for identifying stylised facts; seasonal, spatial or other relations; and patterns of behaviour at the level of the individual, group or region, from transactional data held by businesses, local government and other organisations. Such methods can provide essential decision-support information to organisations planning services based on predicted trends, spikes or troughs in demand.
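The simplest instance of such a pattern, a seasonal demand profile, can be sketched as an average of transaction volumes by month (the data here is invented for illustration):

```python
# Minimal sketch: mean transaction volume per month, exposing a
# seasonal profile in hypothetical demand data.
from collections import defaultdict

def seasonal_profile(transactions):
    """transactions: (month, volume) pairs -> mean volume per month."""
    by_month = defaultdict(list)
    for month, volume in transactions:
        by_month[month].append(volume)
    return {m: sum(v) / len(v) for m, v in sorted(by_month.items())}

data = [(1, 100), (1, 120), (7, 300), (7, 340), (12, 90)]
print(seasonal_profile(data))  # {1: 110.0, 7: 320.0, 12: 90.0}
```

A profile like this, fed into planning, is what lets an organisation anticipate the July spike rather than react to it.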
Models and statistical methods for the analysis of local government health and social care data will be developed, alongside new data mining and machine learning algorithms to identify intervention subgroups, and new joint modelling methods to improve existing predictive models, with a view to evaluating, targeting and monitoring the provision of care.
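A much-simplified sketch of subgroup identification (synthetic records and a single hand-picked covariate, not the algorithms under development) is to split individuals on a candidate covariate and compare outcome rates across the resulting subgroups:

```python
# Simplified sketch: outcome rate above vs. at/below a cutoff on one
# covariate, the basic comparison behind subgroup identification.

def subgroup_rates(records, covariate, cutoff, outcome="readmitted"):
    """Compare the outcome rate across a one-covariate split."""
    def rate(values):
        return sum(values) / len(values) if values else 0.0
    above = [r[outcome] for r in records if r[covariate] > cutoff]
    below = [r[outcome] for r in records if r[covariate] <= cutoff]
    return {"above": rate(above), "below": rate(below)}

records = [{"age": 82, "readmitted": 1}, {"age": 79, "readmitted": 1},
           {"age": 45, "readmitted": 0}, {"age": 50, "readmitted": 0}]
print(subgroup_rates(records, "age", 65))  # {'above': 1.0, 'below': 0.0}
```

A large gap between the two rates marks a candidate subgroup for targeted intervention; real algorithms search over many covariates and cutoffs with proper statistical control.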
Data vary in content and granularity. Some will be available at the individual or firm level but often, due to various business or privacy preservation considerations, the data will be aggregated to higher levels, such as postcode, ward or institutional level, or aggregated by individual characteristics (e.g. age group). The focus of this project will be on developing meta-analysis and evidence synthesis methods to enable users to undertake unified analysis specifically for the types of data available through the Centre. We shall also develop new methods for indirect comparisons (network meta-analysis) of social interventions.
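A core building block of evidence synthesis is fixed-effect inverse-variance pooling, shown here with illustrative effect sizes rather than Centre data:

```python
# Fixed-effect inverse-variance pooling: combine study-level effect
# estimates, weighting each by the reciprocal of its variance.

def fixed_effect_pool(effects, variances):
    """Return (pooled effect, pooled variance) across studies."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_var = 1.0 / sum(weights)
    return pooled, pooled_var

effect, var = fixed_effect_pool([0.30, 0.10], [0.01, 0.04])
print(round(effect, 2), round(var, 3))  # 0.26 0.008
```

The more precise study (smaller variance) pulls the pooled estimate towards itself; network meta-analysis extends this kind of pooling to indirect comparisons between interventions that were never compared head-to-head.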
Datasets encompass the results of interactions and transactions within complex socio-economic systems. Although the techniques and methods developed under the first theme will enable researchers to analyse and mine these datasets, there is a need to understand the data, behaviours and processes that have produced them at a much deeper level. Alongside analytical models, we will be deploying agent-based modelling and social simulation (ABSS) as an alternative method for exploring complex Big Data. ABSS enables one to alter the rules, interactions and behaviour of the individual components within the system and observe the subsequent impact on both individual and emergent system behaviour. This facilitates alternative and exhaustive scenario testing. ABSS can serve as a decision-support tool for policy makers, helping them identify the issues and factors that matter so that they can better design and implement policies based on the features of their target population. Firms can also use such tools to better understand customer behaviour and market trends.
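A minimal agent-based simulation, with toy rules invented for this sketch rather than any Centre model, shows the "alter a rule, observe the emergent outcome" loop:

```python
# Toy ABSS sketch: agents on a ring adopt a behaviour when the adopting
# share of their neighbours exceeds a threshold; changing that one rule
# parameter changes the emergent adoption level of the whole system.
import random

def simulate(n_agents=50, steps=20, threshold=0.5, seed=42):
    """Return the final share of adopters after `steps` rounds."""
    random.seed(seed)
    adopted = [random.random() < 0.1 for _ in range(n_agents)]  # 10% seeds
    for _ in range(steps):
        new = adopted[:]
        for i in range(n_agents):
            neighbours = [adopted[(i - 1) % n_agents],
                          adopted[(i + 1) % n_agents]]
            if sum(neighbours) / len(neighbours) > threshold:
                new[i] = True
        adopted = new
    return sum(adopted) / n_agents

print(simulate(threshold=0.4))  # contagion from a single neighbour
print(simulate(threshold=0.6))  # adoption needs both neighbours
```

Running the same population under two thresholds is exactly the kind of scenario comparison a policy maker would use: the rule is varied, everything else is held fixed, and the emergent outcome is compared.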
Similar to early warning systems for natural disasters and medical emergencies, such a system for social care would draw attention to a crisis at various levels: locality, institution or individual. This would require a data-sharing platform that can pull together information held by separate agencies, and would create a real-time score for levels of risk based on aggregated values of identified predictors.
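The scoring step alone could look like the following sketch; the predictor names, weights and alert threshold are all hypothetical placeholders, not identified predictors:

```python
# Sketch of the scoring idea: combine aggregated indicator values
# (each scaled to [0, 1]) into one weighted risk score, and alert
# when the score crosses a threshold.

def risk_score(indicators, weights):
    """Weighted sum of aggregated predictor values."""
    return sum(weights[k] * indicators[k] for k in weights)

WEIGHTS = {"missed_visits": 0.5, "staff_shortfall": 0.3, "complaints": 0.2}

locality = {"missed_visits": 0.8, "staff_shortfall": 0.6, "complaints": 0.1}
score = risk_score(locality, WEIGHTS)
print(round(score, 2), "ALERT" if score > 0.5 else "ok")  # 0.6 ALERT
```

The same scoring function applies at any level of aggregation, locality, institution or individual, which is what lets one platform raise alerts at all three.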
View a selection of the latest Methodologies for Big Data Analytics research papers below or you can view them all in our Research Repository.