Financed by: Department of Education and Research of the Regional Government of Madrid, through the 2013 R&D technology program for research groups, co-financed by Structural Funds of the European Union
Big Data is an emerging paradigm for large-scale distributed data management that aims to process amounts of data beyond the reach of traditional database technologies. Big Data leverages cloud computing to attain a highly scalable infrastructure for both computing and storage. The Cloud4BigData project will enhance Big Data technologies and their underlying cloud infrastructure to attain high levels of efficiency, flexibility, scalability, availability, QoS, ease of use, security and privacy.
Big Data is already attaining good results with batch analytical processing technologies such as MapReduce, but it has important gaps. The most important one is the lack of support for other data management needs, namely Online Transactional Processing (OLTP), Online Analytical Processing (OLAP) and Complex Event Processing (CEP). In Cloud4BigData we aim to provide full Big Data support for OLTP, OLAP and CEP. This implies overcoming important challenges such as scaling transactional processing, analytical query processing and complex event processing, as well as integrating these technologies in a single unified platform. Moreover, many Big Data applications require a combination of cloud Big Data technologies specialized for different purposes, such as graph databases, key-value data stores, document-oriented databases, SQL databases, in-memory databases, column-oriented data stores, CEP, etc. Cloud4BigData aims to provide holistic support to ease the development of Big Data applications on top of diverse cloud Big Data stores.
Another important drawback of Big Data technologies is their level of efficiency. Current technologies, such as MapReduce, and the underlying storage, such as the Hadoop File System (HDFS) and the HBase key-value data store, attain high degrees of scalability, reaching 3,000 to 4,000 nodes. Unfortunately, they do so with very low levels of efficiency. In Cloud4BigData we aim to increase the efficiency of Big Data processing by a factor of 4 to 5.
Cloud computing is the underlying infrastructure for Big Data, and it is maturing and becoming widespread. However, cloud technology is still far from Big Data user requirements, especially in terms of efficiency, flexibility, ease of use, SLAs (Service Level Agreements) and security. In terms of IaaS (Infrastructure as a Service), significant improvements are required in areas such as energy efficiency, flexibility in networking (e.g., through software defined networks, virtual networks), simplicity in infrastructure management, etc. PaaS (Platform as a Service) demands more efficient platforms, development frameworks and methodologies providing elasticity and scalability, adaptation to the incoming workload, and transparent fault tolerance.
In all of these areas, SLAs are far from being supported in clouds, mainly due to the lack of isolation across different tenants, the lack of intelligent placement and routing schemes, the lack of frameworks providing high availability, uncertainty about when requested on-demand VMs will be provided, etc. Security in clouds has improved at the cloud infrastructure level, but applications, such as Big Data applications, still have to defend themselves from all kinds of attacks, from those exploiting vulnerabilities in the application or the underlying software infrastructure (e.g., application server, database, libraries, etc.) to generic attacks such as Distributed Denial of Service (DDoS) attacks. In Cloud4BigData we aim to improve the efficiency of PaaS, especially in terms of elasticity and multi-tenancy support, and also the security of clouds, by leveraging scalable cloud technology such as scalable and elastic CEP.
Finally, Cloud4BigData will demonstrate its capabilities in emerging application areas whose highly demanding requirements call for cloud and Big Data technologies, such as machine-to-machine, Internet of Things and smart* (smart grid, smart cities, smart transport, etc.), as well as in more traditional areas, such as online applications, multimedia applications and distributed games, that need Big Data support to go beyond their current functionality.
The research groups that have partnered with IMDEA Networks Institute to carry out the Cloud4BigData project are the LSD group from the Polytechnic University of Madrid (Coordinator), and the LS Group and the FUNLab Group from University Rey Juan Carlos.