In my guest post on the automation experts blog two weeks ago, I discussed big data and what companies should learn from Barack Obama’s U.S. election in 2012 about turning real-time information into a lead. Alongside cloud computing, mobile and social media, big data is one of the current top issues in the IT business environment. It is by far not only a trend but a reality, with a far-reaching influence on business, its strategic direction and IT. Established technologies and methods for analyzing big data have reached their limits, and only the company that manages to gain an information advantage from its data silos will be one step ahead of the competition in the future.

Big Data: No new wine in old wineskins

Basically, the idea behind big data is nothing new. As early as the early to mid-1990s, the term “business intelligence” described the use of procedures to systematically analyze data. The results are used to gain new insights that help a company achieve its objectives and make strategic decisions. However, the data basis to be analyzed was much smaller than today, and analyses could only be made on data from the past, leading to uncertain forecasts for the future. Today, everyday objects collect massive amounts of data every second. These include smartphones, tablets, cars, electricity meters and cameras. There are also sources that are not located in the immediate vicinity of a person, such as fully automated manufacturing lines, distribution warehouses, measuring instruments, aircraft and other means of transport. And of course it is we humans who feed big data with our habits. Tweets on Twitter, comments on Facebook, Google search queries, browsing on Amazon and even vital signs during a jogging session provide modern companies with vast amounts of data that can be turned into valuable information.

Structured and unstructured data

Large data sets are not a new phenomenon. For decades, retail chains, oil companies, insurance companies and banks have collected solid information on inventories, drilling data and transactions. This also includes projects for the parallel processing of large data sets, data mining grids, distributed file systems and distributed databases in the typical areas of what is now known as big data. These include the biotech sector, interdisciplinary scientific research projects, weather forecasting and the medical industry. All of these areas and industries struggle with the management and processing of large data volumes.

But now the problem has also reached the “normal” industries. Today’s challenge is that data arises from many different sources, sometimes fast, unpredictably and, not least, unstructured. Big data is meant to help wherever many different data sources can be combined. Examples are tweets on Twitter, browsing behavior or information about clearance sales, with the resulting understanding used to develop new products and services. New regulations in the financial sector lead to higher volumes of data and require better analysis. In addition, web portals like Google, Yahoo and Facebook collect an enormous amount of data every day, which they also associate with users to understand how a user moves around the site and behaves. Big data is becoming a general problem. According to Gartner, enterprise data could grow by up to 650% in the next five years. 80% of that will be unstructured data, or big data, which has already proven difficult to manage. In addition, IDC estimates that the average company will have to manage 50 times more information by 2020, while the number of IT staff will increase by only 1.5%. This is a challenge companies must respond to efficiently if they want to remain competitive.
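To get a feeling for what these estimates imply, the figures can be put into a few lines of Python. This is only a back-of-the-envelope sketch under my own assumptions: I read “grow by up to 650%” as growth on top of today’s volume, I apply the 80% share to the resulting total, and the variable names are purely illustrative.

```python
# Back-of-the-envelope arithmetic on the Gartner and IDC estimates quoted above.
# Assumption: "grow by 650%" means today's volume plus 650% of it.
gartner_growth = 6.50          # 650% growth over five years
unstructured_share = 0.80      # assumed to apply to the resulting total volume

total_factor = 1 + gartner_growth                 # volume relative to today
unstructured_units = total_factor * unstructured_share

# IDC: 50 times more information, but only 1.5% more IT staff.
information_factor = 50
staff_factor = 1 + 0.015
information_per_staff = information_factor / staff_factor

print(f"Data volume relative to today: {total_factor:.1f}x")
print(f"Of which unstructured: {unstructured_units:.1f} units of today's volume")
print(f"Information per IT employee: about {information_per_staff:.1f}x")
```

Under these assumptions, almost the entire increase in information falls on each individual IT employee, which is why automation and scalable infrastructure matter so much here.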

Why companies choose big data

But where do these huge amounts of data come from, and what motivates a business to deal with the issue? Market researchers at Experton Group tried to clarify these questions in their “Big Data 2012 – 2015” client study in October 2012. According to the study, the main driver for the use of big data technologies and concepts is the rapid growth of data, including the corresponding quality management and the automation of analysis and reporting. About a third of the companies take the topics of loyalty and marketing as an occasion to renew the analysis of their databases. 27 percent of respondents name new database technologies as a motivation for new methods of data analysis. Furthermore, almost all characteristics of big data count among the reasons for expanding strategic data management. This shows that big data is already a reality, even though in many cases it is not known by this term. The big data drivers themselves are the same across all industries and company sizes; they differ only in significance and intensity. One big difference related to company size is the distribution of data and information to the right people in the company. Here larger companies see their biggest challenges, whereas smaller companies classify the issue as uncritical.

Big data: A use case for the cloud

The oil and gas industry has solved the processing of large amounts of data through the use of traditional storage solutions (SAN and NAS). Research-oriented organizations and companies like Google, which deal with the analysis of mass data, are more likely to follow the grid approach in order to invest the unused resources in software development.

Big data processing belongs to the cloud

Cloud infrastructures help to reduce the cost of IT infrastructure. This allows companies to focus more effectively on their core business and to gain greater flexibility and agility for implementing new solutions. It lays a foundation for adapting to ever-changing amounts of data and for providing the necessary scalability. Thanks to investments in their infrastructure, cloud computing providers are able to develop and maintain an environment that is usable and friendly for big data, whereas a single company can’t provide adequate resources for scalability and also lacks the necessary expertise.

Cloud resources grow with the amount of data

Cloud computing infrastructures are designed to grow or shrink with demands and needs. By using cloud computing infrastructure, companies can easily meet the high requirements expected from big data, such as high processing power, large amounts of memory, high I/O and high-performance databases, without investing heavily in their own resources.

Cloud concepts such as infrastructure-as-a-service (IaaS) combine both worlds and occupy a unique position. For those who understand the SAN/NAS approach, the resources can also be used to design massively parallel systems. For companies that find it difficult to deal with or understand the above technologies, IaaS providers offer appropriate solutions that avoid the complexity of storage technologies and let them focus on the challenges facing the company.

An acceptable solution comes from cloud computing pioneer Amazon Web Services. With AWS Data Pipeline (still in beta), Amazon offers a service that moves and handles data automatically between different systems. These systems can be either directly in the Amazon cloud or on another system outside it. Amazon thereby makes it easier to handle growing amounts of data on distributed systems with different formats. For this purpose, a number of pipelines can be created, in which the different data sources, conditions, targets, instructions and schedules are defined. In short, it’s about which data is loaded from which system under which conditions, how it is then processed, and finally where the results should be saved. A pipeline is started as needed, hourly, daily or weekly. The processing can take place either directly in the Amazon cloud or on the systems in the company’s own data center.
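To make the building blocks of such a pipeline more tangible, here is a minimal sketch in plain Python. It deliberately does not use the AWS Data Pipeline API or its definition format; the `Pipeline` class, its field names and the in-memory sample data are all hypothetical and only mirror the concepts described above (data source, condition, instruction, target and schedule).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass
class Pipeline:
    """Hypothetical model of a pipeline definition (not the AWS API)."""
    name: str
    source: Callable[[], Iterable[Record]]                 # where the data is loaded from
    precondition: Callable[[], bool]                       # condition that must hold before a run
    activity: Callable[[Iterable[Record]], List[Record]]   # how the data is processed
    target: Callable[[List[Record]], None]                 # where the results are saved
    schedule: str                                          # "on-demand", "hourly", "daily" or "weekly"

    def run(self) -> bool:
        """Run once; return False if the precondition is not met."""
        if not self.precondition():
            return False
        self.target(self.activity(self.source()))
        return True


# In-memory stand-ins for a source system and a result store (illustrative only).
web_logs: List[Record] = [
    {"user": "a", "bytes": 120},
    {"user": "b", "bytes": 80},
    {"user": "a", "bytes": 40},
]
results: List[Record] = []


def total_bytes_per_user(records: Iterable[Record]) -> List[Record]:
    """Aggregate the transferred bytes per user."""
    totals: Dict[str, int] = {}
    for record in records:
        user = str(record["user"])
        totals[user] = totals.get(user, 0) + int(record["bytes"])
    return [{"user": user, "bytes": total} for user, total in sorted(totals.items())]


pipeline = Pipeline(
    name="daily-log-summary",
    source=lambda: web_logs,
    precondition=lambda: len(web_logs) > 0,   # only run if there is data to process
    activity=total_bytes_per_user,
    target=results.extend,
    schedule="daily",
)

ran = pipeline.run()
print(ran, results)
```

In a real setup, the scheduler and the connections to the source and target systems would be provided by the service itself; the sketch only shows how the individual parts of a pipeline definition relate to each other.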

Big Data = Big Opportunities?

The Obama example is not the only one that shows how profitable the exploitation of structured and unstructured data from mobile devices, social media channels, the cloud and many other sources can be for a company. However, one point about big data has to be clear: what ultimately matters is not the mass of data collected, but its quality and the purpose for which the data is ultimately used.

It is therefore crucial whether and how a company manages to analyze the masses of data generated by human and machine interactions, extract the highest-quality information from them and thus secure a leading position in the market. Qualified data is the new oil and provides a lucrative driving force for companies that recognize their own advantage in it.