As an agency, we're increasingly being asked by our clients to decipher the huge volumes of data their reporting engines provide – CRM, email, social media, and so on. To date, we've pulled this data into data stores on our internal server, massaged it, and then exported the information we needed.
The data, however, is becoming more difficult to filter, segment, and assign. Social media, for example, provides strings and URLs that must be evaluated before they can be properly tagged, classified, and reported on. The window of time we're given to crunch through this data keeps shrinking, while the volume keeps growing – and becomes harder to manage. Our clients need answers… now.
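To make the tagging problem concrete, here's a minimal Ruby sketch of the kind of classification involved – the tag rules and categories are hypothetical, not our actual production logic:

```ruby
require "uri"

# Hypothetical tag rules: map a host to a reporting category.
TAG_RULES = {
  "twitter.com"  => :social,
  "facebook.com" => :social,
  "youtube.com"  => :video,
}.freeze

# Classify a raw URL string into a category, defaulting to :untagged.
def classify(raw_url)
  host = URI.parse(raw_url).host.to_s.sub(/\Awww\./, "")
  TAG_RULES.fetch(host, :untagged)
rescue URI::InvalidURIError
  :invalid
end

classify("https://www.youtube.com/watch?v=abc")  # => :video
```

Multiply this by millions of incoming strings and ever-changing rules, and it's easy to see why the filtering work outgrows an internal server.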
Enter big data solutions like Infochimps. Infochimps started their business with a focus on managing thousands of datasets and a dream of becoming the 'Wikipedia' of data. Over time, however, they figured out that the real value and expertise they provided was building out an infrastructure and service that could manage big data effectively.
Whether you're a new startup that's exploding in growth and being buried in data, or an enterprise that needs help deploying a solution to extract, manipulate, and present actionable data, Infochimps can help. An added benefit of the platform as a service? They already offer over 15,000 datasets, and are one of the few Gnip partners providing the firehose of social media data.
A standard Infochimps deployment pairs in-house services that build out the infrastructure with user access to develop on the tools. Infochimps is unique among big data platforms in its flexibility across deployment sizes.
Infochimps Platform Overview:
Infrastructure Layer – The underlying machines that power data collection and integration, real-time analytics, large-scale batch analytics, and data storage.
- Data Delivery Service™ – DDS integrates seamlessly with your existing environment, provides highly scalable ETL (extract-transform-load) capabilities, and enables real-time, streaming data analytics.
- Data Management – Whether it’s HBase, Cassandra, Elasticsearch, MongoDB, MySQL, or others, we ensure the right data storage for the job is always right at your fingertips.
- Cloud Hadoop – Perform large-scale batch analysis as you need it, whether ad-hoc Hadoop clusters or always-on production workflows. Access all the tools you need, with on-demand scaling and tuning.
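The extract-transform-load flow the DDS bullet describes can be sketched in a few lines of plain Ruby – this illustrates the ETL pattern itself, not DDS's actual interface:

```ruby
require "json"

# A toy ETL pass over a stream of raw JSON records:
# extract the fields we need, transform them, and load into a store.
raw_stream = [
  '{"user":"ann","text":"Big data is HERE"}',
  '{"user":"bob","text":"hello world"}',
]

store = []  # stand-in for HBase, MongoDB, etc.

raw_stream.each do |line|
  record = JSON.parse(line)                 # extract
  record["text"] = record["text"].downcase  # transform
  store << record                           # load
end

store.first["text"]  # => "big data is here"
```

The value of a managed service is running exactly this shape of pipeline continuously, at scale, against live streams rather than a two-element array.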
Application Layer – Interfaces for making Big Data more accessible, including a streamlined analytics scripting tool, a graphical dashboard, and a powerful API.
- Wukong – provides a simplified analytics scripting experience. Write your analytics in developer-friendly Ruby, run code locally for faster development cycles, and leverage existing analytics scripts.
- Dashpot™ – Create real-time visualizations from streaming data, gain deep visibility into your Platform systems, and quickly start and stop functional units in your data clusters.
- Platform API – With a unified API, control of the platform and visibility of the data within it are just a few web requests away. Currently in beta.
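To give a feel for the "write your analytics in Ruby, run locally first" workflow Wukong promotes, here's a word count in map/reduce style – plain Ruby only, not Wukong's actual DSL:

```ruby
# Map/reduce-style word count, runnable locally -- the same shape of
# script Wukong lets you later run against a full Hadoop cluster.
lines = ["big data big answers", "data now"]

# Map: emit (word, 1) pairs.
pairs = lines.flat_map { |l| l.split.map { |w| [w, 1] } }

# Reduce: sum the counts per word.
counts = pairs.each_with_object(Hash.new(0)) { |(w, n), h| h[w] += n }

counts["big"]  # => 2
```

Because the script is ordinary Ruby, the develop-locally/deploy-to-cluster cycle stays fast.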
An example: a large media company has built a social media listening platform that gathers 18 or so streams of social media data. Each stream is augmented with gender information, aggregated into summaries, rolled up into counters, scored for sentiment, and triggers notifications where requested. The goal of the system is to provide customers with real-time news. Other customers include Cisco, BlackLocus, Runa, WhaleShark Media, and BlueCava.
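The enrichment steps in that pipeline – augment each mention, score sentiment, roll up counters – can be sketched as follows; the field names and the tiny sentiment lexicon are hypothetical stand-ins:

```ruby
# A toy version of the enrichment pipeline: each incoming mention is
# scored for sentiment and rolled up into running counters.
POSITIVE = %w[great love win].freeze
NEGATIVE = %w[fail hate bug].freeze

# Naive lexicon-based sentiment: neutral text counts as positive here.
def sentiment(text)
  words = text.downcase.split
  score = (words & POSITIVE).size - (words & NEGATIVE).size
  score >= 0 ? :positive : :negative
end

mentions = [
  { user: "ann", text: "love this product" },
  { user: "bob", text: "total fail" },
]

counters = Hash.new(0)
enriched = mentions.map do |m|
  m.merge(sentiment: sentiment(m[:text])).tap { |e| counters[e[:sentiment]] += 1 }
end

counters  # => {:positive=>1, :negative=>1}
```

Real sentiment scoring is far more involved, but the roll-up-to-counters shape is what feeds real-time dashboards and notifications.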