Big Data Glossary by Pete Warden

To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. Descriptions are based on first-hand experience with these tools in a production environment.

This handy glossary also includes a chapter of key terms that help define many of these tool categories:

  • NoSQL Databases: Document-oriented databases using a key/value interface rather than SQL
  • MapReduce: Tools that support distributed computing on large datasets
  • Storage: Technologies for storing data in a distributed way
  • Servers: Ways to rent computing power on remote machines
  • Processing: Tools for extracting valuable information from large datasets
  • Natural Language Processing: Methods for extracting information from human-created text
  • Machine Learning: Tools that automatically perform data analyses, based on the results of a one-off analysis
  • Visualization: Applications that present meaningful data graphically
  • Acquisition: Techniques for cleaning up messy public data sources
  • Serialization: Methods to convert a data structure or object state into a storable format
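The MapReduce category above covers tools for distributed computing on large datasets. The Python below is a minimal single-process sketch of the pattern. It is not the API of Hadoop or any other framework, and `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical names. It counts words in three steps: a map step, a shuffle that groups values by key, and a reduce step.

```python
from collections import defaultdict
from itertools import chain

# Single-process sketch of the MapReduce pattern (hypothetical helper names).
# Real frameworks run map and reduce tasks on many machines; here each phase
# is an ordinary function so the data flow is easy to follow.

def map_phase(document):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group emitted values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

if __name__ == "__main__":
    docs = ["big data tools", "Big data glossary"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'tools': 1, 'glossary': 1}
```

Because each map call and each reduce call depends only on its own input, a real framework can spread them across machines. That independence is what the toy version leaves out.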

Similar data modeling & design books

Polynomial Algorithms in Computer Algebra

For several years now I have been teaching courses in computer algebra at the Universität Linz, the University of Delaware, and the Universidad de Alcalá de Henares. In the summers of 1990 and 1992 I organized and taught summer schools in computer algebra at the Universität Linz. Gradually a set of course notes has emerged from these activities.

Data Dissemination and Query in Mobile Social Networks

With the increasing popularity of personal handheld mobile devices, more people use them to establish network connectivity and to query and share data among themselves in the absence of network infrastructure, creating mobile social networks (MSNets). Since users are only intermittently connected to MSNets, user mobility can be exploited to bridge network partitions and forward data.

Big Practical Guide to Computer Simulations

"This unique book is a must-have for any student taking first steps in computer simulations. Any new student joining my computational physics group is expected to first work through Hartmann's guide before starting a research project." Helmut Katzgraber, Associate Professor, Texas A&M University. "This book is packed with useful information for everyone doing computer simulations.

Extra info for Big Data Glossary

Sample text

Thrift

With Thrift, you predefine both the structure of your data objects and the interfaces you'll be using to interact with them. The system then generates code to serialize and deserialize the data, along with stub functions that implement the entry points to your interfaces. It generates efficient code for a wide variety of languages, and under the hood it offers many choices for the underlying data format without affecting the application layer. It has proven to be a popular IDL (Interface Definition Language) for open source infrastructure projects like Cassandra and HDFS.
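Both sides agree on the data structure ahead of time, so serialization code can be generated mechanically from that definition. The Python below is a hypothetical illustration of that idea using only the standard `struct` module. It imitates, by hand, what generated code might do for an invented `User` struct with an `id` and a `name` field. It is not Thrift's API, and the byte layout is not Thrift's binary or compact protocol.

```python
import struct
from dataclasses import dataclass

# Hypothetical schema, standing in for a struct declared in an IDL file:
#   struct User { 1: i64 id, 2: string name }
# The byte layout below is invented for illustration and is NOT a Thrift
# wire format.

HEADER = ">qI"  # big-endian: 8-byte signed id, 4-byte unsigned name length

@dataclass
class User:
    id: int
    name: str

def serialize_user(user: User) -> bytes:
    """Pack a User as: id, length of the UTF-8 name, then the name bytes."""
    name_bytes = user.name.encode("utf-8")
    return struct.pack(HEADER, user.id, len(name_bytes)) + name_bytes

def deserialize_user(data: bytes) -> User:
    """Reverse serialize_user, relying on the same agreed-upon schema."""
    user_id, name_len = struct.unpack_from(HEADER, data, 0)
    offset = struct.calcsize(HEADER)
    name = data[offset:offset + name_len].decode("utf-8")
    return User(user_id, name)

if __name__ == "__main__":
    blob = serialize_user(User(42, "Pete"))
    print(len(blob), deserialize_user(blob))  # 16 User(id=42, name='Pete')
```

The point of generating such functions from one definition is that every language binding packs and unpacks fields identically. Writing them by hand for each language invites mismatches.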

Solr/Lucene

Lucene is a Java library that handles indexing and searching large collections of documents, and Solr is an application that uses the library to build a search engine server. Originally separate projects, they were recently merged into a single Apache open source team. Solr is designed to handle very large amounts of data, with a sharding architecture that lets it scale horizontally across a cluster of machines. It also has a very flexible plug-in architecture and configuration system, and it can be integrated with many different data sources.
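This entry combines two ideas: an index that maps terms to the documents containing them, and a sharded layout that spreads documents over several machines. The toy Python sketch below illustrates both. It is not Lucene's or Solr's API. `Shard`, `ShardedIndex`, and the whitespace tokenizer are hypothetical simplifications, and the "machines" are just in-memory objects.

```python
import zlib
from collections import defaultdict

# Toy inverted index split across shards (hypothetical classes, not Lucene).

class Shard:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        # Naive tokenizer: lowercase and split on whitespace.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        # Return a copy so callers cannot mutate the shard's postings.
        return set(self.postings.get(term.lower(), ()))

class ShardedIndex:
    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def _shard_for(self, doc_id):
        # Stable hash (crc32) so a document always lands on the same shard.
        return self.shards[zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)]

    def add(self, doc_id, text):
        self._shard_for(doc_id).add(doc_id, text)

    def search(self, term):
        # Fan the query out to every shard and merge the partial results.
        results = set()
        for shard in self.shards:
            results |= shard.search(term)
        return results

if __name__ == "__main__":
    index = ShardedIndex(num_shards=3)
    index.add("doc1", "Lucene handles indexing")
    index.add("doc2", "Solr builds a search server on Lucene")
    print(sorted(index.search("lucene")))  # ['doc1', 'doc2']
```

Adding shards spreads the documents, and the work of indexing them, over more nodes. Each query, however, must visit every shard and merge the results, which is the usual trade-off of this style of horizontal scaling.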

As a free tool aimed at technically minded consumers, Pipes can't handle massive datasets, but it's the equivalent of duct tape for many smaller tasks. Similar but more specialized tools like Alpine Miner have had a lot of success in the commercial world, so I'm hopeful that the Pipes style of interface will show up more often in data processing applications.

More info: Using YQL and Yahoo! Pipes together

Mechanical Turk

The original Mechanical Turk was a fraudulent device that appeared to be a chess-playing robot but was actually controlled by a hidden human chess player.