Compsci #10
From 12 [big data]
| Term | Definition |
|---|---|
| Big data | Very large volumes of data, e.g., the data handled at Google, NASA, etc. |
| The Vs of Big Data | Main ones - Volume, Variety, Velocity. Other - Veracity, Variability, Value |
| Big data challenges | Big data is difficult to capture, store, and analyze effectively and efficiently; new breeds of technology are needed to handle it |
| Big data considerations | Platform limitations, data arriving too fast for the platform to handle, etc |
| Success factors for big data analytics | A clear business need, strong sponsorship, alignment between the business and IT strategy |
| In memory analytics | Storing and processing the complete data set in RAM |
| In database analytics | Placing analytic procedures close to where data is stored |
| Grid computing & MPP | Use of many machines and processors in parallel (MPP - massively parallel processing) |
| Appliances | Combining hardware, software, and storage in a single unit for performance and scalability |
| Data integration | The ability to combine data quickly and at reasonable costs |
| Processing capabilities | The ability to process data quickly, as it is captured (e.g., stream analytics) |
| MapReduce | MapReduce processes large, complex data files by distributing tasks across many simple computers to achieve high performance |
| Hadoop | Open-source framework for storing and analyzing massive unstructured data using low-cost hardware for easy scaling |
| How Hadoop works | Uses HDFS (Hadoop Distributed File System) to split data into parts spread across nodes on commodity hardware, with one node acting as the facilitator (the NameNode) and another as the JobTracker |
| Hadoop and data warehousing | Hadoop handles unstructured, large-scale data, while data warehouses work with structured, processed data. Together, Hadoop can store raw data, and the warehouse can analyze it after processing |
| MapReduce vs Hadoop | Related but not the same. MapReduce provides control for analytics, but it is not itself an analytic. Hadoop is about data diversity, not just data volume |
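The MapReduce row in the table above describes splitting work across many simple computers. Below is a minimal single-machine sketch of the classic word-count pattern, assuming a thread pool is an acceptable stand-in for a cluster. The functions `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical names for illustration, not Hadoop's API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(chunk):
    # Each "worker" emits a (word, 1) pair for every word in its chunk.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    # Group all emitted values by key so each key reaches a single reducer.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine every value seen for one key into a final count.
    return key, sum(values)


def word_count(chunks, workers=3):
    # Threads stand in for cluster machines (an assumption for this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(chain.from_iterable(mapped))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda kv: reduce_phase(*kv), grouped.items()))


print(word_count(["big data big", "data value", "big value"]))
```

In a real cluster, each chunk would live on a different node and only the small (key, value) pairs would travel over the network during the shuffle.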
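The "How Hadoop works" row says HDFS splits data into parts across nodes. The sketch below shows that idea under simplified assumptions: the tiny block size, the node names, and round-robin replica placement are all hypothetical. Real HDFS uses much larger blocks (128 MB by default in recent versions) and rack-aware placement. The returned dictionary plays the role of the block-to-node metadata that the NameNode keeps.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    # Cut the file into fixed-size blocks; the last block may be shorter.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list, replication: int = 3) -> dict:
    # Hypothetical round-robin placement: block b goes to nodes b, b+1, ...
    # so each block lands on `replication` distinct nodes.
    copies = min(replication, len(nodes))
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(copies)]
        for b in range(num_blocks)
    }


blocks = split_into_blocks(b"abcdefghij", block_size=4)
print(blocks)
print(place_blocks(len(blocks), ["node1", "node2", "node3", "node4"], replication=2))
```

Keeping more than one copy of each block is what lets a cluster of cheap commodity machines tolerate individual node failures.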
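The "Processing capabilities" row mentions stream analytics, meaning results are updated as each record arrives instead of after the full data set is stored. Below is a minimal sketch assuming the stream is a sequence of numeric readings and the analytic is a moving average over a fixed window. `rolling_average` is a hypothetical helper, not part of any streaming product.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_average(stream: Iterable[float], window: int) -> Iterator[float]:
    # Keep only the last `window` readings in memory and a running total,
    # so each new reading is handled in constant time as it arrives.
    buf = deque(maxlen=window)
    total = 0.0
    for x in stream:
        if len(buf) == window:
            total -= buf[0]  # this reading is about to be evicted
        buf.append(x)
        total += x
        yield total / len(buf)


for avg in rolling_average([2, 4, 6, 8], window=2):
    print(avg)
```

Because it is a generator, the function can consume an unbounded source (a socket, a message queue) without ever holding the full stream.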
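The "In database analytics" row means running the computation where the data lives instead of pulling raw rows into an application. The sketch contrasts the two approaches using Python's built-in `sqlite3` module. The `sales` table and its contents are made up for illustration. The `":memory:"` database is used only to keep the example self-contained and should not be confused with in-memory analytics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 5.0), ("east", 2.5)],
)


def totals_in_app(conn):
    # Outside the database: every row is shipped to the application and summed there.
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_db(conn):
    # In-database: the engine aggregates and returns one row per region.
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows)


print(totals_in_db(conn))
```

Both functions give the same answer; the difference is how much data has to move, which is what matters once the table holds billions of rows.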