
CP2025 - Data Gathering

This course is also available through Online Learning

Data exists in many formats and locations. Storage options range from compact Comma Separated Values (CSV) files for small datasets to distributed file systems such as the Hadoop Distributed File System (HDFS) for larger ones, and from structured data in SQL databases to unstructured data in log files. This course gives students a broad overview of common data sources and the skills to work effectively with each of these data repositories.
 
Students will extract information from database tables using SELECT queries, joins, and GROUP BY and HAVING clauses, and will use subqueries to carry out aggregate calculations. They will work with NoSQL databases such as MongoDB and manage file-based data formats using scripts. In addition, they will explore distributed data storage and processing tools, including HDFS, Apache Spark, and SAS (Statistical Analysis System).
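
As an illustration of the kind of query work described above, the following is a minimal sketch only, not course material. It uses Python's built-in sqlite3 module with an in-memory database. The customers and orders tables, their columns, the sample rows, and the function names are all hypothetical and were invented for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
        INSERT INTO orders VALUES
            (1, 1, 50.0), (2, 1, 70.0), (3, 2, 20.0), (4, 3, 90.0);
        """
    )
    return conn


def totals_by_customer(conn, min_total):
    """Join the tables, group by customer, and keep groups via HAVING."""
    query = """
        SELECT c.name, SUM(o.amount) AS total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        HAVING SUM(o.amount) >= ?
        ORDER BY total DESC
    """
    return conn.execute(query, (min_total,)).fetchall()


def orders_above_average(conn):
    """Use a subquery to compare each order with the overall average."""
    query = """
        SELECT id, amount
        FROM orders
        WHERE amount > (SELECT AVG(amount) FROM orders)
        ORDER BY id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(totals_by_customer(conn, 60))
    print(orders_above_average(conn))
    conn.close()
```

The HAVING clause filters after grouping, so it can test the aggregated total, whereas a WHERE clause filters individual rows before any grouping takes place.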
 
Upon completion of this course, students will be able to source, refine, and manage data from a wide range of origins.
 


This course is offered in the following programs:
Data Analytics

Copyright © www.cna.nl.ca