PRACTICAL BASICS OF TRAINING UNDERGRADUATES IN BIG DATA (PROCESSING BIG DATA IN CHUNKS USING THE R PROGRAMMING LANGUAGE)
DOI:
https://doi.org/10.52269/RWEP2522187
Keywords:
big data, big data analysis, R programming environment, nycflights13 package, flights files.
Abstract
The article considers the training of big data specialists and ways of deepening the knowledge of students at higher educational institutions through the processing, storage, and analysis of big data in the R programming environment. The presented results are part of a research project aimed at a comprehensive study and integration of knowledge related to hardware-software systems and programming languages. Special attention is given to familiarizing students with R packages and data storage formats. It is demonstrated that data can be stored in two different formats, .rds and .csv, each offering distinct features and advantages for subsequent big data processing. Big data is divided into structured, semi-structured (XML and JSON), and unstructured (texts, images, and videos) data, and this variety makes storage, processing, and analysis more complex. Objective: to consider the case in which a complete data set cannot be loaded into R memory at once. In this case the data can be processed in fragments (chunks), and the use of the chunk.apply function from the iotools package by Simon Urbanek and Taylor Arnold for this purpose is discussed. An analysis of big data in the context of training undergraduates is carried out, and data from the practical part of the research are presented.
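The chunked workflow outlined in the abstract can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the procedure used in the article: the data frame `flights_demo`, its two columns, the temporary file paths, and the chunk size are all hypothetical stand-ins (in practice the table might come from nycflights13::flights). The sketch stores the same data as .rds and .csv, then computes a mean from the .csv file chunk by chunk with `chunk.apply`, so that only one chunk needs to be parsed at a time.

```r
library(iotools)  # provides chunk.apply() and dstrsplit()

# Hypothetical stand-in for a large table such as nycflights13::flights.
set.seed(1)
n_rows <- 50000
flights_demo <- data.frame(
  month     = sample(1:12, n_rows, replace = TRUE),
  dep_delay = round(rnorm(n_rows, mean = 10, sd = 30))
)

# Store the same data in both formats named in the abstract.
rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")
saveRDS(flights_demo, rds_path)
# No header and no row names, so every chunk has the same layout.
write.table(flights_demo, csv_path, sep = ",",
            row.names = FALSE, col.names = FALSE)

# Process the .csv file in chunks: each chunk arrives as raw bytes, is parsed
# into a data frame, and is reduced to a per-chunk sum and row count.
partial <- chunk.apply(
  csv_path,
  function(chunk) {
    d <- dstrsplit(chunk, col_types = c("integer", "numeric"), sep = ",")
    c(total = sum(d[[2]]), n = nrow(d))
  },
  CH.MERGE = rbind,    # one row of partial results per chunk
  CH.MAX.SIZE = 1e5    # assumed small value so the demo file spans several chunks
)

# Combine the partial results into the overall mean departure delay.
mean_chunked <- sum(partial[, "total"]) / sum(partial[, "n"])

# The .rds file restores the whole object at once, for comparison.
mean_full <- mean(readRDS(rds_path)$dep_delay)
```

The two formats behave differently here: an .rds file keeps R types and is read back as a single object with `readRDS`, while a plain-text .csv file can be split at line boundaries, which is what makes chunk-wise processing possible. The per-chunk function returns sums and counts rather than means, because partial means cannot be averaged correctly when chunks differ in size.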