Data
iEcology data sources fall into two broad categories:
a) new data generated and uploaded by users, but for purposes other than ecological research.
b) data on online activity, including data access and search engine usage.
Data in the first category can comprise text, images, videos and sounds. The second category consists of aggregated data used to explore frequencies (e.g. the number of times a term was searched or a webpage visited; it could also include interactions on social media such as ‘shares’ and ‘likes’). Both categories carry different types of associated metadata that are particularly important for iEcology, such as locality, timestamp, user identity, and links across data.
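As a minimal sketch of how such frequencies can be derived from user-generated records and their metadata, the Python example below counts how many posts mention a term in each month. The records, field names (`user`, `timestamp`, `locality`, `text`) and the helper `monthly_mentions` are hypothetical and invented purely for illustration; real sources will have their own formats.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: user-generated posts carrying the kinds of metadata
# named above (timestamp, locality, user identity). All values are invented.
records = [
    {"user": "u1", "timestamp": "2020-04-03T09:15:00", "locality": "Lisbon",
     "text": "First barn swallow of the year!"},
    {"user": "u2", "timestamp": "2020-04-21T17:40:00", "locality": "Porto",
     "text": "Barn swallow nesting under the bridge"},
    {"user": "u1", "timestamp": "2020-05-02T08:05:00", "locality": "Lisbon",
     "text": "Sunny morning by the river"},
    {"user": "u3", "timestamp": "2020-05-11T12:30:00", "locality": "Faro",
     "text": "barn swallow flock over the marsh"},
]


def monthly_mentions(records, term):
    """Count records whose text mentions `term`, grouped by year-month."""
    counts = Counter()
    needle = term.lower()
    for rec in records:
        # Case-insensitive substring match; real studies would need more
        # careful text processing (synonyms, languages, misspellings).
        if needle in rec["text"].lower():
            month = datetime.fromisoformat(rec["timestamp"]).strftime("%Y-%m")
            counts[month] += 1
    return dict(counts)


swallow_counts = monthly_mentions(records, "barn swallow")
print(swallow_counts)  # {'2020-04': 2, '2020-05': 1}
```

The same grouping could be done by locality or user instead of month, which is why the associated metadata matter as much as the content itself.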
Sources
iEcology data sources differ greatly in their scope, availability, ease of access and associated metadata, and therefore in their utility for different types of research. Potential data sources range from social media platforms (e.g. Twitter, Flickr) and search engines (e.g. Google, Baidu, Bing) to online encyclopedias (e.g. Wikipedia, Encyclopedia Britannica online) and other online repositories (blogs, discussion forums, popular articles and books, etc.). Many of these sources can also be accessed through search engines. The scope of a source depends on its spatio-temporal coverage, linguistic or cultural breadth, data resolution, and the mix of media it contains (e.g. text, images, and video). Data also differ in availability: while many sources are freely available, some platforms restrict data collection (i.e. limits on volume, time-frame, or number of queries) or use (e.g. privileged access or paywall restrictions). Sources also differ in their ease of access, from simple online tools embedded at the source (e.g. Google Trends webpage), through open APIs (Application Programming Interfaces) accessible via dedicated computer scripts (e.g. Wikipedia and Flickr), to APIs with restricted access (e.g. Facebook). Data availability and ease of access can also change over time.
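To illustrate access through an open API, the following Python sketch builds a request for daily page views of a Wikipedia article and parses the response. It assumes the per-article pageviews endpoint of the Wikimedia REST API and its response shape (an `items` list with `timestamp` and `views` fields) as understood at the time of writing; since access conditions change over time, the path should be checked against the current documentation. The function names, the example article and the sample payload are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed base path of the Wikimedia per-article pageviews endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def build_pageviews_url(article, start, end,
                        project="en.wikipedia", granularity="daily"):
    """Build a per-article pageviews URL; start and end are YYYYMMDD strings."""
    title = quote(article.replace(" ", "_"), safe="")
    # "all-access" covers desktop and mobile; "user" excludes known bots.
    return f"{BASE}/{project}/all-access/user/{title}/{granularity}/{start}/{end}"


def parse_pageviews(payload):
    """Turn the decoded JSON response into a list of (YYYYMMDD, views) pairs."""
    return [(item["timestamp"][:8], item["views"])
            for item in payload.get("items", [])]


def fetch_pageviews(article, start, end,
                    user_agent="iecology-example/0.1 (contact: you@example.org)"):
    """Download and parse page views; requires network access."""
    request = urllib.request.Request(
        build_pageviews_url(article, start, end),
        headers={"User-Agent": user_agent},  # identify the script to the service
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_pageviews(json.load(response))


# Offline demonstration on a hand-written payload shaped like the response.
sample = {"items": [{"timestamp": "2021040100", "views": 120},
                    {"timestamp": "2021040200", "views": 95}]}
print(build_pageviews_url("Barn swallow", "20210401", "20210402"))
print(parse_pageviews(sample))
```

Separating URL construction and parsing from the network call keeps the script testable offline and makes it easier to adapt when a platform changes its limits or endpoints.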
Analysis
The analysis of iEcology data faces similar challenges to, and uses the same solutions as, many other approaches to the analysis of Big Data. Many of the methods used in iEcology rely on high levels of automation, frequently adopting machine-learning techniques. Different tools can aid each stage of the research: data access, downloading, handling, extraction, storage, pattern identification and recognition, data analysis, and visualization. These tools are in a constant state of evolution, as illustrated by developments in deep neural network analysis and other emerging technologies.
For further guidance, see: Ten simple rules for responsible big data research