evaluating the open source data containers for handling big geospatial raster data

Fei Hu; Mengchao Xu; Jingchao Yang; Yanshou Liang; Kejin Cui; Michael  M. Little; Christopher  S. Lynnes; Daniel  Q. Duffy; Chaowei Yang

evaluating the open source data containers for handling big geospatial raster data

;Fei Hu;Mengchao Xu;Jingchao Yang;Yanshou Liang;Kejin Cui;Michael M. Little;Christopher S. Lynnes;Daniel Q. Duffy;Chaowei Yang

población y desarrollo 2018 Vol. 7 pp. 144-

196

hu2018isprsevaluating

Abstract

Big geospatial raster data pose a grand challenge to data management technologies for effective big data query and processing. To address these challenges, various big data container solutions have been developed or enhanced to facilitate data storage, retrieval, and analysis. Data containers were also developed or enhanced to handle geospatial data. For example, Rasdaman was developed to handle raster data and GeoSpark/SpatialHadoop were enhanced from Spark/Hadoop to handle vector data. However, there are few studies to systematically compare and evaluate the features and performances of these popular data containers. This paper provides a comprehensive evaluation of six popular data containers (i.e., Rasdaman, SciDB, Spark, ClimateSpark, Hive, and MongoDB) for handling multi-dimensional, array-based geospatial raster datasets. Their architectures, technologies, capabilities, and performance are compared and evaluated from two perspectives: (a) system design and architecture (distributed architecture, logical data model, physical data model, and data operations); and (b) practical use experience and performance (data preprocessing, data uploading, query speed, and resource consumption). Four major conclusions are offered: (1) no data containers, except ClimateSpark, have good support for the HDF data format used in this paper, requiring time- and resource-consuming data preprocessing to load data; (2) SciDB, Rasdaman, and MongoDB handle small/mediate volumes of data query well, whereas Spark and ClimateSpark can handle large volumes of data with stable resource consumption; (3) SciDB and Rasdaman provide mature array-based data operation and analytical functions, while the others lack these functions for users; and (4) SciDB, Spark, and Hive have better support of user defined functions (UDFs) to extend the system capability.

Keywords

big data

Access

DOI:

10.3390/ijgi7040144

URL:

http://www.mdpi.com/2220-9964/7/4/144

Citation

ID: 185233

Ref Key: hu2018isprsevaluating

Use this key to autocite in SciMatic or Thesis Manager

References

No Bibliography

Blockchain Verification

Account:

NFT Contract Address:

0x95644003c57E6F55A65596E3D9Eac6813e3566dA

Article ID:

185233

Unique Identifier:

10.3390/ijgi7040144

Network:

Scimatic Chain (ID: 481)

Blockchain Readiness Checklist

Authors

Abstract

Journal Name

Year

Title

5/5

Creates 1,000,000 NFT tokens for this article

Token Features:

ERC-1155 Standard NFT
1 Million Supply per Article
Transferable via MetaMask
Permanent Blockchain Record

Scan with Saymatik Web3.0 Wallet

Gas fees required in SCI Coins

Buy SCI

Saymatik Web3.0 Wallet

Google Play

App Store

Coming soon

Reference Key: lastname+year+titlefirstword+journalfirstword

Article Type (Article, Book, Proceedings etc.)

Add a reference in a raw form. Our automatic system will correct it later.

evaluating the open source data containers for handling big geospatial raster data

Abstract

Keywords

Access

Citation

References

References

Blockchain Verification

Blockchain Readiness Checklist

Article Tokenized!

Token Features:

Saymatik Web3.0 Wallet