Creating SparkDataFrames

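A SparkDataFrame can be created with the functions listed here. as.DataFrame and createDataFrame take a local R data.frame as their argument; read.df loads data from a data source.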

  • as.DataFrame
  • createDataFrame
  • read.df

1. From Local Data Frames

> head(faithful)
> class(faithful)
> dim(faithful)

> df <- as.DataFrame(faithful)

> head(df)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55

> printSchema(df)
root
 |-- eruptions: double (nullable = true)
 |-- waiting: double (nullable = true)
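
The same conversion can be sketched with createDataFrame, which the list at the top names but the transcript does not show. This is a minimal sketch that assumes a local Spark installation with the SparkR package available; the variable name df2 is illustrative only.

```r
library(SparkR)

# Start (or reuse) a Spark session; assumes Spark is installed locally.
sparkR.session()

# createDataFrame accepts a local R data.frame, just like as.DataFrame.
df2 <- createDataFrame(faithful)

# Inspect the result: its class, column names, and row count.
class(df2)
columns(df2)
count(df2)
```

Either function can be used here; which one to prefer is largely a matter of style.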

2. From Data Sources

> people <- read.df("./examples/src/main/resources/people.json", "json")

> head(people)
  age    name
1  NA Michael
2  30    Andy
3  19  Justin
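
read.df is not limited to JSON: the second argument names the source type, and further named arguments are passed to the data source as options. The sketch below is an illustration under assumptions, not part of the original walkthrough: it writes the faithful dataset to a temporary CSV file (csv_path and csv_df are hypothetical names) and reads it back, assuming Spark runs in local mode so the temporary file is visible to it.

```r
library(SparkR)
sparkR.session()

# Write a small CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(faithful, csv_path, row.names = FALSE)

# "csv" selects the data source; header and inferSchema are options of
# Spark's CSV reader, passed through as named arguments.
csv_df <- read.df(csv_path, source = "csv", header = "true", inferSchema = "true")

printSchema(csv_df)
head(csv_df)
```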

3. From Hive Tables

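A SparkDataFrame can also be created from a Hive table. This requires a SparkSession with Hive support, so that it can access tables in the Hive metastore. In SparkR, pass enableHiveSupport = TRUE when creating the SparkSession.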
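
A minimal sketch of creating such a session, assuming Spark was built with Hive support (without it, the flag has no effect). As far as the SparkR documentation describes, sparkR.session already attempts to enable Hive support by default, so passing the flag mainly makes the intent explicit. The SHOW TABLES query is only an illustrative check and is not part of the original walkthrough.

```r
library(SparkR)

# Request a session with Hive support so that sql() can reach Hive tables.
sparkR.session(enableHiveSupport = TRUE)

# Illustrative check: list the tables visible to this session.
head(sql("SHOW TABLES"))
```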

> sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
SparkDataFrame[]

> sql("LOAD DATA LOCAL INPATH '/home/pzdn/app/spark2.1.0/examples/src/main/resources/kv1.txt' INTO TABLE src")
SparkDataFrame[]

> results <- sql("FROM src SELECT key, value")
> head(results)
  key   value
1 238 val_238
2  86  val_86
3 311 val_311
4  27  val_27
5 165 val_165
6 409 val_409
