Scala is a language of choice for big-data processing; before Java 8, Java lacked functional-programming features (lambdas, streams), which made this kind of data-manipulation code much more verbose.
1. Pattern Matching
Similar to switch/case in Java, which only matches on the value passed in; Scala's pattern matching is more powerful and can also match on types, collections, and more.
// A simple value-matching example:
// The result type is Unit, because the last expression in the body, println(), returns Unit
scala> def bigData(data:String){
| data match{
| case "Spark" => println("wow")
| case "Hadoop"=> println("ok")
| case _ => println("Something Others ") // _ matches every input not covered by the cases above
| }
| }
bigData: (data: String)Unit
scala> bigData("Hadoop")
ok
scala> bigData("HBase")
Something Others
// A pattern guard: attach an if condition to a catch-all case to handle a specific value before the final _
scala> def bigData(data:String){
data match{
case "Spark" => println("wow")
case "Hadoop"=> println("ok")
case _ if data == "HBase" => println("Cool:"+data)
case _ => println("Something Others ")
}
}
bigData: (data: String)Unit
scala> bigData("HBase")
Cool:HBase
// Type matching
scala> import java.io._
import java.io._
scala> def exception(e: Exception){
e match{
case fileException:FileNotFoundException => println("File not found:" + fileException)
case _: Exception => println("Other exception")
}
}
exception: (e: Exception)Unit
scala> exception(new FileNotFoundException("Oops"))
File not found:java.io.FileNotFoundException: Oops
// Pattern matching on collections (Array shown here; List supports similar patterns)
scala> def data(array:Array[String]){
| array match{
| case Array("Scala") => println("Scala") // the array holds exactly one element, "Scala"
| case Array(spark, hadoop, hbase) => println("spark:" + spark +" hadoop:" + hadoop + " hbase:"+ hbase) // exactly three elements, bound in order to spark, hadoop, hbase
| case Array("Spark", _*) => println("Spark ...") // the array starts with "Spark"; _* matches any remaining elements without binding them
| case _ => println("Unknown")
| }
| }
data: (array: Array[String])Unit
scala> data(Array("Scala"))
Scala
scala> data(Array("Spark"))
Spark ...
scala> data(Array("wow","ok","hhha"))
spark:wow hadoop:ok hbase:hhha
// case class: constructor parameters are vals by default (read-only, getter only)
// A companion object is generated automatically with apply and unapply methods, so instances can be created without writing new
scala> case class Person(name:String)
defined class Person
scala> Person("Spark") // "Spark" goes to the apply() method of the auto-generated companion object of case class Person
res8: Person = Person(Spark) // an instance of case class Person is returned
// Define a plain Person class
scala> class Person
defined class Person
warning: previously defined object Person is not a companion to class Person.
Companions must be defined together; you may wish to use :paste mode for this.
// Define two case classes
scala> case class Worker(name:String,salary:Double) extends Person
defined class Worker
scala> case class Student(name:String, score:Double) extends Person
defined class Student
// Pattern matching on case classes
scala> def sayHi(person:Person){
| person match{
| case Student(name,score) => println("Student: " + name + " " + score)
| case Worker(name,salary) =>println("Worker: " + name + " " + salary)
| case _ => println("Unknown")
| }
| }
sayHi: (person: Person)Unit
scala> sayHi(Worker("Spark",6.5))
Worker: Spark 6.5
scala> sayHi(Student("Hadoop",6.0))
Student: Hadoop 6.0
// A case class with a variable number of parameters
scala> case class WorkClass(persons:Person*) // WorkClass accepts any number of Person arguments
defined class WorkClass
scala> def sayHi(){
| val person = WorkClass(Worker("Spark",6.6),Student("Hadoop",6.5))
| person match{
| case WorkClass(_,Student(name, score)) => println("Student: " + name + " " + score)
| case _ => println("Unknown")
| }
| }
sayHi: ()Unit
scala> sayHi()
Student: Hadoop 6.5
// Tuple matching
scala> val t=("spark","hive","SparkSQL")
t: (String, String, String) = (spark,hive,SparkSQL)
scala> def tuplePattern(t:Any)=t match {
| case (one,_,_) => one // the element in the first position, "spark", is bound to one
| case _ => "Other"
| }
tuplePattern: (t: Any)Any
scala> tuplePattern(t)
res3: Any = spark
// Use pattern matching to print a Map in a fixed format
scala> def pipei(){
| val m=Map("china"->"beijing","japan"->"tokyo","america"->"Washington DC")
| for((nation,capital)<-m)
| println(nation+": " +capital)
| }
pipei: ()Unit
scala> pipei()
china: beijing
japan: tokyo
america: Washington DC
In pattern matching, to guarantee that every possible case is listed, declare the superclass of the case classes as sealed. In the example above, Person would be declared sealed class Person; the compiler will then warn whenever a match does not cover all possible subtypes.
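The idea can be sketched as follows; Shape, Circle, and Square are illustrative names, not classes from the examples above:

```scala
// A sealed hierarchy: all subtypes must live in this same source file
sealed abstract class Shape
case class Circle(radius: Double) extends Shape
case class Square(side: Double) extends Shape

object SealedDemo {
  // Because Shape is sealed, the compiler would warn at compile time
  // if this match were missing one of the cases
  def area(s: Shape): Double = s match {
    case Circle(r) => math.Pi * r * r
    case Square(a) => a * a
  }

  def main(args: Array[String]): Unit = {
    println(area(Square(3.0))) // 9.0
  }
}
```

Deleting the Square case above would produce a non-exhaustive-match warning at compile time instead of a MatchError waiting to happen at run time.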
Pattern matching on the Option type:
The Spark source code pattern-matches heavily on case class Some and case object None, both of which extend Option.
A case class automatically gets a companion object containing apply and unapply; a case object is already a singleton value, so it needs neither.
// Some and None used in pattern matching
scala> def OptionDemo(t:String){
| val p=Map("spark"->2,"hadoop"->3,"hbase"->4)
| p.get(t) match{
| case Some(x) => println(x)
| case None => println("None")
| }
| }
OptionDemo: (t: String)Unit
scala> OptionDemo("Spark")
None
scala> OptionDemo("spark")
2
2. Type System
Generics: the type parameters of a generic class or generic method are fixed to concrete types only at the point of use.
Classes (class) and traits (trait) can take type parameters; singleton objects cannot.
// Define a generic class
scala> class Person[T](val content:T){ // generic class; Scala infers the concrete type from the constructor argument
| def getContent(id: T) = id + "_" + content // generic method
| }
| }
defined class Person
scala> val p = new Person[String]("Spark") // fix T to String
p: Person[String] = Person@8c11eee
scala> p.getContent("Scala") // getContent now only accepts a String
res1: String = Scala_Spark
scala> p.getContent(666)
<console>:10: error: type mismatch;
found : Int(666)
required: String
p.getContent(666)
// Multiple type parameters
scala> class Person[T,S](val name:T, val age:S){
| def get() = name + "_" + age
| }
defined class Person
scala> val p = new Person[String,Int]("Spark",8)
p: Person[String,Int] = Person@661fe025
scala> p.get
res6: String = Spark_8
scala> def mid[T](a:Array[T]) = a(a.length/2) // generic function returning the middle element
mid: [T](a: Array[T])T
scala> mid(Array("hadoop","spark","hbase"))
res0: String = spark
scala> mid(Array(1,2,3))
res1: Int = 2
Type Variable Bounds:
If a generic class calls a method on a value of type T, that method must exist on every possible T; otherwise compilation fails, since the compiler cannot know which concrete type will be supplied later. An upper bound (<:) restricts T to the inheritance hierarchy of a class or trait that does provide the method.
Lower bound (>:), upper bound (<:):
a lower bound requires the type parameter to be a given class or one of its supertypes;
an upper bound requires the type parameter to be a given class or one of its subtypes.
With an upper bound, the compiler checks calls against the methods declared on the bound (the abstract class or trait); at run time, dynamic dispatch still executes the concrete subtype's implementation.
// T <: AnyVal sets AnyVal as T's upper bound: any value type (subtype of AnyVal) is legal, reference types are not
case class Student[S,T <: AnyVal](var name:S,var height:T)
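The lower bound (>:) mentioned above has no example in these notes; here is a minimal sketch, with Fruit and Apple as illustrative names:

```scala
class Fruit { override def toString = "Fruit" }
class Apple extends Fruit { override def toString = "Apple" }

object LowerBoundDemo {
  // T must be Apple itself or one of its supertypes (Fruit, AnyRef, Any)
  def prepend[T >: Apple](head: T, rest: List[T]): List[T] = head :: rest

  def main(args: Array[String]): Unit = {
    // mixing a Fruit with a List[Apple] forces T to be inferred as Fruit
    val fruits: List[Fruit] = prepend(new Fruit, List(new Apple))
    println(fruits) // List(Fruit, Apple)
  }
}
```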
scala> class Pair[T](val first: T, val second: T){ // error: type T is not guaranteed to have a compareTo method
| def smaller = if (first.compareTo(second)<0) first else second
| }
<console>:8: error: value compareTo is not a member of type parameter T
def smaller = if (first.compareTo(second)<0) first else second
scala> class Pair[T <: Comparable[T]](val first: T, val second: T){ // upper bound
| def smaller = if (first.compareTo(second)<0) first else second
| }
defined class Pair
scala> val p = new Pair("hadoop", "Spark")
p: Pair[String] = Pair@6de30571
scala> p.smaller
res2: String = Spark
scala> val h = new Pair(1, 2) // error: Int does not satisfy the bound; a view bound solves this
<console>:8: error: inferred type arguments [Int] do not conform to class Pair's type parameter bounds [T <: Comparable[T]]
val h = new Pair(1, 2)
^
<console>:8: error: type mismatch;
found : Int(1)
required: T
val h = new Pair(1, 2)
^
<console>:8: error: type mismatch;
found : Int(2)
required: T
val h = new Pair(1, 2)
View Bound (<%):
With a plain type bound, a concrete type that does not itself extend the required trait is rejected at compile time. A view bound (<%) crosses class hierarchies: it only requires an implicit conversion from the type to the bound, so any type that can be implicitly converted into something providing the required methods is accepted. (Note that view bounds have been deprecated in newer Scala versions in favor of an explicit implicit-conversion parameter.)
scala> class Pair[T <% Comparable[T]](val first: T, val second: T){ // view bound
| def smaller = if (first.compareTo(second)<0) first else second
| }
defined class Pair
scala> val h = new Pair(1, 2) // Int is implicitly converted to RichInt, which is Comparable
h: Pair[Int] = Pair@50d951e7
scala> h.smaller
res3: Int = 1
// Context bound T : M (M is another generic type): requires an implicit value of type M[T] to be in scope
scala> class Compare[T:Ordering](val n1:T, val n2:T){
// implicit parameter ordered of type Ordering[T], supplied by the context bound
| def bigger(implicit ordered: Ordering[T]) = if(ordered.compare(n1,n2)>0) n1 else n2}
defined class Compare
scala> new Compare[Int](8, 3).bigger
res3: Int = 8
scala> new Compare[String]("Spark", "Hadoop").bigger
res4: String = Spark
Multiple bounds:
a type variable can have an upper and a lower bound at the same time: T >: Lower <: Upper
or several view bounds: T <% Comparable[T] <% String
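A minimal sketch combining an upper bound with a context bound on one type variable; MultiBoundDemo and largest are illustrative names:

```scala
object MultiBoundDemo {
  // T must be a reference type (upper bound AnyRef) and have an
  // implicit Ordering[T] in scope (context bound) at the call site
  def largest[T <: AnyRef : Ordering](xs: Seq[T]): T = xs.max

  def main(args: Array[String]): Unit = {
    println(largest(Seq("Hadoop", "Spark", "HBase"))) // Spark
  }
}
```

Calling largest(Seq(1, 2, 3)) would fail to compile: Int has an Ordering but is not a subtype of AnyRef.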
trait List[+T] {} // covariance
trait List[-T] {} // contravariance
Covariance: if type S is a subtype of type A, then List[S] is also treated as a subtype of List[A].
Java has no declaration-site variance (only use-site wildcards); unchecked variance can break type safety (Java's covariant arrays are the classic example), which is why the Scala compiler verifies every variance annotation.
When a class is declared covariant in T, T may appear only in covariant positions (such as result types), not in contravariant positions such as method parameters.
Contravariance: if type S is a subtype of type A, then Queue[A] is, conversely, treated as a subtype of Queue[S].
For example:
Pair[T] (invariant)
If Student is a subclass of Person, Pair[Student] and Pair[Person] are unrelated.
Pair[+T] (covariant)
If Student is a subclass of Person, Pair[Student] is a subtype of Pair[Person].
Pair[-T] (contravariant)
If Student is a subclass of Person, Pair[Student] is a supertype of Pair[Person].
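Both variance directions can be sketched in compilable form; Animal, Cat, Box, and Printer are illustrative names, not classes from the text above:

```scala
class Animal(val name: String)
class Cat(name: String) extends Animal(name)

// +T: Box[Cat] is a subtype of Box[Animal];
// T appears only in a covariant position (a val's type)
class Box[+T](val value: T)

// -T: Printer[Animal] is a subtype of Printer[Cat];
// T appears only in a contravariant position (a method parameter)
class Printer[-T] {
  def show(t: T): String = "showing " + t
}

object VarianceDemo {
  def main(args: Array[String]): Unit = {
    val b: Box[Animal] = new Box[Cat](new Cat("Tom")) // allowed by covariance
    val p: Printer[Cat] = new Printer[Animal]         // allowed by contravariance
    println(b.value.name) // Tom
  }
}
```

Swapping the annotations (e.g. Box[-T] with a val of type T) would be rejected by the compiler's variance check, which is exactly the type-safety verification described above.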