C++ Notes: Binary Search Tree

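Contents

The concept of a binary search tree
Binary search tree operations
Applications of binary search trees
Performance analysis of binary search trees
Source code of the binary search tree implementation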

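The concept of a binary search tree

A binary search tree, also called a binary sort tree, is either an empty tree or a binary tree with the following properties:

If its left subtree is not empty, every key in the left subtree is less than the key of the root.
If its right subtree is not empty, every key in the right subtree is greater than the key of the root.
Its left and right subtrees are themselves binary search trees.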
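
The definition above can be checked mechanically. The following is a minimal sketch and not part of the original notes: TreeNode, IsBSTInRange and IsBST are hypothetical names chosen for illustration, and keys are assumed to be int. Each recursive call carries the open interval that the keys of the current subtree must fall into.

```cpp
#include <climits>

// Hypothetical node type used only for this illustration.
struct TreeNode
{
	int key;
	TreeNode* left;
	TreeNode* right;
};

// Returns true if every key in the subtree lies strictly between low and high
// and both subtrees are themselves binary search trees.
// The bounds are long long so that INT_MIN and INT_MAX are still valid keys.
bool IsBSTInRange(const TreeNode* root, long long low, long long high)
{
	if (root == nullptr)
		return true; // an empty tree is a binary search tree

	if (root->key <= low || root->key >= high)
		return false;

	return IsBSTInRange(root->left, low, root->key)
		&& IsBSTInRange(root->right, root->key, high);
}

bool IsBST(const TreeNode* root)
{
	return IsBSTInRange(root, LLONG_MIN, LLONG_MAX);
}
```

Comparing a node only with its two children would not be enough: the property speaks about all keys in a subtree, which is why the bounds are passed down.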


Binary search tree operations

1. Setting up the framework

// struct BSTreeNode (binary search tree node) - the node type
template<class K>
struct BSTreeNode
{
	BSTreeNode* _left;
	BSTreeNode* _right;
	K _key;

	BSTreeNode(const K& key)
		: _left(nullptr)
		, _right(nullptr)
		, _key(key)
	{}
};

// class BSTree (binary search tree) - the tree type
template<class K>
class BSTree
{
	typedef BSTreeNode<K> Node;

public:

protected:
	Node* _root = nullptr; // an empty tree has no root
};

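[Notes]

BSTreeNode is defined with struct, so its members are public by default. BSTree can access a node's members directly, without any Get-style accessors.
The main job of BSTree is to maintain the tree structure (insertion, deletion, search and so on). All of these operations start from the root, so a single pointer to the root node is enough to represent and maintain the whole tree.
The typedef only shortens the type name and keeps the naming consistent; it has no deeper purpose.
Creating a node with new calls the node's constructor. The compiler-generated default constructor cannot initialize a node from a key, so the constructor has to be written explicitly.
Why protected rather than private? Without inheritance the two behave the same. If the class is ever used as a base class, protected members stay accessible to derived classes, which is why protected is used here.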

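2. Traversal

Binary search is a powerful algorithm: it finds a target value in $O(\log n)$ time. In practice, however, it has two limitations. ① It requires sorted data, and sorting the data beforehand adds overhead, especially for large data sets. ② It relies on a sequential list (array) structure, where inserting or deleting at the head or in the middle is expensive, and the data must be kept in sorted order after every insertion or deletion, so the maintenance cost is high.

A binary search tree avoids these problems. An in-order traversal of a binary search tree visits the keys in ascending order, so in a sense the tree is a naturally sorted structure. Because of its defining property, insertion and deletion only relink a few pointers and never move the other elements, so the ordering is cheap to maintain.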

public:
	void Inorder()
	{
		_Inorder(_root);
		cout << endl;
	}

protected:
	void _Inorder(Node* root)
	{
		if (root == nullptr)
			return;

		_Inorder(root->_left);
		
		cout << root->_key << " ";
		
		_Inorder(root->_right);
	}

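[Notes]

In-order traversal is written as a helper function wrapped by a public function because a recursive traversal needs the root pointer as an argument, and code outside the class cannot access the member _root. The recursive versions of the later operations are wrapped in the same way for the same reason.
The helper _Inorder is only ever called by Inorder, so it is marked protected to preserve encapsulation.
As before, protected is used instead of private: without inheritance the two are equivalent, and with inheritance protected keeps the helper available to derived classes.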
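
To make the "naturally sorted" claim concrete, here is a small sketch that collects the keys into a std::vector instead of printing them. It is an illustration only: SortNode, InsertKey, CollectInorder, FreeTree and SortedUnique are hypothetical names, keys are assumed to be int, and duplicates are silently dropped.

```cpp
#include <vector>

// Hypothetical node type used only for this illustration.
struct SortNode
{
	int key;
	SortNode* left = nullptr;
	SortNode* right = nullptr;

	explicit SortNode(int k) : key(k) {}
};

// Inserts key into the tree rooted at root; duplicates are ignored.
void InsertKey(SortNode*& root, int key)
{
	if (root == nullptr)
		root = new SortNode(key);
	else if (root->key < key)
		InsertKey(root->right, key);
	else if (root->key > key)
		InsertKey(root->left, key);
}

// In-order traversal: left subtree, then the node, then the right subtree.
void CollectInorder(const SortNode* root, std::vector<int>& out)
{
	if (root == nullptr)
		return;

	CollectInorder(root->left, out);
	out.push_back(root->key);
	CollectInorder(root->right, out);
}

// Frees the subtrees before the node itself.
void FreeTree(SortNode* root)
{
	if (root == nullptr)
		return;

	FreeTree(root->left);
	FreeTree(root->right);
	delete root;
}

// Builds a tree from keys and returns them in ascending order, without duplicates.
std::vector<int> SortedUnique(const std::vector<int>& keys)
{
	SortNode* root = nullptr;
	for (int key : keys)
		InsertKey(root, key);

	std::vector<int> result;
	CollectInorder(root, result);
	FreeTree(root);
	return result;
}
```

Whatever order the keys arrive in, the in-order walk returns them sorted, which is the property the rest of the operations rely on.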

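3. Search

The search proceeds as follows:

Start comparing at the root node.
If the target is greater than the current node's key, continue in the right subtree; if it is smaller, continue in the left subtree.
Return true when the key is found. If the search reaches a null pointer without finding it, the key is not in the tree, so return false.
The number of nodes visited is at most the height of the tree.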


Iterative implementation

bool Find(const K& key)
{
	Node* cur = _root;

	while (cur)
	{
		if (cur->_key < key)
		{
			cur = cur->_right;
		}
		else if (cur->_key > key)
		{
			cur = cur->_left;
		}
		else
		{
			return true;
		}
	}

	return false;
}

Recursive implementation

public:
	bool FindR(const K& key)
	{
		return _FindR(_root, key);
	}
protected:
	bool _FindR(Node* root, const K& key)
	{
		if (root == nullptr)
			return false;
	
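		if (root->_key < key)
		{
			// The target is larger: continue in the right subtree
			return _FindR(root->_right, key);
		}
		else if (root->_key > key)
		{
			// The target is smaller: continue in the left subtree
			return _FindR(root->_left, key);
		}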
		else
		{
			return true;
		}
	}

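4. Insertion

The insertion proceeds as follows:

If the tree is empty, create the new node, assign it to _root, and return true.
If the tree is not empty, find the insertion position according to the binary search tree property, insert the new node there, and return true.
If the key to insert already exists, the insertion fails and returns false.

Iterative implementation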

bool Insert(const K& key)
{
	// Empty tree: the new node becomes the root
	if (_root == nullptr)
	{
		_root = new Node(key);
		return true;
	}

	// Non-empty tree: find the right position, then insert
	Node* cur = _root;
	Node* parent = cur;

	while (cur)
	{
		if (cur->_key < key)
		{
			parent = cur;
			cur = cur->_right;
		}
		else if (cur->_key > key)
		{
			parent = cur;
			cur = cur->_left;
		}
		else
		{
			return false;
		}
	}

	cur = new Node(key);

	if (parent->_key < key)
	{
		parent->_right = cur;
	}
	else
	{
		parent->_left = cur;
	}

	return true; // the new node was linked in successfully
}

Recursive implementation

public:
	bool InsertR(const K& key)
	{
		return _InsertR(_root, key);
	}
protected:
	bool _InsertR(Node*& root, const K& key)
	{
		if (root == nullptr)
		{
			root = new Node(key);
			return true;
		}

		if (root->_key < key)
		{
			return _InsertR(root->_right, key);
		}
		else if (root->_key > key)
		{
			return _InsertR(root->_left, key);
		}
		else
		{
			return false;
		}
	}

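[Notes]

Unlike the iterative implementation, the recursive implementation needs no special cases: whenever root is null, the new node is inserted right there.
The Node*& root parameter is what makes this so simple.
Without the reference, root would only be a local copy of the pointer at the insertion position, and the parent node would still have to be found somehow in order to link the new node.
With the reference, root is an alias for the parent's child pointer at the insertion position (or for _root itself when the tree is empty), so assigning to it links the new node into the tree.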
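
The aliasing described above can be observed directly. The sketch below is an illustration under assumptions, not the class from these notes: SimpleNode, InsertRef and FreeTree are hypothetical names and keys are assumed to be int. The caller passes a plain root pointer, and the recursion writes into whichever pointer the reference currently aliases.

```cpp
// Hypothetical node type used only for this illustration.
struct SimpleNode
{
	int key;
	SimpleNode* left = nullptr;
	SimpleNode* right = nullptr;

	explicit SimpleNode(int k) : key(k) {}
};

// slot is an alias for the pointer the caller passed in:
// the root pointer itself, or some node's left or right member.
bool InsertRef(SimpleNode*& slot, int key)
{
	if (slot == nullptr)
	{
		// Writing through the reference updates the parent's child pointer
		// (or the caller's root pointer), so no parent lookup is needed.
		slot = new SimpleNode(key);
		return true;
	}

	if (slot->key < key)
		return InsertRef(slot->right, key);
	if (slot->key > key)
		return InsertRef(slot->left, key);

	return false; // duplicate key
}

// Frees the subtrees before the node itself.
void FreeTree(SimpleNode* root)
{
	if (root == nullptr)
		return;

	FreeTree(root->left);
	FreeTree(root->right);
	delete root;
}
```

If the parameter were declared as SimpleNode* slot instead, the assignment in the null branch would only change the local copy and the new node would never become reachable from the tree.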

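5. Deletion

Deletion is the most involved operation on a binary search tree. First, there are five possibilities for the node to delete:
1. The node to delete does not exist.
2. The node to delete is a leaf.
3. The node to delete has only a left subtree.
4. The node to delete has only a right subtree.
5. The node to delete has both a left and a right subtree.

In practice, possibility 2 can be merged into possibility 3 or possibility 4 (a leaf is handled like a node whose single child happens to be null), so the actual deletion procedure is:

Case 1: the tree is empty or the node to delete is not found. The function returns false to signal that the deletion failed.

Case 2: the node to delete has only a left subtree. Save a pointer to the node, then check whether it is the root of the whole tree:

Root: make the root of its left subtree the new root of the whole tree, then delete the node.
Not root: make the parent point to the node's left child, then delete the node.
The node may be either the left child or the right child of its parent, so this must be checked.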


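Case 3: the node to delete has only a right subtree. Save a pointer to the node, then check whether it is the root of the whole tree:

Root: make the root of its right subtree the new root of the whole tree, then delete the node.
Not root: make the parent point to the node's right child, then delete the node.
The node may be either the left child or the right child of its parent, so this must be checked.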


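Case 4: the node to delete has both subtrees. Find the minimum node of its right subtree (or the maximum node of its left subtree) and copy that node's key into the node to delete (the right subtree's minimum is used here). Then delete the minimum node instead; it has no left child, so unlinking it is simple. One check is needed when unlinking it:

The minimum node of the right subtree may be either the left child or the right child of its parent. It is the right child exactly when the right child of the node to delete has no left child of its own.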


Iterative implementation

bool Erase(const K& key)
{
	if (_root == nullptr)
		return false;

	Node* cur = _root;
	Node* parent = cur;

	while (cur)
	{
		if (cur->_key < key)
		{
			parent = cur;
			cur = cur->_right;
		}
		else if (cur->_key > key)
		{
			parent = cur;
			cur = cur->_left;
		}
		else // cur->_key == key: perform the deletion
		{
			// Left child is null: the node has only a right subtree (or is a leaf)
			if (cur->_left == nullptr)
			{
				if (cur == _root)
				{
					_root = cur->_right;
					delete cur;
				}
				else
				{
					if (parent->_left == cur)
					{
						parent->_left = cur->_right;
					}
					else
					{
						parent->_right = cur->_right;
					}

					delete cur;
				}

				return true;
			}
			// Right child is null: the node has only a left subtree
			else if (cur->_right == nullptr)
			{
				if (cur == _root)
				{
					_root = cur->_left;
					delete cur;
				}
				else
				{
					if (parent->_left == cur)
					{
						parent->_left = cur->_left;
					}
					else
					{
						parent->_right = cur->_left;
					}

					delete cur;
				}

				return true;
			}
			// Both subtrees exist: overwrite the key with the minimum key of the right subtree,
			// then delete that minimum node instead
			else
			{
				Node* rightMinParent = cur;
				Node* rightMin = cur->_right;

				while (rightMin->_left)
				{
					rightMinParent = rightMin;
					rightMin = rightMin->_left;
				}

				cur->_key = rightMin->_key;
				
				if (rightMinParent->_left == rightMin)
					rightMinParent->_left = rightMin->_right;
				else
					rightMinParent->_right = rightMin->_right;

				delete rightMin;

				return true; // without this the loop keeps searching and ends up returning false
			}
		}
	}

	// cur == nullptr: the key was not found
	return false;
}

Recursive implementation

public:
	bool EraseR(const K& key)
	{
		return _EraseR(_root, key);
	}
protected:
	bool _EraseR(Node*& root, const K& key)
	{
		if (root == nullptr)
		{
			// The key is not in the tree (this also covers the empty tree)
			return false;
		}

		if (root->_key < key)
		{
			return _EraseR(root->_right, key);
		}
		else if (root->_key > key)
		{
			return _EraseR(root->_left, key);
		}
		else // root->_key == key: perform the deletion
		{
			Node* del = root;
			// Left child is null: the node has only a right subtree (or is a leaf)
			if (root->_left == nullptr)
			{
				root = root->_right;
			}
			// Right child is null: the node has only a left subtree
			else if (root->_right == nullptr)
			{
				root = root->_left;
			}
			// Neither child is null: the node has both subtrees
			else
			{
				Node* rightMin = root->_right;

				while (rightMin->_left)
				{
					rightMin = rightMin->_left;
				}

				swap(root->_key, rightMin->_key);

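				// Recurse from root->_right rather than rightMin: rightMin is a local copy
				// of the pointer, so a reference to it would not alias the parent's child pointer.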
				return _EraseR(root->_right, key);
			}

			delete del;
			return true;
		}
	}

6. Destructor and clearing

public:
	~BSTree()
	{
		clear();
	}

	void clear()
	{
		_Destroy(_root);
		_root = nullptr;
	}
protected:
	void _Destroy(Node* root)
	{
		if (root == nullptr)
			return;

		_Destroy(root->_left);
		_Destroy(root->_right);

		delete root;
	}

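[Notes]

Sometimes a tree needs to be emptied. Following STL convention, the class provides clear(). After the nodes are freed, _root, the entry point of the tree, must be reset to nullptr so that it does not become a dangling pointer.
The destructor's job is to release the resources the object owns, which is exactly what clear() does, so the destructor reuses it.
The tree is freed with a post-order traversal: both subtrees are deleted before the node itself, so no child pointer is lost and no memory is leaked.
The post-order traversal is recursive, so it is again wrapped in a helper function.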

7. Copy constructor and assignment operator

public:
	// = default asks the compiler to generate the default constructor
	BSTree() = default;
	
	BSTree(const BSTree<K>& t)
	{
		_root = _Copy(t._root);
	}
	BSTree<K>& operator=(BSTree<K> t)
	{
		swap(_root, t._root);

		return *this;
	}
protected:
	Node* _Copy(const Node* root)
	{
		if (root == nullptr)
			return nullptr;

		Node* newRoot = new Node(root->_key);
		newRoot->_left = _Copy(root->_left);
		newRoot->_right = _Copy(root->_right);

		return newRoot;
	}

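[Notes]

The copy constructor and the assignment operator of a binary search tree must perform a deep copy. The compiler-generated versions only copy the root pointer, so both have to be implemented by hand.
The copy is built recursively in pre-order (each node is created first, then its left and right subtrees are copied), using the helper function _Copy().
A copy constructor is still a constructor: once any constructor is declared explicitly, the compiler no longer generates the default constructor, so = default is used to request it.
The assignment operator takes its parameter by value (BSTree<K> t). The compiler calls the copy constructor to build t from the right-hand side, and the operator then swaps the _root pointers with the library swap function. When t goes out of scope, its destructor frees the nodes this tree used to own.
Assignment must support chaining (a = b = c), so the operator returns *this.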
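
The effect of the deep copy and of the swap-based assignment can be checked with a stripped-down tree. This is a minimal sketch under assumptions, not the class from these notes: MiniTree and its members are hypothetical, keys are assumed to be int, and only the operations needed for the demonstration are included.

```cpp
#include <utility>

// Hypothetical, stripped-down tree used only to exercise copy and assignment.
class MiniTree
{
	struct Node
	{
		int key;
		Node* left = nullptr;
		Node* right = nullptr;

		explicit Node(int k) : key(k) {}
	};

public:
	MiniTree() = default;
	MiniTree(const MiniTree& other) : _root(Copy(other._root)) {}

	// The by-value parameter is already a deep copy of the right-hand side.
	// After the swap it owns the old nodes and frees them in its destructor.
	MiniTree& operator=(MiniTree other)
	{
		std::swap(_root, other._root);
		return *this;
	}

	~MiniTree() { Destroy(_root); }

	bool Insert(int key)
	{
		Node** slot = &_root; // address of the pointer that may receive the new node
		while (*slot)
		{
			if ((*slot)->key == key)
				return false;
			slot = (key < (*slot)->key) ? &(*slot)->left : &(*slot)->right;
		}
		*slot = new Node(key);
		return true;
	}

	bool Find(int key) const
	{
		const Node* cur = _root;
		while (cur && cur->key != key)
			cur = (key < cur->key) ? cur->left : cur->right;
		return cur != nullptr;
	}

private:
	// Pre-order copy: create the node, then copy its subtrees.
	static Node* Copy(const Node* root)
	{
		if (root == nullptr)
			return nullptr;

		Node* copy = new Node(root->key);
		copy->left = Copy(root->left);
		copy->right = Copy(root->right);
		return copy;
	}

	// Post-order destruction: free the subtrees, then the node.
	static void Destroy(Node* root)
	{
		if (root == nullptr)
			return;

		Destroy(root->left);
		Destroy(root->right);
		delete root;
	}

	Node* _root = nullptr;
};
```

After MiniTree b(a), inserting into a leaves b unchanged, and d = c = a compiles and copies because the operator returns *this.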

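Applications of binary search trees

K model: only the key is stored in the structure, and the key itself is the value being searched for.
For example, to check whether a word is spelled correctly:
- Build a binary search tree using every word in the dictionary as a key.
- Search the tree for the word: if it is found, the spelling is correct; otherwise it is wrong.
KV model: every key has a corresponding value, forming a <Key, Value> pair. This model is very common in practice:
- An English-Chinese dictionary maps each English word to its Chinese translation, so the translation can be found quickly from the word; each <word, chinese> is a key-value pair.
- Word counting: once the counts are collected, the number of occurrences of a given word can be found quickly; each <word, count> is a key-value pair.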
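
The word-count example of the KV model could look roughly like the sketch below. It is an assumption-laden illustration rather than the author's code: KVNode, KVTree and CountWords are hypothetical names, copying is disabled to keep the sketch short, and deletion is omitted. Only the key takes part in comparisons; the value is stored alongside it.

```cpp
#include <string>
#include <vector>

// Hypothetical key/value node: the key orders the tree, the value rides along.
template<class K, class V>
struct KVNode
{
	KVNode* _left = nullptr;
	KVNode* _right = nullptr;
	K _key;
	V _value;

	KVNode(const K& key, const V& value) : _key(key), _value(value) {}
};

template<class K, class V>
class KVTree
{
	typedef KVNode<K, V> Node;

public:
	KVTree() = default;
	KVTree(const KVTree&) = delete;            // copying is left out of this sketch
	KVTree& operator=(const KVTree&) = delete;
	~KVTree() { Destroy(_root); }

	// Returns the node holding key, or nullptr if the key is absent.
	// Returning the node lets the caller update the value in place.
	Node* Find(const K& key)
	{
		Node* cur = _root;
		while (cur)
		{
			if (cur->_key < key)
				cur = cur->_right;
			else if (key < cur->_key)
				cur = cur->_left;
			else
				return cur;
		}
		return nullptr;
	}

	bool Insert(const K& key, const V& value)
	{
		Node** slot = &_root;
		while (*slot)
		{
			if ((*slot)->_key < key)
				slot = &(*slot)->_right;
			else if (key < (*slot)->_key)
				slot = &(*slot)->_left;
			else
				return false; // the key already exists
		}
		*slot = new Node(key, value);
		return true;
	}

private:
	static void Destroy(Node* root)
	{
		if (root == nullptr)
			return;

		Destroy(root->_left);
		Destroy(root->_right);
		delete root;
	}

	Node* _root = nullptr;
};

// Counts how often each word occurs, storing <word, count> pairs in counts.
void CountWords(const std::vector<std::string>& words, KVTree<std::string, int>& counts)
{
	for (const std::string& word : words)
	{
		KVNode<std::string, int>* node = counts.Find(word);
		if (node)
			++node->_value;
		else
			counts.Insert(word, 1);
	}
}
```

For the input apple, pear, apple, fig, apple, pear, looking up "apple" afterwards yields a node whose _value is 3, while looking up a word that never occurred yields nullptr.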

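Performance analysis of binary search trees

Insertion and deletion both start with a search, so search efficiency determines the performance of every operation on a binary search tree.

For a binary search tree with n nodes, if every element is equally likely to be searched for, the average search length is a function of the depth of the nodes: the deeper a node, the more comparisons are needed to reach it.

However, for the same set of keys, different insertion orders can produce binary search trees with different shapes: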


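In the best case, the binary search tree is a complete binary tree (or close to one), and the average number of comparisons is about $\log_2 N$.

In the worst case, the binary search tree degenerates into a single-branch tree (or close to one), and the average number of comparisons is about $\frac{N}{2}$.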
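
The dependence on insertion order is easy to reproduce. The following sketch is an illustration only: HNode, InsertKey, Height, FreeTree and HeightAfterInserting are hypothetical names, and height is counted in nodes (an empty tree has height 0). It builds a tree from a sequence of int keys and reports its height, which bounds the number of comparisons of a search.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical node type used only for this illustration.
struct HNode
{
	int key;
	HNode* left = nullptr;
	HNode* right = nullptr;

	explicit HNode(int k) : key(k) {}
};

// Iterative insertion; duplicates are ignored.
void InsertKey(HNode*& root, int key)
{
	HNode** slot = &root;
	while (*slot)
	{
		if ((*slot)->key < key)
			slot = &(*slot)->right;
		else if ((*slot)->key > key)
			slot = &(*slot)->left;
		else
			return;
	}
	*slot = new HNode(key);
}

// Height counted in nodes: 0 for an empty tree, 1 for a single node.
int Height(const HNode* root)
{
	if (root == nullptr)
		return 0;

	return 1 + std::max(Height(root->left), Height(root->right));
}

// Frees the subtrees before the node itself.
void FreeTree(HNode* root)
{
	if (root == nullptr)
		return;

	FreeTree(root->left);
	FreeTree(root->right);
	delete root;
}

// Inserts the keys in the given order and returns the height of the resulting tree.
int HeightAfterInserting(const std::vector<int>& keys)
{
	HNode* root = nullptr;
	for (int key : keys)
		InsertKey(root, key);

	int height = Height(root);
	FreeTree(root);
	return height;
}
```

Inserting 1, 2, 3, 4, 5, 6, 7 in ascending order yields a single right-leaning branch of height 7, while inserting the same keys as 4, 2, 6, 1, 3, 5, 7 yields a full tree of height 3.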

Source code of the binary search tree implementation

#include <iostream>
#include <utility> // std::swap

using namespace std;

namespace ljh
{
	// struct BSTreeNode (binary search tree node) - the node type
	template<class K>
	struct BSTreeNode
	{
		BSTreeNode* _left;
		BSTreeNode* _right;
		K _key;

		BSTreeNode(const K& key)
			: _left(nullptr)
			, _right(nullptr)
			, _key(key)
		{}
	};

	// class BSTree (binary search tree) - the tree type
	template<class K>
	class BSTree
	{
		typedef BSTreeNode<K> Node;

	public:
		// = default asks the compiler to generate the default constructor
		BSTree() = default;

		BSTree(const BSTree<K>& t)
		{
			_root = _Copy(t._root);
		}

		BSTree<K>& operator=(BSTree<K> t)
		{
			swap(_root, t._root);

			return *this;
		}

		~BSTree()
		{
			clear();
		}

		void clear()
		{
			_Destroy(_root);
			_root = nullptr;
		}

		bool Find(const K& key)
		{
			Node* cur = _root;

			while (cur)
			{
				if (cur->_key < key)
				{
					cur = cur->_right;
				}
				else if (cur->_key > key)
				{
					cur = cur->_left;
				}
				else
				{
					return true;
				}
			}

			return false;
		}

		bool Insert(const K& key)
		{
			// Empty tree: the new node becomes the root
			if (_root == nullptr)
			{
				_root = new Node(key);
				return true;
			}

			// Non-empty tree: find the right position, then insert
			Node* cur = _root;
			Node* parent = cur;

			while (cur)
			{
				if (cur->_key < key)
				{
					parent = cur;
					cur = cur->_right;
				}
				else if (cur->_key > key)
				{
					parent = cur;
					cur = cur->_left;
				}
				else
				{
					return false;
				}
			}

			cur = new Node(key);

			if (parent->_key < key)
			{
				parent->_right = cur;
			}
			else
			{
				parent->_left = cur;
			}

			return true; // the new node was linked in successfully
		}

		bool Erase(const K& key)
		{
			if (_root == nullptr)
				return false;

			Node* cur = _root;
			Node* parent = cur;

			while (cur)
			{
				if (cur->_key < key)
				{
					parent = cur;
					cur = cur->_right;
				}
				else if (cur->_key > key)
				{
					parent = cur;
					cur = cur->_left;
				}
				else // cur->_key == key: perform the deletion
				{
					// Left child is null: the node has only a right subtree (or is a leaf)
					if (cur->_left == nullptr)
					{
						if (cur == _root)
						{
							_root = cur->_right;
							delete cur;
						}
						else
						{
							if (parent->_left == cur)
							{
								parent->_left = cur->_right;
							}
							else
							{
								parent->_right = cur->_right;
							}

							delete cur;
						}

						return true;
					}
					// Right child is null: the node has only a left subtree
					else if (cur->_right == nullptr)
					{
						if (cur == _root)
						{
							_root = cur->_left;
							delete cur;
						}
						else
						{
							if (parent->_left == cur)
							{
								parent->_left = cur->_left;
							}
							else
							{
								parent->_right = cur->_left;
							}

							delete cur;
						}

						return true;
					}
					// Both subtrees exist: overwrite the key with the minimum key of the right subtree,
					// then delete that minimum node instead
					else
					{
						Node* rightMinParent = cur;
						Node* rightMin = cur->_right;

						while (rightMin->_left)
						{
							rightMinParent = rightMin;
							rightMin = rightMin->_left;
						}

						cur->_key = rightMin->_key;
						
						if (rightMinParent->_left == rightMin)
							rightMinParent->_left = rightMin->_right;
						else
							rightMinParent->_right = rightMin->_right;

						delete rightMin;

						return true; // without this the loop keeps searching and ends up returning false
					}
				}
			}

			// cur == nullptr: the key was not found
			return false;
		}

		// Recursive implementations

		void Inorder()
		{
			_Inorder(_root);
			cout << endl;
		}
		
		bool FindR(const K& key)
		{
			return _FindR(_root, key);
		}

		bool InsertR(const K& key)
		{
			return _InsertR(_root, key);
		}

		bool EraseR(const K& key)
		{
			return _EraseR(_root, key);
		}

	protected:
		void _Inorder(Node* root)
		{
			if (root == nullptr)
				return;

			_Inorder(root->_left);
			
			cout << root->_key << " ";
			
			_Inorder(root->_right);
		}

		bool _FindR(Node* root, const K& key)
		{
			if (root == nullptr)
				return false;

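			if (root->_key < key)
			{
				// The target is larger: continue in the right subtree
				return _FindR(root->_right, key);
			}
			else if (root->_key > key)
			{
				// The target is smaller: continue in the left subtree
				return _FindR(root->_left, key);
			}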
			else
			{
				return true;
			}
		}

		bool _InsertR(Node*& root, const K& key)
		{
			if (root == nullptr)
			{
				root = new Node(key);
				return true;
			}

			if (root->_key < key)
			{
				return _InsertR(root->_right, key);
			}
			else if (root->_key > key)
			{
				return _InsertR(root->_left, key);
			}
			else
			{
				return false;
			}
		}

		bool _EraseR(Node*& root, const K& key)
		{
			if (root == nullptr)
			{
				// The key is not in the tree (this also covers the empty tree)
				return false;
			}

			if (root->_key < key)
			{
				return _EraseR(root->_right, key);
			}
			else if (root->_key > key)
			{
				return _EraseR(root->_left, key);
			}
			else // root->_key == key: perform the deletion
			{
				Node* del = root;
				// Left child is null: the node has only a right subtree (or is a leaf)
				if (root->_left == nullptr)
				{
					root = root->_right;
				}
				// Right child is null: the node has only a left subtree
				else if (root->_right == nullptr)
				{
					root = root->_left;
				}
				// Neither child is null: the node has both subtrees
				else
				{
					Node* rightMin = root->_right;

					while (rightMin->_left)
					{
						rightMin = rightMin->_left;
					}

					swap(root->_key, rightMin->_key);

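					// Recurse from root->_right rather than rightMin: rightMin is a local copy
					// of the pointer, so a reference to it would not alias the parent's child pointer.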
					return _EraseR(root->_right, key);
				}

				delete del;
				return true;
			}
		}

		void _Destroy(Node* root)
		{
			if (root == nullptr)
				return;

			_Destroy(root->_left);
			_Destroy(root->_right);

			delete root;
		}

		Node* _Copy(const Node* root)
		{
			if (root == nullptr)
				return nullptr;

			Node* newRoot = new Node(root->_key);
			newRoot->_left = _Copy(root->_left);
			newRoot->_right = _Copy(root->_right);

			return newRoot;
		}

	protected:
		Node* _root = nullptr;
	};
}