集合-2 | Running Noob

Set

Set接口和常用方法

Set 接口是 Collection 接口的子接口。public interface Set<E> extends Collection<E>
1. Set 集合类中的元素是无序的（即添加顺序和取出顺序不一致），没有索引；
  注意：虽然 Set 中的元素的添加顺序和取出顺序不一致，但是取出顺序是固定好的，即取出顺序是不会变化的。
2. Set 不允许重复元素，所以最多包含一个 null；
3. JDK API 中 Set 接口的常用实现类有 HashSet、TreeSet、LinkedHashSet。
Set 接口的常用方法：和 List 接口一样，Set 接口也是 Collection 接口的子接口，因此，常用方法和 Collection 接口一样。

Set 接口的遍历方式：和 Collection 的遍历方式一样，因为 Set 接口是 Collection 接口的子接口：

可以使用迭代器 iterator；
可以使用增强for；
不能使用索引的方式来获取。Set 没有 get 和 set 方法。

package com.f.chapter14.set;

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

/**
 * @author fzy
 * @date 2023/5/2 12:35
 */
public class SetMethod {
    public static void main(String[] args) {
        Set set = new HashSet();
        System.out.println(set.add("hello"));   //true
        System.out.println(set.add("hello"));   //false
        set.add(10);
        set.add(10);
        set.add(true);
        set.add(null);
        //Set没有重复元素，即使重复添加了，也只有一个
        System.out.println(set);    //[null, 10, hello, true]

        //Set的遍历
        Iterator iterator = set.iterator();
        while (iterator.hasNext()) {
            Object obj =  iterator.next();
            System.out.println(obj);    //依次为：null, 10, hello, true。也说明了取出顺序不会改变
        }
    }
}

HashSet

public class HashSet<E> extends AbstractSet<E> implements Set<E>, Cloneable, java.io.Serializable

HashSet 实现了 Set 接口。

**HashSet 实际上是 HashMap**。

1
2
3

public HashSet() {
    map = new HashMap<>();
}

可以存放 null 值，但是只能有一个 null。HashSet 不能有重复元素/对象。
HashSet 不保证元素是有序的，取决于 hash 后，再确定索引的结果。即不保证存放元素的顺序和取出顺序一致。

★★★HashSet底层机制和源码解读

HashSet 底层是 HashMap，**HashMap 底层是（数组+链表+红黑树）**。
![](../../../../../Running Noob/计算机/Typora笔记/笔记-git仓库/Java-notebook/img/Java/C14-5.jpg)

分析 HashSet 添加元素时，底层是如何实现的（hash() + equals()）：

HashSet 底层是 HashMap。

1
2
3

public HashSet() {
    map = new HashMap<>();
}

添加一个元素时，**先得到该元素的 hash 值 (h = key.hashCode()) ^ (h >>> 16)**；

public boolean add(E e) {
    /*
    * PRESENT: private static final Object PRESENT = new Object();
    * 用于占位的，使得HashSet能与"底层为HashMap"的逻辑相切合
    * */
    return map.put(e, PRESENT)==null;
}

1
2
3

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

然后将 hash 值转换为索引值（tab[i = (n - 1) & hash]），找到存储数据表 table，看 table 的这个索引位置是否已经存放元素；

（table 就是 HashMap 的属性：transient Node<K,V>[] table;，用于存放元素）

如果还未添加任何元素，则 table 为 null，这时就会调用 resize 方法（resize方法是由putVal方法调用的），对 table 进行初始化，table 的默认初始容量为 16，默认初始阈值为 12：

final Node<K,V>[] resize() {	//resize方法是由putVal方法调用的
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        //默认初始容量为16：static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
        newCap = DEFAULT_INITIAL_CAPACITY;
        //默认初始阈值为12：static final float DEFAULT_LOAD_FACTOR = 0.75f;
        //目的是能让Table有缓冲，防止突然有大量数据加入而造成阻塞
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

如果 table 数组使用到了临界值 12，就会扩容到 16 * 2 = 32，新的临界值就是 12 * 2 = 24，以此类推。若以 capacity(threshold) 的形式来表示，则为 16(12)、32(24)、64(48)、128(96)...... 。

如果 table 的这个索引位置没有存放元素，则直接加入；

如果有，则会有三种情况（具体见代码），主要就是先比较 hash 值，然后调用 equals 比较（**equals条件可以由开发者自己来重写），如果已经存在 hash 值相同且**满足 equals 条件的元素，就放弃添加，反之则添加到链表的最后；-> hash() + equals()

if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;		//定义临时变量
    //table就是HashMap的属性：transient Node<K,V>[] table;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;	//还未添加任何元素时，table为null，所以会调用resize方法
    if ((p = tab[i = (n - 1) & hash]) == null)		//tab索引所在位置还没有存放元素
        tab[i] = newNode(hash, key, value, null);	//直接加入
    else {	//tab索引所在位置已经存放元素了
        Node<K,V> e; K k;
        if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode) //如果p是一棵红黑树，就调用putTreeVal方法进行添加
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            //循环比较tab索引所在位置的元素链表
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);   //添加到链表最后
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);  //红黑树化
                    break;
                }
                if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;	//将key相同的旧val用新val替换
            afterNodeAccess(e);
            return oldValue;	//返回oldValue表示添加失败
        }
    }
    ++modCount;
    //每成功加入一个结点，size就会++
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;	//返回null代表添加成功
}

在 Java 8 中，如果**一条链表的元素个数超过 TREEIFY THRESHOLD (默认是8)**，就会进行判断：

如果 table 的大小 >= MIN_TREEIFY_CAPACITY (默认64)，就会将该链表进行树化 (转换为红黑树) treeifyBin 。
否则仍采用数组扩容机制 resize。

然后在下一次添加元素时，如果**一条链表的元素个数又超过 TREEIFY THRESHOLD (默认是8)**，就会再次判断，并树化或 resize ，以此类推。

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();	//如果没有树化那就变为采用数组扩容机制
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

注意：添加的元素结点的类型为 HashMap$Node

★HashSet例题

来一道有意思的题：下面的代码，输出内容分别是什么？

package com.f.chapter14.set;

import java.util.HashSet;
import java.util.Objects;

/**
 * @author fzy
 * @date 2023/6/23 22:27
 */
public class HashSetExercise {
    public static void main(String[] args) {
        HashSet hashSet = new HashSet();
        Person p1 = new Person("1001", "AA");
        Person p2 = new Person("1003","CC");
        hashSet.add(p1);
        hashSet.add(p2);
        p1.setName("DD");
        hashSet.remove(p1);
        System.out.println(hashSet);    //1.这里输出什么？ -> [Person{id='1003', name='CC'}, Person{id='1001', name='DD'}]
        hashSet.add(new Person("1001","DD"));
        System.out.println(hashSet);    //2.这里输出什么？ -> [Person{id='1003', name='CC'}, Person{id='1001', name='DD'}, Person{id='1001', name='DD'}]
        hashSet.add(new Person("1001","AA"));
        System.out.println(hashSet);    //3.这里输出什么？ -> [Person{id='1003', name='CC'}, Person{id='1001', name='DD'}, Person{id='1001', name='DD'}, Person{id='1001', name='AA'}]
    }
}

class Person {
    private String id;
    private String name;

    public Person(String id, String name) {
        this.id = id;
        this.name = name;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Person person = (Person) o;
        return Objects.equals(id, person.id) && Objects.equals(name, person.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }

    @Override
    public String toString() {
        return "Person{" +
                "id='" + id + '\'' +
                ", name='" + name + '\'' +
                '}';
    }
}

分析：首先要明确，因为 Person 类已经重写了 equals 和 hashCode 方法，所以对于 id 和 name 相同的元素来说，它们的 hash() 是相同的，并且 equals 为 true。
- 在新建 p1、p2 对象并将其添加到 hashSet 中时，因为在添加对象时，机制为：根据 hashCode 得到 hash ，然后根据 hash 得到在 table 表中的索引位置，如果所在位置为空，就直接添加，否则通过 hash & equals() 的比较 [链表比较] ，来判断是否添加。
  因为 p1 、p2 的 id 和 name 并不相同，所以毫无疑问，添加成功。
- 然后将 p1.name 修改为 DD ，注意这里是先将 p1 添加到 table 表中，再修改 name，所以 p1 在 table 表中的索引位置并没有发生变化。
- 然后调用 remove 函数，需要注意，remove 的机制和 add 的机制是一样的，也是先根据对象的 hash 值来得到其所在 table 表中的索引位置，然后通过 hash & equals() 的比较 [链表比较] ，来进行删除。
- 但是，因为 p1.name 发生了变化，所以 p1 的 hashCode 变化了，进而带动 hash 及索引位置的变化，所以在 remove 时，并没有正确找到 p1 所在 table 表中的位置，所以 p1 并没有被删除。
  因此在第一个地方，输出为 [Person{id='1003', name='CC'}, Person{id='1001', name='DD'}] 。
- 然后添加一个新的 Person 对象：hashSet.add(new Person("1001","DD")); ，因为根据新的 Person 对象的 hash，得到在 table 表中的索引位置处并没有元素，所以添加成功。
  注意：在这里，新的 Person 对象在 table 表中的索引位置和之前要对 p1 remove 时得到的在 table 表中的索引位置是一样的，因为此时它们的 id 都是 1001，name 都是 DD。
  因此在第二个地方，输出为 [Person{id='1003', name='CC'}, Person{id='1001', name='DD'}, Person{id='1001', name='DD'}] 。
- 接着添加一个新的 Person 对象：hashSet.add(new Person("1001","AA")); ，因为根据新的 Person 对象的 hash，得到在 table 表中的索引位置，正好在 p1 所在的索引位置，所以该对象会和 p1 进行 hash & equals() 的比较，比较发现，它们的 hash 不相同，equals() 也为 false，所以添加成功，并且添加在 p1 的后面，形成链表。
  之所以 hash 不相同，equals() 也为 false，是因为此时 p1 的 id 为 1001，name 为 DD，并不和该新的 Person 对象相同。
  因此在第三个地方，输出为 [Person{id='1003', name='CC'}, Person{id='1001', name='DD'}, Person{id='1001', name='DD'}, Person{id='1001', name='AA'}] 。
这题需要注意的点就是，p1 是在添加到 table 后对 name 进行了修改，所以其在添加到 table 表中时的 hashCode 值和修改 name 之后的 hashCode 值是不同的。

LinkedHashSet

public class LinkedHashSet<E> extends HashSet<E> implements Set<E>, Cloneable, java.io.Serializable

LinkedHashSet 是 HashSet 的子类。
LinkedHashSet 底层是一个 LinkedHashMap，底层维护了一个 “**数组+双向链表**”。
LinkedHashSet 根据元素的 hashCode 值来决定元素的存储位置，同时使用链表维护元素的次序（如下图），这使得元素看起来是以插入顺序保存的，即 LinkedHashSet 的添加顺序和取出顺序一致。
![](../../../../../Running Noob/计算机/Typora笔记/笔记-git仓库/Java-notebook/img/Java/C14-6.jpg)
LinkedHashSet 不允许重复添加元素。

LinkedHashSet源码解读

LinkedHashSet 的底层维护的是一个 LinkedHashMap（LinkedHashMap 是 HashMap 的子类）：
public class LinkedHashMap<K,V> extends HashMap<K,V> implements Map<K,V>
LinkedHashSet 底层结构是 **数组table+双向列表**。
1
2
3
public LinkedHashSet() {
super(16, .75f, true);
}
1
2
3
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
map = new LinkedHashMap<>(initialCapacity, loadFactor);
}
第一次添加元素时，直接将数组 table 扩容到 16，走的还是和前面的 HashSet 一样的函数。不过现在 table (HashMap$Node[] 类型) 里存放的结点类型是 LinkedHashMap$Entry。
- Entry 是 LinkedHashMap 的 static class ，里面有 Entry<K,V> before, after; 分别指向前后结点。
  1
  2
  3
  4
  5
  6
  static class Entry<K,V> extends HashMap.Node<K,V> {
  Entry<K,V> before, after;
  Entry(int hash, K key, V value, Node<K,V> next) {
  super(hash, key, value, next);
  }
  }
  Entry<K,V> 是 HashMap.Node<K,V> 的子类。

在添加第二个及之后的元素时，会通过 linkNodeLast 函数来形成双向链表。

Node<K,V> newNode(int hash, K key, V value, Node<K,V> e) {
    LinkedHashMap.Entry<K,V> p =
        new LinkedHashMap.Entry<K,V>(hash, key, value, e);
    linkNodeLast(p);
    return p;
}

// link at the end of list
private void linkNodeLast(LinkedHashMap.Entry<K,V> p) {
    LinkedHashMap.Entry<K,V> last = tail;
    tail = p;
    if (last == null)
        head = p;
    else {
        p.before = last;
        last.after = p;
    }
}

TreeSet

当我们使用无参构造器创建 TreeSet 时，对添加到其中的元素，默认是按升序进行排序的。

我们也可以用 Comparator 来指定排序的方式：

package com.f.chapter14.set;

import jdk.internal.org.objectweb.asm.tree.MultiANewArrayInsnNode;

import java.util.Comparator;
import java.util.TreeSet;

/**
 * @author fzy
 * @date 2023/6/23 9:57
 */
public class TreeSet01 {
    public static void main(String[] args) {
        TreeSet treeSet = new TreeSet(new Comparator() {
            @Override
            public int compare(Object o1, Object o2) {
                return ((String)o1).compareTo((String)o2);  //升序(默认)
                //return ((String)o2).compareTo((String)o1);  //降序
            }
        });
        treeSet.add("3");
        treeSet.add("1");
        treeSet.add("2");
        treeSet.add("fbc");
        treeSet.add("bcd");
        System.out.println(treeSet);
    }
}

TreeSet 不允许添加 null 。
TreeSet 添加的对象之间是要可以通过 Comparator 相互比较的，目前测试来看，添加的对象最好都是同一类型。

TreeSet源码解读

在上面的例子中，创建 TreeSet 对象时，TreeSet 构造器将传入的 Comparator 对象，赋给了 TreeSet 底层的 TreeMap 对象的 comparator 属性。

1
2
3

public TreeSet(Comparator<? super E> comparator) {
    this(new TreeMap<>(comparator));
}

1
2
3

public TreeMap(Comparator<? super K> comparator) {
    this.comparator = comparator;
}

在向 TreeSet 对象中添加元素时，最终调用的是 TreeMap 类的 put 方法。

1
2
3

public boolean add(E e) {
    return m.put(e, PRESENT)==null;
}

public V put(K key, V value) {
    Entry<K,V> t = root;
    if (t == null) {
        compare(key, key); // type (and possibly null) check

        root = new Entry<>(key, value, null);
        size = 1;
        modCount++;
        return null;
    }
    int cmp;
    Entry<K,V> parent;
    // split comparator and comparable paths
    Comparator<? super K> cpr = comparator;	//就是我们传入的Comparator对象
    if (cpr != null) {
        do {
            parent = t;
            cmp = cpr.compare(key, t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else
                return t.setValue(value);
        } while (t != null);
    }
    else {
        if (key == null)
            throw new NullPointerException();
        @SuppressWarnings("unchecked")
        Comparable<? super K> k = (Comparable<? super K>) key;
        do {
            parent = t;
            cmp = k.compareTo(t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else
                return t.setValue(value);	//如果cmp==0，则不添加(并不是替换)
        } while (t != null);
    }
    Entry<K,V> e = new Entry<>(key, value, parent);
    if (cmp < 0)
        parent.left = e;
    else
        parent.right = e;
    fixAfterInsertion(e);
    size++;
    modCount++;
    return null;
}