[Python] – Use ‘set’ to find the items that differ between lists

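When the number of items is very large, the performance difference between an array (a list in Python) and a set can become clearly noticeable. Today I want to share the basic usage of set: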
https://docs.python.org/2/library/stdtypes.html#set

 >>> a=set()
 >>> a.add(1)
 >>> a.add(2)
 >>> a.add(3)
 >>> b=set()
 >>> b.add(2)
 >>> b.add(3)
 >>> b.add(4)
 >>> a^b
 {1, 4}
 >>> a&b
 {2, 3}
 >>> a-b
 {1}
 >>> b-a
 {4}
 >>> a|b
 {1, 2, 3, 4}

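Explanation:

  • Symmetric difference (items in exactly one of the two sets): ^
  • Intersection: &
  • Difference (items the left set has that the right set lacks): -
  • Union: |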
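The same operators can be applied to the use case in the title, comparing two lists. The following is a minimal sketch; the list contents and variable names are made up for illustration.

```python
# Hypothetical sample data: two lists we want to compare.
old_items = ["apple", "banana", "cherry", "banana"]
new_items = ["banana", "cherry", "durian"]

# Converting to sets drops duplicates and enables the set operators.
old_set = set(old_items)
new_set = set(new_items)

only_in_old = sorted(old_set - new_set)   # items that disappeared
only_in_new = sorted(new_set - old_set)   # items that were added
in_both = sorted(old_set & new_set)       # items present in both lists
differing = sorted(old_set ^ new_set)     # items in exactly one list
combined = sorted(old_set | new_set)      # every distinct item

print(only_in_old)  # ['apple']
print(only_in_new)  # ['durian']
print(in_both)      # ['banana', 'cherry']
print(differing)    # ['apple', 'durian']
print(combined)     # ['apple', 'banana', 'cherry', 'durian']
```

Note that converting a list to a set discards duplicates and the original order, so this approach fits best when only membership matters.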

Official documentation:

class set([iterable])
class frozenset([iterable])

Return a new set or frozenset object whose elements are taken from iterable. The elements of a set must be hashable. To represent sets of sets, the inner sets must be frozenset objects. If iterable is not specified, a new empty set is returned.
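As a small sketch of the hashability rule, the snippet below builds a set of sets from frozenset objects and shows that a plain set is rejected as an element. The variable names are illustrative only.

```python
# Inner sets must be frozenset objects, because elements have to be hashable.
inner_a = frozenset([1, 2])
inner_b = frozenset([3, 4])
set_of_sets = {inner_a, inner_b}

print(frozenset([2, 1]) in set_of_sets)  # True

# A mutable set is not hashable, so it cannot be an element of another set.
unhashable_rejected = False
try:
    {set([1, 2])}
except TypeError:
    unhashable_rejected = True
print(unhashable_rejected)  # True

# Calling set() without an iterable gives a new empty set.
empty = set()
print(len(empty))  # 0
```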

Instances of set and frozenset provide the following operations:

len(s)

Return the number of elements in set s (cardinality of s).

x in s

Test x for membership in s.

x not in s

Test x for non-membership in s.

isdisjoint(other)

Return True if the set has no elements in common with other. Sets are disjoint if and only if their intersection is the empty set.
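The membership test is where the performance difference between a list and a set usually shows up. The sketch below exercises the operations above and then times `in` on both containers with the standard timeit module. The data size and repeat count are arbitrary assumptions, and the measured times depend on the machine, so no output is shown for them.

```python
import timeit

# Arbitrary sample size chosen for illustration.
numbers_list = list(range(100_000))
numbers_set = set(numbers_list)

print(len(numbers_set))                  # 100000
print(99_999 in numbers_set)             # True
print(-1 not in numbers_set)             # True
print(numbers_set.isdisjoint([-1, -2]))  # True

# Time the same membership test on the list and on the set.
# The list is scanned item by item; the set uses a hash lookup.
list_time = timeit.timeit(lambda: 99_999 in numbers_list, number=200)
set_time = timeit.timeit(lambda: 99_999 in numbers_set, number=200)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```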

New in version 2.6.

issubset(other)
set <= other

Test whether every element in the set is in other.

set < other

Test whether the set is a proper subset of other, that is, set <= other and set != other.

issuperset(other)
set >= other

Test whether every element in other is in the set.

set > other

Test whether the set is a proper superset of other, that is, set >= other and set != other.

union(*others)
set | other | ...

Return a new set with elements from the set and all others.

Changed in version 2.6: Accepts multiple input iterables.

intersection(*others)
set & other & ...

Return a new set with elements common to the set and all others.

Changed in version 2.6: Accepts multiple input iterables.

difference(*others)
set - other - ...

Return a new set with elements in the set that are not in the others.

Changed in version 2.6: Accepts multiple input iterables.

symmetric_difference(other)
set ^ other

Return a new set with elements in either the set or other but not both.

copy()

Return a shallow copy of the set.
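A short sketch of the operations listed above, including the forms that take several inputs at once. The three example sets are made up for illustration.

```python
evens = {2, 4, 6}
small = {1, 2, 3, 4}
primes = {2, 3, 5, 7}

# Subset and proper-subset tests.
is_subset = {2, 4} <= evens          # True
is_proper_subset = {2, 4} < evens    # True
self_is_proper = evens < evens       # False: a set is not a proper subset of itself

# union, intersection and difference accept multiple inputs.
all_items = evens.union(small, primes)          # {1, 2, 3, 4, 5, 6, 7}
shared = evens.intersection(small, primes)      # {2}
leftover = evens.difference(small, primes)      # {6}
one_side = evens.symmetric_difference(small)    # {1, 3, 6}

# copy() returns a new set object with the same elements.
duplicate = evens.copy()
duplicate.add(8)

print(sorted(all_items), sorted(shared), sorted(leftover), sorted(one_side))
print(sorted(evens))  # [2, 4, 6], the original is unchanged
```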

Note, the non-operator versions of union(), intersection(), difference(), and symmetric_difference(), issubset(), and issuperset() methods will accept any iterable as an argument. In contrast, their operator based counterparts require their arguments to be sets. This precludes error-prone constructions like set('abc') & 'cbs' in favor of the more readable set('abc').intersection('cbs').
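The difference between the method form and the operator form can be checked with a short sketch. The variable names are illustrative only.

```python
letters = set("abc")

# The method form accepts any iterable, here a plain string.
via_method = letters.intersection("cbs")
print(sorted(via_method))  # ['b', 'c']

# The operator form requires both operands to be sets.
operator_rejected = False
try:
    letters & "cbs"
except TypeError:
    operator_rejected = True
print(operator_rejected)  # True
```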

Both set and frozenset support set to set comparisons. Two sets are equal if and only if every element of each set is contained in the other (each is a subset of the other). A set is less than another set if and only if the first set is a proper subset of the second set (is a subset, but is not equal). A set is greater than another set if and only if the first set is a proper superset of the second set (is a superset, but is not equal).

Instances of set are compared to instances of frozenset based on their members. For example, set('abc') == frozenset('abc') returns True and so does set('abc') in set([frozenset('abc')]).

The subset and equality comparisons do not generalize to a total ordering function. For example, any two non-empty disjoint sets are not equal and are not subsets of each other, so all of the following return False: a<b, a==b, or a>b. Accordingly, sets do not implement the __cmp__() method.

Since sets only define partial ordering (subset relationships), the output of the list.sort() method is undefined for lists of sets.
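The comparison rules above can be tried out directly. This is a minimal sketch with made-up values, run under Python 3.

```python
left = {1, 2}
right = {3, 4}

# Two non-empty disjoint sets: none of the three comparisons holds.
comparisons = (left < right, left == right, left > right)
print(comparisons)  # (False, False, False)

# A proper subset compares as "less than".
proper_subset = {1} < left
print(proper_subset)  # True

# set and frozenset are compared by their members.
same_members = set("abc") == frozenset("abc")
found = set("abc") in set([frozenset("abc")])
print(same_members, found)  # True True
```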

Set elements, like dictionary keys, must be hashable.

Binary operations that mix set instances with frozenset return the type of the first operand. For example: frozenset('ab') | set('bc') returns an instance of frozenset.
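A quick sketch of the rule for mixed operands; the names are illustrative only.

```python
frozen_first = frozenset("ab") | set("bc")
mutable_first = set("ab") | frozenset("bc")

# The result takes the type of the first operand.
print(type(frozen_first).__name__)   # frozenset
print(type(mutable_first).__name__)  # set

# The members are the same either way.
print(frozen_first == mutable_first)  # True
```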

The following table lists operations available for set that do not apply to immutable instances of frozenset:

update(*others)
set |= other | ...

Update the set, adding elements from all others.

Changed in version 2.6: Accepts multiple input iterables.

intersection_update(*others)
set &= other & ...

Update the set, keeping only elements found in it and all others.

Changed in version 2.6: Accepts multiple input iterables.

difference_update(*others)
set -= other | ...

Update the set, removing elements found in others.

Changed in version 2.6: Accepts multiple input iterables.

symmetric_difference_update(other)
set ^= other

Update the set, keeping only elements found in either set, but not in both.

add(elem)

Add element elem to the set.

remove(elem)

Remove element elem from the set. Raises KeyError if elem is not contained in the set.

discard(elem)

Remove element elem from the set if it is present.

pop()

Remove and return an arbitrary element from the set. Raises KeyError if the set is empty.

clear()

Remove all elements from the set.
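The in-place operations can be followed step by step in a short sketch. The set contents are made up, and a sorted snapshot is taken after each step so the effect is visible.

```python
tags = {"a", "b"}

tags.update(["c", "d"], ("e",))                 # add items from several iterables
after_update = sorted(tags)                     # ['a', 'b', 'c', 'd', 'e']

tags.intersection_update({"a", "b", "c", "x"})  # keep only the shared items
after_intersection = sorted(tags)               # ['a', 'b', 'c']

tags.difference_update(["a"])                   # remove the listed items
after_difference = sorted(tags)                 # ['b', 'c']

tags.symmetric_difference_update({"c", "z"})    # keep items found in exactly one set
after_symmetric = sorted(tags)                  # ['b', 'z']

tags.add("q")
tags.discard("missing")                         # no error when the item is absent

remove_raised = False
try:
    tags.remove("missing")                      # remove() raises KeyError instead
except KeyError:
    remove_raised = True

popped = tags.pop()                             # an arbitrary element
size_after_pop = len(tags)                      # 2

tags.clear()
print(after_update, after_intersection, after_difference, after_symmetric)
print(remove_raised, size_after_pop, len(tags))  # True 2 0
```

Use discard() when a missing element is not an error, and remove() when it should be reported.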


Finally, the output should be sorted so that it is easier for humans to read:

I want to sort each set.

That’s easy. For any set s (or anything else iterable), sorted(s) returns a list of the elements of s in sorted order:

>>> s = set(['0.000000000', '0.009518000', '10.277200999', '0.030810999', '0.018384000', '4.918560000'])
>>> sorted(s)
['0.000000000', '0.009518000', '0.018384000', '0.030810999', '10.277200999', '4.918560000']

Note that sorted is giving you a list, not a set. That’s because the whole point of a set, both in mathematics and in almost every programming language, is that it’s not ordered: the sets {1, 2} and {2, 1} are the same set.


You probably don’t really want to sort those elements as strings, but as numbers (so 4.918560000 will come before 10.277200999 rather than after).

The best solution is most likely to store the numbers as numbers rather than strings in the first place. But if not, you just need to use a key function:

>>> sorted(s, key=float)
['0.000000000', '0.009518000', '0.018384000', '0.030810999', '4.918560000', '10.277200999']

For more information, see the Sorting HOWTO in the official docs.
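As a small sketch of both options mentioned above, the snippet below sorts a set of numeric strings with a key function and also converts the values to numbers first. The sample values are a subset of the ones used in the answer, and the variable names are illustrative.

```python
readings = {"0.000000000", "10.277200999", "4.918560000"}

# Option 1: keep the strings, but compare them by numeric value.
by_value = sorted(readings, key=float)
print(by_value)  # ['0.000000000', '4.918560000', '10.277200999']

# The same key works for descending order.
descending = sorted(readings, key=float, reverse=True)
print(descending)  # ['10.277200999', '4.918560000', '0.000000000']

# Option 2: store the values as numbers, then a plain sort is enough.
as_numbers = sorted(float(value) for value in readings)
print(as_numbers)  # [0.0, 4.91856, 10.277200999]
```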
