Appendix C. Tips and tricks


	In the ActivePython IDE on Windows, you can run the Python program you're editing by choosing File->Run... (Ctrl-R). Output is displayed in the interactive window.


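	In the Python IDE on Mac OS, you can run a Python program with Python->Run window... (Cmd-R), but there is an important option you must set first. Open the `.py` file in the IDE, pop up the options menu by clicking the black triangle in the upper-right corner of the window, and make sure the Run as __main__ option is checked. This is a per-file setting, but you only need to do it once per file.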


	On UNIX-compatible systems (including Mac OS X), you can run a Python program from the command line: `python odbchelper.py`

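2.2. Declaring Functions


	In Visual Basic, functions (which return a value) start with `function`, and subroutines (which do not return a value) start with `sub`. There are no subroutines in Python. Everything is a function, every function returns a value (even if it's `None`), and every function starts with `def`.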
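
	As a rough illustration of the point that every function returns a value, the sketch below defines a function with no `return` statement and inspects what the call evaluates to. The name `log_message` and the `messages` list are hypothetical, invented for this example.

```python
messages = []  # hypothetical module-level list used only for this example

def log_message(text):
    """Hypothetical helper: it has a side effect but no return statement."""
    messages.append(text)

result = log_message("hello")
print(result)  # the call still evaluates to a value: None
```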


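	In Java, C++, and other statically-typed languages, you must specify the datatype of the function return value and of each function argument. In Python, you never explicitly specify the datatype of anything. Based on the value you assign, Python keeps track of the datatype internally.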

2.3. Documenting Functions


	Triple quotes are also an easy way to define a string that contains both single and double quotes, like `qq/.../` in Perl.
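
	A small sketch of the triple-quote tip; the sentence used as sample text is made up for illustration.

```python
# Both quote characters appear inside the string without escaping.
quote = '''He said, "It's done."'''

# The same string written with ordinary double quotes needs backslashes.
same = "He said, \"It's done.\""

print(quote)
```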


	Many Python IDEs use the `doc string` to provide context-sensitive documentation, so that when you type a function name, its `doc string` appears as a short usage hint. This can be incredibly helpful, but it's only as good as the `doc string`s you write.

2.4. Everything Is an Object


	`import` in Python is like `require` in Perl. Once you `import` a Python module, you access its functions with `module.function`; once you `require` a Perl module, you access its functions with `module::function`.

2.5. Indenting Code


	Python uses carriage returns to separate statements, and a colon and indentation to separate code blocks. C++ and Java use semicolons to separate statements and curly braces to separate code blocks.

2.6. Testing Modules


	Like C, Python uses `==` for comparison and `=` for assignment. Unlike C, Python does not support in-line assignment, so there's no chance of accidentally assigning a value when you meant to compare it.


	On MacPython, there is an additional step to make the `if` `__name__` trick work. Pop up the module's options menu by clicking the black triangle in the upper-right corner of the window, and make sure Run as __main__ is checked.

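Chapter 3. Native Datatypes

3.1. Introducing Dictionaries


	A dictionary in Python is like a hash in Perl. In Perl, variables that store hashes always start with a `%` character. In Python, variables can be named anything, and Python keeps track of the datatype internally.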


	A dictionary in Python is like an instance of the `Hashtable` class in Java.


	A dictionary in Python is like an instance of the `Scripting.Dictionary` object in Visual Basic.

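3.1.2. Modifying Dictionaries


	Dictionaries have no concept of order among elements. It is incorrect to say that the elements are “out of order”; they are simply unordered. This is an important distinction that will annoy you when you want to access the elements of a dictionary in a specific, repeatable order (like alphabetical order by key). There are ways of doing this, but they're not built into the dictionary.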
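
	One way to get a repeatable order, sketched under the assumption that alphabetical order by key is what you want: sort the keys yourself and then look each one up. The dictionary contents are sample data chosen for this example.

```python
params = {"server": "mpilgrim", "uid": "sa", "database": "master"}  # sample data

# The ordering comes from sorted(), not from the dictionary itself.
ordered_keys = sorted(params)
for key in ordered_keys:
    print(key, "=", params[key])
```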

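3.2. Introducing Lists


	A list in Python is like an array in Perl. In Perl, variables that store arrays always start with the `@` character. In Python, variables can be named anything, and Python keeps track of the datatype internally.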


	A list in Python is much more than an array in Java (although it can be used as one if that's really all you want). A better analogy is the `ArrayList` class, which can hold arbitrary objects and expands dynamically as new items are added.

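3.2.3. Searching Lists


	Before version 2.2.1, Python had no separate boolean datatype. To compensate, Python accepted almost anything in a boolean context (like an `if` statement), according to the following rules. `0` is false; all other numbers are true. An empty string (`""`) is false; all other strings are true. An empty list (`[]`) is false; all other lists are true. An empty tuple (`()`) is false; all other tuples are true. An empty dictionary (`{}`) is false; all other dictionaries are true. These rules still apply in Python 2.2.1 and later, but now you can also use an actual boolean, which has a value of `True` or `False`. Note the capitalization; these values, like everything else in Python, are case-sensitive.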
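
	The rules above can be checked directly. This is a minimal sketch; the helper name `in_boolean_context` and the sample values are assumptions made for the example.

```python
def in_boolean_context(value):
    """Hypothetical helper: report how an `if` statement treats the value."""
    if value:
        return True
    return False

false_values = [0, "", [], (), {}]
true_values = [1, -1, "0", [0], (0,), {"key": 0}]

print([in_boolean_context(v) for v in false_values])
print([in_boolean_context(v) for v in true_values])
```

	Note that `"0"`, `[0]`, and `(0,)` count as true: only emptiness matters for strings and containers, not what they contain.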

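3.3. Introducing Tuples


	Tuples can be converted into lists, and vice-versa. The built-in `tuple` function takes a list and returns a tuple with the same elements, and the `list` function takes a tuple and returns a list. In effect, `tuple` freezes a list, and `list` thaws a tuple.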
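
	A short sketch of freezing and thawing, using made-up sample values.

```python
original = ["a", "b", "mpilgrim"]   # sample data

frozen = tuple(original)    # same elements, but no methods that modify it
thawed = list(frozen)       # a new list that can be changed again
thawed.append("new")

print(frozen)
print(thawed)
```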

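3.4. Declaring variables


	When a command is split among several lines with the line-continuation marker (“`\`”), the continued lines can be indented in any manner; Python's normally stringent indentation rules do not apply. If your Python IDE auto-indents the continued line, you should probably accept its default unless you have a compelling reason not to.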
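
	To make the point concrete, here is a sketch in which the continued lines are deliberately indented inconsistently. It is meant only to show that the code is accepted, not as a style to imitate.

```python
# After a backslash, the indentation of the next physical line is not significant.
total = 1 + 2 + \
            3 + \
  4

print(total)
```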

3.5. Formatting Strings


	String formatting in Python uses the same syntax as the `sprintf` function in C.
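
	A brief sketch of the `sprintf`-style placeholders with the `%` operator; the variable names and values are sample data.

```python
uid = "sa"                 # sample values for illustration
users_connected = 6
load = 12.5

# %s formats a string, %d an integer, %.2f a float with two decimals,
# and %% produces a literal percent sign.
message = "%s has %d connections (%.2f%% of the limit)" % (uid, users_connected, load)
print(message)
```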

3.7. Joining Lists and Splitting Strings


	`join` works only on lists of strings; it does not do any type coercion. Joining a list that has one or more non-string elements raises an exception.
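
	The sketch below shows the failure and one possible workaround, converting each element with `str` before joining. The list contents are sample data.

```python
parts = ["server", 1433]   # sample data: the second element is not a string

try:
    joined = ":".join(parts)
except TypeError:
    # join refused the integer, so convert every element explicitly.
    joined = ":".join([str(part) for part in parts])

print(joined)
```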


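	`anystring.split(delimiter, 1)` is a useful technique when you want to search a string for a substring and then work with everything before the substring (which ends up in the first element of the returned list) and everything after it (which ends up in the second element).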
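
	For instance, assuming a `key=value` line whose value may itself contain the delimiter (the sample line is invented):

```python
line = "pwd=secret=with=equals"   # sample input

# Split only at the first "=", so the rest of the line stays intact.
key, value = line.split("=", 1)

print(key)
print(value)
```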

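Chapter 4. The Power Of Introspection

4.2. Using Optional and Named Arguments


	The only thing you need to do to call a function is specify a value (somehow) for each required argument; the manner and order in which you do that is up to you.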
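
	A sketch of the different calling styles. The function `describe` and its parameters are hypothetical stand-ins with one required and two optional arguments.

```python
def describe(obj, spacing=10, collapse=1):
    """Hypothetical function: one required argument, two optional ones."""
    return (obj, spacing, collapse)

calls = [
    describe("x"),                    # only the required argument
    describe("x", 12),                # second argument given by position
    describe("x", collapse=0),        # skip spacing, name collapse
    describe(spacing=15, obj="x"),    # everything by name, in any order
]
print(calls)
```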

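4.3.3. Built-In Functions


	Python comes with excellent reference manuals, which you should peruse thoroughly to learn all the modules Python has to offer. But unlike most languages, where you would find yourself referring back to the manuals or man pages to remind yourself how to use these modules, Python is largely self-documenting.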

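4.7. Using lambda Functions


	`lambda` functions are a matter of style. Using them is never required; anywhere you could use them, you could define a separate normal function and use that instead. I use them in places where I want to encapsulate specific, non-reusable code without littering my code with a lot of little one-line functions.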
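
	To illustrate that a `lambda` and a normal function are interchangeable, here is a small sketch; the names `double`, `double_lambda`, and `squeeze` are invented for this example.

```python
def double(x):
    """Ordinary named function."""
    return x * 2

double_lambda = lambda x: x * 2          # equivalent one-line lambda

# A throwaway lambda that collapses runs of whitespace into single spaces.
squeeze = lambda s: " ".join(s.split())

print(double(3), double_lambda(3))
print(squeeze("a  b\n c"))
```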

4.8. Putting It All Together


	In SQL, you must use `IS NULL` instead of `= NULL` to compare against a null value. In Python, you can use either `== None` or `is None`, but `is None` is faster.
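
	A minimal sketch showing that both spellings give the same answer for ordinary values. It does not measure the speed difference; the variable names are assumptions for the example.

```python
missing = None
present = "value"

# Both comparisons agree here; `is None` tests identity rather than equality.
by_equality = (missing == None, present == None)
by_identity = (missing is None, present is None)

print(by_equality, by_identity)
```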

Chapter 5. Objects and Object-Orientation

5.2. Importing Modules Using from module import


	`from module import *` in Python is like `use module` in Perl; `import module` in Python is like `require module` in Perl.


	`from module import *` in Python is like `import module.*` in Java; `import module` in Python is like `import module` in Java.


	Use `from module import *` sparingly, because it makes it difficult to determine where a particular function or attribute came from, and that makes debugging and refactoring more difficult.
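
	The difference between the two import styles can be sketched with the standard `types` module. The function `sample` is a hypothetical placeholder that exists only to be inspected.

```python
import types
from types import FunctionType

def sample():
    """Hypothetical function used only as something to inspect."""

# With `import types`, the name must be qualified by the module.
qualified = isinstance(sample, types.FunctionType)

# With `from types import FunctionType`, the name is available directly,
# and it is obvious from the import line where it came from.
direct = isinstance(sample, FunctionType)

print(qualified, direct)
```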

5.3. Defining Classes


	The `pass` statement in Python is like an empty set of braces (`{}`) in Java or C.


	In Python, the ancestor of a class is simply listed in parentheses immediately after the class name. There is no special keyword like `extends` in Java.
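
	A minimal sketch of both tips; the class names `Base` and `Derived` are placeholders.

```python
class Base:
    pass                # nothing to define yet; `pass` fills the empty block

class Derived(Base):    # the ancestor is listed in parentheses
    pass

print(Derived.__bases__)
```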

5.3.1. Initializing and Coding Classes


	By convention, the first argument of any Python class method (the reference to the current instance) is called `self`. This argument fills the role of the reserved word `this` in C++ or Java, but `self` is not a reserved word in Python, merely a naming convention. Nonetheless, please don't call it anything but `self`; this is a very strong convention.

5.3.2. Knowing When to Use self and __init__


	`__init__` methods are optional, but when you define one, you must remember to explicitly call the ancestor's `__init__` method (if it defines one). This is more generally true: whenever a descendant wants to extend the behavior of the ancestor, the descendant method must explicitly call the ancestor method at the proper time, with the proper arguments.
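
	As a sketch of the explicit ancestor call, assuming two hypothetical classes `Named` and `SizedNamed`:

```python
class Named:
    def __init__(self, name):
        self.name = name

class SizedNamed(Named):
    def __init__(self, name, size):
        # The ancestor's __init__ is not called automatically; call it explicitly.
        Named.__init__(self, name)
        self.size = size

item = SizedNamed("report", 3)
print(item.name, item.size)
```

	If the call to `Named.__init__` were left out, `item.name` would never be set.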

5.4. Instantiating Classes


	In Python, simply call a class as if it were a function to create a new instance of the class. There is no explicit `new` operator like C++ or Java.

5.5. Exploring UserDict: A Wrapper Class


	In the ActivePython IDE on Windows, you can quickly open any module in your library path by selecting File->Locate... (Ctrl-L).


	Java and PowerBuilder support function overloading by argument list, i.e. one class can have multiple methods with the same name but a different number of arguments, or arguments of different types. Other languages (most notably PL/SQL) even support function overloading by argument name; i.e. one class can have multiple methods with the same name and the same number of arguments of the same type but different argument names. Python supports neither of these; it has no form of function overloading whatsoever. Methods are defined solely by their name, and there can be only one method per class with a given name. So if a descendant class has an `__init__` method, it always overrides the ancestor `__init__` method, even if the descendant defines it with a different argument list. And the same rule applies to any other method.
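
	The following sketch shows both consequences: a second definition with the same name replaces the first, and a descendant `__init__` overrides the ancestor's regardless of its argument list. All class and method names are hypothetical.

```python
class Greeter:
    def greet(self):
        return "hello"

    def greet(self, name):      # replaces the definition above; no overloading
        return "hello, " + name

class Parent:
    def __init__(self):
        self.kind = "parent"

class Kid(Parent):
    def __init__(self, label):  # overrides Parent.__init__ despite the new argument list
        self.label = label

try:
    Greeter().greet()           # the zero-argument version no longer exists
    zero_arg_call_failed = False
except TypeError:
    zero_arg_call_failed = True

kid = Kid("x")
print(zero_arg_call_failed, hasattr(kid, "kind"))
```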


	Guido, the original author of Python, explains method overriding this way: "Derived classes may override methods of their base classes. Because methods have no special privileges when calling other methods of the same object, a method of a base class that calls another method defined in the same base class, may in fact end up calling a method of a derived class that overrides it. (For C++ programmers: all methods in Python are effectively virtual.)" If that doesn't make sense to you (it confuses the hell out of me), feel free to ignore it. I just thought I'd pass it along.
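
	If the quotation is hard to follow, a concrete case may help. In this sketch (class names invented), a base-class method calls another method of the same object and ends up running the derived-class version.

```python
class Report:
    def render(self):
        # Looks up header() on the actual object, not on Report specifically.
        return self.header() + " / body"

    def header(self):
        return "plain header"

class FancyReport(Report):
    def header(self):           # overrides the base-class method
        return "fancy header"

print(Report().render())
print(FancyReport().render())
```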


	Always assign an initial value to all of an instance's data attributes in the `__init__` method. It will save you hours of debugging later, tracking down `AttributeError` exceptions because you're referencing uninitialized (and therefore non-existent) attributes.
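
	A sketch of the failure this tip guards against, using two hypothetical counter classes that differ only in whether `__init__` sets the attribute.

```python
class Counter:
    def __init__(self):
        self.count = 0          # initial value assigned up front

    def increment(self):
        self.count += 1

class BrokenCounter:
    def increment(self):
        self.count += 1         # self.count was never created

good = Counter()
good.increment()

try:
    BrokenCounter().increment()
    error_name = None
except AttributeError as error:
    error_name = type(error).__name__

print(good.count, error_name)
```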


	In versions of Python prior to 2.2, you could not directly subclass built-in datatypes like strings, lists, and dictionaries. To compensate for this, Python comes with wrapper classes that mimic the behavior of these built-in datatypes: `UserString`, `UserList`, and `UserDict`. Using a combination of normal and special methods, the `UserDict` class does an excellent imitation of a dictionary. In Python 2.2 and later, you can inherit classes directly from built-in datatypes like `dict`. An example of this is given in the examples that come with this book, in `fileinfo_fromdict.py`.
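
	As a rough sketch of inheriting directly from `dict` (Python 2.2 or later): the class `Metadata` below is a hypothetical example written for this note, not the contents of `fileinfo_fromdict.py`.

```python
class Metadata(dict):
    """Hypothetical dictionary subclass that always has a "name" key."""

    def __init__(self, filename=None):
        dict.__init__(self)             # explicitly initialize the ancestor
        self["name"] = filename

info = Metadata("song.mp3")
info["artist"] = "unknown"              # ordinary dictionary behavior still works

print(sorted(info.keys()))
```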

5.6.1. Getting and Setting Items


	When accessing data attributes within a class, you need to qualify the attribute name: `self.attribute`. When calling other methods within a class, you need to qualify the method name: `self.method`.

5.7. Advanced Special Class Methods


	In Java, you determine whether two string variables reference the same physical memory location by using `str1 == str2`. This is called object identity, and it is written in Python as `str1 is str2`. To compare string values in Java, you would use `str1.equals(str2)`; in Python, you would use `str1 == str2`. Java programmers who have been taught to believe that the world is a better place because `==` in Java compares by identity instead of by value may have a difficult time adjusting to Python's lack of such “gotchas”.
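
	The distinction can be sketched as follows. Lists are used instead of strings here, as an assumption for clarity, because whether two equal strings happen to share one object depends on the interpreter.

```python
first = [1, 2, 3]
second = [1, 2, 3]      # equal value, separate object
alias = first           # another name for the same object

print(first == second)  # compares values
print(first is second)  # compares identity
print(first is alias)
```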


	While other object-oriented languages only let you define the physical model of an object (“this object has a `GetLength` method”), Python's special class methods like `__len__` allow you to define the logical model of an object (“this object has a length”).
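
	A minimal sketch of the idea, using a hypothetical `Playlist` class: defining `__len__` lets the built-in `len` work on the object directly.

```python
class Playlist:
    def __init__(self, tracks):
        self.tracks = list(tracks)

    def __len__(self):
        # Called by the built-in len(); the object "has a length".
        return len(self.tracks)

playlist = Playlist(["intro", "theme"])
print(len(playlist))
```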

5.8. Introducing Class Attributes


	In Java, both static variables (called class attributes in Python) and instance variables (called data attributes in Python) are defined immediately after the class definition (one with the `static` keyword, one without). In Python, only class attributes can be defined here; data attributes are defined in the `__init__` method.
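
	A short sketch of where each kind of attribute is defined; the class `Widget` and its attribute names are made up for the example.

```python
class Widget:
    created = 0                 # class attribute, shared by all instances

    def __init__(self, name):
        self.name = name        # data attribute, one per instance
        Widget.created += 1

first = Widget("a")
second = Widget("b")

print(Widget.created, first.name, second.name)
```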


	There are no constants in Python. Everything can be changed if you try hard enough. This fits with one of the core principles of Python: bad behavior should be discouraged but not banned. If you really want to change the value of `None`, you can do it, but don't come running to me when your code is impossible to debug.

5.9. Private Functions


	In Python, all special methods (like `__setitem__`) and built-in attributes (like `__doc__`) follow a standard naming convention: they both start with and end with two underscores. Don't name your own methods and attributes this way, because it will only confuse you (and others) later.

Chapter 6. Exceptions and File Handling

6.1. Handling Exceptions


	Python uses `try...except` to handle exceptions and `raise` to generate them. Java and C++ use `try...catch` to handle exceptions, and `throw` to generate them.
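
	Both halves can be sketched in a few lines. The function names `safe_divide` and `require_positive` are hypothetical.

```python
def safe_divide(numerator, denominator):
    """Handle an exception with try...except."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None

def require_positive(number):
    """Generate an exception with raise."""
    if number <= 0:
        raise ValueError("expected a positive number")
    return number

print(safe_divide(6, 3))
print(safe_divide(1, 0))
```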

6.5. Working with Directories


	Whenever possible, you should use the functions in `os` and `os.path` for file, directory, and path manipulations. These modules are wrappers for platform-specific modules, so functions like `os.path.split` work on UNIX, Windows, Mac OS, and any other platform supported by Python.
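
	A sketch of building and taking apart a path without hard-coding a separator; the directory and file names are sample values.

```python
import os.path

# join inserts whatever separator the current platform uses.
path = os.path.join("music", "ap", "mahadeva.mp3")

directory, filename = os.path.split(path)
name, extension = os.path.splitext(filename)

print(directory, filename, name, extension)
```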

Chapter 7. Regular Expressions

7.4. Using the {n,m} Syntax


	There is no way to programmatically determine that two regular expressions are equivalent. The best you can do is write a lot of test cases to make sure they behave the same way on all relevant inputs. We'll talk more about writing test cases later in this book.
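
	As a sketch of that approach, the code below runs two patterns over the same inputs and checks that they agree. The two patterns and the list of cases are assumptions chosen for illustration; agreement on these cases suggests, but does not prove, equivalence.

```python
import re

optional_style = re.compile(r"^M?M?M?$")    # up to three optional M characters
counted_style = re.compile(r"^M{0,3}$")     # the same idea using {n,m}

cases = ["", "M", "MM", "MMM", "MMMM", "X", "MX"]

# Compare match/no-match for every case.
results = [
    (case, bool(optional_style.search(case)), bool(counted_style.search(case)))
    for case in cases
]
agree = all(first == second for _, first, second in results)
print(agree)
```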

Chapter 8. HTML Processing

8.2. Introducing sgmllib.py


	Python 2.0 had a bug where `SGMLParser` would not recognize declarations at all (`handle_decl` would never be called), which meant that `DOCTYPE`s were silently ignored. This is fixed in Python 2.1.


	In the ActivePython IDE on Windows, you can specify command line arguments in the “Run script” dialog. Separate multiple arguments with spaces.

8.4. Introducing BaseHTMLProcessor.py


	The HTML specification requires that all non-HTML (like client-side JavaScript) must be enclosed in HTML comments, but not all web pages do this properly (and all modern web browsers are forgiving if they don't). `BaseHTMLProcessor` is not forgiving; if script is improperly embedded, it will be parsed as if it were HTML. For instance, if the script contains less-than and equals signs, `SGMLParser` may incorrectly think that it has found tags and attributes. `SGMLParser` always converts tags and attribute names to lowercase, which may break the script, and `BaseHTMLProcessor` always encloses attribute values in double quotes (even if the original HTML document used single quotes or no quotes), which will certainly break the script. Always protect your client-side script within HTML comments.

8.5. locals and globals


	Python 2.2 introduced a subtle but important change that affects the namespace search order: nested scopes. In versions of Python prior to 2.2, when you reference a variable within a nested function or `lambda` function, Python will search for that variable in the current (nested or `lambda`) function's namespace, then in the module's namespace. Python 2.2 will search for the variable in the current (nested or `lambda`) function's namespace, then in the parent function's namespace, then in the module's namespace. Python 2.1 can work either way; by default, it works like Python 2.0, but you can add the following line of code at the top of your module to make your module work like Python 2.2: `from __future__ import nested_scopes`
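
	A minimal sketch of what nested scopes allow, assuming Python 2.2 or later: the inner function finds `amount` in the parent function's namespace. The function names are invented.

```python
def make_adder(amount):
    def add(value):
        # `amount` is not local to add and not a module-level name;
        # it is found in the enclosing function's namespace.
        return value + amount
    return add

add_five = make_adder(5)
print(add_five(10))
```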


	Using the `locals` and `globals` functions, you can get the value of arbitrary variables dynamically, providing the variable name as a string. This mirrors the functionality of the `getattr` function, which allows you to access arbitrary functions dynamically by providing the function name as a string.
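
	A sketch of the parallel between the two lookups. The helper `lookup` and its variables are hypothetical; `globals()` can be indexed the same way for module-level names.

```python
import os.path

def lookup(variable_name):
    """Hypothetical helper: fetch a local variable by its name."""
    server = "mpilgrim"
    database = "master"
    return locals()[variable_name]

# getattr does the same for attributes of an object, here a module function.
join_function = getattr(os.path, "join")

print(lookup("server"), lookup("database"))
```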

8.6. Dictionary-based string formatting


	Using dictionary-based string formatting with `locals` is a convenient way of making complex string formatting expressions more readable, but it comes with a price. There is a slight performance hit in making the call to `locals`, since `locals` builds a copy of the local namespace.
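
	For example, assuming a function with two local variables (names and values invented):

```python
def describe_connection():
    server = "mpilgrim"
    database = "master"
    # Each %(name)s placeholder is looked up by key in the dictionary
    # that locals() returns.
    return "%(database)s on %(server)s" % locals()

print(describe_connection())
```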

Chapter 9. XML Processing

9.2. Packages


	A package is a directory with the special `__init__.py` file in it. The `__init__.py` file defines the attributes and methods of the package. It doesn't need to define anything and can be an empty file, but it has to exist. If `__init__.py` doesn't exist, the directory is just a directory, not a package, and it can't be imported or contain modules or nested packages.

9.6. Accessing element attributes


	This section may be a little confusing, because of some overlapping terminology. Elements in an XML document have attributes, and Python objects also have attributes. When you parse an XML document, you get a bunch of Python objects that represent all the pieces of the XML document, and some of these Python objects represent attributes of the XML elements. But the (Python) objects that represent the (XML) attributes also have (Python) attributes, which are used to access various parts of the (XML) attribute that the object represents. I told you it was confusing. I am open to suggestions on how to distinguish these more clearly.


	Like a dictionary, attributes of an XML element have no ordering. Attributes may happen to be listed in a certain order in the original XML document, and the `Attr` objects may happen to be listed in a certain order when the XML document is parsed into Python objects, but these orders are arbitrary and should carry no special meaning. You should always access individual attributes by name, like the keys of a dictionary.
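
	A sketch of access by name using `xml.dom.minidom`; the one-element XML document is sample input. It also shows the overlap in terminology: `attr` is a Python object representing an XML attribute, and `attr.name` and `attr.value` are Python attributes of that object.

```python
from xml.dom import minidom

document = minidom.parseString('<ref id="bit" lang="en"/>')   # sample XML
element = document.documentElement

# Look up the XML attribute by name, like a dictionary key.
attr = element.attributes["id"]
print(attr.name, attr.value)

# getAttribute returns the value directly.
language = element.getAttribute("lang")
print(language)
```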

Chapter 10. Scripts and Streams

Chapter 11. HTTP Web Services

11.6. Handling Last-Modified and ETag


	In these examples, the HTTP server has supported both `Last-Modified` and `ETag` headers, but not all servers do. As a web services client, you should be prepared to support both, but you must code defensively in case a server only supports one or the other, or neither.
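
	One possible shape for that defensive code, sketched as a helper that builds conditional request headers from whatever validators a previous response supplied. The function name `conditional_headers` and the sample values are hypothetical; `If-None-Match` and `If-Modified-Since` are the standard request headers that pair with `ETag` and `Last-Modified`.

```python
def conditional_headers(etag=None, last_modified=None):
    """Hypothetical helper: include only the validators the server gave us."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

both = conditional_headers('"abc123"', "Thu, 15 Apr 2004 19:45:21 GMT")
etag_only = conditional_headers(etag='"abc123"')
neither = conditional_headers()
print(both, etag_only, neither)
```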

Chapter 12. SOAP Web Services

Chapter 13. Unit Testing

13.2. Diving in


	`unittest` is included with Python 2.1 and later. Python 2.0 users can download it from `pyunit.sourceforge.net`.

Chapter 14. Test-First Programming

14.3. roman.py, stage 3


	The most important thing that comprehensive unit testing can tell you is when to stop coding. When all the unit tests for a function pass, stop coding the function. When all the unit tests for an entire module pass, stop coding the module.

14.5. roman.py, stage 5


	When all of your tests pass, stop coding.

Chapter 15. Refactoring

15.3. Refactoring


	Whenever you are going to use a regular expression more than once, you should compile it to get a pattern object, then call the methods on the pattern object directly.
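
	A sketch of the compile-once pattern; the pattern and the sample addresses are chosen for illustration.

```python
import re

# Compile once, then reuse the pattern object's methods.
road = re.compile(r"\bROAD\b")

addresses = ["100 NORTH BROAD ROAD", "12 ROAD END", "7 BROADWAY"]   # sample data
abbreviated = [road.sub("RD.", address) for address in addresses]
print(abbreviated)
```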

Chapter 16. Functional Programming

16.2. Finding the path


	The pathnames and filenames you pass to `os.path.abspath` do not need to exist.


	`os.path.abspath` not only constructs full path names, it also normalizes them. That means that if you are in the `/usr/` directory, `os.path.abspath('bin/../local/bin')` will return `/usr/local/bin`. It normalizes the path by making it as simple as possible. If you just want to normalize a pathname like this without turning it into a full pathname, use `os.path.normpath` instead.
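
	The difference between the two functions can be sketched as follows. The absolute result depends on the directory the code is run from and on the platform's separator, so it is not shown here.

```python
import os.path

relative = os.path.join("bin", os.pardir, "local", "bin")   # bin/../local/bin

normalized = os.path.normpath(relative)   # simplified, still relative
absolute = os.path.abspath(relative)      # simplified and made absolute

print(normalized)
print(absolute)
```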


	Like the other functions in the `os` and `os.path` modules, `os.path.abspath` is cross-platform. Your results will look slightly different than my examples if you're running on Windows (which uses backslash as a path separator) or Mac OS (which uses colons), but they'll still work. That's the whole point of the `os` module.

Chapter 17. Dynamic functions

Chapter 18. Performance Tuning

18.2. Using the timeit Module


	You can use the `timeit` module on the command line to test an existing Python program, without modifying the code. See http://docs.python.org/lib/node396.html for documentation on the command-line flags.
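
	For comparison, the same module can be called from inside a program. This is a minimal sketch; the statement being timed and the repetition count are arbitrary choices for the example.

```python
import timeit

# Command-line form of the same measurement:
#   python -m timeit "'-'.join(str(n) for n in range(100))"

statement = "'-'.join(str(n) for n in range(100))"
elapsed = timeit.timeit(statement, number=1000)   # total seconds for 1000 runs

print(elapsed)
```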


	The `timeit` module only works if you already know what piece of code you need to optimize. If you have a larger Python program and don't know where your performance problems are, check out the `hotshot` module.

Dive Into Python
