Split Notation
==============

Identifier naming convention for object-oriented software development

Developed by Jeremy Kelly
www.anthemion.org

Version 5.0
December 7, 2016

This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/.


------------
Introduction
------------

Split Notation is an identifier naming convention for object-oriented software development. It was designed originally for use with C++, but is largely applicable to other object-oriented languages. Its purpose is twofold:

  1) To make identifiers easier to generate and easier to read by promoting a 'separation of concerns' within them;

  2) To avoid common mistakes by documenting language features with unusual side effects.

The notation was previously known as 'Iowan Notation'.


Identifier generation
---------------------

Software development requires careful management of numerous details from varied domains: business concerns must be modeled, the platform must be usefully abstracted, and, at the lowest level, the programming language and its particular way of representing and transforming data must be carefully exploited. To best aid the developer, identifiers must document these concerns while remaining concise and legible. A program that fails to meet this standard fails necessarily to represent the problem clearly, and errors are certain to result.

Identifiers should also be easy to generate, and ideally, predictable, such that different developers can hope to produce the same name for a given element.

Parts of this problem are too difficult to solve in a general way: business concerns and the data structures and algorithms that represent them vary too much to be categorized from afar. One concern is well-suited to categorization, however: language features, which are clearly defined and relatively few in number.
Split Notation honors this distinction by enforcing a 'separation of concerns' within identifiers. It labels important language features with a range of one-character prefixes, leaving the identifier root, which follows the prefixes, to document high-level concepts without concern for technical detail.

Consider a database application. The 'table' concept can be generally identified by the root 'Tbl'. Throughout the program, however, this concept may be used and referenced in many distinct ways. Applying some of the more common prefixes:

  Identifier  Referent
  ----------  --------
  tTbl        Table type
  oTbl        Local table instance
  gpTbl       Global pointer to a table instance
  esTbl       Class-private static table instance
  arTbl       Function parameter passing a non-const reference to a table instance

As shown, Split Notation allows a single root to produce a range of identifiers that are both concise and descriptive, distinct, yet obviously related. Details are documented without obscuring commonalities. The result is semantically dense but highly legible code.


Feature documentation
---------------------

Static analysis tools like 'lint' help developers write reliable code. Some mistakes defy such analysis, however, as when technically correct code generates unwanted side-effects. The detection of these mistakes cannot be automated without needlessly flagging correct code. It can be facilitated, however, by labeling unusual properties. In Split Notation:

  1) References, static objects, and union members are marked to show that changes to such objects affect other objects or scopes;

  2) Virtual functions are marked to show that their behavior may change when invoked by subclasses;

  3) Macros are marked to warn against side-effects from duplicated parameter expressions.

By documenting scopes, the notation provides an additional safeguard.
Because the same concepts are referenced repeatedly throughout a program, it is easy to hide or 'shadow' names in wider scopes when declaring entities in narrower ones. This effect is difficult to spot, and can cause significant confusion. By generating distinct identifiers in distinct scopes, the notation avoids most of these conflicts automatically.


Other considerations
--------------------

Unlike other notations, Split Notation does not document specific types. There are several reasons for this:

  1) Compilers flag most type-safety violations, so there is little need for help in this area;

  2) A prefix list cannot enumerate non-fundamental types, which are innumerable, and of which it is necessarily ignorant;

  3) Types often change in the course of development, and many such changes (replacing a fundamental type with a comparable larger type, or replacing a class with another implementing the same interface) can be made to well-written code without detailed review. Requiring identifier changes in such cases would create unnecessary work for the developer.

Though types are not explicitly documented, the identifier root strongly hints at an object's type by describing its role.


------------
The Notation
------------

In Split Notation, identifiers begin with zero or more lowercase prefix characters, one for each of the criteria below:

  Prefix  Criterion
  ------  ---------
  z       (none)
  g       Global element
  y       Element with internal linkage
  c       Protected class member
  e       Private class member
  a       Function parameter
  o       Local element
  t       Type
  x       Template parameter
  f       Interface
  m       Macro
  n       Namespace
  s       Static class member or local static object
  v       Virtual function
  u       Union member
  r       Non-const reference
  q       Object reference
  p       Data pointer
  d       Function pointer or delegate
  b       Managing object
  h       Handle
  i       Iterator

Non-static class-public objects and functions meet no criterion, and their identifiers are not prefixed. Other identifiers are prefixed at least once.
Only qualities inherent to the identified entity are labeled; pointers to pointers, for instance, are prefixed with one 'p', not two.

When multiple prefixes apply, they are affixed in the order specified above, with the exception of 'z', which may be placed anywhere within the prefix.

The identifier root follows the prefixes; it includes whatever text most succinctly describes the concept being represented. The first letter of every word within the root is capitalized, as in 'CamelCase':

  string oNameLast;

It is acceptable and occasionally desirable to omit the root, producing an identifier of prefixes only, as in the loop index below:

  for (int o = 0; o < eFlds.Ct(); ++o)
    cout << eFlds[o].Name() << endl;


Prefix reference
----------------

'z': (no criterion)
···················

The 'z' prefix has no set meaning. It may be used to resolve name collisions with reserved words or third-party code, or for any other reason.

't': Type
·········

The 't' prefix applies to user-defined types and typedefs. It allows the same root to be shared by a type and an instance of that type:

  tTbl oTbl;

It does not apply to template type parameters, for which the 'x' prefix is used instead.

'x': Template parameter
·······················

The 'x' prefix applies to template parameters, both type and non-type. By distinguishing template parameters, it clarifies template design:

  template <typename xNum>
  struct tPt {
    xNum X, Y;
  };

'f': Interface
··············

In languages that support them, the 'f' prefix applies to interfaces. It allows the same root to be shared by an interface and a type or instance implementing that interface.

'm': Macro
··········

The 'm' prefix applies to macros. It warns against side-effects from duplicated parameter expressions, and other preprocessor oddities:

  // Probably a bad idea:
  int oMin = mMin(++oX, ++oY);

'n': Namespace
··············

The 'n' prefix applies to namespaces.
It helps distinguish namespaces from nested types:

  // tLog is a type:
  tLog::tLine oLine;
  // nStr is a namespace:
  oText = nStr::gTrim(oLine.Text());

'g': Global element
···················

The 'g' prefix applies to global objects and functions, and optionally, global types and interfaces. It also applies to global enumerators, which are essentially global constants:

  enum tTurn {
    gTurnLeft,
    gTurnRight
  };

In C#, the prefix can be omitted from enumerators, since these are qualified with the type name when used.

'y': Element with internal linkage
··································

The 'y' prefix applies to objects, functions, and enumerators (and optionally, types and interfaces) with internal linkage (see http://www.anthemion.org/cpp_notes.html#internal_linkage):

  static int yIDCurr = 0;

  void tView::Del(int aID) {
    if (aID == yIDCurr)
      ...

'c': Protected class member
···························

The 'c' prefix applies to class-protected objects and functions, and optionally, protected types and interfaces. Outside of C#, it also applies to protected enumerators, which are essentially protected constants.

'e': Private class member
·························

The 'e' prefix applies to class-private objects and functions, and optionally, private types and interfaces. Outside of C#, it also applies to private enumerators, which are essentially private constants.

Distinguishing protected from private entities is helpful when implementing parent classes.

'a': Function parameter
·······················

The 'a' prefix applies to function parameters. Though they are essentially local objects, it is useful to distinguish them because modifications to reference parameters change data outside the local scope.

This prefix does not apply to macro parameters, as these are not necessarily objects, and macros do not define scopes.

'o': Local element
··················

The 'o' prefix applies to local objects.
In languages that support them, it also applies to local functions, and optionally, types.

's': Static class member or local static object
···············································

The 's' prefix applies to static class members and local static objects. It warns that changes to such objects have effects outside the current invocation or instance. It also warns against static initialization and deinitialization order fiascos (see http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.12).

The prefix does not apply to elements declared 'static' to create internal linkage.

'v': Virtual function
·····················

The 'v' prefix applies to virtual class functions. It warns that a function's behavior may change in subclasses, and helps prevent virtual functions from being unknowingly called within constructors (see http://www.parashift.com/c++-faq-lite/strange-inheritance.html#faq-23.5).

The prefix cannot be applied to virtual destructors, whose names are fixed by the language.

'u': Union member
·················

The 'u' prefix applies to union members. It warns that changes to such objects overwrite other parts of the union.

'r': Non-const reference
························

The 'r' prefix applies to non-const references. It shows that modifications to such objects have effects outside the current scope. Because they cannot be modified, const references do not receive this prefix.

In C#, ref and out parameters include this prefix. 'Class references' or 'object references', which are entirely distinct from C++ references, are prefixed with 'q' instead.

'q': Object reference
·····················

In languages that support them, the 'q' prefix applies to 'object references' and object-referenced types. It shows that changes to a referenced instance persist outside the current scope. In C#, it also distinguishes classes, which use the prefix, from structures, which do not. This guards against unwanted copying and boxing.
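
For contrast with object references, the C++ usage of the 'r' prefix described earlier might look like the sketch below. The type 'tTbl' and the functions 'gRename' and 'gLenName' are hypothetical names chosen for illustration; they are not part of the notation:

```cpp
#include <string>

// Hypothetical table type, used only for illustration. 'Name' is a
// non-static public member, so it takes no prefix.
struct tTbl {
    std::string Name;
};

// 'a' + 'r': a parameter passed by non-const reference. The prefix warns
// that assigning through arTbl changes the caller's object.
void gRename(tTbl& arTbl, const std::string& aName) {
    arTbl.Name = aName;
}

// A const reference cannot be modified, so it takes 'a' but not 'r'.
std::string::size_type gLenName(const tTbl& aTbl) {
    return aTbl.Name.size();
}
```

At the call site nothing marks the argument as modifiable, so the 'r' in the parameter name is the only local reminder that 'gRename' writes through it.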

Note that 'references' in C++ differ fundamentally from the 'object references' found in C#, Java, and Delphi. C# does offer C++-like references in the form of ref and out parameters, however.

'p': Data pointer
·················

The 'p' prefix applies to data pointers and data pointer types:

  typedef tFld* tpFld;
  tpFld opFld = 0;

'd': Function pointer or delegate
·································

The 'd' prefix applies to function pointers and function pointer types, including those referencing class functions. In languages that support them, it also applies to delegates.

Distinguishing function pointers from data pointers allows the same root to be used by identifiers of both types:

  typedef tTbl* (* tdTbl)(const string&);
  tdTbl odTbl = &eTblFromFile;
  tTbl* opTbl = odTbl("Cust");

'b': Managing object
····················

The 'b' prefix applies to objects (like auto pointers) that are used to manage other objects. Distinguishing objects from managing objects allows the same root to be used by both varieties of element. It also avoids confusion about the element's type:

  tAuto<tPart> obPart(new tPart);
  // Invokes tAuto::Ck:
  bool oCkAlloc = obPart.Ck();
  // Invokes tPart::Ck:
  bool oCkPart = obPart->Ck();

'h': Handle
···········

The 'h' prefix applies to handles and handle types. Though not a language feature, handles are a common means of address, and labeling them allows the same root to be used when a concept is referenced in distinct ways:

  typedef int thRec;
  tRec oRec(oTbl.First());
  thRec ohRec = oRec.h();

'i': Iterator
·············

The 'i' prefix applies to iterators and iterator types.
Though not a language feature, iterators are a common means of address, and labeling them allows the same root to be used when a concept is referenced in distinct ways:

  tiRec oiRec(oTbl.First());
  for (; !oiRec.EOT(); ++oiRec)
    cout << *oiRec << endl;


------------
Design notes
------------

Scope prefixes and types
------------------------

Some versions of this notation have applied scope prefixes, like 'g', 'c', and 'e', to types and interfaces. Such elements can produce the same name-hiding problems that afflict objects and functions, so it makes sense to label their scopes. On the other hand, types are defined less often than instances, somewhat limiting the chance of a type name collision. Longer prefix strings are also less legible.

The decision ultimately may be better left to the programmer. Those who define many nested types may prefer to document type scopes. Those who do not may prefer to prefix types with 't' alone.


Scope prefixes and overrides
----------------------------

It is possible to change the access level of a virtual function when overriding it, rendering the scope prefix assigned to that function invalid. There is no way for a naming convention to account for this, however, and the practice is questionable from a design perspective, so it seems best simply to avoid it.


Macro parameters
----------------

It would occasionally be helpful to label macro parameters, but the 'a' prefix would be misleading here, and it seems wasteful to dedicate a new prefix to this concern. To prevent name collisions, macro parameters may be prefixed with 'z'.


Return values
-------------

In earlier versions of the notation, functions were labeled with 'r' or 'p' if they returned non-const references or pointers. This clarified the effect of modifying such return values, but it made function identifiers somewhat difficult to interpret.

The current version is simpler, and even without help, it seems unlikely that return value modification could cause much confusion.
If the value is used right away, the effect is obvious:

  IDNext() = oID;

Conversely, if the value is stored for later use, the object receiving the return value must bear the appropriate prefixes:

  int* opID = IDNext();
  *opID = oID;


Cleanup obligations
-------------------

Most resource management can and should be handled with RAII. This approach is sometimes impractical, however, and cannot be implemented in a meaningful sense within C# or Java. It might be useful, therefore, to label types and functions that incur cleanup obligations. This would go somewhat beyond documenting 'language features', however, and prefix strings in C# seem already too long. For now, at least, cleanup obligations remain undocumented.
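
As a rough illustration of why RAII leaves little for a prefix to document, consider the sketch below. The type 'tConn', its counter 'sCtOpen', and the function 'gCtOpenInScope' are hypothetical names invented for this example, and the counter merely stands in for a real resource:

```cpp
// Hypothetical resource type, used only for illustration. Construction
// acquires the resource and destruction releases it, so callers incur no
// cleanup obligation.
class tConn {
public:
    tConn() { ++sCtOpen; }
    ~tConn() { --sCtOpen; }

    // Copying is disabled so that each acquisition is released exactly once.
    tConn(const tConn&) = delete;
    tConn& operator=(const tConn&) = delete;

    // 's': static class member. It is public, so no scope prefix applies.
    static int sCtOpen;
};

int tConn::sCtOpen = 0;

// 'g': global function. Returns the number of connections open while the
// local object is alive.
int gCtOpenInScope() {
    tConn oConn;            // 'o': local object; acquired here
    return tConn::sCtOpen;  // oConn is released automatically on return
}
```

Under these assumptions, 'tConn::sCtOpen' is back to zero once 'gCtOpenInScope' returns, with no explicit release at the call site.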