Refactoring

Or, Object-Oriented Program Algebra and the Art of Elegant Programming
— use program transformations (the algebraic equivalence laws of OOP) to produce cleaner, better code.

The "bible" of refactoring is Fowler's book, Refactoring (click on this link for access to the eBook for JHU students — there are limited licenses so it may not work when you try). It is a very well-written book which I would have made a required text for the course if it didn't cost $50.

Note that Fowler didn't come up with the idea of refactoring, he just popularized it.

Also see refactoring.guru for a very nice interactive description of the individual refactorings.

Refactoring by Example

We will spend a day refactoring the example following Refactoring Chapter 1, also showing how IntelliJ can be used to automate refactorings.

  • A modernized example of the video rental application he uses for his example can be downloaded here;
  • an example of the sources as they appear by the end of lecture may be downloaded here.

IntelliJ and Refactoring

  • IntelliJ (and some other IDEs) have a Refactor menu — use it!!
  • It will automatically perform many of Fowler's refactoring operations for you
  • It includes refactorings not mentioned by Fowler which are also very useful
  • The IDE automatically checks many of the preconditions that a refactoring needs in fact hold (some it can't easily check so its not airtight).
  • We will demo the IntelliJ refactoring menu in lecture.

Foundations of Refactoring

Why Refactor?

  • To improve the quality of code (or design)
  • .. which makes software easier to understand
  • .. and in turn helps in finding bugs
  • .. which overall allows you to complete a project more quickly and reliably.
How to refactor
  • Make many minimal changes in series, each one being fast and quickly testable.
  • The bulk of the refactoring book is a common of common small transformations
When to Refactor?
  1. In order to elegantly incorporate new functionality
    "First make the change easy (by doing a series of refactorings); then make the easy change."
  2. When you run into an anti-pattern aka bad smell (a topic we cover below)
  3. The rule of three: if you do three things the same time in an implementation, its time to refactor to avoid more repetition in the future
  4. Around a consistently buggy area: bugs are a sign that code is bad, and refactoring will improve that code
  5. At code review time in the project lifecycle
    (A code review is a meeting at which the current state of the code is reviewed in detail.)
Everyone has a different coding methodology that requires more or less refactoring.
  • Beethoven was a serious refactorer: he wrote, threw out, redid, patched etc his symphonies
  • Mozart wrote great musical "code" right off the top of his head.
Limitations of Refactoring
  • If you are using a external library, those can't be refactored, limiting the improvements that can be made. (Note however that with shared code you can refactor your team-mates code; this is one of the main strengths of having a shared codebase)
  • Interfaces that are published should change as little as possible, but refactoring often wants to change those interfaces. This is a fundamental tension in software engineering.
  • If the design is awful, its not worth trying to refactor—throw it out!

Testing and Refactoring

  • Testing and refactoring mutually support each other
  • If you don't have enough tests you could introduce bugs in the process of refactoring
  • No tests → afraid to refactor because of fear of bug introduction → little refactoring → codebase is bad
Here is the sequence of events to follow to test in the context of a refactoring:
  1. Make sure you have unit tests covering the code you are about to change
  2. Make sure you are 100% compliant with tests
  3. Perform one step of refactoring
  4. Re-run tests and fix any fails to get back to 100%

Fowler's library of refactorings

The bulk of the refactoring book is a library of useful refactorings; we will review the following ones in lecture (these are also the only ones we will ask about on HW). Links below are to the refactoring on refactoring.guru.

Code Smells aka anti-patterns

  • A code smell/anti-pattern is a pattern of coding that smells (and is) wrong.
  • See the Wikipedia anti-pattern list and Fowler's smells at refactoring.guru.

General Anti-Patterns

We will review some of the general anti-patterns from the above Wikipedia page.

  • Analysis Paralysis — going round and round on planning (overthinking it) without building anything.
  • Bike Shedding — wasting too much time on trivial aspects of an important project
  • Big Ball of Mud — code with poor separation of concerns, commonly produced by coders only putting out the next fire and not well-trained in good design. Beware, there is a MASS of mud out there.
  • Input Kludge — lack of validation of input to make sure it is proper.
  • BaseBean — don't inherit when relation is not is-a, delegate instead. Related to LSP and replace inheritance with delegation above.
  • Call Super — don't require subclass overrider to call super method, its a bad interface boundary. Note this one is contentious, its a trade-off.
  • Circle-Ellipse Problem — we covered this as square/rectange under the Liskov Substitution Principle (LSP).
  • Sequential Couping — methods with an implicit constraint on which one must be called first. Occasionally this is required, if so make the contract clear in API docs; or, code defensively (raise exception if violated).

Some of Fowler's bad smells with refactorings to fix

The names below link to the refactoring.guru page on the various smells.

  1. Duplicate Code AKA DRY
    — extract out the common bits into their own method (extract method) if code is in same class
    — if two classes duplicate code, consider extract class to create a new class to hold the shared functionality.
  2. Long Method
    extract method!
  3. Long Parameter List
    replace parameter with method call: don't pass data the receiver can get on its own accord — receiver explicitly asks for data itself. You may need to pass an object as argument that the callee can then fetch all the goodies from.
    OR: if the parameters are a natural, self-contained grouping, introduce parameter object.
    Example: day month, year, hour minute second parameters ==> date parameter
    — both of these changes are turning passive objects into more active objects, a Good Thing.
  4. Divergent Change (a.k.a. SRP violation)
    If you have a fixed class that does distinctly different things consider separating out the varying code into varying classes (extract class) that either subclass or are contained by the non-varying class. This smell is closely related to the Single Responsibility Principle.
  5. Shotgun Surgery (more details)
    The smell: adding a new feature requires very similar code to be added to a bunch of places, and copy/paste is likely involved.
    — try to move method and move field to put the common bits into one place so no copy/paste needed.
  6. Feature Envy
    Method in one class uses lots of pieces from another class.
    move method to move it to the other class.
  7. Data Clumps
    Data that's always hanging with each other (e.g. name/street/zip). If its clumping, its a concept
    extract class to make a new class just for that data. Will also help trim argument lists too since name street zip now passed as one address object. Closely related to long parameter list above.
  8. Switch (case) statements
    Yes, any switch can potentially smell, and especially instanceOf ones.
    — Use inheritance and polymorphism instead (replace conditional with polymorphism or replace type code with state/strategy)
    — we covered an example of this in Fowler Chapter 1; one of the more difficult refactorings
  9. Lazy Class
    Class doesn't seem to be doing anything. Get rid of it!
    • collapse heirarchy if a subclass is nearly vacuous.
    • inline class (stick the class' methods and fields in the class that was using it and get rid of original class).
  10. Speculative generality
    Class designed to do something in the future but never ends up doing it. Thinking too far ahead. Or you though you needed this generality but you didn't.
    — like above, collapse hierarchy or inline class
  11. Message chains
    Say you want to send a message to object D in class A but you have to go through B to get C and C to get D.
    — In other words, you were writing something like a.getB().getC().getD().messageToD()
    — These are strong violations of object encapsulation / active objects philosophy
    — find a sensible spot in the middle where a logical interface can be inserted.
    — e.g. use hide delegate to hide C and D in B, and add a method to B that does what A wanted to do with D: a.getB().letDKnowThisMessage().
  12. Inappropriate Intimacy (aka Object Orgy)
    directly getting in and munging with the internals of another class.
    — Signifies lack of good interface abstractions (recall design principle, "code to interfaces not implementations")
    — To fix this, move methods, etc, to consolidate the intimate bits.
  13. Data Class aka god class aka Data-centric design
    We have already covered this in the design princinples lecture: in data-centric design, there are some data classes which are pretty much structs: no interesting methods, and one god class that is like the C main program with all the functions in it.
    — look who uses the data and how they use it and move some of that code to the data class via a combination of extract method and move method
    — the Fowler chapter 1 example had several examples of this since it was initially a completely data-centric design
  14. Comments
    Comments in the middle of methods are often deodorant. Refactor so each comment block is its own method — extract method.