Wednesday, November 24, 2010

Can Business People Read Your Code?

Yesterday, I wrote a blog post on how I refactored tests in Ruby to follow the "Data Generated Specs" pattern, making it easy to add specs and read the input and expected output values for the different cases.

The surprising result was being able to review the test cases with a non-technical client, who liked the format so much that she requested a copy of the specs.

I had another similar incident last week that also pleasantly surprised me.

At one point, the client asked me about a detail in the business calculation to verify that I was handling it correctly, and I indicated that I had followed the math formula she gave me. Then, to confirm, I had the client come over and look at my implementation in Ruby (changed a bit for IP protection):

copay = benefit.includes_deductible? ? (value - deductible) : value
12*rate + deductible + copay

I made sure to explain the ternary operator while going over the implementation, and the client seemed to get it, then cheerfully exclaimed: "Great! Looks like you got it exactly the way I wrote it."

I was very happy to hear that, and I wondered whether this only became possible because I followed clean code techniques such as TDD and refactoring. Had I not refactored my code to extract the essence of the calculation into one tiny method of fewer than 5 lines, I would probably have confused the client by showing her too much code.
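For illustration, here is a minimal sketch of what such a tiny extracted method might look like. The class shape, the attribute names, and the method name annual_cost are all hypothetical, standing in for the real (IP-protected) code:

```ruby
# Hypothetical sketch only: the attributes and the annual_cost name are
# assumed, not the client's actual code.
class Benefit
  attr_reader :rate, :value, :deductible

  def initialize(attrs = {})
    @rate = attrs[:rate]
    @value = attrs[:value]
    @deductible = attrs[:deductible]
    @includes_deductible = attrs[:includes_deductible]
  end

  def includes_deductible?
    @includes_deductible
  end

  # The essence of the business calculation, kept small enough to read
  # side by side with the client's math formula.
  def annual_cost
    copay = includes_deductible? ? (value - deductible) : value
    12 * rate + deductible + copay
  end
end
```

With a method this small, walking a business person through the code line by line requires explaining little more than the ternary operator.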

How cleanly is your code written on average? Can business people read your code?

Tuesday, November 23, 2010

Testing Pattern: Data Generated Specs

I thought I would share this DRY Ruby testing pattern, which facilitates generating many similar specs with different data.

Recently, I wrote specs for a parsing engine, handling each test case with a context that described how the different attributes of a model (Benefit) would be evaluated:

context "Single: $10,000 Group: $20,000" do
  describe "single_value" do
    it "returns 10000 for Single: $10,000 Group: $20,000" do
      benefit = Benefit.new(:value => "Single: $10,000 Group: $20,000")
      benefit.single_value.should == 10000
    end
  end

  describe "group_value" do
    it "returns 20000 for Single: $10,000 Group: $20,000" do
      benefit = Benefit.new(:value => "Single: $10,000 Group: $20,000")
      benefit.group_value.should == 20000
    end
  end

  describe "single_value_includes_deductible?" do
    it "returns true for Single: $10,000 Group: $20,000" do
      benefit = Benefit.new(:value => "Single: $10,000 Group: $20,000")
      benefit.single_value_includes_deductible?.should be_true
    end
  end

  describe "group_value_includes_deductible?" do
    it "returns true for Single: $10,000 Group: $20,000" do
      benefit = Benefit.new(:value => "Single: $10,000 Group: $20,000")
      benefit.group_value_includes_deductible?.should be_true
    end
  end

  describe "group_calculation_algorithm" do
    it "returns :cap for Single: $10,000 Group: $20,000" do
      benefit = Benefit.new(:value => "Single: $10,000 Group: $20,000")
      benefit.group_calculation_algorithm.should == :cap
    end
  end
end

But I kept receiving more and more business rules for parsing the benefit string values, and I had to add a context block like the one above for every case. You can imagine how quickly this got out of control and how much redundancy it carried:

context "$10,000" do
  describe "single_value" do
    it "returns 10000 for $10,000" do
      benefit = Benefit.new(:value => "$10,000")
      benefit.single_value.should == 10000
    end
  end

  describe "group_value" do
    it "returns 10000 for $10,000" do
      benefit = Benefit.new(:value => "$10,000")
      benefit.group_value.should == 10000
    end
  end

  describe "single_value_includes_deductible?" do
    it "returns true for $10,000" do
      benefit = Benefit.new(:value => "$10,000")
      benefit.single_value_includes_deductible?.should be_true
    end
  end

  describe "group_value_includes_deductible?" do
    it "returns true for $10,000" do
      benefit = Benefit.new(:value => "$10,000")
      benefit.group_value_includes_deductible?.should be_true
    end
  end

  describe "group_calculation_algorithm" do
    it "returns :as_is for $10,000" do
      benefit = Benefit.new(:value => "$10,000")
      benefit.group_calculation_algorithm.should == :as_is
    end
  end
end

context "Single: $10,000 per Member" do
  describe "single_value" do
    it "returns 10000 for Single: $10,000 per Member" do
      benefit = Benefit.new(:value => "Single: $10,000 per Member")
      benefit.single_value.should == 10000
    end
  end

  describe "group_value" do
    it "returns nil for Single: $10,000 per Member" do
      benefit = Benefit.new(:value => "Single: $10,000 per Member")
      benefit.group_value.should == nil
    end
  end

  describe "single_value_includes_deductible?" do
    it "returns true for Single: $10,000 per Member" do
      benefit = Benefit.new(:value => "Single: $10,000 per Member")
      benefit.single_value_includes_deductible?.should be_true
    end
  end

  describe "group_value_includes_deductible?" do
    it "returns true for Single: $10,000 per Member" do
      benefit = Benefit.new(:value => "Single: $10,000 per Member")
      benefit.group_value_includes_deductible?.should be_true
    end
  end

  describe "group_calculation_algorithm" do
    it "returns :per_person for Single: $10,000 per Member" do
      benefit = Benefit.new(:value => "Single: $10,000 per Member")
      benefit.group_calculation_algorithm.should == :per_person
    end
  end
end

Finally, I refactored the specs by applying what I am calling the "Data Generated Specs" pattern: write only one spec as a prototype, then feed in the input and expected output data dynamically through a hash:

{
  "Single: $10,000 Group: $20,000" => {
    "single_value" => 10000,
    "group_value" => 20000,
    "single_value_includes_deductible?" => true,
    "group_value_includes_deductible?" => true,
    "group_calculation_algorithm" => :cap,
  },
}.each do |input_value, expected_value|
  expected_value.keys.each do |attribute|
    describe attribute do
      it "returns #{expected_value[attribute]} for #{input_value}" do
        benefit = Benefit.new(:value => input_value)
        benefit.send(attribute).should == expected_value[attribute]
      end
    end
  end
end

Each test case is now represented by a hash entry instead of a 30-line context block, making it much easier to add test cases:

{
  "Single: $10,000 Group: $20,000" => {
    "single_value" => 10000,
    "group_value" => 20000,
    "single_value_includes_deductible?" => true,
    "group_value_includes_deductible?" => true,
    "group_calculation_algorithm" => :cap,
  },
  "$10,500" => {
    "single_value" => 10500,
    "group_value" => 10500,
    "single_value_includes_deductible?" => true,
    "group_value_includes_deductible?" => true,
    "group_calculation_algorithm" => :as_is,
  },
  "Single: $10,500 per Member" => {
    "single_value" => 10500,
    "group_value" => nil,
    "single_value_includes_deductible?" => true,
    "group_value_includes_deductible?" => true,
    "group_calculation_algorithm" => :per_person,
  },
  "Single: $10,500 per Member (Deductible not included)" => {
    "single_value" => 10500,
    "group_value" => nil,
    "single_value_includes_deductible?" => false,
    "group_value_includes_deductible?" => false,
    "group_calculation_algorithm" => :per_person,
  },
  "See brochure for details" => {
    "single_value" => nil,
    "group_value" => nil,
    "single_value_includes_deductible?" => false,
    "group_value_includes_deductible?" => false,
    "group_calculation_algorithm" => nil,
  },
}.each do |input_value, expected_value|
  expected_value.keys.each do |attribute|
    describe attribute do
      it "returns #{expected_value[attribute]} for #{input_value}" do
        benefit = Benefit.new(:value => input_value)
        benefit.send(attribute).should == expected_value[attribute]
      end
    end
  end
end

Since the specs are generated with descriptive titles, when one fails you get a clear failure message that includes the attribute name as well as the input and expected values:

1) Benefit attribute single_value returns 10500 for Single: $10,500 per Member
   Failure/Error: benefit.send(attribute).should == expected_value[attribute]
   expected: 10500,
        got: 0 (using ==)
   # ./spec/models/benefit_spec.rb:464

I generated more than 1,000 tests with this technique. While reviewing some of the cases with a non-technical client, she liked reading the spec data so much that she requested a copy for herself to review and validate against the business rules. Specs as acceptance tests FTW!

Monday, November 22, 2010

MagicRuby: Whatever Happened to Desktop Development in Ruby?

I will be giving a talk at the upcoming MagicRuby conference, which begins on Feb 4, 2011.

Title:

Whatever Happened to Desktop Development in Ruby?

Abstract:

While web development is thriving in the Ruby world with Rails, Sinatra, and other frameworks, desktop development is still not very common, as many developers rely on Java technologies like Eclipse or .NET technologies such as Windows Forms.

This talk will walk attendees through some Ruby desktop development frameworks/libraries, contrasting the pros and cons of each, and mentioning what is missing that discourages developers from relying on Ruby to build desktop applications.

Frameworks/libraries covered will include MacRuby, Shoes, Limelight, and Glimmer.

Attendees will walk out of the session with rudimentary knowledge of desktop development in Ruby as well as an idea of what to expect in the future.

Friday, November 19, 2010

Should You Work Hard or Smart?

About five years ago, a senior developer I met recounted to me stories about how he aced interviews:
"I always like to tell interviewers that some people work hard, and some people work smart, but I like to work HARD AND SMART!"

It was certainly a nice sounding line that had a good ring to it, but is there such a thing as working hard and smart? Or does working hard automatically imply that you're not working as smart as you could be? Wouldn't you be working less hard if you had smarter ways of accomplishing your tasks?

You are welcome to share your opinion in comments, but here is my perspective on the matter.

It certainly depends on the definition of what is "hard" and what is "smart".

If "hard" meant working 50+ hour weeks, then on average people get tired after, say, 8 or 9 hours of continuous work in a day, and their thinking capacity diminishes, resulting in less "smart" work than at the beginning of the day. So, in that case, working "hard" hurts people's ability to work "smart", and the two do not quite go hand in hand.

If "hard" meant working 35-40 hour weeks with extreme concentration, taking the rest of the days off to let the brain untangle itself and get ready for the next day, then on average this facilitates performing work that is as "smart" as people's thinking capacities allow. In that case, "hard" and "smart" do go together, though people who work 60-hour weeks would not consider that "hard" enough, so it ends up just being "smart", and who doesn't like that?

Now, in the software development world, if some developers find themselves working 60-hour weeks to meet a 3-month deadline for a business project, they certainly are working "hard", but are they working as "smart" as they could be? Should they instead ditch the old unproductive framework/library they are relying on and move to a more productive technology or programming language? After all, not only would that save them from over-extending themselves, but it would also raise the quality of their work, since it would be done mostly in the hours when their thinking capacity is near its fullest.

Doing 12+ hour days is not necessarily bad once in a while, especially when the developer has a sudden burst of creativity and motivation. If done regularly, however, it is important to watch the quality of work coming out of the long hours, as it may give the illusion of accomplishment when the work is actually taking much longer to finish with a tired, less concentrated mind, and could have been done better when rested.

Wednesday, November 17, 2010

Pain Driven Development

One of the key things I learned from XP and the Agile movement in general is programming for today's requirements, not tomorrow's predictions. Every time you start writing an implementation for requirements that may become valid in the future, the Agile folks shout "YAGNI" (You Aren't Gonna Need It). Applied to code architecture and design, Ward Cunningham summarizes this philosophy nicely with his famous question, "What's the simplest thing that could possibly work?"

But, what happens when today's implementation no longer fulfills today's requirements? In other words, what happens when tomorrow becomes today and requirements grow or change? One example is when 100,000 more users are added to the system, making performance requirements much greater. Another example is when supporting one state is not enough anymore, and the business is now expanding nationally to cover all 50 states.

That is where awareness of pain comes into play. I wrote a blog post about sensitivity to pain a few years back that talks about pain and pleasure when it comes to writing and maintaining code. Developing that awareness of pain is highly important in detecting when to update today's implementation with a higher level of complexity that addresses today's requirements.

Though people have different levels of tolerance to pain, the ability to feel it is a gift, as it is often what pushes them toward action. And, in the case of software development, it can point out when today's implementation no longer serves today's requirements and needs to be revised, usually with a higher level of complexity, or sometimes with a lower level when some requirements are no longer needed.

When I first heard of the YAGNI principle, I remember shuddering a bit and thinking: "Isn't it kind of dumb to write code that I will have to revise in the future to support more states, when I could have added multiple-state support to begin with?"

Well, unfortunately, my thinking was shallow in certain ways. While the argument is logical at one level, since following flexible design practices seems to make it easier to handle some future needs, it is much less convincing a level deeper, once I include more variables, such as whether those future needs ever materialize in the next two years, or how much stepping around I do while adding new features because of complexity implemented for predicted needs that are not yet valid.

And experience only confirms the concerns above: keeping the code as simple as possible, addressing only today's known business needs, seems to make it easiest to maintain the code and add features as more needs come up. That is because the code always remains as simple as possible, its complexity adjusted only as pain is felt day to day.

One example I recently encountered was a web feature that relied on data from a web service. At first, the simplest thing that could possibly work was to request data from the service synchronously as users hit the site. Later, as requests got more complex and time-consuming for the service to fulfill, the implementation became painful performance-wise, so background caching of the service data was added. That is a very good example of what I like to call "Pain Driven Development" :)
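To make the shape of that evolution concrete, here is a hedged sketch; the class names are made up, and the slow web service call is stubbed out with a string:

```ruby
# Made-up class names; the slow HTTP call is stubbed for illustration.
class ServiceClient
  def fetch(key)
    # imagine a slow web service request here
    "data for #{key}"
  end
end

# Step 1: the simplest thing that could possibly work - hit the
# service synchronously on every user request.
class DirectLookup
  def initialize(client)
    @client = client
  end

  def data_for(key)
    @client.fetch(key)
  end
end

# Step 2: added only once the synchronous calls became painful - a
# cache that a background job refreshes, so user requests read memory.
class CachedLookup
  def initialize(client)
    @client = client
    @cache = {}
  end

  # invoked by a scheduled background job
  def refresh(key)
    @cache[key] = @client.fetch(key)
  end

  def data_for(key)
    @cache[key] || refresh(key)
  end
end
```

The point is that Step 2's complexity was not written up front; it was paid for only when the pain of Step 1 was actually felt.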

Tuesday, November 16, 2010

Faking Paperclip S3 calls with Fakeweb

I recently wrote Cucumber acceptance tests for a project feature that involved uploading images to Amazon S3 via the Ruby Paperclip library. At first, I had them actually hit S3 for real to drive the implementation correctly without any mocking. Over time though, the tests became really slow and fragile because of their dependency on the web service, so I used Fakeweb to fake the upload requests to S3.

Here is the Cucumber step I wrote for that:

When /^(?:|I )attach the image "([^\"]*)" to "([^\"]*)" on S3$/ do |file_path, field|
  definition = Image.attachment_definitions[:attachment]
  path = "http://s3.amazonaws.com/#{definition[:bucket]}/#{definition[:path]}"
  path.gsub!(':filename', File.basename(file_path))
  path.gsub!(/:([^\/\.]+)/) do |match|
    "([^\/\.]+)"
  end
  FakeWeb.register_uri(:put, Regexp.new(path), :body => "OK")
  When "I attach the file \"#{file_path}\" to \"#{field}\""
end

That prepares the environment to receive an S3 upload request, so when the test reaches the Cucumber step that uploads the image (And I press "Upload"), the environment can receive the request from the Paperclip-enhanced class (Image) and fake a response for it.

To make that step work, make sure to configure your project with the "fakeweb" gem.
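Assuming a standard setup, that configuration might look like the following; the file locations are just where such setup commonly lives, so adjust them to your project:

```ruby
# Gemfile (or config.gem on older Rails): gem "fakeweb"
#
# features/support/env.rb: load FakeWeb and block real HTTP, so any
# request without a registered fake fails loudly instead of hitting S3.
require 'fakeweb'
FakeWeb.allow_net_connect = false
```

Disabling real network connections is optional but recommended: it turns any forgotten stub into an immediate, obvious test failure.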

Here is how Paperclip was configured in the Image class:

class Image < Attachment
  validates_attachment_content_type :attachment,
    :content_type => ["image/jpg", "image/jpeg", "image/png", "image/gif"],
    :if => :attachment_file_name

  has_attached_file :attachment,
    :storage => :s3,
    :styles => {
      :medium => "300x300>"
    },
    :s3_credentials => "#{RAILS_ROOT}/config/environments/#{RAILS_ENV}/amazon_s3.yml",
    :path => "#{RAILS_ENV}/images/:id/:style/:filename",
    :bucket => "bucketname",
    :url => ':s3_domain_url',
    :whiny => false
end


Fakeweb ended up cutting about 30 seconds off the time to run all Cucumber tests. Good deal!

Monday, November 15, 2010

What Continuous Integration Is Really About

Recently, I have encountered a number of environments where developers work in multiple branches and do not integrate their code till the end of the iteration. They often end up spending hours fighting to merge the code correctly, sometimes resulting in bugs or missed features.

When I see that, I cannot help but remember the pains of integration in 6-month-long Waterfall projects. I was once a junior developer in an environment where developers spent 6 months implementing features in isolation from each other, integrating only right before the project deadline. As a result, they ran into enormous integration issues and spent 3 additional months fixing them before finally delivering.

Now, developers who integrate at the end of the iteration often end up with a similar result: they miss the deadline, sometimes by a day or more, and issues bleed into the next iteration (e.g. missing features due to a bad merge).

When I encounter such environments and hear that developers branch out at the beginning of every iteration before developing their own features, I shudder and point out that they are not following the Agile practice of Continuous Integration. They immediately shoot back with something like "We have Cruise Control set up" or "We do not have the resources to set up a CI server", which only reveals ignorance of what Continuous Integration is really about. What I was actually saying is that they are not integrating continuously into one common branch, and thus not resolving integration conflicts on an hourly or daily basis, but letting them accumulate till the end of the iteration, causing an integration snowball effect.

It is an unfortunate matter of human nature to be lazy about acquiring knowledge. We always want the least amount of learning that gets us where we want to go, so people often fail to dig deeper than what they hear and miss the deepest essence of what they are learning. For example, many developers learning MVC from frameworks like Struts or earlier editions of Rails know just enough MVC to get by, but never dig into its true essence as practiced in Smalltalk applications (or desktop development in general), and thus fail to apply it correctly. You end up with bloated controllers instead of most of the non-control behavior being split into models. By the same token, many developers who hear of Continuous Integration through the marketing lingo of CI servers think that is what Continuous Integration is all about.

Here is how Martin Fowler describes Continuous Integration:
Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.


Notice how the primary emphasis is on members integrating their work frequently, at least daily if not multiple times a day. The automated build is secondary, there to support that primary goal. So, when developers work in their own branches and do not integrate till the end of the iteration, they are not fulfilling the primary goal of resolving conflicts often, before they get big and hard to resolve, and having a CI server does not make them a team properly practicing Continuous Integration. While a CI server certainly helps when they integrate at the end of the iteration, they still face bigger integration issues than if they integrated daily, if not hourly.

Now, working in branches certainly has its place. It is useful when doing a spike, building an experimental feature, performing big architectural changes, or even working on a separate release altogether that would not go out till a few months later. Of course, in the case of a separate release, the code would probably not get merged back into master and can be thought of as a separate project (even if it branched off the original project's code base). And in the case of big architectural changes, it is preferable, when possible, to do them in small slices within iterations, relying on a branch only as a last resort.

Local branches in source code control systems like Git and Mercurial have their place too. You can perform work in a local branch every day if you like as long as you integrate it back to the main branch at the end of the day or every few hours. Used that way, it would still be in line with the practice of Continuous Integration.

Takeaway?

Integrate early and often on the same branch (daily/hourly) and you will leverage the benefits of Continuous Integration on your Agile project by delivering more on time and avoiding big merge/conflict issues.

Tuesday, November 02, 2010