
Thursday, January 17, 2013

SOA Software Guidelines: Service & Data Design, Strategy/Governance

This post is part of a blog series about SOA software guidelines. For the complete list of guidelines (covering design, security, performance, operations, database, coding, versioning, among others) please refer to: http://soa-java.blogspot.nl/2012/09/soa-software-development-guidelines.html


Service Design


Strategy

  • Use mature technology and be a late adopter: follow the mainstream and use popular tools/technologies. Cutting-edge products are usually not as reliable as stable ones; avoid using version 0.x products. Well-accepted products usually have better user-support communities (blogs, discussion groups, etc.), and vendors of successful products have usually grown big enough to afford resourceful customer support.
  • Use well-accepted standard solutions (e.g. OAuth, OpenID, WS-Security, SAML, WS-Policy) for better interoperability & security; don't reinvent the wheel. Standard solutions usually result from cooperation among many developers, so the design and implementation are better tested by the community. For sensitive topics such as security or distributed transactions, it's difficult and risky to build bulletproof code on your own.
  • In general, avoid premature optimization: build a prototype fast, then do performance tests, and only redesign if the performance doesn't meet the SLA. But if the risk profile of the project dictates that performance is critical (e.g. options trading), you need to include optimization from the early design stages (e.g. multi-threading).
  • Validate architecture & design early with prototyping and (performance) tests. Know the cost of specific design choices/features, and be prepared to cut features or rework areas that do not meet specifications.
  • Don't be cheap. Think broader in terms of ROI: don't save pennies while sacrificing long-term scalability, maintenance, productivity and reliability. Man-hours cost much more than hardware, so buy reliable high-performance hardware that lets your team work faster with fewer problems. Buy a better gateway/firewall that can (intelligently) reject DoS attacks and is easy to configure. For your developers, buy state-of-the-art PCs with two monitors and abundant RAM & storage so they can work smoothly.
  • Don't assume an out-of-the-box solution works (e.g. "my framework will take care of everything, including my security/scalability/transaction issues"): test/verify the new bounds of operation (e.g. verify that the framework really secures your application) and verify the effects on other quality aspects (e.g. adding a new framework may improve security but push performance below the SLA).
  • In general, scaling up (buying better hardware) is easier than scaling out (distributing workloads by adding more servers), so try to scale up first whenever possible. Be specific about which resource to scale up (e.g. CPU, memory, network). Scaling out also adds problems such as synchronization between servers, distributed transactions, defining (horizontal) data partitions, and more difficult recovery/fault-tolerance implementation.
  • Optimize the application/query & database design first (look for problems such as locking, missing database indexes, Cartesian-product queries) before scaling up/out or digging into database tuning. See "Where to prioritize your effort regarding software performance": http://soa-java.blogspot.nl/2012/10/where-to-focus-your-effort-re-g-arding.html
  • Use separate environments: sandbox/playground, development, test, production.
  • Reuse designs/configurations: if you have a successful design/configuration, use it again in other places. Reuse means fewer permutations of configurations, thus easier to manage and learn. It's also more robust, since there are fewer things to get wrong and it has already been tested.
  • Scope your requirements. One solution for one problem; avoid trying to build an application capable of all things for all people. Don't try to anticipate every future problem, since it's difficult to make accurate assumptions about the future.
  • Design is a combination of science and art. Make decisions based on facts; if the variables are not known, use educated guesses (e.g. based on historical usage data).
  • Define metrics / measurable objectives to validate your design parameters (e.g. throughput in Mbps, the conversion rate of your web shop, etc.).
  • Recognize and remove design contradictions as early as possible. Write down the trade-off decisions and their implications (e.g. security vs performance).
  • Beware of overusing firewalls: it adds frustration and lost time during development, test and production, and leads to unnecessarily complex solutions (workarounds via a bastion host) or even renders functional requirements impossible. You might leave low-value static content (e.g. CSS, static images) outside the firewall.
  • Develop in-house vs buying a complex out-of-the-box solution: http://soa-java.blogspot.nl/2012/10/in-house-vs-buying-complex-out-of-box.html
Principles of Service Orientation (http://www.soaprinciples.com):
  • Standardized Contract (e.g. WSDL, XSD schema, WS-Policy). Advantages: interoperability, reduced transformation due to a consistent data model, self-documenting (the purpose, capabilities e.g. QoS, message contents).
  • Loose Coupling
    • Advantages: maintainability, independence (the interfaces can evolve over time with minimum impact on each other), scalability (e.g. easier physical partitioning/clustering)
    • Characteristics: decouple the service contract/interface from implementation/technology details, asynchronous message-based communication (e.g. JMS) instead of synchronous RPC, separation of logical layers
  • Service Abstraction: separate the service contract/interface from the implementation; hide technology & business logic details (a minimal contract-vs-implementation sketch in Java follows this list).
  • Reusability
    • Advantages: faster and cheaper realization of new services
    • Characteristics: agnostic services, generic & extensible contracts. Avoid any messages, operations, or logic that are consumer- or technology-specific; reuse services (e.g. for compositions); reuse components from other projects (e.g. XSD/common data model, WSDL, operations, queues)
    • How do you anticipate this service being reused? How can modifications be minimized?
  • Service Autonomy (a high level of control over its runtime environment). Advantages: increased reliability, performance, predictability
  • Statelessness. Advantages: scalability, performance, more reusability due to agnostic design / less affinity
  • Discoverability (e.g. using UDDI)
  • Composability (e.g. using BPEL)
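
As a sketch of that contract/implementation separation, assuming a plain JAX-WS setup (the service name and operation are hypothetical): consumers and the generated WSDL depend only on the annotated interface, while the implementation class can change freely behind it.

    import javax.jws.WebMethod;
    import javax.jws.WebService;

    // The contract: this is all a consumer (and the generated WSDL) depends on.
    @WebService(name = "StudentService")
    public interface StudentService {
        @WebMethod
        String getStudentName(String studentId);
    }

    // The implementation: can be refactored or swapped (e.g. another backend)
    // without changing the published contract.
    @WebService(endpointInterface = "StudentService")
    class StudentServiceImpl implements StudentService {
        @Override
        public String getStudentName(String studentId) {
            // e.g. look up in a database or a legacy system
            return "John Doe";
        }
    }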

Design principles

  • Where are the tightest couplings/dependencies with other services, other systems, etc?
  • Use layered design (e.g. presentation layer, business logic layer) for cohesion, maintainability & scalability.
  • What patterns have been employed? (see e.g. http://soapatterns.org for SOA patterns or Hohpe's book for messaging patterns)
  • Aspect-oriented programming: separate cross-cutting concerns from the main code (e.g. declarative security using policies).
  • Use progressive processing instead of blocking until the entire result is finished, e.g. incremental updates (using a JMS topic per update event instead of bulk-updating the whole database every night), rendering the GUI progressively with separate threads (using Ajax, for example), or a paging GUI (e.g. display only 20 results and provide a "next" button; see the sketch below). This strategy improves performance, user experience, responsiveness and availability.
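
A minimal JDBC sketch of the paging idea (the table, column and LIMIT/OFFSET syntax are assumptions; MySQL/PostgreSQL style shown, Oracle would use FETCH FIRST): the service fetches one page at a time instead of the whole result set.

    import java.sql.*;
    import java.util.*;

    public class StudentPageDao {
        // Fetch one page of student names: pageSize rows starting at offset.
        public List<String> fetchPage(Connection con, int offset, int pageSize)
                throws SQLException {
            String sql = "SELECT name FROM student ORDER BY name LIMIT ? OFFSET ?";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setInt(1, pageSize);
                ps.setInt(2, offset);
                List<String> page = new ArrayList<>();
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        page.add(rs.getString("name"));
                    }
                }
                return page; // the GUI shows this page plus a "next" button
            }
        }
    }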

Simplicity

  • Simplify the requirements (80/20 Pareto prioritization), the design and the implementation.
  • Cut specifications/features which are unnecessary.
  • Minimalist design and implementation: does this service fulfill its goal with minimum effort?
  • Minimize the number of components & connections.
  • Minimize the number of applications & vendors to avoid integration problems.
  • Avoid duplicated provisioning (e.g. authentication data in both LDAP and a database), since then you have the extra problem of keeping them synchronized.

 

Design decisions

  • For every design decision, evaluate its impact on functional & non-functional requirements (e.g. performance, security, maintainability) and on the project plan/constraints (cost, deadline, resources & skills).
  • Prioritize your non-functional requirements for design trade-offs (e.g. if performance ranks above security & reliability, you might avoid message encryption & persistent JMS).
  • Which integration style does this service use (e.g. RPC-like web service, file transfer, shared database, messaging)?
  • Does this service wrap a legacy application or database? Does this application/database already provide out-of-the-box SOA integration capabilities (e.g. web services, messaging triggers) that you can use? Can you replace the underlying application/database with another vendor/version without much change propagation to other services?
  • Where will the services be deployed? E.g. cloud providers, internal virtual machines, distributed servers around the world, local PCs, etc.
  • Which trade-off do you choose for the message structure: a rigid contract (WSDL, SOAP) vs flexibility (e.g. no WSDL, REST, generic key-value inputs), considering security / message validation, performance, extensibility, and chains of WSDL/XSD changes?
  • Avoid concurrent programming if possible, since it's error prone. If you decide to use concurrency, make sure that multi-threading problems (races, deadlocks) have been addressed (tested OK).
  • Which transport protocols do you use (e.g. HTTP-SOAP, HTTP-REST, JMS) and why? Do you need to wrap the protocol (e.g. sync HTTP to wrap async JMS)? Be aware that your platform may have non-standard protocols that offer better performance (e.g. Weblogic T3, Weblogic SOA-Direct).
  • Have you considered an event-driven paradigm (e.g. JMS topics)?
  • Understand the features of your frameworks (e.g. security, transactions, failover, caching, monitoring, load balancing/clustering/parallelization, logging). Using framework features simplifies your design, since you don't have to reimplement them. Read the vendor's recommendations / best-practice documents.

Requirement management

  • Have you followed the relevant standards & laws? E.g. the SOA guidelines document in your organization, regulations such as Sarbanes-Oxley (US), privacy laws such as the Wet bescherming persoonsgegevens (Netherlands), etc.
  • Are there any real-time requirements (e.g. a nuclear plant control system / TUD-IRI)?
  • What is the functional category of this service?
    • Business Process e.g. StudentRegistrationBPELService
    • Business Entity e.g. OsirisService
    • Business Functions e.g. PublishStudentService
    • Utility, e.g. EmailService, Scheduler, LoggingService
    • Security Service (handle identity, authorization)
  • What is the landscape level of this service: Atomic, Domain, Enterprise? (see The Definitive Guide to SOA by Davies et al.)
  • Does this service fulfill the functional & non functional requirements defined in the specification document?
  • Avoid constantly changing requirements. Avoid feature creep.

Asynchronous pattern

The benefits of async messages:
    • avoid blocking, thus improving responsiveness & throughput (better performance, user experience & availability)
    • improve reliability / fault tolerance with persistent queues & durable subscribers
    • loose coupling between producer & consumer (queue) or publisher & subscriber (topic)
    • defer heavy processing to non-peak periods (improves performance & availability)
The drawbacks of async messages are the complexity of implementation:
    • how to persist messages in the queue in case of a server fault
    • how to handle messages that are not delivered (e.g. a fault in the subscribers)
    • how to handle duplicate or out-of-sequence messages
    • how to inform the caller about the processing status (e.g. via a status queue or a status database table)
However, with the advances in enterprise integration frameworks (e.g. Oracle OSB), it's becoming easier to deal with these problems.
Beware that some processes need direct feedback (e.g. form authentication, form validation), where a synchronous pattern is more appropriate. A minimal JMS sketch of the pattern follows.
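
To make the pattern concrete, here is a minimal JMS sketch (the JNDI names and queue are assumptions; they depend on your broker setup): the producer sends a persistent message so it survives a broker restart, and a listener consumes it asynchronously.

    import javax.jms.*;
    import javax.naming.InitialContext;

    public class OrderSender {
        public static void main(String[] args) throws Exception {
            InitialContext ctx = new InitialContext();
            // JNDI names below are hypothetical.
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
            Queue queue = (Queue) ctx.lookup("jms/OrderQueue");

            Connection con = cf.createConnection();
            try {
                Session session = con.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                // Persistent delivery: the broker stores the message on disk, so it
                // survives a broker restart (reliability at some throughput cost).
                producer.setDeliveryMode(DeliveryMode.PERSISTENT);
                producer.send(session.createTextMessage("order-12345"));
            } finally {
                con.close();
            }
        }
    }

    // Consumer side: onMessage is called asynchronously, so the producer never
    // blocks waiting for the processing to finish.
    class OrderListener implements MessageListener {
        public void onMessage(Message message) {
            try {
                System.out.println("Processing " + ((TextMessage) message).getText());
            } catch (JMSException e) {
                e.printStackTrace(); // real code: route to an error/dead-letter queue
            }
        }
    }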

Software process / governance

  • Establish standards/guidelines for your department; summarize them into checklists for design & code reviews.
  • Include design & code reviews in your software development process. See http://soa-java.blogspot.nl/2012/04/software-review.html
  • Establish architecture policies for your department, and assign a clear role to guard the architecture policies and guidelines, e.g. the architects via design/code reviews.
  • For maintainability & governance: limit the technologies used in projects. Avoid constantly changing technology while staying open to new ideas. Provide stability so developers can master the technology.
  • Establish change control. More changes mean more chances of failure. You might need to establish a change committee to approve change requests. A change request consists of the why, the risks, a back-out/undo plan, version control of configuration files, and a schedule. Communicate the schedule to the affected parties beforehand.
  • Use an SLA to explicitly write down user expectations: availability (max downtime), hours of service (critical working hours, weekends/holidays), max users/connections, response time/processing rate (normal mode, degradation mode), monitoring/alerts, crisis management (procedures to handle failures, who gets informed and how, which services/resources have priority), escalation policy, backup/recovery procedures (how much to back up, how often, how long to keep it, how long to recover), limitations (e.g. dependency on an external cloud vendor). Quality comes at a price: be realistic when negotiating the SLA, e.g. 99.99% availability means you have to guarantee less than an hour of downtime per year (0.01% of 8760 hours ≈ 53 minutes), which is quite tough to achieve.
  • Have a contingency plan / crisis management document ready: procedures to handle failures, failover scripts, how to reboot, how to restart in safe mode, configuration backup/restore, how to turn on/turn off/undeploy/deploy modules/services/drivers, who gets informed and how, which services/resources have priority (e.g. telephony service, logging service, security services). Keep multiple printed copies of this document (the intranet and printer may not work during a crisis). The crisis team should have exercised the procedures (e.g. under a simulated DoS attack) and measured metrics during the exercise (e.g. downtime, throughput in degradation mode). Plan team vacations such that at least one crisis team member is always available. Some organizations need a 24/7 full-time dedicated monitoring & support team.
  • Document incidents (root causes, solutions, prevention, lessons learned) and add the incident-handling procedures to the crisis management document.
  • Establish consistent communication channels between architects, the developer team, external developers, testers, management, and stakeholders/clients (e.g. a documentation Trac wiki).
  • Communicate design considerations and issues to the project team/developers and stakeholders.


Data


Data design
  • Do you use a common data model? How do you enforce consistent terminology and semantics between terms used in different systems (databases, LDAP, ERP, external apps, etc.)?
  • Use simple SOAP data types (e.g. int). A new data type introduces overhead during (de)serialization of the messages. Don't use the xsd:any type.
  • How will the data be synchronized across different databases, LDAP directories and external systems?
  • Use MTOM/XOP instead of SwA or inline content for transmitting attachments / large binaries (see the sketch after this list).
  • Consider the claim-check pattern to avoid unnecessary data processing: http://eaipatterns.com/StoreInLibrary.html
  • Consider different ways to store data: relational database (for highly relational data, ACID), NoSQL/document store (simple key-value data, better scalability / easier to split), file system (better for read-only data).
  • How will your service handle localization/globalization and different encodings/formats? E.g. Unicode for diacritics/Cyrillic/Chinese, the DateTime format in your web service vs the format in the database/LDAP/external web service. Have you considered the effect of time zones?
  • If you use file-based integration: are the file permissions right (e.g. access denied for the apache user trying to read a file generated by the weblogic user)? Is the encoding right?
  • Are transactions required? Are compensations/rollbacks required?
  • Be aware of the complexity/scalability of the algorithms you use.
  • Choose data structures based on usage needs (e.g. a tree can be faster for search but generally slower for insert/update) and the particular properties (size, ordering, set semantics, hash keys, etc.) that motivate the choice.
  • Choose the right data format so that transformation needs are minimized.
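
As an illustration of enabling MTOM, here is a minimal JAX-WS sketch (the service name and operation are hypothetical); with the @MTOM annotation the binary payload is sent as a separate MIME part instead of being base64-encoded inline (base64 adds roughly 33% size overhead).

    import javax.jws.WebMethod;
    import javax.jws.WebService;
    import javax.xml.ws.soap.MTOM;

    // @MTOM tells the JAX-WS runtime to use MTOM/XOP for binary data.
    @MTOM
    @WebService
    public class ReportService {
        @WebMethod
        public byte[] downloadReport(String reportId) {
            // in a real service this would stream e.g. a PDF from storage
            return new byte[0];
        }
    }
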
Keep data small
  • Use short element/attribute names to reduce XML size.
  • Limit the level of XML nesting.
  • Delete old / low-value data; move old data to backup / lower-tier storage. Determine how long old data is kept in production and how long in backup / lower-tier storage. Storage (and its maintenance) is not free.
  • Reduce the data with transformations (e.g. summaries, data cleaning, sampling).
  • Consider message compression (e.g. define gzip content-encoding in the HTTP header; see the sketch after this list).
  • Structure your XSD such that you minimize the information transmitted: use optional elements, and minimize the data for a specific purpose (e.g. you don't need to send nationality data if you just want to transmit a person's email address).
  • SOAP web-service style: use literal instead of encoded. The encoded style has several problems: compatibility issues and extra data traffic for the embedded data types.
  • Use the simplest/smallest data type (e.g. use a short int instead of a long int if a short is already good enough).
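
A minimal sketch of gzip-compressing an HTTP request body in Java (the URL is hypothetical, and the receiving server must be configured to accept gzip content-encoding):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPOutputStream;

    public class GzipPostExample {
        public static void main(String[] args) throws Exception {
            String xml = "<order><id>12345</id></order>"; // payload to compress
            URL url = new URL("http://example.com/service"); // hypothetical endpoint

            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("POST");
            con.setDoOutput(true);
            // Tell the server the body is gzip-compressed.
            con.setRequestProperty("Content-Encoding", "gzip");
            con.setRequestProperty("Content-Type", "text/xml");

            try (OutputStream out = new GZIPOutputStream(con.getOutputStream())) {
                out.write(xml.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP status: " + con.getResponseCode());
        }
    }
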
Data management
  • Can you categorize the quality levels of the data? E.g. sensitive data (encrypted/SSL), non-sensitive data (neither encrypted nor signed, to increase performance), critical data (live backup/redundancy), non-critical data (backed up less often), reliable messages (transported with a persistent queue for guaranteed delivery), time-critical/real-time data (use dedicated high-performance hardware/networks).
  • How will the data be backed up? How do you secure the backup data?
  • Is the content correct/accurate? Spell-checked?
  • Logically group data for readability & maintainability (e.g. use a Person structure to contain data about a person: name, birth date, etc.).
  • Data is not free: be aware of the cost of data (network bandwidth, processing, storage, people/power/space to maintain storage, backup cost). Eliminate low-value & high-cost data; sample the low-value & low-cost data. http://www.hertzler.com/images/fck/Image/Whitepapers/DataCost-ValueMatrix.jpg
  • Use tiered storage based on data value and age.
Source: Steve's blogs http://soa-java.blogspot.com/

Any comments are welcome :)






References:

 


• Code Complete by McConnell
• Report review & test checklist, University of Washington: http://www.washington.edu/uwit/im/ds/docs/ReportReviewAndTest_Template.docm
• IEEE Standard for Software Reviews and Audits 1028-2008
• The Definitive Guide to SOA by Davies et al.
• Enterprise Integration Patterns by Hohpe
• SOA Principles of Service Design by Erl
• Improving .NET Application Performance and Scalability by Meier et al.
• Patterns of Enterprise Application Architecture by Fowler
• http://www.royans.net/arch/brewers-cap-theorem-on-distributed-systems/
• Hacking Exposed Web Applications by Scambray et al.
• OWASP Web Service Security Cheat Sheet
• OWASP Code Review Guide
• Improving Web Services Security (Microsoft patterns & practices) by Meier et al.
• Concurrency Series: Basics of Transaction Isolation Levels by Sunil Agarwal

Tuesday, January 1, 2013

Risk reduction strategy: early & incremental deliveries



In his book, Alistair Cockburn describes 12 risk-reduction strategies, also described on his site: http://alistair.cockburn.us/Project+risk+reduction+patterns/v/slim

My favourites are strategies 1-6, which can be summarised as follows:
1. Early & incremental deliveries.
2. Prototyping / run pilot projects.
3. Start immediately, adjust later.

When you don't understand the matter completely, just start building something based on what you know and the standards (e.g. even if the final WSDL from the cloud provider is not definitive yet, you can guess which information will be in the messages, so you can mock the web service to support your development; a small mock sketch follows). The work will lead you to the answers as you reveal more information along the way.
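
A minimal sketch of such a mock, assuming a plain JAX-WS setup (the service name, operation and URL are hypothetical); publishing it locally gives the team a stable endpoint to develop against until the real WSDL arrives:

    import javax.jws.WebMethod;
    import javax.jws.WebService;
    import javax.xml.ws.Endpoint;

    // A stand-in for the cloud provider's service, returning canned data.
    @WebService
    public class MockProvisioningService {
        @WebMethod
        public String registerStudent(String studentId) {
            return "OK-" + studentId; // guessed response shape
        }

        public static void main(String[] args) {
            // Publish the mock on a local port; clients point to this URL
            // until the provider's definitive WSDL is available.
            Endpoint.publish("http://localhost:8080/mock/provisioning",
                    new MockProvisioningService());
            System.out.println("Mock service running...");
        }
    }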

Incremental delivery should be supported by regression tests, ideally automated (continuous integration), so that a new delivery will not break the features from previous deliveries.

When the project has just started, many members of your team and perhaps the stakeholders/management are a bit nervous, since they don't really know what to expect. Thanks to early delivery, as soon as you show the first results (perhaps with a demo to the stakeholders & customers) you show the world that the project is rolling. Team morale will improve, and your team will gain more trust and buy-in from management and stakeholders.

These principles are inherently incorporated in the agile methodology (e.g. Scrum).

  
Source: Steve's blogs http://soa-java.blogspot.com/

Any comments are welcome :)




Reference:


Surviving Object-Oriented Projects by Alistair Cockburn

Monday, December 31, 2012

Project schedule risks & best practices

In his book, Steve McConnell has made a useful compilation of project mistakes: http://www.stevemcconnell.com/rdenum.htm

From this list he compiled the top 10 project schedule risks and the means of controlling them, such as:
  • Feature creep (never-ending requests from the customer for new/changed features): change board, design for change, incremental development
  • Requirement gold-plating (wasting time polishing features while the added value of the extra effort is minimal): scrub requirements, time boxing, cut the feature list based on time/cost limits
  • Overly optimistic schedule: multiple estimates (e.g. ask several people for estimates), negotiate the schedule, cut the feature list
  • Silver-bullet syndrome (the fallacy that one software product/methodology can solve everything): be sceptical of claims, measure/test the software
  • Shortchanged quality: allow time for QA (review/test) activities
  • Inadequate design: have an explicit design activity, schedule time for design, design reviews
  • Contracting failure: check references, assess the contractor's ability before hiring
  • Weak personnel: hire top talent, training, team building
  
He also compiled a list of best practices ranked by their efficacy (reducing schedule, reducing risk, increasing progress visibility), such as reuse, outsourcing, time boxing, evolutionary delivery, prototyping, and the top-10 risk list.

One of the interesting best practices is the "top 10 risk list":

1. Make a list of project risks (e.g. feature creep, the personnel with the required skills are not on board, the external party hasn't delivered their interface/WSDL specification).
2. Rank the list based on risk exposure = probability of occurrence × impact on schedule, e.g. 20% × 2 weeks = 0.4 weeks (a small ranking sketch follows the example entry below).
3. Risk monitoring: update the ranked list periodically (e.g. weekly) with resolution and status.


An example entry in the ranked list:

Rank this week: 1
Rank last week: 2
Time on the list: 4 weeks
Risk: The personnel with Oracle SOA skills have not been acquired.
Resolution: Hire an external consultant; we expect to have the resource on board in week 22.
Status: The hiring budget was approved last week. This week we're starting to interview candidates.
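
A small sketch of how such a ranked list could be computed (the risks, probabilities and impacts below are hypothetical examples):

    import java.util.*;

    public class RiskList {
        static class Risk {
            final String description;
            final double probability;  // chance the risk occurs (0..1)
            final double impactWeeks;  // schedule impact if it occurs
            Risk(String d, double p, double i) {
                description = d; probability = p; impactWeeks = i;
            }
            double exposure() { return probability * impactWeeks; } // in weeks
        }

        public static void main(String[] args) {
            List<Risk> risks = new ArrayList<Risk>(Arrays.asList(
                new Risk("Oracle SOA personnel not acquired", 0.5, 3.0),
                new Risk("External party hasn't delivered the WSDL", 0.2, 2.0),
                new Risk("Feature creep", 0.3, 1.0)));

            // Rank by exposure, highest first; revisit this ranking weekly.
            Collections.sort(risks, new Comparator<Risk>() {
                public int compare(Risk a, Risk b) {
                    return Double.compare(b.exposure(), a.exposure());
                }
            });
            for (Risk r : risks) {
                System.out.printf("%.2f weeks  %s%n", r.exposure(), r.description);
            }
        }
    }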






(Illustration in the original post: different options for risk resolution.)

Any comments are welcome :)



Source: Steve's blogs http://soa-java.blogspot.com/

Reference:



Rapid Development: Taming Wild Software Schedules by Steve McConnell

Sunday, December 30, 2012

How to reduce risk in big projects


A big project has bigger risks, so to reduce the risk a big project requires:
  • Better project planning & team coordination. A big project has more dependencies between teams, so you need to smooth the workflow. Pay more attention to communication between teams.
  • Avoiding technology risk (use familiar technology & development methodology; hire consultants who have experience with the technology). It's better to try out new technologies/methodologies in smaller pilot projects instead of a big project. Using familiar technology/designs will also prevent hours of argument over architecture details.
  • Changing one thing at a time. Don't try to do too many things at once. Use incremental development; deliver smaller improvements early & often.
  • Following best practices more strictly (e.g. standards/guidelines, architecture governance, domain modelling, design patterns, design & code reviews, refactoring, requirement scrubbing, test-driven development). You can't get away with breaking the rules or taking shortcuts as you can in smaller projects.
  • More attention to reuse: avoid duplication of work by different teams.
  • More attention to QA (e.g. continuous integration tests, automatic build & test scripts).
  • Using available tools/frameworks for your infrastructure (e.g. Trac for project/bug tracking, SoapUI for web service tests, Selenium for GUI tests, OSB or Spring Integration for ESB). Don't build your own tools, since that adds more risk and maintenance burden.

Source: Steve's blogs http://soa-java.blogspot.com/

Any comments are welcome :)




Reference:


Surviving Object-Oriented Projects by Alistair Cockburn


Rapid Development: Taming Wild Software Schedules by Steve McConnell

Monday, October 22, 2012

"In-house" vs buying a complex "out of box" solutions

This is my contribution to the religious debate about self-developed "in-house" vs purchased "out-of-the-box" solutions.

Suppose your manager enthusiastically says that he'll purchase a rich-featured "out-of-the-box" solution after a visit from a salesman of a famous vendor. Your response should be to remind him not to rely on vendor solutions blindly, since a rich-featured third-party product may have many disadvantages:

  • Difficult to configure, difficult to learn.
  • Expensive total cost of ownership: it needs big resources, each product has complexities of its own, and you need to hire expensive specialists for maintenance, upgrades and integration.
  • In reality only a small fraction of the features are really used/necessary.
  • Dependency on the vendor, e.g. if you have an urgent bug but the vendor refuses to create a patch soon.
  • Vendor lock-in: you can't easily move your solution to another platform. This can be a problem if the vendor discontinues the product or if the license price becomes too expensive.
  • Even though the salesman advertised a direct out-of-the-box solution, in reality you need to spend effort integrating this external product with your environment (i.e. interfacing with the existing ERP systems/databases in your company).
  • Due to the time difference with the vendor's support team, it can be difficult to discuss problems. Sometimes you need to wait until the next day for an answer to your question. Sometimes you need to stay beyond normal working hours to debug the problems together with the support team.
  • With in-house knowledge, root-cause analysis and solutions are more transparent than if you just buy a black-box solution.

A good trade-off is open source solutions: you don't have to build from scratch / reinvent the wheel, but you have more control over the code and its development.

A similar argument holds for cloud solutions. A cloud solution may add dependency on the service provider, and it will be more difficult to debug problems compared with having the code & in-house knowledge.

Source: Steve's blogs http://soa-java.blogspot.com/

Any comments are welcome :)


Reference:



• Scalability Rules by Abbott & Fisher

Friday, October 19, 2012

Simple capacity planning



  1. Identify actual usage rates, maxima and seasonality (e.g. the business calendar). "Usage rate" here is a performance measure (e.g. throughput, processing rate, number of simultaneous transactions, storage/memory, network bandwidth in Mbps).
  2. Determine the growth rate based on business assumptions (e.g. the marketing manager says the users of your social app will grow 10000% in 2 years) or using a forecasting technique (e.g. Holt-Winters to predict the trend).
  3. Determine the headroom gained by optimization / refactoring projects.
  4. Add a 10-20% usage margin (machines that run at 100% capacity will be unstable).
  5. Measure server peak capacity using a load test.
  6. Compute: capacity needed = max usage + growth - headroom gain + usage margin.
  7. If the capacity needed exceeds the server peak capacity, you need to buy better hardware (scale up), tune the system, or scale out (clustering).
  8. If you choose to scale out, compute the number of servers needed = capacity needed / server peak capacity, rounded up (see the sketch below).
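
A worked example of steps 6-8; all figures are hypothetical:

    public class CapacityPlan {
        public static void main(String[] args) {
            double usageMax = 800;           // peak usage today, e.g. requests/s
            double growth = 400;             // expected extra load over the plan period
            double headroomGain = 100;       // load removed by optimization projects
            double usageMargin = 0.15 * (usageMax + growth); // 15% safety margin
            double serverPeakCapacity = 500; // measured via load test, requests/s

            double capacityNeeded = usageMax + growth - headroomGain + usageMargin;
            int serversNeeded = (int) Math.ceil(capacityNeeded / serverPeakCapacity);

            System.out.printf("Capacity needed: %.0f req/s -> %d servers%n",
                    capacityNeeded, serversNeeded);
            // 800 + 400 - 100 + 180 = 1280 req/s -> ceil(1280/500) = 3 servers
        }
    }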

Caveats:

  • Plan for 2-4 years, not longer, since it's difficult to make good assumptions (e.g. growth rate) for the long term. In 5 years you might have another marketing manager with different targets & growth assumptions.
  • The number-of-servers equation above assumes that the cluster management work (load balancing etc.) is negligible compared with the main work.
  • The number-of-servers equation above assumes that the servers are on separate physical hardware. If the servers are virtual machines on one physical server, then 1+1 is not 2 anymore.
  • Understand which resources need improvement (e.g. if network bandwidth is the problem, adding more servers might not alleviate it).
  • If possible, try to scale up first, since it's easier, faster to implement and often cheaper than scaling out. But your vendor's salesman might try to convince you that the scale-out upgrade package is easier to integrate with your deployment environment and applications (just a matter of hiring his consultants); well, as my wise grandma once said... always fact-check what salesmen and politicians say.


Source: Steve's blogs http://soa-java.blogspot.com/

Any comments are welcome :)


Reference:

The Art of Scalability by Abbott & Fisher

Tuesday, September 11, 2012

Weekly Status report template


• Progress this week (planned & actual begin/end/duration)
• Unplanned activities this week (begin/end/duration)
• Pending: planned this week but not yet completed (and the reason, e.g. dependencies on previously unsolved bugs, coding not finished yet)
• Planned activities next week (with priority list)
• Change requests (e.g. requirement changes / additional requirements)
• Issues (new issues, pending issues/bugs with severity list, dependencies, roadblocks, ESCALATION) and risks, e.g. unavailability of resources (holidays), extra resources needed, firewall adjustment by the infrastructure team
• Impact on the overall delivery deadline and on the planning of next activities (e.g. acceptance test by users)

Remember the KISS (Keep It Simple) principle: you don't have to write all these items every time. Your manager has no time to read a long weekly report.




Please share your comment.

Source: Steve's blog http://soa-java.blogspot.com


Further reading:
http://workarrow.com/burn-your-weekly-status-report-on-second-thought/

Thursday, April 26, 2012

Software Review


Since I am involved in the software review & guideline team at my work, I've spent some time studying the review process, which I want to share with you in this blog.

The benefits of software review:
• Increased software quality, fewer bugs.
• Opportunities to learn (for both the code authors and the reviewers), and a means of knowledge transfer to junior developers.
• Fosters communication between developers.
• Various studies show that the review process saves costs (e.g. $21 million reported by HP). It's cheaper to fix bugs in the earlier phases (design, development) than in the later phases (QA/test, shipped products).
• It is part of best practices/standards, e.g. PSP, CMMI level 3.
• Motivates developers to improve their code quality in order to avoid "bad scores" during review. This ego effect still works even when random reviews cover only 30% of the total code.

The disadvantages of review:
• Costs time. Solution: limit the time (e.g. max 1-2 hours).
• Developers have to wait for the reviewers, which can create delays in the pipeline. Solution: the project manager has to include the software review process in the plan, including the time & resources (the reviewers for review, the developers for rework).
• The code author feels hurt when someone else points out their mistakes. Solutions: be sensitive/friendly when discussing the findings, have both the reviewers & authors agree on the positive benefits, have the reviewers also give positive feedback to the authors, and focus on the code, not the author.
• The developers think they have better things to do. Solution: support from management (e.g. enforce the review process formally).





An example of a review process consisting of 3 steps:
1. "Over the shoulder" short session (30 min)
The author guides the reviewer through the code: the entry point, the most important classes and the relationships between them. He also explains the flow, the sequence & concurrency mechanisms, and the patterns/algorithms used. This session is similar to a pair-programming session. However, we need to be aware of the disadvantages of this method:
• the author has too much control over the scope & pace of the review;
• the reviewer barely has time to check properly;
• the reviewer tends to condone mistakes after hearing the author's explanations.
That's why we need to keep this session short and follow it with a private review session.

2. Private review (30-90 min)
Without involvement of the author, the reviewers check out the code from SCM (e.g. svn), check some documentation (use cases, specs, design), do a fast sanity check, perform some tests/validation (e.g. SoapUI), check against checklists & specifications, and read parts of the code.

3. Post-review activities:
• The reviewer discusses the findings with the author (30 min).
• Consult the product owner, architect, team lead and project manager regarding the risks, bug priorities & the rework impact on the plan.
• Create bug tickets in Trac/Bugzilla.
• Make an appointment for follow-up.





Some best practices:

Determine the scope / the part of the project to review based on risk analysis and the author's error log. E.g. based on his personal log, Bob (the author) knew that he often made mistakes with web security, so he advised Alice (the reviewer) to concentrate on the security issues in his web code.

To improve the process, you need to define metrics in order to measure the effect of changes. These metrics can be external (e.g. number of bugs reported by the QA team, number of customer tickets) or internal (e.g. number of bugs found, time spent, LOC, defect density, complexity measures). Based on the numbers of bugs found by 2 reviewers, you can estimate the total number of bugs and the review yield (a small sketch follows).
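
One common way to make that estimate is capture-recapture (the Lincoln-Petersen estimator); this is a sketch under that assumption, with hypothetical counts: if reviewer A finds a defects, reviewer B finds b, and c are found by both, the estimated total is roughly a*b/c.

    public class ReviewYield {
        public static void main(String[] args) {
            int foundByA = 12;    // bugs found by reviewer A
            int foundByB = 10;    // bugs found by reviewer B
            int foundByBoth = 6;  // bugs found by both (the overlap)

            // Capture-recapture estimate of the total number of defects.
            double estimatedTotal = (double) foundByA * foundByB / foundByBoth;

            int distinctFound = foundByA + foundByB - foundByBoth;
            double yield = distinctFound / estimatedTotal;

            System.out.printf("Estimated total defects: %.0f%n", estimatedTotal); // 20
            System.out.printf("Review yield: %.0f%%%n", yield * 100);             // 80%
        }
    }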

A checklist is the reviewer's most efficient tool. The checklist should be short (less than one A4 page) and describe the reasons, the risk level (based on damage/impact, how common the mistake is, how easily it occurs), and references for further information.

Perform self-review using personal checklists (since everybody has a unique tendency toward certain mistakes).

Take advantage of automatic checks (e.g. Checkstyle, FindBugs, PMD in Java). Some companies include these checks in their continuous integration builds, and the metrics can then be shown in a graph to highlight the quality trend (e.g. bug reduction vs sprint cycles).

A review meeting is not the most effective way to review: it costs more man-hours. It's better for the reviewer to read the code alone, with full concentration.

Maintain the reviewer's concentration by limiting each review (max 1.5 hours, at most 400 LOC (lines of code) per review). Slow down to read the code carefully (at most 500 LOC per hour).

Have the code authors annotate the code appropriately (e.g. the patterns/algorithms used: visitor pattern, quicksort, etc.).

The code author provides notes for his/her project: the starting point / files to begin with, important classes, a dependency/class diagram, patterns used, where to find the documentation (use cases, specs, design doc, installation guide), and test cases (e.g. web service request-response examples). This information is useful not only for the reviewers but also in case the author leaves the company (e.g. a temporary external consultant). You can use a Trac/wiki as a collection point for this information.

Verify / follow up whether the bugs are really fixed.

Beware of comparing apples with oranges: the number of bugs found in a piece of code depends not only on the developer's expertise but also on:
• how complex the problem is
• the number of reviewers, the time spent, and their expertise
• specification & code maturity (development version, beta version, shipped product)
• the programming language
• the tools (IDE, validators, etc.)

Review-team lead / process owner responsibilities:
• maintain the expert knowledge of the reviewers; arrange training if necessary
• establish and enforce review policies
• lead the writing and implementation of the review process and action plans
• define the metrics, make sure that they're collected and used
• monitor the review practices and evaluate their effectiveness

Process assets:
• process description
• guidelines & checklists
• issues tracking system e.g. trac/bugzilla

Where reviews fit in a typical software process:



To be continued: http://soa-java.blogspot.nl/2012/09/the-review-process.html


Source: Steve's blogs http://soa-java.blogspot.com/

Any comments are welcome :)




Literature:

11 Best Practices for Peer Code Review
http://support.smartbear.com/resources/cc/11_Best_Practices_for_Peer_Code_Review.pdf

Best Kept Secrets of Peer Code Review by Jason Cohen
Plus: recent, supported by scientific analysis of the literature & field studies, with down-to-earth advice (instead of management jargon high in the clouds). Minus: repetitive advertisements for their review tools.


Peer Reviews in Software: A Practical Guide
by Karl Wiegers (Paperback)



Seven Truths About Peer Reviews by Karl E. Wiegers
http://www.processimpact.com/articles/seven_truths.html

OWASP code review guide
https://www.owasp.org/images/2/2e/OWASP_Code_Review_Guide-V1_1.pdf