
How to prepare a stable release

· 3 min read

A debate about software release process

In every software project there are a lot of emotional discussions about the release process, the branching model, and testing environments.

All those discussions usually try to answer one question: how the hell can we make a stable deployment with less stress and fewer bugs?

You can deploy a few times a day or once a month, you can have a monolithic application or microservices - it does not matter much. You can always apply the following rules.

The rules

  1. After a deployment to production, set this version aside so it is easy to add hotfixes to it in case of emergency. Test hotfixes in a dedicated environment. Update the production-like version of the code after applying hotfix changes to production.
  2. Focus the team's efforts on preparing the next release.
    • Start by merging finished development tasks. What does "finished" mean? It depends on your definition of done. It may include a completed code review, passing automated tests, matching metrics, or just a developer changing the status to "Resolved" if you can take more risk
    • Have the release candidate tested in a dedicated environment where the QA team / release manager owns the decision about when and what can be added
    • While the release candidate is being verified, only fixes related to the tasks under test are allowed
    • Any other tasks can be added only after approval from the people responsible for the release
    • Remember that adding anything new to the release candidate may destabilize the whole version and delay the release
    • Avoid reverts and cherry-picks, or even forbid them entirely, allowing only merges
    • Have tools to track which commits, related to which tasks, were added to the release candidate
    • Do not take on too much validation work at once. Try to achieve a stable release candidate in at most 2 weeks.
  3. When the release candidate is stable and everything works as expected, you can deploy immediately. If you have a strict release process with fixed dates, put this version aside to wait for the next release. You already know that your release notes won't be empty - it can only get better. Start preparing the next release candidate on top of that. If it becomes stable before the agreed release date, again put the bigger version aside and repeat this process until your release date arrives.

The process has several steps, but it doesn't have to be cumbersome to follow. If your team is able to perform all the steps above automatically, within minutes or even seconds, it means you have successfully implemented continuous delivery - the holy grail for release managers.

The ultimate goal - being reliable

This process can be implemented by adopting GitFlow, but that is not the only way. The ultimate goal is to be certain about every deployment to production - without any surprising changes appearing in production and without any panic on release day and in the hours after it.

The scope of a release is usually negotiable, but trust usually is not. Reliable software teams prefer to ship a smaller set of changes, or introduce them later, rather than risk an unstable production system.

A story about running a software house

· 5 min read

The beginning

In 2013 I co-founded the Epicode software house. Before starting the company, my business partner and I were doing after-hours projects together. He was handling sales and design; I was handling development. This is also how we split the duties in Epicode: he became CEO and I became CTO. At the beginning, it was quite an easy role in a company of 2 employees.

Working as we liked

We would probably not have started Epicode as a full-time job if it hadn't been for a big customer open to cooperation: Grupa Pracuj, where we both had worked before. Grupa Pracuj was just launching a new startup, emplo.com, and needed to build a development team for it. They became our biggest customer and a shareholder. It was a good deal. We had financial security and independence, and we could run the software business as we liked. Our customers, including Grupa Pracuj, cared only about results.

Quick start and early success

I relocated from Ireland back to Poland to start working full time at Epicode. In July 2013 there was me, my partner, the emplo.com project starting, and 3 other small e-commerce customers we had as a result of our moonlighting before Epicode. A good start to grow from. Soon we started hiring and rented an office. After 2 years we were over 20 people: developers, testers and designers.

Growing challenges

The next 3 years showed us exactly what the challenges are of growing a company past 30 employees. For that you need to delegate and to have middle management. We were not ready for it. As local "superheroes" we felt that we had to be directly involved in every project to be sure it would succeed. We were focusing on project-related work, not on growing the company. There was also a constant worry that if we hired more people, we might not manage to get enough customers. On the one hand we always had a big workload, but we were never sure whether that workload was stable enough to hire more people.

When running a software house, the biggest challenge is keeping the right balance between incoming projects and the number of people you employ. The only cost that really matters is salaries. If you hire too many people, you can quickly produce a loss. The crucial part is to have a predictable sales pipeline so that you can plan at least a couple of months ahead.

Learning sales

We made an effort to hire a salesperson, but it only showed how unprepared we were to scale. That person reached out to over 500 potential customers but didn't manage to set up a single meeting. It turned out the salesperson did not understand what we were really selling. For us it was obvious: we listened to what problems the customer wanted to solve and then proposed how we would approach them with an appropriate software solution. It usually worked. Our sales meetings were not about sales; they were free consultancy after which the customer simply wanted to work with us. But this approach did not scale without proper staff training, marketing content and well-prepared case studies. With an ad-hoc salesperson concentrating solely on the rates we offered and the deadlines we could meet, the effect was terrible. We were treated like spammers.

Goal bigger than sales - do what you like doing

But the reality was that we were not convinced we really wanted to focus on marketing, sales and growing the company. Once we got a project, we sank into delivering it, forgetting to look for the next thing to do once the current project was done. When I was working on non-technical stuff, I felt I was losing time and energy on something that was not really my cup of tea. Using social media for promotion? Blah... We are hackers, developers, creators, not marketers. If somebody doesn't want to work with us, it's their loss. I didn't realise how arrogant that way of thinking was.

Our company started to drift towards games and developing our own game. Many people we had on board wanted to work on our own product, which would give much more freedom in how we worked, and not to rely solely on B2B sales, contracts and timelines - especially since we had not mastered those processes. Developing a game had a chance to become a high-margin business - with higher risk, but also with a higher reward if it turned out to be successful.

But for me personally it was not something I wanted to do. I was excited about solving real-life problems that companies or end users have. I didn't want to shut myself in a dark room coding a virtual world. My preference was working with people and business processes.

There is no progress without change

After 5 years at Epicode I decided to go my own way. We had a great time at Epicode, with plenty of success stories and delivered projects, but I needed a fresh context and new fuel. Now I am excited to be a part of the scale-up process at Mash.com. Scaling is something that failed for us at Epicode, which is why I am so enthusiastic about being a part of it at Mash.

Technology and business

· One min read

I'd like to share 2 sentences which came to my mind this evening...

1. Business is easy to understand but hard to master

So many business coaches, so many books and success stories. They say: just do it! But in practice it's so hard to build your own product that earns even one little dollar a day.

2. Technology is hard to understand but easy to master

Technology is often abstract or complicated, with a lot of jargon around it. It's not easy to understand that jargon without being a part of the field. On the other hand, the rules are simple. You have the specs and the docs; just learn them, and it's almost guaranteed to work if you follow the instructions carefully. Everything is predictable and well defined - totally unlike business.

Where do I see myself?

I understand business, but I do not master it. I understand technology and master significant areas of it. In my professional journey I am looking for environments where I can work closely with the business and with people who master it. My role is to add value by managing and building technical solutions that support a given business model.

Actor model programming in Orleans framework

· 5 min read

I've spent some time recently playing around with the Orleans framework. It's an alternative to Akka.NET offering a similar actor-based architecture.

What is the actor model?

Actors are called grains in Orleans. An actor, or grain, is a class representing some business entity. It can be an instance of a Player, Trader, Customer or Account, depending on the domain. In practice grains are implemented like normal C# classes. The one constraint is that all methods have to be asynchronous to keep the communication between grains non-blocking.
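
To make this concrete, here is a minimal sketch of what a grain can look like. The `IPlayerGrain` contract and its methods are made up for illustration, but the `Grain` base class, the key marker interface and the `Task`-returning methods are the standard Orleans building blocks:

```csharp
using System.Threading.Tasks;
using Orleans;

// The grain contract; Orleans generates remoting proxies for interfaces like this.
public interface IPlayerGrain : IGrainWithGuidKey
{
    Task<int> GetScore();
    Task AddPoints(int points);
}

// A grain is an ordinary C# class; every public method returns a Task so that
// communication between grains stays non-blocking.
public class PlayerGrain : Grain, IPlayerGrain
{
    private int _score;

    public Task<int> GetScore() => Task.FromResult(_score);

    public Task AddPoints(int points)
    {
        _score += points;
        return Task.CompletedTask;
    }
}
```

Calling such a grain from a client or from another grain is then just a matter of `GrainFactory.GetGrain<IPlayerGrain>(playerId)` followed by an awaited method call; Orleans routes the call to wherever the activation currently lives.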

Local state

Grains have local state. This means they live in memory between requests to the system, which gives big performance benefits compared to recreating an entity instance from the database for each request. State can be persisted to a database to avoid losing data on system restarts. As a programmer you can invoke saving to storage any time the state has changed.
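
As a sketch of how that looks in code: the state class and the "playerStore" storage name below are assumptions for illustration, while the `IPersistentState<T>` injection and `WriteStateAsync` call are the standard Orleans persistence API.

```csharp
using System.Threading.Tasks;
using Orleans;
using Orleans.Runtime;

public class PlayerState
{
    public int Score { get; set; }
}

public class PersistentPlayerGrain : Grain, IPlayerGrain
{
    private readonly IPersistentState<PlayerState> _state;

    // "playerStore" is the name of a storage provider configured at silo startup.
    public PersistentPlayerGrain(
        [PersistentState("player", "playerStore")] IPersistentState<PlayerState> state)
    {
        _state = state;
    }

    public Task<int> GetScore() => Task.FromResult(_state.State.Score);

    public async Task AddPoints(int points)
    {
        _state.State.Score += points;
        // The grain decides when the in-memory state is flushed to storage.
        await _state.WriteStateAsync();
    }
}
```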

Horizontal scalability

Orleans can run in a cluster of many servers. The framework places grains across all the nodes of the cluster, so that each grain is located on only a single node. There can be exceptions to that rule: when a node crashes, the framework may not be sure exactly when a grain has finished its processing. In general this problem is known in computing as split-brain. But this is an edge case that falls under error-handling strategies; the overall assumption is that each grain is activated only once.

Grains exchange messages with each other. Those messages use fast .NET binary serialization. Messages can go over the network if 2 grains are on separate nodes, so it is important not to make grains too chatty if you care about performance - and you probably do if you are interested in frameworks like Orleans :)

The ability to run Orleans in a cluster gives beautiful linear scalability.

What problems is the actor model good for?

The actor model is suitable when you have a lot of objects communicating with each other. Example use cases:

  • Real-time trading systems
  • Multiplayer games
  • IoT applications connected to many devices

Grain activations should be distributed randomly and decentralized. The actor model is not suitable for batch processing or for a centralized design where a few entities have to process most of the requests (so-called hot spots).

Event sourcing

Actors are a good fit for the event sourcing pattern. Orleans supports that pattern through JournaledGrains. But here comes a disappointment: the available storage mechanisms for event log persistence are poor. The only built-in storage provider saves the event log for a given grain as a collection serialized into a single state object, so the whole event log needs to be read before the grain state can be recreated. The other built-in storage saves only a state snapshot without saving the event log. The good thing is that there is a flexible extensibility point allowing you to write your own provider by implementing just 2 methods for reading and writing events. There is also a community contribution available which integrates Orleans with Event Store, but that database is not my favorite. I'm probably complaining too much and should instead contribute by implementing event log storage based on Cassandra or CosmosDB - it does not look like a hard task. But the next topic is much harder: distributed transactions.
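
For illustration, a minimal JournaledGrain might look like the sketch below. The account domain, the event type and the "LogStorage" provider name are assumptions of this example; `RaiseEvent`, `ConfirmEvents` and the `Apply` convention on the state class are the actual JournaledGrain API.

```csharp
using System.Threading.Tasks;
using Orleans;
using Orleans.EventSourcing;
using Orleans.Providers;

// A hypothetical event and the state it gets applied to.
public class DepositedEvent
{
    public decimal Amount { get; set; }
}

public class AccountState
{
    public decimal Balance { get; private set; }

    // JournaledGrain rebuilds the state by applying events one by one.
    public void Apply(DepositedEvent e) => Balance += e.Amount;
}

public interface IAccountGrain : IGrainWithGuidKey
{
    Task Deposit(decimal amount);
    Task<decimal> GetBalance();
}

// "LogStorage" refers to a log-consistency provider configured at silo startup.
[LogConsistencyProvider(ProviderName = "LogStorage")]
public class AccountGrain : JournaledGrain<AccountState, DepositedEvent>, IAccountGrain
{
    public async Task Deposit(decimal amount)
    {
        RaiseEvent(new DepositedEvent { Amount = amount });
        await ConfirmEvents(); // waits until the event is persisted to the log
    }

    public Task<decimal> GetBalance() => Task.FromResult(State.Balance);
}
```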

Distributed transactions

The creators of the Orleans framework did a great job of formally describing its semantics. You can have a look at how they implemented distributed transactions: here

The algorithm is very interesting, but from a practical point of view what I miss is support for transactional communication between JournaledGrains. Again, support for the event sourcing pattern does not seem to have been a top priority in Orleans so far.

If you would like to dive deeper into other theoretical aspects of actor-based architecture, you may be interested in other Microsoft Research materials: https://www.microsoft.com/en-us/research/project/orleans-virtual-actors/

Message delivery

Orleans can give you one of two guarantees:

  • a message will be delivered at most once
  • a message will be delivered at least once

There is no guarantee that a message will be delivered exactly once. We are in a distributed system, and this problem is not easy to solve without sacrificing performance. It is something to be aware of; it's up to you how to introduce fault tolerance.
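
If you go for at-least-once delivery (for example by retrying failed calls yourself), one common mitigation is to make the receiving grain idempotent. A minimal sketch, with a made-up order/payment domain and an in-memory set of processed ids that a real system would have to persist and trim:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Orleans;

public interface IOrderGrain : IGrainWithGuidKey
{
    // The caller attaches a unique id so retries of the same message can be detected.
    Task RegisterPayment(Guid messageId, decimal amount);
}

public class OrderGrain : Grain, IOrderGrain
{
    private readonly HashSet<Guid> _processedMessages = new HashSet<Guid>();
    private decimal _paidTotal;

    public Task RegisterPayment(Guid messageId, decimal amount)
    {
        // A duplicate delivery is simply ignored, so retrying the call is always safe.
        if (_processedMessages.Add(messageId))
        {
            _paidTotal += amount;
        }
        return Task.CompletedTask;
    }
}
```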

Orleans and microservices

You can think of Orleans as a microservices framework where the services are really micro: each grain is a service. You probably cannot go more micro with microservices than in an actor-based architecture. If you are building a microservices-based system, have a look at the Orleans docs and ask yourself an honest question: have you thought about all the problems that Orleans addresses and solves while building your microservices solution? We often take shortcuts through mud and bush because we don't even know there is a better way. Have a look at this presentation for some examples:

https://www.slideshare.net/JontheBeach/distributed-transactions-are-dead-long-live-distributed-transaction

Summary

I'm very grateful to all the contributors who brought Orleans into existence, because it provides a solid foundation for building a well-defined actor-based architecture. Even if this model is not suitable for your needs, Orleans is very educational: a deep dive into its architecture and implementation can broaden your architectural horizons a lot.

On the other hand, in my opinion you have to be prepared to make quite a lot of custom extensions and contributions at the framework level to build a production-class system. There is an interesting initiative called the Microdot framework which adds to Orleans many must-have features for building a real system. But even with Microdot, the ecosystem looks more like academic research than a shiny framework ready to ship to production out of the box. For everyone looking for something more mature with bigger support, I would recommend looking at Azure Service Fabric.

Setting aside production and enterprise readiness, the programming model in Orleans is sweet. The APIs are well designed and the framework offers many extension points to play with. Worth trying before signing up for a cloud solution.

Azure Monitor (aka Application Insights)

· 2 min read

So far I have been using New Relic for monitoring .NET applications in production, and it is a great product. It has everything we could expect from an APM tool: browser and backend code instrumentation, alerts, error stack traces, request performance statistics, CPU, memory and disk usage - even SQL queries sent to the database and Redis query statistics.

Recently I've put some effort into evaluating Azure Monitor as an alternative. I had used it before for basic monitoring of Azure resources, but I had never explored its full capabilities. And those capabilities are enormous!

With New Relic I was using ELK (Elasticsearch, Logstash, Kibana) as a complementary tool to gather custom application-specific logs and metrics. With Azure Monitor I don't see such a need anymore. When hosting applications in Azure, Azure Monitor already covers all the functionality of both New Relic and ELK in one box.

What I like most about Azure Monitor:

  • Integrates seamlessly with the Azure cloud to provide host-level metrics
  • Provides insights into containers running in Azure Kubernetes Service
  • Runs on the powerful Azure Data Explorer engine, which allows you to analyze data in various formats in a consistent way
  • Makes it easy to define custom metrics
  • Supports advanced alerts and automation based on log queries and metrics
  • Easily integrates with .NET Core applications (see the sketch after this list)
  • Rich visualization tools, including the ability to export data to Power BI
  • ... and yes, it provides exception traces, code profiling and web request performance statistics
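
As a rough illustration of the .NET Core integration and a custom metric: the service, event and metric names below are made up for this sketch, while `AddApplicationInsightsTelemetry` and `TelemetryClient` come from the Microsoft.ApplicationInsights.AspNetCore package.

```csharp
using Microsoft.ApplicationInsights;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Wires up request, dependency and exception telemetry automatically.
        services.AddApplicationInsightsTelemetry();
        services.AddControllers();
    }
}

// Anywhere in the app, TelemetryClient can be injected to send custom metrics and events.
public class CheckoutService
{
    private readonly TelemetryClient _telemetry;

    public CheckoutService(TelemetryClient telemetry) => _telemetry = telemetry;

    public void CompleteOrder(decimal total)
    {
        // "OrderValue" and "OrderCompleted" are custom names chosen for this example.
        _telemetry.GetMetric("OrderValue").TrackValue((double)total);
        _telemetry.TrackEvent("OrderCompleted");
    }
}
```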

Apache Ignite as an alternative to Redis cache

· 3 min read

Introduction to Redis

I am quite a big fan of Redis as a distributed in-memory cache. It also works well as session storage.

There is a network penalty for communicating with the Redis service, so - as with talking to a database - you cannot be too chatty. It's much better to ask for multiple keys in a single request at the beginning of your logic to quickly get all the necessary data at hand. Reading values from Redis should still be much quicker than from a database: first of all, it's a simple key-value store, so it's like always reading by primary key; secondly, we benefit from having everything in RAM. It is also possible to run Redis in persistent mode, but that's a different use case, in which you may not use an SQL database at all.
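
For example, with the StackExchange.Redis client you can fetch several keys in one round trip (the "product:{id}" key convention is just an assumption for this sketch):

```csharp
using System.Linq;
using System.Threading.Tasks;
using StackExchange.Redis;

public class ProductCacheReader
{
    private readonly IDatabase _redis;

    public ProductCacheReader(IConnectionMultiplexer connection)
    {
        _redis = connection.GetDatabase();
    }

    // Fetch several keys in a single round trip (MGET) instead of one call per key.
    public async Task<RedisValue[]> GetProductsAsync(params int[] productIds)
    {
        RedisKey[] keys = productIds.Select(id => (RedisKey)$"product:{id}").ToArray();
        return await _redis.StringGetAsync(keys);
    }
}
```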

Cache-aside pattern

RAM is usually limited and cannot store all the data we have - especially since in Redis you will usually introduce quite a lot of redundancy to keep as much as possible under a single key. Limited memory space is easily solved by applying the cache-aside pattern.
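
A minimal cache-aside sketch, assuming a hypothetical `Product` entity and `IProductRepository` for database access:

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;
using StackExchange.Redis;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface IProductRepository
{
    Task<Product> LoadAsync(int id);
}

public class ProductCache
{
    private readonly IDatabase _redis;
    private readonly IProductRepository _repository;

    public ProductCache(IConnectionMultiplexer connection, IProductRepository repository)
    {
        _redis = connection.GetDatabase();
        _repository = repository;
    }

    // Cache-aside: try the cache first, fall back to the database, then populate the cache.
    public async Task<Product> GetProductAsync(int id)
    {
        string key = $"product:{id}";

        RedisValue cached = await _redis.StringGetAsync(key);
        if (cached.HasValue)
        {
            return JsonSerializer.Deserialize<Product>(cached.ToString());
        }

        Product product = await _repository.LoadAsync(id);

        // The expiry lets Redis evict entries that no longer fit into limited RAM.
        await _redis.StringSetAsync(key, JsonSerializer.Serialize(product), TimeSpan.FromMinutes(30));
        return product;
    }
}
```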

Updating data in Redis

A more difficult problem to solve is refreshing data in Redis when something changes. One solution is to let keys expire after a specific time, but your users would not be happy. We all live in a real-time, instantly updated world. Delay by design? It does not look good. So what's left is to remove old data from Redis as soon as it has changed. First you need to identify all the places in your system where a given piece of information is modified. In a big legacy system that may be a challenge. If you are luckier, your system may have a proper event sourcing implementation allowing for easy change detection by just listening to events. OK, so we know that a given entity has changed - which keys should we remove from Redis now? It is handy if your code is able to generate all the Redis keys under which data from a given entity is stored and delete them in a single Redis call. For batch updates you may consider using the scan operation for pattern-matching on keys.
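
As a sketch of that deletion step (the key layout mirrors the hypothetical "product:{id}" convention above and is an assumption of this example):

```csharp
using System.Threading.Tasks;
using StackExchange.Redis;

public class ProductCacheInvalidator
{
    private readonly IDatabase _redis;

    public ProductCacheInvalidator(IConnectionMultiplexer connection)
    {
        _redis = connection.GetDatabase();
    }

    // Generates every key that may hold a redundant copy of the changed entity
    // and removes them all in a single Redis call.
    public Task InvalidateProductAsync(int productId, int categoryId)
    {
        RedisKey[] keys =
        {
            $"product:{productId}",
            $"product:{productId}:details",
            $"category:{categoryId}:products"
        };
        return _redis.KeyDeleteAsync(keys);
    }
}
```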

Updating data in Apache Ignite

Apache Ignite is easier to introduce as a cache layer in a system with an SQL database because it supports SQL and the read-through/write-through pattern. There is an out-of-the-box integration with Entity Framework: https://apacheignite-net.readme.io/docs/entity-framework-second-level-cache Unfortunately, no version for EF Core is available.

Conclusion

If you use EF >= 6.1 and < 7 and would like to introduce a distributed cache, or if you are already fighting stale-cache bugs every week, consider using Apache Ignite.

How to make password reset link more secure?

· One min read

Sensitive data should not be stored in URLs - a lot has been written about that. URLs are logged on many different servers through which the request travels (web servers, SMTP servers, proxies, browser history, etc.), and the sensitive data ends up stored there.

But there are situations where avoiding an access token in the URL is difficult - for example, a password reset link sent by email.

In that case we can add more security by implementing the following pattern:

  1. The action which handles the password reset reads the token from a GET parameter.
  2. The token is validated and stored in the user session or a cookie.
  3. The user is automatically redirected to a password reset action which no longer has the access token in a GET parameter. It could even be the same action.

After the redirect, the access token is no longer present in the URL.
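
A rough ASP.NET Core sketch of that flow. The controller, action, validator and session key names are made up, and session middleware is assumed to be configured; the important part is the redirect that drops the token from the URL:

```csharp
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

public interface IResetTokenValidator
{
    bool IsValid(string token);
}

public class PasswordResetController : Controller
{
    private readonly IResetTokenValidator _tokenValidator;

    public PasswordResetController(IResetTokenValidator tokenValidator) => _tokenValidator = tokenValidator;

    // Steps 1 + 2: the link from the email lands here with ?token=...;
    // the token is validated, moved into the session, and the user is redirected away.
    [HttpGet]
    public IActionResult Start(string token)
    {
        if (!_tokenValidator.IsValid(token))
        {
            return BadRequest();
        }

        HttpContext.Session.SetString("password-reset-token", token);
        return RedirectToAction(nameof(Form)); // step 3: the new URL no longer contains the token
    }

    // The actual reset form works only with the token stored server-side.
    [HttpGet]
    public IActionResult Form()
    {
        string token = HttpContext.Session.GetString("password-reset-token");
        return token == null ? (IActionResult)BadRequest() : View();
    }
}
```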

Note that if we have any external link on the password reset page (e.g. to social media), all GET parameters would also be accessible to those 3rd-party servers via the HTTP Referer request header after the user follows the link.

Also remember to add an expiration time to password reset links and make them single-use.

How to blog with a baby?

· One min read

When you are taking care of a toddler who is just trying to walk, you need to write fast or your articles need to be short. Thank you.

The cost of transparent recruitment process

· 2 min read

Transparency in organizations has tremendous benefits, but it doesn't come for free. Let's take the recruitment process as an example.

If there are only 2 autonomous decision makers, the process is simple. They screen the CV, meet the candidate, have a conversation, check the tasks the candidate was assigned, maybe have a quick follow-up, and that's it. Usually this is enough to make a decision. Feedback can even be given to the candidate immediately if the recruiters are experienced. With 1 decision maker it is even simpler, but it's usually good to have at least one more opinion.

But what if we would like to make that process transparent - scoring the candidates by measuring them somehow, to show where the decision comes from? It may be easy to measure an A/B/C test score, financial expectations and years of experience, but not everything is so quantitative. Measuring personality, the value of the candidate's experience for the company, attitude, personal growth potential or some creative task may be tricky. It may require creating a complicated recruitment process and metrics to justify the final decision transparently.

What are your opinions? Is it worth investing in transparency in this case? Or is it better to trust the decision makers and just keep informal notes about the candidates? Have you ever wondered what the proper balance should be?

Good and bad technical debt

· 5 min read

Albert Einstein said that everything should be made as simple as possible, but not simpler. Simplicity is the holy grail of IT, and of business too. It takes smart thinking, and often experience, to make complex things simple. But what about the opposite situation, when things are simpler than they should be? In that case we are creating technical debt.

In finance, debt is not always bad. When debt contributes to generating higher income and the cost of interest is lower than that income, the debt is healthy. For example, when a company takes out a loan to modernize its equipment so it can be more competitive and generate bigger revenue, we consider that loan a good investment. Of course there is always risk involved, and we usually cannot be sure about the future revenue, but that's another story.

How about IT projects? Can technical debt also be good?

Examples of good technical debt

Debt is a shortcut to generate revenue quicker, and IT projects are no different. If a quick and dirty implementation is enough to acquire a customer who will bring income, the debt is acceptable. It is better to have a quick and dirty implementation that generates revenue than a missed deadline and a lost deal. However, the income should be high enough to pay off the debt later. Introducing big technical debt for little income is probably not a good move.

An example of good debt may be building a prototype. A prototype is something we should be able to afford to throw away when it turns out that the idea is not worth continuing, or that there is a better approach. Prototypes are great for demos and idea testing. However, when the prototype is approved, we should keep in mind that it will usually require significant refactoring, or even rebuilding from scratch, before it becomes the final product. In other words, the debt will need to be paid back. Good software development practices may help to reuse a significant part of the prototype in the final product.

Another example of good debt may be hardcoding some logic. Hardcoding is always a concern for software developers because it means the solution is not flexible and may need more work in the future to introduce changes. But over-engineering is as bad as over-investing. When we don't need flexibility, let's not introduce it just in case. Postpone any work as long as possible. It may require some rework in the future, but we may also have a bigger budget in the future. The paradox here is that what is debt for the technical team may be savings for the business team. To avoid technical debt in the codebase, the business may need to take on real debt to cover development costs. Usually it is better to have debt in the codebase rather than in the bank account: a good software architecture makes technical debt much easier to pay back than real money owed to a lender.

Debt management

Technical debt, like any other debt, has to be manageable. Stakeholders and the team need to be aware of where the debt exists, how big it is and what the interest rate is. The interest on technical debt is paid through lower productivity: teams spend time investigating how the system really works, fixing bugs, doing manual configuration and manual testing. The more time those activities need, the higher the interest rate. It is really bad when the cost of this time is not covered by the revenue the product generates. But even when that cost is covered, companies should pay back technical debt systematically to keep a level of productivity that allows them to stay competitive.

How to manage technical debt?

1. Measuring how much time is spent on system maintenance.

Management teams are often not aware of how much time is spent in this area and therefore don't have the numbers to understand how much it costs to pay the interest on technical debt.

2. Using tools to measure code quality and automated test coverage.

Such tools are usually easy to integrate with the development pipeline and help to identify technical debt. They provide deep insights into the codebase, but they don't tell you whether the identified debt is good or bad.

3. Mapping reported issues to specific areas of the codebase.

This helps to identify which parts of the system generate the highest interest on technical debt. Some parts of the system may have poor code quality but cause no maintenance issues - that is like having a 0% interest loan. Other parts may have great code quality and 100% test coverage but generate a lot of issues because of wrong logic, missing requirements or lots of manual actions involved. It shows that interest is paid not only on a poor codebase, but also on poor requirements specification and business analysis. Usually they go together: when requirements are messy, the source code becomes messy too.

Summary

What are your opinions? Do you agree that technical debt can be good? Can you give more examples of good technical debt and ways to manage it?