
How to prepare a stable release

· 3 min read

A debate about software release process

In every software project there are a lot of emotional discussions about the release process, the branching model, and testing environments.

All those discussions usually try to answer one question: how the hell can we make a stable deployment with less stress and fewer bugs?

You can deploy a few times a day or once a month, you can have a monolithic application or microservices - it does not matter much. You can always apply the following rules.

The rules

  1. After a deployment to production, set this version aside so it is easy to add hotfixes to it in case of emergency. Test hotfixes in a dedicated environment. Update the production-like version of the code after applying hotfix changes to production.
  2. Focus the team's efforts on preparing the next release.
    • Start by merging finished development tasks. What does "finished" mean? It depends on your definition of done. It may include a completed code review, passing automated tests, matching metrics, or just a developer changing the status to "Resolved" if you can take more risk
    • Have the release candidate tested in a dedicated environment where the QA team / release manager owns the decision about when and what can be added
    • While the release candidate is being verified, only fixes related to the tasks under test are allowed
    • Any other tasks can be added only after approval from the people responsible for the release
    • Remember that adding anything new to the release candidate may destabilize the whole version and delay the release
    • Avoid reverts and cherry-picks, or even forbid them entirely, allowing only merges
    • Have tools to track which commits, related to which tasks, were added to the release candidate
    • Do not take on too much validation work at once. Try to achieve a stable release candidate in at most 2 weeks.
  3. When the release candidate is stable and everything works as expected, you can deploy immediately. If you have a strict release process with fixed dates, put this version aside to wait for the next release. You already know that your release notes won't be empty - it can only get better. Start preparing the next release candidate on top of that. If it becomes stable before the agreed release date, again put the bigger version aside and repeat this process until your release date arrives.

The process has several steps, but it doesn't have to be cumbersome to follow. If your team is able to perform all the steps above automatically, within minutes or even seconds, it means you have successfully implemented continuous delivery - the holy grail for release managers.

The ultimate goal - being reliable

This process can be implemented by adopting GitFlow, but that is not the only way. The ultimate goal is to be certain about every deployment to production - without any surprising changes appearing in production and without any panic on release day and in the hours after it.

The scope of a release is usually negotiable, but trust usually is not. Reliable software teams prefer to ship a smaller set of changes, or introduce them later, rather than risk an unstable production system.

A story about running a software house

· 5 min read

The beginning

In 2013 I co-founded the Epicode software house. Before starting the company, my business partner and I were doing after-hours projects together. He was handling sales and design; I was handling development. This is also how we split the duties in Epicode: he became CEO and I became CTO. At the beginning, it was quite an easy role in a company of 2 employees.

Working as we liked

We would probably not have started Epicode as a full-time job if it hadn't been for a big customer open to cooperation: Grupa Pracuj, where we both had worked before. Grupa Pracuj was just launching a new startup, emplo.com, and needed to build a development team for it. They became our biggest customer and a shareholder. It was a good deal. We had financial security and independence, and we could run the software business as we liked. Our customers, including Grupa Pracuj, cared only about results.

Quick start and early success

I relocated from Ireland back to Poland to start working full time at Epicode. In July 2013 there was me, my partner, the emplo.com project starting, and 3 other small e-commerce customers we had as a result of our moonlighting before Epicode. A good start to grow from. Soon we started hiring and rented an office. After 2 years we were over 20 people: developers, testers and designers.

Growing challenges

The next 3 years showed us exactly what the challenges are of growing a company past 30 employees. For that you need to delegate and to have middle management. We were not ready for it. As local "superheroes" we felt that we had to be directly involved in every project to be sure it would succeed. We were focusing on project-related work, not on growing the company. There was also a constant worry that if we hired more people, we might not manage to get enough customers. On the one hand we always had a big workload, but we were never sure whether that workload was stable enough to hire more people.

When running a software house, the biggest challenge is keeping the right balance between incoming projects and the number of people you employ. The only cost that really matters is salaries. If you hire too many people, you can quickly produce a loss. The crucial part is to have a predictable sales pipeline so that you can plan at least a couple of months ahead.

Learning sales

We made an effort to hire a salesperson, but it only showed how unprepared we were to scale. That person reached out to over 500 potential customers but didn't manage to set up a single meeting. It turned out the salesperson did not understand what we were really selling. For us it was obvious: we listened to what problems the customer wanted to solve and then proposed how we would approach them with an appropriate software solution. It usually worked. Our sales meetings were not about sales; they were free consultancy after which the customer simply wanted to work with us. But this approach did not scale without proper staff training, marketing content and well-prepared case studies. With an ad-hoc salesperson concentrating solely on the rates we offered and the deadlines we could meet, the effect was terrible. We were treated like spammers.

Goal bigger than sales - do what you like doing

But the reality was that we were not convinced we really wanted to focus on marketing, sales and growing the company. Once we got a project, we sank into delivering it, forgetting to look for the next thing to do once the current project was done. When I was working on non-technical stuff, I felt I was losing time and energy on something that was not really my cup of tea. Using social media for promotion? Blah... We are hackers, developers, creators, not marketers. If somebody doesn't want to work with us, it's their loss. I didn't realise how arrogant that way of thinking was.

Our company started to drift towards games and developing our own game. Many people we had on board wanted to work on our own product, which would give much more freedom in how we worked, and not to rely solely on B2B sales, contracts and timelines - especially since we had not mastered those processes. Developing a game had a chance to become a high-margin business - with higher risk, but also with a higher reward if it turned out to be successful.

But for me personally it was not something I wanted to do. I was excited about solving real-life problems that companies or end users have. I didn't want to shut myself in a dark room coding a virtual world. My preference was working with people and business processes.

There is no progress without change

After 5 years at Epicode I decided to go my own way. We had a great time at Epicode, with plenty of success stories and delivered projects, but I needed a fresh context and new fuel. Now I am excited to be a part of the scale-up process at Mash.com. Scaling is something that failed for us at Epicode, which is why I am so enthusiastic about being a part of it at Mash.

Technology and business

· One min read

I'd like to share 2 sentences which came to my mind this evening...

1. Business is easy to understand but hard to master

So many business coaches, so many books and success stories. They say: just do it! But in practice it's so hard to build your own product that earns even one little dollar a day.

2. Technology is hard to understand but easy to master

Technology is often abstract or complicated, with a lot of jargon around it. It's not easy to understand that jargon without being a part of the field. On the other hand, the rules are simple. You have the specs and the docs; just learn them, and it's almost guaranteed to work if you follow the instructions carefully. Everything is predictable and well defined - totally unlike business.

Where do I see myself?

I understand business, but I do not master it. I understand technology and master significant areas of it. In my professional journey I am looking for environments where I can work closely with the business and with people who master it. My role is to add value by managing and building technical solutions that support a given business model.

Actor model programming in Orleans framework

· 5 min read

I've spent some time recently playing around with the Orleans framework. It's an alternative to Akka.NET offering a similar actor-based architecture.

What is the actor model?

Actors are called grains in Orleans. An actor, or grain, is a class representing some business entity. It can be an instance of a Player, Trader, Customer or Account, depending on the domain. In practice grains are implemented like normal C# classes. The one constraint is that all methods have to be asynchronous to keep the communication between grains non-blocking.
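
To make this concrete, here is a minimal sketch of what a grain can look like. The `IPlayerGrain` contract and its methods are made up for illustration, but the `Grain` base class, the key marker interface and the `Task`-returning methods are the standard Orleans building blocks:

```csharp
using System.Threading.Tasks;
using Orleans;

// The grain contract; Orleans generates remoting proxies for interfaces like this.
public interface IPlayerGrain : IGrainWithGuidKey
{
    Task<int> GetScore();
    Task AddPoints(int points);
}

// A grain is an ordinary C# class; every public method returns a Task so that
// communication between grains stays non-blocking.
public class PlayerGrain : Grain, IPlayerGrain
{
    private int _score;

    public Task<int> GetScore() => Task.FromResult(_score);

    public Task AddPoints(int points)
    {
        _score += points;
        return Task.CompletedTask;
    }
}
```

Calling such a grain from a client or from another grain is then just a matter of `GrainFactory.GetGrain<IPlayerGrain>(playerId)` followed by an awaited method call; Orleans routes the call to wherever the activation currently lives.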

Local state

Grains have local state. This means they live in memory between requests to the system, which gives big performance benefits compared to recreating an entity instance from the database for each request. State can be persisted to a database to avoid losing data on system restarts. As a programmer you can invoke saving to storage any time the state has changed.
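
As a sketch of how that looks in code: the state class and the "playerStore" storage name below are assumptions for illustration, while the `IPersistentState<T>` injection and `WriteStateAsync` call are the standard Orleans persistence API.

```csharp
using System.Threading.Tasks;
using Orleans;
using Orleans.Runtime;

public class PlayerState
{
    public int Score { get; set; }
}

public class PersistentPlayerGrain : Grain, IPlayerGrain
{
    private readonly IPersistentState<PlayerState> _state;

    // "playerStore" is the name of a storage provider configured at silo startup.
    public PersistentPlayerGrain(
        [PersistentState("player", "playerStore")] IPersistentState<PlayerState> state)
    {
        _state = state;
    }

    public Task<int> GetScore() => Task.FromResult(_state.State.Score);

    public async Task AddPoints(int points)
    {
        _state.State.Score += points;
        // The grain decides when the in-memory state is flushed to storage.
        await _state.WriteStateAsync();
    }
}
```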

Horizontal scalability

Orleans can run in a cluster of many servers. The framework places grains across all the nodes of the cluster, so that each grain is located on only a single node. There can be exceptions to that rule: when a node crashes, the framework may not be sure exactly when a grain has finished its processing. In general this problem is known in computing as split-brain. But this is an edge case that falls under error-handling strategies; the overall assumption is that each grain is activated only once.

Grains exchange messages with each other. Those messages use fast .NET binary serialization. Messages can go over the network if 2 grains are on separate nodes, so it is important not to make grains too chatty if you care about performance - and you probably do if you are interested in frameworks like Orleans :)

The ability to run Orleans in a cluster gives beautiful linear scalability.

What problems is the actor model good for?

The actor model is suitable when you have a lot of objects communicating with each other. Example use cases:

  • Real-time trading systems
  • Multiplayer games
  • IoT applications connected to many devices

Grain activations should be distributed randomly and decentralized. The actor model is not suitable for batch processing or for a centralized design where a few entities have to process most of the requests (so-called hot spots).

Event sourcing

Actors are a good fit for the event sourcing pattern. Orleans supports that pattern through JournaledGrains. But here comes a disappointment: the available storage mechanisms for event log persistence are poor. The only built-in storage provider saves the event log for a given grain as a collection serialized into a single state object, so the whole event log needs to be read before the grain state can be recreated. The other built-in storage saves only a state snapshot without saving the event log. The good thing is that there is a flexible extensibility point allowing you to write your own provider by implementing just 2 methods for reading and writing events. There is also a community contribution available which integrates Orleans with Event Store, but that database is not my favorite. I'm probably complaining too much and should instead contribute by implementing event log storage based on Cassandra or CosmosDB - it does not look like a hard task. But the next topic is much harder: distributed transactions.
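
For illustration, a minimal JournaledGrain might look like the sketch below. The account domain, the event type and the "LogStorage" provider name are assumptions of this example; `RaiseEvent`, `ConfirmEvents` and the `Apply` convention on the state class are the actual JournaledGrain API.

```csharp
using System.Threading.Tasks;
using Orleans;
using Orleans.EventSourcing;
using Orleans.Providers;

// A hypothetical event and the state it gets applied to.
public class DepositedEvent
{
    public decimal Amount { get; set; }
}

public class AccountState
{
    public decimal Balance { get; private set; }

    // JournaledGrain rebuilds the state by applying events one by one.
    public void Apply(DepositedEvent e) => Balance += e.Amount;
}

public interface IAccountGrain : IGrainWithGuidKey
{
    Task Deposit(decimal amount);
    Task<decimal> GetBalance();
}

// "LogStorage" refers to a log-consistency provider configured at silo startup.
[LogConsistencyProvider(ProviderName = "LogStorage")]
public class AccountGrain : JournaledGrain<AccountState, DepositedEvent>, IAccountGrain
{
    public async Task Deposit(decimal amount)
    {
        RaiseEvent(new DepositedEvent { Amount = amount });
        await ConfirmEvents(); // waits until the event is persisted to the log
    }

    public Task<decimal> GetBalance() => Task.FromResult(State.Balance);
}
```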

Distributed transactions

The creators of the Orleans framework did a great job of formally describing its semantics. You can have a look at how they implemented distributed transactions: here

The algorithm is very interesting, but from a practical point of view what I miss is support for transactional communication between JournaledGrains. Again, support for the event sourcing pattern does not seem to have been a top priority in Orleans so far.

If you would like to dive deeper into other theoretical aspects of actor-based architecture, you may be interested in other Microsoft Research materials: https://www.microsoft.com/en-us/research/project/orleans-virtual-actors/

Message delivery

Orleans can give you one of two guarantees:

  • a message will be delivered at most once
  • a message will be delivered at least once

There is no guarantee that a message will be delivered exactly once. We are in a distributed system, and this problem is not easy to solve without sacrificing performance. It is something to be aware of; it's up to you how to introduce fault tolerance.
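
If you go for at-least-once delivery (for example by retrying failed calls yourself), one common mitigation is to make the receiving grain idempotent. A minimal sketch, with a made-up order/payment domain and an in-memory set of processed ids that a real system would have to persist and trim:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Orleans;

public interface IOrderGrain : IGrainWithGuidKey
{
    // The caller attaches a unique id so retries of the same message can be detected.
    Task RegisterPayment(Guid messageId, decimal amount);
}

public class OrderGrain : Grain, IOrderGrain
{
    private readonly HashSet<Guid> _processedMessages = new HashSet<Guid>();
    private decimal _paidTotal;

    public Task RegisterPayment(Guid messageId, decimal amount)
    {
        // A duplicate delivery is simply ignored, so retrying the call is always safe.
        if (_processedMessages.Add(messageId))
        {
            _paidTotal += amount;
        }
        return Task.CompletedTask;
    }
}
```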

Orleans and microservices

You can think of Orleans as a microservices framework where the services are really micro: each grain is a service. You probably cannot go more micro with microservices than in an actor-based architecture. If you are building a microservices-based system, have a look at the Orleans docs and ask yourself an honest question: have you thought about all the problems that Orleans addresses and solves while building your microservices solution? We often take shortcuts through mud and bush because we don't even know there is a better way. Have a look at this presentation for some examples:

https://www.slideshare.net/JontheBeach/distributed-transactions-are-dead-long-live-distributed-transaction

Summary

I'm very grateful to all the contributors who brought Orleans into existence, because it provides a solid foundation for building a well-defined actor-based architecture. Even if this model is not suitable for your needs, Orleans is very educational: a deep dive into its architecture and implementation can broaden your architectural horizons a lot.

On the other hand, in my opinion you have to be prepared to make quite a lot of custom extensions and contributions at the framework level to build a production-class system. There is an interesting initiative called the Microdot framework which adds to Orleans many must-have features for building a real system. But even with Microdot, the ecosystem looks more like academic research than a shiny framework ready to ship to production out of the box. For everyone looking for something more mature with bigger support, I would recommend looking at Azure Service Fabric.

Setting aside production and enterprise readiness, the programming model in Orleans is sweet. The APIs are well designed and the framework offers many extension points to play with. Worth trying before signing up for a cloud solution.

Azure Monitor (aka Application Insights)

· 2 min read

So far I have been using New Relic for monitoring .NET applications in production, and it is a great product. It has everything we could expect from an APM tool: browser and backend code instrumentation, alerts, error stack traces, request performance statistics, CPU, memory and disk usage - even SQL queries sent to the database and Redis query statistics.

Recently I've put some effort into evaluating Azure Monitor as an alternative. I had used it before for basic monitoring of Azure resources, but I had never explored its full capabilities. And those capabilities are enormous!

With New Relic I was using ELK (Elasticsearch, Logstash, Kibana) as a complementary tool to gather custom application-specific logs and metrics. With Azure Monitor I don't see such a need anymore. When hosting applications in Azure, Azure Monitor already covers all the functionality of both New Relic and ELK in one box.

What I like most about Azure Monitor:

  • Integrates seamlessly with the Azure cloud to provide host-level metrics
  • Provides insights into containers running in Azure Kubernetes Service
  • Runs on the powerful Azure Data Explorer engine, which allows you to analyze data in various formats in a consistent way
  • Makes it easy to define custom metrics
  • Supports advanced alerts and automation based on log queries and metrics
  • Easily integrates with .NET Core applications (see the sketch after this list)
  • Rich visualization tools, including the ability to export data to Power BI
  • ... and yes, it provides exception traces, code profiling and web request performance statistics
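
As a rough illustration of the .NET Core integration and a custom metric: the service, event and metric names below are made up for this sketch, while `AddApplicationInsightsTelemetry` and `TelemetryClient` come from the Microsoft.ApplicationInsights.AspNetCore package.

```csharp
using Microsoft.ApplicationInsights;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Wires up request, dependency and exception telemetry automatically.
        services.AddApplicationInsightsTelemetry();
        services.AddControllers();
    }
}

// Anywhere in the app, TelemetryClient can be injected to send custom metrics and events.
public class CheckoutService
{
    private readonly TelemetryClient _telemetry;

    public CheckoutService(TelemetryClient telemetry) => _telemetry = telemetry;

    public void CompleteOrder(decimal total)
    {
        // "OrderValue" and "OrderCompleted" are custom names chosen for this example.
        _telemetry.GetMetric("OrderValue").TrackValue((double)total);
        _telemetry.TrackEvent("OrderCompleted");
    }
}
```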

Apache Ignite as an alternative to Redis cache

· 3 min read

Introduction to Redis

I am quite a big fan of Redis as a distributed in-memory cache. It also works well as session storage.

There is a network penalty for communicating with the Redis service, so - as with talking to a database - you cannot be too chatty. It's much better to ask for multiple keys in a single request at the beginning of your logic to quickly get all the necessary data at hand. Reading values from Redis should still be much quicker than from a database: first of all, it's a simple key-value store, so it's like always reading by primary key; secondly, we benefit from having everything in RAM. It is also possible to run Redis in persistent mode, but that's a different use case, in which you may not use an SQL database at all.
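
For example, with the StackExchange.Redis client you can fetch several keys in one round trip (the "product:{id}" key convention is just an assumption for this sketch):

```csharp
using System.Linq;
using System.Threading.Tasks;
using StackExchange.Redis;

public class ProductCacheReader
{
    private readonly IDatabase _redis;

    public ProductCacheReader(IConnectionMultiplexer connection)
    {
        _redis = connection.GetDatabase();
    }

    // Fetch several keys in a single round trip (MGET) instead of one call per key.
    public async Task<RedisValue[]> GetProductsAsync(params int[] productIds)
    {
        RedisKey[] keys = productIds.Select(id => (RedisKey)$"product:{id}").ToArray();
        return await _redis.StringGetAsync(keys);
    }
}
```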

Cache-aside pattern

RAM is usually limited and cannot store all the data we have - especially since in Redis you will usually introduce quite a lot of redundancy to keep as much as possible under a single key. Limited memory space is easily solved by applying the cache-aside pattern.
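
A minimal cache-aside sketch, assuming a hypothetical `Product` entity and `IProductRepository` for database access:

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;
using StackExchange.Redis;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface IProductRepository
{
    Task<Product> LoadAsync(int id);
}

public class ProductCache
{
    private readonly IDatabase _redis;
    private readonly IProductRepository _repository;

    public ProductCache(IConnectionMultiplexer connection, IProductRepository repository)
    {
        _redis = connection.GetDatabase();
        _repository = repository;
    }

    // Cache-aside: try the cache first, fall back to the database, then populate the cache.
    public async Task<Product> GetProductAsync(int id)
    {
        string key = $"product:{id}";

        RedisValue cached = await _redis.StringGetAsync(key);
        if (cached.HasValue)
        {
            return JsonSerializer.Deserialize<Product>(cached.ToString());
        }

        Product product = await _repository.LoadAsync(id);

        // The expiry lets Redis evict entries that no longer fit into limited RAM.
        await _redis.StringSetAsync(key, JsonSerializer.Serialize(product), TimeSpan.FromMinutes(30));
        return product;
    }
}
```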

Updating data in Redis

A more difficult problem to solve is refreshing data in Redis when something changes. One solution is to let keys expire after a specific time, but your users would not be happy. We all live in a real-time, instantly updated world. Delay by design? It does not look good. So what's left is to remove old data from Redis as soon as it has changed. First you need to identify all the places in your system where a given piece of information is modified. In a big legacy system that may be a challenge. If you are luckier, your system may have a proper event sourcing implementation allowing for easy change detection by just listening to events. OK, so we know that a given entity has changed - which keys should we remove from Redis now? It is handy if your code is able to generate all the Redis keys under which data from a given entity is stored and delete them in a single Redis call. For batch updates you may consider using the scan operation for pattern-matching on keys.
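
As a sketch of that deletion step (the key layout mirrors the hypothetical "product:{id}" convention above and is an assumption of this example):

```csharp
using System.Threading.Tasks;
using StackExchange.Redis;

public class ProductCacheInvalidator
{
    private readonly IDatabase _redis;

    public ProductCacheInvalidator(IConnectionMultiplexer connection)
    {
        _redis = connection.GetDatabase();
    }

    // Generates every key that may hold a redundant copy of the changed entity
    // and removes them all in a single Redis call.
    public Task InvalidateProductAsync(int productId, int categoryId)
    {
        RedisKey[] keys =
        {
            $"product:{productId}",
            $"product:{productId}:details",
            $"category:{categoryId}:products"
        };
        return _redis.KeyDeleteAsync(keys);
    }
}
```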

Updating data in Apache Ignite

Apache Ignite is easier to introduce as a cache layer in a system with an SQL database because it supports SQL and the read-through/write-through pattern. There is an out-of-the-box integration with Entity Framework: https://apacheignite-net.readme.io/docs/entity-framework-second-level-cache Unfortunately, no version for EF Core is available.

Conclusion

If you use EF >= 6.1 and < 7 and would like to introduce a distributed cache, or if you are already fighting stale-cache bugs every week, consider using Apache Ignite.

How to make password reset link more secure?

· One min read

Sensitive data should not be stored in URLs - a lot has been written about that. URLs are logged on many different servers through which the request travels (web servers, SMTP servers, proxies, browser history, etc.), and the sensitive data ends up stored there.

But there are situations where avoiding an access token in the URL is difficult - for example, a password reset link sent by email.

In that case we can add more security by implementing the following pattern:

  1. The action which handles the password reset reads the token from a GET parameter.
  2. The token is validated and stored in the user session or a cookie.
  3. The user is automatically redirected to a password reset action which no longer has the access token in a GET parameter. It could even be the same action.

After the redirect, the access token is no longer present in the URL.
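
A rough ASP.NET Core sketch of that flow. The controller, action, validator and session key names are made up, and session middleware is assumed to be configured; the important part is the redirect that drops the token from the URL:

```csharp
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

public interface IResetTokenValidator
{
    bool IsValid(string token);
}

public class PasswordResetController : Controller
{
    private readonly IResetTokenValidator _tokenValidator;

    public PasswordResetController(IResetTokenValidator tokenValidator) => _tokenValidator = tokenValidator;

    // Steps 1 + 2: the link from the email lands here with ?token=...;
    // the token is validated, moved into the session, and the user is redirected away.
    [HttpGet]
    public IActionResult Start(string token)
    {
        if (!_tokenValidator.IsValid(token))
        {
            return BadRequest();
        }

        HttpContext.Session.SetString("password-reset-token", token);
        return RedirectToAction(nameof(Form)); // step 3: the new URL no longer contains the token
    }

    // The actual reset form works only with the token stored server-side.
    [HttpGet]
    public IActionResult Form()
    {
        string token = HttpContext.Session.GetString("password-reset-token");
        return token == null ? (IActionResult)BadRequest() : View();
    }
}
```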

Note that if we have any external link on the password reset page (e.g. to social media), all GET parameters would also be accessible to those 3rd-party servers via the HTTP Referer request header after the user follows the link.

Also remember to add an expiration time to password reset links and make them single-use.

How to blog with a baby?

· One min read

When you are taking care of a toddler who is just trying to walk, you need to write fast or your articles need to be short. Thank you.

The cost of transparent recruitment process

· 2 min read

Transparency in organizations has tremendous benefits, but it doesn't come for free. Let's take the recruitment process as an example.

If there are only 2 autonomous decision makers, the process is simple. They screen the CV, meet the candidate, have a conversation, check the tasks the candidate was assigned, maybe have a quick follow-up, and that's it. Usually this is enough to make a decision. Feedback can even be given to the candidate immediately if the recruiters are experienced. With 1 decision maker it is even simpler, but it's usually good to have at least one more opinion.

But what if we would like to make that process transparent - scoring the candidates by measuring them somehow, to show where the decision comes from? It may be easy to measure an A/B/C test score, financial expectations and years of experience, but not everything is so quantitative. Measuring personality, the value of the candidate's experience for the company, attitude, personal growth potential or some creative task may be tricky. It may require creating a complicated recruitment process and metrics to justify the final decision transparently.

What are your opinions? Is it worth investing in transparency in this case? Or is it better to trust the decision makers and just keep informal notes about the candidates? Have you ever wondered what the proper balance should be?

Good and bad technical debt

· 5 min read

Albert Einstein said that everything should be made as simple as possible, but not simpler. Simplicity is the holy grail of IT, and of business too. It takes smart thinking, and often experience, to make complex things simple. But what about the opposite situation, when things are simpler than they should be? In that case we are creating technical debt.

In finance, debt is not always bad. When debt contributes to generating higher income and the cost of interest is lower than that income, the debt is healthy. For example, when a company takes out a loan to modernize its equipment so it can be more competitive and generate bigger revenue, we consider that loan a good investment. Of course there is always risk involved, and we usually cannot be sure about the future revenue, but that's another story.

How about IT projects? Can technical debt also be good?

Examples of good technical debt

Debt is a shortcut to generate revenue quicker, and IT projects are no different. If a quick and dirty implementation is enough to acquire a customer who will bring income, the debt is acceptable. It is better to have a quick and dirty implementation that generates revenue than a missed deadline and a lost deal. However, the income should be high enough to pay off the debt later. Introducing big technical debt for little income is probably not a good move.

An example of good debt may be building a prototype. A prototype is something we should be able to afford to throw away when it turns out that the idea is not worth continuing, or that there is a better approach. Prototypes are great for demos and idea testing. However, when the prototype is approved, we should keep in mind that it will usually require significant refactoring, or even rebuilding from scratch, before it becomes the final product. In other words, the debt will need to be paid back. Good software development practices may help to reuse a significant part of the prototype in the final product.

Another example of good debt may be hardcoding some logic. Hardcoding is always a concern for software developers because it means the solution is not flexible and may need more work in the future to introduce changes. But over-engineering is as bad as over-investing. When we don't need flexibility, let's not introduce it just in case. Postpone any work as long as possible. It may require some rework in the future, but we may also have a bigger budget in the future. The paradox here is that what is debt for the technical team may be savings for the business team. To avoid technical debt in the codebase, the business may need to take on real debt to cover development costs. Usually it is better to have debt in the codebase rather than in the bank account: a good software architecture makes technical debt much easier to pay back than real money owed to a lender.

Debt management

Technical debt, like any other debt, has to be manageable. Stakeholders and the team need to be aware of where the debt exists, how big it is and what the interest rate is. The interest on technical debt is paid through lower productivity: teams spend time investigating how the system really works, fixing bugs, doing manual configuration and manual testing. The more time those activities need, the higher the interest rate. It is really bad when the cost of this time is not covered by the revenue the product generates. But even when that cost is covered, companies should pay back technical debt systematically to keep a level of productivity that allows them to stay competitive.

How to manage technical debt?

1. Measuring how much time is spent on system maintenance.

Management teams are often not aware of how much time is spent in this area and therefore don't have the numbers to understand how much it costs to pay the interest on technical debt.

2. Using tools to measure code quality and automated test coverage.

Such tools are usually easy to integrate with the development pipeline and help to identify technical debt. They provide deep insights into the codebase, but they don't tell you whether the identified debt is good or bad.

3. Mapping reported issues to specific areas of the codebase.

This helps to identify which parts of the system generate the highest interest on technical debt. Some parts of the system may have poor code quality but cause no maintenance issues - that is like having a 0% interest loan. Other parts may have great code quality and 100% test coverage but generate a lot of issues because of wrong logic, missing requirements or lots of manual actions involved. It shows that interest is paid not only on a poor codebase, but also on poor requirements specification and business analysis. Usually they go together: when requirements are messy, the source code becomes messy too.

Summary

What are your opinions? Do you agree that technical debt can be good? Can you give more examples of good technical debt and ways to manage it?