Designing a real-time platform in a microservices ecosystem

Microservices here, microservices there… There is a lot of knowledge sharing and best practices out there.

In this post, I will share my real-world experience using microservices under real-time requirements, which add significant challenges to a distributed architecture.

What would be the definition of “real-time”? I consider real-time to be a request/response round trip that takes no longer than 50ms. Indeed, that’s not true real-time; it’s closer to “near real time.” But it still has to be fast. Add to this a million requests per second. Now we are talking, eh?

The challenges we faced: 1. Low latency 2. High scale. When you combine those two requirements you understand that you are in trouble. Why? Because most solutions out there (queue providers, frameworks, databases, etc.) focus on either low-latency response or high scale. Having both together creates more challenges. I will add one more requirement to this equation: 3. Persistence. Every transaction must be tracked and handled; whether it succeeds or fails, it cannot be lost.

Let’s assume you have an API gateway service that receives HTTP requests. Those requests are processed and communicated internally between multiple services. Once the “magic” is done, a response must be returned to the caller as soon as possible.

First challenge -> Being fast. The system must respond ASAP. Think about it: you get a request and the response must come back within dozens of milliseconds, or else you’ll be timed out. It feels like someone is constantly chasing you with a knife and never lets go. Our solution? We separate the flow in the system into two flows: a real-time flow and an offline flow (detailed later on). The real-time flow is designed to be fast, using non-blocking frameworks, databases like Redis, and an optimized queuing system to dispatch requests as soon as possible. Additionally, we leverage asynchronous frameworks so that nothing in the flow blocks or delays processing. When we detect a blocking or delaying step, we act immediately to release the bottleneck or redesign it.

Marking messages: Every time we process a request, each service “marks” for itself that the message was processed. This gives us idempotence capabilities (usages are detailed later on).
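To make the marking idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not our production code: the `ProcessedMarker` class, the `handle` function, and the in-memory dictionary are all hypothetical names, and the dictionary stands in for a shared store. With Redis, the same "mark only if new" step could be done with `SET key value NX EX ttl`.

```python
import time


class ProcessedMarker:
    """Remembers which message ids a service has already handled.

    Hypothetical in-memory stand-in for a shared store such as Redis.
    """

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen = {}  # message id -> time at which the mark expires

    def mark_if_new(self, message_id):
        """Return True the first time an id is seen, False while the mark is live."""
        now = self._clock()
        expiry = self._seen.get(message_id)
        if expiry is not None and expiry > now:
            return False
        self._seen[message_id] = now + self._ttl
        return True


def handle(message, marker, process):
    """Run `process` at most once per message id.

    The mark is written before processing, so a crash in `process` leaves the
    message marked; a real service would need to decide how to handle that.
    """
    if not marker.mark_if_new(message["id"]):
        return "skipped"
    process(message)
    return "processed"
```

Delivering the same message twice then results in one call to the business logic and one skip, which is what makes retries and redeliveries safe.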

We measure our real-time flow all the time. We have a CI/CD system, which makes our life easier, that deploys versions to a separate LTE environment to spot latency bottlenecks. We measure constantly.

We make business decisions together with our customers to determine what should be on the real-time flow and what shouldn’t. For example, if a transaction needs to be cancelled (for any reason), does the customer want it cancelled right away, or is it OK if we let them know after processing the events, via offline reports? Business decisions can help a lot to minimize complexity, so always listen to your project’s demands. Ref to other post in which I discuss this topic:

Second challenge -> High Scale. We designed each of our services to scale from day one by keeping them stateless and recoverable. We created a backoff mechanism (details are out of this post’s scope) that tolerates high peaks. When a service starts to feel it is in “trouble,” it slows down its event consumption in order not to die. Once a max threshold is reached, an alert is raised and an auto-scaling process creates a new instance right away. Once the backoff mechanism feels things are “back to normal,” it switches the service back to consuming messages normally.
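The shape of such a backoff can be sketched in a few lines of Python. This is only an illustration under assumptions: the `BackoffController` name, the thresholds, and the choice of "events currently in flight" as the trouble signal are all hypothetical, and a real mechanism might look at consumer lag, memory, or latency instead.

```python
class BackoffController:
    """Decides how many events a consumer should pull next.

    Hypothetical sketch: below `soft_limit` the service consumes normally,
    between the limits it shrinks its batch size linearly, and at
    `hard_limit` it consumes the minimum and asks for a scale-out.
    """

    def __init__(self, soft_limit=100, hard_limit=500, normal_batch=50, min_batch=1):
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.normal_batch = normal_batch
        self.min_batch = min_batch

    def next_batch_size(self, in_flight):
        """Return the batch size to request, given the work already in flight."""
        if in_flight <= self.soft_limit:
            return self.normal_batch  # back to normal
        if in_flight >= self.hard_limit:
            return self.min_batch  # in trouble: consume as little as possible
        headroom = (self.hard_limit - in_flight) / (self.hard_limit - self.soft_limit)
        return max(self.min_batch, int(self.normal_batch * headroom))

    def should_scale_out(self, in_flight):
        """True once the max threshold is reached and a new instance is needed."""
        return in_flight >= self.hard_limit
```

A consumer loop would call `next_batch_size` before every poll and raise an alert or trigger auto-scaling when `should_scale_out` turns true.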

Third challenge -> Persistence. I wrote previously about our real-time flow; now I will detail the offline one. The offline flow is more “relaxed.” It is responsible for persisting the data into different data sources to ensure each transaction is persisted. Relaxed, from our perspective, means that messages can be consumed slowly, in bigger batches, since no timeout is “chasing them.” We tuned our Kafka topics to be less “real-time” on that flow to make the most of our resources. We use different data sources to denormalize our data for report and statistics queries. Additionally, the offline flow persists each transaction’s data to allow future cancellations and status checks. The separation of those flows gives us more flexibility to handle events according to their specific requirements.

Inter-communication. Our services are based on a choreography architecture, which enables each service to be an event subscriber. Events flow through the system and each service consumes the events targeted at it. This enables us to easily maintain old and new business logic, and to make easy modifications by versioning the topics and the events themselves. I am a fan of small, decoupled services with a single domain concern. I always encourage splitting and merging services as the business evolves. I believe architecture is something that grows with the business and must be constantly maintained. This gives us a flexible architecture that speeds things up, as there are no ‘blocking’ god services around.

We have an event expiration mechanism. There is no use in processing an event that is “too old” and is a candidate for being timed out. If an event is considered timed out, it is forwarded to error-processing logic.
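A minimal Python sketch of the expiration check, assuming each event carries a `created_at` timestamp in seconds. The field name, the `dispatch` function, and the two callbacks are hypothetical names chosen for the illustration.

```python
import time


def is_expired(event, max_age_seconds, now=None):
    """True when the event is older than the allowed age."""
    now = time.time() if now is None else now
    return now - event["created_at"] > max_age_seconds


def dispatch(event, max_age_seconds, process, on_expired, now=None):
    """Process fresh events; forward expired ones to the error-handling path.

    Returns True if the event was processed, False if it was expired.
    """
    if is_expired(event, max_age_seconds, now):
        on_expired(event)
        return False
    process(event)
    return True
```

The check runs before any business logic, so an event that has already missed its deadline costs almost nothing.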

Supervising transactions. Since each of our transactions must be persisted, we need to supervise them (while still ensuring we don’t add latency). In the real world events can get lost. Why? A consumer does not function as it is supposed to, the queue provider is down or lagging, a machine dies, etc. That’s why we must supervise each transaction using an external component, while keeping the architecture flexible enough that this component does not become a bottleneck. How do we do that? Every time a new request comes in, we send a “copy” to our almighty Supervisor Service. The supervisor listens to transaction “Start” and “End” events. If a Start was consumed but never followed by an End, the supervisor’s tracker logic recognizes the scenario and notifies the system: “Hey, transaction XYZ never ended and will be considered a candidate for a timed-out transaction.” Then we take an action -> cancel or retry, depending on the scenario.
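The Start/End tracking can be sketched as follows. This is a simplified Python illustration under assumptions: the `Supervisor` class and its method names are hypothetical, state is kept in memory, and out-of-order delivery (an End arriving before its Start) is not handled.

```python
class Supervisor:
    """Tracks transactions that have started but not yet ended."""

    def __init__(self, timeout_seconds):
        self._timeout = timeout_seconds
        self._open = {}  # transaction id -> time the Start event was seen

    def on_start(self, tx_id, at):
        """Record a Start event observed at time `at` (seconds)."""
        self._open[tx_id] = at

    def on_end(self, tx_id):
        """Record an End event; the transaction no longer needs supervision."""
        self._open.pop(tx_id, None)

    def sweep(self, now):
        """Return, and stop tracking, transactions that never ended in time."""
        timed_out = [
            tx_id
            for tx_id, started in self._open.items()
            if now - started > self._timeout
        ]
        for tx_id in timed_out:
            del self._open[tx_id]
        return timed_out
```

A periodic job would call `sweep` and hand each returned id to the cancel-or-retry decision. Because the supervisor only receives copies of events, it sits outside the request path and adds no latency to it.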

Retrying transactions. Retrying is a complicated topic. Things you need to consider: When do you stop retrying? When is it worth retrying? How do you distinguish between “retry-able” events and those that are not? What happens if we retry an already completed state (idempotency)? Now add the technology barrier on top of this, and it is a challenge. We created libraries that enable each service to retry by itself. We pushed this library into our infrastructure, which lets each service focus on its business logic and have this capability out of the box. We believe in the method: “Each service is on its own.” Meaning: I care for myself, no one else needs to. A service understands when it should retry, how many times, and when to give up. Additionally, a service knows how to roll back to its previous state, but then what does that mean with respect to the other services and the transaction? This is a broad topic which I will detail in a separate post.
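As a rough illustration of what such a retry helper has to decide, here is a small Python sketch. It is not our library: `RetryableError`, `GiveUp`, and `with_retry` are hypothetical names, and the exponential delay is just one common choice.

```python
import time


class RetryableError(Exception):
    """A failure that may succeed if attempted again (for example a timeout)."""


class GiveUp(Exception):
    """Raised when every allowed attempt has failed."""


def with_retry(operation, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call `operation`, retrying only on RetryableError.

    Any other exception is treated as non-retryable and propagates at once.
    The delay doubles after each failed attempt.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetryableError as exc:
            if attempt == max_attempts:
                raise GiveUp(f"gave up after {attempt} attempts") from exc
            sleep(base_delay * 2 ** (attempt - 1))
```

The split between retryable and non-retryable failures is the important design choice here: a timeout is worth another attempt, a malformed event is not. Retrying is only safe when the operation is idempotent, which is where message marking comes in.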

So we talked about real-time platform challenges and a way to solve them. I also added more tips on how to treat failures and challenging scenarios within a microservice ecosystem. Each topic is broad by itself, but I hope this gives you a jumpstart when you face these challenges. I will be happy to answer questions in the comments.

The next posts will demonstrate in detail how we wrote our retry libraries and the backoff mechanism, leveraging our consumer methodology to keep things fast and tolerant to failures.



Your Project Rolls out

Rolling out is a topic I have always wanted to talk about, especially rolling out the first project of a company. That said, it makes no difference whether you’re a startup or not. When it comes to rolling out a project, you need to plan it carefully and make sure you keep pace with your time-to-market deadline.


The timeline should be the first thing you keep in mind on the way to the rollout. It will give you an indication of what should be done and how effectively you are going to use your resources to get there on time.

There is no second chance if you are late for the rollout deadline. If you’re not focused, you won’t get close to the finish line. When your business depends on it (e.g. your first customer), it could be deadly for your company’s future.

Finalize your rollout features

Before rolling out, make sure all the previously agreed-upon product features are settled. Make sure all stakeholders know the content of the finalized version before the rollout. That will enable you to estimate your rollout plan. This isn’t the time to start changing things, unless they are real rollout breakers.

Is it necessary for the rollout?

Being a perfectionist before your rollout can play against you. Pay attention and avoid these thoughts right before going live: “We can optimize this logic,” “We can squeeze in that feature,” “Let’s switch to that technology,” etc.

You must avoid this by all means. Always ask yourself, “Is what I am doing necessary for the rollout?” New technology, or any new feature that isn’t necessary, should be backlogged and definitely shouldn’t be included in the rollout tasks. You must keep things simple and targeted. During the rollout period, every discussion with my team regarding a feature or new implementation comes back to this question over and over again: “Is it necessary for rollout?” The answer always puts us back on track. If it is not necessary -> backlog!

Simulate day one

It’s important to understand that despite all your planning, after going live you’ll be exposed to new surprises. Mocking real users’ behavior is almost impossible. Things that we haven’t thought about or didn’t expect have a tendency to arise in production. Therefore, you must simulate real usage with your target audience right before going live. Force your team, management, and anyone else in the company to use the app/service as much as possible before you go live. You’ll be surprised by the new things you’ll find: no connectivity, half-baked requests, weird UI behaviour, and so on. Make sure you are semi-live some time before the first users enter, to catch issues as early as possible.

Production support plan

So you are a new startup, or a new big project within a stable company. Your baby is coming alive and you need to support it. If something goes wrong (e.g. production incidents), who takes responsibility? How will you support real-time issues? If your project is based in a big company this will probably be easier, as big companies already have a NOC team or other production support tiers.

So what happens with new startups??

As always, when I have to do something new and am not sure how to tackle it, I pick the “lean” way. Start small and improve over time and by demand.

Let’s assume your team is made up of 5-8 people; you’ll need to take shifts. (I am already assuming that you have some monitoring and alerting system. If not, that’s out of the scope of this post.) Weekly shifts could be a good start. Each person takes their turn to be the one who handles incidents after working hours. The best way to manage this is to use 3rd party services like PagerDuty, OpsGenie, etc. You have no time to build this type of management system, so better pay a few bucks instead. Make sure you fine-tune your incident alerts to be effective and minimize false alarms.

Incident day simulation

What I like to do before rollout is to plan an “incident day” with the team. On that day we mess with the system in order to practice real incident situations. Team members are selected randomly and try to fix the issue. For example, we take the DB down, which raises incidents. The team members practice how to investigate and react to these issues. That gives them a ‘dry run’ experience in preparation for real incidents.

Production is a holy place

I am sure that before your rollout you used production for other matters (testing, demos, etc.). Not any more! Production is now dedicated to real users only. No more tests or experiments on it. If you have customers testing your system, move them to another env (a demo env). Make sure you clean up everything, including old users, and start fresh. Additionally, make sure production has its own infrastructure and its own resources. Don’t share it with other envs; it will bite you. For example, if you are using the ELK stack, create separate instances just for prod. The last thing you need is logging issues because someone on another env overloaded your logging system. I would do the same for any other infra (queues, storage, etc.).

Rolling out is a stressful stage, but it can also be a lot of fun, so remember to enjoy 🙂


Bootstrap your startup – Part 2 (“Get your engines started”)

In my previous post I discussed preparing and getting ready for your “first day.” Today, we are going to get our hands dirty and move towards basic infra and the team planning that will continue during the rollout phase and beyond. In this stage I assume you already have a solid MVP spec and already have an idea about how you are going to structure your tight-knit, dynamic, hands-on team.

1. Know your enemy
Every startup has an “enemy.” The enemy can change its form. It could be new technology that doesn’t exist yet, really tough integration points, a very challenging algorithm which will take you months to nail down on top of making it scalable, or it could look like a lot of other things. Once you are able to identify your specific enemy you are moving in the right direction. Your design and architecture are going to be planned around it. That will help you minimize surprises. You had better understand it early, as your architecture and time estimation depend on it.

2. Architecture, Architecture

Architecture is the first step towards your shiny system. I won’t speak about how to do it in this post, but I can point out who you should share it with: practically everyone! Once you finish drawing it, put everyone in one room for a few hours and start discussing it. It doesn’t matter if your crowd doesn’t have a technology background, because architecture isn’t about technology; it’s about design and flow. Talking technology and patterns at this step is all wrong. Once you talk about your design, you’ll be surprised by the feedback. Even better, you will realize new things you didn’t think about. Get ready for tough times, as those discussions are intense and sometimes get aggressive, but that’s the fun part! Share it with colleagues, friends, and stakeholders. Your company needs to know what your overall plan is going to look like.

3. Budget planning

So you got your seed, cool! Now let’s spend some cash! Who wants a new chair? Who wants a big office? Not me! Remember this: your money is running out. You woke up this morning and your money is running out. You went to lunch, your money is running out. I’m not trying to scare you or make you paranoid, but there is an illusion that if you raise $1M you have all the time in the world. You don’t. The money is running out and you’re not a profitable company (not yet, anyway). Be aware of it. Now, let’s be practical about how to be aware. First, know your main expense, which is your development team and… heh, that’s practically 80% of your overall budget. Lawyers, graphics, office expenses, and Nespresso cups won’t affect your budget as much as your development team. Let’s drill down into how the development team will consume your cash:

a. Developers
Your team will consume salaries. We’re talking about the monthly salary of a developer. It could be backend, DevOps, frontend, or fullstack; it won’t be cheap. I’ll keep this content for a dedicated post, where I could also cover offshore and outsourced recruits. Anyhow, developers are the building blocks of your company and without them you won’t be able to do much. Keep in mind that most of the budget is going to be spent on developers, and it should be. They are going to set the company DNA and its initial success.

b. Third party services
I assume you are going to use cloud services, at least for dev purposes. AWS? Google? Doesn’t matter. They are going to consume the most. It looks cheap, it feels cheap, but hell no, once you eat with them you are busted. The best you can do is monitor it. You upgraded your machines from small to medium? Pay attention to that. You are working with microservices and you think you can get away with it because everything is dockerized? Think again. AWS does not charge you by machine any more, they charge you by traffic and time. So it doesn’t really matter how you spread your services. Adopt a new habit: once a week, look at your monthly payment prediction (very easy with AWS) and at where your payments are going. You will be very surprised to find unused or oversized machines, unused load balancers, and VPCs that nobody uses. You should get rid of it all. Don’t think about production yet. Use the smallest machines; once you get to production I assume you will have a client in sight. That is when you should consider upgrading your overall system. Until this happens, be humble and keep your spending under control.


4. Development flow planning

There is a bad habit among startups of working in ‘Jungle-Mode’. I don’t understand this. At the end of the day it will bite you and it’s going to hurt really bad. I believe that when things get rough you should have a System! Yes, a System! One that everyone follows. A system that will help you bring order in times of chaos.
I am a big fan of flows. Flows have starting and ending points. Assuming you have 3-4 developers on your team, you should create a Release Scope Routine. Call it agile, call it scoping, call it whatever you want, just make sure you have one. I personally like short sprints and fast deliveries.
Here is an example from my work:
My personal tool for small teams (3-20 members) is Trello. It’s simple, you don’t need to configure much, and it’s very effective once you get a good grip on it. You can read a previous post of mine on how I use Trello.
All tasks should take 4 hours. Everyone should be aware that they need to deliver by the end of the sprint. If a task takes longer, it should raise a ‘red flag’. You have two options: split the task into smaller ones or add another pair of eyes. Don’t wait for the end of the race; if you won’t be able to make it to the finish line, announce it ASAP. I always instruct my teams: “Don’t wait for the last minute.” Last minute means pressure and chaos. If we have enough lead time we can re-plan the sprint or change the priorities to help each individual succeed. At the end of each sprint we are ready to plan the next one. Remember, short deliveries and small wins are the key to showing results. The best part is the feedback: if you deliver in small parts you get feedback sooner and better insights into your work.

5. Infra

Infra is a very wide topic. So instead of talking about how to do it, I’ll speak about what you should focus on, especially for new startups or teams. When I say infra I am not referring to the technology you are going to use or the programming language. I’m talking about the sandbox your team is going to work in. This sandbox should be planned as soon as possible. Believe me, once it’s set up you don’t touch it. In many cases this sandbox stays with you for a very long time, and it works well because of its simplicity. Let’s get down to the details: as I mentioned in the previous section (Development flow planning), I am a fan of fast deliveries and of helping your team get there.
Your team is writing code that needs to be deployed somewhere (dev env, stage env, etc.). Create CI/CD as soon as possible. In short, automate your team’s development process. Automation could be very wide, but let’s stay focused. I don’t want to automate everything; I’m talking about basic infra.

Flow: A developer writes code and pushes it to the repository; under the hood the build is compiled, tested, and deployed. That’s it. Start from there and you already have 70% of your automation. Keep it simple and clean, and work with your favorite tools (I personally like Jenkins). Once the version reaches the dev env it’s already on air.
The second point of infra I want to mention is alerts. It’s important to alert and update your team on specific events. Alerting the development team is essential if the build process fails. Updating the development team is essential for new deployments. You can use Slack, email, whatever, just make it happen. Everyone should know and be aware of what’s going on in dev and share the same methodology. Don’t be aggressive with automation. Remember, you need basic things that won’t block your team while you continue towards your target. There is a great Russian proverb I like: “Don’t try to get into hell ahead of [your] father”. First things first: fulfill your MVP. Once you get customers, that’s a new story.

6. Recruiting

Recruiting is one of the stages you want to get over with, but it never ends. It’s a long process that always consumes effort. You need your spartans with you to win. Don’t compromise. This process is challenging. But again, if you stick to a system and have pre-ordered steps you can make things run effectively. Spend 1 hour a day gathering candidates. They are the building blocks of your future teams. The DNA of your company is determined by the first 2-7 people.
This is my recruiting flow:
Phone Interview, Personal interview, Architecture overview, Hands-on code exercise, Second interview with another colleague from your company and finally Contract proposal.
Before you start interviewing set your expectations. If you are looking for a Junior Developer know what skills you are after in your candidate. Things like growth potential or personal intelligence. If it’s Senior you should look for more experience.
A few words about working remotely. It depends on your current startup status and your current team size. If you have already recruited your core team you can start looking for remote developers offshore. Personally, in the beginning, I like my teams close to me physically. There is nothing more effective than discussing work matters in person, especially when your project has just started. When new team members are working remotely it is challenging to create the necessary personal connections and communication. Perhaps in future posts I will discuss how to manage remote developers and techniques for keeping them effective.
In my next post I will speak about which technologies you should keep in-house and which you should take from 3rd party providers. I will also talk about the importance of keeping your company’s product people in the loop, how to make your teams independent, and how to start planning your rollout towards your upcoming customer.

Bootstrap your startup – Part 1 (The “First Day”)

During the earliest stages of a business, a “startup” business, do you ever wonder what it would take from you as the initial technological authority in the company?
It’s always exciting to be a part of something new, especially when you start from scratch. It is an amazing feeling.
You are responsible for the care of a new baby: your company. You are the first who will pave the path for the journey. You set the DNA, the processes and procedures, and the culture. Unfortunately, it’s not going to be easy. It comes with a price: responsibility. The company now relies on you and its destiny depends on you. You need to understand that every step you choose along the way could lead the company in a different direction.
I had the opportunity to be the ‘First One’. I learned from my mistakes and successes. The bad news is that every time you start over it doesn’t get easier. The good news is, it’s exciting and challenging as this process begins again.
So let’s try to imagine your first day: you wake up in the morning, step into the office, warehouse, or wherever your “base” is, and need to give birth to your product and turn it into something real. You make yourself a cup of coffee (if you are lucky enough to have some), sit in front of your laptop and… Now what??
So many things to do, so many plans, ideas, directions. Where to start? Infra? Product- spec? Teams? Servers? To cloud or not to cloud? CI/CD now or later? Services? Microservices? Recruiters? Where? Linkedin? Private?  Maybe Outsource? Now? Later?  Which teams? Local team? Offshore team? How to structure? Documents Now? Later? Office? To rent now? Later? VPN? Technology? AWS? Google cloud? which Gear? .@#$#@@@@#@^&&$$
First – Relax! Will you?
I will create a series of posts, starting with this one, which will provide information and guidance on how to deal with your upcoming “baby” and will share my experience of navigating different aspects of technology, infrastructure, architecture, and management.
You must keep this phrase in mind:
Eat or be Eaten!
Prepare, prepare, prepare. If you step into the office on your first day without any preparation, you have already started all wrong. So you won’t get paid for the period before your contract is signed, or perhaps your contract hasn’t even started, but hey, you chose a startup. This is your business and you will probably get some equity. Start to think this way or leave now, because, wake up and smell the coffee, startups are not for you. You need to be willing to give a lot. That’s the game.
So what do you need to start preparing?
First day preparation points:
1. Business requirements
First, make sure that you understand the product. You don’t really have any product during this stage but, you do have some business clues about where the company is heading. (I’m skipping all the company idea and seed steps as it’s out of this post’s scope).
Make sure you understand the business requirements that you need to fill. If it’s a new industry you aren’t familiar with, take some time to learn it. Search the web for related content and ask your stakeholders for meetings to make sure you understand the environment.
Understand the product integration points. Know if the product is SaaS or PaaS. Perhaps you need to expose an API and/or integrate with future customers. All those factors will influence your timeline and your MVP definitions (upcoming section).
2. MVP
Without MVP definitions you are a dead man, seriously. The MVP spec is the light at the end of the tunnel. It’s the set of definitions which you will need to meet in the short term. I assume you have set your development budget. Usually after seed (not first round) you are expected to meet some MVP requirements before going out to raise another round of funding. Make sure everybody agrees on the spec and that expectations are set. You must keep it flexible and define the baseline spec.
Focus, focus, focus! It’s very easy to get lost with scope. If you haven’t set a baseline, you (or your product people) haven’t finalized the targets yet. Iterate on that process until it’s polished. It’s your responsibility to demand a nice and shiny MVP draft to work on.
3. Team structure planning
Plan your team structure to fulfill your MVP. If you went through my earlier points, at this stage you have good insights about your product and its business requirements.
Now you have a clue on how to structure your team. If your product is web application based you need a web development guy or perhaps a full-stacker for the UI and backend. If your product concentrates on backend logic, start recruiting a backend guy and later add devops.
Pick an easy programming language. Don’t try to be too clever or the coolest guy at this stage. You need to move, and move fast. Start with the language that you feel comfortable using. Keep the design flexible; it will be easy to change or add new languages in time (who said microservices architecture? :))
Team structure examples:
Backend project: You + backend + Devops/IT(only towards productions)
Web application project: You + Frontend/Fullstacker
Start with a small crew and recruit based on your needs and the company progress.
*Sometimes it’s worth it to outsource some parts of your team at the very beginning. It really depends on your MVP requirements. It’s different from one project to another (outsourcing – when? who? – is out of the scope for this post)
4. Everybody codes
Make sure everyone in the office that needs to code is coding. Including yourself. Each part of the current company must code or contribute to the dev team. Assign tasks to everyone (Programming, Testing, Documentation, etc..). Otherwise, it’s a waste of resources.
Now that we have things ready you can wake up, drink your coffee, and be ready to start your first day!
In future posts  I’ll talk about preparing development infrastructure, keeping an eye on your dev budget, defining the development process to move faster and giving you a few tips on recruiting your next employees.

How I am using Trello to keep dev effectively on track

Is your team distributed? Do you have short delivery times? Are you a startup, or do you have a sub-project of 1-30 people? Here is how I hacked Trello to be my “Project-Saver”.
Let’s take an average team structure: Client, Backend, Product
Basic project requirements for success:

1. Simple progress structure

2. Tasks Assigning

3. Time coordination

4. Keep everyone in loop

5. Minimum bureaucratic hassle

I have tried different tools on different projects. I came to the conclusion that, at the end of the day, a Kanban board structure is a very simple and efficient way to boost the productivity of small to mid-size dev teams.
However you still need to do several ‘hacks’ for your Kanban in order to squeeze good productivity results.
I will demonstrate each of the steps above by adding an issue and how I hack the solution.
Why Trello?
“Keep it simple” is my golden rule, and that’s what I liked in Trello. It provides outstanding simplicity.
This is not an overview of Kanban tools, so I’ll just jump right to my favorite tool and demonstrate how I hit my targets using it.
1.  Simple progress structure:

Since we are choosing the Kanban way, it’s very easy to divide your board into 3-4 columns (don’t add more than that or it might get out of control).

[Screenshot: Trello board columns]

Backlog is my “To-Do” list, but don’t abuse it. Make sure you are actually planning to do those items. Maintain it and keep it thin. The rest is pretty trivial.

Issue: Our team structure contains 3+ small teams. Keeping the tasks of all teams on the same board will quickly overload it and mix things up, and focus will be lost.

Hack: Create a separate board for each team (e.g. Client, Backend, Devops, Product). In Trello you can share cards between boards (for cross-team assignments).

This way you’ll be able to focus each team on its dedicated board, which contains only the relevant items. Additionally, switching between boards in Trello is very easy and will enable you to keep the team progress view very simple.

2. Tasks Assigning:

Each team member must be assigned to one task at a time. Make sure you never forget that. You achieve two things:

1. Each team member is focused and knows exactly what to do.

2. It’s very easy to understand, from a bird’s eye view, what your colleagues are doing. Knowing when you cross paths with someone else’s task boosts productivity (gathering requirements) and self-awareness.

Issue: What if more than one person is assigned to do the same task?

Hack: Split the item into two or more (one per team member) and detail each item with its relevant sub-tasks. In Trello you can define subtasks using checklists:

[Screenshot: Trello card checklist]

3. Time coordination:

This one is very important. Make sure every task has an ETA. Don’t leave tasks hanging without a time estimate. It will keep you and your team focused. This one is a bit tricky, since you need to come up with ETAs quickly. Over time the team will adapt and get used to it.

Issue: What if we can’t estimate the ETA quickly enough? (Shall we create an item to define the item’s ETA? :) )

Hack: If you are struggling to put an ETA on a task, the task is probably too wide to estimate. Treat that as a red alert: the whole point is to assign small tasks. If you or a team member cannot give an ETA quickly, rethink the task and probably redefine or split it.

4. Keep everyone in the loop:

If you follow the previous steps you get this one “on the house”. Since all members are assigned to tasks with ETAs, it’s very easy to check each other’s status and understand where the current release stands.

Issue: What if two members are not on the same board? It would be annoying to switch between boards.

Hack: Short dailies with Trello in front of you will keep everyone updated. Since each team member is assigned to one task with an ETA, it’s very easy to give a quick brief of everyone’s status.

5. Minimum bureaucratic hassle:

That’s the main point. As a development team we like to get things done by going straight to work (probably code). However, we still need to maintain a minimal “progress bar”.

Trello lets you quickly update your status and start working. Discipline is still required, but we surely minimize the hassle around it.

Happy productivity,

Grab yourself a Graph

Here is a story about graph databases and the motivation for choosing one. Firstly, I think it’s very important to understand whether you need a graph datasource at all. It may not be right for you, and of course you shouldn’t go there just because you think it’s cool: a graph has its cost. Graphs are common when your business requirements involve social relations, connections, dependencies between hops (covered later), or transitive relations.
My startup’s major requirement was the ability to connect entities using relations, with properties on those relations. Assume we want to create a followers data model in which, if user X follows user Y and user Y follows user Z, then X follows Z as well. Let’s start simple.

Relational:

We are all aware of the standard solutions for “connections between entities” in a relational datasource (One-To-Many, Many-To-One, etc.). If you go relational, you have to start thinking about a JOIN table that holds the foreign keys of both participating tables, which further increases the cost of join operations:

> SELECT * FROM followers;
| user_id     | follower_id  |
| 1           | 2            |
| 1           | 9            |
| 4           | 4            |
| 9           | 5            |

Now, let’s take this business case: is user 1 following user 5?

First we need to find all followers of user 1 (X) and then check whether that list (Y) contains user 5.

After that we need to iterate over all the followers of the followers of user 1 (Z) and check whether any of them happens to be user 5.

What happens when we double the number of records? An index won’t save you, dude: you still have to find the values in the index tree for every hop.
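To make the cost concrete, here is a minimal Java sketch of the lookup described above, run against the four sample rows. The class and method names are made up for illustration, and the in-memory array only stands in for the table; in SQL each extra hop would be one more self-join on followers.

```java
import java.util.ArrayList;
import java.util.List;

final class FollowersTableSketch {
    /** The sample rows of the followers table: {user_id, follower_id}. */
    static final int[][] ROWS = { {1, 2}, {1, 9}, {4, 4}, {9, 5} };

    /** One pass over the table (or one index lookup) per call. */
    static List<Integer> followersOf(int userId) {
        List<Integer> result = new ArrayList<>();
        for (int[] row : ROWS) {
            if (row[0] == userId) {
                result.add(row[1]);
            }
        }
        return result;
    }

    /** Checks the direct list first, then the list of every user found on it. */
    static boolean linkedWithinTwoHops(int from, int target) {
        List<Integer> firstHop = followersOf(from);
        if (firstHop.contains(target)) {
            return true;
        }
        for (int middle : firstHop) {
            // One more lookup for every row found in the first hop.
            if (followersOf(middle).contains(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(linkedWithinTwoHops(1, 5)); // true, via 1 -> 9 -> 5
        System.out.println(linkedWithinTwoHops(4, 5)); // false
    }
}
```

Every additional hop multiplies the number of lookups by the size of the previous result, which is why the approach gets painful as the table grows.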

Let’s try a different datasource model. Key-value? Here we go…

I tried to model our relations with Redis (known as a fast key-value store).

Things actually started to look better than relational from a performance aspect, even when we increased the number of records. BUT as soon as we added more than 2 hops to the equation, the nightmare started.

Assume the following key -> value structure, where A->B means A follows B:

A-> B,D,C,T,R

B-> D,E


So in that example we can conclude that A follows B,D,C,T,R.

But what happens if we modify our values? We need to make sure we iterate over all the affected keys and modify them correspondingly.

What if I want C to follow K? C’s new state:

C-> K

That means we must update A as well (because A follows C):

A-> B,D,C,T,R,K

B-> D,E

C-> K

What about a delete action? I found myself writing crazy algorithms to handle all the side cases. At that point it felt like we hadn’t chosen the right design. Maybe there is a convenient way to model this in a key-value datasource; if you find one, please share :)
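For illustration only, here is roughly what the update above looks like in plain Java, with a map standing in for the Redis keys. This is a sketch of the idea, not the code we actually ran; the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class FlattenedFollowsSketch {
    // key -> everyone that key follows, stored flattened as in the A/B example
    final Map<String, Set<String>> store = new LinkedHashMap<>();

    void put(String key, String... follows) {
        store.put(key, new LinkedHashSet<>(Arrays.asList(follows)));
    }

    /** Records "follower follows followee" and copies it into every key that lists the follower. */
    void addFollow(String follower, String followee) {
        store.computeIfAbsent(follower, k -> new LinkedHashSet<>()).add(followee);
        // Full scan of all keys: anyone who follows the follower now follows the followee too.
        for (Set<String> follows : store.values()) {
            if (follows.contains(follower)) {
                follows.add(followee);
            }
        }
    }

    public static void main(String[] args) {
        FlattenedFollowsSketch sketch = new FlattenedFollowsSketch();
        sketch.put("A", "B", "D", "C", "T", "R");
        sketch.put("B", "D", "E");
        sketch.addFollow("C", "K");
        System.out.println(sketch.store); // {A=[B, D, C, T, R, K], B=[D, E], C=[K]}
    }
}
```

Note that this is a single pass. A key that follows A would also need K, so a complete version has to repeat until nothing changes, and a delete has to work out which copies are still justified by another path. That is where the crazy algorithms come from.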

Graph to the rescue:

A graph datasource’s building blocks are based on relations between entities from the very beginning. When we query the graph, it looks only at nodes that are directly connected. (The power of this datasource relies on the ability to iterate over and query the second- and third-tier rings connected to our nodes, the so-called hops.)

As long as nodes are not connected (i.e. related), the search will never hit them unless you add relationships between them and the query node.
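The hop idea can be sketched in a few lines of plain Java. This is not how any particular graph database is implemented; it is a simple breadth-first walk over an adjacency map, with names chosen for the example, that shows why unrelated nodes are never touched.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HopTraversalSketch {
    /** Returns every node reachable from start in at most maxHops relationship steps. */
    static Set<String> withinHops(Map<String, List<String>> follows, String start, int maxHops) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(start);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                // Only the relationships of the current ring are followed.
                for (String neighbour : follows.getOrDefault(node, Collections.emptyList())) {
                    if (visited.add(neighbour)) {
                        next.add(neighbour);
                    }
                }
            }
            frontier = next;
        }
        visited.remove(start);
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> follows = new HashMap<>();
        follows.put("X", Arrays.asList("Y"));
        follows.put("Y", Arrays.asList("Z"));
        follows.put("Q", Arrays.asList("R")); // not related to X in any way
        System.out.println(withinHops(follows, "X", 1)); // [Y]
        System.out.println(withinHops(follows, "X", 2)); // [Y, Z]
    }
}
```

The work done depends on how many relationships hang off the visited nodes, not on how many nodes exist in total; Q and R are never looked at.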

To sum up performance: with a low number of nodes (around 1,000) you probably won’t see the benefit of a graph datasource, but once you reach 1,000,000 records, that is when a graph database shows its power against any other datasource that has to serve a relation-heavy business requirement like this one.

As a “bonus” section, here are our findings on the graph solutions and implementations currently on the market. The market currently seems to have 3 major datasource solutions. We will go through them one by one:
1. TitanDB:

TitanDB is considered a distributed graph database with very good scaling capabilities. I ran a POC on this datasource.

As an open-source addict, the first important thing for me is community and documentation. For some reason it felt like I couldn’t find one decent place with a proper “getting started” guide. After a while I think I found out why: TitanDB is built on 3rd-party solutions. For example, the backend storage can be Cassandra, Berkeley DB, Amazon’s DynamoDB and some others. In addition, the indexing mechanism is also built on 3rd parties like Solr or Elasticsearch.

So it means that I actually need a DevOps team behind me just to bring this thing to life (and what about production optimizations?).

Overall: Less practical for startup teams. Hard to kick-start, but looks promising from the performance and scaling aspects.

2. Neo4j:

Neo4j has been among us for many years now. I think it may be the most popular graph solution on the market. First, the docs are very neat. The community is large. It has a great query language called Cypher. I managed to run a Neo4j instance within 10 minutes (just clicked Next->Next->Next), including having my coffee.

Seems like we found our solution! But wait.. not everything is free. First of all, Neo4j has its limitations. It doesn’t scale easily: you can’t shard it like you could with TitanDB. If your project requires significant writes (compared to reads) you’ll need to super-optimize and maybe redesign your architecture (but that’s something we will keep for another post). Another disadvantage is license cost. To enable Neo4j Enterprise Edition (HA, clustering, etc.) you are going to find this line: “Please call us for further information”, which means you are in trouble. I found out that you have to pay insane amounts of money per year for licenses. Not good for a startup either.

Overall: Very practical and easygoing, but expensive for startups as they grow, and you need to put in some effort to get better performance, which doesn’t come out of the box.

3. OrientDB:

That was the last one we tried. OrientDB also promises a distributed graph solution. However, its community is not as wide as those of the other solutions. The docs seem fine, but for some reason I still sensed that this solution was less popular than the others.

Overall: Easy to get going, less popular, docs properly in place, distributed.

In the end I think a graph is a great solution, but it won’t come for free. We chose one of the solutions I mentioned, but that’s a topic for another post :) Stay tuned for more updates in my next posts about graphs. Idan.

Business Oriented Programming: Your key for rapid development

I believe that the key to building a successful project with great time-to-market is first to understand the business requirements. Many developers’ first tendency is to think about the technology or the architecture design. I can’t blame them; that’s one of the most fun parts. But if you don’t understand the project’s or company’s demands, you will lose focus quickly. The more complex the project’s scope, the more critical this business-understanding stage becomes.
When I talk about “business” I refer to the project’s requirements and motivation. It doesn’t matter which industry you work in: Forex, gaming, advertising, banking, etc. First you need to understand the motivation. How do we do that?
1. Meet all stakeholders:

This is your entry point. Get all the information you can. Schedule a couple of meetings with marketing, product or even customers. Questions, questions, questions.. Ask a lot of questions and make sure you understand the demands. Sometimes clients struggle to express their demands; it’s your job to listen, direct and focus.

Sometimes the knowledge is spread out over a couple of departments. After concentrating all the knowledge it will be easier to understand the expectations of the project.

2. Write everything down:

Document, Document, Document. If it isn’t written down, it doesn’t exist.

One of my clients used to joke and call me “Technical Writer”. I am not ashamed of that. We are code drifters: someday someone is going to take ownership of your code/project, or maybe you’ll need to bring a dead project back to life. When that day comes, the other person will have a better understanding, and the company will benefit from shorter handover periods.

When you have everything written down (Google Docs, wiki, notes, etc.) you will avoid confusion and misunderstanding in the future.

3. Cut into phases:

Once you understand the project scope and your client’s demands, it will be easier to predict your project phases, and you need to put borders around them. Make sure each phase’s target is clear and defined. Then schedule each phase and your (or your team’s) development process. Once each phase has clear borders you are ready to go into design and implementation.

4. Small wins:

Time to market, lean, production ready: those words are repeated on every project. I believe the key to a successful project is to deliver in small pieces.

Avoid this: “Give me X months and I’ll get it done”, “I understand what’s gotta be done, I’ll contact you when we finish”. It’s not possible to understand in advance everything that has to be done.

Software development is a continuous build-up process. The requirements, resources and priorities change frequently. You must stay in constant contact with the project’s stakeholders and tolerate the changes. If you are “blind” to the project’s progression you’ll find yourself (and your team) spending lots of time rewriting, or adding out-of-scope, irrelevant or low-priority features.

Deliver small pieces. It will keep you in step with the demands and minimize CRs in the early stages.

5. Program with business vision in mind:

The development part comes easier when you have the business vision in mind. You can design and predict the business components and reflect them straight in your code.

It will be easier for you to extend and maintain your code according to the requirements. Since you already understand what’s coming next it will allow you to build the right infrastructure and flexibility for a fast delivery time.

6. “It’s Alive”:

Go to production asap!

When I write “go to production” I don’t necessarily mean that you should announce that you are ready for market. I mean that you should have something “pumping”.

Remember, it’s always easier to extend something that works than to wait until everything works. I am a big fan of lean programming. Do what needs to be done to get your software into “production mode” asap. Even if it doesn’t fulfill all the project’s demands, it will give you a great perspective on the roadmap. Besides that, your customer will be happy. It’s always great news to hear that something is actually alive and breathing.

7. “Ego-Free”:

Last but not least: if you don’t understand something, or if you struggle to gather business knowledge, don’t be shy. Put your ego aside and do whatever it takes to get the knowledge you need. It might “cost” you some extra questions or conversations with different people in your organization. It is better to understand the demands at an early stage than to face misunderstandings later on.

Understanding and building software projects from the business aspects down to technology and code has brought me many successful projects.
Next time you feel the urge to start talking technology or patterns at an early stage, I recommend putting it aside for the right time and starting with your business understanding.

Vertx and Spring Powerful couple for the Bigdata masses

The story of those two starts this way:
My requirement was to have an app that would be able to handle thousands of concurrent connections in the first phase.
Now, as an experienced developer, the first thing that comes to mind is:
Let’s bring up a couple of Tomcat instances backed by Spring, put a load balancer in front of them and let the party begin.
But we know that’s not a free lunch. The costs, the maintenance, high availability and, goddamn it, we do need lots of instances.. Next idea.. let’s go the Reactor way.
Today we have NodeJS (and some others) as a well-known implementation of the reactor pattern. Its ability to maintain thousands of concurrent connections on one node is amazing!!
I love the JVM (although I had a great experience with NodeJS) and I wanted to try Vert.x.
Vert.x is Java’s “answer” to NodeJS, backed by really cool features like “Workers”, which easily offload time-consuming work from the main event loop (if you have no idea what I am talking about, please read a bit about the Reactor pattern).
Ok, enough talking.. some hands-on:
Let’s run Vert.x:
Import the Vert.x libs via Gradle:

compile  'io.vertx:vertx-core:3.0.0'

Run the server easily:

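public class VertxHttpServerVerticle extends AbstractVerticle {
    private HttpServer httpServer = null;
    public void start() throws Exception {
        httpServer = vertx.createHttpServer();
        // The original listing is cut off here; the handler below is an assumed minimal completion.
        httpServer.requestHandler(request -> request.response().end("OK")).listen(8080);
    }
}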

We got a rocking HTTP server listening on 8080, ready to receive thousands of requests.
Now what? Let’s put our business logic inside.. the “hard” part. And here it begins.. configurations, DAOs, data accessors, dependencies.. Am I going to reinvent the wheel now?
In Vert.x you have the EventBus, which helps you pass messages between Verticles. But if you go there you need to keep all your classes concrete. Even if we go there (instead of having dependency injection) we still have other matters to think of.
My experience involves lots of Spring container apps. I am used to Spring because it stitches everything together for us, from dependency injection to configuration and lots of other production-ready integration libs.
Wouldn’t it be a good idea to match those two?
That way we can benefit from the power of Vert.x, handling millions of concurrent connections behind a simple Vert.x cluster, and in addition avoid the pain of boilerplate code inside our lovely app, such as configuration, integrations or any other common stuff.
Let’s bring Spring inside:
All we need is the spring-boot-starter lib (having Spring Boot packed with us):

compile "org.springframework.boot:spring-boot-starter:${ver.spring}"

Now init vertx with Spring:

ApplicationContext context = new AnnotationConfigApplicationContext(SpringConfiguration.class);

//Get property:
int VertxHttpServerPort = Integer.valueOf(context.getEnvironment().getProperty("http.port"));

Now that we hold the Spring context we can pass it through to our Verticles (the “beans” of Vert.x).
Want to have Spring beans ready to be used in our Verticles? No prob:
Configure a new Spring bean:

@Bean
public MyService myServiceImpl() {
    return new MyServiceImpl();
}

All that is left is to pass the context into our Verticle and retrieve the Spring bean inside:

vertx.deployVerticle(new VertxHttpServerVerticle(context));

There is a bit more, but I am sure you are getting the idea!

How to use geolocation with Redis 3.2

Recently I was looking for a way to tell whether a couple of longitude/latitude points are actually within the same radius.
In the beginning I was thinking of using Elasticsearch to store all my geolocations and use its mechanism to search whether a point is within the requested radius of other points.
Elasticsearch is a great product, but I wasn’t sure I wanted to maintain it only for our geolocation service.
Our project is already using Redis.
P.S. I assume you are already familiar with Redis.
I found out that in the new Redis version I could actually use all the geolocation features.
Redis being Redis -> everything goes fast!!
Their upcoming version, Redis 3.2 (not stable yet), provides new GEO API features.
Redis organizes the data in geo sets (backed by sorted sets).
Let’s assume we have one set that is identified by a key and holds some members that are associated with geolocations:

GEOADD geoset 8.6638775 49.5282537 "location1" 8.3796281 48.9978127 "location2" 8.665351 49.553302 "location3"

Now let’s suppose that I want to check which of my entries are close to a specific geo-point.
Let’s assume my input geo-point is:

8.6582361, 49.5285495

And I want all my stored locations within a 10 km radius.
We do it this way:

GEORADIUS geoset 8.6582361 49.5285495 10 km

The result contains the names of the members that lie within the 10 km radius (here location1 and location3; location2 is roughly 60 km away).
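If you want to sanity-check such a result outside Redis, a brute-force version of the same question is easy to write. The Java sketch below is not what Redis does internally; it simply measures the great-circle distance (haversine formula, mean Earth radius assumed) from the input point to each of the three sample members. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RadiusSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius, an approximation

    /** Great-circle distance in km between two (longitude, latitude) points. */
    static double distanceKm(double lon1, double lat1, double lon2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Names of all members whose {lon, lat} lies within radiusKm of the given point. */
    static List<String> withinRadius(Map<String, double[]> members,
                                     double lon, double lat, double radiusKm) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, double[]> member : members.entrySet()) {
            double[] position = member.getValue();
            if (distanceKm(lon, lat, position[0], position[1]) <= radiusKm) {
                hits.add(member.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, double[]> geoset = new LinkedHashMap<>();
        geoset.put("location1", new double[] {8.6638775, 49.5282537});
        geoset.put("location2", new double[] {8.3796281, 48.9978127});
        geoset.put("location3", new double[] {8.665351, 49.553302});
        System.out.println(withinRadius(geoset, 8.6582361, 49.5285495, 10.0));
    }
}
```

This checks every member one by one, which is fine for three entries and exactly what you do not want for millions of them.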
Cool eh? How does it work? Redis encodes the longitude and latitude into a single number using a technique called geohash, forming a unique 52-bit integer.
So, for example, to determine whether two lat,lon points are close we can geohash them first:
Tel Aviv (32.0663, 34.7664) ->
Netanya (32.334, 34.8578) ->
Pay attention to the prefixes.
That is how we can tell how close the points are: the longer the shared prefix, the closer they generally are (points that sit on opposite sides of a cell boundary are the exception).
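Here is a simplified Java sketch of the idea, so you can play with prefixes yourself. It is an assumption-laden illustration, not Redis’s exact encoding: it scales each coordinate to 26 bits over the full ±180 / ±90 ranges and interleaves the bits, longitude first, whereas the real implementation differs in details such as the latitude range it accepts. The names are invented for the example, and I deliberately print only the prefix lengths.

```java
final class GeoHashSketch {
    /** Interleaves 26 longitude bits and 26 latitude bits into one 52-bit value. */
    static long encode52(double lon, double lat) {
        // Scale each coordinate into an integer in [0, 2^26).
        long lonBits = (long) ((lon + 180.0) / 360.0 * (1L << 26));
        long latBits = (long) ((lat + 90.0) / 180.0 * (1L << 26));
        long hash = 0;
        for (int i = 25; i >= 0; i--) {
            hash = (hash << 1) | ((lonBits >> i) & 1); // longitude bit first
            hash = (hash << 1) | ((latBits >> i) & 1); // then the latitude bit
        }
        return hash;
    }

    /** Number of leading bits (out of 52) that two hashes have in common. */
    static int sharedPrefixBits(long a, long b) {
        long diff = a ^ b;
        // A 52-bit value always has at least 12 leading zero bits inside a 64-bit long.
        return diff == 0 ? 52 : Long.numberOfLeadingZeros(diff) - 12;
    }

    public static void main(String[] args) {
        long telAviv = encode52(34.7664, 32.0663);
        long netanya = encode52(34.8578, 32.334);
        long newYork = encode52(-74.0060, 40.7128); // a far-away point for contrast
        System.out.println("Tel Aviv / Netanya share " + sharedPrefixBits(telAviv, netanya) + " bits");
        System.out.println("Tel Aviv / New York share " + sharedPrefixBits(telAviv, newYork) + " bits");
    }
}
```

Each shared pair of bits halves the longitude range and the latitude range both points must fall into, which is why a long common prefix means a small common cell.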
Happy Redis,

Protect your Micro-service architecture using Netflix-Hystrix and Spring-Boot

I am a fan of microservice architectures, and in my last architecture design I wanted to give a shot to the recent Netflix project called Hystrix.
Hystrix is Netflix’s implementation of the circuit-breaker pattern.
My microservice architecture is built from Spring Boot components. There is a new project, Spring Cloud Netflix, that brings the two together.
In that project you can find various Netflix components built in.
So when should I use Hystrix?
Let’s say your service makes API calls. We all know that API calls can be risky, especially when we have flows that depend on them (connection timeouts, hanging connections, various failures).
Now, I am paranoid when it comes to failures. Sometimes I spend more time thinking about the failures than about the invocation itself.
Hystrix actually helps me make my app safer.
We do know how to deal with errors, but Hystrix makes it easier for us.
A bit of hands-on:
Let’s say we have this API call:
[code language=”java”]
public String invokeRemoteService(String input) {
    // invoke the remote service and return its response
}
[/code]
Now we would probably make our code dirty by adding timeouts to the connections and surrounding the call with exception handlers, or maybe even forget to do any failure handling.
Hystrix gives us the ability to have a fallback method.
The fallback method can work in many modes (async, sync, observable, invoked on the same or another thread, and more).
What’s nice here is that it forces you to think in failure mode. So it’s not just an easy implementation but more like a methodology.
I am going to demonstrate the basic mode: the sync mode.
I am using Gradle, so first add the dependencies to your build.gradle file:
[code language=”java”]
compile ‘’
compile ‘’
[/code]
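Now annotate your configuration class (Spring Cloud Netflix switches Hystrix on with @EnableCircuitBreaker):
[code language=”java”]
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
        System.out.printf("Started My service app");
    }
}
[/code]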
After that, let’s decorate the invoking method with the HystrixCommand annotation:
[code language=”java”]
@HystrixCommand(fallbackMethod = "defaultInvokeRemoteService")
public String invokeRemoteService(String input) {
    // invoke the remote service and return its response
}

public String defaultInvokeRemoteService(String input) {
    // raise alert
    // getNextServer from Eureka and invoke the method
    // return data from a local cache
}
[/code]
That’s easy. All the other modes add a bit more logic, but I am sure you’ll be just fine.
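To see what the annotation saves you from writing, here is a plain-Java sketch of the sync idea: run the call with a timeout and return a fallback value on any failure. It uses only java.util.concurrent and is not the Hystrix API; the class name, method name and timeout values are assumptions made for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class FallbackSketch {
    // Daemon threads, so a hung remote call cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable, "remote-call");
        thread.setDaemon(true);
        return thread;
    });

    /** Runs the call on another thread; a timeout or any failure yields the fallback value. */
    static <T> T callWithFallback(Callable<T> call, long timeoutMillis, Supplier<T> fallback) {
        Future<T> future = POOL.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            // Covers TimeoutException, ExecutionException and InterruptedException alike.
            future.cancel(true); // interrupt the call if it is still running
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        String fast = callWithFallback(() -> "remote:x", 500, () -> "cached");
        String slow = callWithFallback(() -> {
            Thread.sleep(5_000); // simulates a hanging connection
            return "late";
        }, 100, () -> "cached");
        System.out.println(fast + " / " + slow); // remote:x / cached
    }
}
```

Even this small version already needs a thread pool, a timeout and cancellation, and it still has no metrics or circuit breaking, which is the part Hystrix adds on top.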
Let’s not forget the Hystrix dashboard:
With that dashboard we can actually monitor all our API calls and see whether the circuit of any of them is open.
Hystrix monitors your API calls and decides (by default or custom criteria) whether the circuit for a call should be opened or not.
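If the term is new to you, the sketch below shows the bare mechanics of a circuit breaker in plain Java. It is a deliberately simplified stand-in: it trips after a number of consecutive failures, while Hystrix itself works from an error percentage over a rolling window. All names, and the injected clock, are assumptions made for the example.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures that trip the breaker
    private final long openMillis;      // how long calls are rejected once tripped
    private final LongSupplier clock;   // injected so the behaviour is easy to test
    private int consecutiveFailures = 0;
    private long openedAt = -1;         // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized boolean isOpen() {
        return openedAt >= 0 && clock.getAsLong() - openedAt < openMillis;
    }

    /** Synchronized for simplicity, which also serializes the remote calls. */
    synchronized <T> T call(Supplier<T> remote, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // short-circuit: the remote service is not touched
        }
        try {
            T result = remote.get();
            consecutiveFailures = 0; // a success closes the circuit again
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            if (++consecutiveFailures >= failureThreshold) {
                openedAt = clock.getAsLong(); // trip (or re-trip) the breaker
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 5_000, System::currentTimeMillis);
        Supplier<String> broken = () -> { throw new IllegalStateException("remote service down"); };
        for (int i = 0; i < 3; i++) {
            System.out.println(breaker.call(broken, () -> "fallback") + " open=" + breaker.isOpen());
        }
    }
}
```

Once the window has passed, the next call is let through as a trial: a success closes the circuit, another failure opens it again.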
There are additional features. I just demonstrated the main idea and the basic ones.