Sunday, November 29, 2020

Why I am starting to believe that time is an illusion

We have refined our understanding of gravity over time: from a force of nature that attracts two masses, to gravity being merely the curvature of space-time, to gravity being interpreted as time slowing down in curved space.

The other important ongoing discovery has to do with dark matter and dark energy, which together constitute around 95% of our observable universe. As our insights into dark matter and dark energy evolve, dark energy at least seems to be an omnipresent energy in the vacuum of space, spread out over such large volumes that its detection is faint because its density is extremely low.

I imagine that dark matter may differ from ordinary matter along similar lines, in that it is much less dense and spread out over the vast expanse of intergalactic space. The current tug of war between the expansion of the universe and gravity could then be seen as a struggle for equilibrium between normal matter/energy and dark matter/energy, and clearly normal matter is getting rarefied on the overall scale of the universe.

Next, let's for a moment assume that the nature of reality (matter and energy) truly is made up of E8 crystals, or any other unit of reality for that matter. Ordinary matter could then be nothing but "densely packed E8 crystals" and dark matter could be "very rarefied" E8 crystals.

Time as we know it is perceived by us as beings made up of matter. Our perception of time is primarily from the point of view of how it appears in the "ordinary matter" portion of the universe. In the vacuum of space and in the dark matter part of the universe, is time just as relevant?

The perception of time is linked to causality, as experienced by us in the macro universe. 

Time in the macro universe might seem ubiquitous, but at subatomic and quantum scales, where wave nature, duality, uncertainty and randomness are the norm, does the linear flow of time, as in the macro universe, even hold relevance?

Time needs a frame of reference and an observer. Without these being clear to us over the expanse of the universe and dark matter, is time even a relevant variable in the study of particle physics itself?

Now can you blame me for tending to think that time might be an illusion of our perception of the macro universe?


-GG

Friday, August 21, 2020

Throttling Kafka Consumption - Dispelling Myths

Kafka as a messaging broker was designed for high-throughput publish and subscribe. At times this very fact can prove to be an issue when Kafka is used with more traditionally designed transactional systems. For example, if an application is consuming messages from a Kafka topic and inserting and updating rows in a relational database, the high rate of consumption of Kafka messages can put pressure on transactional persistence in the RDBMS and cause exceptions. Hence there arises a need to throttle message consumption from a Kafka topic.

Starting out, let's assume that we are only considering message consumption inside a single instance of a consumer (which may be part of a bigger consumer group). So we have a single consumer instance picking up Kafka messages from a single partition of a topic.

Most Kafka client frameworks implement a decent consumer poll-loop that reads Kafka messages from the partition as soon as possible and hands them over to the client application as fast as possible, without waiting/blocking for the client application to process each message.
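
As a rough illustration, here is a minimal sketch of such a poll-loop in plain Java using the Apache Kafka client. The topic name and group id are hypothetical placeholders, and real frameworks add error handling, offset management and rebalance listeners on top of this:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoopSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-consumers");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));   // hypothetical topic
            while (true) {
                // poll() returns a batch of records as soon as data is available (or after the timeout)
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    // a typical framework hands each record over to the application here,
                    // without waiting for the previous record to finish processing
                    System.out.println("handing over offset " + record.offset() + ": " + record.value());
                }
            }
        }
    }
}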

For single-threaded technologies like Node.js, the above can be an issue, in the sense that the application ends up processing a large number of messages in parallel. Many will resort to Kafka configuration parameters like the ones below to control the rate of consumption of messages from Kafka, with very little success, for the reasons I mention further down:

fetch.min.bytes - This property allows a consumer to specify the minimum amount of data that it wants to receive from the broker when fetching records. If a broker receives a request for records from a consumer but the new records amount to fewer bytes than fetch.min.bytes, the broker will wait until more messages are available before sending the records back to the consumer. 

fetch.max.wait.ms - By setting fetch.min.bytes, you tell Kafka to wait until it has enough data to send before responding to the consumer. If you set fetch.max.wait.ms to 100 ms and fetch.min.bytes to 1 MB, Kafka will receive a fetch request from the consumer and will respond with data either when it has 1 MB of data to return or after 100 ms, whichever happens first.

max.partition.fetch.bytes - This property controls the maximum number of bytes the server will return per partition.

session.timeout.ms - The amount of time a consumer can be out of contact with the brokers while still being considered alive; it defaults to 10 seconds.

max.poll.records - This controls the maximum number of records that a single call to poll() will return.

receive.buffer.bytes and send.buffer.bytes - These are the sizes of the TCP send and receive buffers used by the sockets when writing and reading data.
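
For reference, this is roughly how the above parameters would be set on the Properties object from the poll-loop sketch earlier; the values shown are arbitrary examples, not recommendations:

props.put("fetch.min.bytes", "1048576");           // wait for at least ~1 MB of data per fetch...
props.put("fetch.max.wait.ms", "100");             // ...or respond after 100 ms, whichever comes first
props.put("max.partition.fetch.bytes", "1048576"); // cap bytes returned per partition
props.put("session.timeout.ms", "10000");          // consumer considered dead after 10 s of silence
props.put("max.poll.records", "50");               // cap records returned by a single poll()
props.put("receive.buffer.bytes", "65536");        // TCP receive buffer size
props.put("send.buffer.bytes", "131072");          // TCP send buffer size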

If we manipulate the above values, messages will be fetched more frequently or less frequently. But if thousands of messages are already lying in the topic, it won't matter a lot, given that a fetch from the topic to the consumer is very quick (microseconds). The consumer will still see a steady supply of near-parallel messages.

Even playing around with commit offsets will never block consumption of messages; it only influences recovery, guaranteed delivery, once-only delivery, etc. It will not restrict super-fast reading of messages by the consumer. (It is not like traditional message brokers such as JMS or MQ, where late acknowledgements block consumption of messages.)

The bottom line is that the challenge of parallel consumption of messages in a single Kafka consumer instance needs to be handled by the process/application and not by Kafka configuration.
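
One application-level technique, mentioned here purely as an illustration (it is not part of the configuration discussion above), is the consumer's pause()/resume() API: stop fetching while the application works through a batch, keep polling so the consumer stays in the group, and resume once the slow work has finished. A minimal sketch, extending the poll-loop shown earlier; processBatch() is a hypothetical application method doing the slow RDBMS work, and the java.util.concurrent imports are additional:

ExecutorService worker = Executors.newSingleThreadExecutor();
Future<?> inFlight = null;

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    if (!records.isEmpty()) {
        consumer.pause(consumer.assignment());                   // stop fetching further records
        inFlight = worker.submit(() -> processBatch(records));   // hypothetical slow RDBMS work
    }
    if (inFlight != null && inFlight.isDone()) {
        consumer.commitSync();                                   // commit only after the batch is persisted
        consumer.resume(consumer.assignment());                  // start fetching again
        inFlight = null;
    }
}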

Consumer Side Technology

A much more important factor while dealing with throttling of Kafka messages is which technology/architecture you use in the consumer-side application.

Using a Java / Spring Boot based consumer will ensure that the consumer process runtime is multi-threaded and synchronous first, or synchronous by default. Given the use of thread pools in Java/Spring and the fact that almost all calls in Java are synchronous by default, it is much easier to control Kafka message consumption in Java.
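
For example, with Spring for Apache Kafka, a listener method like the sketch below is invoked synchronously per record, so the slow database work naturally slows down consumption. The topic name, OrderRepository and Order entity are hypothetical placeholders:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class OrderListener {

    private final OrderRepository repository;   // hypothetical Spring Data repository

    public OrderListener(OrderRepository repository) {
        this.repository = repository;
    }

    // The listener container calls this method synchronously: the next record from the
    // same partition is not handed over until this method returns, so slow RDBMS
    // writes naturally throttle consumption.
    @KafkaListener(topics = "orders", groupId = "order-consumers")
    public void onMessage(ConsumerRecord<String, String> record) {
        repository.save(new Order(record.key(), record.value()));   // hypothetical entity
    }
}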

Node.js based consumers are inherently single-threaded and hence asynchronous first, or asynchronous by default, using await, promises and callbacks in a way that makes near-parallel consumption of Kafka messages a challenge to control, especially for less experienced developers.


Monday, November 18, 2019

Process Mining - Practical Uses

Process mining is a family of techniques that support the analysis of business processes based on event logs. During process mining, specialized data mining algorithms are applied to event log data in order to identify the trends, patterns and details contained in the logs, and to auto-discover and paint the definitions of processes. Process mining can be used to discover processes from events or to improve process efficiency by identifying bottlenecks and exceptions.

By itself, process mining has several applications in large enterprises for discovering or validating business processes (in finance, manufacturing and retail) merely from events and timestamps, but the exciting thing is that the same technique can be used to gain insights into user/customer behaviour on your website and also to aid customer segmentation.

Consider a simplistic Apache web log (though much more detailed application logs can also be used):
127.0.0.1 - peter [9/Feb/2017:10:34:12 -0700] "GET /login.html HTTP/2" 200 1479
127.0.0.1 - peter [9/Feb/2017:10:34:12 -0700] "GET /catalog.html HTTP/2" 200 1479
127.0.0.1 - peter [9/Feb/2017:10:34:12 -0700] "GET /checkout.html HTTP/2" 200 1479
127.0.0.1 - peter [9/Feb/2017:10:34:12 -0700] "GET /checkstatus.html HTTP/2" 200 1479

The important pieces of information are the user id "peter", the date timestamp and the URL visited, which can serve as a proxy for the user's intended action. The above log can easily be transformed into a CSV file like the sample below.

Sample CSV file

<user>,<action>,<action date>, <user attribute 1>, <user attribute 2>, <user attribute n>
UserId,Action,StartTimeStamp,EndTimeStamp,PremiumCustomer,UserLocation,UserOccupation
Peter,Logged In,12-20-2017  10:20:00 AM,12-20-2017  10:25:00 AM,premium_customer,US,architect
Bob,Logged In,12-20-2017  10:20:00 AM,12-21-2017  10:25:00 AM,premium_customer,US,architect
Peter,Browsed Catalog,12-20-2017  10:35:00 AM,12-20-2017  10:40:00 AM,premium_customer,US,architect
Alice,Browsed Catalog,12-21-2017  10:35:00 AM,12-21-2017  10:40:00 AM,premium_customer,US,architect
Bob,Browsed Catalog,12-21-2017  10:35:00 AM,12-21-2017  10:40:00 AM,premium_customer,US,architect
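
A minimal sketch, in Java, of how a raw log line like the ones above could be turned into such a CSV row. The regular expression assumes the simplistic Apache log format shown earlier, the URL-to-action mapping is a made-up example, and the extra user attributes would come from some other source (e.g. a CRM lookup):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogToCsv {

    // Matches the simplistic log format above: host, user, timestamp, URL.
    private static final Pattern LINE = Pattern.compile(
            "^(\\S+) - (\\S+) \\[([^\\]]+)\\] \"GET (\\S+) HTTP/2\" \\d+ \\d+$");

    public static String toCsvRow(String logLine) {
        Matcher m = LINE.matcher(logLine);
        if (!m.matches()) {
            return null;                       // skip lines that do not match the expected format
        }
        String user = m.group(2);
        String timestamp = m.group(3);
        String url = m.group(4);
        String action;                         // map the URL to a business-level action
        switch (url) {
            case "/login.html":    action = "Logged In";       break;
            case "/catalog.html":  action = "Browsed Catalog"; break;
            case "/checkout.html": action = "Checked Out";     break;
            default:               action = "Other";
        }
        return String.join(",", user, action, timestamp);
    }
}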


The above CSV data can now be fed into a process mining tool to generate the "user/customer" interaction process as they navigate and use the website. This can give many insights into how customers are using our websites and what we can do to make things more convenient for them.


Statistics such as the time duration spent by each user in any of the activities can also be readily obtained as a report from the tool. The major deviations from the main process flow can also be highlighted. Animations of the process flow over a period of months can be played out within minutes to gain insight into the activities that are bottlenecking the overall process.

Instead of discovering processes for individual users, we can generalize processes for, say, "platinum customers" by filtering the data within the process mining tool itself.

Process mining deals with Petri net and BPMN notations of process definitions, and some popular algorithms for process mining are the alpha miner, the inductive visual miner and the fuzzy miner.


Sunday, November 10, 2019

Notes on Enterprise Application Development Strategy

At the end of the day, technology is an enabler for extracting business value. Some surefire ways to burn money needlessly during enterprise application development are noted below. Use them as a checklist of things to guard against in your own enterprise development. As always, use your own context while weighing in on any generic advice.
  • Don't care why and what the end user actually wants. It's more important to impress top management with a good-looking demo
  • Using a particular tech because it is popular at Google, Netflix or Facebook, irrespective of its applicability within your own context, especially the NFR context
  • No clarity about the WHY, obsessed with only the HOW
  • Not able to contain scope creep and needless escalation of complexity
  • First thought that comes up becomes the definitive design, no choices, no alternatives considered
  • Resume-driven or ego-driven development by tech leads
    • Why use a suitable tech, when a tougher, riskier, newer tech is at hand, no matter how irrelevant
  • Underestimating integration and dependencies
  • Committing to technology choices without understanding the full scope of functionality and the non-functional requirements (NFRs) in detail
  • Forgetting that no application is an island. It's important that an application can "fit into" the enterprise landscape in terms of security standards, IDAM (identity and access management), audit standards, deployment, reliability and business continuity standards
  • No thought given to a migration path from existing application(s)
  • Disproportionate emphasis on UI look and feel rather than UX and information models
  • Re-invent the wheel, under the guise of tech bravado
  • Dev Teaming mistakes
    • Thinking that employing 10 mediocre developers, instead of 3 crack devs, will be better
    • Believing the myth that application devs still write framework-level code
    • Never acknowledging that app devs now need to use proper frameworks and APIs rather than belt out algorithms and complex programs